Observability in Modern Applications.pptx

AneeshKumar54 79 views 18 slides May 24, 2024

Observability in Modern Applications

What We’ll Discuss Today
- Introduction to Observability
- Understanding Logs, Metrics, Traces, and Alerts
- Key Challenges in Observability
- Proposed Solutions and Best Practices

What is observability?

Monitoring vs. Observability

Monitoring is the process of collecting data and generating reports on the metrics that define system health. Observability is a more investigative approach. It looks closely at how the components of a distributed system interact, and at the data collected by monitoring, to find the root cause of issues. It includes activities like trace path analysis, which follows the path of a request through the system to identify integration failures. Monitoring collects data on individual components; observability looks at the distributed system as a whole.

Logs, Metrics, Traces and Alerts
Fundamental components of observability

Logs

Logs capture discrete events within components, providing detailed activity records over time. While rich in information, log data is often large in volume and can be hard to process, particularly with verbose logging. Filtered log data supplies the context and details of potential problems first identified through metric telemetry.
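
The filtering idea above can be sketched with Python's standard `logging` module. This is a minimal illustration, not a recommended production setup: the component name `checkout`, the `request_id` field, and the helper names `JsonFormatter`, `build_logger`, and `filter_logs` are all assumptions made for the example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "component": record.name,
            "message": record.getMessage(),
            # request_id is a hypothetical field passed via `extra=`
            "request_id": getattr(record, "request_id", None),
        })


def build_logger(name, stream, level=logging.INFO):
    """Create a logger that writes JSON lines to `stream` at `level` or above."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def filter_logs(lines, level=None, request_id=None):
    """Keep only the events matching the given level and/or request id."""
    events = [json.loads(line) for line in lines if line.strip()]
    return [
        e for e in events
        if (level is None or e["level"] == level)
        and (request_id is None or e["request_id"] == request_id)
    ]


stream = io.StringIO()
log = build_logger("checkout", stream)
log.debug("cart loaded", extra={"request_id": "r-1"})  # dropped: below INFO
log.info("payment started", extra={"request_id": "r-1"})
log.error("payment gateway timeout", extra={"request_id": "r-1"})
log.info("payment started", extra={"request_id": "r-2"})

errors = filter_logs(stream.getvalue().splitlines(), level="ERROR")
```

Keeping the log level at INFO limits volume at the source, while the structured fields make it cheap to narrow down to the one request a metric pointed at.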

Metrics

Metrics provide insight into the health and operation of components or systems as point-in-time measurements of specific sources. Because each measurement is small, metrics can be collected efficiently even in large-scale systems. Pre-aggregating metrics within components reduces computational overhead and storage requirements, which is particularly useful when processing many metric time series. This efficiency in processing and storage makes metrics well suited to automated alerting and a reliable source of health data for all system components.
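
As a rough sketch of pre-aggregation, the component below keeps only bucket counts and a running sum instead of storing every latency observation. The bucket boundaries and the `LatencyHistogram` class are hypothetical choices for illustration; real metric libraries offer similar histogram types with their own APIs.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

# Hypothetical upper bounds (inclusive) for latency buckets, in milliseconds.
# Anything above the last bound lands in a final overflow bucket.
BUCKET_BOUNDS_MS = (50, 100, 250, 500, 1000)


@dataclass
class LatencyHistogram:
    """Pre-aggregated latency metric: fixed memory regardless of traffic."""

    counts: list = field(
        default_factory=lambda: [0] * (len(BUCKET_BOUNDS_MS) + 1)
    )
    total_ms: float = 0.0
    observations: int = 0

    def observe(self, latency_ms):
        # bisect_left returns the index of the first bound >= latency_ms
        idx = bisect_left(BUCKET_BOUNDS_MS, latency_ms)
        self.counts[idx] += 1
        self.total_ms += latency_ms
        self.observations += 1

    def snapshot(self):
        """Return the small summary that would be shipped to the metric store."""
        mean = self.total_ms / self.observations if self.observations else 0.0
        return {
            "count": self.observations,
            "mean_ms": mean,
            "buckets": list(self.counts),
        }


hist = LatencyHistogram()
for latency in (20, 50, 120, 2000):
    hist.observe(latency)
summary = hist.snapshot()
```

However many requests arrive, the snapshot stays the same size, which is what keeps collection and storage cheap at scale.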

Traces

Logging offers a snapshot of discrete events, while tracing provides a broader, continuous view of an application's flow. Tracing aims at understanding program flow and data progression, often tracking a user's journey through an entire application stack. Tracing is geared more toward optimization than reactive troubleshooting, helping developers identify and address performance bottlenecks. When it is used for troubleshooting, tracing reveals details such as the functions called, their duration and parameters, and how deeply a user's request reached into nested functions.
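
The kind of information a trace carries (function, duration, parameters, nesting depth) can be illustrated with a toy in-process tracer. This is a simplified sketch under stated assumptions: the `Tracer` class and the span names are invented for the example, and a real distributed tracer would also propagate context across service boundaries.

```python
import time
from contextlib import contextmanager


class Tracer:
    """Toy single-threaded tracer that records nested spans."""

    def __init__(self):
        self.spans = []    # finished spans, in completion order
        self._stack = []   # names of currently open spans

    @contextmanager
    def span(self, name, **params):
        parent = self._stack[-1] if self._stack else None
        depth = len(self._stack)
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield
        finally:
            duration = time.perf_counter() - start
            self._stack.pop()
            self.spans.append({
                "name": name,
                "parent": parent,
                "depth": depth,
                "duration_s": duration,
                "params": params,
            })

    def slowest_child(self, parent_name):
        """Return the longest-running direct child of the named span."""
        children = [s for s in self.spans if s["parent"] == parent_name]
        return max(children, key=lambda s: s["duration_s"])


tracer = Tracer()
# Hypothetical request flow; the sleeps stand in for real work.
with tracer.span("handle_request", path="/checkout"):
    with tracer.span("load_cart", user_id=42):
        time.sleep(0.01)
    with tracer.span("charge_card", amount=19.99):
        time.sleep(0.03)

bottleneck = tracer.slowest_child("handle_request")
```

Comparing sibling spans under the same parent is one simple way to spot where a request spends most of its time.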

Alerts

Alerting systems let teams detect issues or anomalies proactively by setting alerts on predefined thresholds or conditions. This real-time notification allows for prompt investigation and resolution. Alerts can be based on static thresholds or on dynamic patterns in metric data, including machine learning for anomaly detection. Analyzing alerting data and incident response metrics allows teams to refine configurations, optimize thresholds, and enhance system resilience, driving continuous improvement in monitoring effectiveness and operational efficiency.
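
The difference between a static threshold and a dynamic one can be sketched as follows. Note that the dynamic check here is a plain z-score against recent history, used as a simple stand-in for the machine-learning anomaly detection mentioned above; the function names, the sample values, and the default limit of 3 standard deviations are assumptions for illustration.

```python
from statistics import mean, stdev


def static_alert(value, threshold):
    """Fire when the value exceeds a fixed, predefined threshold."""
    return value > threshold


def dynamic_alert(history, value, z_limit=3.0):
    """Fire when the value deviates strongly from recent history.

    Uses a z-score (distance from the mean in sample standard deviations).
    """
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu  # flat history: any change is unusual
    return abs(value - mu) / sigma > z_limit


# Hypothetical recent latency samples in milliseconds.
recent_latency_ms = [100, 102, 98, 101, 99]

fires_static = static_alert(110, threshold=500)
fires_dynamic = dynamic_alert(recent_latency_ms, 110)
```

In this sketch a reading of 110 ms stays well under the static limit of 500 ms but is far outside the recent pattern, so only the dynamic check fires.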

Key Challenges in Observability
A look at infra as a whole

Key Challenges
- Lack of meaningful metrics
- Lack of efficient monitoring dashboards
- No traces, even though tracing is worth adopting because it helps optimize performance
- Alerts: no proper runbook for each alert
- Too many microservices
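
One way to catch the missing-runbook problem early is a small check over the alert definitions. The rule format below (a list of dicts with `name` and `runbook_url` keys), the rule names, and the example URL are hypothetical; adapt the lookup to however alert rules are actually stored.

```python
def alerts_missing_runbooks(alert_rules):
    """Return the names of alert rules that have no runbook link."""
    return [
        rule["name"]
        for rule in alert_rules
        # Treat a missing key, None, or a blank string as "no runbook".
        if not (rule.get("runbook_url") or "").strip()
    ]


# Hypothetical alert rules for illustration only.
rules = [
    {"name": "HighLatency",
     "runbook_url": "https://runbooks.example.com/high-latency"},
    {"name": "DiskAlmostFull", "runbook_url": ""},
    {"name": "QueueBacklog"},
]

missing = alerts_missing_runbooks(rules)
```

Running a check like this in a review or CI step keeps new alerts from shipping without instructions for whoever gets paged.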

Key Challenges
- Lack of proper disaster recovery plans
- Lack of proper release management
- Lack of performance testing
- Normalize the stack
- Remove redundant software and tools
- Periodically review existing software and tools for improvement

Dedicated Platform team?
Foundation Builders

Dedicated Platform team

Specialized Expertise: A platform team consists of individuals with specialized skills and knowledge in building, maintaining, and optimizing the underlying infrastructure, frameworks, and tools used by other teams within the organization. Their expertise ensures that the platform is robust, scalable, and efficient, facilitating the development and deployment of applications.

Tooling and Automation: A platform team develops and maintains tools and automation workflows to streamline development, deployment, and operations processes. They create reusable components, templates, and scripts that enable teams to build and deploy applications more efficiently and consistently.

Dedicated Platform team

Security and Compliance: Ensuring the security and compliance of the platform is a critical responsibility of the platform team. They implement security best practices, monitor for vulnerabilities, and enforce compliance standards to protect data and mitigate risks.

Performance Optimization: The platform team continuously monitors and optimizes the performance of the infrastructure and applications running on the platform. They identify bottlenecks, fine-tune configurations, and implement performance improvements to enhance the reliability and efficiency of the platform.

Dedicated Platform team

Support and Troubleshooting: Inevitably, issues and challenges arise with the platform and its underlying infrastructure. The platform team provides support and troubleshooting assistance to other teams, helping them diagnose and resolve problems quickly to minimize downtime and disruptions.

Innovation and Evolution: A dedicated platform team is responsible for driving innovation and evolution of the platform. They research and evaluate new technologies, identify opportunities for improvement, and lead initiatives to modernize and enhance the platform to meet changing business needs and technological advancements.

Thank you
Aneesh Kumar
@aneeshep | [email protected]