KCD Porto: Choose Your Own Adventure - Cloud Native Observability Pitfalls
eschabell
63 slides
Sep 08, 2024
About This Presentation
Are you looking at your organization's efforts to enter or expand into the cloud native landscape and feeling a bit daunted by the vast expanse of information surrounding cloud native observability? When you're moving so fast with agile practices across your DevOps, SRE, and platform engineering teams, it's no wonder this can seem a bit confusing. Unfortunately, the choices being made have a great impact on your business, your budgets, and the ultimate success of your cloud native initiatives. A hasty decision up front leads to big headaches very quickly down the road. In this talk, I'll introduce the problem everyone faces with cloud native observability, followed by 3 common mistakes that I'm seeing organizations make and how you can avoid them!
Key takeaways: This session is never the same twice, as you, the audience, choose from a list of cloud native observability pitfalls that DevOps teams have to contend with in their daily cloud native lives! It's a super engaging and fun tour of the challenges that interest you most!
Slide Content
Choose Your Own Adventure: Cloud Native Observability Pitfalls | Eric D. Schabell, Director Evangelism, @ericschabell{@fosstodon.org} | KCD Porto, 27-28 Sep 2024 | chronosphere.io
Cloud Native Observability
Cloud Native
Data volume experiment: a Hello World app on a 4-node Kubernetes cluster with tracing, end user metrics (EUM), logs, and metrics (containers / nodes). 30 days == +450 GB
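To put the experiment's numbers in perspective, the arithmetic below works out the average daily and per-node volume and a naive one-year projection. It's a back-of-the-envelope sketch: it takes the slide's "+450 GB" as 450 GB, and it assumes data grows linearly with time and spreads evenly across nodes, neither of which the experiment claims.

```python
# Figures from the slide's experiment; "+450 GB" is read as 450 GB here.
OBSERVED_GB = 450.0
OBSERVED_DAYS = 30
NODES = 4


def daily_rate_gb(total_gb: float, days: int) -> float:
    """Average observability data generated per day."""
    return total_gb / days


def per_node_daily_gb(total_gb: float, days: int, nodes: int) -> float:
    """Average data per node per day (assumes an even spread across nodes)."""
    return daily_rate_gb(total_gb, days) / nodes


def projected_gb(total_gb: float, days: int, horizon_days: int, scale: float = 1.0) -> float:
    """Linear projection over a longer horizon, optionally scaled (e.g. 10x the services).

    Assumption: growth is linear in both time and `scale`; real workloads rarely are.
    """
    return daily_rate_gb(total_gb, days) * horizon_days * scale


print(f"{daily_rate_gb(OBSERVED_GB, OBSERVED_DAYS):.1f} GB/day")                 # 15.0 GB/day
print(f"{per_node_daily_gb(OBSERVED_GB, OBSERVED_DAYS, NODES):.2f} GB/node/day")  # 3.75 GB/node/day
print(f"{projected_gb(OBSERVED_GB, OBSERVED_DAYS, 365):,.0f} GB/year")           # 5,475 GB/year
```

Even for a Hello World app, a linear extrapolation lands at roughly 5.5 TB a year, before any real services are added.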
Picking Your Pitfalls (click on a pitfall to jump to that section):
1. Ignoring existing landscape
2. Focusing on The Pillars
3. Sneaky sprawling mess
4. Controlling costs
5. The protocol jungle
6. Underestimating cardinality
1. Ignoring existing landscape
If they can’t see me… they can’t hurt me…
Prometheus for metrics, alerting, queries
Prometheus auto-discovery
Manual instrumentation (Java client library)
Short link: bit.ly/prom-workshop
OpenTelemetry (auto) instrumentation (diagram): Applications (Java), OTel auto-instrumentation (libraries), OTel API, OTel SDK, and OTel Collector, connected via OTLP.
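The diagram's layering can be illustrated with a deliberately simplified in-memory model: instrumentation code targets only the API, and whatever sits behind it decides what happens to the data. None of these classes are OpenTelemetry's real API; `Span`, `NoOpProvider`, `SdkProvider`, and `ToyCollector` are hypothetical stand-ins. The model borrows two real traits: the OTel API falls back to no-op behavior when no SDK is installed, and the SDK can batch spans before exporting them (over OTLP, in the real system) to a Collector.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Span:
    """Hypothetical minimal span: a name plus key/value attributes."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


class NoOpProvider:
    """Models the API-only case: instrumentation calls succeed, nothing is kept."""

    def emit(self, span: Span) -> None:
        pass


class SdkProvider:
    """Models an SDK: buffers spans and hands full batches to an exporter."""

    def __init__(self, export: Callable[[List[Span]], None], batch_size: int = 2):
        self._export = export
        self._batch_size = batch_size
        self._buffer: List[Span] = []

    def emit(self, span: Span) -> None:
        self._buffer.append(span)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._export(self._buffer)
            self._buffer = []


class ToyCollector:
    """Stands in for the OTel Collector receiving exported batches."""

    def __init__(self) -> None:
        self.received: List[Span] = []

    def receive(self, batch: List[Span]) -> None:
        self.received.extend(batch)


def instrumented_handler(provider, route: str) -> None:
    """Library code written only against the 'API' (the emit method)."""
    provider.emit(Span("GET " + route, {"http.route": route}))


collector = ToyCollector()
sdk = SdkProvider(collector.receive, batch_size=2)
for path in ["/", "/cart", "/checkout"]:
    instrumented_handler(NoOpProvider(), path)  # API only: the span is dropped
    instrumented_handler(sdk, path)             # API + SDK: the span is batched
sdk.flush()  # ship the final partial batch
print([s.name for s in collector.received])
```

The same `instrumented_handler` produces nothing or a full stream of spans depending solely on what is plugged in behind it, which is why auto-instrumentation libraries can ship without deciding where data ends up.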
Picking Your Next Pitfall (click on a pitfall to jump to that section, or jump to end): Ignoring existing landscape, Focusing on The Pillars, Sneaky sprawling mess, Controlling costs, The protocol jungle, Underestimating cardinality
2. Focusing on The Pillars
Phase 1: Know something is happening as fast as possible…
Phase 2: Triage with specific information…
Phase 3: Understand to ensure it never happens again…
3. Sneaky sprawling mess
Over 66% of organizations use more than 10 different observability tools (ESG report on exploding data volumes)
Know Triage Understand
4. Controlling costs
“It’s remarkable how common this situation is, where an organization is paying more for their observability data, than they do for their production infrastructure.”
O11y data storage costs are broken. The keep-everything model?
Know the cost of observability metrics data?
Control costs and improve productivity (diagram). Observability Platform: data collection, control plane, store, lens. Telemetry Pipeline: reduce, enrich, secure; transform and route data in your environment; store data in third-party log & SIEM solutions.
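The telemetry pipeline stages in the diagram (reduce, enrich, secure, then transform and route) can be illustrated with a toy Python sketch over log records. Everything specific here is a hypothetical assumption for illustration: the record shape, dropping `DEBUG` records as the reduction rule, the `cluster` attribute, the email-redaction regex, and the `siem` / `log_store` destination names. It is not how Chronosphere's or any other product's pipeline is configured.

```python
import re
from typing import Dict, Iterable, List

Record = Dict[str, str]

# Hypothetical redaction rule: mask anything that looks like an email address.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def reduce_records(records: Iterable[Record]) -> List[Record]:
    """Reduce: drop low-value DEBUG records before they are stored (and billed)."""
    return [r for r in records if r.get("level") != "DEBUG"]


def enrich(records: Iterable[Record], cluster: str) -> List[Record]:
    """Enrich: attach context (here, a cluster name) to every record."""
    return [{**r, "cluster": cluster} for r in records]


def secure(records: Iterable[Record]) -> List[Record]:
    """Secure: redact sensitive values from the message field."""
    return [{**r, "message": EMAIL.sub("<redacted>", r.get("message", ""))} for r in records]


def route(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Route: security events go to a SIEM, everything else to a log store."""
    destinations: Dict[str, List[Record]] = {"siem": [], "log_store": []}
    for r in records:
        target = "siem" if r.get("category") == "security" else "log_store"
        destinations[target].append(r)
    return destinations


def run_pipeline(records: Iterable[Record], cluster: str = "demo-cluster") -> Dict[str, List[Record]]:
    """Apply the stages in order: reduce, enrich, secure, route."""
    return route(secure(enrich(reduce_records(records), cluster)))


sample = [
    {"level": "DEBUG", "message": "cache warmed", "category": "app"},
    {"level": "INFO", "message": "login by alice@example.com", "category": "security"},
    {"level": "ERROR", "message": "payment failed", "category": "app"},
]
routed = run_pipeline(sample)
for destination, batch in routed.items():
    print(destination, [r["message"] for r in batch])
```

Reducing first means the later stages, and whatever storage sits downstream, only ever handle data someone decided was worth keeping; each stage returns new records, so the input list is left untouched.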
Chronosphere named a Leader in the 2024 Gartner® Magic Quadrant™ for Observability Platforms Gartner, Magic Quadrant for Observability Platforms: By Gregg Siegfried, Padraig Byrne, Mrudula Bangera, Matt Crossley (12 August 2024) GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally, and MAGIC QUADRANT is a registered trademark of Gartner, Inc. and/or its affiliates and are used herein with permission. All rights reserved. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose https://chronosphere.io/2024-gartner-magic-quadrant
5. The protocol jungle
Without open standards, you’ll not find a way back…
6. Underestimating cardinality
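Cardinality is the number of unique time series a metric produces: every distinct combination of label values is its own series. A quick way to see how fast that grows is to multiply the number of distinct values per label, which gives a worst-case upper bound (only combinations that actually occur get stored). The label names and counts below are hypothetical, chosen only to show the effect of adding one high-cardinality label such as a user ID.

```python
from math import prod
from typing import Dict


def max_series(label_values: Dict[str, int]) -> int:
    """Upper bound on distinct time series for one metric name.

    Worst case is the product of each label's distinct-value count.
    A metric with no labels is a single series (prod of nothing is 1).
    """
    return prod(label_values.values())


# Hypothetical label counts for an http_requests_total-style metric.
labels = {"method": 5, "status_code": 10, "endpoint": 50, "pod": 20}
before = max_series(labels)                        # 50,000 series
after = max_series({**labels, "user_id": 10_000})  # 500,000,000 series
print(f"{before:,} -> {after:,} potential series ({after // before:,}x)")
```

One innocent-looking label multiplies the worst case by its own value count, here taking 50,000 potential series to 500,000,000.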
The struggle is real:
“I don't yet collect spans/traces because I can hardly get our devs to care about basic metrics, let alone traces.”
“This is a large enterprise with approx. 1000 developers. Cultivating a culture of engineering that cares about availability is a challenge that we need to solve alongside any technical implementations.”
10 hours per week, on average, spent trying to triage and understand incidents: a quarter of a 40-hour work week
33% said those issues disrupted their personal lives; 39% admitted they are frequently stressed out
Cloud Native Observability at Scale
What should be the #1 item on your cloud wishlist? What should be the #1 item on your cloud native observability wishlist?
Questions? | Eric D. Schabell, Director Evangelism, @ericschabell{@fosstodon.org} | chronosphere.io