Choose Your Own Adventure - Cloud Native Observability Pitfalls

eschabell, May 16, 2024 (62 slides)

About This Presentation

Are you looking at your organization's efforts to enter or expand into the cloud native landscape and feeling a bit daunted by the vast expanse of information surrounding cloud native observability? When you're moving so fast with agile practices across your DevOps, SREs, and platform e...


Slide Content

Choose Your Own Adventure: Cloud Native Observability Pitfalls
Eric D. Schabell, Director Evangelism, chronosphere.io
@ericschabell{@fosstodon.org}

Cloud Native Observability

Cloud Native

Data volume experiment: a Hello World app on a 4-node Kubernetes cluster with tracing, end user metrics (EUM), logs, and metrics (containers/nodes) generated more than 450 GB of data in 30 days.

Retention, retention, retention…
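The experiment's numbers invite some quick retention arithmetic. This is a minimal sketch, assuming data accumulates linearly at the 30-day average and ignoring compression and downsampling; because the slide reports more than 450 GB, every figure it prints is a lower bound. The function names are hypothetical.

```python
# Sketch: project storage needs from the Hello World data volume experiment.
# Assumption: linear growth at the 30-day average rate; real ingest is
# burstier, and compression/downsampling are ignored.

OBSERVED_GB = 450      # "+450 GB" observed over the experiment window
OBSERVED_DAYS = 30
NODES = 4


def daily_rate_gb(total_gb: float, days: int) -> float:
    """Average ingest per day over the observation window."""
    return total_gb / days


def storage_for_retention(rate_gb_per_day: float, retention_days: int) -> float:
    """Steady-state storage when data older than the retention window is deleted."""
    return rate_gb_per_day * retention_days


rate = daily_rate_gb(OBSERVED_GB, OBSERVED_DAYS)
print(f"~{rate:.1f} GB/day total, ~{rate / NODES:.2f} GB/day per node")
for days in (30, 90, 365):
    print(f"{days:>3}-day retention: ~{storage_for_retention(rate, days):,.0f} GB")
```

Under these assumptions, stretching retention from 30 days to a year multiplies storage roughly twelvefold for a single Hello World app.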

Cloud Native at Scale

Observability…

Cloud Native Observability at Scale

O11y at Scale (need)

Picking Your Pitfalls
1. Ignoring existing landscape
2. Focusing on The Pillars
3. Sneaky sprawling mess
4. Controlling costs
5. The protocol jungle
6. Underestimating cardinality

1. Ignoring existing landscape

If they can’t see me… they can’t hurt me…

Prometheus for metrics, alerting, queries

Prometheus auto discovery

Manual instrumentation (Java client library)

Short link: bit.ly/prom-workshop

OpenTelemetry (auto) instrumentation: Java applications instrumented with OTel auto instrumentation (libraries), the OTel API, and the OTel SDK send telemetry over OTLP to an OTel Collector.

OpenTelemetry Collector (agent): on each host, applications using OTel auto instrumentation, the OTel API, and the OTel SDK send OTLP to an OTel Collector agent, which forwards OTLP to the observability backend (Prometheus, Jaeger, Fluent Bit, etc.).

OTel Collector gateway: multiple hosts, each running applications (OTel auto instrumentation, API, SDK) and an OTel Collector agent, send OTLP to a central OTel Collector gateway, which forwards it to the observability backend (Prometheus, Jaeger, Fluent Bit, etc.).
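The agent and gateway topology above can be modeled as a toy in a few lines. This is a sketch for intuition only, not the OpenTelemetry Collector's API or configuration: the `Agent`, `Gateway`, `batch_size`, and routing table are all hypothetical. It shows per-host agents batching telemetry and fanning in to one gateway that routes each signal type to a backend.

```python
# Toy model of the agent/gateway Collector topology; not the Collector's API.
from dataclasses import dataclass, field


@dataclass
class Telemetry:
    signal: str   # "traces", "metrics", or "logs"
    host: str
    payload: str


@dataclass
class Gateway:
    # Hypothetical routing table: which backend receives which signal type.
    routes: dict[str, str]
    delivered: dict[str, list[Telemetry]] = field(default_factory=dict)

    def receive(self, batch: list[Telemetry]) -> None:
        for item in batch:
            backend = self.routes[item.signal]
            self.delivered.setdefault(backend, []).append(item)


@dataclass
class Agent:
    host: str
    gateway: Gateway
    batch_size: int = 2
    _buffer: list[Telemetry] = field(default_factory=list)

    def export(self, signal: str, payload: str) -> None:
        # Applications on this host hand telemetry to the local agent.
        self._buffer.append(Telemetry(signal, self.host, payload))
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Forward whatever is buffered to the gateway as one batch.
        if self._buffer:
            self.gateway.receive(self._buffer)
            self._buffer = []


gateway = Gateway(routes={"traces": "jaeger", "metrics": "prometheus",
                          "logs": "fluent-bit"})
agents = [Agent(f"host-{i}", gateway) for i in range(3)]
for agent in agents:
    agent.export("traces", "span")
    agent.export("metrics", "cpu")
    agent.export("logs", "line")
    agent.flush()
print({backend: len(items) for backend, items in gateway.delivered.items()})
```

In this model, backend choices live only in the gateway's routing table, so swapping a backend would not touch any host; that is one plausible reason to add a gateway tier.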

Short link: bit.ly/opentelemetry-workshop


2. Focusing on The Pillars

Pillars vs. Phases

Pillars: developer-focused, technology-focused, bottom up

Pillar problems…

Car is on fire…

Better outcomes… Faster remediation… Easier detection… Happier customers…

Phase 1: Know something is happening as fast as possible…

Phase 2: Triage with specific information…

Phase 3: Understand, to ensure it never happens again…


3. Sneaky sprawling mess

Over 66% of organizations use more than 10 different observability tools (ESG report on exploding data volumes).

Know Triage Understand


4. Controlling costs

“It’s remarkable how common this situation is, where an organization is paying more for their observability data, than they do for their production infrastructure.”

O11y data storage costs are broken. The keep-everything model?

Know the cost of observability metrics data?

Chronosphere platform (customer environment and Chronosphere SaaS platform): data collection ingests all your data (metrics, logs, traces, events) from any source; the control plane aligns cost to value; purpose-built data stores per telemetry type run on a single-tenanted architecture with 99.99% reliability; Chronosphere Lens turns raw data into generated insights for each user.
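One common way to align metrics cost to value is to aggregate away labels nobody queries before storing the data. The sketch below assumes that stored series count is what drives cost, as it does for many metrics backends, and uses made-up label values; it mirrors what PromQL's `sum without (pod)` computes, and does not model Chronosphere's control plane.

```python
# Sketch: aggregating away a high-cardinality label shrinks the series count.
from collections import defaultdict
from itertools import product

Labels = tuple[tuple[str, str], ...]


def make_series(**label_values: list[str]) -> dict[Labels, float]:
    """One series per combination of label values, each with value 1.0."""
    names = sorted(label_values)
    return {
        tuple(zip(names, combo)): 1.0
        for combo in product(*(label_values[n] for n in names))
    }


def aggregate_without(series: dict[Labels, float], drop: str) -> dict[Labels, float]:
    """Sum series that differ only in the dropped label (like `sum without`)."""
    out: dict[Labels, float] = defaultdict(float)
    for labels, value in series.items():
        kept = tuple((k, v) for k, v in labels if k != drop)
        out[kept] += value
    return dict(out)


# Hypothetical label values for one request-count metric.
raw = make_series(
    endpoint=[f"/api/{i}" for i in range(5)],
    status=["200", "404", "500", "503"],
    pod=[f"pod-{i}" for i in range(50)],
)
rolled_up = aggregate_without(raw, "pod")
print(len(raw), "->", len(rolled_up))
```

The totals are preserved (each rolled-up series sums 50 pods), while the stored series count drops from 1,000 to 20; the trade-off is losing per-pod detail for that metric.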


5. The protocol jungle

Without open standards, you won’t find your way back…

OpenTelemetry Collector (agent): on each host, applications using OTel auto instrumentation, the OTel API, and the OTel SDK send OTLP to an OTel Collector agent, which forwards OTLP to the observability backend (Prometheus, Jaeger, Fluent Bit, etc.).

Prometheus for metrics, alerting, queries
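To make the open-standards point concrete: any component that speaks OTLP can accept the same payload. Below is a minimal sketch that builds a single-span OTLP/JSON trace export request with the Python standard library. The field names follow the OTLP/JSON encoding as I understand it (hex-encoded IDs, integer enums, nanosecond timestamps as strings), so check them against the OTLP specification; the service name, scope name, and the Collector listening on `localhost:4318` are assumptions, and the request is built but not sent.

```python
# Minimal OTLP/JSON trace payload built with the standard library only.
import json
import secrets
import time
import urllib.request


def otlp_json_span(service: str, span_name: str, duration_ms: int) -> dict:
    end = time.time_ns()
    start = end - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-sketch"},  # hypothetical scope name
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 16 bytes -> 32 hex chars
                    "spanId": secrets.token_hex(8),    # 8 bytes -> 16 hex chars
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    "startTimeUnixNano": str(start),
                    "endTimeUnixNano": str(end),
                }],
            }],
        }],
    }


payload = otlp_json_span("hello-world", "GET /hello", duration_ms=12)
body = json.dumps(payload).encode("utf-8")

# Prepared but not sent: assumes a Collector with an OTLP/HTTP receiver on
# localhost:4318. Pass it to urllib.request.urlopen() to actually export.
request = urllib.request.Request(
    "http://localhost:4318/v1/traces",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
```

Because the payload follows a shared protocol rather than a vendor format, the same bytes could go to a local agent, a gateway, or any OTLP-capable backend.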


6. Underestimating cardinality

The struggle is real:
“I don't yet collect spans/traces because I can hardly get our devs to care about basic metrics, let alone traces.”
“This is a large enterprise with approx. 1000 developers. Cultivating a culture of engineering that cares about availability is a challenge that we need to solve alongside any technical implementations.”

On average, 10 hours per week are spent trying to triage and understand incidents: a quarter of a 40-hour work week.

33% said those issues disrupted their personal life; 39% admitted they are frequently stressed out.
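Cardinality is easy to underestimate because it multiplies. The sketch below, with hypothetical label counts, computes the worst-case number of series for one metric as the product of distinct values per label, and counts the series actually observed as the number of distinct label sets. It shows how adding a single unbounded label such as `user_id` changes the picture.

```python
# Sketch: worst-case vs. observed cardinality for one metric.
from math import prod


def worst_case_series(distinct_values: dict[str, int]) -> int:
    """Upper bound: every combination of label values becomes its own series."""
    return prod(distinct_values.values())


def observed_series(samples: list[dict[str, str]]) -> int:
    """Actual series count: the number of distinct label sets seen."""
    return len({tuple(sorted(s.items())) for s in samples})


# Hypothetical distinct-value counts per label.
base = {"service": 20, "endpoint": 50, "status_code": 5, "pod": 10}
print(worst_case_series(base))
print(worst_case_series({**base, "user_id": 10_000}))

samples = [
    {"endpoint": "/a", "status_code": "200"},
    {"endpoint": "/a", "status_code": "200"},  # same label set, same series
    {"endpoint": "/a", "status_code": "500"},
]
print(observed_series(samples))
```

With these made-up numbers, the bound grows from 50,000 to 500,000,000 series once `user_id` is added, which is why labels with unbounded values deserve scrutiny before they ship.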

Cloud Native Observability at Scale

Chronosphere platform (customer environment and Chronosphere SaaS platform): data collection ingests all your data (metrics, logs, traces, events) from any source; the control plane aligns cost to value; purpose-built data stores per telemetry type run on a single-tenanted architecture with 99.99% reliability; Chronosphere Lens turns raw data into generated insights for each user.


What should be the #1 item on your cloud wishlist? What should be the #1 item on your cloud native observability wishlist?

Questions?
Eric D. Schabell, Director Evangelism, chronosphere.io
@ericschabell{@fosstodon.org}