Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf

PaigeBernier 171 views 46 slides Jun 05, 2024

About This Presentation

Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.

While the dev and ops silo continues ...


Slide Content

@paigerduty
Observability Concepts EVERY Developer Should Know

SHOULD

O11y Knowledge Gaps
-Tool Sprawl
-Hidden Processing
-No / Shallow Onboarding
-Auto-magic

You Build It, You Run It

Key Concepts
•Types of Telemetry
•Your Telemetry Journey
•Interpreting Visualizations

Understand Telemetry Types

Scenario
Budgets are tight and Eng was asked to significantly reduce o11y spend ASAP.

Ops notified your team that the Ashford Service is the top emitter of telemetry and tasked you with taking care of it. Where do you start? What’s safe to drop, aggregate, or sample?

Telemetry Types
-Events
-Traces
-Metrics
-Logs

EVENTS
-Deployments
-Alerts
-Feature Flags
-Infrastructure Updates

METRICS
-Aggregations
-Downsampling
-Deriving
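The downsampling bullet can be illustrated with a small sketch. It assumes per-minute request counts rolled up into 5-minute buckets; the function name `downsample`, the sample values, and the bucket size are invented for illustration and are not taken from the talk. The point it tries to show is that the choice of aggregation decides what survives a rollup: an average flattens a short spike that a max keeps.

```python
from statistics import mean

def downsample(points, bucket_size, agg):
    """Roll up a list of values into fixed-size buckets using agg."""
    return [
        agg(points[i:i + bucket_size])
        for i in range(0, len(points), bucket_size)
    ]

# Hypothetical per-minute request counts with one short spike.
per_minute = [100, 100, 100, 900, 100, 100, 100, 100, 100, 100]

avg_rollup = downsample(per_minute, 5, mean)  # the spike is averaged away
max_rollup = downsample(per_minute, 5, max)   # the spike is preserved

print("avg:", avg_rollup)
print("max:", max_rollup)
```

The first average bucket comes out at 260 while the first max bucket stays at 900, so older data that has been rolled up with an average can understate past peaks.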

TRACES
-Sampling strategy
-Adding metadata to spans
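As one possible reading of the two bullets, here is a minimal sketch of deterministic head-based sampling plus attaching attributes to a span. Head-based sampling is only one strategy among several and is not necessarily the one the talk has in mind; the helper names `keep_trace` and `add_span_metadata`, the dict-shaped span, and the 10% rate are all assumptions made for illustration.

```python
import hashlib

def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Decide once per trace; the same trace_id always gets the same answer."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate

def add_span_metadata(span: dict, **attributes) -> dict:
    """Return a copy of the span with extra attributes merged in."""
    merged = dict(span.get("attributes", {}))
    merged.update(attributes)
    return {**span, "attributes": merged}

# Hypothetical trace IDs; roughly 10% of them should be kept.
trace_ids = [f"trace-{n}" for n in range(10_000)]
kept = [t for t in trace_ids if keep_trace(t, 0.10)]
print(len(kept), "of", len(trace_ids), "traces kept")

span = add_span_metadata({"name": "checkout", "trace_id": "trace-1"}, team_owner="Texel")
print(span["attributes"])
```

Because the decision depends only on the trace ID, every service that sees the same ID keeps or drops the whole trace together.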

LOGS
-Indexing
-Structuring
-Querying unindexed logs

{
"zip":"94025",
"phone":"408 496-7223",
"birthdate":"1990/09/24",
"fname":"Johnson",
"cc_cvc":"123",
"city":"Menlo Park",
"lname":"Smith",
"email":"[email protected]",
"address":"10932 Bigge Rd",
"cc_number":"5270 4267 6450 5516",
"state":"CA",
"cc_expiredate":"2010/06/25"
}

{
"zip":"94025",
"phone":"408 496-7223",
"birthdate":"**********",
"fname":"Johnson",
"cc_cvc":"123",
"city":"Menlo Park",
"lname":"Smith",
"email":"[email protected]",
"address":"10932 Bigge Rd",
"cc_number":"5270 4267 6450 5516",
"state":"CA",
"cc_expiredate":"2010/06/25"
}
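The two records differ only in the masked `birthdate` value. A minimal sketch of how such masking could be applied to a structured log record before it is shipped follows; the function name `mask_fields` and the `SENSITIVE_FIELDS` set are hypothetical, and the sample record is a shortened version of the one on the slide.

```python
import json

# Hypothetical set of fields to mask; the slide masks only "birthdate".
SENSITIVE_FIELDS = {"birthdate"}

def mask_fields(record: dict, fields=SENSITIVE_FIELDS) -> dict:
    """Return a copy of record with the listed string fields replaced by asterisks."""
    return {
        key: "*" * len(value) if key in fields and isinstance(value, str) else value
        for key, value in record.items()
    }

raw = '{"zip": "94025", "birthdate": "1990/09/24", "city": "Menlo Park"}'
masked = mask_fields(json.loads(raw))
print(json.dumps(masked))
```

Adding other keys from the record, such as `cc_number`, to the set would mask them the same way.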

THE “HOW”

Follow Telemetry Journey

Scenario
You just survived another re-org and noticed a typo in your team name: the label team-owner:Teckel should read Texel.

As far as you know, the field is auto-magically added… where do you go to fix it?

Generate
Collect
Transform
Export
Store
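The five stages can be sketched as a chain of small functions. This is a toy model under stated assumptions, not how any particular agent or collector works: the function bodies, the record shape, and the choice to introduce the `team-owner` label in the collect stage are all invented here. In a real stack the label could be set at any stage, which is what makes the scenario's question worth asking.

```python
def generate():
    """Generate: the application emits a raw record with no labels yet."""
    return {"name": "http.requests", "value": 1, "labels": {}}

def collect(record):
    """Collect: a hypothetical agent attaches labels automatically."""
    labels = {**record["labels"], "team-owner": "Teckel"}  # typo enters here
    return {**record, "labels": labels}

def transform(record):
    """Transform: a hypothetical rule rewrites the mistyped label value."""
    labels = dict(record["labels"])
    if labels.get("team-owner") == "Teckel":
        labels["team-owner"] = "Texel"
    return {**record, "labels": labels}

def export(record, storage):
    """Export: hand the record to the Store stage (here, a plain list)."""
    storage.append(record)

storage = []  # stand-in for the Store stage
record = generate()
for stage in (collect, transform):
    record = stage(record)
    # Printing the labels after each stage shows where a value first appears.
    print(stage.__name__, record["labels"])
export(record, storage)
```

Walking a label backward through the stages like this is one way to find where it is introduced and where it can be corrected.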

THE “HOW”

Deciphering Visualizations

Scenario
It’s time to prep for the major traffic event of the year and you’re working with a performance engineer to set up load testing.

What do you need to keep in mind when reviewing data from 1+ years ago in your o11y tool?

THE “HOW”
-NYT “What’s Going On In This Graph?”
-Cognitive Apprenticeship SRECon

NOW WHAT?

Slawek Ligus, Effective Monitoring and Alerting:
Accept the reality —
the system as
perceived is not the
system as found

O11y Knowledge Check
-Where are all the places I can send data?
-What is the retention period for logs? Metrics?
-What metadata (tags/labels) is added automatically?
-How do we sample distributed traces?
-Do we use a pull- or push-based system?
-When and where are aggregations applied to metrics?

Learning Resources
-Software Telemetry, Jamie Riedesel
-https://o11y-workshops.gitlab.io/

THANK YOU!
•Presentation template by SlidesCarnival
•Photographs by Pexels
•Icons by Freepik