Metrics, logs, traces, and mayhem: introducing an observability adventure game powered by Grafana Alloy and OTel
ImmaValls
45 slides
Oct 18, 2025
About This Presentation
This hands-on workshop transforms learning about observability into an engaging adventure!
Step into a text-based game where you’ll master essential observability tools—metrics, logs, and traces—and discover how to leverage them effectively for real-world troubleshooting and gaining critical insights into your applications.
Powered by OpenTelemetry (OTel) and the Grafana stack, this session offers a unique, interactive environment for exploring the critical components of a robust observability strategy. You’ll gain practical experience identifying, understanding, and resolving complex system issues.
Whether you’re a developer, operator, or just curious about understanding your system’s health, you’ll leave with a solid foundation on how OpenTelemetry and core observability tools can empower you to solve complex problems confidently.
You’re also welcome to work in pairs and collaborate on your observability adventure!
Prerequisites: Docker and Git installed on your machine, or a web browser to run the workshop on a Killercoda playground.
Slide Content
Logs, Metrics, Traces and Mayhem:
An observability adventure game powered by Grafana Alloy and OTel
Who am I?
Imma Valls
Staff Developer Advocate
Grafana Labs
https://eyeveebee.dev/
https://github.com/immavalls
https://www.linkedin.com/in/imma-valls
Setting the scene
1. What is observability?
2. What can go wrong?
3. Some telemetry signals: metrics, logs & traces
4. Let’s play!
5. Not just playing …
6. More observability adventures, please!
7. Takeaways
Black Box
Input Output
1. What is observability?
print("Why is this broken?!")
2. Observability Lessons
Matchmaking DB Failures (2021)
Performance Problems (2020)
Login & Scaling Issues (2023)
When things go wrong…
2. Why does this happen?
● Complexity
○ Dawn of microservices
● Companies are victims of their own success
○ Demand
● We are all human…
○ Bad code
○ Bad package
○ Users do user things
3.1. Metrics - What happened?
● Quantifiable data
● Time series
● Unique name
● Label key-value pairs
● Metric types (counter, gauge, etc.)
http_requests_total{method="post",code="200",path="/api/v1/users"} 1234
node_memory_usage_bytes{instance="server-01",job="node-exporter"} 8432678912
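The two sample lines above each carry a unique name, a set of label key-value pairs, and a value. A minimal sketch of splitting such a line into those parts, using only the Python standard library. The `parse_sample` helper is hypothetical, written for illustration, and it ignores escaped quotes and optional timestamps.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
# One key="value" pair inside the braces (no escaped quotes handled).
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>[^"]*)"')


def parse_sample(line):
    """Split a sample line into (metric name, label dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = {
        m.group("key"): m.group("val")
        for m in LABEL_RE.finditer(match.group("labels") or "")
    }
    return match.group("name"), labels, float(match.group("value"))


name, labels, value = parse_sample(
    'http_requests_total{method="post",code="200",path="/api/v1/users"} 1234'
)
print(name, labels, value)
```

Each distinct combination of name and labels identifies one time series, which is why the labels are parsed into a dictionary rather than kept as a string.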
3.2. Logs - Why did it happen?
● Bread crumbs…
● Log entries can take many forms.
[2024-04-09T12:44:24.076] WARN: One of your party members has fallen ill! We should find medicine soon.
[2024-04-09T13:44:24.076] ERROR: Your party member has died…
Log Entry
Timestamp
Severity
Message
Metadata
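The parts of a log entry listed above can be pulled out of the two sample lines with a small parser. This is a sketch that assumes the exact `[timestamp] SEVERITY: message` layout shown on the slide. The `LogEntry` type and `parse_log_line` helper are hypothetical names, and metadata is left out because the sample lines carry none.

```python
import re
from datetime import datetime
from typing import NamedTuple


class LogEntry(NamedTuple):
    timestamp: datetime
    severity: str
    message: str


# Assumed layout: [ISO-like timestamp] SEVERITY: free-text message
LOG_RE = re.compile(r'^\[(?P<ts>[^\]]+)\]\s+(?P<severity>[A-Z]+):\s+(?P<message>.*)$')


def parse_log_line(line):
    """Parse one log line into its timestamp, severity, and message."""
    match = LOG_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return LogEntry(
        timestamp=datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f"),
        severity=match.group("severity"),
        message=match.group("message"),
    )


entry = parse_log_line(
    "[2024-04-09T12:44:24.076] WARN: One of your party members has fallen ill! "
    "We should find medicine soon."
)
print(entry.severity, entry.timestamp.isoformat(), entry.message)
```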
3.3. Traces - Where did it happen?
● Track requests across services
● Traces are made of spans
● Help pinpoint slowdowns and failures
● Essential for distributed systems
3.4. Microservices Example
Customers’ interface for placing orders
Kitchen Service
Manage order preparation
Delivery System
Tracking and final delivery
Pizza Delivery Application
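One order moving through the pizza delivery application can be pictured as a single trace made of spans, one per service. The sketch below is illustrative only. The `Span` type, the span names (including `order-interface`), and all timings are made up, and it assumes child spans run one after another without overlapping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


def self_time_ms(span, spans):
    """Time spent in the span itself, excluding its direct children."""
    children = [s for s in spans if s.parent_id == span.span_id]
    return span.duration_ms - sum(c.duration_ms for c in children)


def slowest_span(spans):
    """Return the span with the largest self time."""
    return max(spans, key=lambda s: self_time_ms(s, spans))


# Made-up timings for one order flowing through the three services.
trace = [
    Span("t1", "a", None, "order-interface: place order", 0, 1000),
    Span("t1", "b", "a", "kitchen-service: prepare order", 50, 350),
    Span("t1", "c", "a", "delivery-system: deliver order", 350, 980),
]
culprit = slowest_span(trace)
print(culprit.name, self_time_ms(culprit, trace))
```

All three spans share the same trace ID, and the parent ID is what lets a tracing backend rebuild the request path and show where the time went.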
3.5. Exemplars
● A rich data point that links a high-level metric observation (e.g. a latency spike) to a specific trace (span / trace ID).
● Bridges the gap between metrics and traces.
● In Grafana, exemplars appear as markers on your metric graphs and link to the associated trace.
https://grafana.com/docs/grafana/latest/fundamentals/exemplars
3.5. Exemplars
● Pizzas are arriving late (metric), why?
● From the latency graph, go to the traces associated with those delays and check where the delay happens (exemplar)
● It’s raining where orders are delayed!
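The late-pizza walkthrough above boils down to a metric observation that carries a trace ID alongside its value. A minimal sketch of that idea follows. The `Observation` type, the metric name `pizza_delivery_duration_seconds`, the trace IDs, and the threshold are all hypothetical, and this is not how Prometheus or Grafana store exemplars internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    metric: str
    value_seconds: float
    exemplar_trace_id: Optional[str] = None  # not every observation has one


def traces_for_spike(observations, threshold_seconds):
    """Return the trace IDs attached to observations above the threshold."""
    return [
        o.exemplar_trace_id
        for o in observations
        if o.value_seconds > threshold_seconds and o.exemplar_trace_id is not None
    ]


# Made-up delivery times: only some observations carry an exemplar.
observations = [
    Observation("pizza_delivery_duration_seconds", 1200.0, "trace-001"),
    Observation("pizza_delivery_duration_seconds", 1500.0),
    Observation("pizza_delivery_duration_seconds", 4200.0, "trace-042"),
]
print(traces_for_spike(observations, threshold_seconds=3600.0))
```

The returned trace IDs are the jump-off points: from the slow observation on the graph to the trace that shows where the delay happened.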
How do all these telemetry signals come together?
4. Let’s play!
● Forge an awesome sword
● Make it more powerful
● Take on a quest
● Defeat the wizard
● Save the town!
Teleprinter
4.2. OpenTelemetry
● A method of standardizing telemetry signals:
○ Framework
○ API
○ SDK
○ Collector
● We use the Python SDK to standardize how we write metrics and logs
main.py
4.3. Grafana Alloy
● OpenTelemetry collector distribution
● Alloy performs two tasks in our adventure
○ Receives
○ Sends
4.4. Telemetry Storage
Loki
● Logs store
● LogQL
Prometheus
● Metrics store
● PromQL
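Both stores can be queried over HTTP with their own query language. The sketch below only builds the request URLs, using the Prometheus instant-query endpoint and the Loki range-query endpoint. The base URLs assume local instances on the default ports, and the `service_name="adventure"` label is a hypothetical example, not taken from the workshop.

```python
from urllib.parse import urlencode

# Assumed local endpoints: the default ports for Prometheus and Loki.
PROMETHEUS_URL = "http://localhost:9090"
LOKI_URL = "http://localhost:3100"


def promql_query_url(expr, base_url=PROMETHEUS_URL):
    """Build a URL for a PromQL instant query."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def logql_query_url(expr, limit=100, base_url=LOKI_URL):
    """Build a URL for a LogQL range query, capped at `limit` entries."""
    params = urlencode({"query": expr, "limit": limit})
    return f"{base_url}/loki/api/v1/query_range?{params}"


# PromQL: per-second rate of successful requests over the last 5 minutes.
print(promql_query_url('rate(http_requests_total{code="200"}[5m])'))
# LogQL: lines from a (hypothetical) service that contain "ERROR".
print(logql_query_url('{service_name="adventure"} |= "ERROR"'))
```

In the workshop itself Grafana issues these queries for you, so the URLs mainly show which language targets which store.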
4. Let’s play!
1. What metrics are useful to play the game?
2. What useful log appears when we accept the offer from the mysterious man in the village?
3. What dashboard allows us to view traces?
4. What does an exemplar allow us to view?
5. How do we navigate from logs to traces?
6. How do we navigate telemetry signals without using LogQL or PromQL?
7. Did we have any useful alerts set up?
8. Did anyone spot an Easter egg???????
6. How do we navigate telemetry signals without using LogQL or PromQL?
https://grafana.com/blog/2025/02/20/grafana-drilldown-apps-the-improved-queryless-experience-formerly-known-as-the-explore-apps/
5. Not just playing …
1. Create a cheat counter
2. Modify the dashboard to add the counter
https://opentelemetry.io/docs/specs/otel/metrics/api
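A cheat counter could be sketched with the OpenTelemetry metrics API as below. This is not the workshop's own code. The meter name `adventure.cheats`, the instrument name `game.cheats`, the `player` attribute, and the `CheatTracker` class are all hypothetical. The sketch also keeps a local tally so it still runs when the `opentelemetry-api` package is not installed, and nothing is exported unless an SDK and exporter are configured elsewhere.

```python
try:
    # Provided by the opentelemetry-api package; without a configured SDK
    # the instruments are no-ops.
    from opentelemetry import metrics
except ImportError:
    metrics = None  # fall back to the local tally only


class CheatTracker:
    """Counts cheats locally and, if available, through an OTel counter."""

    def __init__(self):
        self.local_total = 0
        self._counter = None
        if metrics is not None:
            meter = metrics.get_meter("adventure.cheats")  # hypothetical name
            self._counter = meter.create_counter(
                name="game.cheats",  # hypothetical instrument name
                unit="1",
                description="Number of cheats used by players",
            )

    def record(self, player):
        """Record one cheat; counters only ever go up."""
        self.local_total += 1
        if self._counter is not None:
            self._counter.add(1, {"player": player})


tracker = CheatTracker()
tracker.record("hero")
tracker.record("hero")
print(tracker.local_total)
```

A counter fits here because the number of cheats only increases. A value that can go down, such as remaining health, would call for a different instrument type.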
6. More observability adventures, please!
https://github.com/grafana/alloy-scenarios/tree/main/game-of-tracing
● OpenTelemetry tracing fundamentals
● Context propagation & span links
● Correlation for debugging
https://grafana.com/blog/2025/08/11/learn-opentelemetry-tracing-through-a-grand-strategy-game-introducing-game-of-traces/
https://killercoda.com/grafana-labs/course/workshops/game-of-traces
7. Takeaways
● Observability is no longer a luxury
● What, Why, Where
● OpenTelemetry unifies telemetry signals
● Learning observability doesn’t have to be a chore
Thanks for listening, adventurer!
Have ye any questions?