stackconf 2024 | Bring Your Chaos to Work Day by Dionysios Tsoumas
NETWAYS
24 slides
Jul 02, 2024
About This Presentation
Join us for an emulated Chaos at Work Day. We’ll take over our pre-production environment and run a series of chaos experiments to test the resilience of our systems. We’ll be using a combination of LinkerD, Chaos Mesh, and K6 fault injections to simulate real-world scenarios and see how our monitoring systems respond. We want to be confident in answering some of the following questions:
How does our monitoring and alerting stack (Grafana, Loki, Tempo + Thanos) behave when the systems it watches misbehave unexpectedly?
How do our systems respond to network latency and failures? How quickly will our on-call team be notified?
Does our observability stack provide the right level of detail to understand what’s happening?
Can we build any interesting dashboards to help us understand the impact of chaos on our systems?
Moreover, can we make it fun? We’ll be running a series of guessing games throughout the talk to keep everyone engaged and entertained.
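To put a number on "how quickly will our on-call team be notified?", each experiment can record when the fault was injected and when the first alert reached on-call. Those measurements also make good material for a guessing game: players guess the value beforehand and are ranked by how close they land. The sketch below is a minimal, hypothetical way to do both. The `Experiment` record, the experiment names, the timestamps, the player names and the scoring rule (absolute error, alphabetical tie-break) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Experiment:
    """One chaos run (hypothetical record; field names are illustrative)."""
    name: str
    injected_at: datetime            # when the fault was injected
    notified_at: Optional[datetime]  # first on-call notification, None if nothing fired


def time_to_notify(exp: Experiment) -> Optional[timedelta]:
    """Delay between fault injection and the first notification, if any."""
    if exp.notified_at is None:
        return None
    return exp.notified_at - exp.injected_at


def summarize(experiments: list[Experiment]) -> dict:
    """Mean notification delay in seconds, plus the experiments that never alerted."""
    delays = [d for e in experiments if (d := time_to_notify(e)) is not None]
    return {
        "mean_seconds": mean(d.total_seconds() for d in delays) if delays else None,
        "missed": [e.name for e in experiments if e.notified_at is None],
    }


def rank_guesses(guesses: dict[str, float], actual: float) -> list[tuple[str, float]]:
    """Rank players by absolute distance from the measured value (ties: alphabetical)."""
    scored = [(player, abs(guess - actual)) for player, guess in guesses.items()]
    return sorted(scored, key=lambda item: (item[1], item[0]))


t0 = datetime(2024, 1, 1, 10, 0)  # made-up start time
runs = [
    Experiment("network-delay", t0, t0 + timedelta(minutes=4)),
    Experiment("pod-kill", t0, t0 + timedelta(minutes=2)),
    Experiment("dns-failure", t0, None),
]
summary = summarize(runs)
print(summary)  # {'mean_seconds': 180.0, 'missed': ['dns-failure']}

# Players guessed the mean time-to-notify (in seconds) before the runs.
print(rank_guesses({"alice": 120, "bob": 300, "carol": 200}, summary["mean_seconds"]))
# [('carol', 20.0), ('alice', 60.0), ('bob', 120.0)]
```

Experiments listed under `missed` are arguably the most interesting result: they are faults the alerting never noticed at all.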
Slide Content
Bring Your Chaos to Work Day
Dionysis Tsoumas, Director of DevOps
Ok, let’s do this!
01 Our stack
02 How did we get here?
03 Our tools
04 Tests and dashboards
05 Gamifying results
Bonus
What is chaos engineering?
Our stack
Monitoring & observability
●Prometheus operator
●Thanos
●Grafana
●Loki
●Tempo
●K6
●OpsGenie
The Rest
●Kubernetes clusters on GCP
●GCP monitoring as a fallback
●Sentry
●LinkerD
●Chaos Mesh
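The stack lists Chaos Mesh, alongside K6 and LinkerD, as the means of disturbing the system. Before running a real network-delay experiment, it can help to reason about what the dashboards ought to show. The sketch below is a local simulation, not Chaos Mesh: it adds a delay with uniform jitter to a fraction of made-up request latencies, then checks a p95 budget in the spirit of a K6 threshold such as `p(95)<500`. The helper names, the latencies and the 100 ms budget are assumptions.

```python
import random
from statistics import quantiles
from typing import Optional


def inject_delay(latencies_ms: list[float], delay_ms: float, jitter_ms: float = 0.0,
                 fraction: float = 1.0, seed: Optional[int] = None) -> list[float]:
    """Copy of the latencies with extra delay (plus uniform jitter) on a fraction of requests.

    A local stand-in for a network-delay fault, not how Chaos Mesh implements one.
    """
    rng = random.Random(seed)
    delayed = []
    for value in latencies_ms:
        if rng.random() < fraction:
            value += delay_ms + rng.uniform(-jitter_ms, jitter_ms)
        delayed.append(value)
    return delayed


def p95(latencies_ms: list[float]) -> float:
    """95th percentile: the 95th of the 99 cut points returned by quantiles(n=100)."""
    return quantiles(latencies_ms, n=100)[94]


def within_budget(latencies_ms: list[float], max_p95_ms: float) -> bool:
    """Threshold check, similar in spirit to a K6 p(95)<N threshold."""
    return p95(latencies_ms) < max_p95_ms


baseline = [20.0 + i % 30 for i in range(1000)]  # made-up latencies between 20 and 49 ms
faulty = inject_delay(baseline, delay_ms=100, jitter_ms=10, fraction=0.5, seed=42)
print(within_budget(baseline, 100), within_budget(faulty, 100))  # True False
```

A percentile is used rather than an average because a delay that hits only part of the traffic shows up first in the tail.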
How did we get here?
Is chaos engineering still relevant?
[Diagram: Almost real-life example. A user request involving the Login service, the Bundles service and the Subscriptions service]
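The "Almost real life example" slide shows a user request that involves a Login service, a Bundles service and a Subscriptions service. Assuming, only for illustration, that the request calls them one after another, delay injected into any single service adds straight onto the end-to-end latency, and a client timeout turns that delay into a failed request. The per-service latencies, the 1-second timeout and the helper names below are made up.

```python
def end_to_end(latencies_ms: dict[str, float], timeout_ms: float) -> tuple[float, bool]:
    """Total latency of a sequential call chain and whether it stays within the client timeout."""
    total = sum(latencies_ms.values())
    return total, total <= timeout_ms


def slowest(latencies_ms: dict[str, float]) -> str:
    """Name of the service contributing the most latency."""
    return max(latencies_ms, key=latencies_ms.get)


# Made-up per-service latencies for one user request.
baseline = {"login": 80.0, "bundles": 120.0, "subscriptions": 150.0}
print(end_to_end(baseline, timeout_ms=1000))  # (350.0, True)

# Inject 800 ms of delay into the bundles service only.
chaos = {**baseline, "bundles": baseline["bundles"] + 800}
print(end_to_end(chaos, timeout_ms=1000), slowest(chaos))  # (1150.0, False) bundles
```

In a real system retries, parallel calls and per-hop timeouts change this arithmetic, which is exactly what a chaos experiment is meant to reveal.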
A two-day event where we tried to bring down our staging and monitoring platforms
Bring your chaos to work
Possibilities were endless
A few special words about LinkerD
https://linkerd.io/
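LinkerD's own documentation describes a fault-injection pattern based on traffic splitting: a weighted share of requests for a service is sent to a backend that always returns errors, and the mesh's success-rate metrics show the effect. The sketch below only simulates that weighted routing decision locally and calls nothing in LinkerD. The 90/10 split, the backend names and the helper functions are assumptions.

```python
import random
from collections import Counter


def route(weights: dict[str, int], rng: random.Random) -> str:
    """Pick a backend with probability proportional to its weight."""
    backends = list(weights)
    return rng.choices(backends, weights=[weights[b] for b in backends], k=1)[0]


def simulate(weights: dict[str, int], failing: set[str], requests: int, seed: int = 0) -> float:
    """Send `requests` simulated calls and return the observed success rate."""
    rng = random.Random(seed)
    hits = Counter(route(weights, rng) for _ in range(requests))
    failures = sum(count for backend, count in hits.items() if backend in failing)
    return 1 - failures / requests


# Hypothetical split: 90% to the real service, 10% to an always-failing backend.
split = {"subscriptions": 90, "error-injector": 10}
rate = simulate(split, failing={"error-injector"}, requests=10_000)
print(f"success rate: {rate:.1%}")  # roughly 90%
```

Because the fault lives in the routing layer, the application itself needs no changes, and removing the failing backend from the split ends the experiment.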