$ whoami
Co-founder of robusta.dev
Multi-cluster Kubernetes observability
Add-on to Prometheus
Substack newsletter: Why this Kubernetes thing?
Natan Yellin aantnrobusta-dev
How should I gather
Prometheus metrics from
all my tenants?
Natan Yellin aantnrobusta-dev
Assumptions
Natan Yellin aantn
Clusters
Namespaces
Virtual clusters (e.g. capsule, kamaji, vcluster)
etc...
1. Many Kubernetes tenants
2. Tenants need some form of isolation
3. We want to monitor with Prometheus
robusta-dev
What should I use?
Natan Yellin aantnrobusta-dev
In the beginning there was one
Natan Yellin aantnrobusta-dev
In the beginning there was one
Natan Yellin aantn
Simple
No security isolation/RBAC
No performance isolation
If tenants are clusters, discovery is
annoying
Advantages:
Disadvantages:
"One team broke Prometheus for
everyone else"
robusta-dev
Then there were many
Natan Yellin aantnrobusta-dev
Then there were many
Natan Yellin aantn
Simple
Security isolation
Performance isolation
Scalable?
No unified queries
No unified management
More resources?
Advantages:
Major Disadvantage:
Minor Disadvantages:
"If you break it, it only breaks for your
product line."
robusta-dev
What we want
Natan Yellin aantn
Isolation
Scalability
Decentralized:
Query all Prometheuses at once
Centralized:
robusta-dev
What else we want?
Natan Yellin aantn
Scalability
Long term storage of metrics
1.
2.
robusta-dev
Three approaches
Natan Yellin aantnrobusta-dev
Solve it outside Prometheus
Natan Yellin aantnrobusta-dev
Solve it outside Prometheus
Natan Yellin aantn
Doesn't touch Prometheus itself
Delegates problem to other tool
Queries need to address one
Prometheus at a time
Key advantages:
Key disadvantage:
robusta-dev
Multiple + central (take 1)
Natan Yellin aantn
Reuses existing Prometheus
Federated can do roll-up
Federated can selectively scrape
With roll-up/selective you can't
actually query all Prometheuses
Scaling
Key advantages:
Key disadvantages:
robusta-dev
Natan Yellin aantn
Disclaimer: Thanos has lots of options, I'm simplifying a little
robusta-dev
Multiple + central (take 2)
Natan Yellin aantnrobusta-dev
Multiple Prometheuses + central Prometheus (take 2)
Natan Yellin aantn
Super scalable!
Reuses existing Prometheus
Very common solution, lots of tooling
No RBAC built-in
Key advantages:
Key disadvantages:
"Most mature option" - most people
robusta-dev
One Prometheus to Rule them All
Natan Yellin aantnrobusta-dev
One Prometheus to Rule them All
Natan Yellin aantnrobusta-dev
Cortex
Grafana Mimir
VictoriaMetrics
TimescaleDB
M3DB
Options:
...