Prometheus Multi Tenancy

NatanYellin 868 views 25 slides Feb 14, 2023

About This Presentation

Monitoring Kubernetes multi-tenant environments. Thanos vs Cortex vs Mimir vs Federation.


Slide Content

Multi-tenant Kubernetes observability with Prometheus
robusta-dev Natan Yellin aantn
Natan Yellin, robusta.dev

$ whoami
Co-founder of robusta.dev
Multi-cluster Kubernetes observability
Add-on to Prometheus
Substack newsletter: Why this Kubernetes thing?

How should I gather Prometheus metrics from all my tenants?

Assumptions
1. Many Kubernetes tenants: clusters, namespaces, virtual clusters (e.g. capsule, kamaji, vcluster), etc.
2. Tenants need some form of isolation
3. We want to monitor with Prometheus

What should I use?

In the beginning there was one
Advantages:
Simple
Disadvantages:
No security isolation/RBAC
No performance isolation
If tenants are clusters, discovery is annoying
"One team broke Prometheus for everyone else"

Then there were many
Advantages:
Simple
Security isolation
Performance isolation
Scalable?
Major Disadvantage:
No unified queries
Minor Disadvantages:
No unified management
More resources?
"If you break it, it only breaks for your product line."

What we want
Decentralized:
Isolation
Scalability
Centralized:
Query all Prometheuses at once

What else do we want?
1. Scalability
2. Long-term storage of metrics

Three approaches

Solve it outside Prometheus
Key advantages:
Doesn't touch Prometheus itself
Delegates problem to other tool
Key disadvantage:
Queries need to address one Prometheus at a time
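
To make the "one Prometheus at a time" limitation concrete, here is a minimal sketch of what a client has to do when each tenant has its own Prometheus: pick the right server first, then query it. The tenant names and URLs are hypothetical placeholders; only the `/api/v1/query` endpoint and its `query` parameter come from the Prometheus HTTP API.

```python
from urllib.parse import urlencode

# Hypothetical tenant -> Prometheus base URL mapping (placeholder names and hosts).
TENANT_PROMETHEUS = {
    "team-a": "http://prometheus.team-a.example:9090",
    "team-b": "http://prometheus.team-b.example:9090",
}


def instant_query_url(tenant: str, promql: str) -> str:
    """Build an instant-query URL for the single Prometheus that serves `tenant`."""
    base = TENANT_PROMETHEUS[tenant]  # raises KeyError for an unknown tenant
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    # A "global" view needs one request per tenant, merged by the caller.
    for tenant in TENANT_PROMETHEUS:
        print(tenant, instant_query_url(tenant, "up"))
```

Nothing in this setup can answer a single query across both tenants; any merging happens in whatever tool sits in front.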

Multiple + Centralized (take 1)

Multiple + central (take 1)
Key advantages:
Reuses existing Prometheus
Federated can do roll-up
Federated can selectively scrape
Key disadvantages:
With roll-up/selective you can't actually query all Prometheuses
Scaling
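
As a rough illustration of selective scraping, the sketch below builds the URL a central Prometheus would scrape on a tenant Prometheus. It assumes the standard `/federate` endpoint with repeated `match[]` parameters; the host name and the selectors are made up for the example.

```python
from urllib.parse import urlencode


def federate_url(base_url: str, selectors: list[str]) -> str:
    """Build a /federate URL that exposes only the series matching `selectors`."""
    if not selectors:
        # Prometheus requires at least one match[] parameter on /federate.
        raise ValueError("at least one match[] selector is required")
    # One match[] parameter per selector; urlencode percent-encodes both parts.
    params = [("match[]", s) for s in selectors]
    return f"{base_url.rstrip('/')}/federate?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical selectors: one raw job plus pre-aggregated (rolled-up) series.
    print(federate_url(
        "http://prometheus.team-a.example:9090",
        ['{job="kubelet"}', '{__name__=~"job:.*"}'],
    ))
```

Only series matching one of the selectors reach the central server, which is why the central Prometheus cannot answer arbitrary queries about every tenant.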

Disclaimer: Thanos has lots of options, I'm simplifying a little

Multiple + central (take 2)

Multiple Prometheuses + central Prometheus (take 2)
Key advantages:
Super scalable!
Reuses existing Prometheus
Very common solution, lots of tooling
Key disadvantages:
No RBAC built-in
"Most mature option" - most people

One Prometheus to Rule them All
Options:
Cortex
Grafana Mimir
VictoriaMetrics
TimescaleDB
M3DB
...

Grafana Mimir
Key advantages:
Native multi-tenancy!
Backed by Grafana
Key disadvantages:
Complexity
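
To show what native multi-tenancy looks like from the client side, here is a small sketch that prepares a query request scoped to one tenant. It assumes Mimir's `X-Scope-OrgID` tenant header and the default `/prometheus` HTTP prefix; the gateway URL and tenant ID are hypothetical, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def tenant_query_request(base_url: str, tenant_id: str, promql: str) -> Request:
    """Prepare an instant query that Mimir will evaluate for `tenant_id` only."""
    # Assumes the default Prometheus-compatible prefix "/prometheus".
    url = f"{base_url.rstrip('/')}/prometheus/api/v1/query?{urlencode({'query': promql})}"
    # The tenant is selected by a header, not by a separate server or URL.
    return Request(url, headers={"X-Scope-OrgID": tenant_id})


if __name__ == "__main__":
    req = tenant_query_request("http://mimir.example:8080", "team-a", "up")
    print(req.full_url, req.header_items())
```

Passing the request to `urllib.request.urlopen` would send it. The same endpoint serves every tenant, and the header decides whose data is visible.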

Other useful tools
Add prom-label-proxy to Thanos (and others) to enforce RBAC
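
As a sketch of how a client might talk to such a proxy: this assumes prom-label-proxy is configured to enforce a label named `namespace` and to read its value from a URL query parameter of the same name. The proxy address and tenant value are hypothetical.

```python
from urllib.parse import urlencode


def proxied_query_url(proxy_url: str, label: str, tenant_value: str, promql: str) -> str:
    """Build a query URL for a label-enforcing proxy in front of Prometheus/Thanos."""
    # The extra parameter tells the proxy which label value to inject into the
    # query as a matcher, so results are limited to that tenant's series.
    params = {"query": promql, label: tenant_value}
    return f"{proxy_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


if __name__ == "__main__":
    print(proxied_query_url("http://127.0.0.1:8080", "namespace", "team-a", "up"))
```

In a real deployment the tenant value would be set by an authenticating layer in front of the proxy rather than chosen freely by the caller; otherwise the label is not a security boundary.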

Thank you!
A special thank you to Shalom Cohen, Evgeny Uklist, and the Racoons team for providing input

Questions?