The Nile Approach: Re-engineering Postgres for Millions of Tenants by Gwen Shapira

ScyllaDB, 33 slides, Mar 10, 2025

About This Presentation

Scaling relational databases is tough, especially for multi-tenant apps that need isolation and consistent performance. Nile’s “virtual tenant databases” re-engineer Postgres to scale to millions of tenants efficiently. This talk explores how they solved the challenge.


Slide Content

A ScyllaDB Community
The Nile Approach:
Re-engineering Postgres
for Millions of Tenants
Gwen Shapira
Co-founder of Nile

What do these have in common?

Multi-tenant apps are everywhere.

Scaling them is still a challenging engineering problem.

The problem is data. Relational data.

Agenda
•Challenges in scaling Postgres
•Multi-tenant OLTP workloads
•Nile architecture walkthrough

Nice to meet you!
I’m Gwen Shapira
•Co-founder of Nile: Postgres re-engineered for B2B Applications
•Presenting on behalf of a smart team
•Previously: Cloud Native Kafka lead @ Confluent, Apache Kafka core committer
•Previously: Data architect (Hadoop, MySQL, Oracle)
•Wrote books, tweet a lot @gwenshap

Hardware today
•Mostly cloud
•Mostly VMs with specific sizes
•Network storage

Postgres has the usual bottlenecks

More traffic:
•More CPU
•Larger working set
•More IOPS
•Data scans cause excessive noise

Postgres issues under load:
•WAL is latency-sensitive
•Long & noisy vacuums
•Connection overhead
•Cache inefficiencies
•Contention

Distributing queries adds latency
[Diagram: three database nodes, each labeled "todos" and "users"]

Do a few milliseconds matter?
•Postgres protocol doesn’t support pipelining.
•ORMs tend to be chatty
•throughput = # connections / latency
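
The throughput bullet can be made concrete with a little arithmetic. The sketch below assumes each connection runs one query at a time and waits for the reply before sending the next. The function name, the connection count and the latencies are made-up illustrations, not figures from the talk.

```python
def max_throughput_qps(connections: int, latency_ms: float) -> float:
    """Upper bound on queries per second with one query in flight per connection."""
    # Each connection completes 1000 / latency_ms queries per second.
    return connections * 1000 / latency_ms


# Hypothetical numbers: 100 connections, 1 ms vs. 5 ms round trip per query.
local_qps = max_throughput_qps(100, 1)
remote_qps = max_throughput_qps(100, 5)
print(local_qps, remote_qps)
```

Under these assumptions, adding 4 ms to every query cuts the ceiling to one fifth unless the application opens five times as many connections.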

HR Application

SaaS HR – Schema / DB per tenant

SaaS HR – Multi-tenant schema

Typical Query
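
The schema and query on these slides are images and are not part of this transcript. As a rough illustration of what a query against a multi-tenant schema tends to look like, here is a minimal sketch. The `employees` table, its columns and the sample rows are hypothetical, and SQLite stands in for Postgres only so the example runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        tenant_id INTEGER NOT NULL,   -- which customer owns the row
        id        INTEGER NOT NULL,
        name      TEXT    NOT NULL,
        PRIMARY KEY (tenant_id, id)
    );
    INSERT INTO employees VALUES (1, 1, 'Ada'), (1, 2, 'Grace'), (2, 1, 'Linus');
""")


def employees_for(tenant_id: int) -> list:
    # The query is scoped to a single tenant through the tenant_id filter.
    rows = conn.execute(
        "SELECT name FROM employees WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    )
    return [name for (name,) in rows]


print(employees_for(1))
```

Because the filter names exactly one tenant, a query like this only needs the data that belongs to that tenant.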

By isolating each tenant’s data to its own Virtual Tenant Database, we get:
•90% of queries are local
•Scale to any number of tenants
•Any number of nodes
•Security and privacy

Nile model: Virtual tenant databases

[Architecture diagram. Components shown:]
•Nile Gateway
•Metadata, schemas
•Tenant 1, Tenant 2, Tenant 3, Tenant 4, Tenant 5
•Serverless compute (shown twice)
•Provisioned compute
•Page Service
•WAL Service
•Object Storage (S3)

Tenant Capabilities
•Data Isolation
•Performance Isolation
•Independent schema
•Backup / Restore
•Access controls
•Metrics, Statistics
•Branches
•Placement

[Architecture diagram repeated, annotated with the call place_tenant(“customer 1”) and a new tenant, Customer 1]

[Architecture diagram repeated, annotated with the call place_tenant(…) and a new tenant, Customer 2, alongside Customer 1]

[Architecture diagram repeated, annotated with route()]

[Architecture diagram repeated, annotated with parse(); route();]

[Architecture diagram repeated, annotated with route(random);]

[Architecture diagram with the storage layer (Page Service, WAL Service, Object Storage (S3)), annotated with move_tenant(…) for Tenant 1:]
1. Freeze & Detach
2. Load & Unfreeze

Tenant Isolation: Each tenant's
data is separated – in storage and
memory.

Tenant placement: Pick a physical
DB for each tenant and store all
the tenant data there.
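
A minimal sketch of how a gateway might keep a placement map and route by it. The method names echo the place_tenant(…) and route() labels from the diagrams, but the class, the node names and the least-loaded policy are assumptions made for illustration, not Nile's implementation.

```python
import random
from typing import Dict, List, Optional


class Gateway:
    """Toy placement map: tenant name -> compute node name (all hypothetical)."""

    def __init__(self, nodes: List[str]) -> None:
        self.nodes = nodes
        self.placement: Dict[str, str] = {}

    def place_tenant(self, tenant: str) -> str:
        # Assumed policy: pick the node that currently holds the fewest tenants.
        counts = {node: 0 for node in self.nodes}
        for node in self.placement.values():
            counts[node] += 1
        chosen = min(self.nodes, key=lambda n: counts[n])
        self.placement[tenant] = chosen
        return chosen

    def route(self, tenant: Optional[str]) -> str:
        # A query for a known tenant goes to that tenant's node.
        if tenant in self.placement:
            return self.placement[tenant]
        # Assumption: a query with no known tenant may go to any node.
        return random.choice(self.nodes)


gateway = Gateway(["serverless-1", "serverless-2", "provisioned-1"])
gateway.place_tenant("customer 1")
gateway.place_tenant("customer 2")
print(gateway.route("customer 1"))
```

Keeping the map in one place means a tenant can be moved by changing a single entry, without the application knowing which node it talks to.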

Tenant rebalancing: Move
tenants transparently to balance
load and resolve noisy neighbors.
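
The two move_tenant(…) steps from the diagram, freeze and detach, then load and unfreeze, can be sketched roughly as follows. The `Node` class, the `frozen` set and the function signature are hypothetical, and the sketch only tracks which compute node a tenant is attached to. It assumes the tenant's data stays in the shared storage layer and is not copied.

```python
from typing import Set


class Node:
    """Hypothetical compute node that tracks its attached tenants."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tenants: Set[str] = set()


def move_tenant(tenant: str, source: Node, target: Node, frozen: Set[str]) -> None:
    if tenant not in source.tenants:
        raise ValueError(f"{tenant} is not attached to {source.name}")
    # 1. Freeze & Detach: stop serving the tenant, then detach it from the source.
    frozen.add(tenant)
    source.tenants.remove(tenant)
    # 2. Load & Unfreeze: attach the tenant to the target, then resume serving it.
    target.tenants.add(tenant)
    frozen.discard(tenant)


node_a, node_b = Node("serverless-1"), Node("serverless-2")
node_a.tenants.add("tenant 1")
frozen_tenants: Set[str] = set()
move_tenant("tenant 1", node_a, node_b, frozen_tenants)
print(sorted(node_b.tenants))
```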

What’s left?
•Branching, PITR
•Postgres compatibility gaps
•Partition efficiency
•Flying cars

To sum it up

Postgres Scalability
Postgres scales fairly well to large machines and with low-latency storage. Beyond that, it can get messy.

Multi-tenant apps
Luckily, multi-tenant workloads can be split by tenant. This provides data locality, scale-out and data privacy.

Tenant-aware
By separating compute and storage, distributing each, and isolating tenants from the storage up, we can route workloads and move tenants around to minimize query latency.