Postgres on Kubernetes - Dos and Don'ts

Christoph Engelbert, Jun 30, 2024

About This Presentation

Running databases in containers has been the biggest anti-pattern of the last decade. The world, however, moves on: stateful container workloads are becoming more common, and so are databases in Kubernetes. People love the added convenience when it comes to deployment, scalability, and operation.

...


Slide Content

https://www.marketingdonut.co.uk/pr-and-promotion/exhibitions/dos-and-don-ts-when-exhibiting

PostgreSQL ❤ Kubernetes

Chris Engelbert
DevRel @ simplyblock
Previous fun companies:
- Ubisoft / Blue Byte
- Hazelcast
- Instana
- clevabit
- Timescale
Interests:
- Developer Relations
- Anything Performance Engineering
- Backend Technologies
- Fairy Tales (AMD, Intel, Nvidia)
@noctarius2k
@[email protected]
@noctarius.com

Question 01
Why shouldn't you run a database in Kubernetes?

Why shouldn't you run a database in Kubernetes?
K8s was not designed with databases in mind!
Never run stateful workloads in k8s!
Persistent data will kill you! Too slow!
Nobody understands Kubernetes!
What's the benefit? Databases don't need autoscaling!
Databases and applications should be separated!
Not another layer of indirection / abstraction!

Why shouldn't you run a database in Kubernetes?
BURN IN HELL!

The Happy Place

Where are my gamers at?
So we need to cheat!?

Why?
No Cloud-Vendor Lock-In
Faster Time To Market
Decreasing cost
Automation
Unified deployment architecture
Need read-only replicas

Let’s get something
out of the way first!

Call the Police!
Enable TLS
Use Kubernetes Secrets
Use cert-manager
Encrypt data at rest

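Kubernetes Secrets keep their `data` values base64-encoded, which is an encoding, not encryption; that is one reason to encrypt data at rest as well. A minimal sketch, assuming Python and building the manifest as a plain dict that can be serialized to JSON for `kubectl apply -f -`: it creates a `kubernetes.io/basic-auth` Secret for database credentials. The secret name, namespace, and credentials are hypothetical placeholders.

```python
import base64
import json


def b64(value: str) -> str:
    """Base64-encode a string as Kubernetes expects in Secret.data."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def postgres_credentials_secret(name: str, namespace: str,
                                username: str, password: str) -> dict:
    """Build a basic-auth Secret manifest (all names are hypothetical).

    base64 is only an encoding: anyone allowed to read the Secret can decode
    it, so restrict access with RBAC and consider encrypting Secrets at rest.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        # The basic-auth type expects the keys "username" and/or "password".
        "type": "kubernetes.io/basic-auth",
        "data": {"username": b64(username), "password": b64(password)},
    }


if __name__ == "__main__":
    secret = postgres_credentials_secret(
        "pg-app-credentials", "databases", "app", "change-me")
    print(json.dumps(secret, indent=2))  # pipe into: kubectl apply -f -
```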
Backup and Recovery
https://www.ovhcloud.com/de/bare-metal/backup-storage/
You want continuous backup and PITR (point-in-time recovery)
Roll your own pg_basebackup or pg_dump (don't!)
Use tools like pgBackRest, Barman, PGHoard, …
Uploading backups to S3? Watch the cost!
"Test your backups!"

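Point-in-time recovery (PITR) restores the newest base backup taken before the recovery target and then replays archived WAL up to that target. Tools like pgBackRest and Barman do this bookkeeping for you; the sketch below only illustrates the base-backup selection step, using hypothetical backup timestamps.

```python
from datetime import datetime
from typing import Optional, Sequence


def pick_base_backup(backup_end_times: Sequence[datetime],
                     target: datetime) -> Optional[datetime]:
    """Return the newest base backup that completed at or before `target`.

    Recovery restores that backup and replays archived WAL up to `target`.
    None means no backup is old enough, so `target` cannot be reached.
    """
    candidates = [t for t in backup_end_times if t <= target]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    # Hypothetical nightly backups.
    backups = [datetime(2024, 6, 1, 2, 0), datetime(2024, 6, 2, 2, 0),
               datetime(2024, 6, 3, 2, 0)]
    # A table was dropped on June 2nd at 14:31; recover to just before that.
    print(pick_base_backup(backups, datetime(2024, 6, 2, 14, 30)))
```

This is also why continuous WAL archiving matters: without the WAL written since the last base backup, you can only restore to roughly the end of that backup.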
PostgreSQL Configuration
Kubernetes doesn't influence the PostgreSQL configuration much; the usual memory settings still have to fit the container:
shared_buffers
(maintenance_)work_mem
effective_cache_size
Use Huge Pages!

PostgreSQL Configuration
https://www.youtube.com/watch?v=S0LEDGbAnn8
https://www.crunchydata.com/blog/optimize-postgresql-server-performance
https://www.percona.com/blog/using-huge-pages-with-postgresql-running-inside-kubernetes/
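Inside a container, these settings should be derived from the container's memory limit rather than the node's RAM, because exceeding the limit gets the container OOM-killed. The sketch below applies commonly cited rules of thumb (roughly 25% of memory for `shared_buffers`, roughly 75% for `effective_cache_size`); the `maintenance_work_mem` and `work_mem` formulas and the default of 100 connections are illustrative assumptions to tune per workload, not official recommendations.

```python
from typing import Dict


def memory_settings(container_memory_mib: int,
                    max_connections: int = 100) -> Dict[str, str]:
    """Starting values (PostgreSQL 'MB' = MiB) derived from a container limit.

    Rules of thumb, all assumptions to tune for your workload:
      shared_buffers        ~25% of the container memory
      effective_cache_size  ~75% (a planner hint; allocates nothing)
      maintenance_work_mem  1/16 of the memory, capped at 2 GiB
      work_mem              memory left after shared_buffers, spread over
                            max_connections * 3 sort/hash operations, min 4MB
    """
    shared_buffers = container_memory_mib // 4
    effective_cache_size = container_memory_mib * 3 // 4
    maintenance_work_mem = min(container_memory_mib // 16, 2048)
    work_mem = max((container_memory_mib - shared_buffers)
                   // (max_connections * 3), 4)
    return {
        "shared_buffers": f"{shared_buffers}MB",
        "effective_cache_size": f"{effective_cache_size}MB",
        "maintenance_work_mem": f"{maintenance_work_mem}MB",
        "work_mem": f"{work_mem}MB",
    }


if __name__ == "__main__":
    # e.g. a container with resources.limits.memory: 8Gi
    for setting, value in memory_settings(8192).items():
        print(f"{setting} = {value}")
```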

Do you need more? Extensions!
Do you need PG Extensions?
Is the extension part of the container image?
If not, you need to build your own layer…
or use some magic (more on this later).

Versions and Updates
Keep an Eye on PG and Kubernetes Versions

So what is important or different?

Storage
Use Persistent Volumes
(local volumes are a bad idea)
Should be dynamically provisioned
Use a CSI provider that enables encryption at rest
High IOPS (SSD or NVMe)
Low latency
Your database is only as fast as your storage
I'd recommend disaggregated storage!

Storage
www.storageclass.info/csidrivers
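Dynamic provisioning means a PersistentVolumeClaim references a StorageClass, and the StorageClass's CSI driver creates the volume on demand. The sketch below builds both objects as Python dicts; the provisioner `csi.example.com`, the class name `fast-nvme`, and the size are placeholders for what your CSI driver provides. `WaitForFirstConsumer` delays provisioning until the pod is scheduled, so the volume is created where the pod can reach it, and `reclaimPolicy: Retain` keeps the volume if the claim is deleted.

```python
import json


def storage_class(name: str, provisioner: str) -> dict:
    """StorageClass for database volumes (names are placeholders)."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,       # your CSI driver's name
        "reclaimPolicy": "Retain",        # keep the volume if the PVC is deleted
        "allowVolumeExpansion": True,     # allow growing the disk later
        "volumeBindingMode": "WaitForFirstConsumer",
    }


def data_volume_claim(name: str, storage_class_name: str, size: str) -> dict:
    """PersistentVolumeClaim that triggers dynamic provisioning."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],  # mounted read-write by one node
            "storageClassName": storage_class_name,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    sc = storage_class("fast-nvme", "csi.example.com")
    pvc = data_volume_claim("pgdata", sc["metadata"]["name"], "100Gi")
    # kubectl accepts a v1 List wrapping several objects.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [sc, pvc]},
                     indent=2))
```

In a StatefulSet or an operator-managed cluster you usually don't create the claim yourself; the same `spec` goes into a volume claim template.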

Requests, Limits, and Quotas
Use resource requests, limits, and quotas
CPU and memory requests need to be accurate to prevent contention and ensure predictable performance
[Figure: Capacity, Limits, Requests, Used]
https://codimite.ai/blog/kubernetes-resources-and-scaling-a-beginners-guide/

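When every container in a pod sets CPU and memory requests equal to its limits, Kubernetes assigns the pod the `Guaranteed` QoS class, which puts it last in line for eviction when a node runs short of memory. The sketch below builds such a `resources` block and includes a simplified QoS classifier that considers only CPU and memory; the sizes are placeholders.

```python
from typing import Dict, List

Resources = Dict[str, Dict[str, str]]


def guaranteed_resources(cpu: str, memory: str) -> Resources:
    """Requests equal to limits for CPU and memory (placeholder sizes)."""
    spec = {"cpu": cpu, "memory": memory}
    return {"requests": dict(spec), "limits": dict(spec)}


def qos_class(containers: List[Resources]) -> str:
    """Simplified pod QoS classification considering CPU and memory only.

    A missing request defaults to the limit, as in Kubernetes. Quantities are
    compared as strings, so "1" and "1000m" would count as different.
    """
    anything_set = False
    guaranteed = True
    for res in containers:
        requests, limits = res.get("requests", {}), res.get("limits", {})
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            request = requests.get(name, limit)
            if request is not None:
                anything_set = True
            if limit is None or request != limit:
                guaranteed = False
    if not anything_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    postgres = guaranteed_resources(cpu="4", memory="16Gi")
    print(qos_class([postgres]))
```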
Make it big!
Enable Huge Pages!
In your OS and the Resource Descriptor.
https://www.percona.com/blog/using-huge-pages-with-postgresql-running-inside-kubernetes/

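Kubernetes exposes 2 MiB huge pages as the `hugepages-2Mi` resource; huge page requests must equal limits, and they are accounted separately from regular `memory`. The sketch below sizes that resource from `shared_buffers`; the 2 MiB page size and the 10% allowance for other shared memory are assumptions, so compare against PostgreSQL's `shared_memory_size_in_huge_pages` (PostgreSQL 15 and later) for your actual configuration.

```python
from typing import Dict

HUGE_PAGE_MIB = 2  # assumes 2 MiB huge pages, the common x86-64 size


def huge_pages_needed(shared_buffers_mib: int, overhead_pct: int = 10) -> int:
    """Estimate 2 MiB pages for shared_buffers plus an assumed overhead."""
    needed_mib_x100 = shared_buffers_mib * (100 + overhead_pct)
    return -(-needed_mib_x100 // (100 * HUGE_PAGE_MIB))  # ceiling division


def postgres_resources(shared_buffers_mib: int, memory_mib: int,
                       cpu: str) -> Dict[str, Dict[str, str]]:
    """Container resources with huge pages; requests equal limits."""
    hugepages_mib = huge_pages_needed(shared_buffers_mib) * HUGE_PAGE_MIB
    limits = {
        "cpu": cpu,
        "memory": f"{memory_mib}Mi",  # regular memory, excluding huge pages
        "hugepages-2Mi": f"{hugepages_mib}Mi",
    }
    return {"requests": dict(limits), "limits": limits}


if __name__ == "__main__":
    print(postgres_resources(shared_buffers_mib=2048, memory_mib=8192, cpu="4"))
```

The node also needs enough huge pages preallocated (for example via the `vm.nr_hugepages` sysctl), and setting `huge_pages = on` in PostgreSQL makes startup fail loudly instead of silently falling back to regular pages.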
Resiliency and Overhead
High Availability
Patroni, repmgr, pg_auto_failover, …
https://medium.com/@kristi.anderson/whats-the-best-postgresql-high-availability-framework...

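When the primary fails, HA managers such as Patroni prefer promoting a replica that has received the most WAL, to minimize data loss. PostgreSQL reports WAL positions as LSNs in the hexadecimal `X/Y` form, for example from `pg_last_wal_receive_lsn()` on a replica. The sketch below shows only that selection rule with hypothetical pod names; real tools also check leader locks, lag limits, and member health.

```python
from typing import Dict, Optional


def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pick_failover_candidate(replica_lsns: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the replica with the most advanced received WAL position.

    Replicas reporting None (for example, unreachable ones) are skipped.
    """
    positions = {name: parse_lsn(lsn)
                 for name, lsn in replica_lsns.items() if lsn is not None}
    if not positions:
        return None
    return max(positions, key=positions.__getitem__)


if __name__ == "__main__":
    # Hypothetical replicas and the LSNs they reported.
    print(pick_failover_candidate({"pg-1": "0/3000060", "pg-2": "1/10",
                                   "pg-3": None}))
```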
Resiliency and Overhead
Connection Pooling
Never use PostgreSQL without connection pooling!
Reduces connection overhead and optimizes resource utilization
Handles failovers by switching to the new primary centrally
Makes it easy to use read replicas
PgBouncer, Pgpool-II, pgagroal, PgCat, Odyssey, …
https://tembo.io/blog/postgres-connection-poolers

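A connection pooler keeps a small, fixed set of server connections open and lends them to a much larger number of clients, so PostgreSQL doesn't pay the per-connection backend process cost for every client. The sketch below illustrates that idea in-process with `queue.Queue` and a hypothetical stand-in for opening a connection; in Kubernetes you would run a dedicated pooler such as PgBouncer instead.

```python
import queue
import threading
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, TypeVar

T = TypeVar("T")


class ConnectionPool(Generic[T]):
    """Fixed-size pool: never opens more than `size` server connections."""

    def __init__(self, connect: Callable[[], T], size: int) -> None:
        self._idle: "queue.Queue[T]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # open all server connections up front

    @contextmanager
    def connection(self, timeout: float = 5.0) -> Iterator[T]:
        # Waits for an idle connection; raises queue.Empty after `timeout`.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for the next client


if __name__ == "__main__":
    opened = []

    def fake_connect() -> int:
        """Stand-in for opening a real PostgreSQL connection (hypothetical)."""
        opened.append(len(opened))
        return opened[-1]

    pool = ConnectionPool(fake_connect, size=5)

    def client() -> None:
        with pool.connection():
            pass  # a real client would run its query here

    clients = [threading.Thread(target=client) for _ in range(50)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    print(f"{len(clients)} clients shared {len(opened)} server connections")
```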
Where’s my Replicant?
Use available Kubernetes features
StatefulSet
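A StatefulSet gives each replica a stable ordinal name (`<name>-0`, `<name>-1`, …), its own PersistentVolumeClaim from `volumeClaimTemplates`, and, through a headless Service, a stable DNS name. The helpers below derive those names; the StatefulSet, Service, template, and namespace names are hypothetical, and `cluster.local` is only the default cluster domain.

```python
from typing import List


def pod_dns_names(statefulset: str, service: str, namespace: str,
                  replicas: int, cluster_domain: str = "cluster.local") -> List[str]:
    """Stable per-pod DNS names provided by a headless Service."""
    return [f"{statefulset}-{i}.{service}.{namespace}.svc.{cluster_domain}"
            for i in range(replicas)]


def claim_names(template: str, statefulset: str, replicas: int) -> List[str]:
    """PVC names created from a volumeClaimTemplate: <template>-<pod name>."""
    return [f"{template}-{statefulset}-{i}" for i in range(replicas)]


if __name__ == "__main__":
    # Hypothetical names: StatefulSet "pg", headless Service "pg-headless".
    for dns, pvc in zip(pod_dns_names("pg", "pg-headless", "databases", 3),
                        claim_names("pgdata", "pg", 3)):
        print(f"{dns} -> PVC {pvc}")
```

Because these names survive rescheduling, a restarted pod reattaches to its own volume, and replicas can reach their peers at predictable addresses.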

Networking and Access Control
Use Network Policies
Enable TLS (you remember?!)
Set up security policies
Configure RBAC (Role-Based Access Control)
Think about a policy manager such as OPA or Kyverno
https://timeclock365.com/tc22-door-access-controller/

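A NetworkPolicy that selects the PostgreSQL pods and allows port 5432 only from the pooler (and from the database pods themselves, so replicas can stream WAL from the primary) denies all other ingress to those pods, unless another policy allows it. The labels `app: postgres` and `app: pgbouncer` are assumptions; exporters or an operator may need additional rules, and policies are only enforced if your CNI plugin supports them.

```python
import json
from typing import Dict


def postgres_ingress_policy(namespace: str, db_labels: Dict[str, str],
                            client_labels: Dict[str, str], port: int = 5432) -> dict:
    """NetworkPolicy: only pooler pods and the database pods themselves may
    connect to PostgreSQL on `port`; other ingress to the database pods is
    denied unless another policy allows it."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "postgres-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": db_labels},  # pods being protected
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [
                    {"podSelector": {"matchLabels": client_labels}},  # the pooler
                    {"podSelector": {"matchLabels": db_labels}},  # replication
                ],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    policy = postgres_ingress_policy("databases", {"app": "postgres"},
                                     {"app": "pgbouncer"})
    print(json.dumps(policy, indent=2))
```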
Observability and Alerting
As with anything in the cloud, make sure you have monitoring (meaning observability) and alerting!
Prometheus Exporter, Log Collector, Aggregation, Analysis, Traceability, …
Datadog, Instana, Dynatrace, Grafana, …

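Exporters such as postgres_exporter publish metrics in Prometheus' text exposition format, which Prometheus scrapes over HTTP. To show what that looks like, the sketch renders a replication-lag gauge (on the primary, such a value could come from `pg_wal_lsn_diff()` over `pg_stat_replication`); the metric and label names are made up for illustration and are not postgres_exporter's own.

```python
from typing import Dict


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, label: str,
                 samples: Dict[str, float]) -> str:
    """Render a single-label gauge in Prometheus' text exposition format.

    Assumes `help_text` contains no backslashes or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for label_value, value in sorted(samples.items()):
        escaped = escape_label_value(label_value)
        lines.append(f'{name}{{{label}="{escaped}"}} {value}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical metric name and values.
    print(render_gauge("pg_replica_lag_bytes", "Replication lag in bytes.",
                       "replica", {"pg-1": 0.0, "pg-2": 1024.0}), end="")
```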
Operator
Use a Postgres Kubernetes Operator
Handles or configures many of the typical tasks (HA, backup, …)
Brings cloud-nativeness to PG
Integrates PG into k8s
If not, use Helm Charts

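With an operator, a whole HA cluster becomes a single custom resource. As an illustration only, the sketch builds a minimal CloudNativePG `Cluster` (`postgresql.cnpg.io/v1`) with three instances, meaning one primary and two streaming replicas. The name, size, storage class, and parameters are placeholders, other operators use their own CRDs, and the fields should be checked against the CloudNativePG documentation for your version.

```python
import json
from typing import Dict


def cnpg_cluster(name: str, instances: int, size: str, storage_class: str,
                 parameters: Dict[str, str]) -> dict:
    """Minimal CloudNativePG Cluster manifest; all values are placeholders."""
    if instances < 1:
        raise ValueError("a cluster needs at least one instance")
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            # One primary plus (instances - 1) streaming replicas.
            "instances": instances,
            "storage": {"size": size, "storageClass": storage_class},
            # PostgreSQL settings, passed as strings.
            "postgresql": {"parameters": parameters},
        },
    }


if __name__ == "__main__":
    cluster = cnpg_cluster("pg-main", 3, "100Gi", "fast-nvme",
                           {"shared_buffers": "2GB"})
    print(json.dumps(cluster, indent=2))
```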
Operator
| Feature | CloudNativePG | Crunchy Postgres for Kubernetes | OnGres StackGres | KubeDB | Zalando Postgres Operator |
|---|---|---|---|---|---|
| Supported versions | 12, 13, 14, 15, 16 | 11, 12, 13, 14, 15, 16 | 12, 13, 14, 15, 16 | 9.6, 10, 11, 12, 13, 14 | 11, 12, 13, 14, 15, 16 |
| Postgres clusters | ✔ | ✔ | ✔ | ✔ | ✔ |
| Streaming replication | ✔ | ✔ | ✔ | ✔ | ✔ |
| Supports extensions | ✔ | ✔ | ✔ | ✔ | ✔ |
| Hot standby | ✔ | ✔ | ✔ | ✔ | ✔ |
| Warm standby | ✔ | ✔ | ✔ | ✔ | ✔ |
| Automatic failover | ✔ | ✔ | ✔ | ✔ | ✔ |
| Continuous archiving | ✔ | ✔ | ✔ | ✔ | ✔ |
| Restore from WAL archive | ✔ | ✔ | ✔ | ✔ | ✔ |
| Supports PITR | ✔ | ✔ | ✔ | ✔ | ✔ |
| Manual backups | ✔ | ✔ | ✔ | ✔ | ✔ |
| Scheduled backups | ✔ | ✔ | ✔ | ✔ | ✔ |
| Backups via Kubernetes | ✔ | ✘ | ✔ | ✔ | ✘ |
| Custom resources | ✔ | ✔ | ✔ | ✔ | ✔ |
| Uses default PG images | ✘ | ✔ | ✔ | ✘ | ✘ |
| CLI access | ✔ | ✔ | ✔ | ✔ | ✘ |
| WebUI | ✘ | ✘ | ✔ | ✔ | ✘ |
| Tolerations | ✔ | ✔ | ✔ | ✔ | ✔ |
| Node affinity | ✔ | ✔ | ✔ | ✔ | ✔ |

Operator
https://www.simplyblock.io/post/choosing-a-postgres-kubernetes-operator
https://operatorhub.io/?keyword=postgres

Pinning and Tainting
Always use specific, dedicated machines for your database
(unless you're running very small databases).
Pin your database containers to those hosts.
Taint the hosts to prevent anything else from running on them
(except the minimum necessary Kubernetes services, like kube-proxy).

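Pinning and tainting are two halves of one setup: node affinity keeps the database pods on the labeled nodes, and a `NoSchedule` taint keeps other pods off those nodes unless they tolerate it. The sketch builds the pod-side fields and a simplified version of Kubernetes' toleration matching; the key/value `workload=postgres` is hypothetical, and the node side would be set with `kubectl label nodes <node> workload=postgres` and `kubectl taint nodes <node> workload=postgres:NoSchedule`.

```python
from typing import Any, Dict


def database_scheduling(key: str, value: str) -> Dict[str, Any]:
    """Pod spec fields pinning a pod to nodes labeled and tainted key=value."""
    return {
        # Only schedule onto nodes carrying the label key=value.
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [{
                        "matchExpressions": [
                            {"key": key, "operator": "In", "values": [value]},
                        ],
                    }],
                },
            },
        },
        # Allow scheduling onto nodes tainted key=value:NoSchedule.
        "tolerations": [
            {"key": key, "operator": "Equal", "value": value,
             "effect": "NoSchedule"},
        ],
    }


def tolerates(toleration: Dict[str, str], taint: Dict[str, str]) -> bool:
    """Simplified toleration/taint matching (key, value, operator, effect)."""
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False  # an empty effect matches every effect
    if toleration.get("operator", "Equal") == "Exists":
        # Exists ignores the value; an empty key matches every taint.
        key = toleration.get("key")
        return not key or key == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))
```

A toleration alone doesn't attract pods to the tainted nodes, and affinity alone doesn't keep other pods away, which is why both are needed.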
Trust me, I’m Kelsey!
https://x.com/kelseyhightower/status/1624081136073994240

More Resources
Data on Kubernetes Community: https://dok.community
Data on Kubernetes Whitepaper

Thank you very much!
Questions?
@noctarius2k
@[email protected]
@noctarius.com