stackconf 2024 | Insights into Managed Service Provision: A STACKIT Retrospective by Patrick Koss
NETWAYS
25 slides
Jul 25, 2024
About This Presentation
Embark on the innovative journey of STACKIT, a premier European cloud provider, as we showcase our expertise in Kubernetes-based managed services. This talk will delve into the dynamic processes of deploying and managing robust services like PostgreSQL, InfluxDB and MongoDB on Kubernetes clusters, a testament to our technological prowess. We’ll unravel our multifaceted strategies for processing customer requests, from the initial API call to the final deployment stage. Our discussion will highlight diverse methodologies, including the integration of databases with message queues and the direct creation of Kubernetes resources, offering insights into their unique efficiencies and challenges. Join us to deeply understand the trade-offs of each approach. We’ll address vital issues such as scaling capabilities, backup strategies and effective resource management.
Size: 796.75 KB
Language: en
Added: Jul 25, 2024
Slides: 25 pages
Slide Content
INSIGHTS INTO MANAGED SERVICE PROVISION: A STACKIT RETROSPECTIVE
WHO AM I?
- Senior Software Engineer / Architect
- Building PaaS in the Cloud
- Created an Observability Service, DNS and now an AI Platform
- Giving Lectures about Databases
- Like to write blog posts about programming, system design and architecture
KUBERNETES OPERATORS
apiVersion: acid.zalan.do/v1
kind: postgresql

Diagram: the Postgres Operator watches postgresql resources through the Kube API; the StatefulSet and the Service are owned resources ("owns").
- Single Leader System (cached client, in-memory adjustable queue)
- Owns: Get notified if changes occur
- Bundle operative knowledge
- Bundle functionality -> testing
- Idempotency, state transition built into the philosophy
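
The idempotency point above can be illustrated with a minimal reconcile sketch. This is not the Postgres Operator's code: `Cluster`, `reconcile` and the resource keys are hypothetical names, and an in-memory dictionary stands in for the Kube API. The function compares desired and current state and only acts on the difference, so calling it twice has the same effect as calling it once.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for the resources behind the Kube API."""
    resources: dict = field(default_factory=dict)


def reconcile(cluster: Cluster, name: str, desired_replicas: int) -> list[str]:
    """Drive the owned resources toward the desired state; return the actions taken."""
    actions = []
    sts_key = f"statefulset/{name}"
    svc_key = f"service/{name}"

    current = cluster.resources.get(sts_key)
    if current is None:
        # Nothing exists yet: create the owned StatefulSet.
        cluster.resources[sts_key] = {"replicas": desired_replicas}
        actions.append(f"create {sts_key}")
    elif current["replicas"] != desired_replicas:
        # Exists but drifted: update only the differing field.
        current["replicas"] = desired_replicas
        actions.append(f"update {sts_key}")

    if svc_key not in cluster.resources:
        cluster.resources[svc_key] = {"port": 5432}
        actions.append(f"create {svc_key}")

    return actions


cluster = Cluster()
first = reconcile(cluster, "db1", desired_replicas=2)
second = reconcile(cluster, "db1", desired_replicas=2)  # no-op: state already matches
print(first)
print(second)
```

Because the second call finds nothing to change, the same function can safely run on every watch event or after a restart.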
DEPLOY AN INSTANCE (POSTGRES, GRAFANA, INFLUX)
apiVersion: networking.k8s.io/v1
kind: Ingress
---
apiVersion: v1
kind: Service
---
apiVersion: acid.zalan.do/v1
kind: postgresql

Diagram: yamls -> ??? -> N Kubernetes clusters
Challenges:
- Parameterize yamls
- Get yamls into n Kubernetes clusters
- Propagate the status of the instance to the customer
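
As a rough illustration of the first challenge, the sketch below parameterizes a manifest with Python's standard `string.Template`. The template is a simplified, hypothetical fragment (it is not a complete manifest and is not taken from the talk), and `render_manifest` is an illustrative helper name.

```python
from string import Template

# Hypothetical, simplified manifest template; the values are illustrative only.
POSTGRES_TEMPLATE = Template("""\
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: $name
  namespace: $namespace
spec:
  numberOfInstances: $instances
  postgresql:
    version: "$version"
""")


def render_manifest(name: str, namespace: str, version: str, instances: int = 1) -> str:
    """Fill the template; substitute() raises KeyError if a placeholder is missing."""
    return POSTGRES_TEMPLATE.substitute(
        name=name, namespace=namespace, version=version, instances=instances
    )


manifest = render_manifest("db1", "customer-a", "15", instances=2)
print(manifest)
```

Plain string substitution does no escaping or schema validation, so a real system would more likely build the objects as data and serialize them, or use a dedicated templating tool.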
DEPLOY AN INSTANCE (POSTGRES, GRAFANA, INFLUX)
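Diagram flow:
1. Client (TF, UI) calls the RESTful API.
2. The metadata (Name, Description, Version, ACL) is stored in the Datastore.
3. A message is published to the Message Queue.
4. A Consumer consumes the message, generates the yamls and applies them to the Kubernetes Cluster.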
DEPLOY AN INSTANCE (POSTGRES, GRAFANA, INFLUX)
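Diagram flow:
1. Client (TF, UI) calls the RESTful API.
2. The metadata (Name, Description, Version, ACL) is stored in the Datastore.
3. An Outbox Worker takes entries out of the outbox and publishes a message to the Message Queue.
4. A Consumer consumes the message, generates the yamls and applies them to the Kubernetes Cluster (Cluster ... N Cluster).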
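
The outbox step in this variant can be sketched as follows, assuming a relational datastore. The example uses SQLite from the Python standard library; the table names, the event payload and the `publish` callback are hypothetical and only stand in for the real datastore and message queue.

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, name TEXT, version TEXT)")
    conn.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT, published INTEGER DEFAULT 0)"
    )


def create_instance(conn: sqlite3.Connection, name: str, version: str) -> int:
    """Store the metadata and the outbox entry in ONE transaction."""
    with conn:  # commits both inserts together, or rolls both back on error
        cur = conn.execute(
            "INSERT INTO instances (name, version) VALUES (?, ?)", (name, version)
        )
        instance_id = cur.lastrowid
        payload = json.dumps({"event": "instance.created", "id": instance_id, "name": name})
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))
    return instance_id


def run_outbox_worker(conn: sqlite3.Connection, publish) -> int:
    """Publish pending outbox entries and mark them as published."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        # A crash between publish and the update leads to a re-publish on the
        # next run, so consumers need to tolerate duplicates.
        publish(payload)
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


conn = sqlite3.connect(":memory:")
create_schema(conn)
create_instance(conn, "db1", "15")
message_queue = []  # hypothetical stand-in for the real message queue
published = run_outbox_worker(conn, message_queue.append)
print(published, message_queue)
```

The point of the pattern is that the API no longer writes to two systems independently: the message exists if and only if the metadata was committed, at the cost of at-least-once delivery.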
DEPLOY AN INSTANCE (POSTGRES, GRAFANA, INFLUX)
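Diagram flow:
1. Client (TF, UI) calls the RESTful API.
2. The metadata (Name, Description, Version, ACL) is stored in the Datastore.
3. An Outbox Worker takes entries out of the outbox and publishes them to the Message Queue.
4. An Agent (Message Consumer) consumes the message, generates the yamls and applies them to the Kubernetes Cluster (Cluster ... N Cluster).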
DEPLOY AN INSTANCE (POSTGRES, GRAFANA, INFLUX)
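Diagram flow:
1. Client (TF, UI) calls the RESTful API.
2. The metadata (Name, Description, Version, ACL) is stored in the Datastore.
3. An Outbox Worker takes entries out of the outbox and publishes a message to the Message Queue.
4. A Consumer consumes the message, generates the yamls and pushes them to a Git Repo.
5. The Kubernetes Cluster pulls from the Git Repo and applies the yamls (Cluster ... N Cluster).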
DEPLOY AN INSTANCE (POSTGRES, GRAFANA, INFLUX)
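Diagram flow:
1. Client (TF, UI) calls the RESTful API.
2. The metadata (Name, Description, Version, ACL) is stored as an Instance CRD.
3. An Agent(let) (Operator) watches the Instance CRD, generates the yamls and applies them to the Kubernetes Cluster (Cluster ... N Cluster).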
STATUS PROPAGATION
Diagram: Client (TF, UI) -> RESTful API -> Datastore: "Is my instance ready / updated?"
- Desired State = Current State -> healthy
- Desired State != Current State -> reconciling
- Error in State -> error
- Desired ACL 1.2.3.4 != Current ACL 0.0.0.0/0 -> reconciling
- Desired healthy pods 1 < Current healthy pods 3 -> healthy
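
The rules on this slide can be written down as a small function. This is a sketch under assumptions: `InstanceState` and `compute_status` are hypothetical names, and treating fewer healthy pods than desired as "reconciling" is my reading, since the slide only gives the example where more pods than desired are healthy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceState:
    """Hypothetical subset of the fields compared on the slide."""
    acl: str
    healthy_pods: int
    error: Optional[str] = None


def compute_status(desired: InstanceState, current: InstanceState) -> str:
    if current.error:
        return "error"
    if desired.acl != current.acl:
        return "reconciling"
    # Assumption: more healthy pods than desired still counts as healthy,
    # fewer than desired means the instance is still reconciling.
    if current.healthy_pods < desired.healthy_pods:
        return "reconciling"
    return "healthy"


acl_case = compute_status(
    InstanceState(acl="1.2.3.4", healthy_pods=1),
    InstanceState(acl="0.0.0.0/0", healthy_pods=1),
)
pod_case = compute_status(
    InstanceState(acl="1.2.3.4", healthy_pods=1),
    InstanceState(acl="1.2.3.4", healthy_pods=3),
)
print(acl_case, pod_case)
```

The pod example shows why a plain equality check between desired and current state is not enough: some fields need a field-specific comparison.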
STATUS PROPAGATION
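Diagram flow:
1. Client (TF, UI) asks the RESTful API "Is my instance ready / updated?"; the answer comes from the Datastore.
2. A Status Check Worker gets the last check timestamp from the Datastore.
3. The worker checks the status field in the Cluster.
4. The worker updates the status in the Datastore.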
STATUS PROPAGATION
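Diagram flow:
1. Client (TF, UI) asks the RESTful API "Is my instance ready / updated?"; the answer comes from the Datastore.
2. An Agent(let) (Operator) queries / watches resources in the Cluster.
3. The agent publishes a status change message to the Message Queue.
4. A Status Check Worker consumes the status message and updates the status in the Datastore.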
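
A minimal sketch of this event-driven variant, using the standard-library `queue.Queue` as a stand-in for the message queue and a dictionary as the datastore. The function names and the message shape are hypothetical. The agent publishes only when the observed status differs from the last one it published, which keeps queue traffic proportional to changes rather than to the number of checks.

```python
import queue


def publish_status_changes(observations, last_published: dict, mq: queue.Queue) -> None:
    """Agent side: publish a message only when an instance's status has changed."""
    for instance_id, status in observations:
        if last_published.get(instance_id) != status:
            last_published[instance_id] = status
            mq.put({"instance": instance_id, "status": status})


def drain_status_messages(mq: queue.Queue, datastore: dict) -> int:
    """Worker side: consume status messages and update the datastore."""
    updated = 0
    while True:
        try:
            message = mq.get_nowait()
        except queue.Empty:
            return updated
        datastore[message["instance"]] = message["status"]
        updated += 1


mq = queue.Queue()
datastore = {}
observed = [("db1", "reconciling"), ("db1", "reconciling"), ("db1", "healthy")]
publish_status_changes(observed, {}, mq)
updated = drain_status_messages(mq, datastore)
print(updated, datastore)
```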
STATUS PROPAGATION
Diagram flow:
1. An Agent(let) (Operator) watches resources in the Cluster and updates the status on the Instance CRD.
2. Client (TF, UI) asks the RESTful API "Is my instance ready / updated?".
3. The RESTful API gets the CRD and calculates the status.
OTHER MANAGEMENT RESOURCES (BACKUPS, ACLS)
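Components: Client (TF, UI), RESTful API, Backup CRD, Agent, Instance (Postgres), Object Storage
1. Create Backup (Client -> RESTful API)
2. Create Backup CRD (RESTful API)
3. Reconcile (Agent)
4. pg_dump (Agent -> Instance (Postgres))
5. Upload Backup (-> Object Storage)
6. Update Status (CreatedAt) on the Backup CRD
7. Get Backups (Client -> RESTful API)
8. Get Status (RESTful API -> Backup CRD (Status))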
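
The agent's part of this flow (steps 3 to 6) can be sketched as one reconcile function. Everything here is illustrative: the backup object is a plain dictionary standing in for the Backup CRD, the field names `spec.instance`, `spec.objectKey` and `status.createdAt` are assumptions, and `dump` and `upload` are injected callables rather than real pg_dump or object storage clients.

```python
from datetime import datetime, timezone


def reconcile_backup(backup: dict, dump, upload, now=None) -> bool:
    """Run the backup once; return False if the status shows it already happened."""
    if backup.get("status", {}).get("createdAt"):
        return False  # already done, so repeated reconciles are harmless

    data = dump(backup["spec"]["instance"])       # step 4: dump the database
    upload(backup["spec"]["objectKey"], data)     # step 5: upload to object storage
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    backup["status"] = {"createdAt": timestamp}   # step 6: update the status
    return True


object_storage = {}
backup = {"spec": {"instance": "db1", "objectKey": "backups/db1/1.sql"}}

first_run = reconcile_backup(
    backup,
    dump=lambda instance: f"-- dump of {instance}".encode(),
    upload=object_storage.__setitem__,
)
second_run = reconcile_backup(
    backup,
    dump=lambda instance: b"",
    upload=object_storage.__setitem__,
)
print(first_run, second_run, backup["status"])
```

Steps 7 and 8 then only need to read the status field, so the API never has to talk to the instance itself.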
TAKEAWAYS
- All approaches can be valid, depending on the use case
- Aiven uses a DB + queue and hosts on VMs
- Measure / monitor potential bottlenecks
- Kubernetes is a powerful platform, and operators are great for managing components
- Be careful not to overload the Kube API
- Commit to one approach and use it across the entire system