stackconf 2024 | Insights into Managed Service Provision: A STACKIT Retrospective by Patrick Koss

NETWAYS 40 views 25 slides Jul 25, 2024

About This Presentation

Embark on the innovative journey of STACKIT, a premier European cloud provider, as we showcase our expertise in Kubernetes-based managed services. This talk will delve into the dynamic processes of deploying and managing robust services like PostgreSQL, InfluxDB and MongoDB on Kubernetes clusters, a...


Slide Content

INSIGHTS INTO MANAGED SERVICE PROVISION: A STACKIT RETROSPECTIVE

WHO AM I?
-Senior Software Engineer / Architect
-Building PaaS in the Cloud
-Created an Observability Service, DNS and now an AI Platform
-Giving Lectures about Databases
-Like to write blog posts about programming, system design and architecture

SOCIALS
https://www.linkedin.com/in/patrick-koss-a129071a1/
https://medium.com/@patrickkoss

PROVIDERS EVERYWHERE

WHAT IS A MANAGED SERVICE?
-Updates & Upgrades
-Scaling Up & Down
-High Availability
-Backups & Restore
-Monitoring & Status & On-Call
-Operational Knowledge

MANAGED SERVICE EXAMPLES

WHAT FUNCTIONALITY DO WE NEED TO PROVIDE?
Instance provision, update, deprovision
Backup and backup schedule CRUD
Restore
ACL CRUD
Config Management

THE CHALLENGE
Client (TF, UI) -> RESTful API Service -> Compute -> Instance (DB, LLM)

COMPUTE
Virtual Machines vs. Kubernetes
Comparison Criteria:
- Scaling
- Management
- Updates/Upgrades
- Zero-Downtime Deployments
- Spin-Up Time
- Complexity
- Team experience
- Resource Management of Instances (GB/CPU)
- Environment

THE CHALLENGE REVISED
Client (TF, UI) -> RESTful API Service
- Kubernetes Cluster -> Instance (DB, LLM)
- Kubernetes Cluster -> Instance (DB, LLM)

INITIALIZE A KUBERNETES CLUSTER
Git Repository -> watch -> apply -> Kubernetes Cluster
Components applied to the cluster:
- Cert-Manager
- External-DNS
- External-Secrets
- Prometheus
- OTEL

KUBERNETES OPERATORS
apiVersion: acid.zalan.do/v1
kind: postgresql
Postgres Operator -> Watch (postgresql) -> Kube API
Postgres Operator owns: StatefulSet, Service
- Single Leader System (cached client, in-memory adjustable queue)
- Owns: the operator gets notified when an owned resource changes
- Bundles operational knowledge
- Bundles functionality -> easier testing
- Idempotency and state transitions are built into the philosophy
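
The idempotency point can be illustrated with a minimal reconcile loop. This is only a sketch under stated assumptions: `Cluster` is a hypothetical in-memory stand-in for the Kube API, and the resource name `pg-main` is invented for illustration. A real operator would use a Kubernetes client and watch events instead.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for the Kube API: name -> replica count."""
    stateful_sets: dict = field(default_factory=dict)


def reconcile(desired: dict, cluster: Cluster) -> list:
    """Drive the cluster toward the desired state and return the actions taken."""
    actions = []
    for name, replicas in desired.items():
        current = cluster.stateful_sets.get(name)
        if current is None:
            cluster.stateful_sets[name] = replicas
            actions.append(f"create {name}")
        elif current != replicas:
            cluster.stateful_sets[name] = replicas
            actions.append(f"scale {name} {current}->{replicas}")
    # Remove owned resources that are no longer desired.
    for name in list(cluster.stateful_sets):
        if name not in desired:
            del cluster.stateful_sets[name]
            actions.append(f"delete {name}")
    return actions


cluster = Cluster()
first = reconcile({"pg-main": 3}, cluster)   # creates the resource
second = reconcile({"pg-main": 3}, cluster)  # same input, nothing left to do
third = reconcile({"pg-main": 5}, cluster)   # a state transition
```

Running the same desired state twice yields no further actions, which is what makes retries and restarts of the operator safe.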

DEPLOY AN INSTANCE (POSTGRES, GRAFANA, INFLUX)
apiVersion: networking.k8s.io/v1
kind: Ingress
---
apiVersion: v1
kind: Service
---
apiVersion: acid.zalan.do/v1
kind: postgresql
YAMLs -> ??? -> N Kubernetes Clusters
Challenges:
-Parameterize the YAMLs
-Get the YAMLs into N Kubernetes clusters
-Propagate the status of the instance to the customer
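
The first challenge, parameterizing the YAMLs, could be approached with plain templating. The sketch below uses Python's standard `string.Template`; the function name `render_manifest` and all parameter values are hypothetical, and the `spec` field names are recalled from the Zalando operator's example manifests, so they should be checked against the operator documentation before use.

```python
from string import Template

# Assumed manifest shape; verify field names against the operator's docs.
POSTGRES_TEMPLATE = Template("""\
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: $team-$name
  namespace: $namespace
spec:
  teamId: $team
  numberOfInstances: $instances
  postgresql:
    version: "$version"
  volume:
    size: $volume_size
""")


def render_manifest(team, name, namespace, version, instances=1, volume_size="10Gi"):
    """Fill the template; substitute() raises KeyError if a placeholder is missing."""
    return POSTGRES_TEMPLATE.substitute(
        team=team,
        name=name,
        namespace=namespace,
        version=version,
        instances=instances,
        volume_size=volume_size,
    )


manifest = render_manifest("acme", "orders", "customer-a", "16", instances=2)
print(manifest)
```

String templating is the simplest option; it does not validate the result, so a real system would likely render typed objects or validate against the CRD schema.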

DEPLOY AN INSTANCE (POSTGRES, GRAFANA, INFLUX)
Client (TF, UI) -> RESTful API
- RESTful API -> Datastore: store metadata (Name, Description, Version, ACL)
- RESTful API -> Message Queue: publish message
- Consumer: consume message
- Consumer -> Kubernetes Cluster: generate YAMLs and apply

DEPLOY AN INSTANCE (POSTGRES, GRAFANA, INFLUX)
Client (TF, UI) -> RESTful API
- RESTful API -> Datastore: store metadata (Name, Description, Version, ACL)
- Outbox Worker: take entries out of the outbox
- Outbox Worker -> Message Queue: publish message
- Consumer: consume message
- Consumer -> Kubernetes Cluster 1 ... N: generate YAMLs and apply
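
The outbox variant can be sketched as follows. This is a minimal illustration, not the system described in the talk: it assumes SQLite as the datastore and an in-process `queue.Queue` as the message queue, and the table, column, and event names are hypothetical.

```python
import json
import queue
import sqlite3


def create_schema(db):
    db.execute("CREATE TABLE instances (name TEXT PRIMARY KEY, version TEXT)")
    db.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT, published INTEGER DEFAULT 0)"
    )


def create_instance(db, name, version):
    """Store metadata and the outbox entry in ONE transaction."""
    payload = json.dumps({"event": "instance.created", "name": name, "version": version})
    with db:  # commits on success, rolls back both inserts on error
        db.execute("INSERT INTO instances VALUES (?, ?)", (name, version))
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))


def run_outbox_worker(db, mq):
    """Publish unpublished outbox rows, then mark them as published."""
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        mq.put(payload)  # publish first; a crash here causes a re-publish later
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


db = sqlite3.connect(":memory:")
create_schema(db)
mq = queue.Queue()

create_instance(db, "pg-main", "16")
published = run_outbox_worker(db, mq)
message = json.loads(mq.get_nowait())
```

Because the worker publishes before marking a row as done, delivery is at least once, so the consumer that generates and applies the YAMLs has to be idempotent.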

DEPLOY AN INSTANCE (POSTGRES, GRAFANA, INFLUX)
Client (TF, UI) -> RESTful API
- RESTful API -> Datastore: store metadata (Name, Description, Version, ACL)
- Outbox Worker: take entries out of the outbox
- Outbox Worker -> Message Queue: publish
- Agent (Message Consumer): consume
- Agent -> Kubernetes Cluster 1 ... N: generate YAMLs and apply

DEPLOY AN INSTANCE (POSTGRES, GRAFANA, INFLUX)
Client (TF, UI) -> RESTful API
- RESTful API -> Datastore: store metadata (Name, Description, Version, ACL)
- Outbox Worker: take entries out of the outbox
- Outbox Worker -> Message Queue: publish message
- Consumer: consume message
- Consumer -> Git Repo: generate YAMLs and push
- Kubernetes Cluster 1 ... N: pull from the Git Repo and apply

DEPLOY AN INSTANCE (POSTGRES, GRAFANA, INFLUX)
Client (TF, UI) -> RESTful API
- RESTful API -> Instance CRD: store metadata (Name, Description, Version, ACL)
- Agent(let) (Operator): watch the Instance CRD
- Agent(let) (Operator) -> Kubernetes Cluster 1 ... N: generate YAMLs and apply

STATUS PROPAGATION
Client (TF, UI) -> RESTful API -> Datastore: Is my instance ready / updated?
Desired State = Current State -> healthy
Desired State != Current State -> reconciling
Error in State -> error
Desired ACL 1.2.3.4 != Current ACL 0.0.0.0/0 -> reconciling
Desired healthy pods 1 < Current healthy pods 3 -> healthy
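
The rules above can be sketched as a small status function. The names `InstanceState` and `derive_status` are hypothetical, and treating the desired number of healthy pods as a minimum (so that more healthy pods than desired still counts as healthy) is an interpretation of the slide, not something it states outright.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstanceState:
    acl: str
    healthy_pods: int
    error: Optional[str] = None


def derive_status(desired: InstanceState, current: InstanceState) -> str:
    """Map desired vs. current state to the status shown to the customer."""
    if current.error:
        return "error"
    if desired.acl != current.acl:
        # ACLs must match exactly; any difference means work is pending.
        return "reconciling"
    if current.healthy_pods < desired.healthy_pods:
        # Assumption: the desired pod count is a minimum, not an exact value.
        return "reconciling"
    return "healthy"


acl_mismatch = derive_status(InstanceState("1.2.3.4", 1), InstanceState("0.0.0.0/0", 3))
extra_pods = derive_status(InstanceState("1.2.3.4", 1), InstanceState("1.2.3.4", 3))
failed = derive_status(InstanceState("1.2.3.4", 1), InstanceState("1.2.3.4", 0, error="crash loop"))
```

The point of the two examples on the slide is that a plain equality check on the whole state is not enough: some fields need exact equality, others a threshold.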

STATUS PROPAGATION
Client (TF, UI) -> RESTful API -> Datastore: Is my instance ready / updated?
Status Check Worker:
- Get last check timestamp (Datastore)
- Check status field (Cluster)
- Update status (Datastore)
STATUS PROPAGATION
Client (TF, UI) -> RESTful API -> Datastore: Is my instance ready / updated?
- Agent(let) (Operator) -> Cluster: query / watch resources
- Agent(let) (Operator) -> Message Queue: publish status change message
- Status Check Worker: consume status message
- Status Check Worker -> Datastore: update status

STATUS PROPAGATION
Client (TF, UI) -> RESTful API: Is my instance ready / updated?
- Agent(let) (Operator) -> Cluster: watch resources
- Agent(let) (Operator) -> Instance CRD: update status
- RESTful API -> Instance CRD: get CRD and calculate status

OTHER MANAGEMENT RESOURCES (BACKUPS, ACLS)
Client (TF, UI) -> RESTful API -> Backup CRD -> Agent -> Instance (Postgres) -> Object Storage
1. Create Backup
2. Create Backup CRD
3. Reconcile
4. pg_dump
5. Upload Backup
6. Update Status (CreatedAt)
7. Get Backups (Status)
8. Get Status
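
Steps 3 to 6 of this flow could look roughly like the sketch below. Everything here is a stand-in: `Backup` mimics a Backup CRD object with hypothetical field names, `fake_pg_dump` replaces a real `pg_dump` call, and a plain dict plays the role of object storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, Optional


@dataclass
class Backup:
    """Stand-in for the Backup CRD object; field names are hypothetical."""
    name: str
    instance: str
    status: str = "pending"
    created_at: Optional[str] = None


def reconcile_backup(
    backup: Backup,
    dump: Callable[[str], bytes],
    object_storage: Dict[str, bytes],
) -> Backup:
    """Agent-side reconcile: dump, upload, then record the result in the status."""
    if backup.status == "done":
        return backup  # already handled; reconciling again changes nothing
    try:
        data = dump(backup.instance)                                   # 4. pg_dump
        object_storage[f"{backup.instance}/{backup.name}.sql"] = data  # 5. upload
    except Exception:
        backup.status = "error"
        return backup
    backup.status = "done"                                             # 6. update status
    backup.created_at = datetime.now(timezone.utc).isoformat()
    return backup


def fake_pg_dump(instance: str) -> bytes:
    # Placeholder for invoking pg_dump against the instance.
    return f"-- dump of {instance}\n".encode()


storage: Dict[str, bytes] = {}
backup = reconcile_backup(Backup("nightly-1", "pg-main"), fake_pg_dump, storage)
```

The API then only has to read the Backup objects and their status fields to answer steps 7 and 8.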

TAKE AWAYS
-All approaches can be valid depending on the use case
-Aiven uses a DB + queue and hosts on VMs
-Measure / Monitor potential bottlenecks
-Kubernetes is a powerful platform and operators are great to manage components
-Be careful not to overload the Kube API
-Commit to one approach and use it in the entire system

THAT'S IT, FOLKS!