Running Low-Latency Workloads on Kubernetes by Jimmy Zelinskie

ScyllaDB, Oct 15, 2024

About This Presentation

Configuring Kubernetes for optimal workload performance is a continuous journey. Best practices can sometimes harm performance. Join us as we share our findings on running SpiceDB, a low-latency authorization system, and what has worked best for us. #Kubernetes #DevOps #SpiceDB


Slide Content

A ScyllaDB Community
Running Low-Latency Workloads
on Kubernetes
Jimmy Zelinskie
Co-Founder at AuthZed

2
$ whoami
●Co-founder of AuthZed, the creators of SpiceDB
○Previously Red Hat, CoreOS
○OCI Maintainer, co-creator Operator Framework, etc...
●Contact
○email [email protected]
○github @jzelinskie
○x @jimmyzelinskie
○bsky @jimmy.zelinskie.com
○discord.gg/spicedb @jzelinskie

3
"Low latency" can mean
a lot of things

4
How low are we talking?
●Network Service with single-digit millisecond SLAs
●Not talking about "soft/hard real time"
●Garbage-collected language
●Running in the cloud

5
What is SpiceDB?
●Database purpose-built for storing and
querying authorization data
●Inspired by Zanzibar, Google's internal system
●Written in Go, Apache 2.0 licensed
●Queries
○Can $subject take $action on $resource?
○List all subjects that can take $action on $resource
○List all resources $subject can take $action on
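
The three query shapes above can be illustrated with a toy in-memory model. This is a sketch only: the tuple and store types and all function names are hypothetical, and it is not SpiceDB's API or data model (SpiceDB resolves permissions through a schema rather than a flat lookup).

```go
package main

import (
	"fmt"
	"sort"
)

// tuple is a hypothetical flat fact: subject may take action on resource.
type tuple struct{ subject, action, resource string }

// store is a toy in-memory set of tuples, not SpiceDB's storage model.
type store map[tuple]struct{}

// check answers: can subject take action on resource?
func (s store) check(subject, action, resource string) bool {
	_, ok := s[tuple{subject, action, resource}]
	return ok
}

// lookupSubjects lists every subject that can take action on resource.
func (s store) lookupSubjects(action, resource string) []string {
	var out []string
	for t := range s {
		if t.action == action && t.resource == resource {
			out = append(out, t.subject)
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

// lookupResources lists every resource that subject can take action on.
func (s store) lookupResources(subject, action string) []string {
	var out []string
	for t := range s {
		if t.subject == subject && t.action == action {
			out = append(out, t.resource)
		}
	}
	sort.Strings(out)
	return out
}

// sample returns a small example data set.
func sample() store {
	return store{
		{"alice", "view", "doc:1"}: {},
		{"bob", "view", "doc:1"}:   {},
		{"alice", "view", "doc:2"}: {},
	}
}

func main() {
	s := sample()
	fmt.Println(s.check("bob", "view", "doc:2"))    // false
	fmt.Println(s.lookupSubjects("view", "doc:1"))  // [alice bob]
	fmt.Println(s.lookupResources("alice", "view")) // [doc:1 doc:2]
}
```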

6
Permission checks run
before any other work
can be done

7
5ms p95  1M RPS
authzed.com/blog/google-scale-authorization

8
How do I design for
"low latency"?

9
How do I write a "low latency" system?
1.Write the system as simply and understandably as possible
2.PROFILE

10
Convenient lies
●Concurrency Control
○Ergonomic concurrency abstractions are critical
○Control over concurrency limits, cancellation, prioritization of tasks
●Memory Control
○Garbage collection is OK as long as you've got some means of
controlling allocation
○Being able to tune the GC based on your workload
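
As one illustration of control over concurrency limits and cancellation, here is a minimal Go sketch using a buffered channel as a semaphore plus a context. The runLimited and demo names are hypothetical, and task prioritization is not covered; this is not taken from SpiceDB's code.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited runs tasks with at most limit in flight and stops launching new
// ones once ctx is cancelled. It returns the peak concurrency it observed.
func runLimited(ctx context.Context, limit int, tasks []func(context.Context)) int32 {
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	var running, peak int32

	for _, task := range tasks {
		if ctx.Err() != nil {
			break // cancelled: do not launch anything else
		}
		select {
		case <-ctx.Done(): // cancelled while waiting for a free slot
		case sem <- struct{}{}:
			wg.Add(1)
			go func(task func(context.Context)) {
				defer wg.Done()
				defer func() { <-sem }() // release the slot
				n := atomic.AddInt32(&running, 1)
				defer atomic.AddInt32(&running, -1)
				for { // record the highest concurrency seen
					p := atomic.LoadInt32(&peak)
					if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
						break
					}
				}
				task(ctx)
			}(task)
		}
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

// demo runs 20 short tasks with a limit of 3, optionally with an
// already-cancelled context, and reports peak concurrency and completions.
func demo(cancelFirst bool) (peak, completed int32) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	tasks := make([]func(context.Context), 20)
	for i := range tasks {
		tasks[i] = func(context.Context) {
			time.Sleep(2 * time.Millisecond)
			atomic.AddInt32(&completed, 1)
		}
	}
	peak = runLimited(ctx, 3, tasks)
	return peak, atomic.LoadInt32(&completed)
}

func main() {
	peak, done := demo(false)
	fmt.Println("peak:", peak, "completed:", done)
}
```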

11
Parallelize EVERYTHING
●Kubernetes-aware & self-clustering
●Divide & Conquer
across the deployment
●Schedule tasks by priority
●Singleflight requests
●Hedge requests
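
Hedging a request can be sketched in Go as follows: send the request, and if no answer arrives within a short delay, send a second copy and take whichever finishes first. The hedge, slowFirst, and demo names and the delay values are assumptions for illustration. As a simplification, the first result wins even if it is an error.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// hedge calls fn once and, if no result arrives within delay, calls it a
// second time. The first result wins; the loser's context is cancelled.
func hedge(ctx context.Context, delay time.Duration,
	fn func(ctx context.Context, attempt int) (string, error)) (string, error) {

	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // cancels whichever attempt is still running

	type result struct {
		val string
		err error
	}
	results := make(chan result, 2) // buffered so the loser never blocks

	launch := func(attempt int) {
		go func() {
			v, err := fn(ctx, attempt)
			results <- result{v, err}
		}()
	}
	launch(0)

	timer := time.NewTimer(delay)
	defer timer.Stop()

	select {
	case r := <-results:
		return r.val, r.err // primary answered before the hedge delay
	case <-timer.C:
		launch(1) // primary is slow: send the hedged copy
	case <-ctx.Done():
		return "", ctx.Err()
	}

	select {
	case r := <-results:
		return r.val, r.err
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// slowFirst simulates a backend whose first attempt hits a slow replica.
func slowFirst(ctx context.Context, attempt int) (string, error) {
	if attempt == 0 {
		select {
		case <-time.After(500 * time.Millisecond):
			return "primary", nil
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
	return "hedge", nil
}

// demo hedges after 10ms against the simulated slow backend.
func demo() (string, error) {
	return hedge(context.Background(), 10*time.Millisecond, slowFirst)
}

func main() {
	fmt.Println(demo())
}
```

Hedging trades extra load for lower tail latency, so the delay is typically chosen so that only the slowest requests trigger a second attempt.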

12
Serve EVERYTHING from memory
●Write the best data structures
●Connection pools maxed
●Set GOMEMLIMIT
●Reuse ALL the buffers
●Simple types simplify GC
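
Two of these points map onto the Go standard library. The sketch below reuses buffers through sync.Pool and sets the soft memory limit programmatically with debug.SetMemoryLimit, which does the same job as the GOMEMLIMIT environment variable (Go 1.19 or later). The render and setSoftMemoryLimit helpers and the 1 GiB figure are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/debug"
	"sync"
)

// bufPool hands out reusable buffers so hot paths avoid a fresh allocation
// for every request.
var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}

// render formats a line using a pooled buffer. buf.String() copies the
// bytes, so returning the buffer to the pool afterwards is safe.
func render(subject, action, resource string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(buf)
	buf.Reset()
	buf.WriteString(subject)
	buf.WriteByte(' ')
	buf.WriteString(action)
	buf.WriteByte(' ')
	buf.WriteString(resource)
	return buf.String()
}

// setSoftMemoryLimit sets the runtime's soft memory limit in bytes and
// returns the previous limit. A negative value only reads the current limit.
func setSoftMemoryLimit(limit int64) int64 {
	return debug.SetMemoryLimit(limit)
}

func main() {
	setSoftMemoryLimit(1 << 30) // 1 GiB, an arbitrary example value
	fmt.Println(render("alice", "view", "doc:1"))
	fmt.Println("limit:", setSoftMemoryLimit(-1))
}
```

In a container, the limit is usually set somewhat below the pod's memory limit so the garbage collector works harder before the kernel's OOM killer steps in.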

13
got all that?
great

14
How do I run
"low latency"?

15
authzed products run
SpiceDB on Kubernetes
youtu.be/FTDMpeqj1B0

16
running SpiceDB
optimally on Kubernetes
was not easy

17
Kubernetes Defaults: are you even trying?
●Kubernetes is best effort by default
●No protection from consuming all node memory
●Kubernetes will schedule two sensitive workloads on the same node

18
Let's give the
Kubernetes scheduler
more details!

19
Taints & Tolerations
●Labels to control which pods can be
handled by which nodes
●General concept: affinity & anti-affinity
●Guarantee only 1 pod can run on 1 node
●Create a node pool specifically for our
workload
○Optimized instances and hardware for our
workload
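
How taints and tolerations interact can be modeled in a few lines of Go. This is a simplified sketch of the matching rules, not the Kubernetes scheduler's code; the types and the workload=spicedb taint are hypothetical.

```go
package main

import "fmt"

// taint marks a node; pods that do not tolerate it are kept off the node.
type taint struct{ key, value, effect string }

// toleration is declared on a pod. operator is "Equal" (default) or "Exists".
type toleration struct{ key, operator, value, effect string }

// tolerates reports whether a single toleration matches a single taint.
func tolerates(tol toleration, t taint) bool {
	if tol.effect != "" && tol.effect != t.effect {
		return false // an empty effect matches every effect
	}
	if tol.key == "" {
		return tol.operator == "Exists" // empty key + Exists matches all taints
	}
	if tol.key != t.key {
		return false
	}
	switch tol.operator {
	case "Exists":
		return true
	case "Equal", "":
		return tol.value == t.value
	}
	return false
}

// canSchedule reports whether a pod with the given tolerations may be placed
// on a node with the given taints.
func canSchedule(taints []taint, tols []toleration) bool {
	for _, t := range taints {
		if t.effect == "PreferNoSchedule" {
			continue // soft preference, never blocks scheduling
		}
		tolerated := false
		for _, tol := range tols {
			if tolerates(tol, t) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

func main() {
	pool := []taint{{"workload", "spicedb", "NoSchedule"}}
	ours := []toleration{{"workload", "Equal", "spicedb", "NoSchedule"}}
	fmt.Println(canSchedule(pool, ours)) // true
	fmt.Println(canSchedule(pool, nil))  // false
}
```

A taint only keeps other pods off the dedicated pool. Steering the workload onto that pool additionally needs a node selector or node affinity.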

20
Requests & Limits
●Requests: resources that must be
available on a node for a pod to be
scheduled
●Limits: max resources a pod may use,
enforced at runtime (reserved upfront only
when requests are set equal to limits)
●Guarantee the resources we need for
our pods
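
How requests and limits determine a pod's quality-of-service class can be sketched in Go. The rules follow the Kubernetes documentation for the Guaranteed, Burstable, and BestEffort classes, but the types here are hypothetical and the model ignores init containers and other details.

```go
package main

import "fmt"

// resources holds CPU in millicores and memory in bytes; 0 means unset.
type resources struct{ cpuMilli, memBytes int64 }

// container carries the requests and limits of one container in a pod.
type container struct{ requests, limits resources }

// qosClass returns the QoS class of a pod made of the given containers.
func qosClass(containers []container) string {
	anySet := false
	guaranteed := true
	for _, c := range containers {
		// An unset request defaults to the limit when a limit is set.
		req := c.requests
		if req.cpuMilli == 0 {
			req.cpuMilli = c.limits.cpuMilli
		}
		if req.memBytes == 0 {
			req.memBytes = c.limits.memBytes
		}
		if req.cpuMilli != 0 || req.memBytes != 0 {
			anySet = true
		}
		// Guaranteed needs both limits set and equal to the requests.
		if c.limits.cpuMilli == 0 || c.limits.memBytes == 0 || req != c.limits {
			guaranteed = false
		}
	}
	switch {
	case !anySet:
		return "BestEffort"
	case guaranteed:
		return "Guaranteed"
	default:
		return "Burstable"
	}
}

func main() {
	full := resources{cpuMilli: 4000, memBytes: 8 << 30}
	fmt.Println(qosClass([]container{{}}))                                       // BestEffort
	fmt.Println(qosClass([]container{{requests: resources{cpuMilli: 500}}}))     // Burstable
	fmt.Println(qosClass([]container{{requests: full, limits: full}}))           // Guaranteed
}
```

Guaranteed pods are the last to be evicted under node memory pressure, which is why setting requests equal to limits is a common starting point for latency-sensitive workloads.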

21
We're done, right?

22
Pros
●No more OOM
●Limits make perf predictable
Cons
●Limits artificially limit processing
●Limits are reactive
●Pods still suffer from context switching

23
What is
context switching?

24
Context Switching
●Downstream effects
○Saving the current context
○Restoring the next context
○CPU Cache invalidated
○Kernel scheduling overhead
○Expensive!
●Cooperative Scheduling -
triggered on an event
(system calls)
●Preemptive Scheduling -
triggered at any time
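
On Linux, one way to see how often a process is switched out is to read the voluntary_ctxt_switches and nonvoluntary_ctxt_switches counters in /proc/<pid>/status. Voluntary switches roughly correspond to the process blocking or yielding, nonvoluntary ones to the kernel preempting it. The Go sketch below parses those two fields; the parseCtxtSwitches helper is hypothetical.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseCtxtSwitches extracts the voluntary and nonvoluntary context switch
// counters from the text of a Linux /proc/<pid>/status file.
func parseCtxtSwitches(status string) (voluntary, involuntary int64, err error) {
	seen := 0
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		var dst *int64
		switch key {
		case "voluntary_ctxt_switches":
			dst = &voluntary
		case "nonvoluntary_ctxt_switches":
			dst = &involuntary
		default:
			continue
		}
		if *dst, err = strconv.ParseInt(strings.TrimSpace(val), 10, 64); err != nil {
			return 0, 0, err
		}
		seen++
	}
	if seen != 2 {
		return 0, 0, errors.New("context switch counters not found")
	}
	return voluntary, involuntary, nil
}

func main() {
	data, err := os.ReadFile("/proc/self/status") // Linux only
	if err != nil {
		fmt.Println("no /proc status on this platform:", err)
		return
	}
	v, iv, err := parseCtxtSwitches(string(data))
	fmt.Println("voluntary:", v, "nonvoluntary:", iv, "err:", err)
}
```

A nonvoluntary count that climbs quickly under load is a hint that the process is competing for CPU time with other work on the same cores.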

25
Kubernetes has
one last feature...

26
Static CPU Manager Policy
●Stable in Kubernetes v1.26
○k8s.io/docs/tasks/administer-cluster/cpu-management-policies
●Allows containers with integer CPU requests exclusive access to CPUs
●At least 1 CPU per node must be allocated to system/kubelet
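
The eligibility rule for exclusive CPUs can be expressed as a small Go check. This is a sketch under assumptions: per the Kubernetes documentation, exclusivity applies to containers in Guaranteed pods whose CPU request is a whole number of cores, and the reservation rule below follows the slide's "at least 1 CPU" wording. Both function names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// exclusiveCPUs returns how many CPUs a container would get exclusively
// under the static policy, or 0 if it stays in the shared pool.
// cpuRequestMilli is the CPU request in millicores (1000 = one core).
func exclusiveCPUs(guaranteedPod bool, cpuRequestMilli int64) int64 {
	if !guaranteedPod {
		return 0 // only Guaranteed pods qualify
	}
	if cpuRequestMilli < 1000 || cpuRequestMilli%1000 != 0 {
		return 0 // fractional requests such as 1500m stay shared
	}
	return cpuRequestMilli / 1000
}

// allocatableExclusive returns how many CPUs on a node remain available for
// exclusive assignment after reserving some for the system and kubelet.
func allocatableExclusive(nodeCPUs, reservedCPUs int64) (int64, error) {
	if reservedCPUs < 1 {
		return 0, errors.New("static policy needs at least 1 reserved CPU")
	}
	if reservedCPUs > nodeCPUs {
		return 0, errors.New("reservation exceeds node capacity")
	}
	return nodeCPUs - reservedCPUs, nil
}

func main() {
	fmt.Println(exclusiveCPUs(true, 4000))  // 4
	fmt.Println(exclusiveCPUs(true, 1500))  // 0
	fmt.Println(exclusiveCPUs(false, 4000)) // 0
	fmt.Println(allocatableExclusive(16, 1))
}
```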

27
... but we need things
to be dynamic

28
github.com/kubernetes-sigs/karpenter
●Kubernetes Node Auto-Scaler
●Originally by AWS
●Just-in-time capacity
●Alternatives
○GKE Autopilot
○AKS Cluster Autoscaler

29
summary
●Isolation is the number one thing to guarantee latency
●Isolates need awareness of their environment
●Isolation models are in our docs and product marketing

30
Shoutouts
●Brad Ison, Evan Cordell, Víctor Roldán Betancort
●medium.com/@eliran89c/2225341e9dbd
●P99 Conf for organizing

Thank you! Let’s connect.
Jimmy Zelinskie
[email protected]
discord.gg/spicedb