Running Low-Latency Workloads on Kubernetes by Jimmy Zelinskie

ScyllaDB 687 views 31 slides Oct 15, 2024

About This Presentation

Configuring Kubernetes for optimal workload performance is a continuous journey. Best practices can sometimes harm performance. Join us as we share our findings on running SpiceDB, a low-latency authorization system, and what has worked best for us. #Kubernetes #DevOps #SpiceDB

Added: Oct 15, 2024

Slide Content

A ScyllaDB Community
Running Low-Latency Workloads
on Kubernetes
Jimmy Zelinskie
Co-Founder at AuthZed

2
$ whoami
●Cofounder of AuthZed, the creators of SpiceDB
○Previously Red Hat, CoreOS
○OCI Maintainer, co-creator Operator Framework, etc...
●Contact
○email [email protected]
○github @jzelinskie
○x @jimmyzelinskie
○bsky @jimmy.zelinskie.com
○discord.gg/spicedb @jzelinskie

3
"Low latency" can mean
a lot of things

4
How low are we talking?
●Network Service with single-digit millisecond SLAs
●Not talking about "soft/hard real time"
●Garbage-collected language
●Running in the cloud

5
What is SpiceDB?
●Database purpose-built for storing and
querying authorization data
●Inspired by Zanzibar, Google's internal system
●Written in Go, Apache 2.0 licensed
●Queries
○Can $subject take $action on $resource?
○List all subjects that can take $action on $resource
○List all resources $subject can take $action on

6
Permission checks run
before any other work
can be done

7
5ms p95  1M RPS
authzed.com/blog/google-scale-authorization

8
How do I design for
"low latency"?

9
How do I write a "low latency" system?
1. Write the system as simply and understandably as possible
2. PROFILE

10
Convenient lies
●Concurrency Control
○Ergonomic concurrency abstractions are critical
○Control over concurrency limits, cancellation, prioritization of tasks
●Memory Control
○Garbage collection is OK as long as you've got some means of
controlling allocation
○Being able to tune the GC based on your workload

11
Parallelize EVERYTHING
●Kubernetes-aware & self-clustering
●Divide & Conquer
across the deployment
●Schedule tasks by priority
●Singleflight requests
●Hedge requests
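Request hedging, the last bullet above, can be sketched in a few lines of Go. This is an illustrative sketch using only the standard library, not SpiceDB's implementation. The `hedge`, `slowPrimary`, and `demoHedge` names and the 10 ms hedge delay are assumptions.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// hedge calls fn once; if no reply arrives within delay it issues a second,
// identical call. The first reply (success or error) wins and the other
// attempt is cancelled through the shared context.
func hedge(ctx context.Context, delay time.Duration,
	fn func(ctx context.Context, attempt int) (string, error)) (string, error) {

	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // stops whichever attempt is still in flight

	type result struct {
		val string
		err error
	}
	results := make(chan result, 2) // buffered so the loser never blocks

	launch := func(attempt int) {
		go func() {
			v, err := fn(ctx, attempt)
			results <- result{v, err}
		}()
	}

	launch(0)
	timer := time.NewTimer(delay)
	defer timer.Stop()

	select {
	case r := <-results:
		return r.val, r.err
	case <-timer.C:
		launch(1) // primary is slow: send the hedge
	case <-ctx.Done():
		return "", ctx.Err()
	}
	select {
	case r := <-results:
		return r.val, r.err
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// slowPrimary simulates a backend whose first attempt stalls.
func slowPrimary(ctx context.Context, attempt int) (string, error) {
	if attempt == 0 {
		select {
		case <-time.After(500 * time.Millisecond):
			return "primary", nil
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
	return "backup", nil
}

func demoHedge() string {
	v, _ := hedge(context.Background(), 10*time.Millisecond, slowPrimary)
	return v
}

func main() {
	fmt.Println("answered by:", demoHedge())
}
```

Hedging trades a little extra load for a shorter tail: the second call is only sent when the first is already slower than the chosen delay.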

12
Serve EVERYTHING from memory
●Write the best data structures
●Connection pools maxed
●Set GOMEMLIMIT
●Reuse ALL the buffers
●Simple types simplify GC
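Two of the bullets above map directly onto the Go standard library: `debug.SetMemoryLimit` is the programmatic counterpart of the `GOMEMLIMIT` environment variable, and `sync.Pool` is one common way to reuse buffers. The sketch below assumes Go 1.19 or newer. The 512 MiB figure and the `render` and `applyMemoryLimit` helpers are illustrative only.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/debug"
	"sync"
)

// bufPool hands out reusable buffers so hot paths avoid fresh allocations.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render builds a string using a pooled buffer. The "subject action
// resource" layout is a made-up format for this example.
func render(subject, action, resource string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // pooled buffers may hold old data
	defer bufPool.Put(buf) // return the buffer for reuse
	buf.WriteString(subject)
	buf.WriteByte(' ')
	buf.WriteString(action)
	buf.WriteByte(' ')
	buf.WriteString(resource)
	return buf.String() // copies the bytes, so reuse is safe
}

// applyMemoryLimit sets the runtime's soft memory limit and returns the
// limit now in effect. A negative argument to SetMemoryLimit only reads it.
func applyMemoryLimit(limitBytes int64) int64 {
	debug.SetMemoryLimit(limitBytes)
	return debug.SetMemoryLimit(-1)
}

func main() {
	// Same effect as starting the process with GOMEMLIMIT=512MiB.
	fmt.Println("limit:", applyMemoryLimit(512<<20))
	fmt.Println(render("user:alice", "view", "document:readme"))
}
```

In a container, the limit is typically set somewhat below the pod's memory limit so the garbage collector works harder before the kernel's OOM killer gets involved.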

13
got all that?
great

14
How do I run
"low latency"?

15
authzed products run
SpiceDB on Kubernetes
youtu.be/FTDMpeqj1B0

16
running SpiceDB
optimally on Kubernetes
was not easy

17
Kubernetes Defaults: are you even trying?
●Kubernetes is best effort by default
●No protection from consuming all node memory
●Kubernetes will schedule two sensitive workloads on the same node

18
Let's give the
Kubernetes scheduler
more details!

19
Taints & Tolerations
●Labels to control which pods can be
handled by which nodes
●General concept: affinity & anti-affinity
●Guarantee only 1 pod can run on 1 node
●Create a node pool specifically for our
workload
○Optimized instances and hardware for our
workload
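To make the matching rule concrete, here is a simplified Go model of how tolerations are checked against node taints. It follows the documented Kubernetes semantics but is not the scheduler's code. The types and the `workload=spicedb` taint are hypothetical.

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes Taint and Toleration objects.
type Taint struct{ Key, Value, Effect string }
type Toleration struct{ Key, Operator, Value, Effect string }

// tolerates reports whether one toleration matches one taint.
func tolerates(tol Toleration, t Taint) bool {
	if tol.Effect != "" && tol.Effect != t.Effect {
		return false // an empty effect matches every effect
	}
	if tol.Key != "" && tol.Key != t.Key {
		return false // an empty key (with Exists) matches every key
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal": // Equal is the default operator
		return tol.Value == t.Value
	}
	return false
}

// canSchedule reports whether a pod with these tolerations may be placed
// on a node with these taints.
func canSchedule(tols []Toleration, taints []Taint) bool {
	for _, t := range taints {
		if t.Effect == "PreferNoSchedule" {
			continue // soft preference, never blocks scheduling
		}
		matched := false
		for _, tol := range tols {
			if tolerates(tol, t) {
				matched = true
				break
			}
		}
		if !matched {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical taint placed on a dedicated node pool.
	pool := []Taint{{Key: "workload", Value: "spicedb", Effect: "NoSchedule"}}
	spicedb := []Toleration{{Key: "workload", Operator: "Equal", Value: "spicedb", Effect: "NoSchedule"}}

	fmt.Println(canSchedule(spicedb, pool)) // true
	fmt.Println(canSchedule(nil, pool))     // false
}
```

A taint only keeps other pods off the dedicated nodes. Steering the workload onto those nodes, and keeping replicas apart, still needs node affinity and pod anti-affinity.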

20
Requests & Limits
●Requests: resources that must be
available on a node for a pod to be
scheduled
●Limits: max resources the pod may use,
enforced at runtime (not reserved upfront)
●Guarantee the resources we need for
our pods
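How requests and limits combine determines a pod's QoS class, and only pods whose requests equal their limits land in the Guaranteed class. The Go sketch below is a simplified model of the documented rules, not kubelet code. The `Resources` and `Container` types are hypothetical, with zero meaning "not set".

```go
package main

import "fmt"

// Resources holds CPU in millicores and memory in bytes; 0 means "not set".
type Resources struct{ CPUMilli, MemBytes int64 }

// Container is a simplified stand-in for a container's resource spec.
type Container struct{ Requests, Limits Resources }

// qosClass approximates how Kubernetes assigns a pod's QoS class.
func qosClass(containers []Container) string {
	anySet, guaranteed := false, true
	for _, c := range containers {
		req := c.Requests
		// An unset request defaults to the limit when a limit is given.
		if req.CPUMilli == 0 {
			req.CPUMilli = c.Limits.CPUMilli
		}
		if req.MemBytes == 0 {
			req.MemBytes = c.Limits.MemBytes
		}
		if req != (Resources{}) {
			anySet = true
		}
		// Guaranteed needs CPU and memory limits on every container,
		// each equal to the corresponding request.
		if c.Limits.CPUMilli == 0 || c.Limits.MemBytes == 0 || req != c.Limits {
			guaranteed = false
		}
	}
	switch {
	case !anySet:
		return "BestEffort"
	case guaranteed:
		return "Guaranteed"
	default:
		return "Burstable"
	}
}

func main() {
	full := Resources{CPUMilli: 2000, MemBytes: 4 << 30}
	fmt.Println(qosClass([]Container{{}}))                              // BestEffort
	fmt.Println(qosClass([]Container{{Requests: full}}))                // Burstable
	fmt.Println(qosClass([]Container{{Requests: full, Limits: full}}))  // Guaranteed
}
```

BestEffort is the default the earlier slide warns about: with nothing set, the pod has no reserved resources and is first in line for eviction under node pressure.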

21
We're done, right?

22
Pros
●No more OOM
●Limits make perf predictable
Cons
●Limits artificially limit processing
●Limits are reactive
●Pods still suffer from context switching

23
What is
context switching?

24
Context Switching
●Downstream effects
○Saving the current context
○Restoring the next context
○CPU Cache invalidated
○Kernel scheduling overhead
○Expensive!
●Cooperative Scheduling -
triggered on an event
(system calls)
●Preemptive Scheduling -
triggered at any time

25
Kubernetes has
one last feature...

26
Static CPU Manager Policy
●Stable in Kubernetes v1.26
○k8s.io/docs/tasks/administer-cluster/cpu-management-policies
●Allows containers with integer CPU requests exclusive access to CPUs
●At least 1 CPU per node must be allocated to system/kubelet
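The eligibility rule is easy to get wrong, so here it is as a small Go sketch. It assumes the kubelet runs with `--cpu-manager-policy=static` and models only the documented condition: a container in a Guaranteed QoS pod with a whole-number CPU request. The `exclusiveCores` function is a hypothetical helper, not a Kubernetes API.

```go
package main

import "fmt"

// exclusiveCores returns how many CPUs a container would be pinned to under
// the static policy, or 0 if it would run in the shared pool instead.
// cpuRequestMilli is the CPU request in millicores (1000 = one full CPU).
func exclusiveCores(podGuaranteed bool, cpuRequestMilli int64) int64 {
	if !podGuaranteed || cpuRequestMilli < 1000 || cpuRequestMilli%1000 != 0 {
		return 0 // fractional or non-Guaranteed: shared pool
	}
	return cpuRequestMilli / 1000
}

func main() {
	fmt.Println(exclusiveCores(true, 2000))  // 2
	fmt.Println(exclusiveCores(true, 1500))  // 0
	fmt.Println(exclusiveCores(false, 2000)) // 0
}
```

A request of 1500m looks generous but yields no pinning at all, which is why latency-sensitive pods are usually sized in whole CPUs.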

27
... but we need things
to be dynamic

28
github.com/kubernetes-sigs/karpenter
●Kubernetes Node Auto-Scaler
●Originally by AWS
●Just-in-time capacity
●Alternatives
○GKE Autopilot
○AKS Cluster Autoscaler

29
summary
●Isolation is the number one thing to guarantee latency
●Isolates need awareness of their environment
●Isolation models are in our docs and product marketing

30
Shoutouts
●Brad Ison, Evan Cordell, Víctor Roldán Betancort
●medium.com/@eliran89c/2225341e9dbd
●P99 Conf for organizing

Thank you! Let’s connect.
Jimmy Zelinskie
[email protected]
discord.gg/spicedb
