Running a Go App in Kubernetes: CPU Impacts

ScyllaDB, Jul 01, 2024

About This Presentation

Understanding the impacts of running a containerized Go application inside Kubernetes with a focus on the CPU.


Slide Content

Running a Go App in Kubernetes: CPU Impacts. Teiva Harsanyi, SRE at Google.

Teiva Harsanyi, SRE at Google, SRE in the Borg ML team. Author of 100 Go Mistakes (100go.co/book).

Introduction: k8s is not straightforward; it's easy to get things wrong. Unveil some Go and k8s complexity. Discuss the impacts of running a Go app inside k8s, with a focus on CPU.

Core Concepts: the Go and k8s schedulers, GOMAXPROCS.

Go Scheduling. Three key components: G: goroutine, M: OS thread (machine), P: CPU core (processor). Main actors: the OS scheduler assigns an M to a P; the Go scheduler assigns a G to an M.

[Diagram slides: step-by-step Go scheduling walkthrough (G: goroutine, M: OS thread, P: CPU core). A goroutine created with go f() is queued as runnable in a P's local run queue (LRQ) or in the global run queue (GRQ). Each M executes a goroutine on behalf of its P; when a P has no goroutine left to run, it pulls from the GRQ, steals runnable goroutines from another P's LRQ (work stealing), and polls the network for goroutines whose I/O is ready (network polling).]

GOMAXPROCS: variable that defines the number of Ms (OS threads) that can execute user-level Go code simultaneously.

runtime.GOMAXPROCS(8)          // Set GOMAXPROCS to 8
n := runtime.GOMAXPROCS(0)     // Get the current value of GOMAXPROCS

Notes: if an M is blocked, the Go scheduler can spin up more Ms. The GC derives the limit of how much CPU it should consume from GOMAXPROCS.
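A minimal runnable sketch of this API, printing the default value (which matches runtime.NumCPU()) and then overriding it:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    // GOMAXPROCS(0) reads the current value without changing it;
    // by default it equals runtime.NumCPU().
    fmt.Println("NumCPU:      ", runtime.NumCPU())
    fmt.Println("GOMAXPROCS:  ", runtime.GOMAXPROCS(0))

    // Setting a new value returns the previous one.
    prev := runtime.GOMAXPROCS(8)
    fmt.Println("previous was:", prev)
}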

k8s Deployment Config:

spec:
  containers:
    - name: img
      image: img:latest
      resources:
        requests:
          cpu: "2000m"   # Guaranteed minimum amount of CPU resources
        limits:
          cpu: "4000m"   # Maximum amount of CPU resources

k8s Scheduler: Kubernetes enforces CPU limits through the Linux Completely Fair Scheduler (CFS). Two main parameters: cpu.cfs_period_us, the period (set to 100 ms by default), and cpu.cfs_quota_us, the amount of CPU time the app can consume during each period. Example: if resources.limits.cpu is set to 2000m (= 2 cores), the quota is 2 × 100 ms = 200 ms; during each 100 ms period, the app can consume up to 200 ms of CPU time.
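A minimal sketch of how one could inspect these values from inside a container. It assumes cgroup v1 paths; cgroup v2 exposes both numbers in a single cpu.max file instead:

package main

import (
    "fmt"
    "os"
    "strconv"
    "strings"
)

// readInt reads a single integer from a cgroup file.
func readInt(path string) (int64, error) {
    b, err := os.ReadFile(path)
    if err != nil {
        return 0, err
    }
    return strconv.ParseInt(strings.TrimSpace(string(b)), 10, 64)
}

func main() {
    // cgroup v1 paths; a quota of -1 means no limit is set.
    quota, err := readInt("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
    if err != nil {
        fmt.Println("no CPU quota found (or cgroup v2):", err)
        return
    }
    period, err := readInt("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
    if err != nil {
        fmt.Println("cannot read period:", err)
        return
    }
    // limits.cpu = quota / period, e.g. 200000 / 100000 = 2 cores.
    fmt.Printf("CFS quota: %d us per %d us period (~%.1f cores)\n",
        quota, period, float64(quota)/float64(period))
}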

Wait... what's the default value of GOMAXPROCS? It is equal to runtime.NumCPU(), the number of CPU cores. So for an app running in Kubernetes on a 4-CPU-core machine with resources.limits.cpu: 1000m, will GOMAXPROCS be equal to 4 or to 1? It will be equal to 4: Go isn't CFS-aware (github.com/golang/go/issues/33803).
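One workaround is to derive GOMAXPROCS from the CFS quota at startup. In this sketch, quotaCores is a hypothetical helper standing in for the cgroup read shown earlier:

package main

import (
    "fmt"
    "runtime"
)

// quotaCores is a hypothetical helper standing in for the cgroup read
// from the previous sketch (cpu.cfs_quota_us / cpu.cfs_period_us).
// Hard-coded here to mimic limits.cpu: 1000m.
func quotaCores() float64 { return 1.0 }

func main() {
    // Round the fractional quota down, but never below 1.
    n := int(quotaCores())
    if n < 1 {
        n = 1
    }
    runtime.GOMAXPROCS(n)
    fmt.Println("GOMAXPROCS set to:", runtime.GOMAXPROCS(0))
}

The uber-go/automaxprocs package applies the same idea automatically through a blank import (import _ "go.uber.org/automaxprocs").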

Experiment

Scenario: a load tester sends HTTP requests to a CPU-bound app (~50 ms of CPU per request) running in Kubernetes on a 4-CPU-core machine, with resources set to requests.cpu: 1000m and limits.cpu: 1000m. Let's benchmark this scenario with: GOMAXPROCS = 1, GOMAXPROCS = 4.
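A minimal sketch of such a CPU-bound handler; the busy loop, port, and response body are illustrative stand-ins for the real benchmark app:

package main

import (
    "log"
    "net/http"
    "time"
)

// burn approximates ~d of CPU-bound work with a busy loop.
// Under CFS throttling, wall-clock time can exceed CPU time,
// so this is only a rough stand-in for the ~50 ms handler.
func burn(d time.Duration) {
    deadline := time.Now().Add(d)
    for time.Now().Before(deadline) {
    }
}

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        burn(50 * time.Millisecond)
        w.Write([]byte("done\n"))
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}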

Results

Why? rps < 20 (GOMAXPROCS=4): in each 100 ms period the work is spread across the four cores, but the total stays under the quota, e.g. 28 + 53 + 18 = 99 ms used ✅, 19 + 17 + 25 + 25 = 86 ms used ✅, 25 + 28 + 19 + 18 = 90 ms used ✅. No throttling occurs.

Why? rps >= 20 (GOMAXPROCS=4): in each 100 ms period, all four cores run for 25 ms each, so the full 100 ms quota is consumed after only 25 ms of wall-clock time and the app is throttled for the remaining 75 ms ❌, period after period. A request needing ~50 ms of CPU therefore spans multiple periods and waits through the throttled gaps, which is what degrades latency.

Solution: is the solution to remove the limit? Yes, in most cases. Main drawbacks: it may increase latency variance, and be careful in some specific conditions; for example, if a workload's CPU and memory usage are directly correlated, watch out for OOMs. Main benefit: a workload can use all the idle CPU on the node.
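Expressed against the deployment config shown earlier, a sketch of this change keeps the request as a guaranteed minimum and simply drops the limit:

spec:
  containers:
    - name: img
      image: img:latest
      resources:
        requests:
          cpu: "2000m"   # still guarantees a minimum
        # no cpu limit: the pod may burst into idle CPU on the node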

At Google? We do have CPU limits, but they are not set manually: the CPU limit is recalculated each time a job is rescheduled onto a different machine, and GOMAXPROCS is also set automatically depending on the CPU limit.

Conclusion

Conclusion: be aware that Go isn't CFS-aware. GOMAXPROCS should reflect the available compute parallelism. Be careful with k8s CPU limits. Benchmarks for the win.

Teiva Harsanyi @teivah blog.teivah.io Thank you! Let’s connect.