Understanding the impacts of running a containerized Go application inside Kubernetes with a focus on the CPU.
Running a Go App in Kubernetes: CPU Impacts
Teiva Harsanyi, SRE at Google
Teiva Harsanyi
- SRE at Google, in the Borg ML team
- Author of 100 Go Mistakes (100go.co/book)
Introduction
- K8s is not straightforward; it's easy to get things wrong
- Unveil some Go & k8s complexity
- Discuss the impacts of running a Go app inside k8s, with a focus on CPU
Core Concepts
- Go and k8s schedulers
- GOMAXPROCS
Go Scheduling
Three key components:
- G: goroutine
- M: OS thread (machine)
- P: CPU core (processor)
Main actors:
- OS scheduler: assigns an M to a P
- Go scheduler: assigns a G to an M
Go Scheduling (diagram sequence; G: goroutine, M: OS thread, P: CPU core)
- go f() creates a new goroutine; it becomes runnable and is queued in a P's local run queue (LRQ) or in the global run queue (GRQ)
- Each P executes one G at a time on an M; a G that blocks moves to the waiting state, freeing the M for another runnable G
- Once every 61 scheduler ticks, a P checks the GRQ first, so goroutines in the global queue don't starve
- A P whose LRQ is empty steals runnable goroutines from another P's LRQ (work stealing)
- If there is still no goroutine to run, the P falls back to network polling to pick up goroutines unblocked by network I/O
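These mechanics can be observed with the runtime's built-in scheduler trace. A minimal sketch (the busy-loop workload is illustrative): running it with GODEBUG=schedtrace=1000 prints the scheduler state every second, including gomaxprocs, the GRQ length, and each P's LRQ length.

    package main

    import "sync"

    // CPU-bound workload that spawns more goroutines than there are Ps,
    // so runnable goroutines pile up in the run queues.
    // Run with: GODEBUG=schedtrace=1000 ./main
    func main() {
        var wg sync.WaitGroup
        for i := 0; i < 64; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                n := 0
                for j := 0; j < 2_000_000_000; j++ { // keep the goroutine runnable
                    n += j
                }
                _ = n
            }()
        }
        wg.Wait()
    }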
GOMAXPROCS
Variable that defines the maximum number of Ms (OS threads) that can execute user-level Go code simultaneously.

    runtime.GOMAXPROCS(8)          // Set GOMAXPROCS to 8
    n := runtime.GOMAXPROCS(0)     // Get the current value of GOMAXPROCS

Notes:
- If an M is blocked, the Go scheduler can spin up more Ms
- The garbage collector derives its CPU consumption limit from GOMAXPROCS
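A minimal, runnable illustration of the two calls above: by default GOMAXPROCS matches runtime.NumCPU(), and setting it returns the previous value.

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        fmt.Println("NumCPU:    ", runtime.NumCPU())      // logical CPUs visible to the process
        fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0)) // passing 0 only queries the value

        prev := runtime.GOMAXPROCS(8) // set to 8; the previous value is returned
        fmt.Println("was:", prev, "now:", runtime.GOMAXPROCS(0))
    }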
k8s Deployment Config

    spec:
      containers:
        - name: img
          image: img:latest
          resources:
            requests:
              cpu: "2000m"   # Guaranteed minimum amount of CPU resources
            limits:
              cpu: "4000m"   # Maximum amount of CPU resources
k8s Scheduler
k8s relies on the Linux Completely Fair Scheduler (CFS) to enforce CPU limits.
Two main parameters:
- cpu.cfs_period_us: the period length (set to 100 ms by default)
- cpu.cfs_quota_us: the amount of CPU time the app can consume during each period
Example: if resources.limits.cpu is set to 2000m (= 2 cores), the quota is 2 x 100 = 200 ms. In each 100 ms period, the app can consume up to 200 ms of CPU time (spread across cores).
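To make the quota/period arithmetic concrete, here is a sketch that reads the two CFS parameters from inside a container and derives the effective core count. It assumes cgroup v1 mounted at the usual /sys/fs/cgroup paths (cgroup v2 exposes the same pair in a single cpu.max file instead).

    package main

    import (
        "fmt"
        "os"
        "strconv"
        "strings"
    )

    // readInt reads a single integer from a cgroup file.
    func readInt(path string) (int64, error) {
        b, err := os.ReadFile(path)
        if err != nil {
            return 0, err
        }
        return strconv.ParseInt(strings.TrimSpace(string(b)), 10, 64)
    }

    func main() {
        // cgroup v1 paths; an assumption, your mount point may differ.
        quota, err1 := readInt("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
        period, err2 := readInt("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
        if err1 != nil || err2 != nil || quota <= 0 || period <= 0 {
            fmt.Println("no CFS quota set (or not running under cgroup v1)")
            return
        }
        // limits.cpu: 2000m -> quota=200000us, period=100000us -> 2.00 cores
        fmt.Printf("quota=%dus period=%dus => %.2f cores\n",
            quota, period, float64(quota)/float64(period))
    }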
Wait... What’s the default value of GOMAXPROCS?
It is equal to runtime.NumCPU(), the number of CPU cores.
Example: a 4-CPU-core machine runs an app deployed with:

    spec:
      resources:
        limits:
          cpu: 1000m

In this situation, will GOMAXPROCS be equal to 4 or to 1? It will be equal to 4: Go isn't CFS-aware (github.com/golang/go/issues/33803).
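Until the runtime becomes CFS-aware, a common workaround is Uber's automaxprocs library (github.com/uber-go/automaxprocs), which sets GOMAXPROCS from the container's CPU quota at startup. A sketch of its typical usage:

    package main

    import (
        "fmt"
        "runtime"

        // Blank import: adjusts GOMAXPROCS to the CFS quota at init time.
        _ "go.uber.org/automaxprocs"
    )

    func main() {
        // On a 4-core node with limits.cpu: 1000m, this prints 1 instead of 4.
        fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
    }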
Why? With GOMAXPROCS=4 and a 100 ms quota per 100 ms period (limits.cpu: 1000m):
- rps < 20: the work spreads across the four cores, but the total CPU time per period stays below the quota (e.g., 25 + 28 + 19 + 18 = 90 ms), so the app is never throttled ✅
- rps >= 20: each of the four cores consumes 25 ms per period; the 100 ms quota is exhausted after only 25 ms of wall-clock time, and the app is throttled for the remaining 75 ms of every period ❌
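Throttling is observable from inside the container: the cgroup's cpu.stat file counts elapsed periods and how many of them were throttled. A sketch, again assuming cgroup v1 paths:

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "strings"
    )

    func main() {
        // cgroup v1 path; an assumption, adjust for your environment.
        f, err := os.Open("/sys/fs/cgroup/cpu/cpu.stat")
        if err != nil {
            fmt.Println("cannot read cpu.stat:", err)
            return
        }
        defer f.Close()

        // Typical fields: nr_periods, nr_throttled, throttled_time (ns).
        sc := bufio.NewScanner(f)
        for sc.Scan() {
            line := sc.Text()
            if strings.HasPrefix(line, "nr_") || strings.HasPrefix(line, "throttled_") {
                fmt.Println(line)
            }
        }
    }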
Solution
Is the solution to remove the limit? Yes, in most cases.
Main drawbacks:
- May increase latency variance
- Be careful in some specific conditions; for example, if a workload's CPU and memory usage are directly correlated, watch out for OOMs
Main benefit:
- A workload can use all the idle CPU on the node
At Google?
- We do have CPU limits
- They are not set manually: the CPU limit is recalculated each time a job is rescheduled onto a different machine
- GOMAXPROCS is also set automatically, based on the CPU limit
Conclusion
- Be aware that Go isn't CFS-aware
- GOMAXPROCS should reflect the available compute parallelism
- Be careful with k8s CPU limits
- Benchmarks for the win