How a Failed Experiment Helped Me Understand the Go Runtime in More Depth

ScyllaDB · 73 slides · Oct 11, 2024

About This Presentation

In 2022, I began crafting a tool to visualize Go's GC in real-time. I'll dive into the hurdles of extracting data from Go's runtime, the paths I explored—from internal packages to a Go fork—and the route I took. I'll share my insights and look into other exciting options. #golang...


Slide Content

A ScyllaDB Community
How a failed experiment helped me
understand the Go runtime in more
depth
Aadhav Vignesh
Software Engineer

Aadhav Vignesh (he/him)

Software Engineer
■Love everything about Go! ~1.5 years of experience
with the language
■Interested in distributed systems, databases and
performance
■I work on random experiments for curiosity and fun!
■Love watching sports, and playing video games

Agenda
■The experiment
■Getting information out of the Go runtime
■What worked, what didn’t
■Ongoing work
■A bit about Go runtime observability
■Understanding the existing ecosystem (pprof, etc.)
■Going beyond pprof

The experiment

The experiment
■Wanted to understand how the GC works as a Go newbie
■Found an animation of the Go GC’s tri-color abstraction, but wanted to try
something out with real data
■Goal: build a tiny real-time GC viewer for Go (for fun)
■Started exploring the Go runtime to get more ideas
■Worked on a small prototype

GC theory

A high-level overview of the Go GC
■Go employs a concurrent, tri-color, mark-and-sweep GC
●Two actors: mutators and collectors
●Mutators are responsible for allocations/mutations in the heap; collectors work on finding and
freeing garbage
●Two phases: mark and sweep
●The mark phase identifies reachable objects and marks them as live/in-use; the sweep phase finds
unmarked objects and reclaims them as garbage

A high-level overview of the Go GC (..contd)
■What’s “tri-color”?
●The GC separates objects into three different colored sets: white, grey, black
●Black = object has been scanned completely and its descendants have been identified by the
collector
●Grey = discovered by the collector (roots start here), but descendants not identified yet
●White = hasn’t been discovered by the collector yet; may or may not be reachable from the roots
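The color transitions above can be sketched in a few lines of Go. This is a toy, sequential simulation over a hypothetical `obj` type of my own; the real GC marks concurrently with the mutators and uses write barriers, which this deliberately ignores:

```go
package main

import "fmt"

// Colors used by the tri-color abstraction.
type color int

const (
	white color = iota // not yet discovered; candidate garbage
	grey               // discovered, but children not scanned yet
	black              // object and all of its children discovered
)

// obj is a toy heap object holding pointers to other objects.
type obj struct {
	c        color
	children []*obj
}

// mark runs a simplified, sequential mark phase from the roots.
func mark(roots []*obj) {
	var worklist []*obj // the grey set
	for _, r := range roots {
		r.c = grey
		worklist = append(worklist, r)
	}
	for len(worklist) > 0 {
		o := worklist[len(worklist)-1]
		worklist = worklist[:len(worklist)-1]
		for _, ch := range o.children {
			if ch.c == white { // discover children: white -> grey
				ch.c = grey
				worklist = append(worklist, ch)
			}
		}
		o.c = black // all children discovered: grey -> black
	}
}

func main() {
	a, b, garbage := &obj{}, &obj{}, &obj{}
	a.children = []*obj{b} // a -> b reachable; garbage is not
	mark([]*obj{a})
	fmt.Println(a.c == black, b.c == black, garbage.c == white) // true true true
}
```

After marking, everything still white (here, `garbage`) is unreachable and would be reclaimed by the sweep phase.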

Mark and sweep phases w/ tri-color abstraction

Getting information out of the
Go runtime

Getting information out of the Go runtime
■What information do I want?
●At a high-level: “objects”
■Anything which would help me in getting an object graph representation
■Where to get this information from?
●Option 1: runtime package?
●Option 2: Through existing Go tooling
●Option 3: ??

Option 1: The runtime package
■The runtime package is a great place to get information about the Go
runtime
■One candidate worth checking out is MemStats, a structure which keeps
track of statistics related to heap allocs, GC times, etc.
■MemStats is neat, but it provides memory allocator statistics as (mostly)
numeric data
●Could be useful in most tools, but not suitable for this experiment
●Spoiler: The MemStats documentation points in the right direction, but I realized it later

Option 2: Existing Go tooling
■viewcore is a command-line tool for exploring core dumps of a Go process
provided under golang.org/x/debug/cmd/viewcore
■However, viewcore has issues parsing heap dumps and panics in a few places
(it hasn’t been updated for the Go 1.22 heap layout)
■During my experimentation, viewcore broke in a lot of places
■The Go team is working on fixing these issues; Proposal: #57447

Option 3: Heap dumps
■Go provides a way to get a heap dump using debug.WriteHeapDump
■The heap dump is a sequence of records; Go writes 17 types of records
(object, itab, etc.) to the heap dump
■A heap dump is useful for extracting “objects”
■Possibly the only correct option for this project; a heap dump provides a view
into the program’s memory at a point in time

Other options
■Forking Go
●Tempting, but the runtime moves at a high velocity
●Lots of tweaks need to be made to extract information out neatly
●I wanted to have the visualization happen on normal Go installations too

Implementation

Implementation
■Read heap dumps generated by debug.WriteHeapDump
■Parse the heap dump based on the documented format
●Wrote a small library for reading Go heap dumps: https://github.com/burntcarrot/heaputil
■Generate an object graph by utilizing records parsed from the heap dump
■In-progress: Color nodes in the object graph and animate using multiple heap
dumps captured at intervals

What’s left to do?

What’s left?
■Try out gocore (golang.org/x/debug/internal/gocore) once stabilized
■Use delve as a means to retrieve type information

Ongoing work
(outside of my efforts)

Ongoing work
■https://github.com/golang/go/issues/57447
●Proposal for making x/debug/internal/gocore and x/debug/internal/core work
with Go at tip.
●There has been some progress made by the Go team
■https://github.com/cloudwego/goref
●Heap analysis tool from ByteDance, uses Delve as a medium to retrieve type info
■https://github.com/adamroach/heapspurs
●Provides a small toolkit to work with Go heap dumps

I failed..

I failed, but..
I wanted to learn more about Go runtime observability!

Go runtime observability

The components
■The Go runtime can be broken down into these “components”:
●GMP
■G = goroutines
■M = threads
■P = processors
●Scheduler
●Garbage Collector
●netpoller
●Concurrency primitives (channels, etc.)
●etc.

What’s available right now
■pprof
●CPU
●Memory
●Goroutine
●Block
●Mutex
■Execution tracing
■GC traces
■and much more..

pprof

CPU
■Helps in identifying which parts consume a lot of CPU time.
■Can be enabled through:
●pprof.StartCPUProfile(w)
●import _ "net/http/pprof" (GET /debug/pprof/profile?seconds=60)
■Observable parts: All Go code (spending time on CPU)
■Non-observable parts: I/O

Memory
■Helps in identifying which parts perform more heap allocations.
■Can be enabled through:
●pprof.WriteHeapProfile(w)
●import _ "net/http/pprof" (GET /debug/pprof/allocs?seconds=60)
■Observable parts: Reachable objects
■Non-observable parts: CGO allocs, Object lifetime, unreachable objects

Goroutines
■Used for getting information about active goroutines
■Can be enabled through:
●pprof.Lookup("goroutine").WriteTo(w, 0)
●import _ "net/http/pprof" (GET /debug/pprof/goroutine)
■Observable parts: All goroutines present in the internal allg struct
■Non-observable parts: Short-lived goroutines (~1μs)

Block profile
■The block profiler helps in identifying time spent by goroutines waiting for
channel operations, mutex operations, etc.
■Can be enabled through:
●pprof.Lookup("block").WriteTo(w, 0)
■Need to set runtime.SetBlockProfileRate(rate) before profiling
●import _ "net/http/pprof" (GET /debug/pprof/block?seconds=60)
■Observable parts: Synchronization primitives (channels, mutexes, etc.)
■Non-observable parts: in-progress blocking events, I/O, sleep, etc.

Mutex profile
■Useful for identifying time spent on lock contentions
■Can be enabled through:
●pprof.Lookup("mutex").WriteTo(w, 0)
■Need to set runtime.SetMutexProfileFraction(rate) before
profiling
●import _ "net/http/pprof" (GET /debug/pprof/mutex)
■Observable parts: Mutexes (sync.Mutex/sync.RWMutex), etc.
■Non-observable parts: Spinning (locks), sync.WaitGroup, sync.Cond, etc.

Going beyond pprof

But why?
■pprof should usually provide a sufficiently clear picture of what’s happening
in your application
■In case you find yourself stuck in a situation where pprof can’t help, then the
next slides might help
■The last option would be to fork the Go source and add manual debug logs or
print statements; but it’s unlikely that you’d reach this point

Execution tracer
■Helpful for filling in the “voids” that normal profiling can’t observe
■Captures a ton of information including:
●Goroutine events (creation, blocking, etc.)
●GC events (STW, mark assist, etc.)
●Syscalls
●Etc.
■Can be enabled through: trace.Start(w); defer trace.Stop()
■Reduced overhead (#60773): 20-30% before Go 1.21, 1-2% from Go 1.21 onward
■Alternative frontend: https://github.com/dominikh/gotraceui

■#60773; kudos to the Go team and contributors!

https://go.dev/blog/execution-traces-2024

Trace view (with flow events enabled)

Profiles provided by the execution tracer

Goroutine analysis

Goroutine analysis (summary)

Goroutine analysis (breakdown view)

GC traces
■Helpful for getting granular information about GC cycles
■Slightly less overhead than the execution tracer
■Can be enabled through: GODEBUG=gctrace=1
■You can also get information about the GC pacer:
GODEBUG=gctrace=1,gcpacertrace=1

GC trace

GC trace breakdown


Scheduler traces
■Helpful for getting granular data about the scheduler like:
●Processor information (idle Ps, etc.)
●Thread information (idle threads, spinning threads, etc.)
●Global and local runqueue for Gs (goroutines)
●etc.
■Slightly less overhead than the execution tracer
■Can be enabled through: GODEBUG=schedtrace=1000
●Emits information every 1000 milliseconds; usage: GODEBUG=schedtrace=X, where X is the
interval in milliseconds

Scheduler trace

Scheduler trace (breakdown)


fgprof
■Sampling Go profiler which captures information about both on-CPU and
off-CPU time.
■CPU profiling provided by pprof can’t observe off-CPU time
■Useful for observing off-CPU time (I/O, etc.)
■Limitations:
●fgprof is implemented as a background goroutine, so it relies on the Go scheduler, and any
scheduler delays might result in slightly inaccurate data
●Scales O(N) with the number of goroutines, so it’s less efficient on goroutine-heavy applications

Even further..

runtime/metrics
■Source for programmatically getting metrics exported by the Go runtime
■88 metrics exposed (Go 1.23.2), 56 (excluding /godebug metrics)
■Open proposal to define a recommended set of metrics: #67120


Metrics exposed by runtime/metrics (as of Go 1.23.2)

Future opportunities

Future opportunities
■eBPF is pretty promising, and some tooling utilizing it could aid in more Go
runtime observability
■The gocore and core packages, once made public, could be really useful too

Conclusion

Conclusion
■Ending up in a rabbit hole is a great way to learn a lot of things!
■Go has a great set of tooling, and provides some really nice ways to
understand more about the Go runtime
■The contributors and the Go team deserve a huge kudos for supporting these
tools!

Further reading

Further reading
■Go docs:
●runtime/pprof: https://pkg.go.dev/runtime/pprof
●runtime/metrics: https://pkg.go.dev/runtime/metrics
●runtime/trace: https://pkg.go.dev/runtime/trace
●Go environment variables: https://pkg.go.dev/runtime#hdr-Environment_Variables
■Felix Geisendörfer’s Go Profiler Notes:
https://github.com/DataDog/go-profiler-notes

Thank you!
Email: [email protected]
Twitter: @carrotburnt
Bluesky: [email protected]
Website: https://databases.systems
●I’ll post a more detailed version here!