How a Failed Experiment Helped Me Understand the Go Runtime in More Depth
ScyllaDB
About This Presentation
In 2022, I began crafting a tool to visualize Go's GC in real-time. I'll dive into the hurdles of extracting data from Go's runtime, the paths I explored—from internal packages to a Go fork—and the route I took. I'll share my insights and look into other exciting options. #golang #gc
Slide Content
A ScyllaDB Community
How a failed experiment helped me understand the Go runtime in more depth
Aadhav Vignesh
Software Engineer
Aadhav Vignesh (he/him)
Software Engineer
■Love everything about Go! ~1.5 years of experience
with the language
■Interested in distributed systems, databases and
performance
■I work on random experiments for curiosity and fun!
■Love watching sports, and playing video games
Agenda
■The experiment
■Getting information out of the Go runtime
■What worked, what didn’t
■Ongoing work
■A bit about Go runtime observability
■Understanding the existing ecosystem (pprof, etc.)
■Going beyond pprof
The experiment
The experiment
■Wanted to understand how the GC works as a Go newbie
■Found an animation of the Go GC’s tri-color abstraction, but wanted to try
something out with real data
■Goal: build a tiny real-time GC viewer for Go (for fun)
■Started exploring the Go runtime to get more ideas
■Worked on a small prototype
GC theory
A high-level overview of the Go GC
■Go employs a concurrent, tri-color, mark-and-sweep GC
●Two actors: mutators and collectors
●Mutators are responsible for allocations/mutations in the heap; collectors work on finding and
freeing garbage
●Two phases: mark and sweep
●Mark phase identifies accessible objects and marks them as active/in-use; sweep phase finds
unmarked objects and determines them as garbage
A high-level overview of the Go GC (..contd)
■What’s “tri-color”?
●The GC separates objects into three different colored sets: white, grey, black
●Black = object has been scanned completely and its descendants have been identified by the collector
●Grey = discovered by the collector, but its descendants haven’t been scanned yet
●White = not yet discovered by the collector; may or may not be reachable from the roots
Mark and sweep phases w/ tri-color abstraction
Getting information out of the Go runtime
Getting information out of the Go runtime
■What information do I want?
●At a high-level: “objects”
■Anything which would help me in getting an object graph representation
■Where to get this information from?
●Option 1: runtime package?
●Option 2: Through existing Go tooling
●Option 3: ??
Option 1: The runtime package
■The runtime package is a great place to get information about the Go
runtime
■One candidate worth checking out is MemStats, a structure which keeps
track of statistics related to heap allocs, GC times, etc.
■MemStats is neat, but it provides memory allocator statistics as (mostly)
numeric data
●Could be useful in most tools, but not suitable for this experiment
●Spoiler: The MemStats documentation points in the right direction, but I only realized it later
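For illustration, a minimal sketch of reading those numeric statistics via runtime.ReadMemStats (the fields printed are illustrative):

    // Reading allocator/GC statistics from runtime.MemStats.
    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        var m runtime.MemStats
        runtime.ReadMemStats(&m) // briefly stops the world to get a consistent snapshot
        fmt.Printf("heap alloc: %d bytes, GC cycles: %d, total pause: %d ns\n",
            m.HeapAlloc, m.NumGC, m.PauseTotalNs)
    }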
Option 2: Existing Go tooling
■viewcore is a command-line tool for exploring core dumps of a Go process
provided under golang.org/x/debug/cmd/viewcore
■However, viewcore has a few issues with parsing heap dumps and panics in a few places (it hasn’t been updated for the Go 1.22 heap layout)
■During my experimentation, viewcore broke in a lot of places
■The Go team is working on fixing these issues; Proposal: #57447
Option 3: Heap dumps
■Go provides a way to get a heap dump using debug.WriteHeapDump
■The heap dump is a sequence of records; Go writes 17 types of records
(object, itab, etc.) to the heap dump
■A heap dump is useful for extracting “objects”
■Possibly the only correct option for this project; a heap dump provides a view
into the program’s memory at a point in time
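A minimal sketch of capturing a heap dump (the file name is illustrative):

    // Writing a heap dump to a file with debug.WriteHeapDump.
    package main

    import (
        "os"
        "runtime/debug"
    )

    func main() {
        f, err := os.Create("heapdump.bin") // illustrative file name
        if err != nil {
            panic(err)
        }
        defer f.Close()
        debug.WriteHeapDump(f.Fd()) // stops the world and writes the dump to the file descriptor
    }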
Other options
■Forking Go
●Tempting, but the runtime moves at a high velocity
●Lots of tweaks would be needed to extract information neatly
●I also wanted the visualization to work on standard Go installations
Implementation
Implementation
■Read heap dumps generated by debug.WriteHeapDump
■Parse the heap dump based on the documented format
●Wrote a small library for reading Go heap dumps: https://github.com/burntcarrot/heaputil
■Generate an object graph by utilizing records parsed from the heap dump
■In-progress: Color nodes in the object graph and animate using multiple heap
dumps captured at intervals
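A rough sketch of the capture side of that last step: taking several dumps at an interval so they can be parsed and diffed later (the count, interval and file names are illustrative):

    // Capturing a sequence of heap dumps for later diffing/animation.
    package main

    import (
        "fmt"
        "os"
        "runtime/debug"
        "time"
    )

    func main() {
        for i := 0; i < 5; i++ {
            f, err := os.Create(fmt.Sprintf("heapdump-%d.bin", i)) // illustrative naming
            if err != nil {
                panic(err)
            }
            debug.WriteHeapDump(f.Fd()) // each dump briefly stops the world
            f.Close()
            time.Sleep(2 * time.Second) // illustrative interval
        }
    }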
Implementation
Implementation
What’s left to do?
What’s left?
■Try out gocore (golang.org/x/debug/internal/gocore) once stabilized
■Use delve as a means to retrieve type information
Ongoing work
(outside of my efforts)
Ongoing work
■https://github.com/golang/go/issues/57447
●Proposal for making x/debug/internal/gocore and x/debug/internal/core work with Go at tip.
●There has been some progress made by the Go team
■https://github.com/cloudwego/goref
●Heap analysis tool from ByteDance, uses Delve as a medium to retrieve type info
■https://github.com/adamroach/heapspurs
●Provides a small toolkit to work with Go heap dumps
I failed..
I failed, but..
I failed, but..
I wanted to learn more about Go runtime observability!
Go runtime observability
The components
■The Go runtime can be broken down into these “components”:
●GMP
■G = goroutines
■M = threads
■P = processors
●Scheduler
●Garbage Collector
●netpoller
●Concurrency primitives (channels, etc.)
●etc.
What’s available right now
■pprof
●CPU
●Memory
●Goroutine
●Block
●Mutex
■Execution tracing
■GC traces
■and much more..
pprof
CPU
■Helps in identifying which parts consume a lot of CPU time.
■Can be enabled through:
●pprof.StartCPUProfile(w)
●import _ "net/http/pprof" (GET /debug/pprof/profile?seconds=60)
■Observable parts: All Go code (spending time on CPU)
■Non-observable parts: I/O
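A minimal sketch of both ways of enabling the CPU profile (the address and file name are illustrative):

    // CPU profiling: programmatically, or via the net/http/pprof endpoints.
    package main

    import (
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
        "os"
        "runtime/pprof"
    )

    func main() {
        f, err := os.Create("cpu.pprof") // illustrative file name
        if err != nil {
            panic(err)
        }
        defer f.Close()

        pprof.StartCPUProfile(f) // samples on-CPU stacks until stopped
        defer pprof.StopCPUProfile()

        // Alternative: curl 'http://localhost:6060/debug/pprof/profile?seconds=60'
        go http.ListenAndServe("localhost:6060", nil)

        // ... application work ...
    }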
Memory
■Helps in identifying which parts perform more heap allocations.
■Can be enabled through:
●pprof.WriteHeapProfile(w) (or pprof.Lookup("heap").WriteTo(w, 0))
●import _ "net/http/pprof" (GET /debug/pprof/allocs?seconds=60)
■Observable parts: Reachable objects
■Non-observable parts: CGO allocs, Object lifetime, unreachable objects
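A minimal sketch of the programmatic route (runtime/pprof exposes the heap profile as a snapshot via WriteHeapProfile/Lookup rather than a Start/Stop pair; the file name is illustrative):

    // Writing a heap profile of sampled allocations.
    package main

    import (
        "os"
        "runtime"
        "runtime/pprof"
    )

    func main() {
        // ... application work that allocates ...

        f, err := os.Create("heap.pprof") // illustrative file name
        if err != nil {
            panic(err)
        }
        defer f.Close()

        runtime.GC()              // get up-to-date statistics before snapshotting
        pprof.WriteHeapProfile(f) // equivalent to pprof.Lookup("heap").WriteTo(f, 0)
    }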
Goroutines
■Used for getting information about active goroutines
■Can be enabled through:
●pprof.Lookup("goroutine").WriteTo(w, 0)
●import _ "net/http/pprof" (GET /debug/pprof/goroutine)
■Observable parts: All goroutines present in the internal allg struct
■Non-observable parts: Short-lived goroutines (~1μs)
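A minimal sketch of dumping the goroutine profile to stdout:

    // Dumping all goroutine stacks from the "goroutine" profile.
    package main

    import (
        "os"
        "runtime/pprof"
    )

    func main() {
        // debug=0 emits the binary pprof format, debug=1 a readable summary,
        // debug=2 full panic-style stack traces.
        pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)
    }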
Block profile
■The block profiler helps in identifying time spent by goroutines waiting for
channel operations, mutex operations, etc.
■Can be enabled through:
●pprof.Lookup("block").WriteTo(w, 0)
■Need to set runtime.SetBlockProfileRate(rate) before profiling
●import _ "net/http/pprof" (GET /debug/pprof/block?seconds=60)
■Observable parts: Synchronization primitives (channels, mutexes, etc.)
■Non-observable parts: in-progress blocking events, I/O, sleep, etc.
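A minimal sketch with a deliberately blocking channel receive (a rate of 1 records every blocking event and is only for illustration):

    // Enabling and dumping the block profile.
    package main

    import (
        "os"
        "runtime"
        "runtime/pprof"
        "time"
    )

    func main() {
        runtime.SetBlockProfileRate(1) // 1 = record every blocking event (high overhead)

        ch := make(chan int)
        go func() { time.Sleep(100 * time.Millisecond); ch <- 1 }()
        <-ch // this blocking receive shows up in the profile

        pprof.Lookup("block").WriteTo(os.Stdout, 1)
    }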
Mutex profile
■Useful for identifying time spent on lock contentions
■Can be enabled through:
●pprof.Lookup("mutex").WriteTo(w, 0)
■Need to set runtime.SetMutexProfileFraction(rate) before
profiling
●import _ "net/http/pprof" (GET /debug/pprof/mutex)
■Observable parts: Mutexes (sync.Mutex/sync.RWMutex), etc.
■Non-observable parts: Spinning (locks), sync.WaitGroup, sync.Cond, etc.
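A minimal sketch with a deliberately contended mutex (a fraction of 1 reports every contention event and is only for illustration):

    // Enabling and dumping the mutex (lock contention) profile.
    package main

    import (
        "os"
        "runtime"
        "runtime/pprof"
        "sync"
        "time"
    )

    func main() {
        runtime.SetMutexProfileFraction(1) // report every contention event

        var mu sync.Mutex
        mu.Lock()
        go func() { time.Sleep(50 * time.Millisecond); mu.Unlock() }()
        mu.Lock() // contended: the wait is attributed to the goroutine that unlocks
        mu.Unlock()

        pprof.Lookup("mutex").WriteTo(os.Stdout, 1)
    }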
Going beyond pprof
But why?
■pprof should be able to provide a sufficient/clear picture of what’s happening
in your application
■In case you find yourself stuck in a situation where pprof can’t help, then the
next slides might help
■The last option would be to fork the Go source and add manual debug logs or
print statements; but it’s unlikely that you’d reach this point
Execution tracer
■Helpful for investigating the gaps (“voids”) observed in normal profiling
■Captures a ton of information including:
●Goroutine events (creation, blocking, etc.)
●GC events (STW, mark assist, etc.)
●Syscalls
●Etc.
■Can be enabled through: trace.Start(w); defer trace.Stop()
■Reduced overhead (#60773): 20-30% prior to Go 1.21, 1-2% from Go 1.21 onwards
■Alternative frontend: https://github.com/dominikh/gotraceui
■#60773; kudos to the Go team and contributors!
https://go.dev/blog/execution-traces-2024
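A minimal sketch of collecting a trace to a file (the file name is illustrative); the output can be inspected with go tool trace:

    // Collecting an execution trace, viewable with `go tool trace trace.out`.
    package main

    import (
        "os"
        "runtime/trace"
    )

    func main() {
        f, err := os.Create("trace.out") // illustrative file name
        if err != nil {
            panic(err)
        }
        defer f.Close()

        trace.Start(f) // starts emitting goroutine, GC and syscall events
        defer trace.Stop()

        // ... application work to be traced ...
    }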
Trace view (with flow events enabled)
Profiles provided by the execution tracer
Goroutine analysis
Goroutine analysis (summary)
Goroutine analysis (breakdown view)
GC traces
■Helpful for getting granular information about GC cycles
■Slightly less overhead than the execution tracer
■Can be enabled through: GODEBUG=gctrace=1
■You can also get information about the GC pacer:
GODEBUG=gctrace=1,gcpacertrace=1
GC trace
GC trace breakdown (continued over several slides)
Scheduler traces
■Helpful for getting granular data about the scheduler like:
●Processor information (idle Ps, etc.)
●Thread information (idle threads, spinning threads, etc.)
●Global and local runqueue for Gs (goroutines)
●etc.
■Slightly less overhead than the execution tracer
■Can be enabled through: GODEBUG=schedtrace=1000
●Emits information every 1000 milliseconds; usage: GODEBUG=schedtrace=X, where X is the interval in milliseconds
Scheduler trace
Scheduler trace (breakdown, continued over several slides)
fgprof
■Sampling Go profiler which captures information about both on-CPU and
off-CPU time.
■CPU profiling provided by pprof can’t observe off-CPU time
■Useful for observing off-CPU time (I/O, etc.)
■Limitations:
●fgprof is implemented as a background goroutine, so it relies on the Go scheduler, and any
scheduler delays might result in slightly inaccurate data
●Scales O(N) with the number of goroutines, so it is slightly less efficient in goroutine-heavy applications
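A minimal sketch of exposing fgprof next to the standard pprof endpoints, assuming the github.com/felixge/fgprof package and its Handler() helper (the address is illustrative):

    // Exposing fgprof's wall-clock (on- and off-CPU) profile over HTTP.
    package main

    import (
        "net/http"
        _ "net/http/pprof" // standard /debug/pprof/* endpoints (on-CPU only)

        "github.com/felixge/fgprof"
    )

    func main() {
        // fgprof samples all goroutine stacks, so off-CPU time shows up too.
        http.DefaultServeMux.Handle("/debug/fgprof", fgprof.Handler())

        // e.g. curl 'http://localhost:6060/debug/fgprof?seconds=10' > fgprof.pprof
        http.ListenAndServe("localhost:6060", nil)
    }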
Even further..
runtime/metrics
■Source for programmatically getting metrics exported by the Go runtime
■88 metrics exposed (Go 1.23.2), 56 (excluding /godebug metrics)
■Open proposal to define a recommended set of metrics: #67120
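A minimal sketch of reading a couple of metrics by name (the metric names are taken from the runtime/metrics documentation):

    // Sampling runtime metrics with runtime/metrics.
    package main

    import (
        "fmt"
        "runtime/metrics"
    )

    func main() {
        samples := []metrics.Sample{
            {Name: "/sched/goroutines:goroutines"},
            {Name: "/memory/classes/heap/objects:bytes"},
        }
        metrics.Read(samples)

        for _, s := range samples {
            switch s.Value.Kind() {
            case metrics.KindUint64:
                fmt.Printf("%s = %d\n", s.Name, s.Value.Uint64())
            case metrics.KindFloat64:
                fmt.Printf("%s = %f\n", s.Name, s.Value.Float64())
            }
        }
    }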
Metrics exposed by runtime/metrics (as of Go 1.23.2)
Future opportunities
Future opportunities
■eBPF is pretty promising, and tooling built on it could enable deeper Go runtime observability
■The gocore and core packages, once made public, could be really useful too.
Conclusion
Conclusion
■Ending up in a rabbit hole is a great way to learn a lot of things!
■Go has a great set of tooling, and provides some really nice ways to
understand more about the Go runtime
■The contributors and the Go team deserve a huge kudos for supporting these
tools!
Further reading
Further reading
■Go docs:
●runtime/pprof: https://pkg.go.dev/runtime/pprof
●runtime/metrics: https://pkg.go.dev/runtime/metrics
●runtime/trace: https://pkg.go.dev/runtime/trace
●Go environment variables: https://pkg.go.dev/runtime#hdr-Environment_Variables
■Felix Geisendörfer’s Go Profiler Notes:
https://github.com/DataDog/go-profiler-notes
Thank you!
Email: [email protected]
Twitter: @carrotburnt
Bluesky: [email protected]
Website: https://databases.systems
●I’ll post a more detailed version here!