How a Failed Experiment Helped Me Understand the Go Runtime in More Depth
ScyllaDB
About This Presentation
In 2022, I began crafting a tool to visualize Go's GC in real-time. I'll dive into the hurdles of extracting data from Go's runtime, the paths I explored—from internal packages to a Go fork—and the route I took. I'll share my insights and look into other exciting options. #golang #gc
Slide Content
A ScyllaDB Community
How a failed experiment helped me understand the Go runtime in more depth
Aadhav Vignesh
Software Engineer
Aadhav Vignesh (he/him)
Software Engineer
■Love everything about Go! ~1.5 years of experience
with the language
■Interested in distributed systems, databases and
performance
■I work on random experiments for curiosity and fun!
■Love watching sports, and playing video games
Agenda
■The experiment
■Getting information out of the Go runtime
■What worked, what didn’t
■Ongoing work
■A bit about Go runtime observability
■Understanding the existing ecosystem (pprof, etc.)
■Going beyond pprof
The experiment
The experiment
■Wanted to understand how the GC works as a Go newbie
■Found an animation of the Go GC’s tri-color abstraction, but wanted to try
something out with real data
■Goal: build a tiny real-time GC viewer for Go (for fun)
■Started exploring the Go runtime to get more ideas
■Worked on a small prototype
GC theory
A high-level overview of the Go GC
■Go employs a concurrent, tri-color, mark-and-sweep GC
●Two actors: mutators and collectors
●Mutators are responsible for allocations/mutations in the heap; collectors work on finding and
freeing garbage
●Two phases: mark and sweep
●Mark phase identifies accessible objects and marks them as active/in-use; sweep phase finds
unmarked objects and determines them as garbage
A high-level overview of the Go GC (..contd)
■What’s “tri-color”?
●The GC separates objects into three different colored sets: white, grey, black
●Black = object has been scanned completely and its descendants have been identified by the collector
●Grey = discovered by the collector, but its descendants haven’t been scanned yet
●White = not yet discovered by the collector; may or may not be reachable from the roots
Mark and sweep phases w/ tri-color abstraction
Getting information out of the Go runtime
Getting information out of the Go runtime
■What information do I want?
●At a high-level: “objects”
■Anything which would help me in getting an object graph representation
■Where to get this information from?
●Option 1: runtime package?
●Option 2: Through existing Go tooling
●Option 3: ??
Option 1: The runtime package
■The runtime package is a great place to get information about the Go
runtime
■One candidate worth checking out is MemStats, a structure which keeps
track of statistics related to heap allocs, GC times, etc.
■MemStats is neat, but it provides memory allocator statistics as (mostly)
numeric data
●Could be useful in most tools, but not suitable for this experiment
●Spoiler: The MemStats documentation points in the right direction, but I only realized it later
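For illustration, a minimal sketch of reading those numeric statistics via runtime.ReadMemStats (the fields printed are illustrative):

    // Reading allocator/GC statistics from runtime.MemStats.
    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        var m runtime.MemStats
        runtime.ReadMemStats(&m) // briefly stops the world to get a consistent snapshot
        fmt.Printf("heap alloc: %d bytes, GC cycles: %d, total pause: %d ns\n",
            m.HeapAlloc, m.NumGC, m.PauseTotalNs)
    }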
Option 2: Existing Go tooling
■viewcore is a command-line tool for exploring core dumps of a Go process
provided under golang.org/x/debug/cmd/viewcore
■However, viewcore has a few issues with parsing heap dumps and panics in a few places (it hasn’t been updated for the Go 1.22 heap layout)
■During my experimentation, viewcore broke in a lot of places
■The Go team is working on fixing these issues; Proposal: #57447
Option 3: Heap dumps
■Go provides a way to get a heap dump using debug.WriteHeapDump
■The heap dump is a sequence of records; Go writes 17 types of records
(object, itab, etc.) to the heap dump
■A heap dump is useful for extracting “objects”
■Possibly the only correct option for this project; a heap dump provides a view
into the program’s memory at a point in time
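A minimal sketch of capturing a heap dump (the file name is illustrative):

    // Writing a heap dump to a file with debug.WriteHeapDump.
    package main

    import (
        "os"
        "runtime/debug"
    )

    func main() {
        f, err := os.Create("heapdump.bin") // illustrative file name
        if err != nil {
            panic(err)
        }
        defer f.Close()
        debug.WriteHeapDump(f.Fd()) // stops the world and writes the dump to the file descriptor
    }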
Other options
■Forking Go
●Tempting, but the runtime moves at a high velocity
●Lots of tweaks would be needed to extract information neatly
●I also wanted the visualization to work on standard Go installations
Implementation
Implementation
■Read heap dumps generated by debug.WriteHeapDump
■Parse the heap dump based on the documented format
●Wrote a small library for reading Go heap dumps: https://github.com/burntcarrot/heaputil
■Generate an object graph by utilizing records parsed from the heap dump
■In-progress: Color nodes in the object graph and animate using multiple heap
dumps captured at intervals
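A rough sketch of the capture side of that last step: taking several dumps at an interval so they can be parsed and diffed later (the count, interval and file names are illustrative):

    // Capturing a sequence of heap dumps for later diffing/animation.
    package main

    import (
        "fmt"
        "os"
        "runtime/debug"
        "time"
    )

    func main() {
        for i := 0; i < 5; i++ {
            f, err := os.Create(fmt.Sprintf("heapdump-%d.bin", i)) // illustrative naming
            if err != nil {
                panic(err)
            }
            debug.WriteHeapDump(f.Fd()) // each dump briefly stops the world
            f.Close()
            time.Sleep(2 * time.Second) // illustrative interval
        }
    }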
Implementation
Implementation
What’s left to do?
What’s left?
■Try out gocore (golang.org/x/debug/internal/gocore) once stabilized
■Use delve as a means to retrieve type information
Ongoing work
(outside of my efforts)
Ongoing work
■https://github.com/golang/go/issues/57447
●Proposal for making x/debug/internal/gocore and x/debug/internal/core work with Go at tip.
●There has been some progress made by the Go team
■https://github.com/cloudwego/goref
●Heap analysis tool from ByteDance, uses Delve as a medium to retrieve type info
■https://github.com/adamroach/heapspurs
●Provides a small toolkit to work with Go heap dumps
I failed..
I failed, but..
I failed, but..
I wanted to learn more about Go runtime observability!
Go runtime observability
The components
■The Go runtime can be broken down into these “components”:
●GMP
■G = goroutines
■M = threads
■P = processors
●Scheduler
●Garbage Collector
●netpoller
●Concurrency primitives (channels, etc.)
●etc.
What’s available right now
■pprof
●CPU
●Memory
●Goroutine
●Block
●Mutex
■Execution tracing
■GC traces
■and much more..
pprof
CPU
■Helps in identifying which parts consume a lot of CPU time.
■Can be enabled through:
●pprof.StartCPUProfile(w)
●import _ "net/http/pprof" (GET /debug/pprof/profile?seconds=60)
■Observable parts: All Go code (spending time on CPU)
■Non-observable parts: I/O
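A minimal sketch of both ways of enabling the CPU profile (the address and file name are illustrative):

    // CPU profiling: programmatically, or via the net/http/pprof endpoints.
    package main

    import (
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
        "os"
        "runtime/pprof"
    )

    func main() {
        f, err := os.Create("cpu.pprof") // illustrative file name
        if err != nil {
            panic(err)
        }
        defer f.Close()

        pprof.StartCPUProfile(f) // samples on-CPU stacks until stopped
        defer pprof.StopCPUProfile()

        // Alternative: curl 'http://localhost:6060/debug/pprof/profile?seconds=60'
        go http.ListenAndServe("localhost:6060", nil)

        // ... application work ...
    }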
Memory
■Helps in identifying which parts perform more heap allocations.
■Can be enabled through:
●pprof.WriteHeapProfile(w) (or pprof.Lookup("heap").WriteTo(w, 0))
●import _ "net/http/pprof" (GET /debug/pprof/allocs?seconds=60)
■Observable parts: Reachable objects
■Non-observable parts: CGO allocs, Object lifetime, unreachable objects
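A minimal sketch of the programmatic route (runtime/pprof exposes the heap profile as a snapshot via WriteHeapProfile/Lookup rather than a Start/Stop pair; the file name is illustrative):

    // Writing a heap profile of sampled allocations.
    package main

    import (
        "os"
        "runtime"
        "runtime/pprof"
    )

    func main() {
        // ... application work that allocates ...

        f, err := os.Create("heap.pprof") // illustrative file name
        if err != nil {
            panic(err)
        }
        defer f.Close()

        runtime.GC()              // get up-to-date statistics before snapshotting
        pprof.WriteHeapProfile(f) // equivalent to pprof.Lookup("heap").WriteTo(f, 0)
    }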
Goroutines
■Used for getting information about active goroutines
■Can be enabled through:
●pprof.Lookup("goroutine").WriteTo(w, 0)
●import _ "net/http/pprof" (GET /debug/pprof/goroutine)
■Observable parts: All goroutines present in the internal allg struct
■Non-observable parts: Short-lived goroutines (~1μs)
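A minimal sketch of dumping the goroutine profile to stdout:

    // Dumping all goroutine stacks from the "goroutine" profile.
    package main

    import (
        "os"
        "runtime/pprof"
    )

    func main() {
        // debug=0 emits the binary pprof format, debug=1 a readable summary,
        // debug=2 full panic-style stack traces.
        pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)
    }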
Block profile
■The block profiler helps in identifying time spent by goroutines waiting for
channel operations, mutex operations, etc.
■Can be enabled through:
●pprof.Lookup("block").WriteTo(w, 0)
■Need to set runtime.SetBlockProfileRate(rate) before profiling
●import _ "net/http/pprof" (GET /debug/pprof/block?seconds=60)
■Observable parts: Synchronization primitives (channels, mutexes, etc.)
■Non-observable parts: in-progress blocking events, I/O, sleep, etc.
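A minimal sketch with a deliberately blocking channel receive (a rate of 1 records every blocking event and is only for illustration):

    // Enabling and dumping the block profile.
    package main

    import (
        "os"
        "runtime"
        "runtime/pprof"
        "time"
    )

    func main() {
        runtime.SetBlockProfileRate(1) // 1 = record every blocking event (high overhead)

        ch := make(chan int)
        go func() { time.Sleep(100 * time.Millisecond); ch <- 1 }()
        <-ch // this blocking receive shows up in the profile

        pprof.Lookup("block").WriteTo(os.Stdout, 1)
    }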
Mutex profile
■Useful for identifying time spent on lock contentions
■Can be enabled through:
●pprof.Lookup("mutex").WriteTo(w, 0)
■Need to set runtime.SetMutexProfileFraction(rate) before
profiling
●import _ "net/http/pprof" (GET /debug/pprof/mutex)
■Observable parts: Mutexes (sync.Mutex/sync.RWMutex), etc.
■Non-observable parts: Spinning (locks), sync.WaitGroup, sync.Cond, etc.
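A minimal sketch with a deliberately contended mutex (a fraction of 1 reports every contention event and is only for illustration):

    // Enabling and dumping the mutex (lock contention) profile.
    package main

    import (
        "os"
        "runtime"
        "runtime/pprof"
        "sync"
        "time"
    )

    func main() {
        runtime.SetMutexProfileFraction(1) // report every contention event

        var mu sync.Mutex
        mu.Lock()
        go func() { time.Sleep(50 * time.Millisecond); mu.Unlock() }()
        mu.Lock() // contended: the wait is attributed to the goroutine that unlocks
        mu.Unlock()

        pprof.Lookup("mutex").WriteTo(os.Stdout, 1)
    }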
Going beyond pprof
But why?
■pprof should be able to provide a sufficient/clear picture of what’s happening
in your application
■In case you find yourself stuck in a situation where pprof can’t help, then the
next slides might help
■The last option would be to fork the Go source and add manual debug logs or
print statements; but it’s unlikely that you’d reach this point
Execution tracer
■Helpful for investigating the gaps (“voids”) observed in normal profiling
■Captures a ton of information including:
●Goroutine events (creation, blocking, etc.)
●GC events (STW, mark assist, etc.)
●Syscalls
●Etc.
■Can be enabled through: trace.Start(w); defer trace.Stop()
■Reduced overhead (#60773): 20-30% prior to Go 1.21, 1-2% from Go 1.21 onwards
■Alternative frontend: https://github.com/dominikh/gotraceui
■#60773; kudos to the Go team and contributors!
https://go.dev/blog/execution-traces-2024
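A minimal sketch of collecting a trace to a file (the file name is illustrative); the output can be inspected with go tool trace:

    // Collecting an execution trace, viewable with `go tool trace trace.out`.
    package main

    import (
        "os"
        "runtime/trace"
    )

    func main() {
        f, err := os.Create("trace.out") // illustrative file name
        if err != nil {
            panic(err)
        }
        defer f.Close()

        trace.Start(f) // starts emitting goroutine, GC and syscall events
        defer trace.Stop()

        // ... application work to be traced ...
    }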
Trace view (with flow events enabled)
Profiles provided by the execution tracer
Goroutine analysis
Goroutine analysis (summary)
Goroutine analysis (breakdown view)
GC traces
■Helpful for getting granular information about GC cycles
■Slightly less overhead than the execution tracer
■Can be enabled through: GODEBUG=gctrace=1
■You can also get information about the GC pacer:
GODEBUG=gctrace=1,gcpacertrace=1
GC trace
GC trace breakdown (continued over several slides)
Scheduler traces
■Helpful for getting granular data about the scheduler like:
●Processor information (idle Ps, etc.)
●Thread information (idle threads, spinning threads, etc.)
●Global and local runqueue for Gs (goroutines)
●etc.
■Slightly less overhead than the execution tracer
■Can be enabled through: GODEBUG=schedtrace=1000
●Emits information every 1000 milliseconds; usage: GODEBUG=schedtrace=X, where X is the interval in milliseconds
Scheduler trace
Scheduler trace (breakdown, continued over several slides)
fgprof
■Sampling Go profiler which captures information about both on-CPU and
off-CPU time.
■CPU profiling provided by pprof can’t observe off-CPU time
■Useful for observing off-CPU time (I/O, etc.)
■Limitations:
●fgprof is implemented as a background goroutine, so it relies on the Go scheduler, and any
scheduler delays might result in slightly inaccurate data
●Scales O(N) with the number of goroutines, so it is slightly less efficient in goroutine-heavy applications
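A minimal sketch of exposing fgprof next to the standard pprof endpoints, assuming the github.com/felixge/fgprof package and its Handler() helper (the address is illustrative):

    // Exposing fgprof's wall-clock (on- and off-CPU) profile over HTTP.
    package main

    import (
        "net/http"
        _ "net/http/pprof" // standard /debug/pprof/* endpoints (on-CPU only)

        "github.com/felixge/fgprof"
    )

    func main() {
        // fgprof samples all goroutine stacks, so off-CPU time shows up too.
        http.DefaultServeMux.Handle("/debug/fgprof", fgprof.Handler())

        // e.g. curl 'http://localhost:6060/debug/fgprof?seconds=10' > fgprof.pprof
        http.ListenAndServe("localhost:6060", nil)
    }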
Even further..
runtime/metrics
■Source for programmatically getting metrics exported by the Go runtime
■88 metrics exposed (Go 1.23.2), 56 (excluding /godebug metrics)
■Open proposal to define a recommended set of metrics: #67120
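A minimal sketch of reading a couple of metrics by name (the metric names are taken from the runtime/metrics documentation):

    // Sampling runtime metrics with runtime/metrics.
    package main

    import (
        "fmt"
        "runtime/metrics"
    )

    func main() {
        samples := []metrics.Sample{
            {Name: "/sched/goroutines:goroutines"},
            {Name: "/memory/classes/heap/objects:bytes"},
        }
        metrics.Read(samples)

        for _, s := range samples {
            switch s.Value.Kind() {
            case metrics.KindUint64:
                fmt.Printf("%s = %d\n", s.Name, s.Value.Uint64())
            case metrics.KindFloat64:
                fmt.Printf("%s = %f\n", s.Name, s.Value.Float64())
            }
        }
    }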
Metrics exposed by runtime/metrics (as of Go 1.23.2)
Future opportunities
Future opportunities
■eBPF is pretty promising, and tooling built on it could enable deeper Go runtime observability
■The gocore and core packages, once made public, could be really useful too.
Conclusion
Conclusion
■Ending up in a rabbit hole is a great way to learn a lot of things!
■Go has a great set of tooling, and provides some really nice ways to
understand more about the Go runtime
■The contributors and the Go team deserve a huge kudos for supporting these
tools!
Further reading
Further reading
■Go docs:
●runtime/pprof: https://pkg.go.dev/runtime/pprof
●runtime/metrics: https://pkg.go.dev/runtime/metrics
●runtime/trace: https://pkg.go.dev/runtime/trace
●Go environment variables: https://pkg.go.dev/runtime#hdr-Environment_Variables
■Felix Geisendörfer’s Go Profiler Notes:
https://github.com/DataDog/go-profiler-notes
Thank you!
Email: [email protected]
Twitter: @carrotburnt
Bluesky: [email protected]
Website: https://databases.systems
●I’ll post a more detailed version here!