The Whippet Embeddable Garbage Collection Library

igalia 18 views 36 slides Mar 11, 2025
Slide 1
Slide 1 of 36
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36

About This Presentation

Whippet is a minimal, embed-only, highly parallel, pure-C garbage collection
library, designed to replace Guile's use of the Boehm-Demers-Weiser collector,
but designed also to be usable by other languages that might appreciate a
zero-dependency, state-of-the-art upgrade to their memory manager....


Slide Content

Whippet: A practical
memory management
upgrade for Guile &
beyond

2 February 2025 - FOSDEM ’25
Andy Wingo

Igalia, S.L.

Agenda # The big idea
« The results
# The future

The big idea Whippet is a practical memory
management upgrade for Guile &
beyond

A practical
memory
management
upgrade for
Guile &
beyond

struct gc_options *options = NULL;
struct gc_stack_addr *stack_base = NULL;
struct gc_heap «heap;

struct gc_mutator *mut;

void *event_listener_data = NULL;

gc_init(options, stack_addr, &heap, Smut,
GC_NULL_EVENT_LISTENER, event_listener_data);

void *obj = gc_allocate(mut, 42);

options = gc_allocate_options();
gc_options_parse_and_set_many(options,
getenv("GC_OPTIONS"))

struct gc_mutator_roots roots; // Embedder-defined
gc_mutator_set_roots(mut, &roots);

// For generational configurations
gc_write_barrier(mut, obj, obj_size, edge, new_val);

// For cooperative safepoints
gc_safepoint(mut);

// For collectors that don't require copying
gc_pin_object(mut, ref);

static inline void
gc_trace_object(struct gc_ref ref,
void (*visit)(struct gc_edge edge,
struct gc_heap «heap,
void *visit_data),
struct gc_heap «heap,
void *trace_data,
size_t *size) { /* ... */ }

static inline void
gc_trace_mutator_roots(struct gc_mutator_roots «roots,
void (*trace_edge)(struct gc_edge edge,
struct gc_heap *heap
void *trace_data),
struct gc_heap «heap,
void *trace_data) { /* ... */ }

A practical
memory
management
upgrade for
Guile &
beyond

Whippet: An Performance: Bump-pointer allocation,

upgrade better parallelism
relative to Features: Ephemerons and finalizers
BDW-GC that work

Behavior: Choice of workload-
appropriate collectors

Memory use: Compaction, adaptive
heap sizing (membalancer)

Whippet: An
upgrade with
a migration
path

Bridge, n.: Construction with two ends
and a path in between

Whippet: a GC library with compile-
time abstraction over embedder needs
and collector construction

Collector variants: MMC, PCC, BDW

MMC collector has optional
conservative tracing

* stack roots
® global roots
* (optionally) intra-heap edges

MMC Mostly-marking collector
MMC = nofl space + lospace

Nofl space

“No free-list”

For objects less than 8192 bytes
Bump-pointer allocation
Excellent parallelism

Mostly-marking (Immix-derived),
occasionally compacting

12% overhead

Pinning (transitively due to
conservative roots, or permanently)

Optionally generational (sticky-mark)

Lospace “Large object space”
mmap allocation, freelist, deferred
release
Optionally generational (sticky-mark)

Whippet: An Language run-times often get stuck
upgrade with their GC

relative to Whippet’s compile-time API
bespoke GCs abstraction enables evolution

PCC Parallel copying collector
PCC = copy space + lospace

Copy space

For objects less than 8192 bytes
Bump-pointer allocation
Excellent parallelism

Always compacting

100% overhead

Generational
PCC

Generational PCC = copy space + copy
space + lospace

2 MB nursery per processor active in
last cycle

1 survivor cycle for copy space, O for
lospace

Field-logging write barrier

Nursery memory range aligned, can be
quick XOR check

Still has 100% overhead

BDW

Boehm-Demers-Weiser collector
Shim behind Whippet API

Different safepoint behavior: not
cooperative

No support for gc_trace_object
Not great parallelism
Higher memory overhead than MMC

A practical
memory
management
upgrade for
Guile &
beyond

Embed-only

No dependencies
C11

Hackable

Practical
testbench:
Whiffle

Scheme-to-C compiler: https://
github. com/wingo/whiffle

« Ensure Whippet offers appropriate
API for embedders

« Allow more test cases to be written
before moving to Guile

* Handles vs stack maps

Main motivation was testing; shook
out many bugs

A practical
memory
management
upgrade for
Guile &
beyond

The pivot:
æ Whippet API, but BDW collector

# MMC collector, with conservative
roots

* Generational MMC collector (write
barriers)

# Evacuating nursery?

Shout-out to NLnet foundation for
helping us with this work!

A practical
memory
management
upgrade for
Guile &
beyond

WebAssembly+GC-to-C: Enable
standalone Guile compilation via Hoot ?

Ocaml, R, etc...

Results: Strict throughput improvements: 20-
What do we 40%

win with Access to smaller heap sizes: 30-50%
Whippet?

9.6

8.8

7.2

6.4

3.2

2.4

16

0.8

nboyer.scm-5 throughput, one mutator

Ml bdw
M mmc
M pce

1123451719 22 25 3 4
Heap size multiplier

4.4

3.6

3.2

Time (s)

N

16

12

0.8

0.4

peval.scm-12-1 throughput, one mutator

123451719 22 25.
Heap size multiplier

Ml bdw
mmc
M pce

Time (s)

30
28
26
24
22
20
18
16
14
12

oN BO ©

nboyer.scm-5 throughput, eight mutators

11231451719 22 25.
Heap size multiplier

ME bdw
Bi mmc
M pce

9.6

8.8

7.2

Time (s)
>» u
© o

>

3.2

2.4

16

0.8

peval.scm-12-1 throughput, eight mutators

Ml bdw
mmc
M pce

1123451719 22 25 3 4
Heap size multiplier

Results: Conservative root-finding is OK
What do we Generational GC is complicated, more
learn with tuning needed

Whippet?

Time (s)

nboyer.scm-5 throughput, eight mutators

11.2.3451719 2.2 5
Heap size multiplier

M heap-conservative
M precise
I stack-conservative

Time (s)

10

peval.scm-12-1 throughput, eight mutators

1123451719 22 25
Heap size multiplier

M heap-conservative
M precise
I stack-conservative

9.6

8.8

TR

6.4

3.2

2.4

1.6

0.8

nboyer.scm-5 throughput, eight mutators

1.1.2.3.4.5 1.719 2.2 5
Heap size multiplier

EI mmc

I mmc-generational
pcc

M pcc-generational

7.2
6.8
6.4

5.6
5.2
48
44

Time (s)

3.6
32
2.8
2.4

1.6
1.2
0.8
0.4

peval.scm-12-1 throughput, eight mutators

EI mmc

I mmc-generational
pcc

M pcc-generational

12.3451719 22 25 3 4
Heap size multiplier

Future

Guile, finally. This month!!!
Your language run-time?
Concurrent marking

LXR-inspired reference counting of old
generation?

Try it out!

https://github. com/wingo/whippet
https://github.com/wingo/whiffle
https://nlnet.nl/project/Whippet/
wingodigalia.com

Thanks!

Attic

New since
2023

PCC, generational PCC, precise field-
logging write barriers instead of card
marking, better parallelism, bug fixes,
embeddability, finalizers, dynamic
heap sizing (membalancer), less VMM
traffic, whiffle, tests, nlnet, platform
abstraction, options interface, extern
space, stats, HDR histogram,
renamings, nofl more eager