A Deterministic Walk Down TigerBeetle’s main() Street

ScyllaDB 257 views 17 slides Jun 24, 2024

About This Presentation

Learn how to use Zig to implement a fully deterministic distributed system which will never fail with an out of memory error, for predictable performance and 700x faster tests!


Slide Content

A Deterministic Walk Down TigerBeetle’s main() Street
Aleksei Kladov
Staff Software Engineer at TigerBeetle

Context
- TigerBeetle: fast & small database for financial accounting
- Narrow domain, no custom DB schema yet
- Focus on:
  - Reliability
  - Performance
  - Operator experience

Context
- Written in a NASA power-of-ten style!
- Major theme: Deterministic Execution
  - Productivity (all bugs are reproducible bugs)
  - Reduced performance variability / tail latencies
  - Higher throughput: no resource management overhead

How To Determinism?
- Control IO!
- Use discrete event simulation!
- Do something about time!
- This talk: the rest of the proverbial owl!
- Lots of code, sorry, going for the nitty-gritty here!

fn main

    pub fn main() !void {
        var io = try IO.init(128, 0);
        var storage = try Storage.init(&io, fd);
        var message_bus = try MessageBus.init(
            &io,
            addresses,
        );
        var replica = Replica.open(.{
            .storage = &storage,
            .message_bus = &message_bus,
        });
    }

- Put IO in a struct!
- Mock IO OR anything that uses IO
https://github.com/tigerbeetle/tigerbeetle/blob/main/src/tigerbeetle/main.zig
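The "put IO in a struct, then mock it" idea can be sketched as follows. This is a minimal illustration, not TigerBeetle's actual code: `FakeIO`, `StorageType`, `write_sector`, and the 8-byte sector size are hypothetical names and values chosen for the example. The storage layer is generic over the IO type, so the same logic can run against a real IO implementation or an in-memory one.

```zig
const std = @import("std");

// Hypothetical in-memory IO; a real one would submit reads and writes to the kernel.
const FakeIO = struct {
    disk: [64]u8 = [_]u8{0} ** 64,
    writes: u32 = 0,

    fn write(io: *FakeIO, offset: usize, bytes: []const u8) void {
        var i: usize = 0;
        while (i < bytes.len) : (i += 1) io.disk[offset + i] = bytes[i];
        io.writes += 1;
    }
};

// Storage is generic over the IO type, so identical logic runs on real or fake IO.
fn StorageType(comptime IO: type) type {
    return struct {
        const Storage = @This();

        io: *IO,

        fn init(io: *IO) Storage {
            return Storage{ .io = io };
        }

        // Hypothetical helper: sectors are 8 bytes long in this sketch.
        fn write_sector(storage: Storage, sector: usize, bytes: []const u8) void {
            storage.io.write(sector * 8, bytes);
        }
    };
}

pub fn main() void {
    var io = FakeIO{};
    const storage = StorageType(FakeIO).init(&io);
    storage.write_sector(1, "tick");
    // The fake records every write, so a test can inspect exactly what happened.
    std.debug.assert(io.writes == 1);
    std.debug.assert(std.mem.eql(u8, io.disk[8..12], "tick"));
}
```

Because all side effects go through the one `io` pointer, swapping the type argument is enough to replace the outside world in tests.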

Beating Heart / Ticking Clock

    pub fn main() !void {
        while (true) {
            try io.run_for_ns(
                constants.tick_ms * std.time.ns_per_ms,
            );
            replica.tick();
        }
    }

- Don’t schedule timers, run at constant FPS!
- Be DooM
- Write your own event loop
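One way to read "don't schedule timers, run at constant FPS" is that a timeout becomes a plain counter advanced by `tick()`. The sketch below is a simplified, hypothetical `Timeout` written for illustration; the field and method names are assumptions, not a copy of TigerBeetle's implementation.

```zig
const std = @import("std");

// Hypothetical tick-counting timeout: no OS timers, only an integer advanced by tick().
const Timeout = struct {
    after: u64, // Number of ticks until the timeout fires.
    ticks: u64 = 0,
    ticking: bool = false,

    fn start(timeout: *Timeout) void {
        timeout.ticks = 0;
        timeout.ticking = true;
    }

    // Called once per event-loop iteration by the owner of the timeout.
    fn tick(timeout: *Timeout) void {
        if (timeout.ticking) timeout.ticks += 1;
    }

    fn fired(timeout: *const Timeout) bool {
        return timeout.ticking and timeout.ticks >= timeout.after;
    }
};

pub fn main() void {
    var ping = Timeout{ .after = 3 };
    ping.start();
    var n: u64 = 0;
    while (n < 3) : (n += 1) {
        std.debug.assert(!ping.fired());
        ping.tick();
    }
    std.debug.assert(ping.fired());
}
```

Since the timeout only observes ticks, a simulator can drive it as fast as the CPU allows without changing its behavior.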

Simulated cluster

    while (true) {
        cluster.network.tick();
        for (cluster.storages) |*storage| storage.tick();
        for (cluster.clients) |*client| client.tick();
        for (cluster.replicas) |*replica| replica.tick();
    }
    // FPS = ∞

- Simulate IO
- Tick whole cluster
- At max CPU speed!
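A rough sketch of what a simulated, tickable network might look like is shown below. Everything here is hypothetical (the `Network` and `Packet` types, the 8-packet capacity, the xorshift PRNG, the `run` helper); the point is only that when delays come from a seeded PRNG and time advances by `tick()`, the same seed replays the same schedule.

```zig
const std = @import("std");

// Hypothetical simulated network: each packet gets a delivery tick from a seeded PRNG.
const Network = struct {
    const Packet = struct { deliver_at: u64, payload: u8 };

    seed: u64, // Must be non-zero for xorshift.
    now: u64 = 0,
    packets: [8]?Packet = [_]?Packet{null} ** 8,
    delivered: [8]u8 = [_]u8{0} ** 8,
    delivered_count: usize = 0,

    // Xorshift step: the seed fully determines every delay.
    fn random(network: *Network) u64 {
        network.seed ^= network.seed << 13;
        network.seed ^= network.seed >> 7;
        network.seed ^= network.seed << 17;
        return network.seed;
    }

    fn send(network: *Network, payload: u8) void {
        const delay = 1 + network.random() % 4;
        var i: usize = 0;
        while (i < network.packets.len) : (i += 1) {
            if (network.packets[i] == null) {
                network.packets[i] = Packet{ .deliver_at = network.now + delay, .payload = payload };
                return;
            }
        }
        unreachable; // Fixed capacity: this sketch never has more than 8 packets in flight.
    }

    // Advance simulated time by one tick and deliver every packet that is due.
    fn tick(network: *Network) void {
        network.now += 1;
        var i: usize = 0;
        while (i < network.packets.len) : (i += 1) {
            if (network.packets[i]) |packet| {
                if (packet.deliver_at <= network.now) {
                    network.delivered[network.delivered_count] = packet.payload;
                    network.delivered_count += 1;
                    network.packets[i] = null;
                }
            }
        }
    }
};

// Runs a tiny scenario and returns the payloads in delivery order.
fn run(seed: u64) [8]u8 {
    var network = Network{ .seed = seed };
    var payload: u8 = 1;
    while (payload <= 4) : (payload += 1) network.send(payload);
    var ticks: u64 = 0;
    while (ticks < 8) : (ticks += 1) network.tick();
    return network.delivered;
}

pub fn main() void {
    const first = run(42);
    const second = run(42);
    std.debug.assert(std.mem.eql(u8, &first, &second)); // Same seed, same schedule.
}
```

The loop in `run` never sleeps, which is the sense in which a simulated cluster can run "at max CPU speed".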

From Time to Space

    const ClientSessions = struct {
        sessions: SessionsHashMap,

        fn init(gpa: mem.Allocator) !ClientSessions
        fn put(self: *ClientSessions, header: Header)
    };

- HashMap for storing client info
- Need to allocate

From Time to Space

    fn init(gpa: mem.Allocator) !ClientSessions {
        try sessions.ensureTotalCapacity(
            gpa,
            @intCast(u32, constants.clients_max),
        );
    }

    fn put(self: *ClientSessions, header: Header) {
        self.sessions.getOrPutAssumeCapacity(header.client)
    }

- Unbundle the allocator
- Needs fixed bounds, provides backpressure
- Zig API is ❤️
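The pattern above can be tried end to end with the standard library's unmanaged hash map. This is a sketch under assumptions: the `u64` key and value types, the `clients_max = 4` limit, and the 4096-byte buffer are made up for the example, and it assumes a Zig version in which `std.AutoHashMapUnmanaged` can be default-initialized with `.{}`. A `FixedBufferAllocator` makes it easy to check that nothing is allocated after startup.

```zig
const std = @import("std");

const clients_max = 4; // Hypothetical limit; the slides use constants.clients_max.

const ClientSessions = struct {
    sessions: std.AutoHashMapUnmanaged(u64, u64) = .{},

    // All memory is reserved here, during startup; this is the only fallible step.
    fn init(gpa: std.mem.Allocator) !ClientSessions {
        var result = ClientSessions{};
        try result.sessions.ensureTotalCapacity(gpa, clients_max);
        return result;
    }

    // No allocator parameter and no error union: this cannot fail with out-of-memory.
    fn put(self: *ClientSessions, client: u64, session: u64) void {
        // The caller must respect the bound, which is where backpressure comes from.
        std.debug.assert(self.sessions.count() < clients_max or self.sessions.contains(client));
        const entry = self.sessions.getOrPutAssumeCapacity(client);
        entry.value_ptr.* = session;
    }
};

pub fn main() !void {
    var buffer: [4096]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buffer);
    var sessions = try ClientSessions.init(fba.allocator());

    const used_after_startup = fba.end_index;
    var client: u64 = 1;
    while (client <= clients_max) : (client += 1) sessions.put(client, client * 10);

    // Filling the map to its bound consumed no further memory.
    std.debug.assert(fba.end_index == used_after_startup);
    std.debug.assert(sessions.sessions.get(3).? == 30);
}
```

Passing the allocator as a parameter makes allocation visible in the signature: a function that does not receive one cannot allocate.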

And Back to Time

    io.read(
        where,
        how_much,
        callback,
    );

- A loose end: IO invokes the callback “later”
- Who stores it? Where’s the queue?

From Time to Space

    const ClientReplies = struct {
        reads: IOPS(
            Read,
            constants.client_replies_iops_read_max,
        ) = .{},
        writes: IOPS(
            Write,
            constants.client_replies_iops_write_max,
        ) = .{},
    };

- IOPS are statically allocated
- IO arranges them into an intrusive linked list
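A statically sized pool of in-flight operations could look roughly like the sketch below. `IOPSType`, its `acquire`/`release` methods, and the `Read` struct are hypothetical stand-ins written for this example; the real `IOPS` type may differ. The sketch covers only the fixed-capacity pool, not the intrusive linked list that IO builds on top of it.

```zig
const std = @import("std");

// Hypothetical simplified pool in the spirit of the slides' IOPS(T, size).
fn IOPSType(comptime T: type, comptime size: usize) type {
    return struct {
        const IOPS = @This();

        items: [size]T = undefined,
        busy: [size]bool = [_]bool{false} ** size,

        // Returns a free slot, or null when every slot is in flight (backpressure).
        fn acquire(iops: *IOPS) ?*T {
            var i: usize = 0;
            while (i < size) : (i += 1) {
                if (!iops.busy[i]) {
                    iops.busy[i] = true;
                    return &iops.items[i];
                }
            }
            return null;
        }

        // Marks the slot that `item` points to as free again.
        fn release(iops: *IOPS, item: *T) void {
            var i: usize = 0;
            while (i < size) : (i += 1) {
                if (&iops.items[i] == item) {
                    std.debug.assert(iops.busy[i]);
                    iops.busy[i] = false;
                    return;
                }
            }
            unreachable; // `item` did not come from this pool.
        }
    };
}

// Hypothetical per-operation state; a callback and buffer would live here too.
const Read = struct { offset: u64 = 0 };

pub fn main() void {
    var reads = IOPSType(Read, 2){};
    const a = reads.acquire().?;
    const b = reads.acquire().?;
    std.debug.assert(a != b);
    std.debug.assert(reads.acquire() == null); // Pool exhausted: the caller must wait.
    reads.release(a);
    std.debug.assert(reads.acquire().? == a); // The freed slot is reused.
}
```

Because the pool lives inside its owning struct, each component carries its own bound instead of competing for a single shared queue.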

Recap Time!
- Controlling Time:
  - Create IO object
  - Bring your own event loop
  - Interleave loops for simulation
  - Run at constant FPS for timeouts
- Controlling Space:
  - An upper bound for everything
  - Allocate only during startup
  - Intrusive collections avoid a single shared bound

Thank you!
Let’s connect: https://slack.tigerbeetle.com/invite
Aleksei Kladov
Staff Software Engineer at TigerBeetle