Stackless coroutines: execution flow
- Suspend by returning to the caller/resumer
- Nested suspension requires every caller/resumer to be a coroutine
- Introduces function coloring
- Used for generators, iterators, async/await, etc.
- Available in Kotlin, Swift, C++, Rust, and more
- Implemented by function transformation in the compiler
[Diagram: Caller and Coro; call, suspend, resume, suspend]
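A minimal sketch of the "function coloring" point, using Python generators as one example of stackless coroutines. Python is used here only for illustration (the slide names Kotlin, Swift, C++, and Rust), and the names `inner`, `outer`, and `plain_caller` are hypothetical.

```python
def inner():
    # Each yield suspends by returning a value to whoever resumed us.
    yield "inner: first suspension"
    yield "inner: second suspension"


def outer():
    # outer has to be a generator itself ("colored") so that inner's
    # suspensions can pass through it; `yield from` forwards them.
    yield "outer: before inner"
    yield from inner()
    yield "outer: after inner"


def plain_caller():
    # An ordinary function cannot suspend on inner's behalf: calling
    # inner() only creates the coroutine object, no body code runs yet.
    return inner()


events = list(outer())
```

Resuming `outer()` step by step passes through both of `inner()`'s suspension points, while `plain_caller()` can only hand the unstarted coroutine back to someone else to drive.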
Stackless coroutines: transformation
State-machine-based coroutines
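The transformation can be sketched by hand. The example below pairs a small Python generator with a hand-written state machine that behaves the same way. This is an illustration of the idea only, not the output of any particular compiler; `CounterStateMachine`, `resume`, and `drain` are hypothetical names.

```python
def counter_generator(limit):
    # What the programmer writes: one suspension point inside a loop.
    i = 0
    while i < limit:
        yield i
        i += 1


class CounterStateMachine:
    """Hand-written equivalent: locals become fields, the suspension
    point becomes a state label."""

    START, SUSPENDED, DONE = range(3)

    def __init__(self, limit):
        self.limit = limit
        self.i = 0
        self.state = self.START

    def resume(self):
        if self.state == self.DONE:
            raise StopIteration
        if self.state == self.SUSPENDED:
            self.i += 1  # the code that follows the yield
        if self.i < self.limit:  # the loop condition
            self.state = self.SUSPENDED
            return self.i  # "suspend": return to the resumer
        self.state = self.DONE
        raise StopIteration


def drain(machine):
    # Resume the machine until it reports completion.
    out = []
    while True:
        try:
            out.append(machine.resume())
        except StopIteration:
            return out
```

Both `list(counter_generator(3))` and `drain(CounterStateMachine(3))` produce the same sequence; the state machine simply makes the saved locals and the resumption point explicit.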
Stackful coroutines: execution flow
- Suspend by context switch to the caller/resumer
- Can suspend at any point
- Does not introduce function coloring
- Used for fibers and exception handling
- Available in Go, Java
[Diagram: Caller, Coro 1, Coro/Func 2, ..., Coro/Func N; call, call, call, suspend, return, return, return, resume]
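Python's standard library has no stackful coroutines, so the sketch below only emulates the semantics, using one OS thread per fiber as a stand-in for a separate stack. Real fibers switch contexts without threads; the point here is only that a plain nested function can suspend without its callers being coroutines. The `Fiber` class and all function names are hypothetical.

```python
import threading


class Fiber:
    """Stackful-coroutine semantics emulated with a thread (illustration only)."""

    def __init__(self, fn):
        self._fn = fn
        self._resume_signal = threading.Semaphore(0)
        self._suspend_signal = threading.Semaphore(0)
        self._value = None
        self._started = False
        self.done = False
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        self._resume_signal.acquire()  # wait for the first resume()
        self._fn(self)
        self.done = True
        self._suspend_signal.release()  # hand control back one last time

    def suspend(self, value):
        # Callable from any call depth inside the fiber.
        self._value = value
        self._suspend_signal.release()  # switch to the resumer
        self._resume_signal.acquire()   # sleep until resumed again

    def resume(self):
        if not self._started:
            self._started = True
            self._thread.start()
        self._resume_signal.release()   # switch to the fiber
        self._suspend_signal.acquire()  # wait until it suspends or finishes
        return None if self.done else self._value


def leaf(fiber, n):
    fiber.suspend(n)  # ordinary function, suspends two calls deep


def middle(fiber, n):
    leaf(fiber, n)
    leaf(fiber, n + 1)


def body(fiber):
    middle(fiber, 1)
    middle(fiber, 10)


def collect(fiber):
    # Resume until the fiber body returns, gathering suspended values.
    values = []
    while True:
        value = fiber.resume()
        if fiber.done:
            return values
        values.append(value)
```

Neither `middle` nor `body` is marked in any way, yet `leaf` suspends the whole call chain, which is the "no function coloring" property from the slide.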
Suspendable? I wanted concurrency!
Asynchronous coroutines
- Any coroutine can potentially be made asynchronous
- A coroutine scheduler can manage concurrent execution of coroutines
- On suspend, a coroutine can:
  - be scheduled for automatic resumption
  - specify the next coroutine to resume
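A minimal sketch of such a scheduler, built on Python generators. The protocol is an assumption made for this example: yielding `None` asks for automatic resumption later, and yielding another coroutine names it as the next one to resume. `run`, `worker`, and `handoff` are hypothetical names.

```python
from collections import deque


def run(coroutines):
    """Round-robin scheduler over generator-based coroutines."""
    ready = deque(coroutines)
    while ready:
        coro = ready.popleft()
        try:
            request = next(coro)  # resume until the next suspension
        except StopIteration:
            continue  # finished, drop it
        ready.append(coro)  # automatic resumption later
        if request is not None:
            # The coroutine named its successor: move it to the front.
            if request in ready:
                ready.remove(request)
            ready.appendleft(request)


def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # suspend and let the scheduler decide


def handoff(name, target, log):
    log.append(f"{name}: handing off")
    yield target  # suspend and ask for `target` to run next
    log.append(f"{name}: resumed")
```

With two workers the log interleaves their steps; with `handoff`, the named target jumps ahead of coroutines that were queued before it.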
How do they work?
Subroutines
  f(x): local variables: temp = ???; params: x = 3; resumption address: main() + 0x42; pc = f(x) + 0x8
Subroutines
  f(x): local variables: temp = 9; params: x = 3; resumption address: main() + 0x42; pc = f(x) + 0xF
  g(x): local variables: (none); params: x = 9; resumption address: f(x) + 0x18; pc = ???
Subroutines
  f(x): local variables: temp = 9; params: x = 3; resumption address: main() + 0x42; pc = f(x) + 0xF
  g(x): local variables: (none); params: x = 9; resumption address: f(x) + 0x18; pc = g(x) + 0x8
  h(x): local variables: (none); params: x = 10; resumption address: g(x) + 0xF; pc = ???
Subroutines
  f(x): local variables: temp = 9; params: x = 3; resumption address: main() + 0x42; pc = f(x) + 0xF
  g(x): local variables: (none); params: x = 9; resumption address: f(x) + 0x18; pc = g(x) + 0x8
  h(x): local variables: (none); params: x = 10; resumption address: g(x) + 0xF; pc = h(x) + 0x8
Subroutines
  f(x): local variables: temp = 9; params: x = 3; resumption address: main() + 0x42; pc = f(x) + 0xF
  g(x): local variables: h.result = 20; params: x = 9; resumption address: f(x) + 0x18; pc = g(x) + 0xF
Subroutines
  f(x): local variables: temp = 9, g.result = 20; params: x = 3; resumption address: main() + 0x42; pc = f(x) + 0x10
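The frames above can be observed directly in Python. The function bodies are not given on the slides; the ones below are an assumption, chosen only because they reproduce the values shown (x = 3, temp = 9, x = 10, result = 20). The `snapshots` list is a hypothetical helper.

```python
import inspect

snapshots = []  # one recorded frame chain per call to h


def h(x):
    # Walk the activation frames from the innermost (h) outwards.
    chain = []
    frame = inspect.currentframe()
    while frame is not None and frame.f_code.co_name in ("f", "g", "h"):
        chain.append((frame.f_code.co_name, frame.f_locals["x"]))
        frame = frame.f_back  # the caller's frame
    snapshots.append(chain)
    return x * 2  # assumed body


def g(x):
    return h(x + 1)  # assumed body


def f(x):
    temp = x * x  # assumed body
    return g(temp)


result = f(3)
```

At the deepest point of the call, three frames are live at once, each holding its own `x`; once `h` and `g` return, their frames are gone and only the result travels back to `f`.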
Stackless coroutines
- Save only the state of the current activation frame
- Create a snapshot of the required data on the heap
- Ephemeral local data stays on the stack
- The heap allocation can be optimized out by the compiler
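CPython generators keep their single saved frame in a heap-allocated object, so they can serve as a rough stand-in for inspecting what a suspended stackless coroutine holds. This is a sketch of the idea, not a description of how Kotlin, C++, or Rust lay out coroutine state; `running_total` is a hypothetical example function.

```python
import inspect


def running_total(values):
    total = 0
    for v in values:
        total += v
        yield total  # suspension point: only this frame is saved


gen = running_total([1, 2, 3])
first = next(gen)   # runs until the first yield
second = next(gen)  # resumes, runs until the second yield

# While suspended, the coroutine's locals live in the generator object.
state = inspect.getgeneratorstate(gen)
saved = dict(inspect.getgeneratorlocals(gen))
```

After two resumptions the saved frame holds `total` and `v` from the second loop iteration, and nothing from any caller's frame.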
Fibers
Stack:
  f(x): local variables: task = 0xffab1234; params: x = 4; resumption address: main() + 0x42; pc = f(x) + 0x100
Heap:
  0xffab1234 Fiber Context: f$anonfun$0(x)
    resumptionAddress: anonfun$0 + 0x00
    state = new
    promise: ???
    stack = null
    registers: rsp = ???, rbp = ???, rdx = ???
Fibers
Stack:
  f(x): local variables: task = 0xffab1234; params: x = 4; resumption address: main() + 0x42; pc = f(x) + 0x100
Heap:
  0xffab1234 Fiber Context: f$anonfun$0(x)
    resumptionAddress: anonfun$0 + 0x00
    state = active
    promise: (no value shown)
    stack = 0xbadcafe
    registers: rsp = 0xbad123, rbp = 0x1234, rdx = 0xcafe42
  Fiber stack (0xbadcafe):
    f$anonfun$0(x): local variables: temp = 8, g.result = ???; params: x = 4; resumption address: f() + 0x108; pc = f$anonfun$0 + 0x18
    g(x): resumption address: f$anonfun$0() + 0x20; pc = g(x) + 0x30
    anonfun$1(x): resumption address: g(x) + 0x38; pc = anonfun$1 + 0x50
    anonfun$2(x): resumption address: anonfun$1(x) + 0x58; pc = anonfun$2 + 0x40
Fibers
Stack:
  f(x): local variables: task = 0xffab1234, x = 0; params: x = 4; resumption address: main() + 0x42; pc = f(x) + 0x100
Heap:
  0xffab1234 Fiber Context: f$anonfun$0(x)
    resumptionAddress: anonfun$0 + 0x00
    state = active
    promise: (no value shown)
    stack = 0xbadcafe
    registers: rsp = 0xbad123, rbp = 0x1234, rdx = 0xcafe42
  Fiber stack (0xbadcafe):
    f$anonfun$0(x): local variables: temp = 8, g.result = ???; params: x = 4; resumption address: f() + 0x108; pc = f$anonfun$0 + 0x18
    g(x): resumption address: f$anonfun$0() + 0x20; pc = g(x) + 0x30
    anonfun$1(x): resumption address: g(x) + 0x38; pc = anonfun$1 + 0x50
    anonfun$2(x): resumption address: anonfun$1(x) + 0x58; pc = anonfun$2 + 0x40
Fibers
Possible stack provisioning:
- Eager: fiber stack allocated on creation
- Lazy: stack copied to the heap on suspend
Typically 1-10x slower suspend/resume than stackless coroutines
Still much faster than a thread context switch
Quick summary
Stackful coroutines
- Fibers
- Context switching
- Suspension at any point
Subroutines
- Ordinary function
- No special mechanism
- No suspension
Stackless coroutines
- Generators
- State machine / continuation-passing style
- Only top-level suspension
LLVM stackless coroutines
- Transformed into a state machine by LLVM
- Body needs to be inlined
- Transformed into a cleanup function
- Transformed into a ramp function
Switching-context: assembly
- Non-portable
- Different registers available depending on CPU architecture and OS
- Hard to maintain
Switching-context: ucontext_t
- POSIX-specific
- Simple API
- Deprecated, and removed from the standard in POSIX.1-2008
Switching-context: setjmp/longjmp
- Portable between Unix and Windows
- Can implement exception handling
- Not available on some platforms (e.g. WebAssembly)
Switching-context: setjmp/longjmp
boundary-suspend
- Snapshot of the context for each handler and continuation
- Create a snapshot of the continuation before moving control to the handler
- Restore the snapshot before resumption
Greatly simplified!