Explaining Coroutines with Scala Native, Scala Days, Madrid, 2023.pptx

Virtuslab 14 views 54 slides Jun 17, 2024
Slide 1
Slide 1 of 54
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54

About This Presentation

Wojciech Mazur explains different types of coroutines, as presented at the Scala Days conference in Madrid.


Slide Content

Wojciech Mazur Explaining Different Coroutine Flavours using Scala Native

AboutMe.md Scala 3 Compiler Team at VirtusLab Scala Native R&D Open Community Build, maintenance of Scala 3 Cooperation with LAMP EPFL

All the tasty flavours of routines

Routines routine

Subroutines Caller Function return routine subroutine call

Coroutines routine subroutine coroutine

Coroutines routine subroutine coroutine stackless stackfull

Coroutines execution flow Suspend by return ing to caller/resumer Nested suspensions requires all callers/resumer to be a coroutine Introduces function coloring Used for generators, iterators, async/await, etc Available in Kotlin, Swift, C++, Rust and more Implemented by function transformation in the compiler Caller Coro call suspend resume suspend Stackless

Stackless coroutines transformation State Machine based coroutines

Stackless coroutines transformation Continuation Passing Style coroutine

Coroutines execution flow Suspend by context-switch to caller/resumer Can suspend at any point Does not introduce function coloring Used for fibers and exception handling Available in Go, Java Caller Coro 1 Coro/Func 2 Coro/Func N call call call suspend return return return resume Stackfull

Suspendable? I wanted concurrency!

Asynchronous coroutines Any coroutine can potentially be made asynchronous Coroutine scheduler can manage concurrent execution of coroutines On suspend coroutine can: be scheduled for automatic resumption specify next coroutine to resume

How they work?

Subroutines f(x) local variables temp = ??? params: x = 3 resumption address: main() + 0x42 pc = f(x) + 0x8

Subroutines f(x) local variables temp = 9 params: x = 3 resumption address: main() + 0x42 pc = f(x) + 0xF g(x) local variables params: x = 9 resumption address: f(x) + 0x18 pc = ???

Subroutines f(x) local variables temp = 9 params: x = 3 resumption address: main() + 0x42 pc = f(x) + 0xF g(x) local variables params: x = 9 resumption address: f(x) + 0x18 pc = g(x) + 0x8 h (x) local variables params: x = 10 resumption address: g(x) + 0xF pc = ???

Subroutines f(x) local variables temp = 9 params: x = 3 resumption address: main() + 0x42 pc = f(x) + 0xF g(x) local variables params: x = 9 resumption address: f(x) + 0x18 pc = g(x) + 0x8 h(x) local variables params: x = 10 resumption address: g(x) + 0xF pc = h(x) + 0x8

Subroutines f(x) local variables temp = 9 params: x = 3 resumption address: main() + 0x42 pc = f(x) + 0xF g(x) local variables h.result = 20 params: x = 9 resumption address: f(x) + 0x18 pc = g(x) + 0xF

Subroutines f(x) local variables temp = 9 g.result = 20 params: x = 3 resumption address: main() + 0x42 pc = f(x) + 0x10

Stackless coroutines f(x) local variables: temp = 9 coro = ??? res1 = ??? res2 = ??? res3 = ??? params: x = 3 resumption address: main() + 0x42 pc = f(x) + 0x10

Stackless coroutines f(x) local variables: temp = 9 coro = ??? res1 = ??? res2 = ??? res3 = ??? params: x = 3 resumption address: main() + 0x42 pc = f(x) + 0x10 g(x) local variables: tmp = 20 coroHandle = ??? params: x = 9 resumption address: f(x) + 0x18 pc = g(x) + 0x10

Stackless coroutines f(x) local variables: temp = 9 coro = ??? res1 = ??? res2 = ??? res3 = ??? params: x = 3 resumption address: main() + 0x42 pc = f(x) + 0x10 g(x) local variables: tmp = 20 coroHandle = 0xffab1234 params: x = 9 resumption address: f(x) + 0x18 pc = g(x) + 0x10 Stack Heap Coroutine Frame g(x) 0xffab1234 resumptionAddress : g(x) + 0x18 localVariabl es : tmp = 20 x = 9 coroState = 42 promise : 20 registers: rax = 0x1234 rdx = 0x1234

Stackless coroutines f(x) local variables: temp = 9 coro = 0xffab1234 res1 = ??? res2 = ??? res3 = ??? params: x = 3 resumption address: main() + 0x42 pc = f(x) + 0x10 Stack Heap Coroutine Frame g(x) 0xffab1234 resumptionAddress : g(x) + 0x18 localVariables : tmp = 20 x = 9 coroState = 42 promise : 20 registers: rax = 0x1234 rdx = 0x1234

Stackless coroutines f(x) local variables: temp = 9 coro = 0xffab1234 res1 = 20 res2 = ??? res3 = ??? params: x = 3 resumption address: main() + 0x42 pc = f(x) + 0x10 Stack Heap Coroutine Frame g(x) 0xffab1234 resumptionAddress : g(x) + 0x18 localVariables : tmp = 20 x = 9 coroState = 42 promise : 20 registers: rax = 0x1234 rdx = 0x1234

Stackless coroutines f(x) local variables: temp = 9 coro = 0xffab1234 res1 = 20 res2 = ??? res3 = ??? params: x = 3 resumption address: main() + 0x42 pc = f(x) + 0x10 Stack Heap Coroutine Frame g(x) 0xffab1234 resumptionAddress : g(x) + 0xF localVariables : tmp = 20 x = 9 coroState = 42 promise : 20 registers: rax = 0x1234 rdx = 0x1234 g(x) local variables: tmp = 20 coroHandle = 0xffab1234 params: x = 9 resumption address: f(x) + 0x18 pc = g(x) + 0x18

Stackless coroutines f(x) local variables: temp = 9 coro = 0xffab1234 res1 = 20 res2 = ??? res3 = ??? params: x = 3 resumption address: main() + 0x42 pc = f(x) + 0x10 Stack Heap Coroutine Frame g(x) 0xffab1234 resumptionAddress : g(x) + 0xF localVariables : tmp = 20 x = 9 coroState = 42 promise : 20 registers: rax = 0x1234 rdx = 0x1234 g(x) local variables: tmp = 20 coroHandle = 0xffab1234 params: x = 9 resumption address: f(x) + 0x18 pc = g(x) + 0x18 h(x) local variables params: x = 20 resumption address: g(x) + 0x20 pc = h(x) + 0x8

Stackless coroutines f(x) local variables: temp = 9 coro = 0xffab1234 res1 = 20 res2 = ??? res3 = ??? params: x = 3 resumption address: main() + 0x42 pc = f(x) + 0x10 Stack Heap Coroutine Frame g(x) 0xffab1234 resumptionAddress : g(x) + 0xF localVariables : tmp = 20 x = 9 coroState = 42 promise : 20 registers: rax = 0x1234 rdx = 0x1234 g(x) local variables: tmp = 20 coroHandle = 0xffab1234 h.result = 40 params: x = 9 resumption address: f(x) + 0x18 pc = g(x) + 0x20

Stackless coroutines f(x) local variables: temp = 9 coro = 0xffab1234 res1 = 20 res2 = ??? res3 = ??? params: x = 3 resumption address: main() + 0x42 pc = f(x) + 0x10 Stack Heap Coroutine Frame g(x) 0xffab1234 resumptionAddress : g(x) + 0xF localVariables : tmp = 20 x = 9 coroState = 42 promise : 20 registers: rax = 0x1234 rdx = 0x1234 g(x) local variables: tmp = 20 coroHandle = 0xffab1234 h.result = 40 $1 = -40 params: x = 9 resumption address: f(x) + 0x18 pc = g(x) + 0x28

Stackless coroutines f(x) local variables: temp = 9 coro = 0xffab1234 res1 = 20 res2 = ??? res3 = ??? params: x = 3 resumption address: main() + 0x42 pc = f(x) + 0x10 Stack Heap Coroutine Frame g(x) 0xffab1234 resumptionAddress : g(x) + 0x38 localVariables : tmp = 20 x = 9 coroState = 37 promise : -40 registers: rax = 0x2456 rdx = 0x2456 g(x) local variables: tmp = 20 coroHandle = 0xffab1234 h.result = 40 $1 = -40 params: x = 9 resumption address: f(x) + 0x18 pc = g(x) + 0x30

Stackless coroutines Saves only the state of current activation frame Creates a snapshot of required data on the heap Ephemeral local data on the stack Heap allocation can be optimized out by the compiler

Fibers f(x) local variables: task = 0xffab1234 params: x = 4 resumption address: main() + 0x42 pc = f(x) + 0x100 Stack Heap 0xffab1234 Fiber Context: f$anonfun$0(x) resumptionAddress: anonfun$0 + 0x00 state = new promise : ??? stack = null registers: rsp =??? rbp = ??? rdx = ???

F i bers f(x) local variables: task = 0xffab1234 params: x = 4 resumption address: main() + 0x42 pc = f(x) + 0x100 Stack Heap 0xffab1234 Fiber Context: f$anonfun$0(x) resumptionAddress: anonfun$0 + 0x00 state = active promise : stack = 0xbadcafe registers: rsp = 0xbad123 rbp = 0x1234 rdx = 0xcafe42 f$anonfun$0(x) local variables: temp=8 g.result = ??? params: x = 4 resumption address: f() + 0x108 pc = f$anonfun$0 + 0x18 resumption address: f$ anonfun$0 () + 0x20 pc = g(x) + 0x30 g(x) resumption address: g(x) + 0x38 pc = anonfun$1 + 0x50 anonfun$1(x) resumption address: anonfun$1(x) + 0x58 pc = anonfun$2 + 0x40 anonfun$2(x)

Fibers f(x) local variables: task = 0xffab1234 x = 0 params: x = 4 resumption address: main() + 0x42 pc = f(x) + 0x100 Stack Heap 0xffab1234 Fiber Context: f$anonfun$0(x) resumptionAddress: anonfun$0 + 0x00 state = suspended promise : stack = 0xbadcafe registers: rsp = 0xbad123 rbp = 0x1234 rdx = 0xcafe42 Stack snapshot: 0xbadcafe f$anonfun$0(x) local variables: temp=8 g.result = ??? params: x = 4 resumption address: f() + 0x108 pc = f$anonfun$0 + 0x18 resumption address: f$ anonfun$0 () + 0x20 pc = g(x) + 0x30 g(x) resumption address: g(x) + 0x38 pc = anonfun$1 + 0x50 anonfun$1(x) resumption address: anonfun$1(x) + 0x58 pc = anonfun$2 + 0x40 anonfun$2(x)

Fibers f(x) local variables: task = 0xffab1234 x = 0 params: x = 4 resumption address: main() + 0x42 pc = f(x) + 0x100 Stack 0xffab1234 Fiber Context: f$anonfun$0(x) resumptionAddress: anonfun$0 + 0x00 state = active promise : stack = 0xbadcafe registers: rsp = 0xbad123 rbp = 0x1234 rdx = 0xcafe42 f$anonfun$0(x) local variables: temp=8 g.result = ??? params: x = 4 resumption address: f() + 0x108 pc = f$anonfun$0 + 0x18 resumption address: f$ anonfun$0 () + 0x20 pc = g(x) + 0x30 g(x) resumption address: g(x) + 0x38 pc = anonfun$1 + 0x50 anonfun$1(x) resumption address: anonfun$1(x) + 0x58 pc = anonfun$2 + 0x40 anonfun$2(x)

Fibers Possible stack provision: Eager fiber stack allocated on creation Lazily copied to the heap on suspend Typically 1-10x slower suspend/resume then stackless coroutines Still way faster then thread context-switch

Quick summary Stackfull Coroutines Fibers Context switching Suspension at any point Subroutines Ordinary function No special mechanism No suspension Stackless Coroutines Generators State machine / Continuation Passing Style Only top-level suspension

Coroutines a’la Scala Native

LLVM stackless coroutines Coroutines Lowering Switch-Resume Async Returned-Continuation

LLVM stackless coroutines

LLVM stackless coroutines Transformed into state machine by LLVM Body needs to be inlined Transformed to cleanup function Transformed into ramp function

LLVM stackless coroutines

LLVM stackless coroutines

LLVM stackless coroutines

LLVM stackless coroutines

LLVM stackless coroutines tinyurl.com/ stackless-coroutines

Fibers, threads and user context

Switching-context: assembly Non portable Different registers available depending on CPU architecture and OS Hard to maintain

Switching-context: ucontext_t POSIX specific Simple API Deprecated and removed in latest APIs

Switching-context: setjmp/longjmp Portable between Unix/Windows Can implement exception handling Not available on some platforms (WebAssembly)

Switching-context: setjmp/longjmp

boundary-suspend Snapshot of context for each handler and continuation Create snapshot of continuation before moving control to handler Restore snapshot before resumption Greatly simplified!

boundary-suspend tinyurl.com/ boundary -suspend

boundary-suspend tinyurl.com/ boundary-suspend