I see deadlocks : Matt Ellis - Techorama NL 2024

citizenmatt 209 views 43 slides Oct 09, 2024
Slide 1
Slide 1 of 76
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55
Slide 56
56
Slide 57
57
Slide 58
58
Slide 59
59
Slide 60
60
Slide 61
61
Slide 62
62
Slide 63
63
Slide 64
64
Slide 65
65
Slide 66
66
Slide 67
67
Slide 68
68
Slide 69
69
Slide 70
70
Slide 71
71
Slide 72
72
Slide 73
73
Slide 74
74
Slide 75
75
Slide 76
76

About This Presentation

.NET has LOADS of options for executing code concurrently. Except, sometimes it’s a little confusing what to use - threads, tasks or async/await? What about Reactive Extensions, Dataflow or Channels? Wait, what even is Dataflow? I’ve never heard of Channels! And now we’ve got async enumerable ...


Slide Content

I See Deadlocks!
Matt Ellis : JetBrains

Let’s talk async

Async programming is doing stuff
while other stuff is happening

Concurrency vs Parallelism

Concurrency:
doing stuff at the same time as other stuff

Parallelism:
doing stuff at the same time as other stuff

Concurrency vs Parallelism

Concurrency:
Managing multiple tasks at the same time
Parallelism:
Executing multiple tasks at the same time

Threads

Threads
•Basic unit of parallel execution, managed by the OS
•Create a new thread to run code in parallel
Runs a function, then exits
•Expensive resource
Slow to create, significant memory requirements for stack
•Blocking APIs can lead to running out of threads
Inefficient to have expensive threads sitting doing nothing

Mitigations
•Thread pool to reuse existing threads
Requires scheduling some kind of work item
•Async APIs to release thread execution
Requires registering a work item to be called back when API completes
•Needs a new abstraction

Tasks

Task Parallel Library
Explicit API
•A Task represents an asynchronous operation that can return a value
•Tasks queued to the thread pool
•Helper methods to wait (sync/async) for one or more tasks to complete
Returning a Task!
•Chaining tasks with continuations
Use different schedulers to post to other threads (UI)
•Exceptions propagated back to the waiter
•Cooperative cancellation with tokens

Green Threads

Threads in .NET are expensive
Tasks are much cheaper, but different model

Threads in .NET are an abstraction
Why do they map to OS threads?

Green Threads
•Abstraction that represents a thread, but separate from OS
•Scheduled in user mode
•Runs on one platform thread - n:1 (typically)
•Significantly cheaper - can create tens of thousands
•Cooperative - have to yield control
If the platform thread blocks, all green threads are blocked
•Java 1.1 > 1.3 only had green threads
To support threading on platforms with no native multithreading

Virtual Threads
Java 19/21 (aka Loom)
•Like virtual memory, but for threads
•Similar to green threads, but run on multiple platform threads - m:n
•Blocking calls park the thread and make an async call
Requires underlying runtime API changes
•Very simple programming style
Create a virtual thread per task. No thread pools
•Not just Java
Google Chrome has virtual threads, see also Go’s coroutines

Are Virtual Threads a Silver Bullet?
It depends
•Write in a very simple, sequential style
•Requires support across all traditionally blocking APIs
•Good for tasks that are IO bound, not CPU bound
•Doesn’t mix well with other async programming styles
•Unclear on memory implications
Each thread is cheaper, but that means you create more
•No control over scheduling (virtual threads don’t have OS priority, etc.)
•Doesn’t help composing tasks

void handle(Request request, Response response) {
var url1 = ...
var url2 = ...
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
var future1 = executor.submit(() -> fetchURL(url1));
var future2 = executor.submit(() -> fetchURL(url2));
response.send(future1.get() + future2.get());
} catch (ExecutionException | InterruptedException e) {
response.fail(e);
}
}
String fetchURL(URL url) throws IOException {
try (var in = url.openStream()) {
return new String(in.readAllBytes(), StandardCharsets.UTF_8);
}
}

Virtual Threads and .NET
dotnet/runtimelab#2398
•CLR team experimented with green threads in .NET 8
•Implemented it - proved it was viable
•Performance was slower, but competitive. Unoptimised
•Significant unresolved technical challenges
E.g. byref, ref struct, sync-async interop, …
•Too many runtime APIs need updating
Some are async only!
•Experiment parked

async/await

Language Support for Asynchronous Programming
•Enables writing in (almost) sequential style
•Compiler rewrites async method to state machine
•Async methods can await another async method
•Built on top of TPL
•Registers continuation with Task from async method
Callback restarts state machine with the current state
•Nice programming model

Downsides
•Viral - what colour is your function?
•Cancellation is explicit
•No support for composing tasks

public static async void Handle(Request request, Response response )
{
var url1 = ...
var url2 = ...
try
{
var task1 = FetchUrl(url1);
var task2 = FetchUrl(url2);
var result = await Task.WhenAll(task1, task2);
response.Send(result[ 0] + result[1]);
}
catch (Exception e)
{
response.fail(e);
}
}
private static async Task<string> FetchUrl(string imageUrl)
{
return await client.GetStringAsync(imageUrl);
}

async2

Async2 Experiment
dotnet/runtime#94620
•After looking at green threads, starting investigating async2
(.NET 9 timeframe)
•Async state machine is implemented in C#
•What if it was part of the runtime?
•Why? Performance!

Async2 Experiment
dotnet/runtime#94620
•Let the runtime or the JIT capture program execution state
Rather than explicit application state in C#
•Capture stack, registers, current instruction pointer
Restore state to resume
•Handles GC + async1-async2 interop
•Two experiments: VM and JIT
•Can do a lot more with assembly than C#…

0.00%
200.00%
400.00%
600.00%
800.00%
1000.00%
1200.00%
1 2 4 8 16
iters/s async2 divided by iters/s Task base async
Stack Depth
Relative performance of existing async vs runtime async
Return NoSuspendThrow NoSuspendReturn SuspendThrow Suspend

Async2 Experiment
dotnet/runtime#94620

Structured Concurrency

Nathianel J Smith, 2018
vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/
Notes on structured concurrency, or:
Go statement considered harmful

Goto statement considered harmful

Structured Programming
Dijkstra
•Goto statement considered harmful
•Cannot reason about your program if you don’t know where it will jump to
•Introduce block structures - conditions, loops, subroutines
•Control flow is always sequential, nested, scoped
•Benefits:
•Automatic resource cleanup
•Error handling - throw exceptions up the call stack

Structured Concurrency
•Whenever control splits into concurrent paths, make sure they join
•Scoped concurrency
Wait for all paths to complete
•If one path fails, all paths fail

Structured Concurrency
Benefits
•Resource cleanup
Known lifetime of paths means known usage of resources
•Error handling
Exceptions propagate up (logical) call stack - tree of nested calls
•Cancellation
If one path fails, all paths fail

Nathianel J Smith, 2018
vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/
Notes on structured concurrency, or:
Go statement considered harmful

C# does not have structured concurrency *

Structured Concurrency
in Kotlin

•What if outer scope is cancelled? Images continue to load
•What if one image fails? Second image continues to load
elizarov.medium.com/structured-concurrency-722d765aa952

•All async tasks must be called inside a coroutine scope
•Scope waits for children to complete
•If the scope is cancelled, all child coroutines are cancelled
•If one child fails, all children are cancelled
•Global scope for long running tasks - make it explicit
elizarov.medium.com/structured-concurrency-722d765aa952

C# does not have structured concurrency
But there are alternatives and alternative constructs

github.com/StephenCleary/StructuredConcurrency
•Create tasks inside task groups
•Waiting for the group waits for all children
•Group provides cancellation token
•Can register resources for cleanup
•Only works with your own groups

Producer/Consumer

Producer/Consumer
•Common pattern
•Data comes in, requires processing
List of files, load and process
•Buffer data until consumer requires it

Backpressure
•Don’t produce more than you can consume
Buffering can run out of memory
•Cheap to load a file, expensive to process
Throttle loading to allow processing to catch up
•Strategies:
•Wait until space in the queue
•Drop items from queue
•Allow more consumers to run in parallel

System.Threading.Channels
Separate NuGet package
•Surprisingly simple API for producer/consumer
•Write to the channel to produce items
•Read from the channel to consume
•No built in support for pipelines
•Options to create bounded/unbounded or enable optimisations for single
threaded access

System.Threading.Channels
Producer
•Producer writes to the channel
•Calls TryWrite to write synchronously, but might fail if buffer is full
•WriteAsync will wait (asynchronously) for space in buffer
•WaitToWriteAsync will wait until space is in buffer without writing
Useful if expensive to produce data
•Methods are thread safe for multiple producers

var postPathsChannel = Channel.CreateUnbounded< string>(new UnboundedChannelOptions() {
SingleReader = false,
SingleWriter = true
});
var postPaths = Directory.GetFiles(Constants.PostsDirectory);
foreach (var postPath in postPaths)
{
await postPathsChannel.Writer.WriteAsync(postPath);
}
postPathsChannel.Writer.TryComplete();

System.Threading.Channels
Consumer
•Consumer reads from the channel
•Calls TryRead to read synchronously, but might fail if no data
•ReadAsync will wait (asynchronously) for data from producer
•WaitToReadAsync will wait until data is available
•ReadAllAsync will return IAsyncEnumerable
•Methods are thread safe for parallel consumption

var frontMatterChannel = Channel.CreateBounded<( string?, FrontMatter?)>(
new BoundedChannelOptions( 10) { SingleReader = false, SingleWriter = false });
while (await postPathsChannel.Reader.WaitToReadAsync())
{
while (postPathsChannel.Reader.TryRead( out var postPath))
{
var frontMatter = await generator.ReadFrontMatterAsync(postPath);
await frontMatterChannel.Writer.WriteAsync((postPath, frontMatter));
}
}
frontMatterChannel.Writer.TryComplete();

Observable Streams

Reactive Extensions
IObservable/IObserver
•Subscribe to stream of events
•Filter, combine and transform data like LINQ and enumerables
Where, Select, SelectMany, etc.
•Dual of IEnumerable - events are pushed to the consumer
Can be a little confusing as to when things happen
•Producer/consumer, but no built in support for back pressure
•Fine grained control of the pipeline

Reactive Extensions
•Events can come from any thread
•Register callbacks with different schedulers
E.g. dispatch to UI thread, queue to enable backpressure
•Friction with async methods
Need to convert async method’s Task into an Observable
•Composing operators is very powerful, but risks getting confusing
Significantly different model to other methods
•Complex API. No help from the language

Async Sequences

IAsyncEnumerable
Non-blocking infinite sequences
•Normal IEnumerable is blocking
•How to return a sequence of items that are asynchronously produced?
•Use await foreach to asynchronously enumerate results
•Use async method with yield statements to write
•Supports cancellation
•Integrates better/simpler with the ecosystem than observables
•System.Linq.Async package provides LINQ extension methods

static async IAsyncEnumerable<int> RangeAsync(int start, int count )
{
  for (int i = 0; i < count; i++)
  {
    await Task.Delay(i);
    yield return start + i;
  }
}
await foreach (int item in RangeAsync(10, 3))
  Console.Write(item + " "); Prints 10 11 12

Dataflow

learn.microsoft.com/en-us/dotnet/standard/parallel-programming/dataflow-task-parallel-library
Promote actor based programming by
providing in-process message passing for
coarse grained dataflow and pipelining tasks

System.Threading.Tasks.Dataflow
In-process message processing
•Not new. Part of .NET Framework 4.5 (2012)
•Part of/builds on the TPL
Separate NuGet package
•Uses “blocks” to buffer and process data
•Link blocks to build a pipeline

Blocks
Blocks, blocks, blocks
•Data structure to produce, receive or transform data
Source, target, propagator
•Each block has a buffer to help with back pressure
•Optional degree of parallelisation
•Predefined blocks
BufferBlock, ActionBlock, TransformBlock, …
•BufferBlock is essentially a producer/consumer channel
Can be significantly slower than Channels, but doing more work
And has built in parallelisation

Coarse Grained Concurrency
Build a pipeline
•Split pipeline into discrete tasks/blocks
E.g. read file, process image, upload data
•Individual tasks have different performance characteristics
CPU/IO-bound, short-lived, etc.
•Apply different buffers for back pressure
•Parallelise tasks/blocks for coarse grained concurrency
•Requires understanding tasks and expected data load

var downloadString = new TransformBlock<string, string>(async uri =>
{
return await new HttpClient(new HttpClientHandler).GetStringAsync(uri);
});
var createWordList = new TransformBlock<string, string[]>(text =>
{
char[] tokens = text.Select(c => char.IsLetter(c) ? c : ' ').ToArray();
text = new string(tokens);
return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
});
var filterWordList = new TransformBlock<string[], string[]>(words =>
{
return words
.Where(word => word.Length > 3)
.Distinct()
.ToArray();
});
var printReversedWords = new ActionBlock<string>(reversedWord =>
{
Console.WriteLine( "Found reversed words {0}/{1}" ,
reversedWord, new string(reversedWord.Reverse().ToArray()));
});

var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
downloadString.LinkTo(createWordList, linkOptions);
createWordList.LinkTo(filterWordList, linkOptions);
filterWordList.LinkTo(findReversedWords, linkOptions);
findReversedWords.LinkTo(printReversedWords, linkOptions)
Process "The Iliad of Homer" by Homer.
downloadString.Post( "http:www.gutenberg.org/cache/epub/16452/pg16452.txt" );

What have we learnt?

What have we learnt?
•TPL is great, but very explicit
•Async/await is a nice model
But there are limitations
•Structured concurrency would be nice
•Other models can fill the gaps
Producer/consumer, streams, dataflow

Thanks for listening
Tags