.NET has LOADS of options for executing code concurrently. Except, sometimes it’s a little confusing what to use - threads, tasks or async/await? What about Reactive Extensions, Dataflow or Channels? Wait, what even is Dataflow? I’ve never heard of Channels! And now we’ve got async enumerable ...
.NET has LOADS of options for executing code concurrently. Except, sometimes it’s a little confusing what to use - threads, tasks or async/await? What about Reactive Extensions, Dataflow or Channels? Wait, what even is Dataflow? I’ve never heard of Channels! And now we’ve got async enumerable as well?! And what’s the deal with Interlocked, SemaphoreSlim, and friends?! It’s all getting a bit much. In this talk, we’ll take a look at all the different ways you can do concurrent programming with C#, see how they work and when you’d choose one approach over another.
Size: 3.82 MB
Language: en
Added: Oct 09, 2024
Slides: 43 pages
Slide Content
I See Deadlocks!
Matt Ellis : JetBrains
Let’s talk async
Async programming is doing stuff
while other stuff is happening
Concurrency vs Parallelism
Concurrency:
doing stuff at the same time as other stuff
Parallelism:
doing stuff at the same time as other stuff
Concurrency vs Parallelism
Concurrency:
Managing multiple tasks at the same time
Parallelism:
Executing multiple tasks at the same time
Threads
Threads
•Basic unit of parallel execution, managed by the OS
•Create a new thread to run code in parallel
Runs a function, then exits
•Expensive resource
Slow to create, significant memory requirements for stack
•Blocking APIs can lead to running out of threads
Inefficient to have expensive threads sitting doing nothing
Mitigations
•Thread pool to reuse existing threads
Requires scheduling some kind of work item
•Async APIs to release thread execution
Requires registering a work item to be called back when API completes
•Needs a new abstraction
Tasks
Task Parallel Library
Explicit API
•A Task represents an asynchronous operation that can return a value
•Tasks queued to the thread pool
•Helper methods to wait (sync/async) for one or more tasks to complete
Returning a Task!
•Chaining tasks with continuations
Use different schedulers to post to other threads (UI)
•Exceptions propagated back to the waiter
•Cooperative cancellation with tokens
Green Threads
Threads in .NET are expensive
Tasks are much cheaper, but different model
Threads in .NET are an abstraction
Why do they map to OS threads?
Green Threads
•Abstraction that represents a thread, but separate from OS
•Scheduled in user mode
•Runs on one platform thread - n:1 (typically)
•Significantly cheaper - can create tens of thousands
•Cooperative - have to yield control
If the platform thread blocks, all green threads are blocked
•Java 1.1 > 1.3 only had green threads
To support threading on platforms with no native multithreading
Virtual Threads
Java 19/21 (aka Loom)
•Like virtual memory, but for threads
•Similar to green threads, but run on multiple platform threads - m:n
•Blocking calls park the thread and make an async call
Requires underlying runtime API changes
•Very simple programming style
Create a virtual thread per task. No thread pools
•Not just Java
Google Chrome has virtual threads, see also Go’s coroutines
Are Virtual Threads a Silver Bullet?
It depends
•Write in a very simple, sequential style
•Requires support across all traditionally blocking APIs
•Good for tasks that are IO bound, not CPU bound
•Doesn’t mix well with other async programming styles
•Unclear on memory implications
Each thread is cheaper, but that means you create more
•No control over scheduling (virtual threads don’t have OS priority, etc.)
•Doesn’t help composing tasks
Virtual Threads and .NET
dotnet/runtimelab#2398
•CLR team experimented with green threads in .NET 8
•Implemented it - proved it was viable
•Performance was slower, but competitive. Unoptimised
•Significant unresolved technical challenges
E.g. byref, ref struct, sync-async interop, …
•Too many runtime APIs need updating
Some are async only!
•Experiment parked
async/await
Language Support for Asynchronous Programming
•Enables writing in (almost) sequential style
•Compiler rewrites async method to state machine
•Async methods can await another async method
•Built on top of TPL
•Registers continuation with Task from async method
Callback restarts state machine with the current state
•Nice programming model
Downsides
•Viral - what colour is your function?
•Cancellation is explicit
•No support for composing tasks
public static async void Handle(Request request, Response response )
{
var url1 = ...
var url2 = ...
try
{
var task1 = FetchUrl(url1);
var task2 = FetchUrl(url2);
var result = await Task.WhenAll(task1, task2);
response.Send(result[ 0] + result[1]);
}
catch (Exception e)
{
response.fail(e);
}
}
private static async Task<string> FetchUrl(string imageUrl)
{
return await client.GetStringAsync(imageUrl);
}
async2
Async2 Experiment
dotnet/runtime#94620
•After looking at green threads, starting investigating async2
(.NET 9 timeframe)
•Async state machine is implemented in C#
•What if it was part of the runtime?
•Why? Performance!
Async2 Experiment
dotnet/runtime#94620
•Let the runtime or the JIT capture program execution state
Rather than explicit application state in C#
•Capture stack, registers, current instruction pointer
Restore state to resume
•Handles GC + async1-async2 interop
•Two experiments: VM and JIT
•Can do a lot more with assembly than C#…
0.00%
200.00%
400.00%
600.00%
800.00%
1000.00%
1200.00%
1 2 4 8 16
iters/s async2 divided by iters/s Task base async
Stack Depth
Relative performance of existing async vs runtime async
Return NoSuspendThrow NoSuspendReturn SuspendThrow Suspend
Async2 Experiment
dotnet/runtime#94620
Structured Concurrency
Nathianel J Smith, 2018
vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/
Notes on structured concurrency, or:
Go statement considered harmful
Goto statement considered harmful
Structured Programming
Dijkstra
•Goto statement considered harmful
•Cannot reason about your program if you don’t know where it will jump to
•Introduce block structures - conditions, loops, subroutines
•Control flow is always sequential, nested, scoped
•Benefits:
•Automatic resource cleanup
•Error handling - throw exceptions up the call stack
Structured Concurrency
•Whenever control splits into concurrent paths, make sure they join
•Scoped concurrency
Wait for all paths to complete
•If one path fails, all paths fail
Structured Concurrency
Benefits
•Resource cleanup
Known lifetime of paths means known usage of resources
•Error handling
Exceptions propagate up (logical) call stack - tree of nested calls
•Cancellation
If one path fails, all paths fail
Nathianel J Smith, 2018
vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/
Notes on structured concurrency, or:
Go statement considered harmful
C# does not have structured concurrency *
Structured Concurrency
in Kotlin
•What if outer scope is cancelled? Images continue to load
•What if one image fails? Second image continues to load
elizarov.medium.com/structured-concurrency-722d765aa952
•All async tasks must be called inside a coroutine scope
•Scope waits for children to complete
•If the scope is cancelled, all child coroutines are cancelled
•If one child fails, all children are cancelled
•Global scope for long running tasks - make it explicit
elizarov.medium.com/structured-concurrency-722d765aa952
C# does not have structured concurrency
But there are alternatives and alternative constructs
github.com/StephenCleary/StructuredConcurrency
•Create tasks inside task groups
•Waiting for the group waits for all children
•Group provides cancellation token
•Can register resources for cleanup
•Only works with your own groups
Producer/Consumer
Producer/Consumer
•Common pattern
•Data comes in, requires processing
List of files, load and process
•Buffer data until consumer requires it
Backpressure
•Don’t produce more than you can consume
Buffering can run out of memory
•Cheap to load a file, expensive to process
Throttle loading to allow processing to catch up
•Strategies:
•Wait until space in the queue
•Drop items from queue
•Allow more consumers to run in parallel
System.Threading.Channels
Separate NuGet package
•Surprisingly simple API for producer/consumer
•Write to the channel to produce items
•Read from the channel to consume
•No built in support for pipelines
•Options to create bounded/unbounded or enable optimisations for single
threaded access
System.Threading.Channels
Producer
•Producer writes to the channel
•Calls TryWrite to write synchronously, but might fail if buffer is full
•WriteAsync will wait (asynchronously) for space in buffer
•WaitToWriteAsync will wait until space is in buffer without writing
Useful if expensive to produce data
•Methods are thread safe for multiple producers
var postPathsChannel = Channel.CreateUnbounded< string>(new UnboundedChannelOptions() {
SingleReader = false,
SingleWriter = true
});
var postPaths = Directory.GetFiles(Constants.PostsDirectory);
foreach (var postPath in postPaths)
{
await postPathsChannel.Writer.WriteAsync(postPath);
}
postPathsChannel.Writer.TryComplete();
System.Threading.Channels
Consumer
•Consumer reads from the channel
•Calls TryRead to read synchronously, but might fail if no data
•ReadAsync will wait (asynchronously) for data from producer
•WaitToReadAsync will wait until data is available
•ReadAllAsync will return IAsyncEnumerable
•Methods are thread safe for parallel consumption
var frontMatterChannel = Channel.CreateBounded<( string?, FrontMatter?)>(
new BoundedChannelOptions( 10) { SingleReader = false, SingleWriter = false });
while (await postPathsChannel.Reader.WaitToReadAsync())
{
while (postPathsChannel.Reader.TryRead( out var postPath))
{
var frontMatter = await generator.ReadFrontMatterAsync(postPath);
await frontMatterChannel.Writer.WriteAsync((postPath, frontMatter));
}
}
frontMatterChannel.Writer.TryComplete();
Observable Streams
Reactive Extensions
IObservable/IObserver
•Subscribe to stream of events
•Filter, combine and transform data like LINQ and enumerables
Where, Select, SelectMany, etc.
•Dual of IEnumerable - events are pushed to the consumer
Can be a little confusing as to when things happen
•Producer/consumer, but no built in support for back pressure
•Fine grained control of the pipeline
Reactive Extensions
•Events can come from any thread
•Register callbacks with different schedulers
E.g. dispatch to UI thread, queue to enable backpressure
•Friction with async methods
Need to convert async method’s Task into an Observable
•Composing operators is very powerful, but risks getting confusing
Significantly different model to other methods
•Complex API. No help from the language
Async Sequences
IAsyncEnumerable
Non-blocking infinite sequences
•Normal IEnumerable is blocking
•How to return a sequence of items that are asynchronously produced?
•Use await foreach to asynchronously enumerate results
•Use async method with yield statements to write
•Supports cancellation
•Integrates better/simpler with the ecosystem than observables
•System.Linq.Async package provides LINQ extension methods
static async IAsyncEnumerable<int> RangeAsync(int start, int count )
{
for (int i = 0; i < count; i++)
{
await Task.Delay(i);
yield return start + i;
}
}
await foreach (int item in RangeAsync(10, 3))
Console.Write(item + " "); Prints 10 11 12
Dataflow
learn.microsoft.com/en-us/dotnet/standard/parallel-programming/dataflow-task-parallel-library
Promote actor based programming by
providing in-process message passing for
coarse grained dataflow and pipelining tasks
System.Threading.Tasks.Dataflow
In-process message processing
•Not new. Part of .NET Framework 4.5 (2012)
•Part of/builds on the TPL
Separate NuGet package
•Uses “blocks” to buffer and process data
•Link blocks to build a pipeline
Blocks
Blocks, blocks, blocks
•Data structure to produce, receive or transform data
Source, target, propagator
•Each block has a buffer to help with back pressure
•Optional degree of parallelisation
•Predefined blocks
BufferBlock, ActionBlock, TransformBlock, …
•BufferBlock is essentially a producer/consumer channel
Can be significantly slower than Channels, but doing more work
And has built in parallelisation
Coarse Grained Concurrency
Build a pipeline
•Split pipeline into discrete tasks/blocks
E.g. read file, process image, upload data
•Individual tasks have different performance characteristics
CPU/IO-bound, short-lived, etc.
•Apply different buffers for back pressure
•Parallelise tasks/blocks for coarse grained concurrency
•Requires understanding tasks and expected data load
var downloadString = new TransformBlock<string, string>(async uri =>
{
return await new HttpClient(new HttpClientHandler).GetStringAsync(uri);
});
var createWordList = new TransformBlock<string, string[]>(text =>
{
char[] tokens = text.Select(c => char.IsLetter(c) ? c : ' ').ToArray();
text = new string(tokens);
return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
});
var filterWordList = new TransformBlock<string[], string[]>(words =>
{
return words
.Where(word => word.Length > 3)
.Distinct()
.ToArray();
});
var printReversedWords = new ActionBlock<string>(reversedWord =>
{
Console.WriteLine( "Found reversed words {0}/{1}" ,
reversedWord, new string(reversedWord.Reverse().ToArray()));
});
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
downloadString.LinkTo(createWordList, linkOptions);
createWordList.LinkTo(filterWordList, linkOptions);
filterWordList.LinkTo(findReversedWords, linkOptions);
findReversedWords.LinkTo(printReversedWords, linkOptions)
Process "The Iliad of Homer" by Homer.
downloadString.Post( "http:www.gutenberg.org/cache/epub/16452/pg16452.txt" );
What have we learnt?
What have we learnt?
•TPL is great, but very explicit
•Async/await is a nice model
But there are limitations
•Structured concurrency would be nice
•Other models can fill the gaps
Producer/consumer, streams, dataflow