Performance Pitfalls of Rust Async Function Pointers (And Why It Might Not Matter)

ScyllaDB 205 views 18 slides Oct 14, 2024

About This Presentation

An in-depth analysis of asynchronous function pointers in Rust: why they aren't a real thing (compared to normal function pointers), and a performance analysis of each way of constructing them, from boxed async functions to enum dispatch to StackFutures.


Slide Content

A ScyllaDB Community
Performance Pitfalls of Rust Async Function Pointers (And Why It Might Not Matter)
Byron Wasti
Founder of Balter Load Testing

Byron Wasti (he/him)

Founder of Balter Load Testing
■Programming with Rust for 7+ years, 4+ professionally
■Focused on building robust, high-performance, low-latency systems
■Developer of Balter, an open-source load testing framework for Rust (github.com/BalterLoadTesting/balter)

Motivation: Building a Load Testing Framework
■Ability to run a user-provided function repeatedly and in parallel

// Balter code
pub fn load_test(user_func: fn()) {
    loop {
        user_func();
    }
}

// User code
fn main() {
    balter::load_test(my_load_test_scenario);
}

fn my_load_test_scenario() {
    // ...
}

Function Pointers
■Any function with the same signature

■Run the function in multiple threads

■Send the function to other machines (it's just a pointer)

use std::thread;

pub fn load_test(user_func: fn()) {
    for _ in 0..THREADS {
        // `move` copies the function pointer into each thread's closure
        thread::spawn(move || {
            loop {
                user_func();
            }
        });
    }
}

load_test(my_func_a);
load_test(my_func_b);
load_test(my_func_c);
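
A bounded variant of the thread-spawning idea above can be sketched as follows. This is only an illustration: `THREADS`, `ITERATIONS`, and the body of `my_func_a` are made-up values that do not come from Balter, and the loop is capped so that the program terminates.

```rust
use std::hint::black_box;
use std::thread;

// Illustrative constants, not taken from Balter.
const THREADS: usize = 4;
const ITERATIONS: usize = 1_000;

/// Calls `user_func` ITERATIONS times on each of THREADS threads and
/// returns the total number of calls made.
pub fn load_test(user_func: fn()) -> usize {
    let handles: Vec<_> = (0..THREADS)
        .map(|_| {
            // `fn()` is `Copy + Send + 'static`, so `move` gives every
            // thread its own copy of the pointer.
            thread::spawn(move || {
                for _ in 0..ITERATIONS {
                    user_func();
                }
                ITERATIONS
            })
        })
        .collect();

    // Wait for every thread and add up the per-thread call counts.
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

// Hypothetical scenario function standing in for user code.
fn my_func_a() {
    black_box(1 + 1);
}

fn main() {
    let total = load_test(my_func_a);
    println!("{total} calls");
}
```

Any other `fn()` can be passed to `load_test` without changing its signature, which is the property the async versions later in the talk try to recover.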

Async Function Pointers
For IO-bound tasks (e.g. HTTP requests), async promises better performance

// What we would like to write. This does not compile:
// Rust has no `async fn()` pointer type.
pub async fn load_test(user_func: async fn()) {
    for _ in 0..TASKS {
        tokio::spawn(async move {
            loop {
                user_func().await;
            }
        });
    }
}

async fn foo() {
}

load_test(foo).await;

Async Functions in Rust
■Desugar into normal functions returning `impl Future<Output=?>`
■The compiler auto-generates an opaque type for the `impl Trait`
async fn foo() -> i32 {
}

async fn bar() -> i32 {
}

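// Compiler error! `impl Trait` cannot be used in this position, and
// foo and bar return two different opaque future types anyway.
let arr: [fn() -> impl Future<Output=i32>] = [foo, bar];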
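
One way to see that each async function gets its own opaque future type is to ask the compiler for the type names. The sketch below is illustrative only: the helper `future_type_name` is a made-up name, and the exact strings printed are compiler-dependent, so no output is shown.

```rust
use std::any::type_name;
use std::future::Future;

async fn foo() -> i32 {
    1
}

async fn bar() -> i32 {
    2
}

/// Hypothetical helper: returns the compiler's name for the concrete
/// future type `F` behind an `impl Future`.
fn future_type_name<F: Future<Output = i32>>(_fut: &F) -> &'static str {
    type_name::<F>()
}

fn main() {
    // Same signature, same Output, but two distinct anonymous types.
    println!("{}", future_type_name(&foo()));
    println!("{}", future_type_name(&bar()));
}
```

Because the two types differ, there is no single `fn() -> SomeFuture` pointer type that both `foo` and `bar` can coerce to.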

Type-Erased Async Function Pointers
■Common workaround is to use `Box::pin()`
fn foo() -> Pin<Box<dyn Future<Output=i32>>> {
    Box::pin(async {
        // Our usual async code
    })
}

fn bar() -> Pin<Box<dyn Future<Output=i32>>> {
    Box::pin(async {
        // Our usual async code
    })
}

// This works now!
let arr = [foo, bar];
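
A complete version of the boxed workaround, using only the standard library, might look like the sketch below. The `BoxFuture` alias, the `run_all` function, and the toy `block_on` executor are assumptions added for illustration; real code would normally use a runtime such as Tokio instead of this busy-polling loop.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Illustrative alias for the type-erased future.
type BoxFuture = Pin<Box<dyn Future<Output = i32>>>;

fn foo() -> BoxFuture {
    Box::pin(async { 1 })
}

fn bar() -> BoxFuture {
    Box::pin(async { 2 })
}

// Waker that does nothing; enough for futures that never really wait.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: polls in a loop until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Stores both functions as plain function pointers and runs each one.
fn run_all() -> Vec<i32> {
    let arr: [fn() -> BoxFuture; 2] = [foo, bar];
    // Every call to `f()` performs one heap allocation for the future.
    arr.iter().map(|f| block_on(f())).collect()
}

fn main() {
    println!("{:?}", run_all());
}
```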

Performance Characteristics
use std::hint::black_box;

fn main() {
    load_test(black_box(foo));
}

fn load_test(func: fn(i32) -> i32) {
    for i in 0..250_000_000 {
        let _res = func(i);
    }
}

fn foo(arg: i32) -> i32 {
    black_box(arg * 2)
}

Performance Characteristics

Benchmark                 Time (mean ± σ)        Range (min … max)
Function Pointer          429.1 ms ± 7.0 ms      418.9 ms … 436.7 ms
Boxed Function Pointer    537.9 ms ± 2.5 ms      536.1 ms … 544.0 ms
Async Function            407.6 ms ± 3.6 ms      403.7 ms … 411.6 ms
Boxed Async Function      4.985 s ± 0.090 s      4.922 s … 5.198 s
Source: https://github.com/byronwasti/async-fn-pointer-perf

What is (Probably) Going On?


■Boxed async functions are roughly an order of magnitude slower than boxed functions (4.985 s vs. 537.9 ms in the table above)

■Heap allocation for async functions includes the opaque state-machine struct the compiler generates
●A normal boxed function is just… a pointer on the heap
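
The size difference can be made visible with `size_of_val`. In the sketch below, `small` and `large` are made-up async functions; the exact future sizes depend on the compiler version, so only the lower bound implied by the 1024-byte buffer is relied on.

```rust
use std::mem::{size_of, size_of_val};

async fn small() -> i32 {
    1
}

async fn large() -> i32 {
    let buf = [1u8; 1024];
    // `buf` is used after the await point, so it has to be stored
    // inside the future's state machine.
    std::future::ready(()).await;
    buf.iter().map(|&b| b as i32).sum()
}

/// Size of a plain function pointer.
fn fn_pointer_size() -> usize {
    size_of::<fn() -> i32>()
}

/// Size of the state machine that `Box::pin(small())` would allocate.
fn small_future_size() -> usize {
    size_of_val(&small())
}

/// Size of the state machine that `Box::pin(large())` would allocate.
fn large_future_size() -> usize {
    size_of_val(&large())
}

fn main() {
    println!("fn pointer:   {} bytes", fn_pointer_size());
    println!("small future: {} bytes", small_future_size());
    println!("large future: {} bytes", large_future_size());
}
```

Whatever `Box::pin()` allocates on each call has to hold the whole state machine, so the cost grows with the locals kept alive across `.await` points.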

Alternative 1: `Box::pin()` at the Edge
■Use generics so that `Box::pin()` is called once per load test instead of once per function call.


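async fn load_test<T, F>(func: T)
where
    T: Fn() -> F,
    F: Future<Output=i32>,
{
    loop {
        func().await;
    }
}

async fn foo() -> i32 {
}

// The annotation erases the two different `load_test` future types
let arr: [Pin<Box<dyn Future<Output=()>>>; 2] =
    [Box::pin(load_test(foo)), Box::pin(load_test(bar))];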
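
A runnable sketch of the same idea follows. It departs from the slide in a few labeled ways: the loop is bounded by an `iterations` parameter and returns a sum so that it terminates, and the `Task` alias, the `boxed` helper, and the toy `block_on` executor are hypothetical additions rather than part of Balter.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Generic over the concrete future type, so calling `func()` inside the
/// loop needs no boxing and no dynamic dispatch.
async fn load_test<T, F>(func: T, iterations: u32) -> i32
where
    T: Fn() -> F,
    F: Future<Output = i32>,
{
    let mut total = 0;
    for _ in 0..iterations {
        total += func().await;
    }
    total
}

async fn foo() -> i32 {
    1
}

async fn bar() -> i32 {
    2
}

// Illustrative alias and helper: the only place a future gets boxed.
type Task = Pin<Box<dyn Future<Output = i32>>>;

fn boxed<F: Future<Output = i32> + 'static>(fut: F) -> Task {
    Box::pin(fut)
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: polls in a loop until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// One allocation per load test, regardless of the iteration count.
fn run() -> Vec<i32> {
    let tasks: Vec<Task> = vec![boxed(load_test(foo, 10)), boxed(load_test(bar, 10))];
    tasks.into_iter().map(block_on).collect()
}

fn main() {
    println!("{:?}", run());
}
```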

Performance Characteristics
Benchmark                 Time (mean ± σ)        Range (min … max)
Function Pointer          429.1 ms ± 7.0 ms      418.9 ms … 436.7 ms
Boxed Function Pointer    537.9 ms ± 2.5 ms      536.1 ms … 544.0 ms
Async Function            407.6 ms ± 3.6 ms      403.7 ms … 411.6 ms
Boxed Async Function      4.985 s ± 0.090 s      4.922 s … 5.198 s
Generic Async Boxed       318.1 ms ± 1.2 ms      317.1 ms … 320.9 ms
Source: https://github.com/byronwasti/async-fn-pointer-perf

Alternative 2: Use an Enum
async fn load_test(func: Func)
{
    loop {
        func.run().await;
    }
}

async fn foo() -> i32 {
}

async fn bar() -> i32 {
}

enum Func {
    Foo,
    Bar,
}

impl Func {
    async fn run(&self) -> i32 {
        match self {
            Func::Foo => foo().await,
            Func::Bar => bar().await,
        }
    }
}

Performance Characteristics
Benchmark                 Time (mean ± σ)        Range (min … max)
Function Pointer          429.1 ms ± 7.0 ms      418.9 ms … 436.7 ms
Boxed Function Pointer    537.9 ms ± 2.5 ms      536.1 ms … 544.0 ms
Async Function            407.6 ms ± 3.6 ms      403.7 ms … 411.6 ms
Boxed Async Function      4.985 s ± 0.090 s      4.922 s … 5.198 s
Generic Async Boxed       318.1 ms ± 1.2 ms      317.1 ms … 320.9 ms
Async Enum Dispatch       526.5 ms ± 0.8 ms      525.6 ms … 528.1 ms
Source: https://github.com/byronwasti/async-fn-pointer-perf

Alternative 3: Reset the Future
■Used by the Tower rate-limiting functionality [1]

■Unfortunately, there is no generic way to implement this

pub struct RateLimit {
    ...
    sleep: Pin<Box<Sleep>>,
}

impl RateLimit {
    fn call() {
        ...
        // The service is disabled until further notice
        // Reset the sleep future in place, so that we don't have to
        // deallocate the existing box and allocate a new one.
        self.sleep.as_mut().reset(until);
    }
}
[1] https://docs.rs/tower/latest/src/tower/limit/rate/service.rs.html#106-109
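
The reset-in-place pattern can be sketched without Tower or Tokio. Everything in the block below is hypothetical: `Countdown` is a made-up future that becomes ready after a fixed number of polls, and `run_rounds` reuses a single `Pin<Box<Countdown>>` across rounds instead of allocating a new box each time.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical resettable future: ready once `remaining` hits zero.
struct Countdown {
    remaining: u32,
}

impl Countdown {
    /// Re-arms the future in place; the existing heap allocation is reused.
    fn reset(self: Pin<&mut Self>, polls: u32) {
        // `Countdown` is `Unpin`, so `get_mut` is allowed here.
        self.get_mut().remaining = polls;
    }
}

impl Future for Countdown {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.remaining == 0 {
            Poll::Ready(())
        } else {
            self.remaining -= 1;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Drives one boxed `Countdown` to completion `rounds` times, resetting it
/// between rounds. Returns whether the box address stayed the same and the
/// total number of polls performed.
fn run_rounds(rounds: u32) -> (bool, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    let mut fut: Pin<Box<Countdown>> = Box::pin(Countdown { remaining: 2 });
    let addr_before = &*fut as *const Countdown;

    let mut polls = 0;
    for _ in 0..rounds {
        loop {
            polls += 1;
            if fut.as_mut().poll(&mut cx).is_ready() {
                break;
            }
        }
        fut.as_mut().reset(2);
    }

    let addr_after = &*fut as *const Countdown;
    (addr_before == addr_after, polls)
}

fn main() {
    let (same_allocation, polls) = run_rounds(3);
    println!("same allocation: {same_allocation}, polls: {polls}");
}
```

This only works because `Countdown` exposes its own `reset` method; a compiler-generated async state machine offers no such hook, which is consistent with the slide's point that the approach does not generalize.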

Implementing In Practice
■Converted Balter to use Generics (pushing the Box::pin() to the edge)

■Saw no performance difference

■Function calls are ridiculously fast; a 10x slowdown is… still really fast

■There is a Storage RFC for Rust which may add new options in the future
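
To put "still really fast" into numbers, the benchmark means can be divided by the iteration count. The sketch below assumes that every benchmark ran the same 250,000,000 iterations as the loop shown earlier, and it ignores process startup and loop overhead, so the figures are rough upper bounds per call. Under those assumptions a plain function pointer call costs about 1.7 ns and a boxed async call about 19.9 ns.

```rust
/// Iteration count from the benchmark loop in the slides (assumed to be
/// the same for every row of the table).
const ITERATIONS: f64 = 250_000_000.0;

/// Converts a mean run time in milliseconds to nanoseconds per call.
fn ns_per_call(mean_ms: f64) -> f64 {
    mean_ms * 1_000_000.0 / ITERATIONS
}

fn main() {
    // Mean times copied from the results table, in milliseconds.
    let rows = [
        ("Function Pointer", 429.1),
        ("Boxed Function Pointer", 537.9),
        ("Async Function", 407.6),
        ("Boxed Async Function", 4985.0),
        ("Generic Async Boxed", 318.1),
        ("Async Enum Dispatch", 526.5),
    ];

    for (name, mean_ms) in rows {
        println!("{name:<24} {:>6.2} ns/call", ns_per_call(mean_ms));
    }
}
```

Compared with the latency of an HTTP request, which a load test is usually dominated by, a difference of roughly 18 ns per call is hard to observe.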

Thank you!
Byron Wasti
[email protected]
www.byronwasti.com
github.com/byronwasti