Rust + io_uring + ktls: How Fast Can We Make HTTP?

ScyllaDB · Oct 11, 2024

About This Presentation

Working on Fluke: async Rust HTTP/1 + HTTP/2 with io_uring & kTLS, sponsored by fly.io & Shopify. Unlike other Rust HTTP stacks, Fluke is built from the ground up to fully leverage io_uring, minimizing syscalls with kTLS. A promising future for proxies & apps if a stable API emerges. #Rust #io_uring #kTLS


Slide Content

A ScyllaDB Community
Rust, io_uring, ktls:
How fast can we make HTTP?
Amos Wenger
writer, video producer, cat owner
bearcove

Nobody in the Rust space is going
far enough with io_uring
(as far as I'm aware)

Amos Wenger (they/them) aka @fasterthanlime

writer, video producer, cat owner
■Wrote "Making our own executable packer"
■Teaching Rust since 2019 with Cool Bear
■Fan of TLS (thread-local storage & the other one)
bearcove

Define "HTTP"

Define "fast"

Rust HTTP is already fast

hyper on master is 📦 v1.4.1 via 🦀 v1.80.1
❯ gl --color=always | tail -5
Commit: 886551681629de812a87555bb4ecd41515e4dee6
Author: Sean McArthur <[email protected]>
Date: 2014-08-30 14:18:28 -0700 (10 years ago)

init

HTTP/1.1 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354

/// An HTTP status code (`status-code` in RFC 9110 et al.).
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct StatusCode(NonZeroU16);
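
The NonZeroU16 payload isn't just a validity guarantee: it gives Option<StatusCode> a free niche. A standalone check (simplified, illustrative version of the struct above):

use std::num::NonZeroU16;

#[derive(Clone, Copy)]
struct StatusCode(NonZeroU16);

fn main() {
    // NonZeroU16 reserves 0 as a niche, so Option<StatusCode>
    // fits in the same two bytes as StatusCode itself
    assert_eq!(std::mem::size_of::<StatusCode>(), 2);
    assert_eq!(std::mem::size_of::<Option<StatusCode>>(), 2);
}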

mystery winner > itoa stack > itoa heap > std::fmt
criterion bench: format_status_code, avg µs

// A string of packed 3-ASCII-digit status code
// values for the supported range of [100, 999]
// (900 codes, 2700 bytes).
const CODE_DIGITS: &str = "\
100101102103104105106107108109110\
✂ ✂ ✂
989990991992993994995996997998999";
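
The table makes formatting a status code pure arithmetic plus a three-byte slice. A sketch of the lookup, using the CODE_DIGITS const above (illustrative; hyper's real code differs in details):

fn status_as_str(code: u16) -> &'static str {
    debug_assert!((100..=999).contains(&code));
    let offset = (code as usize - 100) * 3;
    // each code is exactly 3 ASCII digits, so the slice
    // boundaries always land on valid UTF-8
    &CODE_DIGITS[offset..offset + 3]
}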

We're not bickering over
assembly anymore

My hypothesis

●spectre, meltdown, etc => mitigations
●mitigations => more expensive syscalls
●more expensive syscalls => io_uring (sketch below)
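
If syscalls are the tax, io_uring lets you batch them: queue many operations into a shared ring, pay for one syscall. A minimal sketch with the io-uring crate (a no-op request stands in for real reads and writes):

use io_uring::{opcode, IoUring};

fn main() -> std::io::Result<()> {
    let mut ring = IoUring::new(8)?;
    // a no-op request, standing in for reads/writes/accepts
    let nop = opcode::Nop::new().build().user_data(42);
    unsafe { ring.submission().push(&nop).expect("submission queue full") };
    // one syscall submits the batch and waits for a completion
    ring.submit_and_wait(1)?;
    let cqe = ring.completion().next().expect("completion");
    assert_eq!(cqe.user_data(), 42);
    Ok(())
}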

Type systems are hard

fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>

Lifetimes exist in every language

Rust merely makes them explicit

fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>

blocking
fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>

fn poll_read(
&mut self,
buf: &mut [u8],
) -> Poll<Result<usize>>
evented (O_NONBLOCK)

blocking
fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>

fn poll_read(
&mut self,
cx: &mut Context<'_>,
buf: &mut [u8],
) -> Poll<Result<usize>>
evented (O_NONBLOCK)

blocking
fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>
fn poll_read(
&mut self,
cx: &mut Context<'_>,
buf: &mut ReadBuf<'_>,
) -> Poll<Result<usize>>
evented (O_NONBLOCK)

blocking
fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>
fn poll_read(
&mut self,
cx: &mut Context<'_>,
buf: &mut ReadBuf<'_>,
) -> Poll<Result<()>>
evented (O_NONBLOCK)

blocking
fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>
fn poll_read(
self: Pin<&mut Self>,
cx: &mut Context<'_>,
buf: &mut ReadBuf<'_>,
) -> Poll<Result<()>>
evented (O_NONBLOCK)

blocking
fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>
fn poll_read(
self: Pin<&mut Self>,
cx: &mut Context<'_>,
buf: &mut ReadBuf<'_>,
) -> Poll<Result<()>>
evented (O_NONBLOCK)
fn read(
&mut self,
buf: &mut [u8]
) -> Read<'_, Self>
where Self: Unpin

async

blocking
fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>
fn read(
&mut self,
buf: &mut [u8]
) -> Read<'_, Self>
where Self: Unpin
async

blocking
fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>
async fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>
where Self: Unpin { ... }
async

async fn mhh(mut s: TcpStream) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; 4];
    s.read_exact(&mut buf).await?;
    Ok(buf)
}

async stack trace

read(&mut [u8])
read_exact(&mut [u8])
mhh()

// (not shown: tokio runtime internals)

real stack trace

Read::poll(Pin<&mut Read>, &mut Context<'_>)
ReadExact::poll(Pin<&mut ReadExact>, &mut Context<'_>)
Mhh::poll(Pin<&mut Mhh>, &mut Context<'_>)

// (not shown: tokio runtime internals)
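
Those frames appear because every async fn compiles to a state machine, and its poll method is the only thing the runtime ever calls. A hand-rolled analogue (illustrative names; the inner future is assumed Unpin for simplicity):

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// stand-in for the compiler-generated `Mhh` state machine:
// the runtime never sees `mhh()`, only this `poll` frame
struct Mhh<F> {
    inner: F,
}

impl<F: Future + Unpin> Future for Mhh<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        Pin::new(&mut self.inner).poll(cx)
    }
}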

async fn mhh(mut s: TcpStream) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; 4];
    tokio::select! {
        result = s.read_exact(&mut buf) => {
            result?;
            Ok(buf)
        }
        _ = sleep(Duration::from_secs(1)) => {
            Err(timeout_err())
        }
    }
}

rio::Uring
pub fn recv<'a, Fd, Buf>(
&'a self,
stream: &'a Fd,
iov: &'a Buf
) -> Completion<'a, usize>

rio::Uring
impl<'a, C: FromCqe> Drop
for Completion<'a, C> {
    fn drop(&mut self) {
        // block until the kernel is done with the borrowed buffers
        self.wait_inner();
    }
}

let mut buf = vec![0u8; 4];
let mut read_fut = Box::pin(s.read_exact(&mut buf));

tokio::select! {
    _ = &mut read_fut => { todo!() }
    _ = sleep(Duration::from_secs(1)) => {
        // forget() skips the blocking Drop guard from safe code:
        // the kernel can still write into `buf` after it's freed
        std::mem::forget(read_fut);
        Err(timeout_err())
    }
}

tokio_uring::net::TcpStream
async fn read(&self, buf: T) -> (Result<usize>, T)
where T: BoundedBufMut;
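
That signature is the fix: the kernel owns the buffer while the read is in flight, so the API takes it by value and hands it back with the result. A hedged usage sketch (address and buffer size are placeholders; error handling trimmed):

fn main() {
    tokio_uring::start(async {
        let stream = tokio_uring::net::TcpStream::connect(
            "127.0.0.1:8080".parse().unwrap(),
        )
        .await
        .unwrap();
        let buf = vec![0u8; 4096];
        // the buffer is moved in and handed back with the result,
        // so it can't be freed while the kernel still writes to it
        let (res, buf) = stream.read(buf).await;
        let n = res.unwrap();
        println!("read {} bytes: {:?}", n, &buf[..n]);
    });
}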

Fine, I'll rewrite everything
on top of io-uring then.

docs.rs/loona

load testing is hard
■macOS = nice for dev, useless for perf
■P-states
■love your noisy neighbors
■stats are hard (coordinated omission etc.)

the plan?
■Intel(R) Xeon(R) CPU E3-1275 v6 @ 3.80GHz
■h2load from another dedicated server (example invocation below)
■16 virtual clients, max 100 streams per client
■python for automation (running commands over SSH, CSV => XLS etc.)
■perf for counting cycles, instructions, branches
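
For reference, the h2load invocation behind those numbers might look like this (hostname is a placeholder, and the 30-second duration is an assumption; -c is clients, -m is max concurrent streams per client, -D is duration):

❯ h2load -c 16 -m 100 -D 30 https://server.example/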

what's next?
●more/better benchmarks
●…on hardware from this decade
●proxying to HTTP/1.1, serving from disk
●messing with: allocators, buffer size
●io_uring: provided buffers, multishot accept/read (sketch below)
●move off of tokio entirely (no atomics needed, no "write to unpark thread" needed)
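
Multishot accept is the flavor of that io_uring item: arm one SQE and get a completion per connection, with no re-arming syscall. A hedged sketch with the io-uring crate (requires kernel 5.19+; port 0 and the single-connection loop are just for demonstration):

use io_uring::{opcode, types, IoUring};
use std::net::TcpListener;
use std::os::fd::AsRawFd;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let mut ring = IoUring::new(64)?;
    // one multishot SQE keeps producing CQEs, one per accepted connection
    let accept = opcode::AcceptMulti::new(types::Fd(listener.as_raw_fd()))
        .build()
        .user_data(1);
    unsafe { ring.submission().push(&accept).expect("queue full") };
    // blocks until at least one client connects
    ring.submit_and_wait(1)?;
    for cqe in ring.completion() {
        // each completion carries a fresh connection fd in result()
        println!("accepted fd {}", cqe.result());
        break; // demo: handle one completion and stop
    }
    Ok(())
}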

how do we make that happen?
●money donations
●hardware donations
●expertise donations
●did I mention money

Thank you! Let’s connect.
Amos Wenger
[email protected]
@fasterthanlime
https://fasterthanli.me
bearcove