A Deep Dive into the Seastar Event Loop by Pavel Emelyanov

ScyllaDB, 22 slides, Oct 14, 2025

About This Presentation

The core and the basis of ScyllaDB's outstanding performance is the Seastar framework, and the core and the basis of Seastar is its event loop. In this presentation, we'll see what the loop does in great detail, analyze the limitations it runs under, and examine the consequences that follow from them.


Slide Content

A ScyllaDB Community
A Deep Dive into Seastar's Event Loop
Pavel Emelyanov, Engineer

Pavel Emelyanov

Engineer at ScyllaDB
■Linux containers
■ScyllaDB “storage team”
■Seastar

Agenda
■Seastar's event loop in a nutshell
■How the loop shows itself
■Limitations and their consequences

Architecture at a glance
■One thread per core
●Threads are called “shards”
●The thread-pool thread is an exception
■As little communication between threads as possible
■Keeps Linux as far away as possible
●Networking
●AIO
●Initial memory mappings
●A bit more
[Diagram of the software stack: ScyllaDB on top of Seastar on top of Linux]
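To make "as little communication between threads as possible" concrete: shards do not share state behind locks but send each other explicit messages. A minimal sketch, assuming the standard Seastar API (seastar::app_template, seastar::smp::submit_to); the application itself is hypothetical:

    #include <seastar/core/app-template.hh>
    #include <seastar/core/reactor.hh>
    #include <seastar/core/smp.hh>
    #include <iostream>

    int main(int argc, char** argv) {
        seastar::app_template app;   // starts one reactor thread ("shard") per core
        return app.run(argc, argv, [] {
            // Ask shard 1 to run a lambda; the call returns a future that
            // resolves when the remote shard has executed it.
            return seastar::smp::submit_to(1, [] {
                std::cout << "running on shard " << seastar::this_shard_id() << "\n";
            });
        });
    }

Run it with at least two shards (for example --smp 2) so that shard 1 exists.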

Main loop
■Runs everything in a loop
●Running tasks
●Kicking side activities
[Diagram: the main loop alternates between running tasks and polling]

Running tasks
■Task == lambda function
■Queued per scheduling group
■Running tasks
●Pick the sched group with the minimum vruntime
●Run its tasks until preemption is needed
[Diagram: tasks queued per scheduling group (sched groups A, B, and C)]
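As an illustration of how user code ends up in those per-group queues, here is a minimal sketch using Seastar's scheduling-group API (create_scheduling_group, with_scheduling_group); the group name "background" and the 200 shares are arbitrary values chosen for the example:

    #include <seastar/core/app-template.hh>
    #include <seastar/core/coroutine.hh>
    #include <seastar/core/scheduling.hh>
    #include <iostream>

    int main(int argc, char** argv) {
        seastar::app_template app;
        return app.run(argc, argv, [] () -> seastar::future<> {
            // A group's shares determine how quickly its vruntime grows, and
            // the scheduler picks the runnable group with the smallest vruntime.
            seastar::scheduling_group sg =
                co_await seastar::create_scheduling_group("background", 200);
            co_await seastar::with_scheduling_group(sg, [] {
                // This lambda is queued as a task under the "background" group.
                std::cout << "hello from the background group\n";
            });
        });
    }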

Polling
■Side activities
●Dispatch AIO and pick up completions
●Poll, send and receive network
●Serve cross-shard communication
●Execution stages
●Timers
[Diagram of one loop iteration: run tasks, execute stages, submit IO, complete IO, poll SMP, flush sockets, run timers]
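Timers are a good example of such a side activity: they are serviced by the loop itself, not by a dedicated thread. A minimal sketch, assuming the standard seastar::timer API:

    #include <seastar/core/app-template.hh>
    #include <seastar/core/sleep.hh>
    #include <seastar/core/timer.hh>
    #include <chrono>
    #include <iostream>

    int main(int argc, char** argv) {
        seastar::app_template app;
        return app.run(argc, argv, [] () -> seastar::future<> {
            using namespace std::chrono_literals;
            // The callback does not run from a signal or another thread:
            // the reactor notices the expired timer while polling and
            // queues the callback as an ordinary task.
            static seastar::timer<> t([] { std::cout << "timer fired\n"; });
            t.arm(100ms);
            return seastar::sleep(200ms);   // keep the shard alive until the timer fires
        });
    }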

Leisure time
■Seastar can sleep
[Diagram: the same loop iteration (run tasks, execute stages, submit IO, complete IO, poll SMP, flush sockets, run timers); when there is nothing to do, the reactor sleeps until the next event]
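Putting the last three slides together, below is a purely conceptual sketch of one loop iteration. Every name in it (poller, run_some_tasks, sleep_until_next_event) is invented for illustration and does not correspond to Seastar's real internals:

    #include <functional>
    #include <vector>

    struct poller {
        std::function<bool()> poll;   // returns true if it found work to do
    };

    // Conceptual only: the real reactor loop is considerably more involved.
    void event_loop(std::vector<poller>& pollers,
                    const std::function<bool()>& run_some_tasks,
                    const std::function<void()>& sleep_until_next_event) {
        for (;;) {
            bool did_work = run_some_tasks();    // run queued tasks up to the task quota
            for (auto& p : pollers) {            // side activities: AIO, SMP queues,
                did_work |= p.poll();            // sockets, timers, execution stages
            }
            if (!did_work) {
                sleep_until_next_event();        // nothing to do: block until woken up
            }
        }
    }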

Reactor ways of debugging
■Linux tools
■Metrics
■Logs

Linux tools
■CPU is (almost) never idle
■strace: Lots of “unrelated” system calls
■RSS is close to 100%, so is VMEM size

Core reactor metrics
■Global and per-scheduling-group
■CPU, memory, IO, network
■Etc.
●SMP
●exceptions

CPU timing metrics
[Timeline diagram: a shard's time is split into running tasks, idling (polling), and sleeping; reactor_cpu_busy_ms covers the task-running time, reactor_awake_time_ms_total covers running plus polling, and reactor_sleep_time_ms_total covers sleeping]

CPU timing metrics (advanced)
[Timeline diagram annotating the advanced metrics: reactor_cpu_steal_time_ms corresponds to non-seastar thread runtime; scheduler_time_spent_in_task_quota_violation to the time tasks keep running after the task quota is exceeded; scheduler_starvetime_ms to the time a scheduling group stays pending between its wake-up and actually getting the CPU]

Logged events
■Stalls (CPU)
■Large allocations (memory)
■Delayed requests (IO)

Stalls? What stalls?
■task::run() can run for an arbitrarily long time
■Violating the queue length threshold
Reactor stalled for 66 ms on shard 0, in scheduling group main
Backtrace: 0x5008d9f 0x4ffff3c 0x4fff343 0x1ff1598 0x40fcf 0x17dd2c 0x18a3adc 0x18a1593 0x16b44bb 0x18e08f4 0x21f958d

Too long queue accumulated for sl:default (1029 tasks)
122: N7seastar8internal21coroutine_traits_baseINS_10shared_ptrIN2db9commitlog7segmentEE …
54: N7seastar9coroutine3allIJNS_6futureIvEES3_EE17intermediate_taskILm0EEE
80: N7seastar12continuationINS_8internal22promise_base_with_typeIvEEZZZZNS_3rpc11recv_helperI …
4: N7seastar12continuationINS_8internal22promise_base_with_typeINS_3rpc5tupleIJN5query6resultENS …
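The 66 ms in the report above is simply the threshold the stall detector was configured with. Assuming the standard Seastar reactor options, it can be tuned at startup; my_seastar_app is a placeholder name:

    ./my_seastar_app --blocked-reactor-notify-ms 25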

Task stall pitfalls
■Timer fires at its own expiration, not task-quota time
■The “stall time” is captured by a signal
■The printed call-trace is random in some sense
■It doesn’t show the continuation chain
●Always starts at reactor::run_tasks()
●Contains many inner nameless lambdas

Why are stalls bad?
■A single task or scheduling group occupies the CPU
■Other, non-CPU activity is not processed either
●Except activity that had been started before
●When it finishes, the freed resource is not re-utilized

How to avoid stalls
■Make the code preempt
●co_await maybe_yield() (see the sketch after this list)
●Remember to keep races under control
■Avoid cascading exceptions
●Propagating exceptions through co_await chains is very expensive
●2312b7a703cb9c4630c75c713458445abeb26325
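A minimal sketch of making a long CPU-bound loop preemptible with the maybe_yield awaitable mentioned above, assuming Seastar's coroutine support; process_all is a hypothetical function:

    #include <seastar/core/coroutine.hh>
    #include <seastar/core/future.hh>
    #include <seastar/coroutine/maybe_yield.hh>
    #include <vector>

    seastar::future<> process_all(std::vector<int>& items) {
        for (auto& item : items) {
            item *= 2;   // some per-element CPU work
            // Suspends only if the task quota has been exceeded;
            // otherwise it is practically free.
            co_await seastar::coroutine::maybe_yield();
        }
    }

Every co_await is a potential preemption point, which is exactly why the slide warns about keeping races under control: other tasks may run between loop iterations.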

Preemption
■Stalls are a consequence of voluntary preemption
●The Linux kernel preempts processes/threads with a hardware timer
■Preempt by signal?
●Overhead
●Locking problems

Phantom jam
■https://www.scylladb.com/2022/04/19/exploring-phantom-jams-in-your-data-flow/

Phantom jam
■Request latency as a function of request rate

Thank you! Let’s connect.
Pavel Emelyanov
[email protected]
github: xemul