Push the Database Beyond the Edge by Nikita Sivukhin


About This Presentation

Almost any application can benefit from having data available locally, enabling blazing-fast access and optimized write patterns. This talk will walk you through one approach to designing a full-featured sync engine, applicable across a wide range of domains, including front-end, back-end, and machine learning.


Slide Content

A ScyllaDB Community
Push the Database Beyond the Edge
Nikita Sivukhin, Software Engineer

Nikita Sivukhin (he/him)

Software Engineer at Turso
■Currently working at Turso (previously at ShareChat/Moj)
■Learning to play the ukulele
(so far, it feels harder than writing code)
■I like algorithms and data structures
●Especially if the DS is compact or succinct!

Talk structure
We will discuss how to bring a database locally to improve read/write patterns
1.Use cases from different domains where this can be useful
2.Design considerations for the solution we will build
3.Storage model overview
4.Database replication with read-only queries
5.Database replication with read/write queries

Use cases
Let’s discover a few examples where local state can greatly improve the app

Example #1: backend router
■Router app: proxy requests by hostname to specific IP
●Hostname: sivukhin.p99conf.io → 10.4.113.154
●100 million mappings in total, 1000+ reads per sec, 10 writes per sec
●Eventual consistency: route update can be delayed by a few seconds
■Solution:
●Pull changes from the remote periodically and maintain local state
■Complexity:
●Updates must be streamed consistently from the main DB
●Full snapshot must be sent from the main DB in an efficient manner
●Local state must be maintained durably to recover from crashes
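
A minimal sketch of this pattern (hypothetical types and change-feed API, not from the talk): reads are served from a local map, while a background loop pulls the change stream from the remote. Durability, the third complexity bullet above, is omitted here.

use std::collections::HashMap;
use std::sync::RwLock;
use std::time::Duration;

// Hypothetical change-feed entry pulled from the remote DB.
struct RouteChange { hostname: String, ip: Option<String>, seq: u64 }

struct Router {
    routes: RwLock<HashMap<String, String>>, // hostname -> IP, kept locally
    last_seq: RwLock<u64>,                   // last change applied from the remote
}

impl Router {
    // Reads are served entirely from local state: no network round-trip.
    fn resolve(&self, hostname: &str) -> Option<String> {
        self.routes.read().unwrap().get(hostname).cloned()
    }

    // Periodically pull and apply the change stream (eventual consistency:
    // a route update may lag by one polling interval).
    fn sync_loop(&self, pull: impl Fn(u64) -> Vec<RouteChange>) {
        loop {
            let since = *self.last_seq.read().unwrap();
            for change in pull(since) {
                let mut routes = self.routes.write().unwrap();
                match change.ip {
                    Some(ip) => { routes.insert(change.hostname, ip); }
                    None => { routes.remove(&change.hostname); }
                }
                *self.last_seq.write().unwrap() = change.seq;
            }
            std::thread::sleep(Duration::from_secs(1));
        }
    }
}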

Example #2: ML-training pipeline
■Train embeddings in real-time with some algorithm
●Train on events (user, post, +/-), 10-100k events at peak
●Distributed training pipeline to handle the load
●Consistency: workers can accumulate gradients for some time and
flush them periodically
■Solution:
●Maintain embeddings in local state and accumulate gradients locally
●Flush gradients and update local embeddings periodically
■Complexity:
●Need to track accumulated gradients to send updates later
●Need to select embeddings to store locally – as the dataset is huge
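
A sketch of the same idea for a training worker (hypothetical names, not the talk's code): gradients accumulate locally per embedding key and are flushed to the remote in batches.

use std::collections::HashMap;

// Hypothetical local worker state: a cached subset of embeddings plus the
// gradients accumulated since the last flush.
struct Worker {
    embeddings: HashMap<u64, Vec<f32>>,
    gradients: HashMap<u64, Vec<f32>>,
}

impl Worker {
    // Accumulate a gradient locally instead of hitting the remote DB per event.
    fn accumulate(&mut self, key: u64, grad: &[f32]) {
        let acc = self.gradients.entry(key).or_insert_with(|| vec![0.0; grad.len()]);
        for (a, g) in acc.iter_mut().zip(grad) { *a += g; }
    }

    // Periodically push the accumulated batch and apply it to local embeddings.
    fn flush(&mut self, push: impl FnOnce(&HashMap<u64, Vec<f32>>)) {
        push(&self.gradients);
        for (key, grad) in self.gradients.drain() {
            if let Some(emb) = self.embeddings.get_mut(&key) {
                // Assumed plain SGD step with a fixed learning rate.
                for (e, g) in emb.iter_mut().zip(&grad) { *e -= 0.01 * g; }
            }
        }
    }
}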

Insight!
■Examples from completely different domains have benefited from local state
●One of the key components is relaxed consistency
■The examples are pretty simple – in reality, complexity usually grows fast
●Complex queries, support for local DDL statements, indexes, transactions, etc.
■We must push the database beyond the “edge”!

Design considerations

Communication
■Remote DB is the source of truth
●It’s enough for all examples
●Simpler to reason about the data and perform complex actions (e.g. CREATE INDEX)
■“Star” topology for synchronization
●Tree topology is possible: just add intermediate nodes running the same DB

Technology choice: SQLite!
■Embedded database
●Must be pluggable into any language and any runtime
■Few “moving” components
●Otherwise replication complexity will increase
■SQLite is an almost* perfect technology for us!


* we will discuss later why it’s not the perfect technology

SQLite storage model

SQLite storage model
■The DB is “virtually” represented by a single file
●The file consists of same-size pages linked as specified in the SQLite database file format
■“Physically”, the DB consists of 2 files: the main DB file and the WAL
●The WAL is a sequence of frames, each holding a page content update, the page ID, and a commit marker
●SQLite forbids concurrent writers
■The WAL makes “time travel” easy
■The WAL is checkpointed into the DB file
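
To make the frame layout concrete, here is a small parser sketch based on the publicly documented SQLite WAL format (not code from the talk): every frame is a 24-byte big-endian header followed by one full page image, and a non-zero “database size after commit” field marks the commit frame.

// Each WAL frame header precedes exactly one page image.
struct WalFrameHeader {
    page_no: u32,       // which DB page this frame replaces
    db_size_after: u32, // non-zero => this frame commits the transaction
    salt: [u32; 2],     // must match the WAL header salts to be valid
    checksum: [u32; 2], // cumulative checksum over headers + page data
}

fn parse_frame_header(buf: &[u8; 24]) -> WalFrameHeader {
    let be = |i: usize| u32::from_be_bytes([buf[i], buf[i + 1], buf[i + 2], buf[i + 3]]);
    WalFrameHeader {
        page_no: be(0),
        db_size_after: be(4),
        salt: [be(8), be(12)],
        checksum: [be(16), be(20)],
    }
}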

SQLite storage model: simple as 2 files
$> echo -n 'H4sIAAAAAAAAAwsO9MksSVVIyy/KTSxRMGZgYmBiYnBQUGBgADIhGAYYgZgFjU8IMDHo9f7gBSlm
3McARKOAqsCBkU1cWZnRsyQxKSc1OT8vLbUoNS85tRiJyeQc5OoY4qoQ4ujk46qAJKGRl5ibqgmKG
5RYHgUjCgAAXKzHWwAEAAA=' | base64 -d | zstd -d > conferences.db
$> echo -n 'H4sIAAAAAAAAAzOvZ2ti0H0kwcDAxAACq+ov/npx5ufO83Oa9joWqHsxQCSYYOILjmoYKNS9n8QL
FGRk/MIARKNghAAuRibZAktLveT8vDQAbci+bDgCAAA=' | base64 -d | zstd -d > conferences.db-wal
$> sqlite3 conferences.db 'SELECT * FROM conferences'
p99.conf

Read-only replication

Read-only replication
■Replicate data from the remote to the local client
■Read queries can be executed completely locally

Read-only replication
■Replicate the DB file and stream the WAL frames to the client
■The client DB can checkpoint the WAL at its own pace

Read-only replication
■Pros:
●Heavy operations are “executed” only once at the remote DB (e.g. CREATE INDEX)
●Impossible to diverge from the remote (no worries about differences in SQLite versions)
■Cons:
●WAL frames from different transactions contain the same pages – unnecessary data transfer
●Any update transfers at least one frame – which is 4KB by default

sqlite> UPDATE conferences SET name = 'p99.conf' WHERE id = 1;
sqlite> UPDATE conferences SET dates = '22 October 2025 - 23 October 2025' WHERE id = 1;
$> ls -l conferences.db-wal
8272 conferences.db-wal

■*SQLite doesn’t have a WAL API for replication and needs to be extended:
●wal_begin_session / wal_read_page_at / wal_insert_page_at / wal_end_session

Read-only replication: deduplicate pages
■Fetch unique page updates and maintain “virtual” DB equivalence
●WAL history will be different between client and server
●But every commit boundary from the client will match some commit boundary on the server
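
A minimal sketch of the deduplication step (the Frame type is a stand-in, not the talk's code): between two commit boundaries, only the latest version of each page has to be shipped to the client.

use std::collections::HashMap;

struct Frame { page_no: u32, data: Vec<u8> }

// Keep only the latest update per page: later frames overwrite earlier ones,
// so the client receives one frame per touched page instead of the whole WAL
// history between the two commit boundaries.
fn dedup_frames(frames: Vec<Frame>) -> Vec<Frame> {
    let mut latest: HashMap<u32, Frame> = HashMap::new();
    for frame in frames {
        latest.insert(frame.page_no, frame); // the later insert wins
    }
    latest.into_values().collect()
}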

Read-only replication: compress pages
■Pages in a single update set can be compressed
■Even more: client and remote can establish a shared compression “dictionary”
●The server has the full DB history and can choose a page version the same as on the client side
●No need to transfer the “dictionary” – client and server only need to agree on page IDs
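
A sketch of the shared-dictionary trick using the zstd crate (the codec choice is an assumption; the talk doesn't name one): the server compresses the new page version against the version the client is known to hold, so only the page ID and a small delta cross the wire.

use zstd::bulk::{Compressor, Decompressor};

// Server side: the page version already held by the client acts as the
// compression dictionary, so bytes unchanged within the page cost almost nothing.
fn compress_page(client_version: &[u8], new_version: &[u8]) -> std::io::Result<Vec<u8>> {
    Compressor::with_dictionary(3, client_version)?.compress(new_version)
}

// Client side: rebuild the new page from its own old copy plus the delta.
fn decompress_page(client_version: &[u8], delta: &[u8], page_size: usize) -> std::io::Result<Vec<u8>> {
    Decompressor::with_dictionary(client_version)?.decompress(delta, page_size)
}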

Read-only replication: intelligent server
■For these optimizations to work – the server must be “intelligent” (just code – no AI)
●This is not just some trivial file server which streams DB and WAL files
■Keep “immutable” history of the database and efficiently store it
■Efficiently compute page updates delta for client between virtual DB states

Read-only replication: summary
■Heavy operations are “executed” only once at the remote DB
■Impossible to diverge from the remote
■Straightforward consistency guaranteed by the simplicity of the WAL
■Data transfer is close enough to “logical” replication
●It depends: some workloads will benefit more from page-based replication with compression
●If mutations cause lots of updates (indices, triggers, etc) – “logical” replication will be better

Writes support

Local write support
■The client must be able to write data locally
■Local updates can be synced on demand with the remote
●E.g. the client is back online, or a large enough batch has accumulated to amortize the write cost

Local write support
■For the sync of local writes we need two components:
●Push local changes to the remote
●Pull remote changes to the local DB

Local write support: push
■Using physical replication for the push phase is problematic due to conflicts
■Capture deterministic logical changes and “replay” them on the remote
●*SQLite needs to be extended to have a smooth CDC capability
■Maintain the last change ID from the client applied on the remote in a separate table
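
A rough sketch of the push loop (the Change and Remote types and the sync_watermark table are hypothetical stand-ins): replay captured changes past the last acknowledged ID and persist the new watermark on the remote so retries stay idempotent.

// Hypothetical captured change: the SQL to replay plus its change ID.
struct Change { id: u64, sql: String, values: Vec<String> }

// Hypothetical remote handle executing statements on the remote DB.
trait Remote {
    fn execute(&self, sql: &str, values: &[String]) -> Result<(), String>;
}

fn push(remote: &impl Remote, changes: &[Change], last_applied: u64) -> Result<u64, String> {
    let mut watermark = last_applied;
    for change in changes.iter().filter(|c| c.id > last_applied) {
        remote.execute(&change.sql, &change.values)?;
        // Record the last applied change ID so a retry after a crash can skip
        // already-replayed changes (in practice both statements would run in
        // one remote transaction).
        remote.execute(
            "INSERT INTO sync_watermark(client, change_id) VALUES (?, ?) \
             ON CONFLICT(client) DO UPDATE SET change_id = excluded.change_id",
            &["client-1".into(), change.id.to_string()],
        )?;
        watermark = change.id;
    }
    Ok(watermark)
}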

Local write support: pull
■Need to “rebase” local changes on top of the WAL without disruption
■Instead of “fixing” the WAL file directly – append “revert” pages to fix it “virtually”
●Revert frames F#C and F#D
●Apply remote frames F#A and F#B
●Replay local changes from the logical log
■Use the WAL “time travel” capability to “revert” page changes

Local write support: checkpoint
■We use a portion of the WAL for “time travel” – so how do we checkpoint the WAL?
■We need to keep frames somewhere to revert local changes before the “replay”
■Before a checkpoint – just put the revert frames into a separate file
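
A sketch of that idea (the PageStore trait is a hypothetical stand-in, not the talk's API): before a checkpoint, the pre-change image of every locally touched page is written into the separate revert file, so local changes can still be undone after the WAL is folded into the main DB file.

use std::collections::HashSet;

// Hypothetical view over the DB history.
trait PageStore {
    fn pages_touched_after(&self, watermark: u64) -> HashSet<u32>;
    fn page_at(&self, page_no: u32, watermark: u64) -> Vec<u8>;
    fn write_revert_frame(&mut self, page_no: u32, data: &[u8]);
}

// Save the pre-change image of every page touched by local writes into the
// revert file; only then is it safe to checkpoint the WAL.
fn preserve_revert_frames(store: &mut impl PageStore, watermark: u64) {
    for page_no in store.pages_touched_after(watermark) {
        let before = store.page_at(page_no, watermark); // version as of the last sync
        store.write_revert_frame(page_no, &before);
    }
}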

Local write support: conflict resolution
■Easy to get an LWW conflict resolution strategy with logical “replay”
●Convert mutations to statements: INSERT INTO … ON CONFLICT DO UPDATE …
■Some applications need more sophisticated rules
●Use the power of SQL and let the user decide!
transform: m => ({
  sql: `update ${m.tableName} set counter = counter + ?`,
  values: [m.after.counter - (m.before?.counter ?? 0)]
})

Final design
■Designed a fully functional, writable SQL DB replicated locally into the app
●Built on top of SQLite
●Pull changes by replicating physical pages
●Push changes by executing SQL mutations on the remote DB

Further improvements
■Logical mode for pull operation
●Logical mode has different trade-offs and expands supported use cases
■Partial synchronization
●Physical mode allows lazy pulling of B-Tree pages only when a query actually needs them.
■Local changes deduplication
●In most cases, only the latest local update matters. This reduces the storage required for CDC.
■…and more!
●We’re listening, and your feedback shapes the future of sync!

Implementation details: meet Turso
■SQLite is almost a perfect solution – but it lacks features which we used:
●Raw WAL API for pull
●Logical logging support for push
●Other things left overboard: precise checkpoint control, schema versioning, and more
■SQLite is extensible – but the core is hard to improve without major changes
■Meet turso-database

Implementation details: meet Turso
■Written in Rust – a very expressive language with a big community
●The sync engine uses the genawaiter crate
●Async code can be written naturally – but the core is completely runtime-agnostic
async fn apply(&self, coro: &Coro, changes: Changes) -> Result<()> {
    let changes_file = &changes.file_slot;
    let watermark = self.apply_internal(coro, changes_file).await?;
    let revert_wal_file = self.io.open_file(&self.revert_wal)?;
    reset_wal_file(coro, revert_wal_file, 0).await?;
    self.update_meta(coro, |m| m.watermark = watermark).await?;
    Ok(())
}

Implementation details: meet Turso
■The CDC required for the sync engine is implemented as an independent feature
●Users can enable it with a special pragma and track all changes happening in the database
●All database connections created by the sync engine enable CDC by default
turso> SELECT
  bin_record_json_object(table_columns_json_array(table_name), before) before,
  bin_record_json_object(table_columns_json_array(table_name), after) after
FROM turso_cdc;
┌─────────────────────────────────────────────┬────────────────────────────────────────────────────┐
│ before                                      │ after                                              │
├─────────────────────────────────────────────┼────────────────────────────────────────────────────┤
│                                             │ {"text":"record p99 talk","date":"23-09-2025"}     │
│                                             │ {"text":"play ukulele","date":"27-09-2025"}        │
│ {"text":"play ukulele","date":"27-09-2025"} │ {"text":"improve sync engine","date":"27-09-2025"} │
└─────────────────────────────────────────────┴────────────────────────────────────────────────────┘

Implementation details: meet Turso
■WAL API exposed to support precise control over frames in the sync engine
●Basic operations like read/append
●More advanced stuff like generating “revert” frames for the pull implementation
trait DatabaseWalSession {
    fn append_page(&mut self, page_no: u32, page: &[u8]) -> Result<()>;
    fn revert_page(&mut self, page_no: u32, frame_watermark: u64) -> Result<()>;
    fn revert_changes_after(&mut self, frame_watermark: u64) -> Result<usize>;
    fn commit(&mut self, db_size: u32) -> Result<()>;
}
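
As an illustration, the pull's “rebase” could drive this session API roughly as follows (the trait comes from the slide above; RemotePage and the replay callback are assumptions):

// Hypothetical deduplicated page update pulled from the remote.
struct RemotePage { page_no: u32, data: Vec<u8> }

fn rebase(
    session: &mut dyn DatabaseWalSession,
    frame_watermark: u64,  // last WAL frame synced with the remote
    remote: &[RemotePage], // page updates pulled from the remote
    db_size: u32,          // DB size in pages after the remote commit
    replay_local: impl FnOnce() -> Result<()>,
) -> Result<()> {
    // 1. Append "revert" frames so the WAL virtually rolls back local changes.
    session.revert_changes_after(frame_watermark)?;
    // 2. Apply the remote frames on top of the reverted state.
    for page in remote {
        session.append_page(page.page_no, &page.data)?;
    }
    session.commit(db_size)?;
    // 3. Replay local changes from the logical log on top of the remote state.
    replay_local()
}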

Call to action
■Think about whether local state can improve the performance of your application
■There are already some options for you to consider:
●SQLite – a battle-tested and very extensible embedded DB which can cover simple use cases
●turso-db – an SQLite-compatible rewrite powered by Deterministic Simulation Testing
■The sync engine described in this talk is already available for use: @tursodatabase/sync

Thank you! Let’s connect.
Nikita Sivukhin
[email protected]
@SivukhinN
sivukhin.github.io