Writing Low Latency Database Applications Even If Your Code Sucks
ScyllaDB
356 views
21 slides
Jun 26, 2024
About This Presentation
All latency lovers are used to the mystical experience of joy coming from aligning data to a cache line size and seeing your latency improve by hundreds of microseconds. But as transcendent as we can become, the laws of physics still apply to us.
That means that there is little we can do when data has to be fetched from across the planet. By putting data close to its users, you can save hundreds of milliseconds and still be faster than the most optimized code, even if your code sucks.
Massive data replication comes with challenges, though: how to keep the costs in check? How to make sure that a semblance of consistency is maintained? What about those pesky regulations that mandate data residency?
In this talk I will explore the Data Edge: a growing movement of databases that are overcoming those challenges to make your data always available where your users are.
Size: 6.14 MB
Language: en
Added: Jun 26, 2024
Slides: 21 pages
Slide Content
Writing Low Latency Database Applications Even If Your Code Sucks! Glauber Costa, CEO at Turso
Glauber Costa, CEO at Turso. Former VP Field Engineering at ScyllaDB. Former contributor to the Linux Kernel. Latency aficionado. When I am not working, I take care of my 3 kids.
Your application and your database: how latency is commonly measured.
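In code, that common measurement is usually a timer around a single query, taken from an app server that sits next to the database. A minimal TypeScript sketch; the db client here is a hypothetical stand-in for any driver:

// Hypothetical database client; stands in for whatever driver is in use.
declare const db: { execute(sql: string): Promise<unknown> };

// The "database latency" most dashboards report: the time around one query,
// measured from an app server that typically lives right next to the database.
async function measureQueryLatency(sql: string): Promise<number> {
  const start = performance.now();
  await db.execute(sql);
  return performance.now() - start; // milliseconds
}

Note what this number omits: the round trip from the user to that app server, which is exactly what the next slides zoom out to.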
Zoom out: Is that the end-to-end latency?
Zoom out: Is that the end-to-end latency? What do you mean "there are other countries"???
Impact: P99 goes from 10ms to 1ms. That's 10x better. Roundtrip NYC to Sydney: ~250ms.
Impact: P99 goes from 10ms to 1ms. That's 10x better. Roundtrip NYC to Sydney: ~250ms. Barbie not impressed: "I don't care about your 10ms savings, mate!"
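The arithmetic behind the joke, using only the slide's numbers: end-to-end latency is roughly the network round trip plus server-side query time, so a 10x database speedup barely moves the total.

// End-to-end latency ~= network round trip + server-side query time.
const rttNycToSydneyMs = 250; // slide's estimate

const totalBeforeMs = rttNycToSydneyMs + 10; // P99 of 10ms -> 260ms end to end
const totalAfterMs = rttNycToSydneyMs + 1;   // P99 of 1ms  -> 251ms end to end

// A 10x database speedup changes the user-visible number by about 3.5%.
console.log(totalBeforeMs, totalAfterMs);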
Solution 1: Write good code, but move your data closer to Sydney.
Solution 2: Put your data everywhere. Just do whatever you want and save yourself 200ms.
Some estimates (approx.), NYC to: Sydney ~250ms, Sao Paulo ~150ms, San Francisco ~70ms, Amsterdam ~70ms, Tokyo ~180ms. Source: https://wondernetwork.com/pings/New%20York
The Edge: Global coverage. Geographical abstraction. Serverless: Cloudflare Workers, Vercel Edge, Netlify Edge, etc. Or VMs: Fly.io, Koyeb, etc.
The Edge: Still bad!!
How? For per-user data: partitioning. Unfortunately, for global data there's only one way: replication. But the problem with replication is: cost.
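A minimal sketch of the partitioning option for per-user data, with hypothetical region names and URLs: since each user's data lives in exactly one place, it can simply be placed near them, no replication required.

// Hypothetical mapping from a user's home region to a nearby database URL.
const regionToDbUrl: Record<string, string> = {
  "us-east": "https://db-nyc.example.com",
  "ap-southeast": "https://db-sydney.example.com",
  "sa-east": "https://db-sao-paulo.example.com",
};

// Per-user data lives in exactly one partition, so route each user's
// queries to the database closest to where they live.
function databaseUrlFor(user: { homeRegion: string }): string {
  return regionToDbUrl[user.homeRegion] ?? regionToDbUrl["us-east"];
}

Global data (catalogs, reference tables, anything every user reads) has no such home region, which is what forces replication and its cost problem.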
Reframing the problem: How to build the cheapest possible database, so that it can replicate to a lot of locations? What is a lot? 30, not 3. Seasonality of traffic. Cost of a GB: ~$0.10/month. Cost of a CPU (with reasonable memory): ~$20/month.
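Under the slide's price assumptions, a back-of-the-envelope sketch (the database size is a made-up example) shows why always-on compute, not storage, is what makes 30-location replication expensive:

const locations = 30;        // "a lot" per the slide
const usdPerGbMonth = 0.10;  // slide's storage estimate
const usdPerCpuMonth = 20;   // slide's CPU (with reasonable memory) estimate

const dbSizeGb = 10;         // hypothetical database size

const storageUsdPerMonth = locations * dbSizeGb * usdPerGbMonth; // $30/month
const cpuUsdPerMonth = locations * usdPerCpuMonth; // $600/month if every replica keeps a CPU warm

// Replicated storage is cheap; a dedicated CPU per location is not,
// hence the emphasis on scale-to-zero for seasonal traffic.
console.log({ storageUsdPerMonth, cpuUsdPerMonth });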
What is the cheapest SQL database? Hey, can we change SQLite to add the building blocks of replication? No. OK… (disappointed face)
What is the cheapest SQL database? libsql: a fork of SQLite, everybody welcome. Major changes: write-ahead-log virtualization hooks, server mode.
What is the cheapest SQL database? Turso: built on libsql for massive replication. Serverless or embedded. Primary writes, replicated reads. Scale-to-zero. One-command replication, single-URL routing. Multitenancy.
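As a sketch of what "single-URL routing" means for application code, here is the @libsql/client TypeScript package with a placeholder database URL and token. The app talks to one logical URL; per the slide, reads are served by the nearest replica and writes go to the primary, with nothing for the client code to manage:

import { createClient } from "@libsql/client";

// One logical URL for the whole replicated database; placeholder credentials.
const client = createClient({
  url: "libsql://my-db-myorg.turso.io",
  authToken: process.env.TURSO_AUTH_TOKEN,
});

// Reads like this are answered by the closest replica; writes are
// forwarded to the primary. The application code doesn't change.
const result = await client.execute("SELECT * FROM users LIMIT 10");
console.log(result.rows);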
What does it look like? Vercel's Edge benchmark: 5 waterfall queries from edge compute locations. Client in Brazil. Postgres. Purple: latency from the client's preferred location. Blue: latency from the DB's preferred location.
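"Waterfall" here means each query depends on the previous one's result, so every query pays a full round trip. A hypothetical sketch (made-up client and schema) of why five such queries amplify the distance to the database:

// Hypothetical client and schema; any SQL driver has roughly this shape.
declare const db: {
  execute(sql: string, args?: unknown[]): Promise<{ rows: any[] }>;
};

// Five dependent ("waterfall") queries: each awaits the previous result,
// so total latency ~= 5x the round trip to the database. From Brazil to a
// distant database (~150ms RTT) that is ~750ms of pure network time; against
// a nearby replica (~5ms RTT) it drops to ~25ms, whatever the code looks like.
async function renderDashboard(userId: string) {
  const user = (await db.execute("SELECT * FROM users WHERE id = ?", [userId])).rows[0];
  const org = (await db.execute("SELECT * FROM orgs WHERE id = ?", [user.org_id])).rows[0];
  const projects = (await db.execute("SELECT id FROM projects WHERE org_id = ?", [org.id])).rows;
  const tasks = (await db.execute("SELECT * FROM tasks WHERE project_id = ?", [projects[0].id])).rows;
  const comments = (await db.execute("SELECT * FROM comments WHERE task_id = ?", [tasks[0].id])).rows;
  return { user, org, projects, tasks, comments };
}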
What does it look like?
Summary: Moving data close to users can eclipse lots of optimization work. Partitioning and replication are really the only ways to do it. To make replication viable, it needs to be cheap, easy to set up, and easy to use.
Glauber Costa [email protected] @glcst https://turso.tech Thank you! Let’s connect.