Exploring ScyllaDB’s DynamoDB-Compatible API by Guilherme Nogueira & Nadav Har'El
ScyllaDB
Mar 10, 2025
About This Presentation
In this talk we inspect how ScyllaDB implemented Alternator, the DynamoDB-compatible API. We review internal table structure, load balancing and deployment characteristics. We also inspect an existing workload in DynamoDB and compare it with the same workload running on Scylla Cloud as a DBaaS. Aspects include performance, cost, and feature comparison with DynamoDB.
A ScyllaDB Community
Exploring ScyllaDB’s
DynamoDB-Compatible API
Guilherme Nogueira
Technical Director
Nadav Har'El
Distinguished Engineer
Guilherme Nogueira
Guilherme is a Technical Director at ScyllaDB who assists users in their
journey to see ScyllaDB's potential. A geek, passionate about Linux and
open source software.
Presenters
Nadav Har'El
Nadav Har'El has had a diverse 30-year career in computer programming
and computer science, working in areas including information retrieval,
virtualization and operating systems. Today he works on ScyllaDB, and
among other things led the Alternator development.
■Why move out
■Use-case analysis and cost
■Alternator deep dive
■Migrating off of DynamoDB
■ScyllaDB cost
Presentation Agenda
Why move out of DynamoDB?
Reasons workloads move out of DynamoDB
■We work with tons of customers that migrated off of DDB
■Main reasons:
■Cost
■Latency
■Flexibility
For a more detailed look, check out the blog
https://www.scylladb.com/2023/12/04/dynamodb-when-to-move-out/
A little spoiler
50%
guaranteed
https://www.scylladb.com/compare/scylladb-vs-dynamodb
As of March 2025
ProvisionedThroughputExceededException
■Adequate provisioned capacity but high throttling
■Throttling despite activated AWS Application Auto Scaling
■Throttling in on-demand capacity mode
■Presence of a hot partition
■Traffic exceeding account throughput quotas
Lock-in
■DynamoDB integrates seamlessly with AWS ecosystem
■Harder to leave
■ScyllaDB runs anywhere
■On-prem
■Any cloud
■K8s
Use-case analysis
Use case: Streaming recommendation
A streaming service serves ML-generated recommendations to user
profiles across the globe, from the nearest replicated location.
■6 total locations
■User feedback and watched titles are written from any
replicated location
■Stores users’ likes and dislikes, watched titles, watch time
■Recommendations are generated daily
■Immediately available globally
Use case profile
■Dataset: 1PB
■Payload / item size: 10KB
■Reads
■100K ops/s baseline
■500K ops/s peak (10h a day)
■Writes
■400K ops/s baseline
■1.2M ops/s peak (12h a day)
■Daily batch load, large portion of the dataset is overwritten
■Batch writes go to a single region and are replicated to the other 5
■User-incoming writes are written to the local region and replicated
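To get a feel for the scale of this profile, the figures above can be turned into daily request volumes. This is a rough sketch, not the cost model used in the talk. It assumes the peak rate holds for the stated hours and the baseline for the rest of the day, that a DynamoDB write unit covers up to 1 KB and a strongly consistent read unit covers up to 4 KB, and that every write is replicated to and billed in all 6 regions. The daily batch load is not included.

```python
import math

SECONDS_PER_HOUR = 3600
ITEM_KB = 10   # payload / item size from the profile
REGIONS = 6    # total replicated locations

def daily_ops(baseline, peak, peak_hours):
    """Requests per day, assuming `peak` for peak_hours and `baseline` otherwise."""
    return (baseline * (24 - peak_hours) + peak * peak_hours) * SECONDS_PER_HOUR

reads_per_day = daily_ops(100_000, 500_000, 10)
writes_per_day = daily_ops(400_000, 1_200_000, 12)

# Request units per item: 1 KB per write unit, 4 KB per strongly consistent read unit.
write_units_per_item = math.ceil(ITEM_KB / 1)
read_units_per_item = math.ceil(ITEM_KB / 4)

write_units_per_day = writes_per_day * write_units_per_item
read_units_per_day = reads_per_day * read_units_per_item

# Assumption: each write is replicated to, and billed in, every region.
replicated_write_units_per_day = write_units_per_day * REGIONS

print(f"reads/day:                  {reads_per_day:,}")
print(f"writes/day:                 {writes_per_day:,}")
print(f"read units/day:             {read_units_per_day:,}")
print(f"write units/day (1 region): {write_units_per_day:,}")
print(f"write units/day (6 regions): {replicated_write_units_per_day:,}")
```

The 10 KB item size is what makes writes dominate: each write costs ten units before replication multiplies it again.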
Use case challenges
■Users’ watch history replicated alongside recommendations
■Unpredictable patterns
■Interest influenced by external trends
■Single table approach
■Tail latency
■Cost
■Lock-in
DynamoDB Cost Analysis
Cost drivers
■Unpredictable user spikes
■On-demand is easy
■Global Table with 6 regions
■Replicated Write units + network for replication
■Daily data load
■Massive cost
Est. DynamoDB on-demand cost: $571,811,035.40 / year
Estimated numbers as of Feb 2025 list prices
Cost estimation - provisioned
■Replicated writes
■Spiky reads, which can be challenging on provisioned mode
■10KB payload
■Assuming reserved capacity
Est. DynamoDB provisioned cost: $264,716,657.90 / year
2x cheaper than on-demand, but still…
Estimated numbers as of Feb 2025 list prices
What is ScyllaDB Alternator?
Run DynamoDB apps anywhere, at lower cost
■Amazon DynamoDB-compatible API
■Supports the same SDKs, data modeling, and queries as DynamoDB
■Deploy anywhere: on-premise, or on any public cloud
■Predictable and sustained lower latencies without high
operational costs
ScyllaDB Alternator
ScyllaDB started (2015) with Cassandra®’s APIs: CQL & Thrift
Alternator (2019): added a DynamoDB™-compatible API to ScyllaDB
ScyllaDB Alternator
Why a DynamoDB API?
■Popularity growing, closing in on Cassandra
■ScyllaDB solves real AWS lock-in problem
■DynamoDB apps can now be developed,
or run, anywhere, and at lower cost
■DynamoDB API is close in features to CQL
Alternator is part of ScyllaDB
■ScyllaDB listens for DynamoDB API requests in addition to CQL.
■Same installation, setup, etc. just run with extra options:
■alternator_port and/or alternator_https_port
■alternator_write_isolation (how read-modify-write operations are isolated)
Available in ScyllaDB Cloud since June 2020
ScyllaDB Alternator
Alternator goal: full compatibility with DynamoDB API
■Support unmodified applications designed for DynamoDB
■But run them anywhere that ScyllaDB can run - avoid vendor lock-in!
■Already mostly compatible except some unimplemented features listed
in docs/alternator/compatibility.md
Alternator vs. DynamoDB
■From docs/alternator/compatibility.md:
■DynamoDB’s multi-item “transactions” feature is not supported.
■All of Alternator’s tables are “Global Tables”, i.e., same on all DCs.
■DynamoDB Streams API is supported, but the Kinesis Streams API is not.
■DynamoDB’s Backup API not supported - use Scylla’s backup tools instead.
■DynamoDB’s per-table ProvisionedThroughput not supported, use Scylla’s
“workload prioritization”.
■New PartiQL syntax not supported. Use CQL if you want an SQL-like syntax.
■Export/import to S3 is not supported.
■“Projection” parameter of GSI/LSI is not honored - all base table attributes are
available in the index.
Alternator vs. DynamoDB
■Like ScyllaDB, Alternator is a dedicated installation,
not a multi-tenant service like DynamoDB:
■Cluster is dedicated to one customer
■CQL users are aware of the nodes in this cluster, but DynamoDB users are not
■We’ll discuss load balancing in the next slide
■Powerful monitoring framework from ScyllaDB (Prometheus, Grafana) -
■Understand your workload’s performance and get insights on how to improve it
Alternator vs. DynamoDB
DynamoDB applications configure one endpoint URL, e.g.,
dynamodb.us-east-1.amazonaws.com
Application is not aware of the individual nodes in the cluster…
■Cannot send a request to the right node (let alone right core)
■Cannot balance the load between the cluster’s live nodes
Alternator vs. DynamoDB - load balancing
So the user must deploy an additional load balancer:
■Server-side load balancer (TCP or HTTP)
■Client sends request to the load balancer, which forwards it to a random node, and
from there, to the right node. Expensive.
■ScyllaDB coordinator-only node
■Client goes to a data-less Scylla node, request forwarded to the right node.
Alternator vs. DynamoDB - load balancing
Our recommended option:
■Client-side load balancer:
■Small wrapper over Amazon’s existing SDK (in various programming languages).
■After slightly different setup, application uses unmodified API calls.
■The wrapper sends each request to the best ScyllaDB node:
■Maintains a list of live nodes.
■Can pick a node in the right DC and rack (region and zone).
■Can pick a node holding the requested item.
■Lowers infrastructure and networking costs, and request latency.
Alternator vs. DynamoDB - load balancing
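The core of a client-side load balancer, as described above, is a list of live nodes plus a policy for picking the next one. The sketch below is illustrative only and is not the code of the actual wrapper libraries. The class name `RoundRobinNodes` is hypothetical, the policy is plain round-robin (no DC, rack, or item awareness), and `refresh()` assumes that an Alternator node answers `GET /localnodes` with a JSON array of node addresses.

```python
import json
import threading
import urllib.request

class RoundRobinNodes:
    """Hypothetical helper: hands out Alternator endpoints in round-robin order."""

    def __init__(self, seed_nodes, port=8000, scheme="http"):
        self._lock = threading.Lock()
        self._nodes = list(seed_nodes)
        self._index = 0
        self._port = port
        self._scheme = scheme

    def update(self, nodes):
        """Replace the node list; an empty list is ignored so we never run dry."""
        with self._lock:
            if nodes:
                self._nodes = list(nodes)

    def next_endpoint(self):
        """Return the endpoint URL of the next node in the rotation."""
        with self._lock:
            node = self._nodes[self._index % len(self._nodes)]
            self._index += 1
        return f"{self._scheme}://{node}:{self._port}"

    def refresh(self):
        """Ask one known node for the current list of live nodes.

        Assumption: the node serves a JSON array of addresses at /localnodes.
        """
        url = self.next_endpoint() + "/localnodes"
        with urllib.request.urlopen(url, timeout=2) as response:
            self.update(json.load(response))
```

An application would call `next_endpoint()` to choose the `endpoint_url` for each request (or each client it builds) and call `refresh()` periodically in the background.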
Most of what you know about ScyllaDB internals and ops is still relevant:
■Scalability
■Any number of nodes in the cluster, and size of nodes
■Across multiple geographic regions and availability zones
■New: fast and efficient cluster growth/downsizing
■“Tablets” - small pieces of tables that can migrate efficiently between nodes
Inside Alternator
Most of what you know about ScyllaDB internals and ops is still relevant:
■schemas (Alternator tables are CQL tables with special conventions)
■sstables
■compactions
■repairs
■caching
■data centers
■etc.
Inside Alternator
Alternator tables are CQL tables with special conventions:
■Table “xyz” is stored as table “alternator_xyz.xyz”
■secondary indexes (GSI and LSI) implemented as materialized views
Inside Alternator
■CQL requires a full schema - list of columns, type of each.
DynamoDB only declares names and types of key attributes.
■Key attributes:
■Stored as real columns in the schema, with known type
■Key type can be string, bytes or number
■Non-key attributes:
■Stored together in one map<text, bytes>
■Attribute values can have different DynamoDB types - number, boolean, bytes,
null, list, map, string set, number set, binary set - represented as JSON
Inside Alternator
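The conventions above can be illustrated with a small model of how one DynamoDB item might be split into CQL columns. This is a simplified sketch, not Alternator's actual storage code. The function names are hypothetical, the item is given in DynamoDB's typed JSON form, and the exact on-disk byte encoding of non-key attributes is assumed here to be plain UTF-8 JSON for readability.

```python
import json

def cql_location(table_name):
    """Keyspace and table that hold the Alternator table `table_name`."""
    return f"alternator_{table_name}", table_name

def to_cql_row(item, key_names):
    """Split a typed DynamoDB item into real key columns and an attribute map.

    Key attributes become real columns holding the bare value.
    All other attributes go into one map of name -> bytes.
    """
    key_columns = {}
    for name in key_names:
        (value,) = item[name].values()   # unwrap {"S": "u1"} to "u1"
        key_columns[name] = value
    attrs = {
        name: json.dumps(typed_value).encode("utf-8")
        for name, typed_value in item.items()
        if name not in key_names
    }
    return key_columns, attrs

item = {
    "user_id": {"S": "u1"},
    "liked": {"BOOL": True},
    "watched": {"L": [{"S": "t42"}, {"S": "t7"}]},
}
keys, attrs = to_cql_row(item, key_names=["user_id"])
print(cql_location("xyz"), keys, attrs)
```

Because only the key attributes are real columns, two items in the same table can carry entirely different non-key attributes without any schema change.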
To try Alternator quickly, you can:
■Create an Alternator cluster on ScyllaDB Cloud
■Run one Alternator node on your machine in 5 minutes, using docker:
docker pull scylladb/scylla:latest
docker run --name scylla -d -p 8000:8000 \
    scylladb/scylla:latest --alternator-port=8000 \
    --alternator-write-isolation=always
Trying Alternator
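Once the container is listening on port 8000, an unmodified DynamoDB SDK can be pointed at it. The following is a minimal sketch using Python boto3 (a third-party package that must be installed separately). The table name `watch_history`, the attribute names, and the placeholder region and credentials are all illustrative assumptions; it also assumes the node does not enforce authorization, so any credentials are accepted.

```python
def table_definition(name):
    """CreateTable arguments for a hypothetical table keyed by user and title."""
    return {
        "TableName": name,
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"},
            {"AttributeName": "title_id", "KeyType": "RANGE"},
        ],
        "AttributeDefinitions": [
            {"AttributeName": "user_id", "AttributeType": "S"},
            {"AttributeName": "title_id", "AttributeType": "S"},
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }

def main():
    import boto3  # imported here so the helper above works without boto3

    dynamodb = boto3.resource(
        "dynamodb",
        endpoint_url="http://localhost:8000",  # the local Alternator node
        region_name="us-east-1",               # placeholder, required by the SDK
        aws_access_key_id="none",              # placeholder credentials
        aws_secret_access_key="none",
    )
    table = dynamodb.create_table(**table_definition("watch_history"))
    table.put_item(Item={"user_id": "u1", "title_id": "t42", "liked": True})
    response = table.get_item(Key={"user_id": "u1", "title_id": "t42"})
    print(response.get("Item"))

# Call main() after the docker container above is up and running.
```

The only Alternator-specific part is `endpoint_url`; the rest is the same code one would write against DynamoDB.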
Migrating to ScyllaDB
What ScyllaDB brings to the table
■Alternator - DynamoDB API compatibility
■Not a translation layer!
■Popular standards as transport
■HTTP/S and JSON
■ScyllaDB also implements CQL
■Query language
■Protocol
■Data model
Comparison - Schema
■CQL enforces a schema
■Partition Key(s), Clustering Key(s)
■CQL types
■UDTs
■Materialized Views, Local Secondary Indexes, Global Secondary Indexes
■CDC
■DynamoDB enforces Keys
■Partition Key, Sort Key
■DynamoDB Types
■Local Secondary Indexes, Global Secondary Indexes
■DynamoDB Streams
Migration Paths
Depending on the chosen path
■DynamoDB to ScyllaDB Alternator
■No application rewrite
■Table should work as-is
■Run DDB-compatible workloads anywhere!
■DynamoDB to CQL
■Application and Table changes required
■Feature-rich, cluster aware
■High performance
Migration Paths - DDB to Alternator
DynamoDB to ScyllaDB Alternator
■Use DynamoDB-API compatible libraries and tools
■Dual writes
■Change application to target both DDB and Alternator
■Alternatively, DynamoDB Streams + Lambda
■Historical data lift-and-shift
■Read from DDB table
■Consumes RCUs
■Beware of throttling and $
■Export to S3
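The dual-write step can be sketched as a thin wrapper around two table handles. This is one possible shape, not a prescribed implementation: the class name `DualWriter` is hypothetical, both handles are assumed to expose a boto3-style `put_item(Item=...)`, and DynamoDB is treated as the source of truth while the migration is in progress.

```python
class DualWriter:
    """Hypothetical wrapper that writes each item to DynamoDB and to Alternator."""

    def __init__(self, primary, secondary):
        self.primary = primary      # e.g. the existing DynamoDB table
        self.secondary = secondary  # e.g. the same table on Alternator
        self.failed = []            # (item, error) pairs to replay later

    def put_item(self, item):
        # The primary write must succeed; its errors propagate to the caller.
        self.primary.put_item(Item=item)
        # A failed secondary write is recorded instead of failing the request.
        try:
            self.secondary.put_item(Item=item)
        except Exception as error:
            self.failed.append((item, error))
```

Recording failures rather than raising keeps production traffic unaffected by the new cluster, at the price of needing a replay or reconciliation pass before cutover.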
Migration Paths - DynamoDB to CQL
DynamoDB to ScyllaDB using CQL
■Use DynamoDB-API compatible tools
■Python boto3 and Java AWS SDK
■Native CQL libraries for interacting with ScyllaDB
■Dual writes
■Change application to target both DDB and ScyllaDB
■Historical data lift-and-shift
■Read from table or S3
■ETL from DDB types to CQL types
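The ETL step starts by unwrapping DynamoDB's typed JSON into plain values that can then be bound to CQL columns. The decoder below is a hand-written sketch to show what that mapping involves; the function names are hypothetical, and in practice an official library (for example boto3's `TypeDeserializer`) is the safer choice. The CQL types named in the comments are one plausible target mapping, not a fixed rule.

```python
import base64
from decimal import Decimal

def decode(value):
    """Convert one typed DynamoDB value, e.g. {"N": "1.5"}, to a Python value."""
    ((tag, payload),) = value.items()
    if tag == "S":
        return payload                                  # CQL text
    if tag == "N":
        return Decimal(payload)                         # CQL decimal
    if tag == "BOOL":
        return bool(payload)                            # CQL boolean
    if tag == "NULL":
        return None
    if tag == "B":
        return base64.b64decode(payload)                # CQL blob
    if tag == "L":
        return [decode(v) for v in payload]             # CQL list
    if tag == "M":
        return {k: decode(v) for k, v in payload.items()}
    if tag == "SS":
        return set(payload)                             # CQL set<text>
    if tag == "NS":
        return {Decimal(n) for n in payload}            # CQL set<decimal>
    if tag == "BS":
        return {base64.b64decode(b) for b in payload}   # CQL set<blob>
    raise ValueError(f"unknown DynamoDB type tag: {tag}")

def decode_item(item):
    """Decode every attribute of a typed DynamoDB item."""
    return {name: decode(value) for name, value in item.items()}
```

DynamoDB lists and maps may mix value types, which typed CQL collections do not allow, so nested attributes often need a modeling decision (a UDT, a frozen collection, or a serialized blob) rather than a mechanical conversion.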
Tooling - Alternator
■Libraries designed to work with the DynamoDB API work with Alternator
■Python boto3, Java AWS SDK…
■Kafka Streams Connector
■NoSQL Workbench
■Spark
■Incl. ScyllaDB Migrator
See https://github.com/scylladb/scylla-migrator
Tooling - CQL
■Built-in shard awareness on ScyllaDB drivers
■Enabled by default
■Supports many languages:
■Leveraged by connectors:
■Rack awareness (save on Cloud networking)
Migration challenges
■Reading tables can trip throttling
■Impact on production apps and cost
■Reading JSON from S3 has its challenges
■Not everything abides by JSON standards
■Better to stick with official libraries to import data
■Schema changes over the years
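One way to limit the throttling risk when reading the source table is to scan in small pages and pause between them. The sketch below assumes boto3-style table objects (`scan` returning `Items` and, when more data remains, `LastEvaluatedKey`; `put_item(Item=...)` on the target). The function name and the default page size and pause are illustrative and would need tuning against the table's actual capacity.

```python
import time

def copy_table(source, target, page_size=100, pause_seconds=0.0):
    """Copy all items from `source` to `target`, one scan page at a time.

    Returns the number of items copied.
    """
    copied = 0
    scan_kwargs = {"Limit": page_size}
    while True:
        page = source.scan(**scan_kwargs)
        for item in page.get("Items", []):
            target.put_item(Item=item)
            copied += 1
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return copied
        # Resume the next page where this one stopped.
        scan_kwargs["ExclusiveStartKey"] = last_key
        # Crude rate limit so the scan leaves capacity for production reads.
        time.sleep(pause_seconds)
```

A fixed pause is the simplest throttle; a more careful version could adapt the pause to the consumed capacity reported by each scan response.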
What about ScyllaDB cost?
The answer is…
50%
guaranteed
https://www.scylladb.com/compare/scylladb-vs-dynamodb
As of Feb 2025
Cost estimation - ScyllaDB on-prem
■Moved apps and DB to on-prem hosting
■Kept 6 total regions
■ScyllaDB + on-prem infra cost:
■$24,177,216 / year (excl. management)
■24x lower than DynamoDB on-demand
■11x lower than DynamoDB provisioned
■Cost efficient in dedicated infrastructure
■Improved latencies
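The "24x" and "11x" figures follow directly from the yearly estimates quoted in the earlier cost slides. A short check, using only those three numbers:

```python
# Yearly estimates quoted in the slides (USD).
DYNAMODB_ON_DEMAND = 571_811_035.40
DYNAMODB_PROVISIONED = 264_716_657.90
SCYLLADB_ON_PREM = 24_177_216.00   # excludes management

on_demand_ratio = DYNAMODB_ON_DEMAND / SCYLLADB_ON_PREM
provisioned_ratio = DYNAMODB_PROVISIONED / SCYLLADB_ON_PREM
on_demand_vs_provisioned = DYNAMODB_ON_DEMAND / DYNAMODB_PROVISIONED

print(f"on-demand / ScyllaDB:    {on_demand_ratio:.2f}x")
print(f"provisioned / ScyllaDB:  {provisioned_ratio:.2f}x")
print(f"on-demand / provisioned: {on_demand_vs_provisioned:.2f}x")
```

The ratios come out just under 24, just under 11, and a little above 2, which matches the rounded figures on the slides.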
Wrap
■ScyllaDB helps if you are experiencing:
■High cost
■Unpredictable latencies
■Lock-in
■Alternator removes the friction of moving off of DynamoDB
■CQL is featureful