Exploring ScyllaDB’s DynamoDB-Compatible API by Guilherme Nogueira & Nadav Har'El

ScyllaDB 144 views 51 slides Mar 10, 2025

About This Presentation

In this talk we inspect how ScyllaDB implemented Alternator, the DynamoDB-compatible API. We review internal table structure, load balancing and deployment characteristics. We also inspect an existing workload in DynamoDB and compare it running on Scylla Cloud as a DBaaS. Aspects include performance...


Slide Content

A ScyllaDB Community
Exploring ScyllaDB’s
DynamoDB-Compatible API
Guilherme Nogueira
Technical Director
Nadav Har'El
Distinguished Engineer

Guilherme Nogueira
Guilherme is a Technical Director at ScyllaDB, assists users in their
journey to see ScyllaDB's potential. Geek, passionate about Linux and
open source software.
Presenters
Nadav Har'El
Nadav Har'El has had a diverse 30-year career in computer programming
and computer science, working in areas including information retrieval,
virtualization and operating systems. Today he works on ScyllaDB, and
among other things led the Alternator development.

■Why move out
■Use-case analysis and cost
■Alternator deep dive
■Migrating off of DynamoDB
■ScyllaDB cost
Presentation Agenda

Why move out of DynamoDB?

Reasons workloads move out of DynamoDB
■We work with tons of customers that migrated off of DDB
■Main reasons:
■Cost
■Latency
■Flexibility


For a more detailed look, check out the blog post:
https://www.scylladb.com/2023/12/04/dynamodb-when-to-move-out/

A little spoiler
50%
guaranteed
https://www.scylladb.com/compare/scylladb-vs-dynamodb
As of March 2025

Pains that cause change

Latency
lp.scylladb.com/price-performance-dynamoDB-benchmark-offer

ProvisionedThroughputExceededException
■Adequate provisioned capacity but high throttling
■Throttling despite activated AWS Application Auto Scaling
■Throttling in on-demand capacity mode
■Presence of a hot partition
■Traffic exceeding account throughput quotas

Lock-in
■DynamoDB integrates seamlessly with AWS ecosystem
■Harder to leave
■ScyllaDB runs anywhere
■On-prem
■Any cloud
■K8s

Use-case analysis

Use case: Streaming recommendation
A streaming service serves ML-generated recommendations to user
profiles across the globe, from the nearest replicated location.
■6 total locations
■User feedback and watched titles are written from any
replicated location
■Stores users’ likes and dislikes, watched titles, watch time
■Recommendations are generated daily
■Immediately available globally

Use case profile
■Dataset: 1PB
■Payload / item size: 10KB
■Reads
■100K ops/s baseline
■500K ops/s peak (10h a day)
■Writes
■400K ops/s baseline
■1.2M ops/s peak (12h a day)
■Daily batch load, large portion of the dataset is overwritten
■Batch writes to a single region, replicated to the other 5
■User-incoming writes are written to the local region and replicated

Use-case profile
■Batch writes peak 1.2M ops/s
■Regular writes 400K ops/s
■Reads peak 500K ops/s
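
To put the profile in daily terms, the rates above can be turned into operation counts. This is a rough sketch only: it assumes the baseline rate applies to every non-peak hour, that peaks run at a flat rate, and that sizes use decimal units. The slides state none of these, and the function name is illustrative.

```python
SECONDS_PER_HOUR = 3600

def daily_ops(baseline_per_s, peak_per_s, peak_hours):
    """Operations per day, assuming a flat baseline outside the peak window."""
    baseline_hours = 24 - peak_hours
    per_hour_weighted = baseline_per_s * baseline_hours + peak_per_s * peak_hours
    return per_hour_weighted * SECONDS_PER_HOUR

# Rates taken from the use-case profile slide.
reads_per_day = daily_ops(100_000, 500_000, peak_hours=10)
writes_per_day = daily_ops(400_000, 1_200_000, peak_hours=12)

# Item count, assuming 1 PB = 10**15 bytes and 10 KB = 10**4 bytes.
items = 10**15 // 10**4

print(f"reads/day:  {reads_per_day:,}")
print(f"writes/day: {writes_per_day:,}")
print(f"items:      {items:,}")
```

Under these assumptions the workload is write-heavy: roughly three writes for every read over a day.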

Use case challenges
■Users’ watch history replicated alongside recommendations
■Unpredictable patterns
■Interest influenced by external trends
■Single table approach
■Tail latency
■Cost
■Lock-in

DynamoDB Cost Analysis

Cost drivers
■Unpredictable user spikes
■On-demand is easy
■Global Table with 6 regions
■Replicated Write units + network for replication
■Daily data load
■Massive cost
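
Why the 10KB payload and the 6 regions matter can be sketched with DynamoDB's request-unit rules: a write consumes one unit per 1 KB (rounded up), and a strongly consistent read consumes one unit per 4 KB (rounded up), halved for eventually consistent reads. The sketch below assumes a Global Table bills each write once per replica region; it deliberately leaves out prices, and the helper names are illustrative.

```python
import math

def write_units(item_bytes):
    """Write request units for one item: 1 unit per 1 KB, rounded up."""
    return math.ceil(item_bytes / 1024)

def read_units(item_bytes, eventually_consistent=False):
    """Read request units for one item: 1 unit per 4 KB, rounded up."""
    units = math.ceil(item_bytes / 4096)
    return units / 2 if eventually_consistent else units

ITEM_BYTES = 10 * 1024   # 10KB payload from the use-case profile
REGIONS = 6              # Global Table regions from the use case

units_per_write = write_units(ITEM_BYTES)
# Assumption: every replica region bills the write as replicated write units.
replicated_units_per_write = units_per_write * REGIONS

print(units_per_write, replicated_units_per_write)
print(read_units(ITEM_BYTES), read_units(ITEM_BYTES, eventually_consistent=True))
```

Multiplying these per-operation units by the sustained write rate is what makes the replicated daily load the dominant cost driver.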



Estimated numbers as of Feb/2025

Cost estimation - on-demand
■Replicated writes
■Spiky reads
■10KB payload

Est. DynamoDB on-demand cost: $571,811,035.40 / year



Estimated numbers as of Feb 2025 list prices

Cost estimation - provisioned
■Replicated writes
■Spiky reads, which can be challenging on provisioned mode
■10KB payload
■Assuming reserved capacity

Est. DynamoDB provisioned cost: $264,716,657.90 / year
2x cheaper than on-demand, but still…

Estimated numbers as of Feb 2025 list prices

What is ScyllaDB Alternator?

Run DynamoDB apps anywhere, at lower cost
■Amazon DynamoDB-compatible API
■Supports the same SDKs, data modeling, and queries as DynamoDB
■Deploy anywhere: on-premise, or on any public cloud
■Predictable and sustained lower latencies without high
operational costs

ScyllaDB Alternator

ScyllaDB started (2015) with Cassandra®’s APIs: CQL & Thrift

Alternator (2019): added a DynamoDB™-compatible API to ScyllaDB



ScyllaDB Alternator

Why a DynamoDB API?
■Popularity growing, closing in on Cassandra
■ScyllaDB solves real AWS lock-in problem
■DynamoDB apps can now be developed,
or run, anywhere, and at cheaper cost
■DynamoDB API is close in features to CQL


ScyllaDB Alternator
[Chart: db-engines.com popularity ranking (log scale), 2013 to 2025]

Alternator is part of ScyllaDB
■ScyllaDB listens for DynamoDB API requests in addition to CQL.
■Same installation, setup, etc. just run with extra options:
■alternator_port and/or alternator_https_port
■alternator_write_isolation (how read-modify-write operations are isolated)

Available in ScyllaDB Cloud since June 2020
ScyllaDB Alternator

Alternator goal: full compatibility with DynamoDB API
■Support unmodified applications designed for DynamoDB
■But run them anywhere that ScyllaDB can run - avoid vendor lock-in!
■Already mostly compatible except some unimplemented features listed
in docs/alternator/compatibility.md
Alternator vs. DynamoDB

■From docs/alternator/compatibility.md:
■DynamoDB’s multi-item “transactions” feature is not supported.
■All of Alternator’s tables are “Global Tables”, i.e., same on all DCs.
■DynamoDB Streams API is supported, but the Kinesis Streams API is not.
■DynamoDB’s Backup API not supported - use Scylla’s backup tools instead.
■DynamoDB’s per-table ProvisionedThroughput not supported, use Scylla’s
“workload prioritization”.
■New PartiQL syntax not supported. Use CQL if you want an SQL-like syntax.
■Export/import to S3 is not supported.
■“Projection” parameter of GSI/LSI is not honored - all base table attributes are
available in the index.
Alternator vs. DynamoDB

■Like ScyllaDB, Alternator is a dedicated installation,
not a multi-tenant service like DynamoDB:
■Cluster is dedicated to one customer
■CQL users are aware of the nodes in this cluster, but DynamoDB users are not
■We’ll discuss load balancing in the next slide
■Powerful monitoring framework from ScyllaDB (Prometheus, Grafana) -
■Understand your workload’s performance and get insights on how to improve it
Alternator vs. DynamoDB

DynamoDB applications configure one endpoint URL, e.g.,
dynamodb.us-east-1.amazonaws.com

Application is not aware of the individual nodes in the cluster…
■Cannot send a request to the right node (let alone right core)
■Cannot balance the load between the cluster’s live nodes
Alternator vs. DynamoDB - load balancing

So the user must deploy an additional load balancer:
■Server-side load balancer (TCP or HTTP)
■Client sends request to the load balancer, which forwards it to a random node, and
from there, to the right node. Expensive.

■ScyllaDB coordinator-only node
■Client goes to a data-less Scylla node, request forwarded to the right node.
Alternator vs. DynamoDB - load balancing

Our recommended option:
■Client-side load balancer:
■Small wrapper over Amazon’s existing SDK (in various programming languages).
■After slightly different setup, application uses unmodified API calls.
■The wrapper sends each request to the best ScyllaDB node:
■Maintains a list of live nodes.
■Can pick a node in the right DC and rack (region and zone).
■Can pick a node holding the requested item.
■Lowers infrastructure and networking costs, and request latency.
Alternator vs. DynamoDB - load balancing
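
The client-side idea can be sketched as a small node picker. This is not the code of ScyllaDB's actual wrapper: the class name, the (address, rack) tuples and the round-robin policy are assumptions made for illustration, and it skips item-aware routing entirely.

```python
import itertools

class NodePicker:
    """Hypothetical round-robin picker over live nodes, preferring one rack."""

    def __init__(self, nodes, preferred_rack=None):
        # nodes: iterable of (address, rack) tuples known to be alive
        self._nodes = list(nodes)
        self._preferred_rack = preferred_rack
        self._counter = itertools.count()

    def mark_down(self, address):
        """Drop a node from the live list, e.g. after a failed health check."""
        self._nodes = [n for n in self._nodes if n[0] != address]

    def pick(self):
        """Return the next address, falling back to any rack if needed."""
        local = [n for n in self._nodes if n[1] == self._preferred_rack]
        candidates = local or self._nodes
        if not candidates:
            raise RuntimeError("no live nodes")
        return candidates[next(self._counter) % len(candidates)][0]

picker = NodePicker(
    [("10.0.0.1", "rack1"), ("10.0.0.2", "rack1"), ("10.0.0.3", "rack2")],
    preferred_rack="rack1",
)
first_three = [picker.pick() for _ in range(3)]
picker.mark_down("10.0.0.1")
picker.mark_down("10.0.0.2")
fallback = picker.pick()  # no rack1 node left, so another rack is used
```

Each request would then be sent to the endpoint returned by `pick()`, so no separate load-balancer hop is needed.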

Most of what you know about ScyllaDB internals and ops is still relevant:
■Scalability
■Any number of nodes in the cluster, and size of nodes
■Across multiple geographic regions and availability zones
■New: fast and efficient cluster growth/downsizing
■“Tablets” - small pieces of tables that can migrate efficiently between nodes
Inside Alternator

Most of what you know about ScyllaDB internals and ops is still relevant:
■schemas (Alternator tables are CQL tables with special conventions)
■sstables
■compactions
■repairs
■caching
■data centers
■etc.
Inside Alternator

Alternator tables are CQL tables with special conventions:
■Table “xyz” is stored as table “alternator_xyz.xyz”
■secondary indexes (GSI and LSI) implemented as materialized views
Inside Alternator

■CQL requires a full schema - list of columns, type of each.
DynamoDB only declares names and types of key attributes.
■Key attributes:
■Stored as real columns in the schema, with known type
■Key type can be string, bytes or number
■Non-key attributes:
■Stored together in one map<text, bytes>
■Attribute values can have different DynamoDB types - number, boolean, bytes,
null, list, map, string set, number set, binary set - represented as JSON
Inside Alternator
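
A simplified picture of these conventions: key attributes become real columns, and everything else goes into one name-to-bytes map. The sketch below only illustrates that split. The function names are made up, and plain JSON encoding stands in for Alternator's actual serialization, which differs in detail.

```python
import json

def cql_location(table_name):
    """Keyspace.table where an Alternator table is stored, per the naming convention."""
    return f"alternator_{table_name}.{table_name}"

def to_storage_row(item, key_names):
    """Split an item into typed key columns and a map<text, bytes> of the rest."""
    key_columns = {name: item[name] for name in key_names}
    attrs = {
        name: json.dumps(value).encode("utf-8")  # stand-in for the real encoding
        for name, value in item.items()
        if name not in key_names
    }
    return key_columns, attrs

keys, attrs = to_storage_row(
    {"user_id": "u1", "title": "Dune", "liked": True, "tags": ["sci-fi"]},
    key_names=["user_id"],
)
print(cql_location("xyz"), keys, attrs)
```

Because non-key attributes live in a single map, items in the same table can carry different attribute sets without any schema change.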

To try Alternator quickly, you can:
■Create an Alternator cluster on ScyllaDB Cloud
■Run one Alternator node on your machine in 5 minutes, using Docker:

docker pull scylladb/scylla:latest

docker run --name scylla -d -p 8000:8000 \
    scylladb/scylla:latest --alternator-port=8000 \
    --alternator-write-isolation=always



Trying Alternator
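
Once the container is up, an unmodified DynamoDB SDK can be pointed at it. A minimal sketch with boto3 follows; it assumes boto3 is installed and that the local node does not enforce authorization, so the credentials are placeholders. The table name `demo` and key `id` are made up for the example.

```python
TABLE_SPEC = {
    "TableName": "demo",  # hypothetical table name
    "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
    "AttributeDefinitions": [{"AttributeName": "id", "AttributeType": "S"}],
    "BillingMode": "PAY_PER_REQUEST",
}

def alternator_resource(endpoint_url="http://localhost:8000"):
    """Return a boto3 DynamoDB resource that talks to a local Alternator node."""
    import boto3  # third-party: pip install boto3

    return boto3.resource(
        "dynamodb",
        endpoint_url=endpoint_url,     # the port published by `docker run`
        region_name="None",            # placeholder; the endpoint is explicit
        aws_access_key_id="None",      # placeholder credentials
        aws_secret_access_key="None",
    )

def demo_round_trip():
    """Create the table, write one item and read it back."""
    dynamodb = alternator_resource()
    table = dynamodb.create_table(**TABLE_SPEC)
    table.wait_until_exists()
    table.put_item(Item={"id": "user-1", "liked": ["Dune"]})
    return table.get_item(Key={"id": "user-1"})["Item"]
```

Calling `demo_round_trip()` against the running container should return the item that was just written; the only Alternator-specific part is the `endpoint_url`.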

Migrating to ScyllaDB

What ScyllaDB brings to the table
■Alternator - DynamoDB API compatibility
■Not a translation layer!
■Popular standards as transport
■HTTP/S and JSON
■ScyllaDB also implements CQL
■Query language
■Protocol
■Data model
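
The HTTP and JSON transport is the same one DynamoDB SDKs already speak: a POST whose `X-Amz-Target` header names the operation and whose body is a JSON document. The sketch below only builds such a request for illustration. The helper name is made up, and a real SDK also adds signing headers that are omitted here.

```python
import json

def build_put_item_request(table, item):
    """Return (headers, body) for a DynamoDB-style PutItem call, unsigned."""
    headers = {
        "Content-Type": "application/x-amz-json-1.0",
        "X-Amz-Target": "DynamoDB_20120810.PutItem",
    }
    body = json.dumps({"TableName": table, "Item": item})
    return headers, body

headers, body = build_put_item_request(
    "demo",  # hypothetical table name
    {"id": {"S": "user-1"}, "watch_time": {"N": "42"}},
)
print(headers["X-Amz-Target"])
print(body)
```

Note that numbers travel as strings inside the `N` tag, which is part of the DynamoDB JSON format rather than anything specific to Alternator.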

Comparison - Schema
■CQL enforces a schema
■Partition Key(s), Clustering Key(s)
■CQL types
■UDTs
■Materialized Views, Local Secondary Indexes, Global Secondary Indexes
■CDC
■DynamoDB enforces Keys
■Primary Key, Sorting Key
■DynamoDB Types
■Local Secondary Indexes, Global Secondary Indexes
■DynamoDB Streams

Migration Paths
Depending on the chosen path
■DynamoDB to ScyllaDB Alternator
■No application rewrite
■Table should work as-is
■Run DDB-compatible workloads anywhere!
■DynamoDB to CQL
■Application and Table changes required
■Feature-rich, cluster aware
■High performance

Migration Paths - DDB to Alternator
DynamoDB to ScyllaDB Alternator
■Use DynamoDB-API compatible libraries and tools
■Dual writes
■Change application to target both DDB and Alternator
■Alternatively, DynamoDB Streams + Lambda
■Historical data lift-and-shift
■Read from DDB table
■Consumes RCUs
■Beware of throttling and $
■Export to S3
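
Dual writes can be sketched as a thin wrapper around two table handles. This is one possible shape, not a prescribed one: it assumes both handles expose a boto3-style `put_item(**kwargs)`, keeps DynamoDB as the source of truth, and records failed writes to the new target for later reconciliation. The class and callback names are hypothetical.

```python
class DualWriter:
    """Hypothetical wrapper that writes to the current table, then to the new one."""

    def __init__(self, primary, secondary, on_secondary_error=None):
        self.primary = primary        # e.g. the existing DynamoDB table
        self.secondary = secondary    # e.g. the same table on Alternator
        self.on_secondary_error = on_secondary_error

    def put_item(self, **kwargs):
        # A failure on the primary propagates to the caller unchanged.
        result = self.primary.put_item(**kwargs)
        try:
            self.secondary.put_item(**kwargs)
        except Exception as exc:
            # The application keeps working; the miss is recorded for replay.
            if self.on_secondary_error is not None:
                self.on_secondary_error(kwargs, exc)
        return result
```

Writes that land while the historical data is being copied are covered by the wrapper, which is why dual writes usually start before the lift-and-shift step.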

Migration Paths - DynamoDB to CQL
DynamoDB to ScyllaDB using CQL
■Use DynamoDB-API compatible tools
■Python boto3 and Java AWS SDK
■Native CQL libraries for interacting with ScyllaDB
■Dual writes
■Change application to target both DDB and ScyllaDB
■Historical data lift-and-shift
■Read from table or S3
■ETL from DDB types to CQL types
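
The ETL step mostly consists of unwrapping DynamoDB's tagged values into native ones that a CQL driver can bind. A minimal sketch, assuming the input is DynamoDB JSON as found in exports or low-level API responses; the function name is illustrative, and the comments suggest one possible CQL mapping rather than a required one.

```python
import base64
from decimal import Decimal

def from_dynamodb(value):
    """Convert one tagged DynamoDB value, e.g. {"S": "x"}, to a native Python value."""
    (tag, payload), = value.items()
    if tag == "S":
        return payload                                   # e.g. CQL text
    if tag == "N":
        return Decimal(payload)                          # e.g. CQL decimal
    if tag == "BOOL":
        return payload                                   # e.g. CQL boolean
    if tag == "NULL":
        return None
    if tag == "B":
        return base64.b64decode(payload)                 # e.g. CQL blob
    if tag == "L":
        return [from_dynamodb(v) for v in payload]       # e.g. CQL list
    if tag == "M":
        return {k: from_dynamodb(v) for k, v in payload.items()}
    if tag == "SS":
        return set(payload)                              # e.g. CQL set<text>
    if tag == "NS":
        return {Decimal(n) for n in payload}
    if tag == "BS":
        return {base64.b64decode(b) for b in payload}
    raise ValueError(f"unknown DynamoDB type tag: {tag}")

row = from_dynamodb({"M": {"title": {"S": "Dune"}, "minutes": {"N": "155"}}})
print(row)
```

Lists and maps in DynamoDB can mix value types freely, so a real migration still has to decide how such attributes fit a typed CQL schema.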

Tooling - Alternator
■Libraries designed to work with the DynamoDB API work with Alternator
■Python boto3, Java AWS SDK…
■Kafka Streams Connector
■NoSQL Workbench
■Spark
■Incl. ScyllaDB Migrator
See https://github.com/scylladb/scylla-migrator

Tooling - CQL
■Built-in shard awareness on ScyllaDB drivers
■Enabled by default
■Supports many languages:
■Leveraged by connectors:
■Rack awareness (save on Cloud networking)

Migration challenges
■Reading tables can trip throttling
■Impact to Production apps, cost
■Reading JSON from S3 has its challenges
■Not everything abides by JSON standards
■Better to stick with official libraries to import data
■Schema changes over the years

What about ScyllaDB cost?

The answer is…
50%
guaranteed
https://www.scylladb.com/compare/scylladb-vs-dynamodb
As of Feb 2025

Cost estimation - ScyllaDB on-prem
■Moved apps and DB to on-prem hosting
■Kept 6 total regions
■ScyllaDB + on-prem infra cost:
■$24,177,216 / year (excl. management)
■24x lower than DynamoDB on-demand
■11x lower than DynamoDB provisioned
■Cost efficient in dedicated infrastructure
■Improved latencies

Wrap
■ScyllaDB helps if you are experiencing:
■High cost
■Unpredictable latencies
■Lock-in
■Alternator removes friction of moving off of DynamoDB
■CQL is featureful

Stay in Touch
Guilherme Nogueira


hopugop
https://www.linkedin.com/in/guilherme-nogueira-4740a116/
Nadav Har’El
nyh
https://www.linkedin.com/in/nyharel/