ScyllaDB Real-Time Event Processing with CDC

ScyllaDB 408 views 17 slides Jun 20, 2024

About This Presentation

ScyllaDB’s Change Data Capture (CDC) allows you to stream both the current state and a history of all changes made to your ScyllaDB tables. In this talk, Senior Solutions Architect Guilherme Nogueira will discuss how CDC can be used to enable real-time event processing systems, and explore a...


Slide Content

Real-Time Event Processing with CDC Guilherme Nogueira, Senior Solutions Architect

Guilherme Nogueira
- Senior Solutions Architect @ ScyllaDB
- Previously at IBM
- Loves all things open source
- Just got a new puppy!

Presentation Agenda
- What is CDC?
- Use cases
- Comparison
- Record types
- Consuming CDC data

What is CDC?

Change Data Capture – CDC
- Consumable modification record for one or more tables
- Key feature of ScyllaDB (GA'd in 4.3/2021.1)
- Constantly receiving improvements
- Capture changes (write/delete/updates)
- Each change can trigger an event
- Asynchronously readable by a consumer
- Unified for both CQL and DynamoDB Streams

Use Cases

Where and How is CDC Used
- Database Replication (Elasticsearch)
- Notification Systems
- External Cache Invalidation
- In-flight Analytics, such as Fraud Detection
- Downstream Application Triggers
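
To make the external cache invalidation use case concrete, here is a minimal Python sketch. It assumes the changes have already been read from CDC and reduced to a simplified event. The `ChangeEvent` type, the `invalidate` helper, and the plain dict standing in for the cache are all hypothetical names for illustration, not part of any ScyllaDB library.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical, simplified view of one change read from CDC."""
    operation: str  # "insert", "update" or "delete" (illustrative labels)
    key: str        # primary key of the changed row, rendered as a string


def invalidate(cache: Dict[str, str], events: Iterable[ChangeEvent]) -> int:
    """Evict every cached entry whose row was touched; return the eviction count."""
    dropped = 0
    for event in events:
        # Any write makes the cached copy stale, so the operation type does
        # not matter here: evict and let the next read refill the entry.
        if cache.pop(event.key, None) is not None:
            dropped += 1
    return dropped


# Assumed sample data: a dict stands in for an external cache.
cache = {"user:1": "alice", "user:2": "bob", "user:3": "carol"}
events = [
    ChangeEvent("update", "user:1"),
    ChangeEvent("delete", "user:3"),
    ChangeEvent("insert", "user:9"),  # not cached, so nothing to evict
]
dropped = invalidate(cache, events)
print(dropped, sorted(cache))
```

Evicting rather than updating keeps the consumer simple: it only needs the primary key of each change, not the new column values.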

Comparison

How Does ScyllaDB CDC Compare Against ...

                         Cassandra       DynamoDB              MongoDB               ScyllaDB
Consumer location        on-node         off-node              off-node              off-node
Replication              duplicated      deduplicated          deduplicated          deduplicated
Deltas                   yes             limited               partial               optional
Pre-image                no              yes                   no                    optional
Post-image               no              yes                   yes                   optional
Slow consumer reaction   Table stopped   Consumer loses data   Consumer loses data   Consumer loses data
Ordering                 no              yes                   yes                   yes

Record Types

What do I Get Out of CDC?

Delta: What was changed?
- 'full': contains information about every modified column
- 'keys': only the primary key of the change will be recorded

Preimage: What was before?
- 'false': disables the feature
- 'true': contains only the columns that were changed by the write
- 'full': contains the entire row (how it was before the write was made)

Postimage: What’s the end result?
- 'false': disables the feature
- 'true': shows the affected row’s state after the write. The postimage row always contains all the columns, whether or not they were affected by the change
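
The record-type options above are set per table. As a rough sketch, the Python helper below validates a combination of delta, preimage, and postimage settings and renders a CQL statement that could enable them. The helper name `cdc_alter_statement` and the table `ks.orders` are hypothetical, and the `WITH cdc = {...}` map syntax is written from memory of the ScyllaDB documentation, so verify it against the docs for your version before use.

```python
from typing import Union

# Allowed values, as listed on the slide.
VALID_DELTA = ("full", "keys")
VALID_PREIMAGE = (False, True, "full")
VALID_POSTIMAGE = (False, True)


def _check(name: str, value, allowed) -> None:
    # Compare type as well as value so that 1 is not accepted in place of True.
    if not any(type(value) is type(a) and value == a for a in allowed):
        raise ValueError(f"{name} must be one of {allowed!r}, got {value!r}")


def _literal(value: Union[bool, str]) -> str:
    # Booleans are rendered bare, strings are single-quoted.
    if isinstance(value, bool):
        return "true" if value else "false"
    return f"'{value}'"


def cdc_alter_statement(table: str, delta: str = "full",
                        preimage: Union[bool, str] = False,
                        postimage: bool = False) -> str:
    """Render an ALTER TABLE statement enabling CDC (assumed syntax)."""
    _check("delta", delta, VALID_DELTA)
    _check("preimage", preimage, VALID_PREIMAGE)
    _check("postimage", postimage, VALID_POSTIMAGE)
    options = {"enabled": True, "delta": delta,
               "preimage": preimage, "postimage": postimage}
    body = ", ".join(f"'{k}': {_literal(v)}" for k, v in options.items())
    return f"ALTER TABLE {table} WITH cdc = {{{body}}}"


statement = cdc_alter_statement("ks.orders", delta="keys",
                                preimage="full", postimage=True)
print(statement)
```

Validating the options before building the string catches mistakes such as `postimage='full'`, which the slide does not list as a valid setting.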

Consuming CDC Data

How to Consume CDC Data
- CDC data is available through normal CQL
- Easy to read raw streams
  - Already de-duplicated
  - All delta and pre-image values are normal CQL data
  - Can consume without knowledge of server internals
- Layered approach
  - CDC core functionality relatively simple
  - Allows for more sophisticated adaptors
  - Push models etc.
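
Because CDC data is read back as ordinary CQL rows, a consumer mostly needs to interpret a few metadata columns. The Python sketch below groups the rows of one change into preimage, delta, and postimage. It assumes, based on my reading of the ScyllaDB CDC documentation, that the log table is named after the base table with a `_scylla_cdc_log` suffix and that a `cdc$operation` column carries the numeric codes shown. The helper names and sample rows are hypothetical, and the rows are plain dicts rather than results from a live cluster.

```python
from typing import Dict, List

# Operation codes as understood from the ScyllaDB CDC documentation
# (assumption: verify for your version). Range-delete codes are omitted.
OPERATIONS = {
    0: "preimage",
    1: "update",
    2: "insert",
    3: "row_delete",
    4: "partition_delete",
    9: "postimage",
}


def log_table_name(keyspace: str, table: str) -> str:
    """Name of the CDC log table for a base table (assumed naming scheme)."""
    return f"{keyspace}.{table}_scylla_cdc_log"


def split_change(rows: List[Dict]) -> Dict[str, List[Dict]]:
    """Group the log rows of one change into preimage, delta and postimage."""
    groups: Dict[str, List[Dict]] = {"preimage": [], "delta": [], "postimage": []}
    for row in rows:
        kind = OPERATIONS.get(row["cdc$operation"], "unknown")
        if kind in ("preimage", "postimage"):
            groups[kind].append(row)
        else:
            # Inserts, updates, deletes and any unlisted code count as deltas.
            groups["delta"].append(row)
    return groups


# Hypothetical rows for one UPDATE that changed v from 1 to 2.
rows = [
    {"cdc$operation": 0, "pk": 1, "v": 1},
    {"cdc$operation": 1, "pk": 1, "v": 2},
    {"cdc$operation": 9, "pk": 1, "v": 2},
]
groups = split_change(rows)
print(log_table_name("ks", "orders"), {k: len(v) for k, v in groups.items()})
```

In practice the integration libraries listed on the next slide handle stream discovery and time-window polling, so hand-written decoding like this is mainly useful for exploring the raw log.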

Integration Libraries
- High(er) level CDC consumer libraries with examples:
  - Java – https://github.com/scylladb/scylla-cdc-java
  - Go – https://github.com/scylladb/scylla-cdc-go
  - Rust – https://github.com/scylladb/scylla-cdc-rust
  - Python - coming
- Kafka integration
  - https://github.com/scylladb/scylla-cdc-source-connector

Wrap Up

Why CDC on ScyllaDB?
- Easy to integrate and consume
  - Plain tables
- Robust
  - Replicated in same way as normal data
  - Benefits from all read path improvements
- Reasonable overhead
  - Coalesced writes to same replica ranges
  - Overhead is comparable to adding another table
- Does not overflow if consumer fails to act
  - Data is TTL'd
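
Since CDC data is TTL'd rather than retained until consumed, a consumer that falls too far behind can miss changes. The small Python sketch below shows one way a consumer might check its own lag against the TTL. The function name is hypothetical, and the 86400-second default is an assumption about the CDC log's default TTL, so substitute the value configured on your table.

```python
from datetime import datetime, timedelta, timezone


def may_have_lost_changes(last_read: datetime, now: datetime,
                          ttl_seconds: int = 86400) -> bool:
    """Return True if changes written just after last_read may have expired."""
    # Log rows older than the TTL are gone, so a gap longer than the TTL
    # means some changes could no longer be read.
    return now - last_read > timedelta(seconds=ttl_seconds)


now = datetime(2024, 6, 20, 12, 0, tzinfo=timezone.utc)
recent = may_have_lost_changes(now - timedelta(hours=1), now)
stale = may_have_lost_changes(now - timedelta(hours=25), now)
print(recent, stale)
```

A consumer that detects this condition would typically need to rebuild its downstream state from the base table instead of resuming from the log.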

Stay in Touch
Guilherme Nogueira
[email protected]
hopugop
https://www.linkedin.com/in/guilherme-nogueira-4740a116/