Debezium Snapshots Revisited!

HostedbyConfluent, 34 slides, Oct 25, 2023

About This Presentation

"Initial snapshots are a core feature of Debezium: when setting up a new CDC connector, existing tables can be scanned in order to export their full state to consumers, before starting to capture changes from the transaction log. While this works great in general, a few questions came up again ...


Slide Content

Image © Nicolas Buffler https://flic.kr/p/jpWcWD (CC BY 2.0)
Debezium Snapshots Revisited!
Gunnar Morling
Senior Staff Software Engineer, Decodable
@gunnarmorling

#DebeziumSnapshotting @gunnarmorling
Agenda

Gunnar Morling
●Software engineer at Decodable
●Former project lead of Debezium
●kcctl, JfrUnit, ModiTect, MapStruct
●Spec Lead for Bean Validation 2.0
●Java Champion

Recap – Debezium
Log-Based Change Data Capture

Snapshotting
Why Is It Needed?
●Need to backfill data, but don’t have all TX logs
●Solution: scan data once before streaming
●Emit READ event for each record
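In Debezium's change event envelope, these snapshot records are distinguished by the operation field: op is "r" for a read, as opposed to "c", "u" and "d" for create, update and delete. A minimal sketch of a consumer-side check, assuming a simplified event with only op, before and after fields; describe and is_snapshot_event are hypothetical helpers, not Debezium API.

```python
# Debezium's change event envelope carries the operation type in "op".
OP_NAMES = {"r": "read (snapshot)", "c": "create", "u": "update", "d": "delete"}


def describe(event: dict) -> str:
    """Summarize a simplified Debezium-style change event (hypothetical helper)."""
    op = event["op"]
    # Deletes carry the old row image in "before"; the other ops carry "after".
    row = event["before"] if op == "d" else event["after"]
    return f"{OP_NAMES[op]}: {row}"


def is_snapshot_event(event: dict) -> bool:
    """Snapshot events are the ones with op == "r"."""
    return event["op"] == "r"


snapshot_event = {"op": "r", "before": None, "after": {"id": 1, "color": "blue"}}
print(describe(snapshot_event))  # read (snapshot): {'id': 1, 'color': 'blue'}
```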

Snapshotting
Classic Approach – General Idea
●Capture current position in transaction log
●Scan all relevant tables
●Start streaming
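The three steps can be simulated with an in-memory table and a list standing in for the transaction log. This is a sketch under simplifying assumptions, not Debezium's implementation: a copy of the table stands in for the consistent read view a real connector obtains through a snapshot transaction and/or locks, and classic_snapshot as well as the event shape are hypothetical.

```python
from typing import Iterator

Event = dict


def classic_snapshot(table: dict, log: list) -> Iterator[Event]:
    """Classic snapshot sketch: scan the table once, then stream the log."""
    # 1. Capture the current position in the transaction log.
    start_position = len(log)
    # Stand-in for a consistent read view as of start_position; real
    # connectors rely on a snapshot transaction and/or locks for this.
    consistent_view = dict(table)

    def events() -> Iterator[Event]:
        # 2. Scan all rows, emitting one READ ("r") event per record.
        for key in sorted(consistent_view):
            yield {"op": "r", "key": key, "after": consistent_view[key]}
        # 3. Start streaming from the captured position, so changes made
        #    while the scan was running are not lost.
        position = start_position
        while position < len(log):
            yield log[position]
            position += 1

    return events()


table = {1: {"color": "red"}, 2: {"color": "blue"}}
log = [{"op": "c", "key": 1, "after": {"color": "red"}}]  # already in the table
stream = classic_snapshot(table, log)

# A write that happens after the log position was captured:
table[3] = {"color": "green"}
log.append({"op": "c", "key": 3, "after": {"color": "green"}})

result = [(e["op"], e["key"]) for e in stream]
print(result)  # [('r', 1), ('r', 2), ('c', 3)]
```

The create event for key 3 is only emitted after the whole scan has finished; on large tables, this is the "can't stream changes until snapshot completed" limitation of the classic approach.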

Snapshotting
Key Configuration Options
●snapshot.mode (initial, never, schema_only_recovery)
●snapshot.select.statement.overrides
●snapshot.max.threads
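These options go into the connector configuration. A sketch of a registration request body for Kafka Connect's REST API (POST to /connectors), assuming a PostgreSQL source; the connector name, table name and WHERE clause are hypothetical, connection settings are omitted, and support for snapshot.max.threads varies by connector and Debezium version. The override list names the tables, and one property per table of the form snapshot.select.statement.overrides.<schema>.<table> holds the SELECT used during the snapshot.

```python
import json


def snapshot_settings(mode: str, overrides: dict, max_threads: int) -> dict:
    """Snapshot-related connector options, passed as strings as is usual for Kafka Connect."""
    config = {
        "snapshot.mode": mode,
        "snapshot.max.threads": str(max_threads),
    }
    if overrides:
        # Comma-separated list of tables whose snapshot SELECT is overridden ...
        config["snapshot.select.statement.overrides"] = ",".join(overrides)
        # ... plus one property per table holding the custom SELECT.
        for table, select in overrides.items():
            config[f"snapshot.select.statement.overrides.{table}"] = select
    return config


connector = {
    "name": "inventory-connector",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        # Connection settings (hostname, credentials, ...) omitted.
        **snapshot_settings(
            mode="initial",
            overrides={
                # Hypothetical table and filter.
                "inventory.orders":
                    "SELECT * FROM inventory.orders WHERE status <> 'ARCHIVED'",
            },
            max_threads=4,
        ),
    },
}

# Request body for Kafka Connect's /connectors REST endpoint.
print(json.dumps(connector, indent=2))
```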

Snapshotting
Limitations of Classic Approach
●Can’t update filter list
●Can’t pause & resume long-running snapshots
●Can’t stream changes until snapshot completed
●Can’t re-snapshot selected tables

Incremental Snapshots
© Karen Blaha https://flic.kr/p/aeuPys (CC BY-SA 2.0)

Incremental Snapshotting
The Paper
●“DBLog: A Watermark Based Change-Data-Capture Framework”, by Andreas Andreakis and Ioannis Papapanagiotou
●Key idea: interleave snapshot events and events from TX log
https://arxiv.org/pdf/2010.12597v1.pdf
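DBLog reads a table in chunks ordered by primary key rather than in one go, so snapshot reads can be interleaved with log events between chunks. A sketch of such keyset-based chunking, with SQLite standing in for the source database; read_chunks and the customers table are hypothetical. In Debezium, the chunk size is controlled by incremental.snapshot.chunk.size.

```python
import sqlite3


def read_chunks(conn: sqlite3.Connection, table: str, key: str, chunk_size: int):
    """Yield a table's rows chunk by chunk, ordered by primary key.

    Each query resumes after the last key of the previous chunk (keyset
    pagination). Assumes `key` is the table's first column and that
    `table` and `key` are trusted identifiers (they are interpolated).
    """
    last_key = None
    while True:
        if last_key is None:
            rows = conn.execute(
                f"SELECT * FROM {table} ORDER BY {key} LIMIT ?", (chunk_size,)
            ).fetchall()
        else:
            rows = conn.execute(
                f"SELECT * FROM {table} WHERE {key} > ? ORDER BY {key} LIMIT ?",
                (last_key, chunk_size),
            ).fetchall()
        if not rows:
            return
        yield rows
        last_key = rows[-1][0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, color TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, "blue" if i % 2 else "red") for i in range(1, 8)],
)

sizes = [len(chunk) for chunk in read_chunks(conn, "customers", "id", chunk_size=3)]
print(sizes)  # [3, 3, 1]
```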

Incremental Snapshotting
General Idea

Incremental Snapshotting
Windowing via Watermarks

Incremental Snapshotting
Buffer Processing
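Windowing and buffer processing can be simulated in memory. A minimal sketch, assuming a Python list as the transaction log, marker entries as the low and high watermarks, and events that carry the primary key in a key field; process_window, LOW and HIGH are hypothetical names. In Debezium, the watermarks are records written to the signalling table, which is why this approach normally needs write access to the database. Chunk rows are buffered when the low watermark is read; a log event for the same key inside the window removes the buffered row, since the log event is at least as recent; whatever remains is emitted as READ events at the high watermark.

```python
from typing import Iterator

LOW, HIGH = "window-open", "window-close"


def process_window(log: list, chunk: list) -> Iterator[dict]:
    """Merge one snapshot chunk into the change stream, DBLog-style.

    `log` holds events read from the transaction log, including one LOW and
    one HIGH watermark that bracket the chunk SELECT; `chunk` holds the rows
    that SELECT returned. Rows are keyed by "id", log events by "key".
    """
    buffer = None
    for event in log:
        if event.get("watermark") == LOW:
            # Window opens: buffer the chunk rows by primary key.
            buffer = {row["id"]: row for row in chunk}
        elif event.get("watermark") == HIGH:
            # Window closes: emit the surviving chunk rows as READ events.
            for key, row in buffer.items():
                yield {"op": "r", "key": key, "after": row}
            buffer = None
        else:
            if buffer is not None:
                # A log event inside the window is at least as new as the
                # chunk row for the same key, so that row is dropped.
                buffer.pop(event["key"], None)
            yield event


chunk = [{"id": 1, "color": "red"}, {"id": 2, "color": "blue"}, {"id": 3, "color": "green"}]
log = [
    {"op": "u", "key": 5, "after": {"id": 5, "color": "pink"}},    # before window
    {"watermark": LOW},
    {"op": "u", "key": 2, "after": {"id": 2, "color": "yellow"}},  # supersedes row 2
    {"op": "d", "key": 3, "after": None},                          # deletes row 3
    {"watermark": HIGH},
    {"op": "c", "key": 4, "after": {"id": 4, "color": "black"}},   # after window
]
result = [(e["op"], e["key"]) for e in process_window(log, chunk)]
print(result)  # [('u', 5), ('u', 2), ('d', 3), ('r', 1), ('c', 4)]
```

Keys 2 and 3 never produce a READ event here: consumers only see the update and the delete from the log.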

Incremental Snapshotting
Semantics
●No guarantee of a snapshot (read) event for every record
●May receive an update or delete without a prior insert/read
●May receive both a read and an update/delete for the same record

●What is guaranteed: complete data set once the snapshot has finished
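Because the guarantee concerns the final data set rather than individual events, a consumer should apply read, create and update events as upserts keyed by primary key, and deletes as removals. A minimal sketch of such a consumer, assuming simplified events with op, key and after fields (not Debezium's full envelope); materialize is a hypothetical name.

```python
def materialize(events) -> dict:
    """Apply simplified change events to a key -> row map."""
    state = {}
    for event in events:
        if event["op"] == "d":
            # Deletes remove the row, even if it was never read before.
            state.pop(event["key"], None)
        else:
            # "r", "c" and "u" all carry the latest row image: upsert it.
            state[event["key"]] = event["after"]
    return state


events = [
    {"op": "u", "key": 2, "after": {"color": "yellow"}},  # update without prior read
    {"op": "d", "key": 3},                                 # delete without prior read
    {"op": "r", "key": 1, "after": {"color": "red"}},
    {"op": "u", "key": 1, "after": {"color": "orange"}},   # read, then update
]
final = materialize(events)
print(final)  # {2: {'color': 'yellow'}, 1: {'color': 'orange'}}
```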

Demo
© Wall Boat https://flic.kr/p/Y6zkmX (Public Domain)

Incremental Snapshotting
Connector Offsets

Incremental Snapshotting
MySQL Read-Only Snapshots
●Write access to the DB may not be desirable

Incremental Snapshotting
Signalling Channels
●Database table
●Kafka topic
●JMX
●Custom
id: 924e3ff8-2245-43ca-ba77-2af9af02fa07
type: log, {execute|pause|resume|stop}-snapshot
value: { "data-collections": ["schema1.table1", "schema2.table2"], "type": "incremental", "additional-condition": "color=blue" }

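Triggering an ad-hoc incremental snapshot through the database table channel amounts to inserting a signal row. A sketch that builds such a row, following the field names of the signal shown above; Debezium's documentation names the signalling table's columns id, type and data, the table name debezium_signal is hypothetical (it must match the connector's signal.data.collection setting), and newer Debezium versions may expect a different shape for the additional condition.

```python
import json
import uuid


def execute_snapshot_signal(collections, additional_condition=None):
    """Return (id, type, data) for an ad-hoc incremental snapshot signal."""
    data = {"data-collections": list(collections), "type": "incremental"}
    if additional_condition is not None:
        data["additional-condition"] = additional_condition
    return str(uuid.uuid4()), "execute-snapshot", json.dumps(data)


signal = execute_snapshot_signal(["schema1.table1", "schema2.table2"], "color=blue")

# Parameterized INSERT into the signalling table. The table name is a
# hypothetical example and must match the connector's signal.data.collection;
# the %s placeholder style depends on the database driver.
sql = "INSERT INTO debezium_signal (id, type, data) VALUES (%s, %s, %s)"
print(sql, signal)
```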
Incremental Snapshotting
Notifications

Comparison

Incremental Snapshotting
Benefits
●Can update filter list ✅
●Long-running snapshots can be paused/resumed ✅
●Can stream changes before snapshot completed ✅
●Can re-snapshot selected tables ✅

Resources
●Incremental Snapshots in Debezium
https://debezium.io/blog/2021/10/07/incremental-snapshots/
●Read-only Incremental Snapshots for MySQL
https://debezium.io/blog/2022/04/07/read-only-incremental-snapshots/
●Flink CDC
https://ververica.github.io/flink-cdc-connectors/

Upcoming
●Debezium & Kafka Connect – Ask the Experts
With Chris Cranford (Red Hat) and Chris Egerton (Aiven)
Sep 27, 2:30 PM
●Change Stream Processing with Debezium and Apache Flink
With Robert Metzger (Decodable)
Sep 27, 5:30 PM, Dremio Office
https://www.meetup.com/sf-big-analytics/events/294068331/

Q & A
@gunnarmorling
Thank You!