Elasticity vs. State? Exploring Kafka Streams Cassandra State Store

ScyllaDB 305 views 27 slides Jun 21, 2024
Slide 1
Slide 1 of 27
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27

About This Presentation

kafka-streams-cassandra-state-store' is a drop-in Kafka Streams State Store implementation that persists data to Apache Cassandra.

By moving the state to an external datastore the stateful streams app (from a deployment point of view) effectively becomes stateless. This greatly improves elastic...


Slide Content

Elasticity vs. State? A New Kafka Streams State Store Hartmut Armbruster, Software Engineer at Thriving.dev

Hartmut Armbruster Software Architect & Developer Spent the past years working with distributed systems and real time data processing with clients such as HSBC, NEX Group plc, Deutsche Bahn. Striving to see the bigger picture, passionate about architecture, combining, integrating, and bringing all things together.

Recap: Kafka Streams, State Stores, RocksDB Managing State: Challenges & Opportunities Kafka Streams Cassandra State Store ‘Drop-in’ State Store Alternative - Quickstart Data Models & Supported Store Types Interactive Queries Conclusions, Limitations, Next Steps Presentation Agenda

Recap Kafka Streams, State Store, RocksDB

Apache Kafka Distributed Streaming Platform Stores its data in topics, immutable Each topic has partitions

Functional Java API Stream Processing on data stored in Kafka Kafka Streams Processor Topology Source(s) Stateless Stream Processors filter, map, flatMap Stateful Stream Processors join, group, window, aggregate Sink(s)

Stateful Processors -> State Stores State is l ocal , distributed across replicas State Stores Types RocksDB InMemory Writes to local + changelog topics ‘State Restore’ from changelog if local state is lost

Managing State Challenges & Opportunities

Challenges for Stateful Topologies (1) Example Use Case Real-time data processing Considerably large state ⚡ App Upgrade ⚡ Security Patching ⚡ Infrastructure Failure ⚡ Scaling up/down Mitigation Use ‘persistent’ stores (RocksDB) R un Stateful (keep disks -> local state) Prevent /M inimise Rebalances Static Group Membership ⚠️ Standby Replicas Warmup Replicas Moving Restoration to a Dedicated Thread (KAFKA-10199) Tuning Restore Consumer Config State restore required 😱 …may take minutes/hours!!

Challenges for Stateful Topologies (2) Idle Streams Group Members Following rebalancing & task re-assignment new members get warmup tasks assigned Warmup Replicas catch up Transition from ‘Warmup -> Active’ Never Happens Unbalanced assignment Idle Replicas Performance Degradation Unused Allocated Resources 💸

Kafka Streams Cassandra State Store

Kafka Streams Cassandra State Store thriving-dev/kafka-streams-cassandra-state-store

Why Cassandra / ScyllaDB? Distributed Architecture Scalability (r/w & data) High Availability through Data Replication High Fault Tolerance High Performance (Expiring Data with TTL)

Quickstart (1) - Get it! <dependency> <groupId>dev.thriving.oss</groupId> <artifactId>kafka-streams-cassandra-state-store</artifactId> <version>0.8.4</version> </dependency> Maven implementation 'dev.thriving.oss:kafka-streams-cassandra-state-store:0.8.4' Gradle implementation 'com.scylladb:java-driver-core:4.17.0.0' +

Quickstart (2) - High-level DSL ( RocksDB is used by default) KTable<Long,String> table = builder.table("topicName", Materialized.<Long,String>as( CassandraStores.builder(session, "store-name") .partitionedKeyValueStore()) .withKeySerde(Serdes.Long()) .withValueSerde(Serdes.String()) .withLoggingDisabled() ); KTable<Long,String> table = builder.table("topicName"); Cassandra ‘p artitionedKeyValueStore’

Basic usage example: Quickstart (3) - Builder CassandraStores.builder(session, "orders") .partitionedKeyValueStore(); Advanced usage example: CassandraStores.builder(session, "orders") .withKeyspace("order_processing") .withDmlExecutionProfile("kstreams-dml") .withTableOptions(""" compaction = { 'class' : 'LeveledCompactionStrategy' } AND default_time_to_live = 86400 """) .withCountAllEnabled() .partitionedKeyValueStore();

Under t he h ood - Data Model Key/Value are of type BLOB Any Payloads supported String, Number, Avro, Protobuf, … Serialized -> bytes

Under the hood - Primary Keys Composite Primary Key Having ` key ` as Clustering Key allows additional IQs such as `range` & `prefixScan` No. of partitions defined by Streams tasks (source topic partitions) ⚠️ Large Partitions BATCH inserts possible & efficient partitionedKeyValueStore global KeyValueStore Store Entry `key` as Primary Key High Cardinality, works for any data volumes No Support for `range` & `prefixScan` Lookups possible from any instance, knowledge/context of partition/task is not required

Store Types partitioned KeyValueStore global KeyValueStore partitioned Versioned KeyValueStore global Versioned KeyValueStore

Interactive Queries

Interactive Queries, REST API

Regular Kafka Streams State Store Interactive Queries, REST API

Cassandra Kafka Streams State Store Interactive Queries, REST API

Conclusions

Conclusions D rop-in replacement state store P ersistent, allows for very large state No changelog topic, no state restore required -> run as stateless -> instant rebalancing -> reduce rebalance downtimes & recovery time -> improve elasticity + scalability Exactly-Once Semantics (EOS) not supported No consistency guarantees for non-idempotent data streams on hard failures No window & session store support (yet) Experimental The bad and the ugly

Next Steps https://github.com/thriving-dev/kafka-streams-cassandra-state-store Buffered + batched writes at offset commit Add Window & Session Store Support Address shortcomings on reliability, consistency, EOS Benchmark Add in-memory read cache Open Source. Apache-2.0 license.

Stay in Touch Hartmut Armbruster @hartmut_co_uk @hartmut-co-uk @hartmut--thriving-dev thriving-dev/kafka-streams-cassandra-state-store @thriving_dev https://thriving.dev
Tags