Elasticity vs. State? Exploring Kafka Streams Cassandra State Store

ScyllaDB 305 views 27 slides Jun 21, 2024


About This Presentation

'kafka-streams-cassandra-state-store' is a drop-in Kafka Streams State Store implementation that persists data to Apache Cassandra.

By moving the state to an external datastore the stateful streams app (from a deployment point of view) effectively becomes stateless. This greatly improves elastic...

Size: 11.57 MB

Language: en

Added: Jun 21, 2024

Slides: 27 pages

Slide Content

Elasticity vs. State? A New Kafka Streams State Store
Hartmut Armbruster, Software Engineer at Thriving.dev

Hartmut Armbruster
Software Architect & Developer
- Spent the past years working with distributed systems and real-time data processing for clients such as HSBC, NEX Group plc, and Deutsche Bahn.
- Striving to see the bigger picture; passionate about architecture and about combining, integrating, and bringing all things together.

Presentation Agenda
- Recap: Kafka Streams, State Stores, RocksDB
- Managing State: Challenges & Opportunities
- Kafka Streams Cassandra State Store
- ‘Drop-in’ State Store Alternative - Quickstart
- Data Models & Supported Store Types
- Interactive Queries
- Conclusions, Limitations, Next Steps

Recap: Kafka Streams, State Store, RocksDB

Apache Kafka
- Distributed streaming platform
- Stores its data in topics, immutable
- Each topic has partitions

Kafka Streams
- Functional Java API
- Stream processing on data stored in Kafka
Processor Topology
- Source(s)
- Stateless stream processors: filter, map, flatMap
- Stateful stream processors: join, group, window, aggregate
- Sink(s)

Stateful Processors -> State Stores
- State is local, distributed across replicas
- State store types: RocksDB, InMemory
- Writes go to the local store + changelog topics
- ‘State Restore’ from changelog if local state is lost
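The write path and the ‘State Restore’ step on this slide can be illustrated with a small, library-free simulation. This is only a sketch under stated assumptions: `Changelog`, `LocalStore`, and `restore` are hypothetical names invented for the illustration, an in-memory list stands in for the changelog topic, and none of it is the Kafka Streams API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, library-free model of "write to local + changelog, restore by replay".
final class ChangelogRestoreSketch {

    /** One changelog entry; a null value acts as a tombstone (delete). */
    static final class Change {
        final String key;
        final Long value;
        Change(String key, Long value) { this.key = key; this.value = value; }
    }

    /** Stand-in for a changelog topic: an append-only list of changes. */
    static final class Changelog {
        private final List<Change> entries = new ArrayList<>();
        void append(String key, Long value) { entries.add(new Change(key, value)); }
        List<Change> entries() { return entries; }
    }

    /** Stand-in for a local state store: every write is also appended to the changelog. */
    static final class LocalStore {
        private final Map<String, Long> local = new HashMap<>();
        private final Changelog changelog;

        LocalStore(Changelog changelog) { this.changelog = changelog; }

        void put(String key, Long value) {
            applyLocally(key, value);
            changelog.append(key, value);
        }

        Long get(String key) { return local.get(key); }

        int size() { return local.size(); }

        private void applyLocally(String key, Long value) {
            if (value == null) local.remove(key); else local.put(key, value);
        }

        /** Rebuilds the local state from scratch by replaying every changelog entry in order. */
        static LocalStore restore(Changelog changelog) {
            LocalStore store = new LocalStore(changelog);
            for (Change c : changelog.entries()) {
                store.applyLocally(c.key, c.value); // replay only, nothing is re-appended
            }
            return store;
        }
    }

    public static void main(String[] args) {
        Changelog changelog = new Changelog();
        LocalStore store = new LocalStore(changelog);
        store.put("orders", 1L);
        store.put("orders", 2L);

        // Simulate losing the local disk: only the changelog survives.
        LocalStore restored = LocalStore.restore(changelog);
        System.out.println("restored orders = " + restored.get("orders"));
    }
}
```

The replay loop touches every changelog entry, which is why restore time grows with the amount of state and its write history.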

Managing State: Challenges & Opportunities

Challenges for Stateful Topologies (1)
Example Use Case
- Real-time data processing
- Considerably large state
- ⚡ App Upgrade
- ⚡ Security Patching
- ⚡ Infrastructure Failure
- ⚡ Scaling up/down
State restore required 😱 …may take minutes/hours!!
Mitigation
- Use ‘persistent’ stores (RocksDB)
- Run Stateful (keep disks -> local state)
- Prevent/Minimise Rebalances
- Static Group Membership ⚠️
- Standby Replicas
- Warmup Replicas
- Moving Restoration to a Dedicated Thread (KAFKA-10199)
- Tuning Restore Consumer Config

Challenges for Stateful Topologies (2)
Idle Streams Group Members
- Following rebalancing & task re-assignment, new members get warmup tasks assigned
- Warmup replicas catch up
- Transition from ‘Warmup -> Active’ never happens
- Unbalanced assignment
- Idle replicas
- Performance degradation
- Unused allocated resources 💸
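Several of the mitigations listed on this slide map to client configuration. The sketch below collects them in a plain `java.util.Properties` object using the string config names as documented for Kafka Streams and the Kafka consumer. All values are illustrative assumptions rather than recommendations, and `orders-app`, `localhost:9092`, and the `mitigationProps` helper are hypothetical.

```java
import java.util.Properties;

// Sketch only: config keys are written as plain strings so the example needs no Kafka dependency.
final class RebalanceMitigationConfigSketch {

    static Properties mitigationProps(String instanceId) {
        Properties props = new Properties();
        props.setProperty("application.id", "orders-app");        // hypothetical application name
        props.setProperty("bootstrap.servers", "localhost:9092"); // hypothetical broker address

        // Static group membership: a stable id per instance lets a restarted member
        // rejoin within the session timeout without triggering a rebalance.
        props.setProperty("group.instance.id", instanceId);
        props.setProperty("session.timeout.ms", "60000");          // illustrative value

        // Standby replicas: keep a hot copy of each state store on another instance.
        props.setProperty("num.standby.replicas", "1");

        // Warmup replicas: cap how many extra tasks may warm up state at once, and
        // how far behind a replica may be while still counting as caught up.
        props.setProperty("max.warmup.replicas", "2");
        props.setProperty("acceptable.recovery.lag", "10000");

        // Restore consumer tuning: settings with this prefix apply to the restore consumer only.
        props.setProperty("restore.consumer.max.poll.records", "5000"); // illustrative value
        return props;
    }

    public static void main(String[] args) {
        mitigationProps("orders-app-0").forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Each of these settings reduces how often or how long a state restore happens; none of them removes the need for it, which is the gap the external state store targets.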

Kafka Streams Cassandra State Store

Kafka Streams Cassandra State Store thriving-dev/kafka-streams-cassandra-state-store

Why Cassandra / ScyllaDB?
- Distributed architecture
- Scalability (r/w & data)
- High availability through data replication
- High fault tolerance
- High performance
- (Expiring data with TTL)

Quickstart (1) - Get it!
Maven
    <dependency>
        <groupId>dev.thriving.oss</groupId>
        <artifactId>kafka-streams-cassandra-state-store</artifactId>
        <version>0.8.4</version>
    </dependency>
Gradle
    implementation 'dev.thriving.oss:kafka-streams-cassandra-state-store:0.8.4'
+
    implementation 'com.scylladb:java-driver-core:4.17.0.0'

Quickstart (2) - High-level DSL
(RocksDB is used by default)
    KTable<Long,String> table = builder.table("topicName");
Cassandra ‘partitionedKeyValueStore’
    KTable<Long,String> table = builder.table("topicName",
        Materialized.<Long,String>as(
                CassandraStores.builder(session, "store-name")
                    .partitionedKeyValueStore())
            .withKeySerde(Serdes.Long())
            .withValueSerde(Serdes.String())
            .withLoggingDisabled()
    );

Quickstart (3) - Builder
Basic usage example:
    CassandraStores.builder(session, "orders")
        .partitionedKeyValueStore();
Advanced usage example:
    CassandraStores.builder(session, "orders")
        .withKeyspace("order_processing")
        .withDmlExecutionProfile("kstreams-dml")
        .withTableOptions("""
            compaction = { 'class' : 'LeveledCompactionStrategy' }
            AND default_time_to_live = 86400
            """)
        .withCountAllEnabled()
        .partitionedKeyValueStore();

Under the hood - Data Model
- Key/Value are of type BLOB
- Any payloads supported: String, Number, Avro, Protobuf, …
- Serialized -> bytes

Under the hood - Primary Keys
partitionedKeyValueStore
- Composite Primary Key
- Having `key` as Clustering Key allows additional IQs such as `range` & `prefixScan`
- No. of partitions defined by Streams tasks (source topic partitions)
- ⚠️ Large Partitions
- BATCH inserts possible & efficient
globalKeyValueStore
- Store entry `key` as Primary Key
- High cardinality, works for any data volumes
- No support for `range` & `prefixScan`
- Lookups possible from any instance; knowledge/context of partition/task is not required

Store Types
- partitionedKeyValueStore
- globalKeyValueStore
- partitionedVersionedKeyValueStore
- globalVersionedKeyValueStore
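To make the ‘Serialized -> bytes’ step concrete, the sketch below turns a `long` key and a `String` value into byte arrays of the kind that could be stored in BLOB columns, and back again. It uses only the JDK; the helper names are hypothetical, and the 8-byte big-endian long and UTF-8 string encodings are assumptions chosen for the illustration, since in a real topology the configured Serdes decide the byte format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// JDK-only illustration of typed payloads becoming opaque bytes and back.
final class BlobSerdeSketch {

    /** Encodes a long as 8 bytes, big-endian (ByteBuffer's default byte order). */
    static byte[] longToBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    static long bytesToLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    /** Encodes a string as UTF-8 bytes. */
    static byte[] stringToBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    static String bytesToString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = longToBytes(42L);
        byte[] value = stringToBytes("order-created");
        // The store only ever sees the two byte arrays; types are restored on read.
        System.out.println(bytesToLong(key) + " -> " + bytesToString(value));
    }
}
```

Because the store sees only bytes, any payload format works, but the database cannot interpret or index the contents of a value.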
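The trade-off between the two key layouts can be modelled without a database. In the sketch below a sorted map per partition stands in for a Cassandra partition with `key` as clustering key, and a plain hash map stands in for a table with `key` as the whole primary key. The class and method names are hypothetical, and `String` keys replace the real byte keys for readability; this models the access patterns only, not the library's actual tables.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the two key layouts.
final class StoreLayoutSketch {

    /** (partition, key) layout: keys are kept sorted within each partition. */
    static final class PartitionedStore {
        private final Map<Integer, NavigableMap<String, String>> partitions = new HashMap<>();

        void put(int partition, String key, String value) {
            partitions.computeIfAbsent(partition, p -> new TreeMap<>()).put(key, value);
        }

        /** Point lookups need the partition as well as the key. */
        String get(int partition, String key) {
            return rows(partition).get(key);
        }

        /** Sorted keys make an inclusive range scan within one partition cheap. */
        NavigableMap<String, String> range(int partition, String from, String to) {
            return rows(partition).subMap(from, true, to, true);
        }

        /** Sorted keys also allow a prefix scan: seek to the prefix, stop at the first mismatch. */
        NavigableMap<String, String> prefixScan(int partition, String prefix) {
            NavigableMap<String, String> result = new TreeMap<>();
            for (Map.Entry<String, String> e : rows(partition).tailMap(prefix, true).entrySet()) {
                if (!e.getKey().startsWith(prefix)) break;
                result.put(e.getKey(), e.getValue());
            }
            return result;
        }

        private NavigableMap<String, String> rows(int partition) {
            return partitions.getOrDefault(partition, new TreeMap<>());
        }
    }

    /** key-only layout: entries are spread by key hash, so there is no key order to scan. */
    static final class GlobalStore {
        private final Map<String, String> rows = new HashMap<>();

        void put(String key, String value) { rows.put(key, value); }

        /** Lookups need only the key, so any instance can serve them. */
        String get(String key) { return rows.get(key); }
    }

    public static void main(String[] args) {
        PartitionedStore partitioned = new PartitionedStore();
        partitioned.put(0, "order-1", "created");
        partitioned.put(0, "order-2", "shipped");
        partitioned.put(0, "user-1", "active");
        System.out.println(partitioned.prefixScan(0, "order-").keySet());

        GlobalStore global = new GlobalStore();
        global.put("order-1", "created");
        System.out.println(global.get("order-1"));
    }
}
```

In the partitioned model every key of a task lands in the same sorted structure, which is what the ‘Large Partitions’ warning refers to; the global model avoids that at the cost of ordered scans.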

Interactive Queries

Interactive Queries, REST API

Regular Kafka Streams State Store Interactive Queries, REST API

Cassandra Kafka Streams State Store Interactive Queries, REST API

Conclusions

Conclusions
- Drop-in replacement state store
- Persistent, allows for very large state
- No changelog topic, no state restore required
  -> run as stateless
  -> instant rebalancing
  -> reduce rebalance downtimes & recovery time
  -> improve elasticity + scalability
The bad and the ugly
- Exactly-Once Semantics (EOS) not supported
- No consistency guarantees for non-idempotent data streams on hard failures
- No window & session store support (yet)
- Experimental

Next Steps
- Buffered + batched writes at offset commit
- Add window & session store support
- Address shortcomings on reliability, consistency, EOS
- Benchmark
- Add in-memory read cache
Open Source. Apache-2.0 license.
https://github.com/thriving-dev/kafka-streams-cassandra-state-store

Stay in Touch
Hartmut Armbruster
- @hartmut_co_uk
- @hartmut-co-uk
- @hartmut--thriving-dev
- thriving-dev/kafka-streams-cassandra-state-store
- @thriving_dev
- https://thriving.dev
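The caveat about non-idempotent data streams can be illustrated with a small simulation of at-least-once processing against a store that survives a crash. It assumes, for the sake of the sketch, that writes reach the external store before the input offset is committed, so records after the last committed offset are processed a second time on restart. All names are hypothetical and a `HashMap` stands in for the external store.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation: the external store survives a crash, the uncommitted offsets do not.
final class ReplaySketch {

    /**
     * Counts records per key (a non-idempotent read-modify-write).
     * Records [0, crashAfter) are applied, the process crashes, and processing
     * resumes from committedOffset, re-applying anything in between.
     */
    static Map<String, Long> countWithCrash(List<String> keys, int committedOffset, int crashAfter) {
        Map<String, Long> externalStore = new HashMap<>();
        for (int i = 0; i < crashAfter; i++) {
            externalStore.merge(keys.get(i), 1L, Long::sum);
        }
        // Crash here: the store keeps its contents, processing restarts at the committed offset.
        for (int i = committedOffset; i < keys.size(); i++) {
            externalStore.merge(keys.get(i), 1L, Long::sum);
        }
        return externalStore;
    }

    /** Same failure, but each record is an idempotent upsert of {key, value}. */
    static Map<String, String> latestWithCrash(List<String[]> records, int committedOffset, int crashAfter) {
        Map<String, String> externalStore = new HashMap<>();
        for (int i = 0; i < crashAfter; i++) {
            externalStore.put(records.get(i)[0], records.get(i)[1]);
        }
        for (int i = committedOffset; i < records.size(); i++) {
            externalStore.put(records.get(i)[0], records.get(i)[1]);
        }
        return externalStore;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "a", "a", "a");
        // Offsets 0-1 committed, record at offset 2 applied but not committed before the crash.
        System.out.println("count after replay: " + countWithCrash(keys, 2, 3).get("a"));
        System.out.println("count without crash: " + countWithCrash(keys, 3, 3).get("a"));
    }
}
```

The upsert variant converges to the same final state with or without the replay, while the count does not, which is why the limitation is stated for non-idempotent streams specifically.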
