Kafka Streams - From pub/sub to a complete stream processing platform

PaoloCastagna1 1,040 views 36 slides Feb 12, 2018

About This Presentation

A presentation on the Kafka Streams APIs (part of Apache Kafka) and the capabilities they bring to the world of open source stream processing engines. Their biggest innovations are simplicity without loss of power and a focus on developers.


Slide Content

Kafka Streams

From pub/sub to a complete
stream processing platform



Kafka Meetup Utrecht
Thursday, 8th June 2017
<paolo@confluent.io>

https://www.confluent.io/blog/stream-data-platform-1/
Industry shift from Big Data
to Fast Data and Stream Processing

Apache Kafka APIs and UNIX analogy
$ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt
Connect APIs
Producer/Consumer APIs
Streams APIs
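
To make the analogy a little more concrete, the sketch below reproduces the middle of the pipeline in plain Java, with no Kafka dependency: grep "apache" becomes a filter and tr a-z A-Z becomes a map. The class name UnixPipelineSketch and the sample lines are assumptions made for illustration and do not come from the slides. In a Kafka Streams application the comparable operators (for example filter and mapValues on a KStream) would run continuously on records from a topic instead of once over a list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical illustration, not taken from the presentation.
class UnixPipelineSketch {

    // Rough equivalent of: grep "apache" | tr a-z A-Z
    static List<String> grepThenUppercase(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("apache"))      // grep "apache"
                .map(line -> line.toUpperCase(Locale.ROOT))   // tr a-z A-Z
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stands in for in.txt; the contents are made up.
        List<String> in = Arrays.asList("apache kafka", "something else", "apache flink");
        grepThenUppercase(in).forEach(System.out::println);
    }
}
```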

Streams APIs
part of Apache Kafka

http://kafka.apache.org/documentation/streams
http://docs.confluent.io/current/streams

Build applications, not clusters
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>0.10.2.1</version>
</dependency>

Spot the difference(s)

How do I run in production?
Like any other Java application...
Uncool / Cool
http://docs.confluent.io/current/streams/introduction.html

Elastic and scalable
http://docs.confluent.io/current/streams/developer-guide.html#elastic-scaling-of-your-application
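
Scaling a Kafka Streams application means starting more instances of the same application; the partitions of the input topics are then shared among the running instances. The sketch below illustrates only that idea. The round-robin rule, the class name ElasticScalingSketch and the instance names are assumptions for illustration and are not the assignment logic Kafka actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of sharing partitions among application instances.
class ElasticScalingSketch {

    // Spread partitions 0..partitions-1 over the instances, round-robin.
    // This is a simplification, not Kafka's real partition assignor.
    static Map<String, List<Integer>> assign(int partitions, List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String instance : instances) {
            assignment.put(instance, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            String owner = instances.get(p % instances.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // A single instance processes every partition.
        System.out.println(assign(4, Arrays.asList("app-1")));
        // Starting a second instance splits the work between the two.
        System.out.println(assign(4, Arrays.asList("app-1", "app-2")));
    }
}
```

In this simplified model, adding instances beyond the number of partitions leaves the extra instances idle, so the partition count bounds the useful degree of parallelism.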

Typical high-level architecture
Real-time Data Ingestion
Stream Processing
Storage
Data Publishing / Visualization

How many clusters do you count?
NoSQL (Cassandra, HBase, Couchbase, MongoDB, …) or Elasticsearch, Solr, ...
Storm, Flink, Spark Streaming, Ignite, Akka Streams, Apex, ...
HDFS, NFS, Ceph, GlusterFS, Lustre, ...
Apache Kafka

Simplicity is the ultimate sophistication
Apache Kafka
and Kafka Streams APIs
Stream Processing Platform

Publish & Subscribe to streams of data like a messaging system

Store streams of data safely in a distributed replicated cluster

Process streams of data efficiently and in real-time

Node.js

Duality of Streams and Tables
http://docs.confluent.io/current/streams/concepts.html#duality-of-streams-and-tables
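
The duality can be sketched in plain Java without any Kafka dependency: a table is what you get by replaying a stream of updates and keeping the latest value per key, and a stream is what you get by recording every update made to a table. The class name StreamTableDualitySketch, the method names and the sample keys are assumptions for illustration only.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the stream/table duality.
class StreamTableDualitySketch {

    // Table -> stream: each update to the counting table is emitted as a record.
    static List<Map.Entry<String, Long>> countUpdates(List<String> keys) {
        Map<String, Long> table = new LinkedHashMap<>();
        List<Map.Entry<String, Long>> changelog = new ArrayList<>();
        for (String key : keys) {
            Long updated = table.merge(key, 1L, Long::sum);
            changelog.add(new SimpleEntry<>(key, updated));
        }
        return changelog;
    }

    // Stream -> table: replay the changelog; the latest value per key wins.
    static Map<String, Long> toTable(List<Map.Entry<String, Long>> changelog) {
        Map<String, Long> table = new LinkedHashMap<>();
        for (Map.Entry<String, Long> record : changelog) {
            table.put(record.getKey(), record.getValue());
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> changelog =
                countUpdates(Arrays.asList("alice", "bob", "alice"));
        System.out.println(changelog);           // [alice=1, bob=1, alice=2]
        System.out.println(toTable(changelog));  // {alice=2, bob=1}
    }
}
```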

Interactive Queries
http://docs.confluent.io/current/streams/developer-guide.html#streams-developer-guide-interactive-queries

Kafka Streams DSL
http://docs.confluent.io/current/streams/developer-guide.html#kafka-streams-dsl

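WordCount (and Java 8)
WordCountLambdaExample.java
import java.util.Arrays;
import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;

// Excerpt: bootstrapServers is defined outside the lines shown here.
final Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-lambda-example");
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);

...

final Serde<String> stringSerde = Serdes.String();
final Serde<Long> longSerde = Serdes.Long();

final KStreamBuilder builder = new KStreamBuilder();

// Read the input topic as a stream of text lines.
final KStream<String, String> textLines = builder.stream(stringSerde, stringSerde, "TextLinesTopic");

// Words are separated by runs of non-word characters.
final Pattern pattern = Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);

// Split each line into lowercase words, group by word, and count occurrences.
final KTable<String, Long> wordCounts = textLines
    .flatMapValues(value -> Arrays.asList(pattern.split(value.toLowerCase())))
    .groupBy((key, word) -> word)
    .count("Counts");

// Write the running counts to the output topic.
wordCounts.to(stringSerde, longSerde, "WordsWithCountsTopic");

final KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);

// cleanUp() deletes the application's local state before processing starts.
streams.cleanUp();
streams.start();

// Close the Streams application cleanly when the JVM shuts down.
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));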
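
As a rough aid to reading the topology above, here is a plain-Java sketch of the same split, group and count steps applied to a fixed list of lines, with no Kafka dependency. The class name WordCountSketch and the sample input are assumptions for illustration. Unlike the slide code, the sketch drops empty strings (which Pattern.split can produce when a line starts with a separator), and it computes the counts once, whereas the Kafka Streams version keeps updating them as new records arrive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical batch counterpart of the streaming word count.
class WordCountSketch {

    // Same separator pattern as in the slide code.
    private static final Pattern PATTERN =
            Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);

    static Map<String, Long> count(List<String> textLines) {
        return textLines.stream()
                // flatMapValues: one lowercase word per output element
                .flatMap(line -> Arrays.stream(PATTERN.split(line.toLowerCase())))
                // not in the slide code: skip empty leading tokens
                .filter(word -> !word.isEmpty())
                // groupBy + count: number of occurrences per word, sorted by word
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("Hello Kafka Streams", "hello kafka")));
        // prints {hello=2, kafka=2, streams=1}
    }
}
```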

Easy to Develop, Easy to Test
WordCountLambdaIntegrationTest.java
EmbeddedSingleNodeKafkaCluster CLUSTER = new EmbeddedSingleNodeKafkaCluster();

CLUSTER.createTopic(inputTopic);

Properties producerConfig = new Properties();
producerConfig.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, CLUSTER.bootstrapServers());

Apache Kafka and Streams APIs benefits
• Build applications, not clusters
• Native integration with Apache Kafka
• Elastic, fast, distributed, fault-tolerant, secure
• Scalable: S, M, L, XL, XXL
• Run everywhere: from containers to cloud
• Streams (with KStream) and tables (with KTable)
• Local state replicated to Kafka for fault-tolerance
• Windowing and event time semantics out of the box
• Supports late-arriving and out-of-order events
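
The last two bullets can be illustrated with a small plain-Java sketch, with no Kafka dependency: each event is placed in a tumbling window according to the timestamp it carries, so an event that arrives out of order still lands in the window it belongs to. The class name EventTimeWindowSketch, the 60-second window size and the sample timestamps are assumptions for illustration. The sketch assumes non-negative timestamps and does not model how long a real system keeps old windows open for late events.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of event-time tumbling windows.
class EventTimeWindowSketch {

    // Start of the tumbling window that contains the given event timestamp.
    static long windowStart(long eventTimeMs, long windowSizeMs) {
        return eventTimeMs - (eventTimeMs % windowSizeMs);
    }

    // Count events per window using each event's own timestamp,
    // not the order or time at which it arrived.
    static Map<Long, Long> countPerWindow(List<Long> eventTimesMs, long windowSizeMs) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long eventTime : eventTimesMs) {
            counts.merge(windowStart(eventTime, windowSizeMs), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Arrival order differs from event-time order: 59_000 arrives after 61_000.
        List<Long> arrivals = Arrays.asList(1_000L, 61_000L, 59_000L, 62_000L);
        System.out.println(countPerWindow(arrivals, 60_000L));
        // prints {0=2, 60000=2}
    }
}
```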

References
• http://kafka.apache.org/
• http://kafka.apache.org/documentation/streams/
• http://docs.confluent.io/
• http://docs.confluent.io/current/streams/
• http://docs.confluent.io/current/streams/javadocs/
• http://blog.confluent.io/
• http://github.com/confluentinc/examples/
• http://github.com/apache/kafka/tree/trunk/streams/

The easiest way to get you started
https://www.confluent.io/download/

SIMPLICITY
WE

YOUR FEEDBACK!

Kafka Summit San Francisco: August 28
www.kafka-summit.org
Discount code: kafcom17
Use the Apache Kafka community discount code to get $50 off