Apache Kafka Architecture & Fundamentals Explained

Confluent Inc., 33 slides, Oct 21, 2019

About This Presentation

Watch this talk here: https://www.confluent.io/online-talks/apache-kafka-architecture-and-fundamentals-explained-on-demand

This session explains Apache Kafka’s internal design and architecture. Companies like LinkedIn are now sending more than 1 trillion messages per day to Apache Kafka. Learn a...


Slide Content

1
Fundamentals for Apache Kafka®
Apache Kafka Architecture & Fundamentals Explained
Joe Desmond, Sr. Technical Trainer, Confluent

2
Session Schedule
●Session 1: Benefits of Stream Processing and Apache Kafka Use Cases
●Session 2: Apache Kafka Architecture & Fundamentals Explained
●Session 3: How Apache Kafka Works
●Session 4: Integrating Apache Kafka into your Environment

3
Learning Objectives
After this module you will be able to:
●Identify the key elements in a Kafka cluster
●Name the essential responsibilities of each key element
●Explain what a Topic is and describe its relation to Partitions and Segments

4
The World Produces Data

5
Producers

6
Kafka Brokers

7
Consumers

8
Architecture

9
Decoupling Producers and Consumers
●Producers and Consumers are decoupled
●Slow Consumers do not affect Producers
●Add Consumers without affecting Producers
●Failure of a Consumer does not affect the System
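To make the decoupling concrete, here is a minimal in-memory sketch in plain Python (not the Kafka client API; `PartitionLog`, `Consumer`, and `lag` are hypothetical names). The producer only appends to the log, and each consumer pulls from its own position, so a slow consumer simply falls behind without slowing the producer down.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Hypothetical append-only log; the producer never looks at consumer state."""
    messages: list[str] = field(default_factory=list)

    def append(self, value: str) -> int:
        self.messages.append(value)
        return len(self.messages) - 1  # offset assigned to the new message

    @property
    def end_offset(self) -> int:
        return len(self.messages)  # offset the next appended message will get


@dataclass
class Consumer:
    name: str
    position: int = 0  # offset of the next message this consumer will read

    def poll(self, log: PartitionLog, max_records: int) -> list[str]:
        # Pull up to max_records starting at this consumer's own position.
        batch = log.messages[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def lag(self, log: PartitionLog) -> int:
        # How many messages this consumer is behind the end of the log.
        return log.end_offset - self.position


log = PartitionLog()
fast, slow = Consumer("fast"), Consumer("slow")
for i in range(10):
    log.append(f"event-{i}")            # the producer never waits for anyone
    fast.poll(log, max_records=10)      # keeps up with every new message
    if i % 5 == 0:
        slow.poll(log, max_records=1)   # reads only occasionally

print(fast.lag(log), slow.lag(log))     # → 0 8
```

Kafka tooling reports the same idea as consumer lag: the partition's log-end offset minus the consumer group's committed offset.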

10
How Kafka Uses ZooKeeper

11
ZooKeeper Basics
●Open Source Apache Project
●Distributed Key-Value Store
●Maintains configuration information
●Stores ACLs and Secrets
●Enables highly reliable distributed coordination
●Provides distributed synchronization
●Three or five servers form an ensemble
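ZooKeeper keeps serving requests only while a strict majority (a quorum) of the ensemble is up. The short sketch below (function names are illustrative) computes the quorum size and the number of tolerated failures for small ensembles. It shows why ensembles of three or five are used: adding a fourth server does not let the ensemble survive any more failures than three.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in range(1, 7):
    print(f"ensemble={n}: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
# 3 -> quorum 2, tolerates 1; 4 -> quorum 3, tolerates 1; 5 -> quorum 3, tolerates 2
```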

12
Topics
●Topics: Streams of “related” Messages in Kafka
○Is a Logical Representation
○Categorizes Messages into Groups
●Developers define Topics
●Producer to Topic: N-to-N Relation (a Producer can write to many Topics; a Topic can have many Producers)
●No fixed limit on the Number of Topics

13
Topics, Partitions, and Segments

14
Topics, Partitions, and Segments

15
The Log

16
Log Structured Data Flow

17
The Stream

18
Data Elements

19
Brokers Manage Partitions
●Messages of Topic spread across Partitions
●Partitions spread across Brokers
●Each Broker handles many Partitions
●Each Partition stored on Broker’s disk
●Partition: 1..n log files (Segments)
●Each message in Log identified by Offset
●Configurable Retention Policy
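The points above (a partition made of several log files, messages identified by offset, a retention policy) can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions: `SegmentedPartition` and its parameters are hypothetical, segments roll after a fixed number of records, and retention keeps a fixed number of segments. Real brokers roll and expire segments by size or time (settings such as `segment.bytes`, `retention.ms`, `retention.bytes`).

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    base_offset: int                              # offset of the first record in this segment
    records: list[str] = field(default_factory=list)


class SegmentedPartition:
    """Hypothetical model of one partition: an append-only log split into segments."""

    def __init__(self, records_per_segment: int, retained_segments: int):
        self.records_per_segment = records_per_segment  # stand-in for a size-based roll
        self.retained_segments = retained_segments      # stand-in for a retention policy
        self.segments = [Segment(base_offset=0)]
        self.next_offset = 0

    def append(self, value: str) -> int:
        active = self.segments[-1]
        if len(active.records) >= self.records_per_segment:
            # Roll: close the active segment and start a new one at next_offset.
            active = Segment(base_offset=self.next_offset)
            self.segments.append(active)
        active.records.append(value)
        offset = self.next_offset
        self.next_offset += 1
        self._enforce_retention()
        return offset

    def _enforce_retention(self) -> None:
        # Retention deletes whole segments, oldest first; offsets are never reused.
        while len(self.segments) > self.retained_segments:
            self.segments.pop(0)

    @property
    def log_start_offset(self) -> int:
        return self.segments[0].base_offset

    def read(self, offset: int) -> str:
        if offset < self.log_start_offset or offset >= self.next_offset:
            raise KeyError(f"offset {offset} is out of range")
        # Find the newest segment whose base offset is <= the requested offset.
        for seg in reversed(self.segments):
            if offset >= seg.base_offset:
                return seg.records[offset - seg.base_offset]
        raise KeyError(offset)


p = SegmentedPartition(records_per_segment=3, retained_segments=2)
for i in range(8):
    p.append(f"msg-{i}")
print([s.base_offset for s in p.segments], p.log_start_offset, p.read(5))  # → [3, 6] 3 msg-5
```

Note that after the oldest segment is deleted, offsets 0 to 2 are simply gone; the remaining messages keep their original offsets.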

20
Broker Basics
●Producer sends Messages to Brokers
●Brokers receive and store Messages
●A Kafka Cluster can have many Brokers
●Each Broker manages multiple Partitions
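How partitions end up spread across brokers can be sketched with a simple round-robin placement. This is an illustrative assumption, not Kafka's exact algorithm (which is more involved and can, for example, take broker racks into account); `assign_replicas` and the broker IDs are hypothetical. The first broker in each list plays the role of the partition's leader, and the remaining copies go to other brokers so that no broker holds two replicas of the same partition.

```python
def assign_replicas(num_partitions: int, brokers: list[int],
                    replication_factor: int) -> dict[int, list[int]]:
    """Simplified round-robin placement; the first replica listed acts as leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        # The leader rotates across brokers; followers go on the next brokers in the ring.
        assignment[p] = [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
    return assignment


layout = assign_replicas(num_partitions=6, brokers=[101, 102, 103], replication_factor=2)
for partition, replicas in layout.items():
    print(partition, replicas)
# 0 [101, 102]  1 [102, 103]  2 [103, 101]  3 [101, 102]  4 [102, 103]  5 [103, 101]
```

With six partitions on three brokers, each broker leads two partitions and stores four replicas in total, so load is balanced and each partition survives the loss of any single broker.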

21
Broker Replication

22
Producer Basics
●Producers write Data as Messages
●Can be written in any language
○Native: Java, C/C++, Python, Go, .NET, JMS
○More Languages by Community
○REST Server for any unsupported Language
●Command Line Producer Tool

23
Load Balancing and Semantic Partitioning
●Producers use a Partitioning Strategy to assign each message to a Partition
●Two Purposes:
○Load Balancing
○Semantic Partitioning
●Partitioning Strategy specified by Producer
○Default Strategy: hash(key) % number_of_partitions
○No Key: Round-Robin
●Custom Partitioner possible
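The default strategy on this slide can be sketched as a small custom partitioner. Assumptions: `SimplePartitioner` is a hypothetical class, and `zlib.crc32` stands in for the murmur2 hash that Kafka's Java client uses for keyed messages, so the partition numbers here will not match a real cluster. Also note that newer Java clients (2.4 and later) use a "sticky" strategy for keyless messages instead of strict per-message round-robin.

```python
import itertools
import zlib
from typing import Optional


class SimplePartitioner:
    """Sketch of hash(key) % number_of_partitions, with round-robin for keyless messages."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: spread messages evenly across partitions (load balancing).
            return next(self._round_robin)
        # Same key -> same partition (semantic partitioning), as long as the
        # number of partitions does not change.
        return zlib.crc32(key) % self.num_partitions


p = SimplePartitioner(num_partitions=3)
print([p.partition(None) for _ in range(4)])               # → [0, 1, 2, 0]
print(p.partition(b"user-42") == p.partition(b"user-42"))  # → True
```

The design trade-off is visible in the two branches: keyed messages keep per-key ordering because they always land in one partition, while keyless messages trade that guarantee for an even spread.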

24
Consumer Basics
●Consumers pull messages from 1..n Topics
●New incoming messages are retrieved automatically
●Consumer offset
○Keeps track of the Consumer's position (the offset after the last message read)
○Is stored in a special internal Topic (__consumer_offsets)
●CLI tools exist to read from the cluster
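The role of the stored consumer offset can be sketched without a cluster. In this hypothetical model, `OffsetStore` stands in for the internal offsets topic and `poll_and_commit` is an illustrative helper, not a client API. Because the position is read back from the store on every poll, a restarted consumer resumes where its group left off, and a second group keeps its own independent offset.

```python
class OffsetStore:
    """Hypothetical stand-in for Kafka's internal offsets topic."""

    def __init__(self) -> None:
        # (group, topic, partition) -> offset of the next message to read
        self._committed: dict[tuple[str, str, int], int] = {}

    def commit(self, group: str, topic: str, partition: int, offset: int) -> None:
        self._committed[(group, topic, partition)] = offset

    def fetch(self, group: str, topic: str, partition: int) -> int:
        # No committed offset yet: this sketch starts from the beginning.
        # (Real consumers make this choice via the auto.offset.reset setting.)
        return self._committed.get((group, topic, partition), 0)


def poll_and_commit(log: list[str], store: OffsetStore, group: str,
                    topic: str, partition: int, max_records: int) -> list[str]:
    position = store.fetch(group, topic, partition)
    batch = log[position:position + max_records]
    # Commit the offset just after the last record returned.
    store.commit(group, topic, partition, position + len(batch))
    return batch


orders = [f"order-{i}" for i in range(5)]
store = OffsetStore()
print(poll_and_commit(orders, store, "billing", "orders", 0, 2))   # → ['order-0', 'order-1']
print(poll_and_commit(orders, store, "billing", "orders", 0, 2))   # → ['order-2', 'order-3']
print(poll_and_commit(orders, store, "shipping", "orders", 0, 10))  # all five: separate group
```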

25
Consumer Offset

26
Distributed Consumption
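Distributed consumption in Kafka is organized around consumer groups: within one group, each partition of a topic is read by exactly one member, so adding members spreads the work. The sketch below divides partitions in the spirit of range assignment; `assign_partitions` is a hypothetical helper, and Kafka ships several assignment strategies, so treat this as one illustrative option.

```python
def assign_partitions(partitions: list[int], members: list[str]) -> dict[str, list[int]]:
    """Range-style split: every partition goes to exactly one member of the group."""
    if not members:
        raise ValueError("a consumer group needs at least one member")
    members = sorted(members)                      # deterministic order
    per_member, extra = divmod(len(partitions), len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        # The first `extra` members each take one additional partition.
        count = per_member + (1 if i < extra else 0)
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment


print(assign_partitions([0, 1, 2, 3, 4], ["c2", "c1"]))  # → {'c1': [0, 1, 2], 'c2': [3, 4]}
print(assign_partitions([0, 1], ["c1", "c2", "c3"]))     # → {'c1': [0], 'c2': [1], 'c3': []}
```

The second call shows a practical consequence: a group with more members than partitions leaves some members idle, so the partition count caps a group's parallelism.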

27
Scalable Data Pipeline

28
Q&A
Questions:
●Why do we need an odd number of ZooKeeper nodes?
●What is the maximum number of brokers a Kafka cluster can have?
●What is the minimum number of Kafka brokers needed for high availability?
●What criterion determines whether two or more consumers form a consumer group?

29
Continue your Apache Kafka Education!
●Confluent Operations for Apache Kafka
●Confluent Developer Skills for Building Apache Kafka
●Confluent Stream Processing using Apache Kafka Streams and KSQL
●Confluent Advanced Skills for Optimizing Apache Kafka
For more details, see http://confluent.io/training

30
Certifications
Confluent Certified Developer for Apache Kafka
(aligns to Confluent Developer Skills for Building Apache Kafka course)
Confluent Certified Administrator for Apache Kafka
(aligns to Confluent Operations Skills for Apache Kafka)
What You Need to Know
○Qualifications: 6-to-9 months hands-on experience
○Duration: 90 mins
○Availability: Live, online 24/7
○Cost: $150
○Register online: www.confluent.io/certification

31
Stay in touch!
cnfl.io/slack
cnfl.io/kafka-training
cnfl.io/download

32
Thank you for attending!
•Thank you for attending the session!
•Feedback to: [email protected]

33
Copyright © Confluent, Inc. 2014-2019. Privacy Policy | Terms & Conditions.
Apache, Apache Kafka, Kafka and the Kafka logo are trademarks of the Apache Software Foundation.