StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi
steffenkarlsson2
22 slides
May 22, 2024
About This Presentation
This is a migration case study at scale, covering the past, the present and the future: a worldwide company transitioning from Confluent Platform and Confluent Cloud to self-managed Apache Kafka on Kubernetes using Strimzi.
At Maersk, we have been architecting, designing and implementing our 3rd generation Event Streaming Platform. The platform runs on Kubernetes in Azure and uses Strimzi to operate Apache Kafka at large scale and with high reliability, segregating data by isolated use case. Our 2nd generation was based on on-prem Confluent Platform and Confluent Cloud; this presentation is the story of that migration and the reasoning behind it.
Furthermore, we will go into detail on how we monitor (Grafana, Prometheus), alert (GoAlert and alerts as code), operate and provide self-service solutions on top of Strimzi to enable business-critical applications in Maersk. These are implemented in Go and deployed with the GitOps model using Flux and Kustomization, among others.
Finally, if time allows, we will end with a demo of an open-source self-service tool for monitoring and exploring the cluster, with the most wanted features such as topic message browsing and configuring and restarting connectors.
Slide Content
Classification: Internal
May 2024
Transition to Apache Kafka on Kubernetes with Strimzi
Speakers
Steffen Wirenfeldt Karlsson
Lead Software Engineer
Streaming Services, Maersk
Sai Charan Madhvaraj
Lead Software Engineer
RETinA Team, Maersk
Ravikanth Mallappa
Lead Platform Architect
Streaming Services, Maersk
A.P. Moller - Maersk
Connecting and simplifying global supply chains
A.P. Moller - Maersk enables its customers to trade and grow by transporting goods anywhere. Maersk works to provide customers with a simple end-to-end offering of products and services, seamless customer engagement and a superior end-to-end delivery network, taking the complexity out of global supply chains.
ALL THE WAY: Connecting and simplifying our customers’ supply chains
START
• Meet the customer (Booking): Set for reliable and cost-efficient shipping to the final destination.
• Collect the goods (Production): Picked up at the customers’ facilities at any place in the world.
• Store the goods (Warehouse): Goods are stored or managed throughout the supply chain, based on customers’ needs.
• Clear the goods for departure (Export terminal): Taken through customs promptly and efficiently.
• Transport the goods (Transportation): Moved by the world’s most sustainable fleet through Maersk's global transport network.
• Clear the goods at arrival (Import terminal): Taken through customs promptly and efficiently.
• Store the goods (Warehouse): Stored and managed for optimisation of stock, costs and inventory days.
• Deliver the goods (Customer warehouse or shop): Seamlessly delivered at the destination of the customers' preference.
END
Past
Based on Confluent
Why
• Strategy to move from batch to event-driven enterprise architecture
• Fast and easy bootstrap with PaaS and SaaS
• Gain the trust of the business
• Cost-benefit analysis: pay-as-you-go
Architecture
• Confluent Cloud and Enterprise (on-prem)
• Datadog for observability and monitoring
• Azure for Key Vault, Container Registry, DevOps and AD
• Confluent Replicators from G2C and C2G
• Confluent Connectors from MQ applications to Cloud
• Confluent Control Center for management
• Automated GitOps-like self-service using GitHub and Azure DevOps pipelines to deploy topics, users, etc.
• Custom reconciler, run as an Azure Pipeline, to bring the cluster up to date with the expected self-service state
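The reconciler idea in the last bullet can be illustrated with a small sketch. This is not the pipeline used at Maersk: the function name plan_actions, the topic names and the config keys are all hypothetical, and a real reconciler would call the Kafka admin API rather than return a list. It only shows the core step of comparing the declared (self-service) state with the actual cluster state.

```python
from typing import Dict, List, Tuple

# Hypothetical model: topic name -> topic settings declared in self-service.
TopicConfig = Dict[str, str]


def plan_actions(
    desired: Dict[str, TopicConfig], actual: Dict[str, TopicConfig]
) -> List[Tuple[str, str]]:
    """Return the (action, topic) steps needed to make `actual` match `desired`."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))   # declared but missing on the cluster
        elif desired[name] != actual[name]:
            actions.append(("update", name))   # exists, but settings have drifted
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))   # on the cluster but no longer declared
    return actions


if __name__ == "__main__":
    desired = {"bookings": {"partitions": "6"}, "invoices": {"partitions": "3"}}
    actual = {"bookings": {"partitions": "3"}, "legacy": {"partitions": "1"}}
    for action, topic in plan_actions(desired, actual):
        print(action, topic)
```

Running the plan repeatedly is safe: once the cluster matches the declared state, the list of actions is empty.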
[Diagram: Cloud, Enterprise On-Prem, Internal On-Prem]
Present
Based on Strimzi
Why
• Open-source adoption strategy, benefitting from the economy of scale
• Building up internal capabilities and inner-sourcing for a more tailored solution
• Internal knowledge on how to manage and administer a Kafka ecosystem
• Rapidly increasing adoption of the platform, point of no return
RETinA Architecture
[Diagram: API, Self-service, Runtime, External Systems, Replication, Region X, Region Y]
Features
• Support for various cluster topologies
  o Shared: Central cluster hosting topics of multiple platforms/tenants with required governance using quotas and schemas
  o Dedicated: Independent Kafka clusters for platform teams with varying requirements
  o Hub and Spoke: Aggregates data between local and central Kafka clusters
• Highly available with an active-active multi-region setup and a 99.99% SLA/SLO, implemented using MirrorMaker 2
• Enforced schema management on the shared cluster (Avro, JSON and Protobuf)
• External data connectivity using Kafka Connect and Debezium
• API-first architecture with automated self-service capabilities on top, using GitHub Actions
• Clusters kept constantly up to date through reconciliation using Flux
• Fully monitored and observed using the Grafana and Prometheus stack
• Secret management with HashiCorp Vault, synced to the clusters using External Secrets
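The active-active setup above relies on MirrorMaker 2. The sketch below assumes MirrorMaker 2's default replication policy, which names a mirrored topic by prefixing the source cluster alias and a "." separator, and shows how that naming lets replication run in both directions without looping. The aliases region-x and region-y and the topic bookings are hypothetical, and the functions are simplified illustrations, not MirrorMaker 2 code or the configuration used at Maersk.

```python
from typing import Optional, Set

SEPARATOR = "."  # default separator of MirrorMaker 2's default replication policy


def remote_topic(source_alias: str, topic: str) -> str:
    """Name a topic receives on the target cluster after being mirrored."""
    return f"{source_alias}{SEPARATOR}{topic}"


def topic_source(topic: str, cluster_aliases: Set[str]) -> Optional[str]:
    """Return the alias a topic was mirrored from, or None for a local topic."""
    prefix, sep, _ = topic.partition(SEPARATOR)
    return prefix if sep and prefix in cluster_aliases else None


def should_replicate(topic: str, target_alias: str, cluster_aliases: Set[str]) -> bool:
    """Skip topics that originated on the target; mirroring them back would loop."""
    return topic_source(topic, cluster_aliases) != target_alias


if __name__ == "__main__":
    aliases = {"region-x", "region-y"}
    # Topics present on region-x: one local, one mirrored from region-y.
    for topic in ["bookings", "region-y.bookings"]:
        if should_replicate(topic, "region-y", aliases):
            print(topic, "->", remote_topic("region-x", topic))
        else:
            print(topic, "-> skipped")
```

With this naming, a consumer that needs the data from both regions subscribes to the local topic and its prefixed mirror.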
Tenants
• Number of teams using our Strimzi-based solution: +210
• Number of Strimzi-based clusters actively running: 63
• Number of brokers in all our clusters: +300
Topics
• Total number of topics on all our clusters: +13K
• Total number of partitions on all our topics: +177K
• Number of Avro and JSON schema versions in Schema Registry: +22K
Events
• Total number of messages produced per day, avg: +250M
• Bytes in per second, avg: ~65Mb
• Bytes out per second, avg: ~90Mb
MAERSK: OUR PURPOSE
Improving life for all by integrating the world
[Background image: The integration illustrated by five years of Automatic Identification System (AIS) transponder data from A.P. Moller-Maersk vessels registered in the company’s scheduling system GSIS. Gateway and hub terminals.]
RETinA Manager
• Unique users per day: +300
• Requests per day: +7.5K
https://github.com/provectus/kafka-ui
Schema Compatibility UI
https://github.com/steffen-karlsson/schema-compatibility-ui
• Standalone and open-source, Apache-2.0 license
• Fail fast and enable tenants to be more independent to decrease time-to-market
• Full self-service and transparency in schema compatibility and comparison
• Schema types: Avro, JSON and Protobuf
• Compatibility levels: Backwards, Forwards, Full and None
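The four compatibility levels can be illustrated with a deliberately simplified model. The sketch below treats a schema as a flat set of fields that either have a default value or not, and ignores type changes, nesting and transitive checks. It is not the logic of the Schema Compatibility UI or of Schema Registry; the names can_read and is_compatible and the example fields are hypothetical.

```python
from typing import Dict

# Simplified schema model: field name -> True if the field has a default value.
Schema = Dict[str, bool]


def can_read(reader: Schema, writer: Schema) -> bool:
    """A reader can decode the writer's data if every field the writer
    did not produce has a default value in the reader's schema."""
    return all(has_default for name, has_default in reader.items() if name not in writer)


def is_compatible(new: Schema, old: Schema, level: str) -> bool:
    """Check a new schema version against the previous one at the given level."""
    level = level.upper()
    if level == "NONE":
        return True  # no checks are enforced
    backward = can_read(reader=new, writer=old)  # new consumers read old data
    forward = can_read(reader=old, writer=new)   # old consumers read new data
    if level == "BACKWARD":
        return backward
    if level == "FORWARD":
        return forward
    if level == "FULL":
        return backward and forward
    raise ValueError(f"unknown compatibility level: {level}")


if __name__ == "__main__":
    old = {"id": False, "port": False}
    added_required = {"id": False, "port": False, "eta": False}
    for level in ("BACKWARD", "FORWARD", "FULL", "NONE"):
        print(level, is_compatible(added_required, old, level))
```

In this model, adding a field without a default breaks backward compatibility, and removing a field without a default breaks forward compatibility; adding a field with a default is safe at every level.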
Observability & Monitoring
• Fully internal and open-source based observability platform
• Based on the Prometheus and Grafana stack and OpenTelemetry
• Real User Monitoring
• Synthetic Monitoring
Alerting
• Aggregated alerting by cluster and namespace
• Alerts and dashboards as code
• Automated deployment and synchronization using GitHub Actions
• Alert prediction per environment
• SLA/SLO 99.99%
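To make the 99.99% target concrete, the sketch below converts an availability percentage into the downtime it allows over a period. The function name downtime_budget_minutes is hypothetical, and the slides do not state the measurement window, so the 30-day and 365-day periods are assumptions chosen only for illustration.

```python
def downtime_budget_minutes(slo_percent: float, period_days: float) -> float:
    """Minutes of downtime allowed in the period at the given availability target."""
    unavailable_fraction = 1 - slo_percent / 100
    return unavailable_fraction * period_days * 24 * 60


if __name__ == "__main__":
    # Assumed windows: a 30-day month and a 365-day year.
    print(f"30 days:  {downtime_budget_minutes(99.99, 30):.2f} minutes")
    print(f"365 days: {downtime_budget_minutes(99.99, 365):.2f} minutes")
```

At 99.99% the budget is roughly 4.3 minutes in 30 days and under an hour in a year, which helps explain why the same slide lists alert prediction.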
Strimzi Contributions
• #3761 Provide metrics to monitor certificates expiration
• #2779 CrdGenerator validate @JsonPropertyOrder
• #8732 Enhance KafkaBridge resource with consumer inactivity timeout and HTTP consumer/producer parts enablement
• #9537 Kafka Exporter Grafana dashboard too long URL error
• #7374 Improve the Kafka brokers Grafana dashboard
Future
... still based on Strimzi
RETinA On-Prem
• The plan is to migrate the current Kafka infrastructure running on VMs to a Kubernetes-based deployment using Strimzi
• Existing tenants will be migrated to a Strimzi-based cluster using MM2
• On-prem infrastructure to be fully monitored and observed using the Grafana and Prometheus stack
[Diagram: Replication, API, Self-service]
Stream Processing
• Automated self-serviceable stream processing on top of Strimzi with Flink and GitHub Actions
• API-first architecture for better system-to-system integrations
• Predefined template jobs for better overall performance on the cluster
• Deployed using open-source community operator for better stability and configuration of Flink
• Fully monitored and observed using the Grafana and Prometheus stack
[Diagram: Replication, API, Self-service]