Demo Session
Roadmap & next steps:
- Stream Catalog, Data Portal, Stream Lineage, Client-Side Field-Level Encryption
Q&A Session
Closing Remarks
Agenda
Confluent and Confluent Platform
Confluent Data Streaming Platform
All your data continuously streamed, processed, governed, and shared as a product, making it instantly valuable, usable, and trustworthy everywhere.
Diagram: the platform capabilities (CONNECT, STREAM, PROCESS, GOVERN) applied to data such as Accounts, Customer Profile, Purchases, Shipments, Claims, Orders, and Clickstreams.
From Data Mess to Data Products, to Instant Value, Everywhere
Diagram: CONNECT links data systems (Inventory, Replenishment, Forecasting, …) with custom apps & microservices (Personalization, Recommendation, Fraud, …).
Confluent Platform
https://www.confluent.io/whitepaper/confluent-enterprise-reference-architecture/
Reference architecture diagram: applications and clients (Kafka Streams apps, microservices) reach the cluster directly or through a sticky load balancer and REST Proxy; Kafka brokers run with the rebalancer; quorum nodes (ZooKeeper or KRaft) provide coordination; Schema Registry runs as leader and followers; Confluent Control Center provides monitoring; Kafka Connect workers host connectors or Replicator; ksqlDB servers complete the deployment.
Data contracts
Data producer/owner
- Produce high quality data
- Evolve data safely
- Make data contextualized and discoverable
- Share data
Data platform team
- Design and offer a self-serve data streaming platform
- Facilitate the onboarding of developers and other data users
Data consumer
- Search and discover data
- Understand data
- Trust data
- Consume and build on high quality data
Remove friction at scale without centralization
Kafka without Schema
No structure can cause many problems.
Diagram: a producer produces to Kafka and consumers consume, with no agreed structure between them.
Kafka without Schema
Who is encoding and decoding?
Diagram: the producer's serializer and each consumer's deserializer encode and decode the data without a shared definition.
Major problems without the schema
Breaking changes: data producers and consumers evolve independently. Changes to the data schema (e.g., adding a new field, changing data types) can break consumers if not managed properly.
Low quality data: without a central mechanism to enforce schema adherence, producers may send malformed or incompatible data, leading to data quality issues downstream (missing data, incorrect data, etc.).
Security challenges: sensitive data cannot be protected properly and can be accessed by the wrong users / applications.
Example of low quality data
Domain integrity ignored; garbage in, garbage out. Slide callouts: incorrect data type, missing SSN field, out of order.
{
  "start_date": 19358,   // 2023-01-01
  "end_date": 0,
  "email": "john.doe",
  "ssn": "fizzbuzz"
}
Example of high quality data
Proper structure and semantics.
{
  "start_date": 19358,   // 2023-01-01
  "end_date": 19722,     // 2023-12-31
  "email": "[email protected]",
  "ssn": "856-45-6789"
}
Data Contracts is the answer
Adding structure, meaning, and policies
Diagram: the producer produces and promises; consumers consume and trust. The data contract carries the structure, semantics, and rules shared between the producer's and the consumers' business logic.
Schema Registry as the foundation
Diagram: the producer produces to Confluent Kafka with the contract held in Schema Registry; consumers consume against the same contract.
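A minimal sketch of how that foundation is put in place: the producer team registers the contract with Schema Registry over its REST API (POST /subjects/<subject>/versions). The subject name "membership-value", the host, and the escaped schema below are illustrative assumptions, not taken from this deck.
// POST http://schema-registry:8081/subjects/membership-value/versions
{
  "schemaType": "AVRO",
  "schema": "{\"type\":\"record\",\"name\":\"Membership\",\"fields\":[{\"name\":\"email\",\"type\":\"string\"}]}"
}
Consumers resolve the same subject through the registry, so producer and consumers serialize and deserialize against one agreed contract.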
Data contract
First step: define the structure
Structure is the foundation of the contract.
Diagram: the producer produces and promises a structure; consumers consume and trust it.
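To make "define the structure" concrete, a sketch of an Avro schema for the membership record used in the payload examples above. The field names (start_date, end_date, email, ssn) and the record name Membership come from this deck; the namespace, logical types, and doc strings are illustrative assumptions.
{
  "type": "record",
  "name": "Membership",
  "namespace": "com.example.memberships",
  "doc": "Membership record exchanged between the producer and its consumers.",
  "fields": [
    {"name": "start_date", "type": {"type": "int", "logicalType": "date"}, "doc": "Days since epoch, e.g. 19358 = 2023-01-01"},
    {"name": "end_date", "type": {"type": "int", "logicalType": "date"}, "doc": "Days since epoch, e.g. 19722 = 2023-12-31"},
    {"name": "email", "type": "string", "doc": "Contact email address"},
    {"name": "ssn", "type": "string", "doc": "Social security number, e.g. 856-45-6789"}
  ]
}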
Data contract
Proper semantics for data
Enables proper flow of data.
Diagram: on top of structure, the contract captures semantics and rules that connect the producer's business logic to the consumers' business logic.
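A sketch of how semantics and context can travel with the schema: the metadata block below reuses the field-level tags mechanism shown in the encryption rule later in this deck, and assumes the metadata properties map can hold free-form business metadata; the owner and domain keys are illustrative, not from the deck.
{
  "schema": "...",
  "schemaType": "AVRO",
  "metadata": {
    "tags": {
      "Membership.email": ["PII"],
      "Membership.ssn": ["PII"]
    },
    "properties": {
      "owner": "membership-team",
      "domain": "customer"
    }
  }
}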
Use Data Contract Rules
Data Quality: constrain the values of fields and customize follow-up actions on incompatible messages (a sketch follows this list).
Data Encryption: identify and encrypt the value of a field based on a tag added to the field.
Data Transformation: change the value of a specific field or an entire message based on a condition.
Schema Migration: a transform rule that allows otherwise breaking changes to be performed on a schema by adding upgrade and downgrade rules.
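The deck gives examples of the last three rule types below; for completeness, a sketch of a Data Quality rule expressed as a CEL condition in the same ruleSet format. The rule name, the SSN pattern, and ERROR as the follow-up action are illustrative assumptions.
{
  "schema": "...",
  "schemaType": "AVRO",
  "ruleSet": {
    "domainRules": [{
      "name": "validateSsnFormat",
      "kind": "CONDITION",
      "type": "CEL",
      "mode": "WRITE",
      "doc": "Reject messages whose ssn does not look like 856-45-6789.",
      "expr": "message.ssn.matches('^[0-9]{3}-[0-9]{2}-[0-9]{4}$')",
      "onFailure": "ERROR"
    }]
  }
}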
Data Encryption Rule
{
  "schema": "...",
  "schemaType": "AVRO",
  "metadata": {
    "tags": {
      "Membership.email": ["PII"]
    }
  },
  "ruleSet": {
    "domainRules": [{
      "name": "encryptPII",
      "type": "ENCRYPT",
      "doc": "Rule encrypts every field tagged as PII.",
      "tags": ["PII"],
      "params": {
        "encrypt.kek.name": "ce581594-3115-486e-b391-5ea874371e73",
        "encrypt.kms.type": "aws-kms",
        "encrypt.kms.key.id": "arn:aws:kms:us-east-1:586051073099:key/ce58..."
      }
    }]
  }
}
On Roadmap
Data Transformation Rule
{
  "schema": "...",
  "schemaType": "AVRO",
  "ruleSet": {
    "domainRules": [{
      "name": "populateDefaultSSN",
      "kind": "TRANSFORM",
      "type": "CEL_FIELD",
      "doc": "Rule checks if ssn is empty and replaces it with 'unspecified' if it is.",
      "mode": "WRITE",
      "expr": "name == 'ssn' ; value == '' ? 'unspecified' : value"
    }]
  }
}
Schema Migration Rule
{
  "schema": "...",
  "schemaType": "AVRO",
  "ruleSet": {
    "migrationRules": [{
      "name": "changeSsnToSocialSecurityNumber",
      "kind": "TRANSFORM",
      "type": "JSONATA",
      "doc": "Consumer is on new major version and gets socialSecurityNumber while producer sends ssn.",
      "mode": "UPGRADE",
      "expr": "$merge([$sift($, function($v, $k) {$k != 'ssn'}), {'socialSecurityNumber': $.'ssn'}])"
    }, {
      "name": "changeSocialSecurityNumberToSsn",
      "kind": "TRANSFORM",
      "type": "JSONATA",
      "doc": "Consumer is on old major version and gets ssn while producer sends socialSecurityNumber.",
      "mode": "DOWNGRADE",
      "expr": "$merge([$sift($, function($v, $k) {$k != 'socialSecurityNumber'}), {'ssn': $.'socialSecurityNumber'}])"
    }]
  }
}
Data Contracts
Shift-left and ensure high quality data
Diagram: on the producer side, the serializer performs contract validation and content validation against Schema Registry before data reaches Confluent Kafka; invalid messages are prevented at the source, so the producer's promise can be trusted by consumers.
Data Contracts
High quality data for processing
Diagram: on the consumer side, the deserializer performs contract validation and content validation against Schema Registry, so invalid messages are prevented from reaching the consumers and only high quality data is processed.
Solution overview
Breaking changes: use Schema Registry and define schemas, evolve them as needed, and use schema validation (a compatibility sketch follows this overview).
Low quality data: utilize validation and transformation rules, and take advantage of metadata capabilities like tagging, business metadata, etc.
Security challenges: protect sensitive data using Confluent Security and Governance capabilities.
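A sketch of the "evolve them as needed" step: compatibility for a subject can be set through the Schema Registry config endpoint, so breaking schema changes are rejected at registration time. The subject name, host, and BACKWARD level are illustrative assumptions.
// PUT http://schema-registry:8081/config/membership-value
{
  "compatibility": "BACKWARD"
}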
Demo
Roadmap
● Provide hybrid monitoring, governance and management in a unified, secure architecture:
✓ Data Portal Sync
✓ Data Policies
✓ Observability - Metrics and Alerting
✓ Cluster Linking & Mirrored Topics
Hybrid Connected Cloud (on roadmap for 2025)
Extend Data Streaming Platform capabilities to self-managed clusters.
Data Discovery
An example from Amazon: a data product is discoverable, accessible, contextualized, standardized, secure, trustworthy, and self-describing, with a clear owner and metadata.
Search and discover data using Data Portal (on roadmap for 2025).