Stream Governance Tools and Strategies with Confluent Platform

ConfluentInc · 46 slides · Sep 20, 2024

About This Presentation

All your data continuously streamed, processed, governed and shared as a product, making it instantly valuable, usable, and trustworthy everywhere.


Slide Content

Stream Governance Tools and Strategies with Confluent Platform
Federico Alberton, Customer Success Technical Architect, Confluent Italy
Paolo Venturini, Customer Success Technical Architect, Confluent Italy

Agenda

11:00 - 11:20  Stream Governance & technical features: Schema, Schema Registry, Data Contracts
11:20 - 11:40  Demo session
11:40 - 11:50  Roadmap & next steps: Stream Catalog, Data Portal, Stream Lineage, Client-Side Field Level Encryption
Q&A session
12:00  Closing

Confluent and Confluent Platform

Confluent Data Streaming
Platform
All your data continuously streamed, processed,
governed and shared as a product,
making it instantly valuable, usable, and
trustworthy everywhere.
[Diagram: CONNECT, PROCESS, GOVERN, STREAM. Source data systems (Accounts, Customer Profile, Purchases, Shipments, Claims, Orders, Clickstreams) flow through the platform to custom apps & microservices (Inventory, Replenishment, Forecasting, Personalization, Recommendation, Fraud). From data mess to data products to instant value everywhere.]
Confluent Platform
https://www.confluent.io/whitepaper/confluent-enterprise-reference-architecture/
[Reference architecture diagram: applications behind a sticky load balancer call the REST Proxy; Kafka brokers (each with a rebalancer) coordinate through quorum nodes (ZooKeeper or KRaft); Schema Registry runs leader/follower; Confluent Control Center monitors the cluster; application clients include Kafka Streams apps and microservices; Kafka Connect workers run connectors or Replicator; ksqlDB servers handle stream processing.]

Data contracts

Data producer/owner
- Produce high quality data
- Evolve data safely
- Make data contextualized and discoverable
- Share data

Data platform team
- Design and offer a self-serve data streaming platform
- Facilitate the onboarding of developers and other data users

Data consumer
- Search and discover data
- Understand data
- Trust data
- Consume and build on high quality data

Remove friction at scale without centralization

Kafka without Schema — no structure can cause many problems
[Diagram: a producer produces raw bytes; multiple consumers consume them.]

Kafka without Schema — who is encoding and decoding?
[Diagram: the producer's serializer and each consumer's deserializer must agree on the format, with no shared definition.]

Major problems without a schema

Breaking changes: data producers and consumers evolve independently. Changes to the data schema (e.g., adding a new field, changing data types) can break consumers if not managed properly.

Low quality data: without a central mechanism to enforce schema adherence, producers may send malformed or incompatible data, leading to data quality issues downstream (missing data, incorrect data, etc.).

Security challenges: sensitive data cannot be protected properly and can be accessed by the wrong users or applications.

{
  "type": "record",
  "name": "Membership",
  "fields": [
    {"name": "start_date", "type": {"type": "int", "logicalType": "date"}},
    {"name": "end_date", "type": {"type": "int", "logicalType": "date"}},
    {"name": "email", "type": "string"},
    {"name": "ssn", "type": "string"}
  ]
}
Example Avro Schema
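To make the schema's role concrete, here is a minimal, hypothetical Python sketch that checks a record against the Membership field list (names and primitive types only). A real deployment would use an Avro library and the Schema Registry serializers rather than hand-rolled checks; the record values below are illustrative.

```python
# Simplified model of the Membership schema above: field -> Python type.
# Avro "int" with logicalType "date" is days since the Unix epoch.
MEMBERSHIP_FIELDS = {
    "start_date": int,
    "end_date": int,
    "email": str,
    "ssn": str,
}

def validate_membership(record: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, expected in MEMBERSHIP_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"wrong type for {name}: {type(record[name]).__name__}")
    return problems

good = {"start_date": 19358, "end_date": 19722, "email": "a@b.co", "ssn": "856-45-6789"}
bad = {"start_date": 19358, "end_date": 19722, "email": 100}

print(validate_membership(good))  # []
print(validate_membership(bad))   # wrong type for email, missing ssn
```

This is exactly the kind of check that silently goes missing when no schema exists: each consumer reimplements (or skips) it.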

{
  "type": "record",
  "name": "Membership",
  "fields": [
    {"name": "start_date", "type": {"type": "int", "logicalType": "date"}},
    {"name": "end_date", "type": {"type": "int", "logicalType": "date"}},
    {"name": "email", "type": "string"},
    {"name": "ssn", "type": "string"},
    {"name": "full_name", "type": ["null", "string"]}
  ]
}

Change in Structure
Can be disruptive

Structure Ignored — unpredictable outcomes

SCHEMA: "start_date" int, "end_date" int, "email" string, "ssn" string

{
  "start_date": 19358,   // 2023-01-01
  "end_date": 19722,     // 2023-12-31
  "email": 100           // INCORRECT DATA TYPE
}
// MISSING SSN FIELD — OUT OF ORDER

Domain integrity ignored — garbage in, garbage out

SCHEMA: "start_date" int, "end_date" int, "email" string, "ssn" string

{
  "start_date": 19358,   // 2023-01-01
  "end_date": 0,
  "email": "john.doe",   // INVALID EMAIL
  "ssn": "fizzbuzz"      // INVALID SSN
}

Example of high quality data — proper structure and semantics

SCHEMA: "start_date" int, "end_date" int, "email" string, "ssn" string

{
  "start_date": 19358,   // 2023-01-01
  "end_date": 19722,     // 2023-12-31
  "email": "[email protected]",
  "ssn": "856-45-6789"
}

Data security gaps — no data privacy

SCHEMA: "start_date" int, "end_date" int, "email" string, "ssn" string

{
  "start_date": 19358,   // 2023-01-01
  "end_date": 19722,     // 2023-12-31
  "email": "[email protected]",
  "ssn": "856-45-6789"
}
// CLEAR TEXT — sensitive fields are readable by anyone with topic access

Data Contracts are the answer — adding structure, meaning, and policies
[Diagram: the producer produces data under a contract of structure, semantics, and rules backed by business logic; the contract promises trust; consumers consume.]

Data Contracts — Schema Registry as the foundation
[Diagram: producer, Schema Registry, and Confluent Kafka on one side; consumers on the other; the contract promises trust.]

Confluent Schema Registry
Schema Repository and more!

Data contract — first step: define the structure
Structure is the foundation of the contract.
[Diagram: the producer promises structure; consumers trust and consume.]

{
  "type": "record",
  "name": "Membership",
  "fields": [
    {"name": "start_date", "type": {"type": "int", "logicalType": "date"}},
    {"name": "end_date", "type": {"type": "int", "logicalType": "date"}},
    {"name": "email", "type": "string"},
    {"name": "ssn", "type": "string"}
  ]
}
Example Avro Schema

{
  "type": "record",
  "name": "Membership",
  "fields": [
    {"name": "start_date", "type": {"type": "int", "logicalType": "date"}},
    {"name": "end_date", "type": {"type": "int", "logicalType": "date"}},
    {"name": "email", "type": "string"},
    {"name": "ssn", "type": "string"},
    {"name": "full_name", "type": ["null", "string"]}
  ]
}

Constant Change
How do we know if this is allowed?

Set Compatibility for Schema Evolution
Compatibility enables predictable changes

COMPATIBILITY TYPE    | ALLOWED CHANGES                              | COMPARED TO           | UPGRADE FIRST
BACKWARD              | Delete fields; add optional fields           | Latest version        | Consumers
BACKWARD_TRANSITIVE   | Delete fields; add optional fields           | All previous versions | Consumers
FORWARD               | Add fields; delete optional fields           | Latest version        | Producers
FORWARD_TRANSITIVE    | Add fields; delete optional fields           | All previous versions | Producers
FULL                  | Add optional fields; delete optional fields  | Latest version        | Either
FULL_TRANSITIVE       | Add optional fields; delete optional fields  | All previous versions | Either
NONE                  | Any change allowed                           | Not compared          | Depends
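The BACKWARD row of the table can be sketched in a few lines of Python. This is a simplified model (a schema as a map of field name to "required?", where "optional" stands for nullable or having a default); Schema Registry performs the real compatibility check server-side against the full Avro resolution rules.

```python
def backward_compatible(old: dict, new: dict) -> bool:
    """old/new map field name -> True if required, False if optional.

    BACKWARD allows deleting fields and adding optional fields:
    new readers must still be able to read data written with the old schema.
    """
    for name, required in new.items():
        # A field added in the new schema must be optional, or old data
        # (which lacks it) cannot be decoded by the new reader.
        if name not in old and required:
            return False
    # Deleting fields is always allowed under BACKWARD.
    return True

v1 = {"start_date": True, "end_date": True, "email": True, "ssn": True}
v2 = dict(v1, full_name=False)   # added optional field -> compatible
v3 = dict(v1, full_name=True)    # added required field -> breaking

print(backward_compatible(v1, v2))  # True
print(backward_compatible(v1, v3))  # False
```

The FORWARD rules are the mirror image: new *writers* must produce data old readers can decode, so adding fields is safe and only optional fields may be deleted.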

{
  "schema": "...",
  "metadata": {
    "properties": {
      "owner": "Carol Smith",
      "email": "[email protected]",
      "gdpr_sensitive": "True",
      "retention_period": "24"
    }
  }
}

Adding Metadata

Data contract — proper semantics for data enables the proper flow of data
[Diagram: the producer promises structure, semantics, and rules backed by business logic; consumers trust and consume.]

Use Data Contract Rules
- Data Quality: constrain the values of fields and customize follow-up actions on incompatible messages.
- Data Encryption: identify and encrypt the value of a field based on a tag added to the field.
- Data Transformation: change the value of a specific field, or an entire message, based on a condition.
- Schema Migration: a transform rule that allows otherwise breaking changes to be performed on a schema by adding upgrade and downgrade rules.

Domain Integrity — ensuring proper values

SCHEMA: "start_date" int, "end_date" int, "email" string, "ssn" string

{
  "start_date": 19358,   // 2023-01-01
  "end_date": 19722,     // 2023-12-31
  "email": "[email protected]",
  "ssn": "856-45-6789"
}

Data Quality Rule
{
  "schema": "...",
  "schemaType": "AVRO",
  "ruleSet": {
    "domainRules": [{
      "name": "validateEmail",
      "kind": "CONDITION",
      "mode": "WRITE",
      "type": "CEL",
      "doc": "Rule checks email is well formatted and sends record to a DLQ if not.",
      "expr": "Membership.email.matches(r\".+@.+\\..+\")",
      "onFailure": "DLQ",
      "params": {
        "dlq.topic": "bad_members"
      }
    }]
  }
}
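To see what the CONDITION rule does, here is a client-side Python emulation: records whose email fails the same regex are routed to the DLQ topic instead of the main one. The topic names mirror the rule's params; the routing function is illustrative — in Confluent Platform the rule executes inside the serializer, not in application code.

```python
import re

# Same pattern as the rule's CEL expression: something@something.something
EMAIL_RE = re.compile(r".+@.+\..+")

def route(record: dict, topic: str = "members", dlq: str = "bad_members") -> tuple:
    """Return (destination_topic, record); failures go to the DLQ."""
    if EMAIL_RE.fullmatch(record.get("email", "")):
        return (topic, record)
    return (dlq, record)

# A well-formed address passes; a bare username does not.
print(route({"email": "carol@example.com", "ssn": "856-45-6789"})[0])
print(route({"email": "john.doe"})[0])
```

With `"onFailure": "DLQ"`, bad records are diverted rather than rejected, so the producer keeps running while the quality issue is quarantined for inspection.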

Data Privacy — protecting sensitive data

SCHEMA: "start_date" int, "end_date" int, "email" string, "ssn" string

{
  "start_date": 19358,   // 2023-01-01
  "end_date": 19722,     // 2023-12-31
  "email": "[email protected]",
  "ssn": "XXX-XX-6789"   // protected field
}

Data Encryption Rule
{
  "schema": "...",
  "schemaType": "AVRO",
  "metadata": {
    "tags": {
      "Membership.email": ["PII"]
    }
  },
  "ruleSet": {
    "domainRules": [{
      "name": "encryptPII",
      "type": "ENCRYPT",
      "doc": "Rule encrypts every field tagged as PII.",
      "tags": ["PII"],
      "params": {
        "encrypt.kek.name": "ce581594-3115-486e-b391-5ea874371e73",
        "encrypt.kms.type": "aws-kms",
        "encrypt.kms.key.id": "arn:aws:kms:us-east-1:586051073099:key/ce58..."
      }
    }]
  }
}
On Roadmap
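The essence of the ENCRYPT rule is tag-driven field handling: the contract's metadata marks certain fields as PII, and the serializer transforms exactly those before producing. The Python sketch below shows only that selection flow — base64 is a deliberately fake stand-in for real envelope encryption with a KMS-managed key (it is encoding, not encryption), and the tag map is illustrative.

```python
import base64

# Illustrative stand-in for the contract's metadata tags.
TAGS = {"email": ["PII"], "ssn": ["PII"]}

def protect_pii(record: dict) -> dict:
    """Transform every field tagged PII; leave the rest untouched.

    base64 here is a placeholder for KMS-backed field-level encryption.
    """
    out = {}
    for name, value in record.items():
        if "PII" in TAGS.get(name, []):
            out[name] = base64.b64encode(str(value).encode()).decode()
        else:
            out[name] = value
    return out

rec = {"start_date": 19358, "ssn": "856-45-6789"}
print(protect_pii(rec)["start_date"])  # 19358 (untouched)
```

Because the rule keys off tags rather than field names, newly added sensitive fields are protected as soon as they are tagged, with no producer code change.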

Data Transformation Rule
{
  "schema": "...",
  "schemaType": "AVRO",
  "ruleSet": {
    "domainRules": [{
      "name": "populateDefaultSSN",
      "kind": "TRANSFORM",
      "type": "CEL_FIELD",
      "doc": "Rule checks if ssn is empty and replaces it with 'unspecified' if it is.",
      "mode": "WRITE",
      "expr": "name == 'ssn' ; value == '' ? 'unspecified' : value"
    }]
  }
}
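A CEL_FIELD expression runs per field: the part before the `;` is a guard selecting the field, and the part after supplies the replacement value. A plain-Python equivalent of this particular guard/transform pair (function name is ours, not from the rule):

```python
def populate_default_ssn(record: dict) -> dict:
    """Emulate: name == 'ssn' ; value == '' ? 'unspecified' : value"""
    out = dict(record)
    # Guard: only the 'ssn' field. Transform: empty string -> 'unspecified'.
    if out.get("ssn", "") == "":
        out["ssn"] = "unspecified"
    return out

print(populate_default_ssn({"email": "a@b.co", "ssn": ""}))
print(populate_default_ssn({"email": "a@b.co", "ssn": "856-45-6789"}))
```

Unlike the DLQ-based quality rule, a TRANSFORM rule repairs the record in flight, so downstream consumers always see a populated field.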

Schema Migration Rule
{
  "schema": "...",
  "schemaType": "AVRO",
  "ruleSet": {
    "migrationRules": [{
      "name": "changeSsnToSocialSecurityNumber",
      "kind": "TRANSFORM",
      "type": "JSONATA",
      "doc": "Consumer is on new major version and gets socialSecurityNumber while producer sends ssn.",
      "mode": "UPGRADE",
      "expr": "$merge([$sift($, function($v, $k) {$k != 'ssn'}), {'socialSecurityNumber': $.'ssn'}])"
    }, {
      "name": "changeSocialSecurityNumberToSsn",
      "kind": "TRANSFORM",
      "type": "JSONATA",
      "doc": "Consumer is on old major version and gets ssn while producer sends socialSecurityNumber.",
      "mode": "DOWNGRADE",
      "expr": "$merge([$sift($, function($v, $k) {$k != 'socialSecurityNumber'}), {'ssn': $.'socialSecurityNumber'}])"
    }]
  }
}
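The two JSONata expressions each do a rename: drop one key, add the other with the same value. A dict-level Python equivalent makes the round trip easy to see (function names are ours; the field values are illustrative):

```python
def upgrade(record: dict) -> dict:
    """UPGRADE: rename ssn -> socialSecurityNumber for new-version consumers."""
    out = {k: v for k, v in record.items() if k != "ssn"}
    out["socialSecurityNumber"] = record.get("ssn")
    return out

def downgrade(record: dict) -> dict:
    """DOWNGRADE: rename socialSecurityNumber -> ssn for old-version consumers."""
    out = {k: v for k, v in record.items() if k != "socialSecurityNumber"}
    out["ssn"] = record.get("socialSecurityNumber")
    return out

rec = {"email": "a@b.co", "ssn": "856-45-6789"}
print(upgrade(rec))
print(downgrade(upgrade(rec)) == rec)  # round-trip -> True
```

Because upgrade and downgrade are inverses, producers on the old major version and consumers on the new one (or vice versa) can coexist while teams migrate at their own pace.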

Data Contracts — shift left and ensure high quality data
[Diagram: the producer's serializer performs content validation and contract validation against Schema Registry before writing to Confluent Kafka; invalid records are prevented from reaching consumers.]

Data Contracts — high quality data for processing
[Diagram: the consumer's deserializer performs content validation and contract validation against Schema Registry; the contract promises and provides trusted data; invalid records are prevented.]

Solution overview

- Breaking changes: use Schema Registry and define schemas; evolve them as needed; use schema validation.
- Low quality data: utilize validation and transformation rules; take advantage of metadata capabilities like tagging, business metadata, etc.
- Security challenges: protect sensitive data using Confluent security and governance capabilities.

Demo

Roadmap

Hybrid Connected Cloud — extend Data Streaming Platform capabilities to self-managed clusters (On Roadmap for 2025)
- Provide hybrid monitoring, governance and management in a unified, secure architecture:
  - Data Portal Sync
  - Data Policies
  - Observability: metrics and alerting
  - Cluster Linking & mirrored topics

Data Discovery

An Example from Amazon
- Discoverable
- Owner
- Accessible
- Contextualized
- Standardized
- Secure
- Trustworthy
- Self-describing
- Metadata

Search and discover data
using Data Portal
On Roadmap for 2025

Understand the data lineage
On Roadmap for 2025

Stream Catalog
On Roadmap for 2025

Q&A Session

falberton@confluent.io
pventurini@confluent.io
Thanks!