Break data silos with real-time connectivity using Confluent Cloud Connectors
ConfluentInc
Jul 04, 2024
About This Presentation
Connectors integrate Apache Kafka® with external data systems, enabling you to move away from a brittle spaghetti architecture to one that is more streamlined, secure, and future-proof. However, if your team still spends multiple dev cycles building and managing connectors using just open source Kafka Connect, it’s time to consider a faster and cost-effective alternative.
Agenda
01 Introduction to Kafka Connect
02 Why Confluent Cloud connectors
03 Secure networking with cloud connectors
04 Migrating from self-managed to fully managed connectors
Traditionally, systems and apps are tightly coupled, with a proliferation of point-to-point integrations across lines of business and the public cloud.

Common challenges include:
- Complex architecture prone to data silos, batch processing, and data inconsistency
- High system interdependencies leading to cascading failures and data loss
- Difficulties moving data at scale reliably and connecting across different environments
Connectors decouple data sources and sinks by integrating them with a data streaming platform.

Connectors and Kafka enable:
- Simple, future-proof architecture flexible enough to absorb new systems or changes
- Fault-tolerant design in the case of individual system downtime or failures
- Reliable data movement at massive scale, with schema and ordering management
Kafka Connect comes with many built-in advantages over writing your own producers and consumers.

| Requirements for streaming integrations  | Kafka Connect | Custom app (producer/consumer) |
| Pre-built integrations                   | Easy          | Hard                           |
| Scaling & parallel processing            | Easy          | Medium                         |
| High availability                        | Easy          | Medium                         |
| Configuration management                 | Easy          | Medium                         |
| Transformations                          | Easy          | Hard                           |
| Offset management / change data capture  | Easy          | Medium                         |
| Management via REST API, CLI             | Easy          | Hard                           |
| Restarts                                 | Easy          | Medium                         |
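As a sketch of the "management via REST API" row above: in a self-managed deployment, the standard Kafka Connect REST API creates a connector with a POST to `/connectors`. A minimal helper that builds such a request (the hostname, connector name, and connector config below are illustrative, not from this deck):

```python
import json

def build_create_request(name, config):
    """Build the (url, body) pair for Kafka Connect's create-connector call."""
    url = "http://localhost:8083/connectors"  # 8083 is Connect's default REST port
    body = json.dumps({"name": name, "config": config})
    return url, body

# Illustrative sink connector; in practice you would POST this with an HTTP client.
url, body = build_create_request(
    "jdbc-orders-sink",
    {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "orders",
        "tasks.max": "1",
    },
)
print(url)  # http://localhost:8083/connectors
```

With fully managed connectors, this request/response lifecycle (plus restarts and scaling) is handled for you.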
Don’t reinvent Kafka Connect if you don’t have to!
Using fully managed connectors is the fastest, most efficient way to break data silos.

Custom-built connector:
- Costly to allocate resources to design, build, test, and maintain non-differentiated data integration components
- Delays time-to-value, taking 3-6+ engineering months to develop
- Perpetual management and maintenance increases tech debt and risk of downtime

Self-managed pre-built connector:
- Pre-built, but requires manual installation and configuration effort to set up and deploy connectors
- Perpetual management and maintenance of connectors that leads to ongoing tech debt
- Risk of downtime and business disruption due to connector or Connect cluster issues

Fully managed connector:
- Streamlined configurations and on-demand provisioning of your connectors
- Eliminates operational overhead and management complexity with seamless scaling and load balancing
- Reduced risk of downtime with a 99.99% uptime SLA for all your mission-critical use cases
Connect your entire business with just a few clicks: 80+ fully managed connectors, including Amazon S3, Amazon Redshift, Amazon DynamoDB, Google Cloud Spanner, AWS Lambda, Amazon SQS, Amazon Kinesis, Azure Service Bus, Azure Event Hubs, Azure Synapse Analytics, Azure Blob Storage, Azure Functions, Azure Data Lake, and Google BigTable.
Fully Managed Connectors

Confluent Cloud’s portfolio of 80+ fully managed connectors enables you to boost developer productivity, eliminate operational burden, and accelerate time to value on your data-in-motion journey:
- Eliminate the operational burden of self-managing connectors and reduce total cost of ownership
- Operate your business in real time by modernizing your data systems
- Accelerate your entire pipeline development process with Stream Designer, SMTs, and data preview
Easily build real-time data pipelines to your data warehouse, database, and data lake.

From the customer’s cloud environment, source connectors (optionally applying SMTs) stream data into Kafka topics in Confluent Cloud, where Flink can process it, and sink connectors (optionally applying SMTs) deliver it onward.

Typical sources:
- Data stores (e.g. PostgreSQL, MongoDB Atlas, MySQL, Oracle DB)
- Application data (e.g. Salesforce, ServiceNow, GitHub, Zendesk)
- Log data and messaging systems (e.g. MQTT, Azure Service Bus, Azure Event Hubs, Solace)

Typical sinks:
- Data warehouses: Amazon Redshift, Snowflake, Google BigQuery, Azure Synapse Analytics
- Databases: MongoDB Atlas, Amazon DynamoDB, Azure Cosmos DB, Google BigTable
- Data lakes: Databricks Delta Lake, Amazon S3, Google Cloud Storage, Azure Blob Storage
Only Confluent offers 80+ expert-built, fully managed connectors across the entire stack.

The work involved falls into two groups:
- Connectors: connector configurations, development, testing, updates, and support
- Connect workers: Connect cluster scaling, worker configs, internal topics, schema registry, monitoring and security, load balancing, and Connect plugin installation

With open source Apache Kafka (Kafka Connect), you manage all of the above yourself. With other Kafka hosted services, most of this work still falls on you. With Confluent fully managed connectors, the provider manages everything except your connector configurations*, giving the highest ease of use.

*Streamlined configurations, with the ability for granular controls if needed.
Fully managed connectors are also rich with features that boost developer productivity:
- Data output preview & config validations: instantly validate connector configurations and preview their output to launch connectors seamlessly and successfully
- Connect log events: view connector events in the console for contextual information and error debugging
- AWS IAM AssumeRole support*: easily and securely manage access to AWS resources across an organization with temporary credentials
- Single Message Transforms (SMTs): perform lightweight data transformations, like masking and filtering, in flight within the source or sink connector

*Coming soon in 2024
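As a sketch of the in-flight masking that the SMT bullet describes, a connector configuration can include Kafka Connect's built-in MaskField transform. The transform alias and field names below are illustrative, not from this deck:

```json
{
  "transforms": "maskPII",
  "transforms.maskPII.type": "org.apache.kafka.connect.transforms.MaskField$Value",
  "transforms.maskPII.fields": "ssn,credit_card"
}
```

The named fields are replaced with null-equivalent values before the record leaves the connector, so sensitive data never lands in the downstream system.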
Custom connectors

Every organization has its own unique data architecture, which requires additional flexibility:
- Home-grown systems and custom applications need custom-built connectors to break data silos
- Pre-built connectors in the Kafka ecosystem need additional modifications to fit your specific context
- Managed connector options are lacking for the long tail of less popular data systems and apps

Bring your own connectors and let Confluent provision and manage the connector infrastructure. Break any data silo without needing to manage Kafka Connect infrastructure by bringing your own connector plugins to Confluent Cloud:
- Quickly connect to any data system using your own Kafka Connect plugins, without code changes
- Ensure high availability and performance, using logs and metrics to monitor the health of your connectors and workers
- Eliminate the operational burden of provisioning and perpetually managing low-level connector infrastructure
The recommended secure networking path depends on several factors:
- Source/sink data system type: first-party systems (e.g. AWS S3, Azure Cosmos DB, GCP GCS), third-party systems (e.g. Snowflake, MongoDB Atlas), or self-managed systems (e.g. Postgres, MySQL)
- Endpoint and environment type: public vs. private endpoints; on-prem vs. cloud
- Confluent Cloud cluster networking type: public internet, VPC/VNet peering, Transit Gateway, or AWS PrivateLink / Azure Private Link / GCP Private Service Connect
- Confluent Cloud cluster cloud provider: AWS, Azure, or GCP
- Your company’s security and networking policies
Overview of networking scenarios for fully managed connectors.

Kafka cluster networking types:
- Public endpoints
- Peering or TGWv1 [Dedicated]
- PrivateLink/PSC [Dedicated]
- PrivateLink/PSC [Enterprise]

External system networking options:
- Public endpoint
- Public endpoint with static egress IPs
- Private endpoint with direct connection
- Private endpoint with public DNS
- Private endpoint with private DNS

Each combination of cluster networking and external system networking is either supported, coming soon (FY24), or not applicable.
Public egress IPs to public endpoints

Static egress IPs provide an allow list of never-changing IP addresses used by fully managed connectors for outbound connections, restricting access to your data systems to authorized clients only. On a public cluster, source and sink connectors reach the data source and data sink over the public internet from a static IP range.

*By default, the cluster and the sink must be in the same cloud and region.
DNS terminology

What is DNS? DNS is the system that translates a URL/hostname to an IP address, e.g. www.google.com -> 142.251.40.206.

Public vs. private DNS: this refers to whether a DNS record can be resolved over the public internet or only within a private network (e.g. a VPC or on-prem).

Tip: run dig <hostname> from your laptop to check.
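Along the same lines as the dig tip, here is a small Python check that asks the machine's configured resolver whether a hostname resolves at all. This is a rough illustration of the public-vs-private distinction, assuming you run it from a network without access to the private zone; the helper name is made up for this sketch:

```python
import socket

def resolves_here(hostname: str) -> bool:
    """Return True if this machine's DNS resolver can resolve the hostname."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        # NXDOMAIN or no resolver reachable: the record is not visible from here
        return False

# A private-only record would return False from outside the private network.
print(resolves_here("localhost"))  # True
```

A hostname that resolves from inside your VPC but returns False from your laptop is backed by a private DNS zone.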
DNS forwarding to private endpoints

Available now on AWS and Azure; GCP support coming soon.

DNS forwarding enables fully managed connectors to resolve hostnames with private IP addresses by forwarding the DNS lookup requests to customer-hosted or private DNS zones. Source/sink connectors on a Dedicated Kafka cluster reach the data sources and sinks in the customer’s network over VPC peering or Transit Gateway.
PrivateLink terminology

What is PrivateLink? PrivateLink allows unidirectional connectivity between a service consumer and a service provider. Communication between them goes through the cloud service provider’s backbone.

Consumers vs. providers:
- Service providers: set up a PrivateLink service attached to a load balancer; the load balancer’s targets become what is available through the service
- Service consumers: connect to a PrivateLink service by creating an endpoint that clients connect to directly
Egress Access Points to private endpoints

Available on AWS now; Azure Private Link and GCP PSC support coming soon.

Egress Access Points enable private connectivity with your source/sink systems by connecting directly from a Confluent Egress Gateway, over a PrivateLink service, to your external data systems. Each destination (a CSP source/sink, a SaaS or third-party source/sink, or a data source/sink in the customer’s network) is reached through its own Egress Access Point.
A connector’s state is stored in the following internal Kafka topics:

| Internal Kafka topic | Property              | Description                                          |
| Configuration topic  | config.storage.topic  | Stores the configuration of all connectors and tasks |
| Offsets topic        | offset.storage.topic  | Stores the offsets of source connectors              |
| Status topic         | status.storage.topic  | Stores the current state of connectors and tasks     |

Note: the offsets for sink connectors are stored in the __consumer_offsets topic, like any other consumer application.
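In a self-managed Kafka Connect deployment, these three topics are named in the worker configuration. A minimal distributed-mode fragment (the topic names are the conventional examples, not mandated values):

```properties
# Distributed-mode worker configuration: internal state topics.
# Each Connect cluster needs its own distinct set of these topics.
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status

# Internal topics should be replicated for durability.
config.storage.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
```

With fully managed connectors, these internal topics are provisioned and operated by Confluent.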
Connector offset management enables seamless migrations from one connector to another. In a topic partition holding offsets 1 through 12, producers write to the head of the log while consumer A reads from offset 4 and consumer B reads from offset 7; each consumer’s position is tracked independently of the writes.

Connect offset management enables:
- Seamless migrations from self-managed to fully managed connectors without data duplication
- Message replay starting from a specific offset during disaster recovery
- Skipping bad records that cause issues which can’t be addressed with existing error-handling features
Migrating connectors from self-managed to fully managed without any data loss or duplication:
1. Pause the self-managed connector
2. Get the last offset of the self-managed sink connector
3. Create the fully managed connector
4. Point the fully managed connector to the last offset of the self-managed connector
5. Delete the self-managed connector

*Note that for source connectors this will be system-dependent, as not all source systems store offsets.
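The handover in the steps above can be illustrated with a toy simulation: a partition is modeled as a list of records, and the new connector resumes from the offset after the old connector's last committed position, so nothing is re-processed and nothing is skipped. All names here are illustrative:

```python
# Illustrative simulation of offset-based handover between two connectors.
# Offsets are 1-based indexes into the partition's record list.
partition = [f"record-{i}" for i in range(1, 13)]  # offsets 1..12

def read_from(partition, offset):
    """Return all records starting at the given 1-based offset."""
    return partition[offset - 1:]

# The old (self-managed) connector processed through offset 7, then was paused.
last_committed = 7

# The new (fully managed) connector is pointed at the next offset:
resumed = read_from(partition, last_committed + 1)
print(resumed[0])    # record-8  (no duplication of records 1..7)
print(len(resumed))  # 5         (no records skipped)
```

The same arithmetic underlies disaster-recovery replay: pointing a connector at an earlier offset re-reads from that position onward.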