Break data silos with real-time connectivity using Confluent Cloud Connectors
ConfluentInc
Jul 04, 2024
About This Presentation
Connectors integrate Apache Kafka® with external data systems, enabling you to move away from a brittle spaghetti architecture to one that is more streamlined, secure, and future-proof. However, if your team still spends multiple dev cycles building and managing connectors using just open source Kafka Connect, it’s time to consider a faster and cost-effective alternative.
Agenda
01 Introduction to Kafka Connect
02 Why Confluent Cloud connectors
03 Secure networking with cloud connectors
04 Migrating from self-managed to fully managed connectors
Traditionally, systems and apps are tightly coupled, with a proliferation of point-to-point integrations across lines of business and the public cloud.

Common challenges include:
- Complex architecture prone to data silos, batch processing, and data inconsistency
- High system interdependencies leading to cascading failures and data loss
- Difficulties moving data at scale reliably and connecting across different environments
Connectors decouple data sources and sinks by integrating them with a data streaming platform.

Connectors and Kafka enable:
- Simple, future-proof architecture flexible enough to absorb new systems or changes
- Fault-tolerant design in the case of individual system downtime or failures
- Reliable data movement at massive scale, with schema and ordering management
Kafka Connect comes with many built-in advantages over writing your own producers and consumers.

| Requirements for streaming integrations  | Kafka Connect | Custom app (producer/consumer) |
| Pre-built integrations                   | Easy          | Hard                           |
| Scaling & parallel processing            | Easy          | Medium                         |
| High availability                        | Easy          | Medium                         |
| Configuration management                 | Easy          | Medium                         |
| Transformations                          | Easy          | Hard                           |
| Offset management / change data capture  | Easy          | Medium                         |
| Management via REST API, CLI             | Easy          | Hard                           |
| Restarts                                 | Easy          | Medium                         |
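As a sketch of the "management via REST API" row above: in a self-managed deployment, the standard Kafka Connect REST API creates a connector with a POST to `/connectors`. A minimal helper that builds such a request (the hostname, connector name, and connector config below are illustrative, not from this deck):

```python
import json

def build_create_request(name, config):
    """Build the (url, body) pair for Kafka Connect's create-connector call."""
    url = "http://localhost:8083/connectors"  # 8083 is Connect's default REST port
    body = json.dumps({"name": name, "config": config})
    return url, body

# Illustrative sink connector; in practice you would POST this with an HTTP client.
url, body = build_create_request(
    "jdbc-orders-sink",
    {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "orders",
        "tasks.max": "1",
    },
)
print(url)  # http://localhost:8083/connectors
```

With fully managed connectors, this request/response lifecycle (plus restarts and scaling) is handled for you.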
Don’t reinvent Kafka Connect if you don’t have to!
Using fully managed connectors is the fastest, most efficient way to break data silos.

Custom-built connector:
- Costly to allocate resources to design, build, test, and maintain non-differentiated data integration components
- Delays time-to-value, taking 3-6+ engineering months to develop
- Perpetual management and maintenance increases tech debt and risk of downtime

Self-managed pre-built connector:
- Pre-built, but requires manual installation and configuration effort to set up and deploy connectors
- Perpetual management and maintenance of connectors that leads to ongoing tech debt
- Risk of downtime and business disruption due to connector or Connect cluster issues

Fully managed connector:
- Streamlined configurations and on-demand provisioning of your connectors
- Eliminates operational overhead and management complexity with seamless scaling and load balancing
- Reduced risk of downtime with a 99.99% uptime SLA for all your mission-critical use cases
Connect your entire business with just a few clicks: 80+ fully managed connectors, including Amazon S3, Amazon Redshift, Amazon DynamoDB, Google Cloud Spanner, AWS Lambda, Amazon SQS, Amazon Kinesis, Azure Service Bus, Azure Event Hubs, Azure Synapse Analytics, Azure Blob Storage, Azure Functions, Azure Data Lake, and Google BigTable.
Fully Managed Connectors

Confluent Cloud’s portfolio of 80+ fully managed connectors enables you to boost developer productivity, eliminate operational burden, and accelerate time to value on your data-in-motion journey:
- Eliminate the operational burden of self-managing connectors and reduce total cost of ownership
- Operate your business in real time by modernizing your data systems
- Accelerate your entire pipeline development process with Stream Designer, SMTs, and data preview
Easily build real-time data pipelines to your data warehouse, database, and data lake.

From the customer’s cloud environment, source connectors (optionally applying SMTs) stream data into Kafka topics in Confluent Cloud, where Flink can process it, and sink connectors (optionally applying SMTs) deliver it onward.

Typical sources:
- Data stores (e.g. PostgreSQL, MongoDB Atlas, MySQL, Oracle DB)
- Application data (e.g. Salesforce, ServiceNow, GitHub, Zendesk)
- Log data and messaging systems (e.g. MQTT, Azure Service Bus, Azure Event Hubs, Solace)

Typical sinks:
- Data warehouses: Amazon Redshift, Snowflake, Google BigQuery, Azure Synapse Analytics
- Databases: MongoDB Atlas, Amazon DynamoDB, Azure Cosmos DB, Google BigTable
- Data lakes: Databricks Delta Lake, Amazon S3, Google Cloud Storage, Azure Blob Storage
Only Confluent offers 80+ expert-built, fully managed connectors across the entire stack.

The work involved falls into two groups:
- Connectors: connector configurations, development, testing, updates, and support
- Connect workers: Connect cluster scaling, worker configs, internal topics, schema registry, monitoring and security, load balancing, and Connect plugin installation

With open source Apache Kafka (Kafka Connect), you manage all of the above yourself. With other Kafka hosted services, most of this work still falls on you. With Confluent fully managed connectors, the provider manages everything except your connector configurations*, giving the highest ease of use.

*Streamlined configurations, with the ability for granular controls if needed.
Fully managed connectors are also rich with features that boost developer productivity:
- Data output preview & config validations: instantly validate connector configurations and preview their output to launch connectors seamlessly and successfully
- Connect log events: view connector events in the console for contextual information and error debugging
- AWS IAM AssumeRole support*: easily and securely manage access to AWS resources across an organization with temporary credentials
- Single Message Transforms (SMTs): perform lightweight data transformations, like masking and filtering, in flight within the source or sink connector

*Coming soon in 2024
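As a sketch of the in-flight masking that the SMT bullet describes, a connector configuration can include Kafka Connect's built-in MaskField transform. The transform alias and field names below are illustrative, not from this deck:

```json
{
  "transforms": "maskPII",
  "transforms.maskPII.type": "org.apache.kafka.connect.transforms.MaskField$Value",
  "transforms.maskPII.fields": "ssn,credit_card"
}
```

The named fields are replaced with null-equivalent values before the record leaves the connector, so sensitive data never lands in the downstream system.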
Custom connectors

Every organization has its own unique data architecture, which requires additional flexibility:
- Home-grown systems and custom applications need custom-built connectors to break data silos
- Pre-built connectors in the Kafka ecosystem need additional modifications to fit your specific context
- Managed connector options are lacking for the long tail of less popular data systems and apps

Bring your own connectors and let Confluent provision and manage the connector infrastructure. Break any data silo without needing to manage Kafka Connect infrastructure by bringing your own connector plugins to Confluent Cloud:
- Quickly connect to any data system using your own Kafka Connect plugins, without code changes
- Ensure high availability and performance, using logs and metrics to monitor the health of your connectors and workers
- Eliminate the operational burden of provisioning and perpetually managing low-level connector infrastructure
The recommended secure networking path depends on several factors:
- Source/sink data system type: first-party systems (e.g. AWS S3, Azure Cosmos DB, GCP GCS), third-party systems (e.g. Snowflake, MongoDB Atlas), or self-managed systems (e.g. Postgres, MySQL)
- Endpoint and environment type: public vs. private endpoints; on-prem vs. cloud
- Confluent Cloud cluster networking type: public internet, VPC/VNet peering, Transit Gateway, or AWS PrivateLink / Azure Private Link / GCP Private Service Connect
- Confluent Cloud cluster cloud provider: AWS, Azure, or GCP
- Your company’s security and networking policies
Overview of networking scenarios for fully managed connectors.

Kafka cluster networking types:
- Public endpoints
- Peering or TGWv1 [Dedicated]
- PrivateLink/PSC [Dedicated]
- PrivateLink/PSC [Enterprise]

External system networking options:
- Public endpoint
- Public endpoint with static egress IPs
- Private endpoint with direct connection
- Private endpoint with public DNS
- Private endpoint with private DNS

Each combination of cluster networking and external system networking is either supported, coming soon (FY24), or not applicable.
Public egress IPs to public endpoints

Static egress IPs provide an allow list of never-changing IP addresses used by fully managed connectors for outbound connections, restricting access to your data systems to authorized clients only. On a public cluster, source and sink connectors reach the data source and data sink over the public internet from a static IP range.

*By default, the cluster and the sink must be in the same cloud and region.
DNS terminology

What is DNS? DNS is the system that translates a URL/hostname to an IP address, e.g. www.google.com -> 142.251.40.206.

Public vs. private DNS: this refers to whether a DNS record can be resolved over the public internet or only within a private network (e.g. a VPC or on-prem).

Tip: run dig <hostname> from your laptop to check.
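Along the same lines as the dig tip, here is a small Python check that asks the machine's configured resolver whether a hostname resolves at all. This is a rough illustration of the public-vs-private distinction, assuming you run it from a network without access to the private zone; the helper name is made up for this sketch:

```python
import socket

def resolves_here(hostname: str) -> bool:
    """Return True if this machine's DNS resolver can resolve the hostname."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        # NXDOMAIN or no resolver reachable: the record is not visible from here
        return False

# A private-only record would return False from outside the private network.
print(resolves_here("localhost"))  # True
```

A hostname that resolves from inside your VPC but returns False from your laptop is backed by a private DNS zone.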
DNS forwarding to private endpoints

Available now on AWS and Azure; GCP support coming soon.

DNS forwarding enables fully managed connectors to resolve hostnames with private IP addresses by forwarding the DNS lookup requests to customer-hosted or private DNS zones. Source/sink connectors on a Dedicated Kafka cluster reach the data sources and sinks in the customer’s network over VPC peering or Transit Gateway.
PrivateLink terminology

What is PrivateLink? PrivateLink allows unidirectional connectivity between a service consumer and a service provider. Communication between them goes through the cloud service provider’s backbone.

Consumers vs. providers:
- Service providers: set up a PrivateLink service attached to a load balancer; the load balancer’s targets become what is available through the service
- Service consumers: connect to a PrivateLink service by creating an endpoint that clients connect to directly
Egress Access Points to private endpoints

Available on AWS now; Azure Private Link and GCP PSC support coming soon.

Egress Access Points enable private connectivity with your source/sink systems by connecting directly from a Confluent Egress Gateway, over a PrivateLink service, to your external data systems. Each destination (a CSP source/sink, a SaaS or third-party source/sink, or a data source/sink in the customer’s network) is reached through its own Egress Access Point.
A connector’s state is stored in the following internal Kafka topics:

| Internal Kafka topic | Property              | Description                                          |
| Configuration topic  | config.storage.topic  | Stores the configuration of all connectors and tasks |
| Offsets topic        | offset.storage.topic  | Stores the offsets of source connectors              |
| Status topic         | status.storage.topic  | Stores the current state of connectors and tasks     |

Note: the offsets for sink connectors are stored in the __consumer_offsets topic, like any other consumer application.
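In a self-managed Kafka Connect deployment, these three topics are named in the worker configuration. A minimal distributed-mode fragment (the topic names are the conventional examples, not mandated values):

```properties
# Distributed-mode worker configuration: internal state topics.
# Each Connect cluster needs its own distinct set of these topics.
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status

# Internal topics should be replicated for durability.
config.storage.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
```

With fully managed connectors, these internal topics are provisioned and operated by Confluent.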
Connector offset management enables seamless migrations from one connector to another. In a topic partition holding offsets 1 through 12, producers write to the head of the log while consumer A reads from offset 4 and consumer B reads from offset 7; each consumer’s position is tracked independently of the writes.

Connect offset management enables:
- Seamless migrations from self-managed to fully managed connectors without data duplication
- Message replay starting from a specific offset during disaster recovery
- Skipping bad records that cause issues which can’t be addressed with existing error-handling features
Migrating connectors from self-managed to fully managed without any data loss or duplication:
1. Pause the self-managed connector
2. Get the last offset of the self-managed sink connector
3. Create the fully managed connector
4. Point the fully managed connector to the last offset of the self-managed connector
5. Delete the self-managed connector

*Note that for source connectors this will be system-dependent, as not all source systems store offsets.
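The handover in the steps above can be illustrated with a toy simulation: a partition is modeled as a list of records, and the new connector resumes from the offset after the old connector's last committed position, so nothing is re-processed and nothing is skipped. All names here are illustrative:

```python
# Illustrative simulation of offset-based handover between two connectors.
# Offsets are 1-based indexes into the partition's record list.
partition = [f"record-{i}" for i in range(1, 13)]  # offsets 1..12

def read_from(partition, offset):
    """Return all records starting at the given 1-based offset."""
    return partition[offset - 1:]

# The old (self-managed) connector processed through offset 7, then was paused.
last_committed = 7

# The new (fully managed) connector is pointed at the next offset:
resumed = read_from(partition, last_committed + 1)
print(resumed[0])    # record-8  (no duplication of records 1..7)
print(len(resumed))  # 5         (no records skipped)
```

The same arithmetic underlies disaster-recovery replay: pointing a connector at an earlier offset re-reads from that position onward.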