Confluent:AWS - GameDay.pptx

About This Presentation

AWS & Confluent GameDay


Slide Content

AWS & Confluent GameDay. Ahmed Zamzam, Senior Partner Solution Engineer

Agenda
10:30 - 10:35  Welcome & Introductions
10:35 - 10:50  Data Analytics on AWS
10:50 - 11:30  Unlock Value with Confluent on AWS
11:30 - 12:15  Lunch & Networking
12:15 - 14:30  GameDay Workshop
14:30 - 15:00  Wrap Up

AWS & Confluent Team
Devadyuti Das, Sr Partner SA – AWS
Bans Sago, Sr SE – Confluent
Ahmed Zamzam, Sr Partner SA – Confluent

Logistics
Team Hash Code
WiFi Access: SSID: Guest / Pswd: BrokenWires@@2019
Dietary Requirements for Lunch

Feedback - QR Code
Survey link: https://amazonmr.au1.qualtrics.com/jfe/form/SV_2ca9vpakNF46Iui
Alternatively, use the QR code.

AWS for Real-time Analytics and Data Streaming. Devadyuti Das, Partner Sales Solutions Architect, Amazon Web Services (AWS). #AWS #Confluent #GameDay

Confluent integrations with AWS
AWS Native Services: out-of-box integration with popular services; unified operations and billing experience; certified and validated by AWS.
3rd-Party ISV Services: Top-5 global ISV for S3 data volume.
Unified billing experience: transact directly through AWS Marketplace with flexible billing schedules to remove friction and burn down existing AWS commits.
Unified security & management: enable identical and secure network setups with other AWS workloads via AWS PrivateLink; extend existing CloudWatch monitoring to Confluent using our fully managed connector.
Validated service designations: Amazon RDS Ready, AWS Lambda Ready, Amazon Redshift Ready, AWS PrivateLink Ready, AWS Outposts Ready.

AWS analytics pillars
Scalable data lakes
Purpose-built for performance and cost
Serverless and easy to use
Unified data access, security, and governance
Built-in machine learning

Modern data strategy on AWS: start anywhere (innovate, modernize, unify). Services include Amazon OpenSearch Service, Amazon Aurora, Amazon EMR, Amazon SageMaker, Amazon DynamoDB, Amazon Redshift, and Amazon S3.

The benefits of data lakes
Catalog
Store all your data in open formats
Decouple storage from compute
Cost-effectively scale storage to exabytes
Process data in place
Choice of analytical and ML engines

Amazon Redshift: break through data silos and analyze data across AWS services
Federated query: query live data in operational databases and maintain materialized views. No ETL, no data movement.
Amazon Redshift Spectrum: query the Amazon S3 data lake (keep up to exabytes of data in Amazon S3), analyze open standards-based data formats, and export data back to the lake.
Data sharing: secure and consistent data sharing with no data duplication.
Data marketplaces: 1,300+ third-party datasets from 350+ data providers.
Amazon Redshift ML: billions of predictions right within your data warehouse, using ML in SQL.
BI and analytics apps: connect apps to analyze and visualize your data, using your favorite BI tool.
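As a hedged sketch of the Spectrum pattern above (the schema, table, role ARN, and column names are illustrative placeholders, not from the deck), querying open-format data in S3 alongside a local warehouse table might look like:

```sql
-- Expose a Glue Data Catalog database over S3 as an external schema,
-- then join the external (S3) table with a local Redshift table.
CREATE EXTERNAL SCHEMA lake
FROM DATA CATALOG
DATABASE 'lakedb'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole';

SELECT c.customer_id,
       SUM(s.amount) AS lifetime_spend
FROM lake.sales AS s          -- external table, data stays in S3
JOIN customers AS c           -- local Redshift table
  ON c.customer_id = s.customer_id
GROUP BY c.customer_id;
```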

Streaming analytics on AWS
Source: mobile devices, metering, click streams, IoT sensors
Stream ingestion: AWS SDKs, Kinesis Agent/KPL, Apache Kafka, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose
In-stream storage: Apache Kafka, Amazon Kinesis Data Streams
Stream processing: Amazon Kinesis Data Analytics, AWS Glue Streaming, Amazon EMR
Destination/Storage: Amazon DynamoDB, Amazon EMR, Amazon S3, Amazon Redshift, Amazon OpenSearch
Visualize: Amazon QuickSight

Confluent Cloud on AWS – reference architecture

Unlock value of your data in real-time with Confluent and AWS. Ahmed Zamzam, Senior Partner Solutions Architect

Agenda
Data streaming with Confluent: rearchitected Kafka, together with the features you need to rapidly deploy production use cases
ksqlDB 101: introduction to ksqlDB
Real-time analytics with Confluent and AWS: building event-streaming applications and real-time analytics pipelines using Confluent and AWS, using Lambda and ksqlDB together
Serverless streaming ETL demo
Q/A

Confluent makes real-time data streams a top priority: “We need to shift our thinking from everything at rest to everything in motion.” Real-time event streams (a sale, a shipment, a trade, a customer experience) power rich frontend customer experiences and data-driven operations.

Why real-time data? Data loses value quickly over time: as data ages from real-time through seconds, minutes, hours, days, and months, its value to decision-making falls from preventive/predictive to actionable, reactive, and finally historical, that is, from time-critical decisions to traditional “batch” business intelligence. This is the information half-life in decision-making. (Source: Perishable Insights, Mike Gualtieri, Forrester)

Typical real-time data pipeline
Source: data continuously generated at high velocity from different sources like IoT devices, application logs, online transactions, etc.
Event streaming: data captured and stored in the order it was received for a set duration of time; it can be replayed indefinitely.
Stream processing: process, analyse, and act on the data as soon as it is generated, in the order it was received.
Presentation: sink data to different destinations, most commonly data lakes and/or databases.

The Confluent Product Advantage
Cloud-Native: re-imagined Kafka experience for the cloud
Complete: enable developers to reliably & securely build next-gen apps faster
Everywhere: be everywhere our customers want to be

10x Kafka: Confluent Cloud offers a truly fully managed, cloud-native data streaming platform for Apache Kafka, with 10x faster scaling, infinitely more storage, and built-in resilience. 10x faster and easier.
Resiliency: leave Kafka reliability worries behind with a 99.99% uptime SLA and 10x built-in durability
Storage: never worry about Kafka storage limits again with Infinite Storage that's 10x more scalable and performant
Elasticity: scale and shrink to handle 0 to GBps+ workloads and peak customer demands

Available on Confluent: Everywhere
Confluent Platform: the enterprise distribution of Apache Kafka; self-managed software; deploy on any platform (VM), on premises, or cloud
Confluent Cloud: Apache Kafka reengineered for the cloud; fully managed service

Everywhere: Cluster Linking, a global central nervous system. Federated streaming, hybrid and multi-cloud: data syndication and replication across and between clouds and on-premises, with self-service APIs, data governance, and visual tooling. Reliable, real-time data streams between all customer sites, so you can run always-on streaming analytics on the data of the entire enterprise, despite regional or cloud provider outages.

Operationalizing Kafka on your own is difficult. Kafka is hard in experimentation, and it only gets harder (and riskier) as you add mission-critical data and use cases.
“We are in the business of selling and renting clothes. We are not in the business of managing an event streaming platform… If we had to manage everything ourselves, I would’ve had to hire at least 10 more people to keep the systems up and running.”
Adoption stages (investment & time vs. value): 1) experimentation/early interest; 2) identify a project; 3) mission critical, disparate LOBs; 4) mission-critical, connected LOBs; 5) central nervous system.
Self-managing Kafka means owning: architecture planning; cluster sizing; cluster provisioning; broker settings; ZooKeeper management; partition placement and data durability; source/sink connector development and maintenance; monitoring and reporting tools setup; software patches and upgrades; security controls and integrations; failover design and planning; mirroring and geo-replication; streaming data governance; load rebalancing and monitoring; expansion planning & execution; utilization optimization and visibility; cluster migrations; infrastructure & performance upgrades and enhancements.
Key challenges:
Operational burden and resources: manage and scale the platform to support ever-growing demand
Security and governance: ensure streaming data is as safe and secure as data-at-rest as Kafka usage scales
Real-time connectivity and processing: leverage valuable legacy data to power modern, cloud-based applications and experiences
Global availability: maintain high availability across environments with minimal downtime

Discover, understand, and trust your data streams: Where did data come from? Where is it going? Where, when, and how was it transformed? What's the common taxonomy? What is the current state of the stream?
Stream Catalog: increase collaboration and productivity with self-service data discovery
Stream Lineage: understand complex data relationships and uncover more insights
Stream Quality: deliver trusted, high-quality event streams to the business
“Confluent's Stream Governance suite will play a major role in our expanded use of data in motion and creation of a central nervous system for the enterprise. With the self-service capabilities in stream catalog and stream lineage, we'll be able to greatly simplify and accelerate the onboarding of new teams working with our most valuable data.”

Instantly connect popular data sources & sinks: 120+ prebuilt connectors (100+ Confluent supported; 20+ partner supported, Confluent verified).
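In Confluent Cloud, connectors can also be created from ksqlDB with SQL. A minimal, hedged sketch of an S3 sink connector follows; the topic, bucket, and credential values are placeholders, and the exact property names should be checked against the connector's documentation:

```sql
-- Sketch: fully managed S3 sink connector declared from ksqlDB.
-- All values below are illustrative placeholders.
CREATE SINK CONNECTOR s3_orders_sink WITH (
  'connector.class'       = 'S3_SINK',
  'topics'                = 'orders',
  'aws.access.key.id'     = '<AWS_ACCESS_KEY_ID>',
  'aws.secret.access.key' = '<AWS_SECRET_ACCESS_KEY>',
  's3.bucket.name'        = 'my-orders-bucket',
  'input.data.format'     = 'JSON',
  'output.data.format'    = 'JSON',
  'time.interval'         = 'HOURLY',
  'tasks.max'             = '1'
);
```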

Better together: Confluent and AWS

Confluent has deep AWS service integrations

Together, Confluent and AWS empower endless use cases across many industries
Retail: inventory management, personalized promotions, product development & introduction, sentiment analysis streaming, enterprise messaging, systems of scale for high-traffic periods
Healthcare: connected health records, data confidentiality & accessibility, dynamic staff allocation optimization, integrated treatment, proactive patient care, real-time monitoring
Finance & Banking: early-on fraud detection, capital management, market risk recognition & investigation, preventive regulatory scanning, real-time what-if analysis, trade flow monitoring
Transportation: advanced navigation, environmental factor processing, fleet management, predictive maintenance, threat detection & real-time response, traffic distribution optimization
Common in all industries: data pipelines, hybrid cloud integration, microservices, security and fraud, Customer 360, streaming ETL

Stream Processing on AWS and Confluent

Two types of stream processing: stateless and stateful.

ksqlDB at a glance
What is it? ksqlDB is an event-streaming database for working with streams and tables of data. It has all the key features of a modern streaming solution: aggregations, joins, windowing, event-time, dual query support, exactly-once semantics, out-of-order handling, and user-defined functions.

CREATE TABLE activePromotions AS
  SELECT rideId,
         qualifyPromotion(distanceToDst) AS promotion
  FROM locations
  GROUP BY rideId
  EMIT CHANGES;

How does it work? It separates compute from storage, and scales elastically in a fault-tolerant manner. It remains highly available during disruption, even in the face of failure of a quorum of its servers.

Streams and Tables
A stream is an append-only series of events: topic + schema. Given a Kafka topic of movement events such as:
{"event_ts": "2020-02-17T15:22:00Z", "person": "robin", "location": "Leeds"}
{"event_ts": "2020-02-17T17:23:00Z", "person": "robin", "location": "London"}
{"event_ts": "2020-02-17T22:23:00Z", "person": "robin", "location": "Wakefield"}
{"event_ts": "2020-02-18T09:00:00Z", "person": "robin", "location": "Leeds"}
a ksqlDB stream contains every event in order (Leeds, London, Wakefield, Leeds), while a ksqlDB table holds the current state for each key: robin's latest location at each point in time (Leeds, then London, then Wakefield, then Leeds). A table is likewise topic + schema.
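As a minimal sketch of the above (assuming the topic is named movements and the JSON fields shown), the stream and a latest-state table could be declared like this:

```sql
-- Register the Kafka topic as an append-only stream.
CREATE STREAM movements (
    event_ts VARCHAR,
    person   VARCHAR,
    location VARCHAR
  ) WITH (
    KAFKA_TOPIC  = 'movements',
    VALUE_FORMAT = 'JSON'
  );

-- Derive a table holding the current location per person.
CREATE TABLE person_location AS
  SELECT person,
         LATEST_BY_OFFSET(location) AS location
  FROM movements
  GROUP BY person
  EMIT CHANGES;
```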

Pull and Push queries in ksqlDB
Push query: tells you all value changes; never exits. Push queries are applicable to all stream and table objects, and are indicated with an EMIT CHANGES clause.
Pull query: tells you a point-in-time value; exits immediately. Pull queries currently have some limitations: they are only available against tables, they require the object to be materialised (i.e. a TABLE built using an aggregate), and they are currently only available for single-key lookup with an optional window range.
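Using the hypothetical person_location table from the previous sketch, the two query types look roughly like this:

```sql
-- Push query: streams every change and never exits (note EMIT CHANGES).
SELECT person, location
FROM person_location
EMIT CHANGES;

-- Pull query: point-in-time, single-key lookup against the materialised
-- table; returns the current value and exits immediately.
SELECT location
FROM person_location
WHERE person = 'robin';
```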

Stateful aggregations in ksqlDB
Over the same movements topic:

SELECT PERSON, COUNT(*) AS LOCATION_CHANGES FROM MOVEMENTS GROUP BY PERSON;
SELECT PERSON, COUNT_DISTINCT(LOCATION) AS UNIQUE_LOCATIONS FROM MOVEMENTS GROUP BY PERSON;

As the four events arrive, LOCATION_CHANGES counts every event (1, 2, 3, 4), while UNIQUE_LOCATIONS counts distinct values (1, 2, 3, then still 3, since the fourth event revisits Leeds). Aggregations can be across the entire input, or windowed (TUMBLING, HOPPING, SESSION).
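A hedged sketch of the windowed variant, scoping the same count to one-hour tumbling windows over the movements stream:

```sql
-- Location changes per person per hour (non-overlapping tumbling windows);
-- HOPPING and SESSION windows follow the same WINDOW clause pattern.
SELECT person,
       COUNT(*) AS location_changes
FROM movements
WINDOW TUMBLING (SIZE 1 HOUR)
GROUP BY person
EMIT CHANGES;
```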

Stateful aggregations in ksqlDB

CREATE TABLE PERSON_MOVEMENTS AS
  SELECT PERSON,
         COUNT_DISTINCT(LOCATION) AS UNIQUE_LOCATIONS,
         COUNT(*) AS LOCATION_CHANGES
  FROM MOVEMENTS
  GROUP BY PERSON;

The resulting PERSON_MOVEMENTS table is kept in an internal ksqlDB state store, continuously updated from the Kafka topic.

Filters

CREATE STREAM high_readings AS
  SELECT sensor, reading
  FROM readings
  WHERE reading > 41
  EMIT CHANGES;

The above "Create Stream As Select" (CSAS) statement is a persistent query that runs indefinitely and produces the resulting stream, backed by a new topic "high_readings".

Joins

CREATE STREAM enriched_readings AS
  SELECT reading, area, brand_name
  FROM readings
  INNER JOIN brands b ON b.sensor = readings.sensor
  EMIT CHANGES;

Aggregate

CREATE TABLE avg_readings AS
  SELECT sensor,
         AVG(reading) AS avg_reading
  FROM readings
  GROUP BY sensor
  EMIT CHANGES;

The above "Create Table As Select" (CTAS) statement is a persistent query that runs indefinitely and produces the resulting table, backed by a new topic "avg_readings".

3 modalities of stream processing with Confluent, from maximum flexibility to maximum simplicity:

Kafka clients:
ConsumerRecords<String, Integer> records = consumer.poll(100);
Map<String, Integer> counts = new HashMap<>();
for (ConsumerRecord<String, Integer> record : records) {
    String key = record.key();
    int c = counts.getOrDefault(key, 0);
    c += record.value();
    counts.put(key, c);
}
for (Map.Entry<String, Integer> entry : counts.entrySet()) {
    int stateCount;
    int attempts = 0;
    while (attempts++ < MAX_RETRIES) {
        try {
            stateCount = stateStore.getValue(entry.getKey());
            stateStore.setValue(entry.getKey(), entry.getValue() + stateCount);
            break;
        } catch (StateStoreException e) {
            RetryUtils.backoff(attempts);
        }
    }
}

Kafka Streams:
builder.stream("input-stream", Consumed.with(Serdes.String(), Serdes.String()))
       .groupBy((key, value) -> value)
       .count()
       .toStream()
       .to("counts", Produced.with(Serdes.String(), Serdes.Long()));

ksqlDB:
SELECT x, COUNT(*)
FROM stream
GROUP BY x
EMIT CHANGES;

2. Stateless stream processing with AWS Lambda
There are two integration paths: the Confluent Lambda sink connector, and the Lambda service's event source mapping.
Lambda sink connector:
The sink connector polls Kafka partitions and invokes your function
Lambda can be invoked synchronously or asynchronously
At-least-once semantics
Provides a dead letter queue (DLQ) for any failed invocations
The sink connector scales up to a soft maximum of 10 connectors
Event source mapping:
The Lambda service polls the Kafka partitions and invokes your Lambda function synchronously
Starts with one concurrent poller; the Lambda service checks every 3 minutes whether scaling is needed, and scales up to at most one poller per partition
Batches records based on a BatchSize or BatchWindow

Enrich transaction events for fraud scoring: a raw transaction event (customer Jay, transaction $10) flows into ksqlDB.

ksqlDB enriches the transaction with stateful context: customer Jay, transaction $10, 7-day average $8.5, number of transactions (10m) 1.

The enriched transaction (customer Jay, transaction $10, 7-day average $8.5, number of transactions (10m) 1) then flows from ksqlDB through AWS Lambda to Amazon SageMaker for fraud scoring.
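A hedged sketch of this enrichment in ksqlDB (stream, table, and column names are illustrative; a running average stands in for the deck's 7-day average, since joining a stream to a windowed table is more involved):

```sql
-- Maintain per-customer running statistics from the transaction stream.
CREATE TABLE customer_stats AS
  SELECT customer,
         AVG(amount) AS avg_amount,
         COUNT(*)    AS num_trans
  FROM transactions
  GROUP BY customer
  EMIT CHANGES;

-- Enrich each incoming transaction with that customer's statistics;
-- the enriched stream can then feed Lambda and SageMaker for scoring.
CREATE STREAM enriched_transactions AS
  SELECT t.customer AS customer,
         t.amount,
         c.avg_amount,
         c.num_trans
  FROM transactions t
  LEFT JOIN customer_stats c ON t.customer = c.customer
  EMIT CHANGES;
```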

3. Kinesis Data Analytics: integrating Kinesis Data Analytics with Confluent unlocks many use cases with AWS AI/ML services.

When to use which?

                  ksqlDB                 Kafka Streams          Kinesis Data Analytics   Lambda
Fully managed     ✅                     —                      ✅                       ✅
Type              Stateful & stateless   Stateful & stateless   Stateful & stateless     Stateless
Fault tolerance   Exactly once           Exactly once           Exactly once             At-least once
UDF support       ✅ (self-managed)      ✅ (self-managed)      ✅                       ✅
Latency           Fast                   Very fast              Very fast                Fast

Demo: Streaming ETL (Data cleaning) with ksqlDB and Lambda

Unicorn.Rentals New Hire Orientation. CTO, Unicorn.Rentals

LARM: LEGENDARY ANIMAL RENTAL MARKET (fantasy vs. reality)

Key success drivers (independent analysis by BoozKinsey GartForest, ltd.)
Leadership (60%): There is nothing more important than the hands at the wheel of the ship.
Partners/3rd-Party Products (33%): If you don't know how to do it, let someone else do it.
Luck (13%): Could have happened to anyone. Just a roll of the dice.
Software (6%): The actual software that was developed. Not that important.

Current architecture: an on-prem RDBMS holding millions of records (unicorns, customers) serves questions from Marketing, Finance, and Legal, such as “Who’s spending less than $20?”, “How many rentals by a customer?”, and “How old are our customers?”

Current problems: the same questions (“Who’s spending less than $20?”, “How many rentals by a customer?”, “How old are our customers?”) against the on-prem RDBMS now run into time-out errors and wrong-format data.

New architecture: the on-prem RDBMS (millions of records: unicorns, customers) feeds a data streaming and transformation layer into the AWS Cloud, where Marketing, Finance, and Legal consume the data.

Make History

Unicorn.Rentals Technical Overview. Chief PowerPoint Officer, Unicorn.Rentals

Access the Team Dashboard

Team Dashboard Overview

Set Team Name

Team AWS Account

SSH Access

Working in teams

Getting help

Support reminder: raising your hand in Chime calls is not the same as requesting help in the Remote Assistance module.

Navigating the Quest UI

Access

Logging In

Overview