Bangalore Meetup - Enable realtime machine learning with streaming data

weimeilin1 50 views 28 slides May 13, 2024

Slide Content

Fresh Predictions: Using Real-Time Data for Machine Learning
With Redpanda Data Transforms
Christina Lin, The Redpanda Lady

Christina Lin
Developer Advocate, Redpanda
aka. The Redpanda Lady
© 2024 REDPANDA DATA
SOA, WebSphere, DB2, Sybase, Oracle, MQ, J2EE, EJB, DevOps, Microservice, EIP, K8s, Agile, Integration, Data Mesh, ActiveMQ
Living data stack
Resilience - handle failures and scale gracefully
Elasticity - infrastructure that can scale dynamically
Decentralization - data ownership, empowering individual teams
Performance - low latency and high throughput
Autonomy - self-service, define quality and access
Nimble - efficient data movement
Distributed - distributed data processing for cloud native
Agility - quickly respond to changes in data

Agenda
• Streamlined data ingestion and transformation
• Real-time machine learning
• Demo

LLM, RAG, GenAI, Prompt Engineering, Natural Language Generation, Natural Language Processing, Deep Learning, Vector/Semantic search, Neural Network

How do you build an application with AI?
[Diagram labels: Application, LLM (x3)]

How do you build an application with AI?
Question: "When is the next eclipse, and where is the best place to see it?"
Answer: "April 8, 2024 are in Exmouth, Australia and East Timor"

How do you build an application with AI?
[Diagram labels: Application, LLM (x3)]
• Performance problems
• Incorrect, unpredictable results
• Text-based, hard to customize with a small set of data
• $$$$$$$

Streaming Architecture for AI
[Diagram labels: Events, Data Layer, Dataset, Reference data, Machine Learning (Model Training, Model Testing, Model Prediction), Inference, Model Registry, Model, APP]

Better AI implementation
[Diagram labels: LLM, Customized Model, Retrieval Augmented Generation, Fine-tuned, Customized domain-trained models]

RAG & Stream & EDA
[Diagram labels: Broker, APP, LLM, Vector DB, Model Service, Model, Aggregate]
RAG & Stream & EDA
[Diagram labels: Broker (x2), NPC1 + LLM, NPC2 + LLM, NPC3 + LLM, WebSocket, Topic (x3)]

Redpanda in 3 mins
[Diagram labels: Broker, ZooKeeper/KRaft, JVM, Page Cache (x3), Schema Registry, HTTP Proxy, Client, Connector, Debezium, Disk]

Redpanda in 3 mins
[Diagram labels: Broker, ZooKeeper/KRaft, JVM, Page Cache (x3), Schema Registry, HTTP Proxy, Client, Connector, Debezium, Disk, WASM]

Better scalability for pipelines
Stateless streaming pipeline
• Transform: format change, masking, filtering, validating
• Dispatch, wiretap
• Split, multiple destinations
• Control reroute
• Normalize/denormalize
• Enrich
• Multiple ingestion
Stateful streaming pipeline
• Complex event processing
• Time-window based processing
• Enrich
• Multiple ingestion
Micro-batch pipeline
• Transform for large output (dataset)
• Partitioning, split workload
Analytics batch pipeline
• Analytics on large volumes (legacy)
• Transform large output (dataset, legacy)
• Transport large unstructured data
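
To make the stateless/stateful distinction concrete, here is a minimal sketch of time-window based processing in plain Python. It assumes events arrive as `(timestamp_seconds, key)` tuples and counts them per fixed, non-overlapping (tumbling) window. The function name, the event shape, and the 60-second default are assumptions for illustration only.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Count events per (window_start, key) for fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, key) tuples.
    """
    counts = defaultdict(int)  # state carried across events
    for ts, key in events:
        # Align the timestamp to the start of its window.
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)

if __name__ == "__main__":
    events = [(0, "order"), (30, "order"), (61, "order"), (65, "cancel")]
    print(tumbling_window_counts(events))
```

The `counts` dictionary is the state: each result depends on earlier events, which is why this kind of work belongs in a stateful pipeline, while a stateless one handles each record on its own.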

Data Ping-Pong
[Diagram labels: Broker, Data Pipeline (x3), Over the Network - Slow]

Redpanda Data Transform
Stateless streaming pipeline
• Transform: format change, masking, filtering, validating
• Dispatch, wiretap
• Split, multiple destinations
• Control reroute
• Normalize/denormalize
• Enrich
• Multiple ingestion
WASM (WebAssembly)
• Binary instruction format for a stack-based VM
• Portable compilation
• Languages: Go, Rust, JS, Python, Ruby
Redpanda Data Transforms
• rpk cloud login
• rpk transform init (choose my fav language!)
• Define transformation rules
• rpk transform build (builds the WebAssembly module)
• rpk transform deploy --input-topic=customer --output-topic=customer_masked (deploy transformation to cluster)
[Diagram labels: customer, customer_masked (x3 pairs), Replicate across clusters]

Redpanda Data Transform
[Diagram labels: cloud login; customer, customer_masked (x3 pairs); Replicate across clusters]
• customer, partition 1: load to cache
• Transform: "Customer age: 34" becomes "Customer age: 3*"
• customer_masked, partition 1: write back to disk with DMA
• Thread per core (quick to process data)
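
The masking step shown on this slide ("Customer age: 34" to "Customer age: 3*") could look roughly like the sketch below. This is plain Python showing only the per-record logic. It does not use the Redpanda Data Transforms SDK, and a real transform would be written against that SDK and compiled to WebAssembly. The JSON record shape with an `age` field and the function names are assumptions.

```python
import json

def mask_age(age):
    """Keep the first digit of the age and replace the rest with '*'."""
    digits = str(age)
    return digits[:1] + "*" * (len(digits) - 1)

def transform(record_value: bytes) -> bytes:
    """Stateless, per-record transform: JSON bytes in, masked JSON bytes out."""
    customer = json.loads(record_value)
    if "age" in customer:
        customer["age"] = mask_age(customer["age"])
    return json.dumps(customer).encode("utf-8")

if __name__ == "__main__":
    # A hypothetical record from the `customer` topic.
    print(transform(b'{"name": "A. Customer", "age": 34}'))
```

Because each record is handled independently, with no state shared between calls, this kind of logic fits the stateless category that the slides assign to in-broker transforms.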

Demo - Real-Time Data for Machine Learning
Machine learning lifecycle: Problem, Data ETL, Feature Engineering, Model Training, Deploy/Experiment, Prediction, Monitor
[Diagram labels: Application, MLOps]
• Real-time food delivery result: raw data
• In-broker processing: process data on the fly in the broker to avoid data ping-pong
• Process cleaned features and parameter data set
• Continuous real-time data training for ML
• Dynamic model updating
• Real-time inference
bit.ly/redpanda-india
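
The idea of continuous training with dynamic model updates can be illustrated with a small sketch. The demo itself uses TensorFlow. The code below swaps in a hand-written linear model trained by stochastic gradient descent so it runs with the standard library alone. The class name, the `(distance_km, delivery_minutes)` event shape, and the synthetic data are all assumptions, not the demo's actual code or data.

```python
class OnlineLinearModel:
    """Linear model y = w * x + b, updated one event at a time with SGD."""

    def __init__(self, learning_rate=0.01):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        """Take one gradient step on the squared error of a single event."""
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error
        return error

model = OnlineLinearModel(learning_rate=0.01)

# Hypothetical stream of (distance_km, delivery_minutes) events that
# follow delivery_minutes = 5 * distance_km + 10, repeated to mimic a feed.
stream = [(x, 5.0 * x + 10.0) for x in (1.0, 2.0, 3.0, 4.0)] * 2000

for distance_km, minutes in stream:
    model.update(distance_km, minutes)  # the model changes with every event

if __name__ == "__main__":
    print(round(model.predict(2.5), 1))
```

Each incoming event nudges the weights, so the model used for inference is always the most recently updated one, with no separate batch retraining job.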

[Diagram labels: Redpanda Cluster (redpanda-0, redpanda-1, redpanda-2, redpanda-console), Jupyter Notebook, TensorFlow, Simulator (producer.py)]

[Diagram labels: Redpanda Cluster (redpanda-0, redpanda-1, redpanda-2, each marked "LL"), redpanda-console, Simulator (producer.py)]

[Diagram labels: Redpanda Cluster (redpanda-0, redpanda-1, redpanda-2, redpanda-console), Jupyter Notebook, TensorFlow, Simulator (producer.py), Redpanda Transforms (build, deploy)]

[Diagram labels: Redpanda Cluster (redpanda-0, redpanda-1, redpanda-2, redpanda-console), Jupyter Notebook, TensorFlow, Simulator (producer.py), ML model training (consumer.py, model), Real-time inference (app.py, model)]
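
A simulator in the spirit of producer.py might look like the following sketch. It is not the demo's actual script: the event fields (`order_id`, `distance_km`, `delivery_minutes`) and the function names are hypothetical, and `send` is an injected callable so the example runs without a broker. In a real setup, `send` would presumably wrap a Kafka-compatible producer client pointed at the Redpanda cluster.

```python
import json
import random

def make_event(rng, order_id):
    """Build one simulated food-delivery event (field names are hypothetical)."""
    distance_km = round(rng.uniform(0.5, 10.0), 2)
    return {
        "order_id": order_id,
        "distance_km": distance_km,
        # Synthetic target: base time plus time per km plus random noise.
        "delivery_minutes": round(10 + 5 * distance_km + rng.gauss(0, 2), 1),
    }

def run_simulator(send, count, seed=42):
    """Serialize `count` events to JSON bytes and hand each one to `send`."""
    rng = random.Random(seed)
    for order_id in range(count):
        send(json.dumps(make_event(rng, order_id)).encode("utf-8"))

# Stand-in for a producer client: collect the messages in a list.
sent = []
run_simulator(sent.append, count=3)

if __name__ == "__main__":
    for message in sent:
        print(json.loads(message))
```

Keeping the transport behind `send` means the same generator can feed a test list, as here, or a live topic that consumer.py and app.py read from.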

Redpanda University
Free, self-paced online learning
https://university.redpanda.com
• Learn the fundamentals of data streaming and Redpanda
• Install Redpanda and use the rpk CLI to configure it
• Create producers and consumers in Java, Python and NodeJS
• Sign up today for free!