Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
weimeilin1
19 slides
Jun 06, 2024
About This Presentation
Traditionally, building real-time data pipelines has involved significant overhead, even for straightforward tasks such as data transformation or masking. In this talk, we'll venture into WebAssembly (WASM) and discover how it can transform the way stateless streaming pipelines are built: by running them inside a Kafka-compatible (Redpanda) broker. These pipelines are well suited to low-latency, high-volume scenarios.
Size: 3.83 MB
Language: en
Slide Content
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Christina Lin
Online SaaS services • A control plane manages brokers deployed on AWS and GCP • Not all brokers are running at full capacity!
Data ping-pong between the broker and a consumer: a data pipeline that moves data back and forth over the network is a slow data pipeline.
Efficient • Safe • Great UX (diagram: </> C++ code, memory, threads)
Client streaming (diagram: many producers, each labeled P)
JavaScript, C/C++, Rust, and Go compile to WebAssembly (diagram: source code turned into a binary module).
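Whichever source language is chosen, the compiler's output is a binary module that starts with a fixed 8-byte preamble: the magic bytes `\0asm` followed by format version 1 as a little-endian 32-bit integer. The Python sketch below checks only that preamble (the function name `is_wasm_module` is hypothetical) and does not validate the rest of the module.

```python
# Sketch: recognize a WebAssembly binary by its 8-byte preamble.
WASM_MAGIC = b"\x00asm"                      # bytes 0x00 0x61 0x73 0x6D
WASM_VERSION_1 = (1).to_bytes(4, "little")   # b"\x01\x00\x00\x00"


def is_wasm_module(data: bytes) -> bool:
    """Return True if `data` starts with the WebAssembly magic and version 1.

    Only the preamble is checked; the sections that follow are not validated.
    """
    return data[:4] == WASM_MAGIC and data[4:8] == WASM_VERSION_1


# A module with no sections consists of just the preamble and is still valid.
empty_module = WASM_MAGIC + WASM_VERSION_1
print(is_wasm_module(empty_module))            # True
print(is_wasm_module(b"\x7fELF\x02\x01\x01"))  # False: a native ELF header
```

The same bytes come out of Go, Rust, or C/C++ toolchains, which is what lets a broker load modules without caring which language produced them.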
Browser sandbox: a WebAssembly module has its own memory, tables (pointers), values, and functions, and runs safely in isolation. WASI (WebAssembly System Interface): a portable, operating-system-style interface for the file system, network, environment variables, clock, and arguments. Modules have no right to access resources beyond the sandbox.
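The "no right to access resources beyond the sandbox" rule follows a capability model: the host grants specific resources up front (for WASI, for example, preopened directories and chosen environment variables), and the guest can reach nothing else. The sketch below is a toy Python model of that idea, not the WASI API; the `Sandbox` class, its method names, and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


class SandboxError(PermissionError):
    """Raised when guest code touches a resource the host did not grant."""


class Sandbox:
    """Toy capability model: only resources granted at creation are visible.

    Hypothetical illustration; real WASI runtimes enforce this at the
    host/guest boundary rather than in a Python class.
    """

    def __init__(self, preopened_dirs, env):
        self._dirs = [PurePosixPath(d) for d in preopened_dirs]
        self._env = dict(env)

    def check_path(self, path: str) -> PurePosixPath:
        p = PurePosixPath(path)
        # Reject ".." so a guest cannot climb out of a granted directory.
        if ".." in p.parts:
            raise SandboxError(f"path escapes sandbox: {path}")
        # The path must be a granted directory or live somewhere beneath one.
        if not any(p == d or d in p.parents for d in self._dirs):
            raise SandboxError(f"no capability for path: {path}")
        return p

    def getenv(self, name: str) -> str:
        if name not in self._env:
            raise SandboxError(f"environment variable not granted: {name}")
        return self._env[name]


sandbox = Sandbox(preopened_dirs=["/data"], env={"MODE": "mask"})
print(sandbox.check_path("/data/in.json"))  # /data/in.json
print(sandbox.getenv("MODE"))               # mask
for denied in ("/etc/passwd", "/data/../etc/passwd"):
    try:
        sandbox.check_path(denied)
    except SandboxError as err:
        print("denied:", err)
```

Denying by default and granting explicitly means a buggy or malicious transform cannot read broker files or secrets it was never handed.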
CPU: gas metering. Memory: restrict the memory used by pre-allocating it. Diagram: a broker with Core 1 and Core 2, each running its own wasm VM with dedicated VM memory alongside the broker's memory.
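The slide pairs two resource controls: gas metering bounds CPU use, and a fixed, pre-allocated memory arena bounds memory use. A toy Python model of both ideas follows. The class and exception names, the cost of one gas unit per step, and the bump allocator are assumptions for illustration, not how Redpanda actually meters its WebAssembly VMs.

```python
class OutOfGas(RuntimeError):
    """Raised when a guest exhausts its instruction budget."""


class OutOfMemory(MemoryError):
    """Raised when a guest asks for more than its pre-allocated memory."""


class MeteredVM:
    """Toy VM with a gas budget and a fixed memory arena (hypothetical)."""

    def __init__(self, gas_limit: int, memory_bytes: int):
        self.gas = gas_limit
        self.memory = bytearray(memory_bytes)  # allocated once, never grown
        self.heap_top = 0

    def charge(self, cost: int = 1) -> None:
        """Deduct gas for one step; stop the guest when the budget runs out."""
        if self.gas < cost:
            raise OutOfGas("instruction budget exhausted")
        self.gas -= cost

    def alloc(self, size: int) -> int:
        """Bump-allocate `size` bytes inside the arena and return the offset."""
        self.charge()
        if self.heap_top + size > len(self.memory):
            raise OutOfMemory(f"requested {size} bytes, arena is full")
        offset = self.heap_top
        self.heap_top += size
        return offset


def busy_loop(vm: MeteredVM, iterations: int) -> int:
    """Guest code: every loop iteration costs one unit of gas."""
    total = 0
    for i in range(iterations):
        vm.charge()
        total += i
    return total


vm = MeteredVM(gas_limit=1_000, memory_bytes=64)
print(busy_loop(vm, 10), vm.gas)   # 45 990
print(vm.alloc(32), vm.alloc(32))  # 0 32
try:
    busy_loop(MeteredVM(gas_limit=5, memory_bytes=0), 100)
except OutOfGas as err:
    print("stopped:", err)
```

Because the arena is sized up front, a runaway transform fails inside its own VM instead of competing with the broker for memory, which matches the per-core layout in the diagram.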
Workflow:
1. rpk cloud login
2. rpk transform init: choose my favorite language!
3. Define the transformation rules.
4. rpk transform build: builds the WebAssembly module.
5. rpk transform deploy --input-topic=customer --output-topic=customer_masked: deploys the transformation to the cluster.
Diagram: each cluster transforms customer into customer_masked; replicate across clusters.
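The transformation rules for the `customer` to `customer_masked` example boil down to a pure function from one input record to one output record. The Python sketch below shows that masking logic on JSON values; it is not the Redpanda transform SDK, and the field names `email` and `card_number` are hypothetical. In a real deployment, equivalent logic would live inside the per-record handler of the project scaffolded by `rpk transform init`, written in one of the SDK's supported languages.

```python
import json


def mask_value(value: str, keep_last: int = 4) -> str:
    """Replace all but the last `keep_last` characters with '*'."""
    if len(value) <= keep_last:
        return "*" * len(value)  # too short to reveal anything safely
    hidden = len(value) - keep_last
    return "*" * hidden + value[hidden:]


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the whole domain."""
    local, sep, domain = email.partition("@")
    if not sep:
        return "*" * len(email)  # not an email address: hide it entirely
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


def mask_customer(record_value: bytes) -> bytes:
    """Stateless transform: one input record in, one masked record out."""
    customer = json.loads(record_value)
    if "email" in customer:
        customer["email"] = mask_email(customer["email"])
    if "card_number" in customer:
        customer["card_number"] = mask_value(customer["card_number"])
    return json.dumps(customer).encode("utf-8")


raw = b'{"id": 7, "email": "alice@example.com", "card_number": "4111111111111111"}'
print(mask_customer(raw).decode())
# {"id": 7, "email": "a****@example.com", "card_number": "************1111"}
```

Since the function depends only on its input, the broker can run it on every partition independently, with no coordination between instances.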
Stateless streaming pipeline: transform (format change, masking, filtering, validating), dispatch and wiretap, split to multiple destinations, control reroute, normalize/denormalize, enrich, multiple ingestion.
Stateful streaming pipeline: complex event processing, time-window based processing, enrich, multiple ingestion.
Micro-batch pipeline: transform for large output (dataset), partitioning, split workload.
Analytics batch pipeline: analytics on large volumes (legacy), transform for large output (dataset, legacy), transport of large unstructured data.
Better scalability for pipelines.
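What makes the first category "stateless" is that each routing decision uses only the current record. The sketch below combines several of the listed patterns (validating, filtering, wiretap/split to multiple destinations, and rerouting invalid records) in one Python function. The topic names and the 1000 threshold are hypothetical, and the sketch shows the pattern, not a Redpanda API.

```python
from typing import Dict, List


def route(record: Dict) -> List[str]:
    """Pick destination topics for one record, using only that record.

    Topic names are hypothetical. No state is carried between calls, so any
    number of instances can process records in parallel.
    """
    # Validate: malformed records are rerouted to a dead-letter topic.
    if "customer_id" not in record or "amount" not in record:
        return ["customer_dlq"]
    # Filter: drop test traffic entirely.
    if record.get("is_test"):
        return []
    destinations = ["orders"]
    # Wiretap/split: also copy large orders to an audit topic.
    if record["amount"] >= 1000:
        destinations.append("orders_audit")
    return destinations


print(route({"customer_id": 1, "amount": 50}))                   # ['orders']
print(route({"customer_id": 2, "amount": 5000}))                 # ['orders', 'orders_audit']
print(route({"amount": 10}))                                     # ['customer_dlq']
print(route({"customer_id": 3, "amount": 1, "is_test": True}))   # []
```

By contrast, time-window processing in the stateful category must remember earlier records (for example, a running count per window), which is why it needs state storage and cannot be expressed as a per-record function like this one.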
Redpanda Data Transforms cover the stateless streaming pipeline: transform (format change, masking, filtering, validating), dispatch and wiretap, split to multiple destinations, control reroute, normalize/denormalize, enrich, multiple ingestion.
WASM (WebAssembly): a binary instruction format for a stack-based VM, enabling portable compilation from Go, Rust, JS, Python, and Ruby.
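"Stack-based VM" means instructions take their operands from an implicit value stack instead of naming registers. The tiny Python interpreter below handles three real WebAssembly opcodes (`i32.const`, `i32.add`, `i32.sub`, `i32.mul`) with 32-bit wrap-around. It is a teaching sketch: the tuple encoding of instructions is an assumption, and it reads i32 values as signed integers, which is one valid interpretation of WebAssembly's sign-agnostic i32.

```python
from typing import Callable, Dict, List, Tuple


def to_i32(x: int) -> int:
    """Wrap an integer to 32 bits and read it as a signed value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


BINARY_OPS: Dict[str, Callable[[int, int], int]] = {
    "i32.add": lambda a, b: a + b,
    "i32.sub": lambda a, b: a - b,
    "i32.mul": lambda a, b: a * b,
}


def run(program: List[Tuple]) -> int:
    """Execute straight-line stack code and return the value left on top."""
    stack: List[int] = []
    for op, *args in program:
        if op == "i32.const":
            stack.append(to_i32(args[0]))          # push an immediate
        elif op in BINARY_OPS:
            rhs = stack.pop()                      # top of stack is the 2nd operand
            lhs = stack.pop()
            stack.append(to_i32(BINARY_OPS[op](lhs, rhs)))
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack.pop()


# (2 + 3) * 4 expressed as stack code.
program = [("i32.const", 2), ("i32.const", 3), ("i32.add",),
           ("i32.const", 4), ("i32.mul",)]
print(run(program))  # 20
print(run([("i32.const", 2147483647), ("i32.const", 1), ("i32.add",)]))  # -2147483648
```

A compact, register-free encoding like this is part of what makes the format easy to validate and compile on any host, whichever source language produced it.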
Demo
Redpanda University: free, self-paced online learning at https://university.redpanda.com. Learn the fundamentals of data streaming and Redpanda, install Redpanda and use the rpk CLI to configure it, and create producers and consumers in Java, Python, and Node.js. Sign up today for free!