Harnessing WebAssembly for Real-time Stateless Streaming Pipelines

weimeilin1 183 views 19 slides Jun 06, 2024
Slide 1
Slide 1 of 19
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19

About This Presentation

Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of statel...


Slide Content

Harnessing WebAssembly for Real-time Stateless Streaming Pipelines Christina Lin

Online SaaS Services • Not all brokers are running at it’s full capacity AWS, GCP Control Plane AWS, GCP AWS, GCP AWS, GCP AWS, GCP A A A A Not all brokers are at its full capacity!

Consumer Broker Data Ping-Pong Data Pipeline Over the Network - Slow Data Pipeline

Efficient Safe Great UX </> C++ Memory Thread

P P P P P P P P P P P P P P P P P P P P P P P P P P P Client Streaming

Parallelism 6 Partitions Producer Consumer Consumer Consumer

Efficient Safe Great UX </> C++ Memory Thread

JavaScript C/C++ Rust Go 01110101101 11010101010 01101010110 11100010101 WebAssembly

Browser Sandbox Modules Memory Table pointers values Function Safe W eb A ssembly S ystem I nterface Portable Operating System Interface File system Network Environment Variable Clock Clock Arguments No right to access resource beyond sandbox

Gas metering in CPU Memory, restrict memory used. Pre-allocating memory Core 1 Core 2 wasm wasm VM mem VM mem mem Broker

void displaylog(int n); int max(int num1, int num2) { if (num1 > num2) result = num1; else result = num2; displaylog(result); return result; } (module (type $ FUNCSIG$vi ( func (param i32))) (import "env" "displaylog" ( func $displaylog (param i32))) (table 0 anyfunc ) (memory $0 1) (export "memory" (memory $0)) (export "max" ( func $max)) ( func $max (; 1 ;) (param $0 i32) (param $1 i32)(result i32) (call $displaylog ( tee_local $0 (select ( get_local $0) ( get_local $1) (i32.gt_s ( get_local $0) ( get_local $1)) ) ) ) ( get_local $0) ) ) 0055fa0 6974 656d 632e 6f6c 656e 7453 6972 676e 0055fb0 01a8 7409 6d69 2e65 6144 6574 01a9 2817 0055fc0 742a 6d69 2e65 6f4c 6163 6974 6e6f 2e29 0055fd0 6f6c 6b6f 7075 01aa 2813 742a 6d69 2e65 0055fe0 6954 656d 2e29 6461 5364 6365 01ab 740a 0055ff0 6d69 2e65 7a74 6573 ac74 0e01 6974 656d 0056000 462e 7869 6465 6f5a 656e 01ad 2814 742a 0056010 6d69 2e65 6f4c 6163 6974 6e6f 2e29 6567 0056020 ae74 1101 6974 656d 612e 6f74 5b69 7473 0056030 6972 676e af5d 1d01 6974 656d 702e 7261 0056040 6573 614e 6f6e 6573 6f63 646e 5b73 7473 0056050 6972 676e b05d 1601 6974 656d 702e 7261 0056060 6573 6953 6e67 6465 664f 7366 7465 01b1 0056070 7411 6d69 2e65 6f6c 6461 6f4c 6163 6974 0056080 6e6f 01b2 740f 6d69 2e65 6f6c 6461 7a54 0056090 6e69 6f66 01b3 741b 6d69 2e65 6f4c 6461 00560a0 6f4c 6163 6974 6e6f 7246 6d6f 5a54 6144 00560b0 6174 01b4 7409 6d69 2e65 706f 6e65 01b5 00560c0 740b 6d69 2e65 7270 6165 6e64 01b6 2813 00560d0 742a 6d69 2e65 6164 6174 4f49 2e29 6572 00560e0 6461 01b7 740e 6d69 2e65 7a74 6573 4e74 00560f0 6d61 b865 1001 6974 656d 742e 737a 7465 0056100 664f 7366 7465 01b9 740e 6d69 2e65 7a74 0056110 6573 5274 6c75 ba65 0c01 6974 656d 612e 0056120 7362 6144 6574 01bb 740f 6d69 2e65 7a74 0056130 7572 656c 6954 656d 01bc 740d 6d69 2e65 0056140 7a74 6573 4e74 6d75 01bd 7417 6d69 2e65 Efficient

Transforms software development kit func myTransform ( evt redpanda.WriteEvent ) ([] redpanda.Record , error) {      var data struct {...}      if err := xml.Unmarshal ( evt.Record ().Value, &data); err != nil {          return nil, err      }      v, err := json.Marshal (&data)      if err != nil {          return nil, err      }      return redpanda.Record {          Key:   evt.Record ().Key,          Value: v,      }, nil  } Great UX

rpk cloud login Choose my fav language! Builds the WebAssembly module Define transformation rules rpk transform build rpk transform init rpk transform deploy --input-topic=customer --output-topic=customer_masked Deploy transformation to cluster customer customer_masked customer customer_masked customer customer_masked Replicate across clusters

Stateless Streaming Pipeline Transform format Change, masking, filtering, validating Dispatch, Wiretap Spilt, multiple destination Control reroute Normalize/ Denormalize Enrich Multiple ingestion Stateful Streaming Pipeline Complex event processing Time-window based processing Enrich Multiple ingestion Micro batch Pipeline Transform for large output (Dataset) Partitioning Split workload A nalytics batch Pipeline A nalytics large volume ( legacy ) Transform large output (Dataset, legacy ) Transport large unstructured data Better scalability for pipelines

Redpanda Data Transform Stateless Streaming Pipeline Transform format Change, masking, filtering, validating Dispatch, Wiretap Spilt, multiple destination Control reroute Normalize/ Denormalize Enrich Multiple ingestion WASM WebAssembly Binary instruction format for a stacked-based VM. Portable compilation Go Rust JS Python Ruby

Demo

Redpanda University Free, self-paced online learning https://university.redpanda.com   Learn the fundamentals of data streaming and Redpanda Install Redpanda and use the rpk CLI to configure it Create producers and consumers in Java, Python and NodeJS Sign up today for free!

Questions? LinkedIn: www.linkedin.com/in/weimeilin X: @ Christina_wm Email: [email protected]