Blazingly-Fast: Introduction to Apache Fury Serialization
shawnckyang
33 slides
Oct 16, 2024
About This Presentation
1. What is Apache Fury
2. How to use Apache Fury
3. Inside Apache Fury: Why Fury is so Fast
4. The Community of Apache Fury
Slide Content
Blazingly-Fast: Introduction to Apache Fury Serialization
Shawn Yang | Founder of Apache Fury
About Me
Shawn Yang (@chaokunyang): Senior Software Engineer at Ant Group, founder of Apache Fury, experienced in distributed systems:
- 2016: Microservices & Cloud Native
- 2017~2018: Data processing systems (Hadoop, Spark, Flink, Pandas)
- 2018: Joined Ant Group
- 2019~2021: Streaming, online ML systems, Ray; created Fury
- 2022~2023: Distributed Pandas & NumPy framework
- 2023~: Distributed multi-modal & unstructured processing systems, video processing architecture
Agenda
1. What is Apache Fury
2. How to use Apache Fury
3. Inside Apache Fury
4. The Community of Apache Fury
1. What is Apache Fury
Community Over Code
What is Apache Fury?
Apache Fury (incubating) is a blazingly fast multi-language serialization framework powered by dynamic codegen and zero-copy, providing up to 170x faster performance and an easy-to-use, intuitive programming model.
- Cross-language: Java/Python/Golang/JavaScript/Scala/Rust/…
- High performance: up to 170x faster
- Easy to use: IDL-free, with reference and polymorphism support
Key Takeaway 1 - Intuitive Programming Model
- IDL-free: objects are serialized automatically, with no IDL definition, compilation, or conversion step.
- Language-native type systems.
- Optional support for shared references and circular references.
- Object polymorphism: interface/trait types can be serialized.
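The idea behind optional reference tracking can be sketched in a few lines. The following is a minimal illustration in Python, assuming a toy token format; it is not Fury's wire format or API, and the names `encode` and `decode` are hypothetical. Each list is assigned an id the first time it is seen, and later occurrences are written as a back-reference, which is what keeps shared and circular structures intact.

```python
def encode(obj, seen=None):
    """Turn nested lists into tokens, writing a back-reference for repeats."""
    seen = {} if seen is None else seen
    if isinstance(obj, list):
        if id(obj) in seen:
            # Already written once: emit only its reference id.
            return ("ref", seen[id(obj)])
        seen[id(obj)] = len(seen)  # ids are assigned in first-visit order
        return ("list", [encode(item, seen) for item in obj])
    return ("value", obj)


def decode(node, objects=None):
    """Rebuild the graph; objects[i] is the list that was given id i."""
    objects = [] if objects is None else objects
    kind, payload = node
    if kind == "ref":
        return objects[payload]
    if kind == "list":
        result = []
        objects.append(result)  # register before children so cycles resolve
        for child in payload:
            result.append(decode(child, objects))
        return result
    return payload


shared = [1, 2]
root = [shared, shared]
root.append(root)  # circular reference back to the root
copy = decode(encode(root))
```

Without the `seen` map, the shared list would be written twice and the circular reference would recurse forever, which is why tracking is a cost worth making optional.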
Key Takeaway 2 - High Performance
- Fastest framework in the JVM-Serializers benchmark.
- 20~170x faster than JDK/Hessian/Kryo.
- Generates serializer code at runtime to minimize memory access and virtual method invocation.
- Meta packing & sharing: just one memory copy for meta writing.
Examples: High Performance + Compact Encoding
- 6x faster than Protobuf at serialization;
- 4x faster than Protobuf at deserialization;
- 45% of Protobuf's serialized size.
- Meta packing/sharing/compression: minimizes the meta payload cost that Protobuf's KV layout incurs.
- Strict mode (smaller): no space cost for type forward/backward compatibility.
- Generates more efficient code at runtime.
Fury Usage in Big Data Processing Systems
- Fury makes the end-to-end performance of Flink CDC faster by 1x.
- In streaming index building with Flink DataStream, Fury makes serialization 5~20x faster and deserialization 10~80x faster.
Fury Usage in Recommendation Systems
With massive numbers of features from many products to serialize, the end-to-end latency of VIP's online search and recommendation system was reduced by 30+ ms. https://www.vip.com/
Fury Usage in CloudIDE (NodeJS)
- OpenSumi 2.0, based on JSON-RPC: slow serialization, high bitrate, and lag under network latency.
- OpenSumi 3.0, based on Apache Fury: faster serialization, smaller payloads.
Diagram labels: Gateway, parse, Browser/Server
2. How to Use Apache Fury
Use Apache Fury for Java Serialization
Code panels: Data Type; JDK Serialization; JDK Deserialization; Setup Fury; Fury Serialization; Fury Deserialization
1. Safety by Fury.register
2. Much faster than JDK
Use Apache Fury for Scala Serialization
Code panels: Data; JDK Serialization; JDK Deserialization; Setup Fury; Fury Serialization; Fury Deserialization
1. Supports any Scala types
2. Much faster than JDK/Kryo/Chill
Use Apache Fury in GraalVM Native Image
Code panels: Data Type; Setup Fury; Fury Serialization; Fury Deserialization
Annotation: configure this util class for build-time initialization
1. No reflection; no proxy/reflection GraalVM JSON config file
2. 50x faster than JDK
Use Apache Fury for Xlang Serialization
Code panels: Data Type; Protobuf; Fury
Protobuf:
- Define and compile the IDL first
- Manual conversion from/to Java objects, due to the lack of OOP support
- Shared/circular references not supported
Fury:
1. No IDL and no manual conversion
2. 4~10x faster in Java
3. Reference and polymorphism support
3. Inside Apache Fury
Apache Fury Overview
High performance:
- Compact encoding: minimize the cost of reference and polymorphism
- Zero-copy + off-heap: minimize copy cost
- Meta packing & compression: minimize struct meta cost
Easy to use:
- IDL-free, no compile and convert
- Native type systems
- Polymorphic: serialize interface types
- Shared/circular references
Architecture diagram layers:
- Protocol: Native Type System, Polymorphism, Circular & Shared Ref, Zero Copy, Lazy Deserialization, IDL-Less, Meta String, Meta Packing & Share, Type Forward Compatibility, Homogeneous container stats
- Optimization: Cache Optimization, Field reorder, Vectorization, Async Compilation, Switch JUMP; Dynamic codegen (Java, Python, NodeJS); Compile-time codegen (C++, Rust)
- Language: Rust, C++, Scala, C#, Dart, Java, Python, NodeJS, JavaScript, GoLang
- Application: Big Data (Flink, Spark, HBase, CDC); Messaging (Kafka, RocketMQ, Akka); RPC (Dubbo, SOFA, gRPC)
Serialization of Basic Types
- Int32: ZigZag + VarInt; little-endian raw bytes
- Int64: ZigZag + VarInt; raw bytes; SLI (small int: 4 bytes, big int: 9 bytes)
- Bool: 0 for false, 1 for true
- Int8/Int16: raw bytes
- Float32/Float64: IEEE 754 bit format
- String: compatible with the native string representation to minimize encoding cost (Latin1, UTF16, UTF8)
- Enum: encoded by index
- Array: 1D numeric arrays as size + endian + buffer; others encoded as list
- Type: unsigned varint; namespace + type name
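As an illustration of the "ZigZag + VarInt" entry for Int32, here is a minimal Python sketch of the two standard techniques combined. It shows the general idea only; the function names are hypothetical and the exact byte layout Fury uses is not taken from the slides.

```python
def zigzag_encode32(n: int) -> int:
    """Map a signed 32-bit int to unsigned so small magnitudes stay small."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def zigzag_decode32(z: int) -> int:
    """Inverse of zigzag_encode32."""
    return (z >> 1) ^ -(z & 1)


def write_varuint(value: int) -> bytes:
    """Write 7 bits per byte, low bits first; the high bit means 'more follows'."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varuint(data: bytes, offset: int = 0):
    """Return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def encode_int32(n: int) -> bytes:
    return write_varuint(zigzag_encode32(n))


def decode_int32(data: bytes, offset: int = 0):
    z, offset = read_varuint(data, offset)
    return zigzag_decode32(z), offset
```

With this scheme both `1` and `-1` fit in a single byte, while values near the 32-bit limits take five bytes, one more than raw bytes would. That trade-off is a plausible reason for the slide to list a raw-bytes option as well.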
Struct Serialization
Schema Consistent Protocol:
- No field-name or type meta; fields are written in a pre-sorted order, avoiding the KV cost of Protobuf/JSON
- Deserialization uses the schema of the current type
Schema Evolution Protocol (for online services):
- Pack the struct's field meta and compress it with multiple encodings
- Share meta across multiple objects instead of writing it every time as Protobuf/JSON do
- Write field data using the schema consistent protocol
- Deserialization uses the serialized schema
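The two protocols can be contrasted with a small Python sketch. It assumes a toy layout in which every field is a little-endian int32 and fields are sorted by name; the real sort order, meta encoding, and compression are not described in the slides, so `write_consistent`, `read_consistent`, and `MetaSharingWriter` are hypothetical names used only for illustration.

```python
import struct
from dataclasses import dataclass, fields


@dataclass
class Point:
    y: int
    x: int
    label_id: int


def sorted_field_names(cls):
    # Both peers derive the same order from the class, so names never travel.
    return sorted(f.name for f in fields(cls))


def write_consistent(obj) -> bytes:
    """Schema consistent: values only, in pre-sorted field order."""
    names = sorted_field_names(type(obj))
    return struct.pack(f"<{len(names)}i", *(getattr(obj, n) for n in names))


def read_consistent(cls, data: bytes):
    """Deserialize using the schema of the current (local) type."""
    names = sorted_field_names(cls)
    values = struct.unpack(f"<{len(names)}i", data)
    return cls(**dict(zip(names, values)))


class MetaSharingWriter:
    """Schema evolution: field meta is written once, then referenced by id."""

    def __init__(self):
        self.meta_ids = {}

    def write(self, obj) -> bytes:
        cls = type(obj)
        body = write_consistent(obj)  # field data reuses the consistent layout
        if cls in self.meta_ids:
            return bytes([0, self.meta_ids[cls]]) + body  # flag 0: meta by id
        meta_id = len(self.meta_ids)
        self.meta_ids[cls] = meta_id
        meta = ",".join(sorted_field_names(cls)).encode("utf-8")
        return bytes([1, meta_id, len(meta)]) + meta + body  # flag 1: inline meta
```

In this sketch the first `Point` written through `MetaSharingWriter` carries its field names, and every later `Point` costs only two extra bytes over the schema consistent form.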
Speed Up Struct Serialization by Codegen
Code panels: Data Type; No Codegen; Dynamic Codegen
- Use codegen to reduce memory access, type dispatch, and method call cost.
- Merge multiple memory checks up front.
- Compilation optimization: split big methods into small ones; inline almost all method calls.
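The difference between the "No Codegen" and "Dynamic Codegen" panels can be sketched in Python. The example below is an assumption-laden illustration rather than Fury's generator: `generic_serialize` loops over fields and dispatches on type at runtime, while the hypothetical `generate_serializer` emits straight-line source for one class and compiles it with `exec`.

```python
import struct
from dataclasses import dataclass, fields


@dataclass
class Order:
    id: int
    quantity: int
    price: float


# Hypothetical mapping from field type to a struct format code.
FORMATS = {int: "q", float: "d"}


def generic_serialize(obj) -> bytes:
    """No codegen: per-field loop, type lookup, and attribute lookup by name."""
    out = bytearray()
    for f in fields(obj):
        out += struct.pack("<" + FORMATS[f.type], getattr(obj, f.name))
    return bytes(out)


def generate_serializer(cls):
    """Codegen: resolve types once, then emit one flat pack call for the class."""
    fmt = "<" + "".join(FORMATS[f.type] for f in fields(cls))
    args = ", ".join(f"obj.{f.name}" for f in fields(cls))
    source = (
        "def serialize(obj):\n"
        f"    return _pack({args})\n"
    )
    namespace = {"_pack": struct.Struct(fmt).pack}  # precompiled format
    exec(source, namespace)
    return namespace["serialize"], source


serialize_order, generated_source = generate_serializer(Order)
order = Order(id=1, quantity=2, price=3.5)
```

Both functions produce the same bytes for `order`; the generated one simply has no loop and no per-field type dispatch left at call time, which mirrors the goals listed on the slide.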
Java Codegen Process
Runtime codegen components: Expression DSL, ExprTree Builder, ExprTree Optimizer, Codegen Context, Unsafe Access, Type Infer, Janino Compiler, Codegen Generator
Expression DSL:
- Control expressions: If / IsNull / Not / Comparator / While / ForLoop / ForEach / ZipForEach / Return
- Non-control expressions: Arithmetic / Literal / Accessor / Cast / SetField / Invoke / StaticInvoke / NewInstance / NewArray / Reference / Assign / AssignArrayElement
Runtime codegen steps (sync/async):
1. Type inference
2. Build expression tree
3. Optimize expressions
4. Generate Java code
5. Generate bytecode
6. Load bytecode as class
7. Create serializer
8. Switch to new serializer
9. Serialize data
Other dynamic languages follow a similar process; static languages use macros to generate code.
Java Compilation Example
Panels: Generated Code Example; JVM C2 JIT Result; Type Def; Profiling
Struct Serialization in C++/Rust
Code panels: Data Type; Schema Consistent Mode; Schema Evolution Mode
- Serialization code is generated at compile time.
- To handle received data serialized with a different schema, two deserialization methods are generated at compile time:
  - Schema consistent: best performance, covers most code paths;
  - Schema inconsistent: uses switch jumps to reduce virtual method calls, and runs fast too.
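The two deserialization paths can be sketched as follows. This Python illustration assumes a toy format (a 4-byte CRC32 schema hash followed by int32 fields) and a `known_schemas` table standing in for meta received from the peer; none of these details come from the slides, and the dictionary lookup here is only a stand-in for the switch jump a compiled language would generate.

```python
import struct
import zlib

LOCAL_FIELDS = ("id", "level", "score")  # hypothetical local struct, all int32


def schema_hash(field_names) -> int:
    return zlib.crc32(",".join(field_names).encode("utf-8"))


LOCAL_HASH = schema_hash(LOCAL_FIELDS)


def serialize(field_names, values) -> bytes:
    header = struct.pack("<I", schema_hash(field_names))
    return header + struct.pack(f"<{len(values)}i", *values)


def deserialize(data: bytes, known_schemas: dict) -> dict:
    (peer_hash,) = struct.unpack_from("<I", data, 0)
    if peer_hash == LOCAL_HASH:
        # Schema consistent path: read fields positionally, no name dispatch.
        values = struct.unpack_from(f"<{len(LOCAL_FIELDS)}i", data, 4)
        return dict(zip(LOCAL_FIELDS, values))
    # Schema inconsistent path: dispatch each peer field by name,
    # skip unknown fields, and default the ones the peer did not send.
    peer_fields = known_schemas[peer_hash]
    values = struct.unpack_from(f"<{len(peer_fields)}i", data, 4)
    result = dict.fromkeys(LOCAL_FIELDS, 0)
    for name, value in zip(peer_fields, values):
        if name in result:
            result[name] = value
    return result


old_fields = ("id", "score")  # an older peer that lacks the "level" field
old_payload = serialize(old_fields, (1, 9))
new_payload = serialize(LOCAL_FIELDS, (1, 2, 3))
```

Because most peers run the same schema, the hash comparison sends nearly all traffic down the positional path, and only mismatched payloads pay for per-field dispatch.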
List/Set/Collection Serialization
Collection stats:
- Elements of most lists/sets/collections have the same type and are not null;
- Many collections have size < 10;
- Collection iteration is fast;
- Elements of most Go, Rust, and C++ collections are not references and not polymorphic.
How to encode:
- Iterate the collection to compute a header (4 bits):
  - whether to track element references
  - whether elements have null values
  - whether elements have the same type
  - whether all element types are the declared type
- Concatenate the header and the collection size as a varint
- No need to write element meta
Minimize the cost of reference and polymorphism!
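The header computation described above can be sketched in Python. The bit assignments and the way the size is combined with the header are assumptions made for this illustration (the slides only name the four flags), and `collection_header` and `encode_collection_prefix` are hypothetical names.

```python
# Hypothetical bit positions for the four flags named on the slide.
TRACK_REF = 0b0001
HAS_NULL = 0b0010
SAME_TYPE = 0b0100
DECLARED_TYPE = 0b1000


def collection_header(items, declared_type=None, track_ref=False) -> int:
    """One pass over the elements to compute the 4-bit header."""
    header = TRACK_REF if track_ref else 0
    types = set()
    for item in items:
        if item is None:
            header |= HAS_NULL
        else:
            types.add(type(item))
    if len(types) <= 1:
        header |= SAME_TYPE
    if declared_type is not None and types <= {declared_type}:
        header |= DECLARED_TYPE
    return header


def write_varuint(value: int) -> bytes:
    """Unsigned varint: 7 bits per byte, high bit set while more bytes follow."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_collection_prefix(items, declared_type=None, track_ref=False) -> bytes:
    """Pack the size above the 4 header bits and write both as one varint."""
    header = collection_header(items, declared_type, track_ref)
    return write_varuint((len(items) << 4) | header)
```

For a short list of non-null elements that all match the declared type, the whole prefix is a single byte, and the flags tell the reader that no per-element null marker or type tag follows.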
Chunk by Chunk Map Serialization
Map stats:
- Map keys mostly share the same type and are not null
- Map values mostly share the same type and are not null
- KVs of maps in Go, Rust, and C++ are mostly not references and not polymorphic
Header precompute is costly:
- A map is a tree; iterating a map is costly relative to serialization
Chunk by chunk encoding:
- Split the map into chunks
- Every chunk has the same meta for all key items and value items
- Write a new chunk when the meta differs
- Chunk header (8 bits):
  - whether to track element references
  - whether elements have null values
  - whether elements have the same type
  - whether all element types are the declared type
  - same for the value header
  - …
Map layout (diagram)
Minimize the cost of reference and polymorphism!
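The chunking rule can be sketched in Python as a single pass over the map, so no header has to be precomputed by iterating twice. In this illustration the "meta" of an entry is simply the pair of key type and value type; that choice, and the name `chunk_map`, are assumptions rather than Fury's actual chunk format.

```python
def chunk_map(mapping):
    """Group consecutive entries that share the same (key type, value type)."""
    chunks = []
    current_meta = None
    for key, value in mapping.items():
        meta = (type(key), type(value))
        if meta != current_meta:
            # Meta changed: close the current chunk and start a new one.
            chunks.append((meta, []))
            current_meta = meta
        chunks[-1][1].append((key, value))
    return chunks


mixed = {"a": 1, "b": 2, "c": "x", "d": 3}
uniform = {"a": 1, "b": 2, "c": 3}
```

A homogeneous map such as `uniform` yields one chunk, so its key and value meta are written once; `mixed` yields three chunks because the meta changes at `"c"` and again at `"d"`. A null value would also start a new chunk here, since `type(None)` differs from the other value types.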
Speed Up Container Field Serialization
Code panels: Data type; List field; List with codegen; Map field; Map with codegen
- Use codegen to reduce memory access, type dispatch, and method call cost.
- Generate multiple code paths for polymorphic types.
4. The Community of Apache Fury
The History of Apache Fury
From Side Project to Community Project
Timeline dates: 2019.7, 2021.2, 2022.7, 2023.7, 2023.12, 2024.67
Milestones:
- Initial version finished as a side project; used for Ray and online ML systems
- Inner source: used by Flink, Lindorm, SOFA, HSF, Dubbo, Voyager
- Further integrated into AIOS and search and recommendation systems
- Failed to open source: open sourcing was intended to make the project sustainable, but was blocked by internal open source review (see next page)
- Open sourced to the public: GitHub Trending, Reddit, and Hacker News headlines; used in Alibaba, VIP.com, Tuhu, Ctrip Corps, etc.
- Joined the Apache Incubator
- Community building: made 3 releases in ASF; made the Xlang implementation
The Very Top
The Power of Community
- 40+ new contributors in the last 8 months;
- 300+ pull requests in the last 8 months;
- 500+ issues in the last 8 months;
- 6 releases under ASF in the last 5 months.
Contributions from the Community (thanks all!)
Roadmap of Apache Fury
Xlang serialization:
- New serialization protocol for Python/Golang/Rust;
- Schema evolution for Xlang;
- Codegen for Java Xlang serialization;
- Object graph serialization protocol for C++.
Ecosystem:
- gRPC integration: compatible with the gRPC + Protobuf API
- Spring/Flink/Spark/Akka/Pekko/Quarkus integration
Documentation & developer experience improvement