At Sentry we handle hundreds of thousands of events a second, from tiny metrics to huge memory dumps. What started as a small Django application had to be scaled up to support global points of presence, lower latency, and significantly higher throughput. This talk walks through our experience building a Rust-based ingestion service to replace our Python one, the lessons we learned, and where we diverged from the original design.
Size: 2.45 MB
Language: en
Added: Jun 24, 2024
Slides: 30 pages
Slide Content
Ingesting in Rust Armin Ronacher Principal Architect at Sentry
Armin Ronacher (he/him) Principal Architect at Sentry I created Flask I love Queues, Pipelines and everything related Away from work I juggle three kids (not literally)
Setting the Stage
Errors
Replays
Tracing
Profiling
Rust at Sentry
Why is there Rust to Begin With? Initially personal interest, unrelated to Sentry Sentry was built on Python Good FFI between Rust and Python meant extension modules were an option Allowed efficient code sharing of performance critical code Source map library in Rust, exposed to Python
Rust in the Pipeline Pieces we will cover here Relay (Ingestion / Aggregation Service) Symbolicator (Stackwalk, Symbolize, Source Maps) Python Modules (Shared Code) * * a lot of it is forward looking
Did it work?
Rust Again? Short: would pick it again Benefits are obvious Downsides are obvious too It might work for you, it might not
Relay Goals What it has to do: Efficiently Enforce Quotas Geo-distributed deployments Perform Sampling Data Format Conversion and Forwarding Perform Metrics Extraction PII Stripping
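Quota enforcement is the first of these goals. A minimal token-bucket sketch shows the idea; this is hypothetical illustration (the struct, names, and numbers are mine), not Relay's actual implementation:

```rust
use std::time::Instant;

// Hypothetical per-project quota: `capacity` events burst, refilled at
// `refill_per_sec`. Relay's real quota logic is distributed and far richer.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last: Instant::now() }
    }

    // Returns true if the event is within quota, false if it should be dropped.
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // With zero refill, exactly `capacity` events pass, then enforcement kicks in.
    let mut bucket = TokenBucket::new(2.0, 0.0);
    assert!(bucket.try_acquire());
    assert!(bucket.try_acquire());
    assert!(!bucket.try_acquire());
}
```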
Picking Rust Sentry’s language stack was Python originally We acquired some Rust experience via our CLI Built a Python extension module reusing CLI code successfully Liked the idea of reusing code across our stack
How to Relay Sentry’s /store/ endpoint was written in Python “Anything goes” (Bad for Rust) No schema, inconsistent behavior Customers sent custom data Various quirks (submit via transparent pixel) Unreasonably lenient
Relay as Library Fix up existing protocol where possible Write new protocol normalizer in Rust, expose to Python Side-by-side run old and new normalization, compare results Learnings: serde has a lot of flaws
Serde Limits Rather low recursion limits Old Python JSON code accepted Infinity/NaN Python <-> Python event submission contained lots of NaNs Workaround: pass over the byte stream to replace “NaN” with “0” etc. Limited data model: serde relies on problematic in-band signalling for foreign types Recursion without limits: complex recovery from errors, not enough state
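The byte-stream workaround can be sketched roughly like this. It is a hypothetical reconstruction, not Relay's code; a production pass would additionally have to skip over string contents so that a legitimate value like `"NaN"` inside a string is left untouched:

```rust
// Rewrite non-standard JSON literals (which Python's json module emits
// but serde_json rejects) before handing the bytes to the deserializer.
// Simplified sketch: does NOT guard against "NaN" inside string values.
fn sanitize_json(input: &str) -> String {
    input.replace("NaN", "0").replace("Infinity", "0")
}

fn main() {
    let raw = r#"{"duration": NaN, "budget": Infinity}"#;
    assert_eq!(sanitize_json(raw), r#"{"duration": 0, "budget": 0}"#);
}
```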
In Band Signalling Large integer “42”: {"$serde_json::private::Number": "42"} https://github.com/serde-rs/serde/issues/1183 https://github.com/serde-rs/serde/issues/1463
Large Error Types Boxing errors internally creates a significant performance improvement
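The reason boxing helps: a `Result` is as large as its largest variant, so a bulky error type inflates every `Result` that carries it, even on the success path. A small illustration (the `BigError` type here is made up for demonstration):

```rust
use std::mem::size_of;

// A deliberately bulky error type.
#[allow(dead_code)]
struct BigError {
    context: [u8; 256],
    line: u32,
}

type FatResult = Result<u64, BigError>;       // error payload stored inline
type SlimResult = Result<u64, Box<BigError>>; // error behind a single pointer

fn main() {
    // The boxed variant keeps Result pointer-sized on the error arm,
    // so the Ok fast path moves far fewer bytes around.
    assert!(size_of::<SlimResult>() < size_of::<FatResult>());
    println!("fat: {} bytes, slim: {} bytes",
        size_of::<FatResult>(), size_of::<SlimResult>());
}
```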
10000% Speedup Apply offsets to N tokens, accidentally perform memmove over M bytes N times
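The slide gives no code, but the bug pattern can be reconstructed hypothetically: inserting into the middle of a buffer once per token memmoves the entire tail each time, which is O(N tokens × M bytes) overall; building the output in a single pass is O(N + M). The function names and shapes below are illustrative:

```rust
// Slow: each `insert` shifts every byte after the insertion point.
// `positions` are ascending offsets into the ORIGINAL buffer.
fn insert_markers_slow(buf: &mut Vec<u8>, positions: &[usize], marker: u8) {
    for (i, &pos) in positions.iter().enumerate() {
        buf.insert(pos + i, marker); // memmove of the whole tail, every time
    }
}

// Fast: one pass, copying each source byte exactly once.
fn insert_markers_fast(buf: &[u8], positions: &[usize], marker: u8) -> Vec<u8> {
    let mut out = Vec::with_capacity(buf.len() + positions.len());
    let mut last = 0;
    for &pos in positions {
        out.extend_from_slice(&buf[last..pos]);
        out.push(marker);
        last = pos;
    }
    out.extend_from_slice(&buf[last..]);
    out
}

fn main() {
    let mut slow = b"abcd".to_vec();
    insert_markers_slow(&mut slow, &[1, 3], b'X');
    let fast = insert_markers_fast(b"abcd", &[1, 3], b'X');
    assert_eq!(slow, fast);
    assert_eq!(fast, b"aXbcXd".to_vec());
}
```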
Picking Frameworks
Relay First Steps old tokio + actix + actix-web (pre async/await) actix looked appealing: actor framework, resonates actix has no reasonable concept of backpressure management; over time it became a mess
Relay Today Axum Custom service layer async/await (mostly) More explicit back-pressure management
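Relay itself builds on Axum and tokio, but the back-pressure idea can be shown with a stdlib stand-in: a bounded channel makes senders block once `capacity` items are in flight, instead of letting a queue grow without limit. A minimal sketch (the function and numbers are illustrative, not Relay's code):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Bounded channel = back-pressure: `send` blocks when the consumer lags
// by more than `capacity` events, instead of queueing unboundedly.
fn run_pipeline(total: u32, capacity: usize) -> u32 {
    let (tx, rx) = sync_channel::<u32>(capacity);
    let producer = thread::spawn(move || {
        for event in 0..total {
            tx.send(event).unwrap(); // blocks while the channel is full
        }
        // tx dropped here, which ends the consumer's iteration
    });
    let received = rx.iter().count() as u32;
    producer.join().unwrap();
    received
}

fn main() {
    assert_eq!(run_pipeline(100, 8), 100);
}
```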
PyO3
Python Modules Export via PyO3 Rust code to Python Use maturin to create and distribute wheels Very good solution for GIL and borrow management
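For flavor, a minimal PyO3 module might look like the following. This requires the `pyo3` crate and is built into a wheel with `maturin`; the module and function names are made up for illustration, not Sentry's actual exports:

```rust
use pyo3::prelude::*;

/// A Rust function exposed to Python. PyO3's macros handle argument
/// conversion, error mapping, and GIL management.
#[pyfunction]
fn normalize_event(raw: &str) -> PyResult<String> {
    Ok(raw.trim().to_string())
}

/// The module Python will import (hypothetical name).
#[pymodule]
fn my_native_module(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(normalize_event, m)?)?;
    Ok(())
}
```

Running `maturin develop` compiles the crate and installs it into the active virtualenv, after which `from my_native_module import normalize_event` works from Python.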
Thank you! Let’s connect. Armin Ronacher Principal Architect at Sentry