5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X

Five hours to 7. 7 seconds : How database tricks sped up Rust linting by over 2000x Predrag Gruevski Maintainer , Trustfall & cargo-semver-checks

Hi, I’m Predrag ( he/him) Author of cargo-semver-checks Rust linter Author of Trustfall query engine Firmly believe that everything can be a database

Semver in Rust Semantic versioning = “only major versions can break APIs” Always-on, cannot opt out Easy to violate: breaking changes aren’t always obvious Violations are painful: users’ builds broken, maintainers scrambling to fix

1 in 6 of the top 1000 crates have broken semver at least once*

1 in 6 of the top 1000 crates have broken semver at least once* * not the maintainers’ fault; none of us would have fared any better

cargo-semver-checks “clippy for semver” — lint for semver violations before publishing Use as: cargo semver-checks && cargo publish Going to become part of cargo itself: https://github.com/obi1kenobi/cargo-semver-checks/issues/61

Make it work, then make it fast MVP version was useful, but not the fastest Runtime scaled as O(n^2) with library size Most libraries are small, so no perf problem Medium-sized libraries are at higher risk, linting is worth a slight slowdown Lots of adoption — libp2p, cargo_metadata, tokio, … Adding ~1 min to the CI publish job is no problem, right?

Sweet spot of bad scaling

a ws-sdk-ec2 tries linting semver… $ cd aws-sdk-ec2 $ cargo semver-checks Checking aws-sdk-ec2 v0.24.0 -> v0.24.0 (no change) Starting 40 checks, 0 unnecessary

aws-sdk-ec2 tries linting semver… $ cd aws-sdk-ec2 $ cargo semver-checks Checking aws-sdk-ec2 v0.24.0 -> v0.24.0 (no change) Starting 40 checks, 0 unnecessary

aws-sdk-ec2 tries linting semver… $ cd aws-sdk-ec2 $ cargo semver-checks Checking aws-sdk-ec2 v0.24.0 -> v0.24.0 (no change) Starting 40 checks, 0 unnecessary Completed [18178.109s] 40 checks; 40 passed, 0 unnecessary

Let’s make that 2354x faster $ cd aws-sdk-ec2 $ cargo semver-checks Checking aws-sdk-ec2 v0.24.0 -> v0.24.0 (no change) Starting 40 checks, 0 unnecessary Completed [ 7.722s] 40 checks; 40 passed, 0 unnecessary

Overview Semver linting is a special kind of querying With the right tools, anything can be a database Query optimization — as an API Takeaways

Example: pub fn removed from API https://github.com/obi1kenobi/semver-examples/compare/main...easy_01

$ cargo semver-checks Parsing easy_01 v0.1.0 (current) Parsing easy_01 v0.1.0 (baseline) Checking easy_01 v0.1.0 -> v0.1.0 (no change) Completed [ 0.011s] 45 checks; 44 passed, 1 failed, 0 unnecessary --- failure function_missing: pub fn removed or renamed --- Description: A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely. ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.22.1/src/lints/ function_missing.ron Failed in: function easy_01::add, previously in file semver-examples/easy_01/old/src/lib.rs:1 Final [ 0.012s] semver requires new major version: 1 major and 0 minor checks failed

A special kind of querying Query: Find public functions that existed in the previous version, b ut don’t exist in the new version.

A special kind of querying Query: Find public functions that existed in the previous version, but don’t exist in the new version. Each result of this query is a semver violation! Declarative — business logic separate from implementation We can optimize implementation without affecting business logic! Dozens of queries like this — one per way to break semver

Naive execution — O(n^2) Check N functions against N functions, then filter for matching path.

Naive execution — O(n^2) Check N functions against N functions, then filter for matching path. O(N) O(N)

P redicate pushdown — O(n) execution For each of N functions, look up matching one in O(1). O(N) O(1)

Predicate pushdown — O(n) execution For each of N functions, look up matching one in O(1). This is the goal! O(N) O(1)

Overview Semver linting is a special kind of querying With the right tools, anything can be a database Query optimization — as an API Takeaways

Our data source Use rustdoc to get JSON description of APIs Wrinkle: different Rust versions have different JSON formats Reimplementing queries for each new version is prohibitive! Ideally: queries over abstract data model that doesn’t change

Semver requires “s eeing like a compiler” We’re solving for “could the new code possibly work” For example: x.foo() might call method on x itself or on any of its traits Moving foo() from x itself to one of its traits is not breaking Requires compiler-like logic: n ame resolution, trait resolution, etc.

SQL isn’t a good fit SQL views and PL/SQL not suitable for compiler-like logic Would need complex & costly ETL for each different JSON format ETL runs all work eagerly — computes stuff we might not use!

Using the Trustfall query engine Engine with “foreign data wrappers” as a first-class feature Query any combination of data sources Just need an adapter for the data source Compiler-like logic goes in the adapter Battle-tested ideas: 7+ years in production Engine built in Rust; adapters can be Rust / Python / JS / WASM

Using the Trustfall query engine

What does this look like in practice?

Overview Semver linting is a special kind of querying With the right tools, anything can be a database Query optimization — as an API Takeaways

Example Trustfall queries “Look up all functions in the crate.”

{ Crate { item { // all items in the crate: functions, structs, enums, etc. ... on Function { // select only functions, discard all else // get the paths under which this function is importable // (there might be zero, one, or more than one!) importable_path { path @ output } // (optional) more query clauses } } } }

{ Crate { item { // all items in the crate: functions, structs, enums, etc. ... on Function { // select only functions, discard all else // get the paths under which this function is importable // (there might be zero, one, or more than one!) importable_path { path @ output } // (optional) more query clauses } } } } Must iterate through all functions, n o opportunity to apply index.

// Lightly simplified code fn resolve_crate_items <' a >( contexts : ContextIterator <' a , Vertex <' a >>, // iterator of crates whose items we resolve hints : & ResolveEdgeInfo , // hints about how data is getting used ) -> ContextOutcomeIterator <' a , Vertex <' a >, ... > { resolve_neighbors_with ( contexts , | crate_vertex | { resolve_items_slow_path ( crate_vertex ) // return all items }) }

// Lightly simplified code fn resolve_crate_items <' a >( contexts : ContextIterator <' a , Vertex <' a >>, // iterator of crates whose items we resolve hints : & ResolveEdgeInfo , // hints about how data is getting used ) -> ContextOutcomeIterator <' a , Vertex <' a >, ... > { resolve_neighbors_with ( contexts , | crate_vertex | { resolve_items_slow_path ( crate_vertex ) // return all items }) } Including non-functions! Trustfall handles further filtering, like selecting functions only.

Example Trustfall queries “Look up all functions in the crate.” “Look up the function at a specific import path.”

s pecific function

{ Crate { item { // all items in the crate: functions, structs, enums, etc. ... on Function { // select only functions, discard all else // get the paths under which this function is importable // (there might be zero, one, or more than one!) importable_path { path @ filter ( op : "=" , value : [ "$path" ]) @ output } // (optional) more query clauses } } } }

{ Crate { item { // all items in the crate: functions, structs, enums, etc. ... on Function { // select only functions, discard all else // get the paths under which this function is importable // (there might be zero, one, or more than one!) importable_path { path @ filter ( op : "=" , value : [ "$path" ]) @ output } // (optional) more query clauses } } } } Query variable, statically known value. Use index lookup!

// Lightly simplified code fn resolve_crate_items <' a >(...) -> _ { // No luck on finding a fast path. Go through all items the slow way. resolve_neighbors_with ( contexts , | crate_vertex | { resolve_items_slow_path ( crate_vertex ) }) }

// Lightly simplified code fn resolve_crate_items <' a >(...) -> _ { // Is the `importable_path` edge being resolved in a subsequent step? if let Some ( neighbor_info ) = hints . first_edge ( "importable_path" ) { // Is the `path` value statically known? If so, apply the index! if let Some ( candidate_value ) = neighbor_info . statically_required_property ( "path" ) { return resolve_neighbors_with ( contexts , | crate_vertex | { resolve_items_by_importable_path ( crate_vertex , candidate_value . clone ()) }); } } // No luck on finding a fast path. Go through all items the slow way. resolve_neighbors_with ( contexts , | crate_vertex | { resolve_items_slow_path ( crate_vertex ) }) }

// Lightly simplified code fn resolve_crate_items <' a >(...) -> _ { // Is the `importable_path` edge being resolved in a subsequent step? if let Some ( neighbor_info ) = hints . first_edge ( "importable_path" ) { // Is the `path` value statically known? If so, apply the index! if let Some ( candidate_value ) = neighbor_info . statically_required_property ( "path" ) { return resolve_neighbors_with ( contexts , | crate_vertex | { resolve_items_by_importable_path ( crate_vertex , candidate_value . clone ()) }); } } // No luck on finding a fast path. Go through all items the slow way. resolve_neighbors_with ( contexts , | crate_vertex | { resolve_items_slow_path ( crate_vertex ) }) } O(1) O(n)

Example Trustfall queries “Look up all functions in the crate.” “Look up the function at a specific import path.” “Look up the function at the same import path as this other function.” ( i.e. our semver lint)

This is not a single fixed value, it’s different for each function we’re checking. Its value is only known dynamically .

// Lightly simplified code fn resolve_crate_items <' a >(...) -> _ { // Is the `importable_path` edge being resolved in a subsequent step? if let Some ( neighbor_info ) = hints . first_edge ( "importable_path" ) { // Is the `path` value statically or dynamically known? If so, apply the index! if let Some ( candidate_value ) = neighbor_info . statically_required_property ( "path" ) { return resolve_neighbors_with ( contexts , | crate_vertex | { resolve_items_by_importable_path ( crate_vertex , candidate_value . clone ()) }); } // No luck on finding a fast path. Go through all items the slow way. // < prior slow path here > }

// Lightly simplified code fn resolve_crate_items <' a >(...) -> _ { // Is the `importable_path` edge being resolved in a subsequent step? if let Some ( neighbor_info ) = hints . first_edge ( "importable_path" ) { // Is the `path` value statically or dynamically known? If so, apply the index! if let Some ( candidate_value ) = neighbor_info . statically_required_property ( "path" ) { return resolve_neighbors_with ( contexts , | crate_vertex | { resolve_items_by_importable_path ( crate_vertex , candidate_value . clone ()) }); } else if let Some ( dynamic ) = neighbor_info . dynamically_required_property ( "path" ) { return dynamic . resolve_with ( contexts , | crate_vertex , candidate_value | { resolve_items_by_importable_path ( crate_vertex , candidate_value ) }); } } // No luck on finding a fast path. Go through all items the slow way. // < prior slow path here > }

// Lightly simplified code fn resolve_crate_items <' a >(...) -> _ { // Is the `importable_path` edge being resolved in a subsequent step? if let Some ( neighbor_info ) = hints . first_edge ( "importable_path" ) { // Is the `path` value statically or dynamically known? If so, apply the index! if let Some ( candidate_value ) = neighbor_info . statically_required_property ( "path" ) { return resolve_neighbors_with ( contexts , | crate_vertex | { resolve_items_by_importable_path ( crate_vertex , candidate_value . clone ()) }); } else if let Some ( dynamic ) = neighbor_info . dynamically_required_property ( "path" ) { return dynamic . resolve_with ( contexts , | crate_vertex , candidate_value | { resolve_items_by_importable_path ( crate_vertex , candidate_value ) }); } } // No luck on finding a fast path. Go through all items the slow way. // < prior slow path here > } O(1) O(n) O(1)

Predicate pushdown — O(n) execution For each of N functions, look up matching one in O(1). This is the goal! O(N) O(1)

Predrag Gruevski @PredragGruevski / @[email protected] https://predr.ag/blog Happy querying!

5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X

About This Presentation

Slide Content

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X

About This Presentation

Slide Content

Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

Slide 20

Slide 21

Slide 22

Slide 23

Slide 24

Slide 25

Slide 26

Slide 27

Slide 28

Slide 29

Slide 30

Slide 31

Slide 32

Slide 33

Slide 34

Slide 35

Slide 36

Slide 37

Slide 38

Slide 39

Slide 40

Slide 41

Slide 42

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47

Slide 48

Slide 49

Slide 50

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

8-top-ai-courses-for-customer-support-representatives-in-2025.pptx

7-essential-ai-courses-for-call-center-supervisors-in-2025.pptx

25-essential-ai-courses-for-user-support-specialists-in-2025.pptx

8-essential-ai-courses-for-insurance-customer-service-representatives-in-2025.pptx

Know for Certain

PPT OPD LES 3ertt4t4tqqqe23e3e3rq2qq232.pptx