ZJIT: Building a Next Generation Ruby JIT

Maxime Chevalier-Boisvert @ RubyKaigi 2025

Program
●A short history of YJIT
●Why a new JIT? Limitations of YJIT
●Two key design changes
●Two major new features of ZJIT
●Current status of the project
●What to expect for Ruby 3.5.0?

A short (and incomplete) history of YJIT

The OG risk-takers
The YJIT (microJIT) project was originally started in 2020 by 3 engineers at Shopify.
Alan Wu
@alanwusx
Aaron Patterson
@tenderlove
Maxime
@Love2Code

MicroJIT prototype
●Before YJIT, we built a small prototype over ~3 months and saw:
○Up to 10% speedup on smaller benchmarks
○MicroJIT outperforming vanilla Ruby by ~1% on the Liquid benchmark
○Underperformance on railsbench (it slowed things down)
●Main weakness: too much time spent entering/leaving JIT code

YJIT - Just-In-Time Compiler for CRuby

●Green light to work on a “real” JIT
●Nine months to deliver:
○Double-digit speedups on optcarrot
○Single-digit speedups on railsbench
○Low warm-up time: deliver speedups in under 30 seconds
●Better success than expected
○“Clear double-digit speedups on realistic benchmarks, sometimes over 20%.”
○“YJIT is able to run SFR and serve real web requests”

RubyKaigi Takeout 2021

Ruby3x3

Thank you team YJIT!
This project would not be possible without my amazing colleagues at Shopify and GitHub!
Alan Wu
@alanwusx
Aaron Patterson
@tenderlove
Maxime
@Love2Code
Noah Gibbs
@codefolio
John Hawthorn
@jhawthorn
Eileen Uchitelle
@eileencodes
Jean Boussier
@byroot
Kevin Newton
@kddnewton
Takashi Kokubun
@k0kubun
Jemma Issroff
@JemmaIssroff
Jimmy Miller
@jimmyhmiller
Adam Hess
@theHessParker
Kevin Menard
@nirvdrum
Randy Stauner
@rwstauner

2025: one slide doesn’t suffice anymore!
New contributors joined and are helping us make Ruby better
Max Bernstein
@tekknolagi
Aiden Fox Ivey
@aidenfoxivey.com
Unstoppable programming wizards!

https://speed.yjit.org as of 2025-04-09

Congratulations! We did it!

We transformed the Ruby
performance landscape!

But…

https://tech.timee.co.jp

THIS MICROBENCHMARK IS BULL****!!!!!1

Ok, but if this benchmark is so basic,
why do we do so poorly?

I asked ChatGPT how to
say “sorry” in Japan

I would like to sincerely apologize to the Ruby
community as well as the people of Japan for YJIT 3.4’s
underwhelming performance.

We could do better
●This is a microbenchmark, but we could do much better
○JS is more than 10x faster on this microbenchmark
○Why couldn’t Ruby match JS’s speed on simple loops?
●Not possible everywhere:
○How much can you speed up string concat?
●But still! What if we could double YJIT’s peak performance?
○Many obvious optimizations that we’re currently not doing
○Those speedups would translate into real workloads too
○Real workloads use loops as well

Survival of the fastest
●Performance is a feature
○People motivated to upgrade to Ruby 3.3 for performance improvements
○History tells us that we can never have enough compute
●It’s also a matter of survival
○If you want Ruby to survive long-term, performance definitely matters
○If your language is not fast, it will eventually die
●The relentless march of progress
○Optimize performance, reduce cost
○Eliminate/remove all bottlenecks
●Let’s not let Ruby ever be a bottleneck
○Let's build a truly world-class Ruby JIT!

Introducing ZJIT
●For the last 2.5 months, the YJIT team has been working on ZJIT
●It’s not just a better YJIT
●Prototype of a next generation Ruby JIT
●Incorporates learnings from 4.5 years of YJIT work
●Designed to be more maintainable/extensible
●Designed to last for the next 20+ years of CRuby!
●I believe ZJIT could be a game changer

How do you pronounce ZJIT?

ZJIT’s core architecture

YJIT’s architecture and limitations
●YJIT compiles MRI’s YARV bytecode into machine code
○Relatively simple architecture
○Good for getting something working quickly
○Originally built with a limited time budget
●Grew incrementally
○Added a cross-platform assembler
○Inlining for small/trivial methods
○Custom codegen for core C methods
○Context metadata compression
●Difficult to extend and improve

ZJIT will be a method-based JIT
●Engineering is all about tradeoffs
○Find the best tradeoff given different constraints
○Fast, good, cheap, pick two
●YJIT was based on Lazy Basic Block Versioning (LBBV)
○New/experimental JIT architecture based on my PhD research
○Goal: build a simple JIT with good performance quickly
●Design change: ZJIT will be a method-based JIT instead
○It won’t be based on LBBV
○This is a deliberate choice on my part

Why not LBBV?
●Could we design a more advanced JIT based on LBBV?
○Maybe! I have some ideas
○This is an open research question
●Building a more advanced LBBV JIT is a research project
○Many unknowns
○Don’t want to impose that risk on the team, Shopify, or the Ruby community
○Ruby is not my personal research project
●Safer bet: traditional, established way to build a JIT
○Known design, risk minimization
○High likelihood of success
○We have many existence proofs

ZJIT: a textbook JIT
●Aiming for ZJIT to have a design that is “more standard”
○Like something you would read about in a compiler textbook
●Want ZJIT to have more “standard” foundations
○This doesn't mean we can't build interesting things on top
○Method-based JIT can be very modular/extensible
○YJIT team published a paper. We can publish again in the future.
●Benefits:
○Low risk: proven design, we know it works
○More accessible to newcomers, Ruby core developers
○Extensible and maintainable

Future research
●A more standard design doesn’t mean “boring”
●Once stable foundations are in place, cool experiments are possible
●PhD advisor Marc Feeley and Olivier Melançon published SBBV
○Static Basic-Block Versioning
○Offshoot/continuation of my PhD work
●SBBV can be done in a method-based context
○Not part of the foundations of the JIT
○Self-contained. Low-risk experiment.

ZJIT’s Intermediate Representation (IR)

YJIT stands for YARV JIT
●YJIT directly compiles YARV bytecode into machine code
○Makes for a simple architecture
○… But it’s not optimal for a JIT
○YARV bytecode is designed for an interpreter
●To maximize interpreter performance, you want bigger instructions
○Minimize dispatch overhead in your interpreter loop
○Give your C compiler bigger chunks of code to optimize
○Create bigger instructions that do more work
○You end up with very “CISC” instructions
●For a JIT compiler, you kind of want the opposite

What a JIT wants
●JIT compilers typically have an Intermediate Representation (IR)
○The IR is how a compiler internally represents code
○It's the compiler's internal "language"
●What you want in a JIT IR:
○Decompose complex semantics into composable primitives
○Smaller instructions that each do fewer things
○Have as little internal control flow as possible
○Makes the code easier to reason about
○Easier to analyze and optimize
○A more minimalistic, “RISC-like” instruction set
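
To make the contrast concrete, here is a minimal Ruby sketch of how one "big" bytecode instruction such as opt_minus might be split into small primitives once profiling has seen Fixnum operands. The primitive names (PatchPoint, GuardType, FixnumSub) are borrowed from the ZJIT IR dump shown in this talk; the `Insn` struct and the `lower_opt_minus` helper are hypothetical and only illustrate the idea, not how ZJIT is actually implemented.

```ruby
# Hypothetical, simplified IR instruction: the value it defines (nil when
# it produces no output), an opcode, and its operands.
Insn = Struct.new(:out, :op, :args) do
  def to_s
    lhs = out ? "#{out} = " : ""
    "#{lhs}#{op} #{args.join(', ')}".strip
  end
end

# Lower one interpreter-style instruction (opt_minus) into primitives.
# Each primitive does exactly one thing: record an assumption, check a
# type, or perform the raw arithmetic.
def lower_opt_minus(left, right, fresh)
  l = fresh.call
  r = fresh.call
  d = fresh.call
  [
    Insn.new(nil, "PatchPoint", ["BOPRedefined(BOP_MINUS)"]),
    Insn.new(l, "GuardType", [left, "Fixnum"]),
    Insn.new(r, "GuardType", [right, "Fixnum"]),
    Insn.new(d, "FixnumSub", [l, r]),
  ]
end

counter = 0
fresh = -> { "v#{counter += 1}" }   # hands out fresh value names
insns = lower_opt_minus("a", "b", fresh)
puts insns.map(&:to_s)
```

Because each primitive is tiny and has no hidden control flow, a later pass could, for example, drop a GuardType when the operand is already known to be a Fixnum.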

Key takeaway: “RISC is good”

Static Single Assignment (SSA)
●ZJIT is going to have an SSA-based IR
●SSA’s key property: instructions produce immutable outputs
○Each IR value can only be assigned once (single assignment!)
○Enables easier algebraic transformations of programs
●I learned about SSA in university (~2006)
○It was developed at IBM in the 1980s
○You can read about it in many compiler textbooks
●Still widely-used across the industry
○It’s probably the most widely-used type of JIT compiler IR
○De facto standard, also used by LLVM and GCC
○Proven to be flexible and robust
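
As a rough illustration of the single-assignment property, the sketch below renames variables in straight-line code so that every value is assigned exactly once. It is a toy under stated assumptions: the tuple format and the `to_ssa` helper are made up for this example, and control flow (which needs phi nodes or block parameters) is left out entirely.

```ruby
# Convert straight-line three-address code into SSA form by giving every
# assignment a fresh, versioned name. Each input tuple is
# [destination, operation, *operands].
def to_ssa(code)
  version = Hash.new(0)   # how many times each variable has been assigned
  current = {}            # variable name -> its latest SSA name

  code.map do |dest, op, *args|
    # Uses refer to the most recent definition; anything without a
    # definition (inputs, literals) passes through unchanged.
    new_args = args.map { |a| current.fetch(a, a) }
    version[dest] += 1
    ssa_name = "#{dest}#{version[dest]}"
    current[dest] = ssa_name
    [ssa_name, op, *new_args]
  end
end

code = [
  ["x", :const, 1],
  ["x", :add, "x", 2],
  ["y", :mul, "x", "x"],
  ["x", :add, "y", 1],
]

ssa = to_ssa(code)
ssa.each { |dest, op, *args| puts "#{dest} = #{op} #{args.join(', ')}" }
```

After renaming, `x` has become three distinct immutable values (`x1`, `x2`, `x3`), so any use can be replaced by its definition without worrying about a later reassignment.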

SSA
POPL 1988

== disasm: #<ISeq:factorial@…:1 (1,0)-(7,3)>
local table (size: 1, argc: 1 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] n@0<Arg>
0000 getlocal_WC_0 n@0 ( 3)[LiCa]
0002 putobject_INT2FIX_1_
0003 opt_le <calldata!mid:<=, argc:1, ARGS_SIMPLE>[CcCr]
0005 branchunless 9
0007 putobject_INT2FIX_1_
0008 leave [Re]
0009 getlocal_WC_0 n@0 ( 6)[Li]
0011 putself
0012 getlocal_WC_0 n@0
0014 putobject_INT2FIX_1_
0015 opt_minus <calldata!mid:-, argc:1, ARGS_SIMPLE>[CcCr]
0017 opt_send_without_block <calldata!mid:factorial, argc:1, FCALL|ARGS_SIMPLE>
0019 opt_mult <calldata!mid:*, argc:1, ARGS_SIMPLE>[CcCr]
0021 leave
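
For reference, the bytecode above corresponds to a recursive factorial method. The source was not on the slide, so the version below is a reconstruction and its exact line layout is an assumption. `RubyVM::InstructionSequence` is CRuby-specific, so the dump is guarded.

```ruby
# Reconstructed from the disassembly: compare n with 1, return 1 in the
# base case, otherwise multiply n by factorial(n - 1).
def factorial(n)
  return 1 if n <= 1
  n * factorial(n - 1)
end

puts factorial(5)

# On CRuby, print the YARV bytecode for the method. Offsets and line
# numbers will differ from the slide depending on version and layout.
if defined?(RubyVM::InstructionSequence)
  puts RubyVM::InstructionSequence.of(method(:factorial)).disasm
end
```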

fn factorial:
bb0(v0:BasicObject):
  v2:Fixnum[1] = Const Value(1)
  PatchPoint BOPRedefined(INTEGER_REDEFINED_OP_FLAG, BOP_LE)
  v5:Fixnum = GuardType v0, Fixnum
  v7:BoolExact = FixnumLe v5, v2
  v8:CBool = Test v7
  IfFalse v8, bb1(v0)
  v10:Fixnum[1] = Const Value(1)
  Return v10
bb1(v12:BasicObject):
  v14:BasicObject = PutSelf
  v15:Fixnum[1] = Const Value(1)
  PatchPoint BOPRedefined(INTEGER_REDEFINED_OP_FLAG, BOP_MINUS)
  v18:Fixnum = GuardType v12, Fixnum
  v20:Fixnum = FixnumSub v18, v15
  PatchPoint MethodRedefined(Object@0x10310ef60, factorial@0xb481)
  v29:BasicObject[VALUE(0x10312cda8)] = GuardBitEquals v14, VALUE(0x10312cda8)
  v30:BasicObject = SendWithoutBlockDirect v29, :factorial (0x16d3ce660), v20
  PatchPoint CalleeModifiedLocals(v30)
  v25:BasicObject = SendWithoutBlock v12, :*, v30
  PatchPoint CalleeModifiedLocals(v25)
  Return v25

First key new feature of ZJIT

CRuby calls are complex and slow
●Calls in CRuby (and YJIT) are very expensive
○Many complex corner cases to handle
○Two stacks (value stack and CFP stack)
○Lots of work setting up complex CFP object
●YJIT can run a calls microbenchmark 8x to 12x faster than CRuby
○But this is still well below what is possible
○The bulk of the code YJIT generates is method calls!
○Massive code size is bad for i-cache, instruction decoding, etc.
●We need to do something about this
○Other languages have much faster calls, Ruby should too!

Fast JIT-to-JIT calls
●YJIT calls are suboptimal
○Uses jmp CPU instructions
○Long code sequence to set up VM stack, CFP object
●To get max performance
○Need to use CPU’s call / ret instructions
○Use only the C stack, not the Ruby VM & CFP stacks
●Design ZJIT to use C stack for JIT-to-JIT calls
○Frame unwinding will be done using C return addresses
○Upside: much faster calls
○Downside: exiting to the interpreter is slower
●Engineering is all about tradeoffs
○The Shinkansen is fast but it can’t do sharp turns

Early experiments
●Takashi Kokubun got ZJIT calls using call/ret instructions working
○PR merged just a few days ago!
●Super early experiment: recursive fibonacci microbenchmark
○This is not how you should compute fibonacci numbers
○Very intensive use of method calls, recursion

# Interpreter (master)
$ ruby -v ~/tmp/fib.rb
ruby 3.5.0dev (2025-04-13T07:55:52Z send-iseq e84b495b38) +PRISM [x86_64-linux]
0.117s

# YJIT (3.4.2)
$ ruby -v --yjit-call-threshold=1 ~/tmp/fib.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [x86_64-linux]
0.016s

# ZJIT (master)
$ ruby -v --zjit-call-threshold=34 --zjit-num-profiles=3 ~/tmp/fib.rb
ruby 3.5.0dev (2025-04-13T07:55:52Z send-iseq e84b495b38) +ZJIT +PRISM [x86_64-linux]
0.010s
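
The fib.rb script itself is not shown in the talk. A minimal sketch of what such a microbenchmark might look like follows; the problem size and the timing code are assumptions, so it will not reproduce the numbers above.

```ruby
# Naive recursive Fibonacci: exponential time, dominated by method calls,
# which is exactly why it stresses the JIT's call path.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Problem size is an assumption; pick one that runs long enough to measure.
n = 25

start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
result = fib(n)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

puts "fib(#{n}) = #{result}"
puts format("%.3fs", elapsed)
```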

Second key new feature of ZJIT

JIT compilation is a balancing act
●Must juggle multiple tradeoffs
○Performance of the generated code
○Warmup time: how long the JIT compiler takes to generate code
○Memory overhead: how much memory the JIT compiler uses
●The JIT compiler runs at the same time as your program
○Competing for resources with the running program
○The JIT has to “pay for itself”
●Effectively operating in a resource-constrained environment
○Have to minimize memory and CPU time usage
○In doing so, you limit what the JIT compiler can do

Reusing compilation work
●Redeploying code multiple times a day on 3 bazillion servers
○99.99% of the code never changes between deploys
○What if we could save/reuse compilation work?
●Could we serialize/persist machine code?
●Advantages:
○Can “hit the ground running”, warm up much faster
○Can potentially spend more time compiling code
○Higher optimization levels become more worthwhile
●Challenges:
○We need to save some metadata as well
○Save de-optimization information
○Compiled code contains pointers (e.g. ISEQs, Ruby strings)
●Not trivial, but definitely not impossible either
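
One common way to handle the pointer problem is a relocation table: persist the code with placeholder slots, plus a list saying where each pointer lives and what it refers to, then patch the slots at load time. The sketch below shows that idea in plain Ruby. It is not ZJIT's design; `PersistedCode`, `persist`, `load_code`, and the symbolic name `:iseq_factorial` are all hypothetical, and the code bytes and address are purely illustrative.

```ruby
# Hypothetical persisted form: raw code bytes plus a relocation table of
# [byte offset, symbolic target] pairs.
PersistedCode = Struct.new(:bytes, :relocs)

# "Compile": concatenate code chunks, emitting an 8-byte zeroed slot
# wherever a Symbol stands in for a pointer.
def persist(chunks)
  bytes = "".b
  relocs = []
  chunks.each do |chunk|
    if chunk.is_a?(Symbol)
      relocs << [bytes.bytesize, chunk]   # remember slot offset and target
      bytes << [0].pack("Q<")             # placeholder, patched at load time
    else
      bytes << chunk.b
    end
  end
  PersistedCode.new(bytes, relocs)
end

# "Load": patch every slot with the address its target has in this process.
def load_code(persisted, address_of)
  bytes = persisted.bytes.dup
  persisted.relocs.each do |offset, symbol|
    bytes[offset, 8] = [address_of.fetch(symbol)].pack("Q<")
  end
  bytes
end

code = persist(["\x48\xB8", :iseq_factorial, "\xC3"])
# Addresses differ from process to process; this value is made up.
loaded = load_code(code, { iseq_factorial: 0x10310ef60 })
patched = loaded[2, 8].unpack1("Q<")
puts format("0x%x", patched)
```

The same table could also carry the de-optimization metadata mentioned above, since that too refers to objects whose addresses change between runs.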

ZJIT’s current status

Where is ZJIT today?
●We started development just ~2.5 months ago
○We’ve implemented a custom SSA IR
○Comparisons, fixnum operations
○Control flow: if-else, while loops
○Method calls
○Constant/type propagation, dead code elimination
●We can run simple microbenchmarks
○Like the recursive factorial and fibonacci
●We’re faster than the interpreter!
●Faster than YJIT on some microbenchmarks
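
To give a feel for two of the optimizations listed above, here is a toy constant-folding pass followed by dead code elimination over a tiny SSA-style instruction list. This is a sketch under simplifying assumptions (straight-line code only, `:return` as the only side effect); the tuple format and helper names are invented for the example and do not reflect ZJIT's actual IR.

```ruby
# Each instruction is [name, op, *operands]. Operands are Integer
# literals or the names of earlier values.
def fold_constants(insns)
  known = {}   # SSA name -> constant value, once it can be computed
  insns.map do |name, op, *args|
    vals = args.map { |a| known.fetch(a, a) }
    if op == :const
      known[name] = vals[0]
      [name, :const, vals[0]]
    elsif %i[add mul].include?(op) && vals.all?(Integer)
      known[name] = op == :add ? vals[0] + vals[1] : vals[0] * vals[1]
      [name, :const, known[name]]
    else
      [name, op, *vals]
    end
  end
end

# Walk backwards, keeping side-effecting instructions and anything whose
# result is used by an instruction that was already kept.
def eliminate_dead_code(insns)
  live = []
  used = {}
  insns.reverse_each do |name, op, *args|
    next unless op == :return || used[name]
    args.each { |a| used[a] = true }
    live.unshift([name, op, *args])
  end
  live
end

program = [
  ["v0", :param],
  ["v1", :const, 2],
  ["v2", :const, 3],
  ["v3", :mul, "v1", "v2"],   # foldable: 2 * 3
  ["v4", :add, "v0", "v3"],
  ["v5", :add, "v0", "v1"],   # result is never used
  [nil, :return, "v4"],
]

optimized = eliminate_dead_code(fold_constants(program))
optimized.each { |insn| p insn }
```

Single assignment is what makes both passes this short: a name always means the same value, so folding and liveness need no reasoning about reassignment.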

“I agree with making ZJIT upstream. And I feel
no worry about the migration, since I trust the
team with merging process.”

Upstreaming ZJIT
●Discussed with Ruby core developers on Tuesday
●Blessing to upstream from Matz ⛪ ♦
●Upstreaming should happen in the next 3-5 weeks
●Command line option to enable as with YJIT:
ruby --zjit
●Keep in mind that right now, ZJIT is not at all in a usable state
○Still in the process of laying down the foundations
○ZJIT should be more usable around Q3/Q4 2025

What to expect for Ruby 3.5.0?
●YJIT will still be available in Ruby 3.5.0
○We won’t remove YJIT until we’re confident ZJIT is faster and just as stable
●ZJIT should be included as part of Ruby 3.5.0
●You may need to run ./configure --enable-zjit to build it
●It may be possible to build both YJIT and ZJIT in the same binary (TBD)
●We’re hoping to match YJIT’s performance for this release
○We’ll likely beat YJIT on more microbenchmarks very soon
○However, beating YJIT on larger, more realistic benchmarks will take some time

Next steps
●Fast JIT-to-JIT calls (Thanks Kokubun!)
●Ability to side-exit to the interpreter (next few weeks)
○Explained in Kokubun’s talk yesterday
○Enable more extensive testing (CRuby test suite)
○Benchmarking on speed.yjit.org
●Implement polymorphic inline caches
●Gradually grow what ZJIT supports (lots of work)
●Measure and optimize compilation speed
●Tuning the compiler, adjusting thresholds etc.
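
For readers unfamiliar with polymorphic inline caches, the idea is that each call site remembers the method it resolved for the last few receiver classes it saw, so repeat calls skip the full lookup. The Ruby sketch below models that at a high level. It is an illustration only: the class, its `MAX_ENTRIES` limit, and the hit/miss counters are assumptions, it ignores singleton methods and invalidation on method redefinition, and a real JIT would emit machine-code class checks rather than a hash lookup.

```ruby
# Hypothetical per-call-site cache mapping receiver class -> resolved method.
class PolymorphicInlineCache
  MAX_ENTRIES = 4   # assumption: beyond this, stop caching new classes

  attr_reader :hits, :misses

  def initialize(method_name)
    @method_name = method_name
    @entries = {}
    @hits = 0
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    target = @entries[klass]
    if target
      @hits += 1
    else
      @misses += 1
      target = klass.instance_method(@method_name)   # slow-path lookup
      @entries[klass] = target if @entries.size < MAX_ENTRIES
    end
    target.bind(receiver).call(*args)
  end
end

site = PolymorphicInlineCache.new(:to_s)
results = [1, "two", :three, 4, "five"].map { |obj| site.call(obj) }
p results
puts "hits=#{site.hits} misses=#{site.misses}"
```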

Thank you for listening! :)

To learn more…
●Follow our work on the Rails At Scale blog:
○https://railsatscale.com/
●Come talk to us after this talk!
○Ufuk, Kokubun, Max Bernstein, Alan, Aaron & myself
●Shopify’s Ruby and Rails infrastructure is hiring!
○Compiler experts for ZJIT
○C/Rust systems programmers
○World-class Ruby & Rails experts
○Use our QR code to apply!
https://candidate.shopify.com/form/ruby-kaigi-2025
