ZJIT: Building a Next Generation Ruby JIT

maximechevalierboisv1 555 views 94 slides May 06, 2025

About This Presentation

YJIT can make Ruby code run faster, but this is a balancing act, because the JIT compiler itself must consume both memory and CPU cycles to compile and optimize your code while it is running. Furthermore, in large-scale production environments such as those of GitHub, Shopify and Stripe, we end up i...


Slide Content

Maxime Chevalier-Boisvert @ RubyKaigi 2025
ZJIT: Building a Next Generation Ruby JIT

Program
●A short history of YJIT
●Why a new JIT? Limitations of YJIT
●Two key design changes
●Two major new features of ZJIT
●Current status of the project
●What to expect for Ruby 3.5.0?

A short (and incomplete) history of YJIT

The OG risk-takers
The YJIT (microJIT) project was originally started in 2020 by 3 engineers at Shopify.
Alan Wu
@alanwusx
Aaron Patterson
@tenderlove
Maxime
@Love2Code

MicroJIT prototype
●Before YJIT, small prototype, ~3 months, we had:
○Up to 10% speedup on smaller benchmarks
○MicroJIT outperforming vanilla Ruby by ~1% on Liquid benchmark
○Underperformed on railsbench, slowed things down
●Main weakness: spent too much time entering/leaving JIT code

YJIT - Just-In-Time Compiler for CRuby


●Green light to work on a “real” JIT
●Nine months to deliver:
○Double-digit speedups on optcarrot
○Single-digit speedups on railsbench
○Low warm-up time: deliver speedups in under 30 seconds
●Better success than expected
○“Clear double-digit speedups on realistic benchmarks, sometimes over 20%.”
○“YJIT is able to run SFR and serve real web requests”

RubyKaigi Takeout 2021

Ruby3x3

Thank you team YJIT!
This project would not be possible without my amazing colleagues at Shopify and GitHub!
Alan Wu
@alanwusx
Aaron Patterson
@tenderlove
Maxime
@Love2Code
Noah Gibbs
@codefolio
John Hawthorn
@jhawthorn
Eileen Uchitelle
@eileencodes
Jean Boussier
@byroot
Kevin Newton
@kddnewton
Takashi Kokubun
@k0kubun
Jemma Issroff
@JemmaIssroff
Jimmy Miller
@jimmyhmiller
Adam Hess
@theHessParker
Kevin Menard
@nirvdrum
Randy Stauner
@rwstauner

Max Bernstein
@tekknolagi
Aiden Fox Ivey
@aidenfoxivey.com
2025: one slide doesn’t suffice anymore!
New contributors joined and are helping us make Ruby better
The unstoppable programming wizards!

https://speed.yjit.org as of 2025-04-09

Congratulations! We did it!

We transformed the Ruby
performance landscape!

But…

https://tech.timee.co.jp

THIS MICROBENCHMARK IS BULL****!!!!!1

Ok, but if this benchmark is so basic,
why do we do so poorly?

I asked ChatGPT how to
say “sorry” in Japan

I would like to sincerely apologize to the Ruby
community as well as the people of Japan for YJIT 3.4’s
underwhelming performance.

We could do better
●This is a microbenchmark, but we could do much better
○JS is more than 10x faster on this microbenchmark
○Why couldn’t Ruby match JS’s speed on simple loops?
●Not possible everywhere:
○How much can you speed up string concat?
●But still! What if we could double YJIT’s peak performance?
○Many obvious optimizations that we’re currently not doing
○Those speedups would translate into real workloads too
○Real workloads use loops as well

Survival of the fastest
●Performance is a feature
○People motivated to upgrade to Ruby 3.3 for performance improvements
○History tells us that we can never have enough compute
●It’s also a matter of survival
○If you want Ruby to survive long-term, performance definitely matters
○If your language is not fast, it will eventually die
●The relentless march of progress
○Optimize performance, reduce cost
○Eliminate/remove all bottlenecks
●Let’s not let Ruby ever be a bottleneck
○Let's build a truly world-class Ruby JIT!

Introducing ZJIT
●For the last 2.5 months, the YJIT team has been working on ZJIT
●It’s not just a better YJIT
●Prototype of a next generation Ruby JIT
●Incorporates learnings from 4.5 years of YJIT work
●Designed to be more maintainable/extensible
●Designed to last for next 20+ years of CRuby!
●I believe ZJIT could be a game changer

How do you pronounce ZJIT?

ZJIT’s core architecture

YJIT’s architecture and limitations
●YJIT compiles MRI’s YARV bytecode into machine code
○Relatively simple architecture
○Good for getting something working quickly
○Originally built with a limited time budget
●Grew incrementally
○Added a cross-platform assembler
○Inlining for small/trivial methods
○Custom codegen for core C methods
○Context metadata compression
●Difficult to extend and improve

ZJIT will be a method-based JIT
●Engineering is all about tradeoffs
○Find the best tradeoff given different constraints
○Fast, good, cheap, pick two
●YJIT was based on Lazy Basic Block Versioning (LBBV)
○New/experimental JIT architecture based on my PhD research
○Goal: build a simple JIT with good performance quickly
●Design change: ZJIT will be a method-based JIT instead
○It won’t be based on LBBV
○This is a deliberate choice on my part

Why not LBBV?
●Could we design a more advanced JIT based on LBBV?
○Maybe! I have some ideas
○This is an open research question
●Building a more advanced LBBV JIT is a research project
○Many unknowns
○Don’t want to impose that risk on team, Shopify, the Ruby community
○Ruby is not my personal research project
●Safer bet: traditional, established way to build a JIT
○Known design, risk minimization
○High likelihood of success
○We have many existence proofs

ZJIT: a textbook JIT
●Aiming for ZJIT to have a design that is “more standard”
○Like something you would read about in a compiler textbook
●Want ZJIT to have more “standard” foundations
○This doesn't mean we can't build interesting things on top
○Method-based JIT can be very modular/extensible
○YJIT team published a paper. We can publish again in the future.
●Benefits:
○Low risk: proven design, we know it works
○More accessible to newcomers, Ruby core developers
○Extensible and maintainable

Future research
●A more standard design doesn’t mean “boring”
●Once stable foundations are in place, cool experiments are possible
●PhD advisor Marc Feeley and Olivier Melançon published SBBV
○Static Basic-Block Versioning
○Offshoot/continuation of my PhD work
●SBBV can be done in a method-based context
○Not part of the foundations of the JIT
○Self-contained. Low-risk experiment.

ZJIT’s Intermediate Representation (IR)

YJIT stands for YARV JIT
●YJIT directly compiles YARV bytecode into machine code
○Makes for a simple architecture
○… But it’s not optimal for a JIT
○YARV bytecode is designed for an interpreter
●To maximize interpreter performance, you want bigger instructions
○Minimize dispatch overhead in your interpreter loop
○Give your C compiler bigger chunks of code to optimize
○Create bigger instructions that do more work
○You end up with very “CISC” instructions
●For a JIT compiler, you kind of want the opposite

What a JIT wants
●JIT compilers typically have an Intermediate Representation (IR)
○The IR is how a compiler internally represents code
○It's the compiler's internal "language"
●What you want in a JIT IR:
○Decompose complex semantics into composable primitives
○Smaller instructions that each do fewer things
○Have as little internal control flow as possible
○Makes the code easier to reason about
○Easier to analyze and optimize
○A more minimalistic, “RISC-like” instruction set
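
To make the "composable primitives" idea concrete, here is a minimal sketch in Ruby of lowering one generic comparison instruction into several small steps. The instruction names (PatchPoint, GuardType, FixnumLe) are borrowed from the IR dump shown later in the talk; the `Insn` struct and `lower_opt_le` function are hypothetical and are not ZJIT's actual code.

```ruby
# Toy illustration only: a hypothetical lowering step, not ZJIT's real one.
Insn = Struct.new(:op, :args)

# Split one "CISC-like" bytecode op (a generic <=) into single-purpose steps.
def lower_opt_le(lhs, rhs)
  [
    Insn.new(:PatchPoint, [:BOP_LE]),       # assumption: Integer#<= is not redefined
    Insn.new(:GuardType,  [lhs, :Fixnum]),  # leave JIT code unless lhs is a Fixnum
    Insn.new(:GuardType,  [rhs, :Fixnum]),  # same check for rhs
    Insn.new(:FixnumLe,   [lhs, rhs]),      # the comparison itself, no internal branches
  ]
end

lower_opt_le(:n, :one).each { |insn| puts "#{insn.op} #{insn.args.join(', ')}" }
```

Each resulting step does one thing, so an optimizer can remove or reorder steps individually, for example dropping a guard when the type is already known.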

Key takeaway: “RISC is good”

Static Single Assignment (SSA)
●ZJIT is going to have an SSA-based IR
●SSA’s key property: instructions produce immutable outputs
○Each IR value can only be assigned once (single assignment!)
○Enables easier algebraic transformations of programs
●I learned about SSA in university (~2006)
○It was developed at IBM in the 1980s
○You can read about it in many compiler textbooks
●Still widely-used across the industry
○It’s probably the most widely-used type of JIT compiler IR
○De facto standard, also used by LLVM and GCC
○Proven to be flexible and robust
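
The single-assignment property can be sketched with a small renaming pass. This is a minimal illustration for straight-line code only, using a made-up instruction format (`[dest, op, *operands]`); real SSA construction must also handle control-flow merges, which the IR dump in this talk appears to do by passing values as block arguments.

```ruby
# Rename straight-line assignments so every name is assigned exactly once.
# Input format (hypothetical): [dest, op, *operands]; Symbol operands are reads.
def to_ssa(code)
  version = Hash.new(0)  # how many times each variable has been assigned
  current = {}           # variable -> its latest SSA name
  code.map do |dest, op, *operands|
    # Reads refer to the most recent version of each variable.
    renamed = operands.map { |o| o.is_a?(Symbol) ? current.fetch(o, o) : o }
    version[dest] += 1
    current[dest] = :"#{dest}#{version[dest]}"
    [current[dest], op, *renamed]
  end
end

program = [
  [:x, :const, 1],
  [:x, :add, :x, 2],  # x is reassigned here
  [:y, :mul, :x, :x],
]
to_ssa(program).each { |insn| p insn }
```

After renaming, the second write to `x` becomes a fresh value `x2`, so any later use unambiguously names the definition it depends on.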

SSA
POPL 1988

== disasm: #<ISeq:[email protected]:1 (1,0)-(7,3)>
local table (size: 1, argc: 1 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] n@0<Arg>
0000 getlocal_WC_0 n@0 ( 3)[LiCa]
0002 putobject_INT2FIX_1_
0003 opt_le <calldata!mid:<=, argc:1, ARGS_SIMPLE>[CcCr]
0005 branchunless 9
0007 putobject_INT2FIX_1_
0008 leave [Re]
0009 getlocal_WC_0 n@0 ( 6)[Li]
0011 putself
0012 getlocal_WC_0 n@0
0014 putobject_INT2FIX_1_
0015 opt_minus <calldata!mid:-, argc:1, ARGS_SIMPLE>[CcCr]
0017 opt_send_without_block <calldata!mid:factorial, argc:1, FCALL|ARGS_SIMPLE>
0019 opt_mult <calldata!mid:*, argc:1, ARGS_SIMPLE>[CcCr]
0021 leave
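
A dump like the one above can be produced on CRuby with `RubyVM::InstructionSequence`. The Ruby source of `factorial` is not included in the slides, so the method body below is a reconstruction inferred from the bytecode and should be treated as an assumption; exact instruction names and operands also vary between Ruby versions.

```ruby
# Hypothetical reconstruction of the method behind the bytecode dump.
def factorial(n)
  if n <= 1
    return 1
  end
  n * factorial(n - 1)
end

# RubyVM::InstructionSequence is specific to CRuby (MRI).
iseq = RubyVM::InstructionSequence.of(method(:factorial))
puts iseq.disasm
```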

fn factorial:
bb0(v0:BasicObject):
  v2:Fixnum[1] = Const Value(1)
  PatchPoint BOPRedefined(INTEGER_REDEFINED_OP_FLAG, BOP_LE)
  v5:Fixnum = GuardType v0, Fixnum
  v7:BoolExact = FixnumLe v5, v2
  v8:CBool = Test v7
  IfFalse v8, bb1(v0)
  v10:Fixnum[1] = Const Value(1)
  Return v10
bb1(v12:BasicObject):
  v14:BasicObject = PutSelf
  v15:Fixnum[1] = Const Value(1)
  PatchPoint BOPRedefined(INTEGER_REDEFINED_OP_FLAG, BOP_MINUS)
  v18:Fixnum = GuardType v12, Fixnum
  v20:Fixnum = FixnumSub v18, v15
  PatchPoint MethodRedefined(Object@0x10310ef60, factorial@0xb481)
  v29:BasicObject[VALUE(0x10312cda8)] = GuardBitEquals v14, VALUE(0x10312cda8)
  v30:BasicObject = SendWithoutBlockDirect v29, :factorial (0x16d3ce660), v20
  PatchPoint CalleeModifiedLocals(v30)
  v25:BasicObject = SendWithoutBlock v12, :*, v30
  PatchPoint CalleeModifiedLocals(v25)
  Return v25

First key new feature of ZJIT

CRuby calls are complex and slow
●Calls in CRuby (and YJIT) are very expensive
○Many complex corner cases to handle
○Two stacks (value stack and CFP stack)
○Lots of work setting up complex CFP object
●YJIT can run a calls microbenchmark 8x to 12x faster than CRuby
○But this is still well below what is possible
○The bulk of the code YJIT generates is method calls!
○Massive code size is bad for i-cache, instruction decoding, etc.
●We need to do something about this
○Other languages have much faster calls, Ruby should too!

Fast JIT-to-JIT calls
●YJIT calls are suboptimal
○Uses jmp CPU instructions
○Long code sequence to setup VM stack, CFP object
●To get max performance
○Need to use CPU’s call / ret instructions
○Use only the C stack, not the Ruby VM & CFP stacks
●Design ZJIT to use C stack for JIT-to-JIT calls
○Frame unwinding will be done using C return addresses
○Upside: much faster calls
○Downside: exiting to the interpreter is slower
●Engineering is all about tradeoffs
○The Shinkansen is fast but it can’t do sharp turns

Early experiments
●Takashi Kokubun got ZJIT calls using call/ret insns working
○PR merged just a few days ago!
●Super early experiment: recursive fibonacci microbenchmark
○This is not how you should compute fibonacci numbers
○Very intensive use of method calls, recursion

# Interpreter (master)
$ ruby -v ~/tmp/fib.rb
ruby 3.5.0dev (2025-04-13T07:55:52Z send-iseq e84b495b38) +PRISM [x86_64-linux]
0.117s

# YJIT (3.4.2)
$ ruby -v --yjit-call-threshold=1 ~/tmp/fib.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [x86_64-linux]
0.016s

# ZJIT (master)
$ ruby -v --zjit-call-threshold=34 --zjit-num-profiles=3 ~/tmp/fib.rb
ruby 3.5.0dev (2025-04-13T07:55:52Z send-iseq e84b495b38) +ZJIT +PRISM [x86_64-linux]
0.010s
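
The contents of `fib.rb` are not shown in the slides. A minimal sketch of what such a microbenchmark might look like follows; the argument `25` and the output format are assumptions, so timings from this script are not comparable to the numbers above.

```ruby
# Naive recursive Fibonacci: deliberately call-heavy, not an efficient algorithm.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Time a single run with a monotonic clock.
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
result = fib(25)  # assumed input size; the talk does not state one
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
puts format("fib(25) = %d in %.3fs", result, elapsed)
```

Almost all of the running time goes into method calls and returns, which is why this kind of benchmark is sensitive to call overhead.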

Second key new feature of ZJIT

JIT compilation is a balancing act
●Must juggle multiple tradeoffs
○Performance of the generated code
○Warmup time: how long the JIT compiler takes to generate code
○Memory overhead: how much memory the JIT compiler uses
●The JIT compiler runs at the same time as your program
○Competing for resources with the running program
○The JIT has to “pay for itself”
●Effectively operating in a resource-constrained environment
○Have to minimize memory and CPU time usage
○In doing so, you limit what the JIT compiler can do

Reusing compilation work
●Redeploying code multiple times a day on 3 bazillion servers
○99.99% of the code never changes between deploys
○What if we could save/reuse compilation work?
●Could we serialize/persist machine code?
●Advantages:
○Can “hit the ground running”, warm up much faster
○Can potentially spend more time compiling code
○Higher optimization levels become more worthwhile
●Challenges:
○We need to save some metadata as well
○Save de-optimization information
○Compiled code contains pointers (e.g. ISEQs, Ruby strings)
●Not trivial, but definitely not impossible either
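
As an analogy only: CRuby can already serialize and reload YARV bytecode through `RubyVM::InstructionSequence#to_binary` and `.load_from_binary`. The sketch below shows that round trip. It persists bytecode, not JIT-generated machine code, and says nothing about how ZJIT will implement this; machine code additionally needs the metadata, de-optimization information and pointer handling listed above.

```ruby
# Compile once, serialize, and later reload without recompiling the source.
iseq = RubyVM::InstructionSequence.compile("[1, 2, 3].sum * 2")
blob = iseq.to_binary  # String of serialized bytecode

# ... blob could be written to disk here and read back on the next boot ...

restored = RubyVM::InstructionSequence.load_from_binary(blob)
puts restored.eval
```

The serialized form is tied to the Ruby build that produced it, which hints at the kind of compatibility checks a persisted-machine-code cache would also need.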

ZJIT’s current status

Where is ZJIT today?
●We started development just ~2.5 months ago
○We’ve implemented a custom SSA IR
○Comparisons, fixnum operations
○Control flow: if-else, while loops
○Method calls
○Constant/type propagation, dead code elimination
●We can run simple microbenchmarks
○Like the recursive factorial and fibonacci
●We’re faster than the interpreter!
●Faster than YJIT on some microbenchmarks
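
To illustrate what constant propagation and dead code elimination do on an SSA IR, here is a toy sketch in Ruby. The program format (`name => [op, *operands]`) and the ops are made up for this example and are not ZJIT's actual IR or passes; a real pass must also keep instructions that have side effects.

```ruby
# Replace :add instructions whose operands are all known constants.
def fold_constants(prog)
  known = {}  # SSA name -> constant value discovered so far
  prog.to_h do |name, (op, *args)|
    vals = args.map { |a| a.is_a?(Symbol) ? known[a] : a }
    if op == :const
      known[name] = args.first
    elsif op == :add && vals.all?(Integer)
      known[name] = vals.sum
    end
    [name, known.key?(name) ? [:const, known[name]] : [op, *args]]
  end
end

# Keep only the instructions that the result transitively depends on.
def eliminate_dead_code(prog, result)
  live = []
  work = [result]
  until work.empty?
    name = work.pop
    next if live.include?(name)
    live << name
    _op, *args = prog.fetch(name)
    work.concat(args.grep(Symbol))
  end
  prog.select { |name, _| live.include?(name) }
end

prog = {
  v0: [:param],
  v1: [:const, 2],
  v2: [:const, 3],
  v3: [:add, :v1, :v2],  # foldable to const 5
  v4: [:add, :v0, :v0],  # never used: dead
  v5: [:mul, :v0, :v3],
}
optimized = eliminate_dead_code(fold_constants(prog), :v5)
optimized.each { |name, insn| puts "#{name} = #{insn.join(' ')}" }
```

Because each SSA value is assigned once, "is this value a known constant?" and "is this value used?" are simple lookups, which is what keeps both passes this short.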

“I agree with making ZJIT upstream. And I feel
no worry about the migration, since I trust the
team with merging process.”

Upstreaming ZJIT
●Discussed with Ruby core developers on Tuesday
●Blessing to upstream from Matz ⛪ ♦
●Upstreaming should happen in the next 3-5 weeks
●Command line option to enable as with YJIT:
ruby --zjit
●Keep in mind that right now, ZJIT is not at all in a usable state
○Still in the process of laying down the foundations
○ZJIT should be more usable around Q3/Q4 2025
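
One way to check from inside a script which JIT a process was started with is to inspect the version banner, which carries `+YJIT` or `+ZJIT` markers in the `ruby -v` outputs shown earlier. The helper name `jit_in_use` is hypothetical, and relying on the banner format is an assumption based on those outputs rather than a documented interface.

```ruby
# Classify a version banner such as RUBY_DESCRIPTION by its JIT marker.
def jit_in_use(description = RUBY_DESCRIPTION)
  return :zjit if description.include?("+ZJIT")
  return :yjit if description.include?("+YJIT")
  :none
end

puts jit_in_use
```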

What to expect for Ruby 3.5.0?
●YJIT will still be available in Ruby 3.5.0
○We won’t remove YJIT until we’re confident ZJIT is faster and just as stable
●ZJIT should be included as part of Ruby 3.5.0
●You may need to run ./configure --enable-zjit to build it
●It may be possible to build both YJIT and ZJIT in the same binary (TBD)
●We’re hoping to match YJIT’s performance for this release
○We’ll likely beat YJIT on more microbenchmarks very soon
○However, beating YJIT on larger, more realistic benchmarks will take some time

Next steps
●Fast JIT-to-JIT calls (Thanks Kokubun!)
●Ability to side-exit to the interpreter (next few weeks)
○Explained in Kokubun’s talk yesterday
○Enable more extensive testing (CRuby test suite)
○Benchmarking on speed.yjit.org
●Implement polymorphic inline caches
●Gradually grow what ZJIT supports (lots of work)
●Measure and optimize compilation speed
●Tuning the compiler, adjusting thresholds etc.

Thank you for listening! :)

To learn more…
●Follow our work on the Rails At Scale blog:
○https://railsatscale.com/
●Come talk to us after this talk!
○Ufuk, Kokubun, Max Bernstein, Alan, Aaron & myself
●Shopify’s Ruby and Rails infrastructure is hiring!
○Compiler experts for ZJIT
○C/Rust systems programmers
○World-class Ruby & Rails experts
○Use our QR code to apply!
https://candidate.shopify.com/form/ruby-kaigi-2025