How difficult is to get a JIT right? Talk from ESUG 2024

esug 65 views 36 slides Sep 17, 2024

About This Presentation

Talk from ESUG 2024: How difficult is to get a JIT right?

PDF: http://archive.esug.org/ESUG2024/day3/02-guille-vm-profiling.pdf


Slide Content

How difficult is to get a JIT right?
Guillermo Polito - ESUG’24
[email protected]
Evref

Quick About Me: Guille
2
•Pronounced giʃe (guichet in FR, ~ghisheh in EN?)
•Now: Researcher at Inria - Lille
•Pharo Contributor since ~2010
•Keywords: compilers, testing, test generation
•Interests: tooling, benchmarking, ???, board games, batman, concurrency
If any of that interests you, come talk to me!
[email protected]
@guillep

Debugging Assembly Code
3
AARCH64, X64, IA32

Debugging Assembly Code
Without looking at it
4
AARCH64, X64, IA32

5
The Pharo VM
INTERPRETER
JIT COMPILER
GARBAGE COLLECTOR
Back
Front
FFI
Concurrency

Context: Druid JIT compiler generation
6
DRUID
Input
Output
•Druid is performed at VM building time (AoT)
•VM developers avoid writing and maintaining the frontend of the JIT compiler code (language dependencies)

Druid by example: the addition primitive
8
JIT Compiler
Interpreter

A Couple of Months Ago
9
INTERPRETER
JIT COMPILER
223 bytecodes and 10 primitives
225 bytecodes and 130 primitives
46%
70%

Generated JIT-Compiler
11

Some Initial Benchmarks
13
Benchmarks: Chameneos, KNucleotide, RegexDNA, Richards, File Tests, Kernel Tests, Opal Tests

On par with the interpreter
14

Slightly faster?
15

And much slower too!
16

Where does the time go?
17

Analysing Instruments Profiles
18
$ xctrace export
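A minimal sketch of how the export step could be scripted. The slide only shows `$ xctrace export`; the helper name, the trace file name, and the XPath below are assumptions. In particular, the table schema to select depends on the Instruments template used for the recording, so it is best to inspect the table of contents first.

```python
import shlex

def xctrace_export_command(trace_path, xpath=None):
    """Build the argument list for `xctrace export` (hypothetical helper).

    Without an XPath, request the trace's table of contents (--toc);
    with one, export only the selected table as XML.
    """
    cmd = ["xcrun", "xctrace", "export", "--input", trace_path]
    if xpath is None:
        cmd.append("--toc")
    else:
        cmd += ["--xpath", xpath]
    return cmd

# Assumed XPath: the schema name must be checked against the --toc output.
TIME_PROFILE_XPATH = '/trace-toc/run[@number="1"]/data/table[@schema="time-profile"]'

toc_cmd = xctrace_export_command("vm-run.trace")
samples_cmd = xctrace_export_command("vm-run.trace", TIME_PROFILE_XPATH)

# Print the shell commands; on macOS with Xcode installed they could be run
# with subprocess.run(cmd, capture_output=True, text=True, check=True).
print(shlex.join(toc_cmd))
print(shlex.join(samples_cmd))
```

The exported XML can then be parsed into one stack trace per sample.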

Analyzing Samples
22
Sample 1
Sample 2
…
Sample N
Stack Traces
Primitive from Machine Code
Interpreter

Group Traces Using Heuristics
23
Sample 1
Sample 2


Sample N
JIT compilation
Interpreter
GC
Sample 1
Sample 7
Sample 2
Sample 18992


High-level VM Profile
•Time spent in
•Interpreter
•JIT compilation
•JIT compiled code
•GC
•Primitives
•…
24
Some Bench
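The grouping step and the resulting high-level profile could be sketched as follows. This is an illustration under assumptions, not the tool shown in the talk: the frame names, the `<jit>` marker for frames without a symbol, and the substring heuristics are all hypothetical.

```python
from collections import Counter

JIT_FRAME = "<jit>"  # hypothetical marker for frames that have no symbol

# Hypothetical heuristics: category -> lowercase substrings of frame names.
HEURISTICS = [
    ("GC", ("scaveng", "marksweep")),
    ("JIT compilation", ("compilecog",)),
    ("Primitives", ("primitive",)),
    ("Interpreter", ("interpret",)),
]

def classify(stack):
    """Classify one sample; `stack` lists frame names, innermost first."""
    for frame in stack:
        if frame == JIT_FRAME:
            return "JIT compiled code"
        name = frame.lower()
        for category, needles in HEURISTICS:
            if any(needle in name for needle in needles):
                return category
    return "Other"

def high_level_profile(samples):
    """Count samples per category and convert the counts to percentages."""
    counts = Counter(classify(stack) for stack in samples)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}

# Invented samples, only to exercise the functions.
samples = [
    ["interpret"],
    ["interpret"],
    ["primitiveAdd", "interpret"],
    ["scavengingGC", "primitiveNew", JIT_FRAME],
    [JIT_FRAME, "interpret"],
    ["compileCogMethod", "interpret"],
]

profile = high_level_profile(samples)
for category, percent in sorted(profile.items()):
    print(f"{category}: {percent:.1f}%")
```

The innermost matching frame decides the category, so a GC triggered from a primitive counts as GC time rather than primitive time.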

Scenario 1: Cross-JIT Profiling
25

Hot Paths and Our Partial JIT Implementation
•Cogit is an all-or-nothing compiler
•Hot path is not compiled!
26
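One reading of "all or nothing", sketched under assumptions: a method is JIT-compiled only if every bytecode it uses is supported, so a single unsupported bytecode keeps a hot method in the interpreter. The bytecode names and the helper are hypothetical and do not reflect Cogit's actual API.

```python
def can_jit_compile(method_bytecodes, supported_bytecodes):
    """All-or-nothing check: every bytecode of the method must be supported."""
    return all(b in supported_bytecodes for b in method_bytecodes)

# Hypothetical bytecode names for a partial JIT implementation.
supported = {"pushTemp", "pushConstant", "send", "returnTop"}

hot_loop = ["pushTemp", "pushConstant", "send", "jumpIfFalse", "returnTop"]
simple_getter = ["pushTemp", "returnTop"]

# The hot loop uses one unsupported bytecode, so it stays interpreted.
print(can_jit_compile(hot_loop, supported))
print(can_jit_compile(simple_getter, supported))
```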

Scenario 2: Cross-Version Profiling
27

Differential Profiling + Absolute Values
29
Worse-quality machine code (MC)
Configurations: Stock, Druid, Interpreter

Differential Profiling
30
More time in Primitives
Configurations: Stock, Druid, Interpreter
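A differential profile over absolute values could be computed along these lines. This is a sketch: the function name is hypothetical and the timings are invented for illustration; they are not results from the talk. Comparing absolute times rather than percentages keeps a category from looking smaller just because the whole run got slower.

```python
def differential_profile(baseline, candidate):
    """Compare two profiles given as {category: seconds}.

    Returns (category, baseline, candidate, delta) rows, largest slowdown first.
    """
    rows = []
    for category in sorted(set(baseline) | set(candidate)):
        before = baseline.get(category, 0.0)
        after = candidate.get(category, 0.0)
        rows.append((category, before, after, after - before))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows

# Invented absolute times in seconds.
stock = {"JIT compiled code": 4.0, "Primitives": 1.0, "GC": 0.5}
druid = {"JIT compiled code": 5.0, "Primitives": 3.0, "GC": 0.5, "Interpreter": 0.5}

rows = differential_profile(stock, druid)
for category, before, after, delta in rows:
    print(f"{category:20} {before:5.1f}s -> {after:5.1f}s ({delta:+.1f}s)")
```

Sorting by delta puts the category that regressed most at the top, which is where a drill-down would start.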

Drill-down in MC -> Primitives
31
Sample 1
Sample 2


Sample N
Stack Traces
Primitives!
Primitive Samples

Differential MC->Primitive Profiling
33
Low-hanging fruit
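The drill-down could be sketched as a filter over the stack traces followed by a per-primitive diff between two runs. The frame names, the `<jit>` marker, and the sample data are hypothetical; the assumption is that a primitive called from machine code shows up as a primitive frame whose direct caller has no symbol.

```python
from collections import Counter

JIT_FRAME = "<jit>"  # hypothetical marker for machine-code frames

def primitive_called_from_machine_code(stack):
    """Return the primitive whose direct caller is machine code, else None.

    `stack` lists frame names, innermost first.
    """
    for callee, caller in zip(stack, stack[1:]):
        if callee.startswith("primitive") and caller == JIT_FRAME:
            return callee
    return None

def primitive_counts(samples):
    """Count samples per primitive reached from machine code."""
    found = (primitive_called_from_machine_code(stack) for stack in samples)
    return Counter(name for name in found if name is not None)

def differential_counts(baseline_samples, candidate_samples):
    """Per-primitive sample delta (candidate - baseline), largest first."""
    base = primitive_counts(baseline_samples)
    cand = primitive_counts(candidate_samples)
    deltas = [(name, cand[name] - base[name]) for name in set(base) | set(cand)]
    return sorted(deltas, key=lambda item: (-item[1], item[0]))

# Invented samples, only to exercise the functions.
stock_samples = [["primitiveAt", JIT_FRAME], [JIT_FRAME]]
druid_samples = [
    ["primitiveAt", JIT_FRAME],
    ["primitiveAdd", JIT_FRAME],
    ["primitiveAdd", JIT_FRAME],
    ["primitiveAdd", "interpret"],  # called from the interpreter: ignored
]

diff = differential_counts(stock_samples, druid_samples)
print(diff)
```

Primitives with the largest positive delta are the candidates for low-hanging fruit.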

After a Bit of “well-placed” Work :)
34
•2x faster than the interpreter on average!
•Almost there:
•~0.7x manual JIT
•Missing
•static type predictions
•peephole optimizations on conditionals

What’s next?
•Linux integration:
•Perf support
•Matéo Boury
•Tracking Pharo’s performance:
•Performance dashboards
•Benchmark Generation
•daily, monthly, yearly
35

Takeaways
•Integrate with tools that do their job well (Instruments, Perf)
•Simple custom tools help debug complex VM scenarios
•Tests first for good behavior
•Bench first for good performance!
36