Writing GPU-Ready AI Models in Pure Java with Babylon

Ana-Maria Mihalceanu · 28 slides · Oct 08, 2025

About This Presentation

Project Babylon introduces the experimental Code Reflection technology that lets you define machine learning logic in plain Java code, without needing Python or external model files. It then uses the Foreign Function and Memory (FFM) API to connect your code to native runtimes like ONNX Runtime for fast...


Slide Content

Writing GPU-Ready AI Models in Pure Java with Babylon
Ana-Maria Mihalceanu
Senior Developer Advocate
Java Platform Group @ Oracle
Lize Raes
Senior Developer Advocate
Java Platform Group @ Oracle

Central Processing Unit vs Graphical Processing Unit
Anti-confusion chart

CPU: general-purpose processor with a few complex cores. Ideal for serial work where one operation depends on the previous one.
GPU: highly parallel processor with many simple cores. Ideal for parallel work (SIMD) and matrix multiplications.
CU: shorthand for 'see you'.
Cu: cuprum (copper), a highly conductive metal.

Why GPU?
Massive Parallelism: thousands of small cores that can perform many arithmetic operations simultaneously.
High Throughput and Memory Bandwidth: the GPU memory hierarchy and bandwidth are optimized for bulk data operations.
Energy Efficiency for Highly Parallel Workloads: GPUs deliver more performance per watt than scaling CPU clusters.
Source: https://pixabay.com/photos/gpu-graphic-card-pcb-hardware-4885250/

GPU for AI?
Deep Learning models have many layers, multiplications, and inputs. Perfect for GPU!
Source: https://www.researchgate.net/publication/378171318_Utilising_Machine_Learning_to_Predict_Myocardial_Infarction_by_Electrocardiogram_Derived_Respiration

From Code to Hardware
Java code traditionally runs on the CPU. What does 'running on the GPU' imply?

Going from written code to machine code that runs on vendor-specific hardware:
CPU path: Source Code (Java) → IR (Bytecode) → Interpreter (JVM JIT) → Machine Code → CPU
GPU path: Source Code (e.g. CUDA/C++) → IR (e.g. PTX) → Runtime/Driver → Machine Code → GPU

Legend: IR = Intermediate Representation; PTX = Parallel Thread Execution

Diverse GPU Vendors

Vendor    | IR name (internal compiler IR)                  | Runtime / execution layer
NVIDIA    | PTX (Parallel Thread Execution) IR              | CUDA, cuDNN, TensorRT
AMD       | LLVM IR / GCN ISA* (via ROCm)                   | ROCm, MIOpen, HIP runtime
Intel     | SPIR-V (Standard Portable IR for Vulkan/OpenCL) | oneAPI Level Zero, OpenVINO runtime
Apple     | AIR (Apple Intermediate Representation)         | Metal Performance Shaders, Core ML
ARM/Mali  | NIR (for Mesa stack; IR used in open drivers)   | Compute Library, Arm NN

*ISA = Instruction Set Architecture

About Us
Ana-Maria Mihalceanu, Senior Developer Advocate @Oracle
Lize Raes, Senior Developer Advocate @Oracle

Deep Learning Models

MODEL (.pt, .pb, .onnx, .gguf, ...): Graph (ops + layers) + Weights
RUNTIME: loads the model and dispatches the load to hardware
→ PyTorch
→ TensorFlow Runtime
→ ONNX Runtime
→ Llama.cpp
IN (img, tokens, …) → MODEL + RUNTIME → OUT (cat., tokens, …)

Open Neural Network Exchange
Anti-confusion chart

ONNX: Open Neural Network Exchange, a format for sharing AI models + a runtime.
Onyx: dark gemstone, often black or banded, used in jewelry and tabletops.
Onix: 210 kg ground-type Pokémon shaped like a stone serpent.
Oh niks: Flemish for 'oh nothing'.

Open Neural Network eXchange (ONNX)
1. Interoperable format for machine-learning models
2. Runtime for executing ONNX models

Runtime pipeline: ONNX Model + Input Data → In-Memory Graph → Graph Partitioner → Provider Registry → Parallel, Distributed Graph Runner → Execution Providers (CPU, GPU-EP, Other) → Output Result

ONNX and Java
ONNX from the Java Perspective
The Java platform knows nothing about ONNX:
Java considers the ONNX runtime a foreign (native) library.
Java considers the ONNX programming model a foreign programming model.

Deploy and Execute an ONNX Model
Demo: https://github.com/LizeRaes/babylon/tree/fer

ONNX Native Library (libonnxruntime.dylib | libonnxruntime.dll | libonnxruntime.so)
jextract (https://jdk.java.net/jextract/) generates the Foreign Function & Memory (FFM) Java bindings: memory layouts, var handles, function descriptors, method handles.
Java Client: Image (.png) → ONNX Model (emotion-ferplus-8.onnx) → Classification/Probabilities
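
As a taste of what the jextract-generated bindings boil down to, here is a minimal hand-written FFM sketch (assumptions: a Java 22+ JDK and libonnxruntime on the library path; OrtGetApiBase is the zero-argument entry point the ONNX Runtime C API exports):

import java.lang.foreign.*;

public class OrtProbe {
    public static void main(String[] args) throws Throwable {
        // Locate the native library (mapLibraryName yields libonnxruntime.so/.dylib or onnxruntime.dll).
        SymbolLookup ort = SymbolLookup.libraryLookup(
                System.mapLibraryName("onnxruntime"), Arena.global());
        // const OrtApiBase* OrtGetApiBase(void)
        var getApiBase = Linker.nativeLinker().downcallHandle(
                ort.find("OrtGetApiBase").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.ADDRESS));
        // Call into native code: the returned segment points at the OrtApiBase struct.
        MemorySegment apiBase = (MemorySegment) getApiBase.invoke();
        System.out.println("OrtApiBase at 0x" + Long.toHexString(apiBase.address()));
    }
}

The jextract-generated classes automate exactly this ceremony: one function descriptor and method handle per entry in onnxruntime_c_api.h, plus memory layouts for the structs they exchange.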

Demo: Running Loads on GPU via ONNX Runtime, in Java
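
The demo drives this through the FFM bindings; for comparison, here is how the same GPU execution-provider selection looks in the published ONNX Runtime Java API (ai.onnxruntime), assuming a CUDA-enabled onnxruntime native build and device 0:

import ai.onnxruntime.*;

public class GpuSession {
    public static void main(String[] args) throws OrtException {
        OrtEnvironment env = OrtEnvironment.getEnvironment();
        OrtSession.SessionOptions opts = new OrtSession.SessionOptions();
        // Register the CUDA execution provider (GPU-EP); the graph partitioner
        // falls back to the CPU provider for any unsupported nodes.
        opts.addCUDA(0);
        try (OrtSession session = env.createSession("emotion-ferplus-8.onnx", opts)) {
            System.out.println("Inputs: " + session.getInputNames());
        }
    }
}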

What’s Inside an ONNX Model?
Model metadata
• ir_version (ONNX spec version)
• producer name (e.g. "pytorch", "skl2onnx")
• opset version (the set of available ONNX operators)
• optional metadata strings (author, domain, description, training info)
Graph structure
• Nodes = operators (Conv, Relu, MatMul, etc.)
• Edges = tensors flowing between nodes
• Each node stores its inputs, outputs, and attributes (e.g. kernel size, stride)
Initializers (weights)
• The learned parameters (weights, biases, embeddings, etc.) are stored as raw tensors inside the file.
• These can be large chunks of binary data (float32, int64, etc.).
Inputs and outputs
• Names, shapes, and data types of expected model inputs and outputs.
• Example: input is float[1, 1, 64, 64], output is float[1, 8].
You can browse all of this interactively at https://netron.app/, or read it programmatically as sketched below.
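
A sketch of reading those input/output signatures programmatically, using the published ONNX Runtime Java API (ai.onnxruntime) rather than the deck's FFM bindings:

import ai.onnxruntime.*;

public class InspectModel {
    public static void main(String[] args) throws OrtException {
        OrtEnvironment env = OrtEnvironment.getEnvironment();
        try (OrtSession session = env.createSession("emotion-ferplus-8.onnx",
                new OrtSession.SessionOptions())) {
            // NodeInfo.getInfo() carries each tensor's element type and shape.
            session.getInputInfo().forEach((name, info) ->
                    System.out.println("input  " + name + " : " + info.getInfo()));
            session.getOutputInfo().forEach((name, info) ->
                    System.out.println("output " + name + " : " + info.getInfo()));
        }
    }
}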

Extend Java Reach to Foreign Programming Models with Project Babylon
@CodeReflection helps identify areas of Java source code to reflect over and gives access to them as code models at compile time and runtime.

Input Java Code:

@CodeReflection
static void f() {
    System.out.print("Hello !");
}

Reflect → Java Code Model:

func @"f" ()void -> {
    %0 : java.io.PrintStream = field.load @"java.lang.System::out()java.io.PrintStream";
    %1 : java.lang.String = constant @"Hello !";
    invoke %0 %1 @"java.io.PrintStream::print(java.lang.String)void";
    return;
};

Lower → JVM Bytecode:

public static void f();
  Code:
     0: getstatic     #2  // Field java/lang/System.out:Ljava/io/PrintStream;
     3: ldc           #3  // String Hello !
     5: invokevirtual #4  // Method java/io/PrintStream.print:(Ljava/lang/String;)V
     8: return

Translate (e.g. autodiff) → Foreign Code Model
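
Obtaining such a code model at runtime is a short program against the Babylon JDK. A sketch (experimental API: the annotation and Op classes have moved between packages during Babylon's development, so the jdk.incubator.code names below may differ from your build):

import jdk.incubator.code.CodeReflection;
import jdk.incubator.code.Op;
import java.lang.reflect.Method;

public class Hello {
    @CodeReflection
    static void f() {
        System.out.print("Hello !");
    }

    public static void main(String[] args) throws Exception {
        Method m = Hello.class.getDeclaredMethod("f");
        // The code model is only present if the method carries @CodeReflection
        // and was compiled by the Babylon javac.
        Op.ofMethod(m).ifPresent(model -> System.out.println(model.toText()));
    }
}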

Architecture of Java ONNX Prototype

Application (Java code): ONNX model authored using the Java ONNX API
Library (Java code): Java ONNX API & Code Model Transformer; Panama ONNX binding (jextract'ed from onnxruntime_c_api.h)
JDK: Foreign Function & Memory API
Native code: ONNX runtime (ORT)

How to Run a Java Code Model on ONNX Runtime
Demo: https://github.com/openjdk/babylon/tree/code-reflection/cr-examples/onnx

Java Code Model ← Code Reflection API
Java ONNX Script Library: Tensor…, ir.OnnxOp…, ir.OnnxType, compiler.OnnxTransformer…, OnnxRuntime…
Generated from the ONNX specs & sources: OnnxOperators and ir.OnnxOps (via OpGen); proto.OnnxBuilder… (via ProtoGen)
FFM Bindings (via jextract): foreign.OrtApi…, foreign.OrtGenApi
Targets: ONNX Runtime, ONNX GenAI Runtime
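
In the style of the cr-examples/onnx demos, a model is authored as an annotated Java method whose body OnnxTransformer turns into an ONNX graph. A sketch only: the package path and the exact Tensor/OnnxOperators signatures are assumptions extrapolated from the class names listed above:

import jdk.incubator.code.CodeReflection;
import oracle.code.onnx.Tensor;                 // package path is an assumption
import static oracle.code.onnx.OnnxOperators.*; // static Add(...) mirroring the ONNX Add op

public class AddModel {
    // Reflected into a Java code model, transformed by OnnxTransformer into an
    // ONNX graph, then executed by ONNX Runtime via the FFM bindings.
    @CodeReflection
    public Tensor<Float> add(Tensor<Float> a, Tensor<Float> b) {
        return Add(a, b);
    }
}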

Demo: Running a Java Model on ONNX Runtime

Java on the GPU

Kernel
Anti-confusion chart

Kernel (seed): inner edible part of a grain or nut.
Kernel (OS core): central part of an operating system that manages hardware and software.
Kernel (ML function): similarity function, a mathematical tool that measures similarity by mapping data into higher dimensions.
Kernel (GPU function): small function that runs in parallel across many threads on a GPU.

[Figure: a compute kernel dispatched across Threads 1-4. © Avadhoot Tavhare]

Heterogeneous Accelerator Toolkit (HAT)
"Leaning on the work of others from Panama, Babylon and Class-File API"

Application code (Java, on the JVM, CPU side): written against the HAT Programming Model
Library: HAT
JDK: Panama + Code Reflection
Native: 'jextracted' or Panama FFM native code → native vendor-provided runtime/framework → GPU library & kernel compiler → Accelerator (GPU or FPGA)
Pluggable Backend: CUDA, OpenCL, LevelZero, HIP, Java, ?

What Does HAT Offer?
An NDRange-style kernel parallel programming model
• Other programming models (Triton, OpenMP/TornadoVM annotated loops) could be supported
A compute programming model
• For coordinating multiple kernel dispatches and minimizing buffer transfers using Java
A pluggable backend abstraction
• GPU vendors can showcase their device capabilities
• 'Pure Java' multi-threaded and sequential backends
Interface-mapped/wrapped Panama FFM MemorySegments
• Access to off-heap data via Java-friendly accessors
• Data can be efficiently passed between Java and non-Java compute nodes

Stack: Application → Heterogeneous Accelerator Toolkit (HAT) → Panama FFM → Vendor Native Runtime → CPU | GPU | FPGA, running on the Babylon JDK JVM

Access Code Models of Kernels via @CodeReflection

public class Square {
    @CodeReflection
    public static void kernel(@RO KernelContext kc, @RW S32Array s32Array) {
        if (kc.x < kc.maxX) {
            s32Array.array(kc.x, s32Array.array(kc.x) * s32Array.array(kc.x));
        }
    }
}

Source: https://jjfumero.github.io/posts/2025/02/07/babylon-and-tornadovm

Heterogeneous Accelerator Toolkit (HAT) Programming Model

public class Square {
    // Kernel Code: runs once per thread, indexed by kc.x
    @CodeReflection
    public static void kernel(@RO KernelContext kc, @RW S32Array s32Array) {
        if (kc.x < kc.maxX) {
            s32Array.array(kc.x, s32Array.array(kc.x) * s32Array.array(kc.x));
        }
    }

    // Compute Code: coordinates kernel dispatches
    @CodeReflection
    public static void compute(@RO ComputeContext cc, @RW S32Array s32Array) {
        cc.dispatchKernel(s32Array.length(), kc -> kernel(kc, s32Array));
    }
}

// Regular Java Code
Accelerator acc = // get a suitable GPU or Java Accelerator
acc.compute(cc -> Square.compute(cc, s32Array));
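
Filling in the "get a suitable Accelerator" line, a hedged end-to-end sketch based on the HAT examples in the Babylon repo (Backend.FIRST and S32Array.create are assumptions about the current API; names may have shifted):

import java.lang.invoke.MethodHandles;
// hat.Accelerator, hat.backend.Backend, hat.buffer.S32Array from the Babylon HAT tree

public class SquareMain {
    public static void main(String[] args) {
        // Pick the first available backend (OpenCL, CUDA, ... or pure Java).
        var acc = new Accelerator(MethodHandles.lookup(), Backend.FIRST);
        var s32Array = S32Array.create(acc, 16);  // off-heap, FFM-backed buffer
        for (int i = 0; i < s32Array.length(); i++) {
            s32Array.array(i, i);                 // fill with 0..15
        }
        acc.compute(cc -> Square.compute(cc, s32Array));
        System.out.println(s32Array.array(4));    // expect 16
    }
}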

Heterogeneous Accelerator Toolkit (HAT) in Action
Demo: https://github.com/openjdk/babylon/tree/code-reflection/hat/examples/violajones

A precomputed Haar Cascade:
• N Stages (one shown)
Stage:
• Tree of Haar Features (three shown)
Each Haar Feature:
• 0-3 'rectangles'
• Threshold value
The data shape is sketched below.
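
A sketch of that data shape in plain Java (illustrative names, not the demo's actual classes; sums assume a precomputed integral image, so each rectangle costs four lookups):

// Hypothetical types for illustration only.
record Rect(int x, int y, int w, int h, float weight) {}

record HaarFeature(Rect[] rects, float threshold) {
    // Pixel sum inside r via four integral-image lookups: O(1) per rectangle.
    static float sum(float[][] integral, Rect r) {
        return integral[r.y() + r.h()][r.x() + r.w()]
             - integral[r.y()][r.x() + r.w()]
             - integral[r.y() + r.h()][r.x()]
             + integral[r.y()][r.x()];
    }

    boolean passes(float[][] integral) {
        float v = 0f;
        for (Rect r : rects()) {
            v += r.weight() * sum(integral, r);
        }
        return v >= threshold();
    }
}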




Thank you
Java for AI by Paul Sandoz, Thu 9 Oct @ 9.30, Room 5
ONNX-Based Generative AI LLMs in Java with Project Babylon by Adam Sotona, Thu 9 Oct @ 13.50, Room 9