[Architecture diagram: LLMariner components]
●Model mgmt: open models, closed models owned by your org, fine-tuned models
●Runtime mgmt (e.g., autoscaling, routing): vLLM, Nvidia Triton, Ollama
●GPU workloads mgmt: general-purpose training jobs, fine-tuning jobs, Jupyter Notebooks (scheduled with Kueue)
●Storage mgmt: files, vector DBs
●API authn/authz (Dex) and API usage audits
●Cluster federation across K8s clusters
●Advanced capabilities surpassing standard inference runtimes such as vLLM:
○Optimized request serving and GPU management
○Multiple inference runtime support
○Multiple model support
○Built-in RAG integration
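Multiple-model and multiple-runtime support are exposed behind a single serving endpoint. As an illustrative sketch only (the base URL and model name below are placeholders, not values from this deck), a client would build a standard OpenAI-style chat-completion request and let the control plane route it to whichever runtime serves the requested model:

```python
import json

# Hypothetical values -- substitute your own deployment's endpoint and model.
BASE_URL = "http://localhost:8080/v1"
MODEL = "example-model"

def build_chat_request(prompt: str) -> tuple[str, bytes]:
    """Build the URL and JSON body for an OpenAI-compatible
    chat-completion call; the serving layer routes the request to a
    runtime (vLLM, Ollama, ...) hosting the requested model."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return f"{BASE_URL}/chat/completions", json.dumps(body).encode()

url, payload = build_chat_request("What is Kubernetes?")
print(url)        # endpoint the request is sent to
print(json.loads(payload)["model"])
```

Because the request shape is runtime-agnostic, swapping the backing runtime or model requires no client-side changes beyond the `model` field.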
Supported models:
●Open models from Hugging Face
●Private models in customers' environment
●Fine-tuned models generated with LLMariner

Inference runtimes: vLLM, Ollama, Nvidia Triton Inference Server (upcoming), Hugging Face TGI (experimental)
[Deployment diagram: multi-cloud topology]
●LLMariner Control Plane runs in a single K8s cluster, in a private or public cloud
●LLMariner Agents run in worker K8s clusters spread across clouds (e.g., Cloud A, Cloud B, Cloud Y) and connect back to the Control Plane
※ No need to open incoming ports in worker clusters; only outgoing port 443 is required
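The outbound-only requirement implies a pull model: each agent dials out to the control plane and fetches work, so worker clusters never listen for inbound traffic. This is a minimal in-process sketch of that pattern (not LLMariner's actual protocol; the task strings and queue stand in for a long-lived outbound HTTPS/gRPC stream):

```python
import queue

def control_plane_tasks() -> "queue.Queue[str]":
    """Simulate tasks queued on the control plane for one worker cluster."""
    q: "queue.Queue[str]" = queue.Queue()
    q.put("fine-tune:model-a")
    q.put("inference:model-b")
    return q

def agent_poll(q: "queue.Queue[str]") -> list[str]:
    """Drain pending tasks, as an agent would over an outbound
    connection it initiated -- no inbound port is ever opened."""
    done = []
    while not q.empty():
        done.append(q.get())
    return done

print(agent_poll(control_plane_tasks()))
```

Because the agent initiates every connection, worker-cluster firewalls only need to allow egress on 443, which is what the note above states.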