●Advanced capabilities surpassing standard inference runtimes, such as vLLM
○Optimized request serving and GPU management
○Multiple inference runtime support
○Multiple model support
○Built-in RAG integration
●Supported model sources
○Open models from Hugging Face
○Private models in the customer's environment
○Fine-tuned models generated with LLMariner
●Supported inference runtimes
○vLLM
○Ollama
○Hugging Face TGI
○Nvidia TensorRT-LLM (upcoming)
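The model sources and runtimes above are typically wired together in deployment configuration. A hypothetical sketch of what such a per-model runtime mapping could look like; the key names and model identifiers here are illustrative assumptions, not LLMariner's actual configuration schema:

```yaml
# Hypothetical configuration sketch -- key names and model IDs are
# illustrative, not LLMariner's actual schema.
models:
  - name: meta-llama/Llama-3.1-8B-Instruct   # open model pulled from Hugging Face
    runtime: vllm
  - name: my-org/internal-llm                # private model in the customer's environment
    runtime: ollama
  - name: my-org/llm-finetune-v2             # fine-tuned model generated with LLMariner
    runtime: vllm
```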
[Diagram: deployment topologies — the LLMariner Control Plane runs in a K8s cluster in a private or public cloud, with an LLMariner Agent in each worker K8s cluster across clouds (e.g. Cloud A, Cloud B, Cloud Y).]
※ No need to open incoming ports in worker clusters; only outgoing port 443 is required
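The outbound-only connectivity noted above follows a common dial-out pattern: the agent initiates a connection from inside the worker cluster, and the control plane pushes work back over that established connection, so no inbound port is ever opened on the worker side. A minimal sketch of the pattern with plain sockets; this is illustrative only and does not show LLMariner's actual transport or message format (assume an encrypted channel on port 443 in practice):

```python
import socket
import threading

def control_plane(server_sock, commands):
    # Accept the agent's *outbound* connection, then push commands
    # back to it over that same connection.
    conn, _ = server_sock.accept()
    with conn:
        for cmd in commands:
            conn.sendall(cmd.encode() + b"\n")

def agent(host, port):
    # The agent dials out from the worker cluster and reads whatever
    # the control plane pushes until the connection closes.
    buf = b""
    with socket.create_connection((host, port)) as sock:
        while True:
            chunk = sock.recv(1024)
            if not chunk:
                break
            buf += chunk
    return buf.decode().splitlines()

# Simulate both sides locally: the control plane listens, the agent
# connects out, and commands flow back over the agent's connection.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
t = threading.Thread(target=control_plane,
                     args=(srv, ["deploy model-a", "report status"]))
t.start()
pushed = agent("127.0.0.1", port)
t.join()
srv.close()
print(pushed)  # → ['deploy model-a', 'report status']
```

The key property is that the TCP connection is initiated by the agent, so worker-cluster firewalls only need to allow outgoing traffic.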