LLMariner - Transform your Kubernetes Cluster Into a GenAI platform

kenjikaneda2 · 1,233 views · 15 slides · Oct 11, 2024

About This Presentation

Overview of LLMariner (https://llmariner.ai)


Slide Content

© 2024 CloudNatix, All Rights Reserved
LLMariner
Transform your Kubernetes Cluster Into a GenAI platform

LLMariner
Provide a unified AI/ML platform
with efficient GPU and K8s management


[Diagram: LLMariner sits on top of multiple public/private clouds and provides LLM capabilities (inference, fine-tuning, RAG), a workbench (Jupyter Notebook), and non-LLM training. Each cloud hosts GPUs of different generations and architectures (e.g., gen G1 / arch A1, gen G2 / arch A2).]

Example Use Cases
●Develop LLM applications with an OpenAI-compatible API
○Leverage the existing ecosystem to build applications

●Fine-tune models while keeping data safe and secure
in your on-premise datacenter
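Because the API is OpenAI-compatible, any HTTP client or OpenAI SDK can talk to it. Below is a minimal stdlib-only sketch of building a chat-completion request; the base URL, API key, and model name are placeholder assumptions, not values from the slides.

```python
import json
import urllib.request

# Hypothetical values -- replace with your LLMariner deployment's endpoint and key.
BASE_URL = "http://llmariner.example.com/v1"
API_KEY = "your-api-key"

def chat_completion_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible POST /chat/completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_completion_request("google-gemma-2b-it", "Hello!")
# Sending it is identical to calling the OpenAI API:
#   urllib.request.urlopen(req)
print(req.full_url)
```

Because the request shape matches OpenAI's, existing tooling (the OpenAI Python library, LangChain, etc.) can be pointed at the same endpoint by overriding the base URL.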



[Examples: chat bot, code auto-completion]

Key Features
For AI/ML team:
●LLM inference
●LLM fine-tuning
●RAG
●Jupyter Notebook
●General-purpose training

For infrastructure team:
●Flexible deployment model
●Efficient GPU management
●Security / access control
●GPU visibility/showback (*)
●Highly-reliable GPU management (*)

(*) under development

High Level Architecture
[Diagram: a control-plane K8s cluster runs the LLMariner Control Plane and exposes the API endpoint; multiple worker GPU K8s clusters each run an LLMariner Agent.]

Key Feature Details

Features for AI/ML team and infra team
[Diagram: APIs for the AI/ML team, served across multiple K8s clusters.
●OpenAI-compatible API (chat completion, embedding, RAG, fine-tuning, …), workbench with Jupyter Notebooks, and general-purpose training jobs
●Model management: open models, closed models owned by your org, fine-tuned models
●Runtime management (e.g., autoscaling, routing) across vLLM, Nvidia Triton, and Ollama
●GPU workload management: inference engines, fine-tuning jobs, Jupyter Notebooks, training jobs (with Kueue)
●Storage management: files, vector DBs
●Cluster federation across K8s clusters
●User management (Dex), API authn/authz, API key management, orgs & projects management, cluster management, secure session management, API usage audits]

LLM Inference Serving
●Compatible with OpenAI API
○Can leverage the existing ecosystem and applications

●Advanced capabilities surpassing standard inference runtimes,
such as vLLM
○Optimized request serving and GPU management
○Multiple inference runtime support
○Multiple model support
○Built-in RAG integration
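Since the endpoint follows the OpenAI API surface, discovering which models a cluster serves uses the standard `GET /models` route. A stdlib-only sketch; the base URL and key are illustrative placeholders:

```python
import urllib.request

# Hypothetical endpoint -- replace with your LLMariner deployment's values.
BASE_URL = "http://llmariner.example.com/v1"

def list_models_request(api_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible GET /models request to enumerate served models."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

req = list_models_request("your-api-key")
print(req.full_url)
```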

Multiple Model and Runtime Support
●Multiple model support:
○Open models from Hugging Face
○Private models in customers’ environments
○Fine-tuned models generated with LLMariner

●Multiple inference runtime support:
○vLLM
○Ollama
○Nvidia Triton Inference Server and Hugging Face TGI (upcoming/experimental)

Optimized Inference Serving
●Efficiently utilize GPU to achieve high throughput and low latency
●Key technologies:
○Autoscaling
○Model-aware request load balancing & routing
○Multi-model management & caching
○Multi-cluster/cloud federation
[Diagram: the LLMariner Inference Manager Engine autoscales runtimes across clusters X and Y, e.g., vLLM instances serving Llama 3.1 and Gemma 2, and an Ollama instance serving Deepseek Coder.]

Built-in RAG Integration
●Use an OpenAI-compatible API to manage vector stores and files
○Uses Milvus as the underlying vector DB
●Inference engine retrieves relevant data when processing requests
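Vector-store management mirrors the OpenAI API shape (`POST /vector_stores` over previously uploaded files, with embeddings computed server-side). A stdlib-only sketch; the base URL, key, store name, and file ID are illustrative placeholders:

```python
import json
import urllib.request

# Hypothetical endpoint -- replace with your LLMariner deployment's values.
BASE_URL = "http://llmariner.example.com/v1"

def create_vector_store_request(name: str, file_ids: list[str]) -> urllib.request.Request:
    """Build an OpenAI-compatible POST /vector_stores request; the server
    creates embeddings for the referenced files (backed by Milvus)."""
    payload = {"name": name, "file_ids": file_ids}
    return urllib.request.Request(
        f"{BASE_URL}/vector_stores",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer your-api-key",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = create_vector_store_request("product-docs", ["file-abc123"])
print(req.full_url)
```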



[Diagram: files are uploaded and embeddings are created; the LLMariner inference engine retrieves relevant data at request time.]

Beyond LLM Inference
●Provide LLM fine-tuning, general-purpose training, and Jupyter
Notebook management

●Empower AI/ML teams to harness the full power of GPUs in a
secure self-contained environment
[Diagram: a supervised fine-tuning trainer running in a GPU K8s cluster.]

A Fine-tuning Example
●Submit a fine-tuning job using the OpenAI Python library
○The fine-tuning job runs in the underlying Kubernetes cluster
●Enforce quotas through integration with the open-source Kueue
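The submission path is the standard OpenAI fine-tuning route (`POST /fine_tuning/jobs`), so the OpenAI Python library works unchanged against the LLMariner endpoint. A stdlib-only sketch of the same request; the base URL, key, model, and training-file ID are illustrative placeholders:

```python
import json
import urllib.request

# Hypothetical endpoint -- replace with your LLMariner deployment's values.
BASE_URL = "http://llmariner.example.com/v1"

def fine_tuning_job_request(model: str, training_file: str) -> urllib.request.Request:
    """Build an OpenAI-compatible POST /fine_tuning/jobs request; LLMariner
    runs the job as a workload in the underlying K8s cluster, subject to
    Kueue quota enforcement."""
    payload = {"model": model, "training_file": training_file}
    return urllib.request.Request(
        f"{BASE_URL}/fine_tuning/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer your-api-key",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = fine_tuning_job_request("google-gemma-2b-it", "file-abc123")
print(req.full_url)
```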
[Diagram: fine-tuning jobs are submitted to a GPU K8s cluster, with quota enforcement by Kueue.]

Enterprise-Ready Access Control
●Control API scope with “organizations” and “projects”
○A user in Project X can access fine-tuned models generated by
other users in Project X
○A user in Project Y cannot access the fine-tuned models in Project X
●Can be integrated with a customer’s identity management platform
(e.g., SAML, OIDC)
[Diagram: Users 1 and 2 in Project X create and read a fine-tuned model; User 3 in Project Y cannot access it.]

Supported Deployment Models
●Single public cloud
●Single private cloud
●Air-gapped environment
●Appliance
●Hybrid cloud (public & private)
●Multi-cloud federation

[Diagram: in a hybrid deployment, the LLMariner Control Plane runs in a public cloud and an LLMariner Agent runs in a private cloud; in a multi-cloud federation, the control plane in one cloud federates LLMariner Agents running in other clouds.]
Note: there is no need to open incoming ports in worker clusters; only outgoing port 443 is required.