Everything You Need to Know About Running LLMs Locally


About This Presentation

Presented at All Things Open 2025
Presented by Cedric Clyburn - Red Hat

Title: Everything You Need to Know About Running LLMs Locally
Abstract: As large language models (LLMs) become more accessible, running them locally unlocks exciting opportunities for developers, engineers, and privacy-focused ...


Slide Content

Everything You Need to Know About Running LLMs Locally

All Things Open 2025
Cedric Clyburn
Senior Developer Advocate
@cedricclyburn
We’ve got a lot to cover today!
“Wait, so you can run your own language models… completely local?”
“…and there are plenty of open source tools to do so?!”
“But there’s over 2 million models, which to pick?”
“Or how can I use my own PDFs, codebase, or APIs?”

Agenda
▸Running your own AI & LLMs
▸How to choose the right model?
▸Integrating your data & codebase!
▸Demo #1: Model serving & RAG
▸Demo #2: Code assistance
▸Demo #3: Adding AI features to apps (Agentic AI)

Session Slides: red.ht/local-llm

Why’s everyone running their own AI models?

Why run a model locally?

For Developers
▸Convenience & Simplicity: developers stay in a familiar development environment and keep their “local developer experience,” especially for testing and debugging.
▸Direct Access to Hardware
▸Ease of Integration: simplifies integrating the model with existing systems and applications that are already running locally.

For Organizations
▸Data Privacy and Security: data is the fuel for AI and a differentiating factor (quality, quantity, qualification). Keeping data on-premises ensures sensitive information doesn’t leave the local environment, which is crucial for privacy-sensitive applications.
▸Cost Control: while there is an initial investment in hardware and setup, running locally can reduce the ongoing costs of cloud computing services and ease the vendor lock-in imposed by Amazon, Microsoft, and Google.
▸Regulatory Compliance: some industries have strict regulations about where and how data is processed.
▸Customization & Control: take advantage of total AI customization and control; easily train or fine-tune your own model from the convenience of the developer’s local machine.

But the stack can be a bit overwhelming!
The 2024 MAD (Machine Learning, Artificial Intelligence & Data) Landscape

Average developer trying to download & manage models, configure serving runtimes, quantize and compress LLMs, and ensure correct prompt templates… (Colorized, 2023)

So, what open source tech can help us run AI?

Fortunately, there’s a lot… for every use case!

LLM Tools
Tool #1: Ollama (for simple model downloading & serving)
▸Simple CLI: a “Docker”-style tool for running LLMs locally, offline, and privately
▸Extensible: basic model customization (Modelfile) and importing of fine-tuned LLMs
▸Lightweight: efficient and resource-friendly
▸Easy API: an API for both inferencing and Ollama itself (e.g. downloading models); see the sketch below
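As a quick illustration of that API, here is a minimal sketch that queries a locally running Ollama server over its REST interface on localhost:11434. The llama3.2 model tag is an assumption; substitute whichever model you have actually pulled.

```python
# Minimal sketch: query a locally running Ollama server via its REST API.
# Assumes `ollama pull llama3.2` has already been run; the model tag is
# an assumption, so swap in whatever you actually have.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",
        "prompt": "Explain quantization in one sentence.",
        "stream": False,  # return a single JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```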

LLM Tools
Tool #2: vLLM (for scaling things up in production environments)
▸Research-Based: a UC Berkeley project to improve model speeds and GPU consumption
▸Standardized: works with Hugging Face & the OpenAI API (see the sketch below)
▸Versatile: supports NVIDIA, AMD, Intel, TPUs & more
▸Scalable: manages multiple requests efficiently, e.g. with Kubernetes as an LLM runtime
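For offline batch inference, vLLM also has a small Python API. A minimal sketch, assuming you have enough GPU memory for the (illustrative) Granite model below:

```python
# Minimal sketch of offline batch inference with vLLM's Python API.
# The model name is illustrative; any Hugging Face causal LM you can
# fit in GPU memory should work.
from vllm import LLM, SamplingParams

llm = LLM(model="ibm-granite/granite-3.0-8b-instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Why run an LLM locally?"], params)
for out in outputs:
    print(out.outputs[0].text)
```

The same model can instead be served behind an OpenAI-compatible HTTP endpoint with `vllm serve <model>`, which is how it is typically run in production.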

LLM Tools
Tool #3: RamaLama (to make AI boring by using containers)
▸AI in Containers: run models with Podman/Docker with no config needed
▸Registry Agnostic: freedom to pull models from Hugging Face, Ollama, or OCI registries
▸GPU Optimized: auto-detects & accelerates performance
▸Flexible: supports llama.cpp, vLLM, whisper.cpp & more

LLM Tools
Tool #4: Podman AI Lab (for developers looking to build AI features)
▸For App Builders: choose from various recipes like RAG, Agentic, Summarizers
▸Curated Models: easily access Apache 2.0 open-source options
▸Container Native: easy app integration and movement from local to production
▸Interactive Playgrounds: test & optimize models with your custom prompts and data
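A convenient property across these tools is that most of them expose (or can expose) an OpenAI-compatible endpoint, so your application code stays portable between them. A hedged sketch with the openai Python client; the base URL, port, and model name are assumptions, so use whatever your serving tool reports at startup:

```python
# Minimal sketch: talk to any local OpenAI-compatible server (vLLM,
# llama.cpp-based servers, etc.). The base_url and model name below are
# assumptions; check what your serving tool actually prints on startup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="ibm-granite/granite-3.0-8b-instruct",
    messages=[{"role": "user", "content": "Give me one reason to run LLMs locally."}],
)
print(resp.choices[0].message.content)
```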

Cool! But what specific model should I be using?

There are plenty of open and closed model choices!

Darn, we’re back to here again!

But again, we’ll use another video game analogy!

Model Selection
So, which model should you select? Well… it depends!
▸It depends on the use case that you want to tackle.
▸DeepSeek models excel in reasoning tasks and complex problem-solving.
▸Granite SLM models perform well in various NLP tasks and multimodal applications.
▸Mistral and Llama are particularly strong in summarization and sentiment analysis.

Model Selection
But not all models are the same! Our data isn’t always in one format: it’s text, image, audio, etc.

Unimodal (text or image): text-to-text, text-to-image, image-to-text, image-to-image, text-to-code
✓Single data input
✓Fewer resources
✓Single modality
✓Limited depth and accuracy

OR

Multimodal (text, image, audio, video): any-to-any
✓Multiple data inputs
✓More resources
✓Multiple modalities
✓Better understanding and accuracy

Model Selection
Also! There’s a naming convention, kind of like how our apps are compiled for various architectures!

ibm-granite/granite-3.0-8b-base
・ibm-granite: family name
・3.0: model architecture and version
・8b: number of parameters
・base: model fine-tuned to be a baseline

Mixtral-8x7B-Instruct-v0.1
・Mixtral: family name
・8x7B: architecture type and number of parameters
・Instruct: model fine-tuned for instructive tasks
・v0.1: model version

Model Selection
How to deploy a larger model? And what about model size?
Let’s say you want the best benchmarks with a frontier model, for example DeepSeek-R1 or Llama 3.1-405B. Neither of these situations is ideal :)

Model Selection
Well, most models for local usage are quantized! It’s a way to compress models; think of it like a .zip or .tar.
▸Quantization: a technique to compress LLMs by reducing numerical precision.
▸Converts high-precision weights (FP32) into lower-bit formats (FP16, INT8, INT4).
▸Reduces memory footprint, making models easier to deploy (see the sketch below).
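To make “reducing numerical precision” concrete, here is a minimal sketch of symmetric per-tensor INT8 weight quantization using only NumPy; real quantizers (GPTQ, AWQ, SmoothQuant) are considerably more careful about preserving accuracy:

```python
# Minimal sketch of symmetric INT8 weight quantization. Real quantizers
# use per-channel scales and calibration data; this just shows the core
# idea and the 4x memory saving versus FP32.
import numpy as np

def quantize_int8(w):
    """Map FP32 weights onto int8 [-127, 127] with one scale per tensor."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(1024, 1024).astype(np.float32)  # a stand-in weight matrix
q, scale = quantize_int8(w)

print(f"FP32: {w.nbytes / 1e6:.1f} MB, INT8: {q.nbytes / 1e6:.1f} MB")
print(f"max reconstruction error: {np.abs(dequantize(q, scale) - w).max():.4f}")
```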

Model Selection
▸The Benefit? Run LLMs on “any” device: not just your local machine, but IoT & edge too.
▸Results in faster and lighter models that still maintain reasonable accuracy.
・Testing with Llama 3.1, W4A16-INT quantization resulted in a 2.4x performance speedup and 3.5x model size compression.
▸Works on GPUs & CPUs!
Source: https://neuralmagic.com/blog/we-ran-over-half-a-million-evaluations-on-quantized-llms-heres-what-we-found

Model Selection
And there’s an open repository of quantized models! Check it out on Hugging Face & save resources on LLM serving.

Broad Collection ・ Extensive Selection ・ Comprehensive Validation
・Models: Llama, Qwen, Mistral, DeepSeek, Gemma, Phi, Molmo, Granite, Nemotron
・Formats: W4/8A16, W8A8-INT8, W8A8-FP8, 2:4 sparse
・Algorithms: GPTQ / AWQ, SmoothQuant, SparseGPT, RTN
・Hardware: GPUs (incl. Instinct), CPUs, TPUs

Cut GPU costs in half with ready-to-deploy, inference-optimized checkpoints.

AI Engine? Check ✔
AI Model? Check ✔
What about your data?

AI + Your Data
How can you integrate AI with your unique data? Fortunately, many tools exist for this too!

Two common blockers: tuning models with private data for enterprise use cases is too complex for non-data scientists, and enterprise AI use cases span data center, cloud & edge, so they can’t be constrained to a single public cloud service.

Data Interfaces
Pull in documents (PDFs), web results, and agents together. Ask a question to a PDF & receive citations! (A minimal RAG sketch follows after this slide.)
Tools: AnythingLLM, OpenWebUI, LM Studio

Code Assistance
Use a model as a pair programmer, to generate and explain your codebase. No more copy/pasting, it’s part of the IDE!
Tools: Continue, Cody, Cursor, Windsurf

Prompting & Building Apps
Experiment with data, build proofs of concept, and integrate AI into apps: starting points for common AI apps.
Tools: Podman AI Lab, Docker Gen AI Stack
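To make the “data interfaces” idea concrete, here is a hedged, minimal retrieval-augmented generation (RAG) loop; the embedding model, endpoint, and model name are all assumptions, and real tools like AnythingLLM or OpenWebUI add chunking, vector stores, and citations on top of this:

```python
# Hedged RAG sketch: embed a few documents, retrieve the most relevant
# one for a question, and ask a local model with it as context. The
# endpoint and model names are assumptions; adjust for your setup.
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

docs = [
    "Ollama serves models on localhost:11434 by default.",
    "vLLM is optimized for high-throughput production serving.",
    "Quantization shrinks models by lowering numerical precision.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

question = "How do models get small enough for local use?"
q_vec = embedder.encode([question], normalize_embeddings=True)[0]
best = docs[int(np.argmax(doc_vecs @ q_vec))]  # cosine similarity (unit vectors)

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="ibm-granite/granite-3.0-8b-instruct",
    messages=[{"role": "user",
               "content": f"Context: {best}\n\nQuestion: {question}"}],
)
print(resp.choices[0].message.content)
```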

Demo Time! Demo #1: Model serving & RAG
Demo Repository: red.ht/local-llm-demo (https://github.com/rh-aiservices-bu/deploy-local-llms-talk-demo)

Demo Time! Demo #2: Code assistants
Demo Repository: red.ht/local-llm-demo (https://github.com/rh-aiservices-bu/deploy-local-llms-talk-demo)

Demo Time! Demo #3: Adding AI features to apps (aka Agentic AI)
Demo Repository: red.ht/local-llm-demo (https://github.com/rh-aiservices-bu/deploy-local-llms-talk-demo)
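For flavor, agentic AI usually boils down to a tool-calling loop: the model decides to call a function, your code runs it, and you hand the result back. A hedged sketch of one round trip against a local OpenAI-compatible endpoint; the base URL, model name, and get_weather tool are all hypothetical, and the server must support OpenAI-style tool calling:

```python
# Hedged sketch of one agentic tool-calling round trip. The base_url,
# model, and get_weather tool are hypothetical; the local server must
# support OpenAI-style tool calling for this to work.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Raleigh?"}]
resp = client.chat.completions.create(model="local-model", messages=messages, tools=tools)
call = resp.choices[0].message.tool_calls[0]

# Run the tool ourselves (stubbed here) and feed the result back to the model.
args = json.loads(call.function.arguments)
result = f"Sunny and 22°C in {args['city']}"  # stand-in for a real API call
messages += [resp.choices[0].message,
             {"role": "tool", "tool_call_id": call.id, "content": result}]
final = client.chat.completions.create(model="local-model", messages=messages, tools=tools)
print(final.choices[0].message.content)
```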


Thank you! You’re awesome!
Session Slides: red.ht/local-llm
▸Running your own AI & LLMs
▸How to choose the right model?
▸Integrating your data & codebase!
Feel free to connect on LinkedIn!

Join the DevNation
Red Hat Developer serves the builders: the problem solvers who create careers with code. Let’s keep in touch!
●Join Red Hat Developer at developers.redhat.com/register
●Follow us on any of our social channels: linkedin.com/company/red-hat, youtube.com/user/RedHatVideos, facebook.com/redhatinc, twitter.com/RedHat
●Visit dn.dev/upcoming for a schedule of our upcoming events

Red Hat Developer: Build here. Go anywhere.
Thank you