Large Language Models (LLMs) - Level 3 Slides

0xdata · 41 slides · Jul 17, 2024

About This Presentation

Large Language Models (LLMs) - Level 3: Presentation Slides

Welcome to the Large Language Models (LLMs) - Level 3 course!
These presentation slides have been crafted by H2O.ai University to complement the course content. You can access the course directly using the link below: https:...


Slide Content

H2O.ai Confidential

Intro to h2oGPT
by Andreea Turcu

Agenda
● A bit of context
● What are GPTs?
● Why know what LLMs are?
● LLM origins
● What is h2oGPT?
● Boosting your productivity with h2oGPT
● Limitations of existing models
● Benefits of open-source models
● Demo of h2oGPT

What are GPTs?

Why should I know what LLMs are?
Large language models like GPT have diverse business uses:

● Automating content creation
● Extracting insights from data
● Personalizing marketing
● Enabling virtual assistants
● Analyzing data
● Facilitating voice-based interactions and translations

What are LLMs?

- LLMs (Large Language Models) are computational models for understanding and generating human language.
- They are trained on vast amounts of text data.
- LLMs learn grammar, vocabulary, and contextual relationships.
- They can generate coherent and contextually relevant text from a given prompt.
- They make collaboration with AI systems more efficient.
- They enable responsible use and enhanced user experiences.

LLM Origins
Transformers are deep feed-forward neural networks that leverage a machine-learning mechanism called (self-)attention; they have seen wild success on natural language processing problems.

2017: Encoder-Decoder (Seq2Seq). The original Transformer architecture, for machine translation and sequence-to-sequence problems.
2019: BERT (Bidirectional Encoder Representations from Transformers). A model designed to recover masked tokens.
2020: GPT. Auto-regressive language modeling, where the goal is to predict the next token.
2022: ChatGPT. An interactive interface for users to interact directly with the GPT-3 and GPT-4 modeling frameworks.
2023: h2oGPT. The world's best completely open-source LLM, permissively licensed for commercial use.

Reference: https://arxiv.org/pdf/2207.09238.pdf
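The auto-regressive objective behind GPT-style models (predict the next token, scored by negative log-likelihood) can be sketched with a toy bigram model; the vocabulary and probabilities below are invented purely for illustration:

```python
import math

# Toy next-token predictor: a bigram table standing in for a trained model.
# These probabilities are made up for illustration, not from any real model.
BIGRAM_PROBS = {
    ("the", "cat"): 0.4,
    ("cat", "sat"): 0.5,
    ("sat", "down"): 0.6,
}

def next_token_loss(tokens):
    """Average negative log-likelihood of each token given its predecessor."""
    nll = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = BIGRAM_PROBS.get((prev, cur), 1e-6)  # tiny prob for unseen pairs
        nll += -math.log(p)
    return nll / (len(tokens) - 1)

loss = next_token_loss(["the", "cat", "sat", "down"])
```

Training a real GPT minimizes exactly this quantity, averaged over billions of tokens, with a neural network in place of the lookup table.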

What is h2oGPT?


AI Will Boost Productivity by 10x

Up to 2021: Continuous but slow improvements in automation and productivity. Productivity in the US has increased by 250% in 70 years.*
2022: No-code and AutoML enable all companies to build and use highly accurate models for specialized tasks.
2023: In addition to small specialized models, LLMs are supporting employees in their daily tasks: brainstorming, coding, summarization, analysis.
2024: 1-click to solve complex business goals.
2025: AI is used in automated mode. Employees are supervising their AI co-workers. Robotics leaps forward by incorporating LLMs.

*2020 | MIT Work of the Future


Limitations of Existing Models

Popular models such as OpenAI's ChatGPT/GPT-4, Anthropic's Claude, Microsoft's Bing AI Chat, Google's Bard, and Cohere are powerful and effective, but they have certain limitations compared to open-source LLMs:

1. Data Privacy and Security: Using hosted LLMs requires sending data to external servers. This can raise concerns about data privacy, security, and compliance, especially for sensitive information or industries with strict regulations.
2. Dependency and Customization: Hosted LLMs often limit the extent of customization and control, as users rely on the service provider's infrastructure and predefined models.
3. Cost and Scalability: Hosted LLMs usually come with usage fees, which can increase significantly for large-scale applications.
4. Access and Availability: Hosted LLMs may be subject to downtime or limited availability, affecting users' access to the models.

Benefits of Open Source Models

1. Cost-effective: Users can scale the models on their own infrastructure without incurring additional costs from a service provider.
2. Flexible: Deployed on-premises or on private clouds, ensuring uninterrupted access and reducing reliance on external providers.
3. Tunable: Users can tailor the models to their specific needs, deploy on their own infrastructure, and even modify the underlying code.

Overall, open-source LLMs offer greater flexibility, control, and cost-effectiveness, while addressing data privacy and security concerns. They foster a competitive landscape in the AI industry and empower users to innovate and customize models to suit their specific needs.

h2oGPT
● Released as open source under the Apache-2.0 license
● Active development: h2oai/h2ogpt
● See a demo
○ gpt.h2o.ai
○ 🤗 Hugging Face Spaces

What is it?
● Commercially usable code, data, and models
● Prompt engineering: ability to prepare open-source datasets for tuning LLMs
● Tuning: code for fine-tuning large language models (currently up to 20B parameters) on commodity hardware and enterprise GPU servers (single or multi-node)
● Optimizations:
■ LoRA (low-rank adaptation)
■ 4-bit and 8-bit quantization for memory-efficient fine-tuning and generation
● Deployable: chatbot with UI and Python API
● Evaluation: LLM performance evaluation

The world's best open source GPT
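The LoRA optimization mentioned above freezes the pretrained weights and trains only a small low-rank update. A minimal NumPy sketch of the idea, with arbitrary illustrative dimensions and rank (not h2oGPT's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 64, 4  # layer dimensions and LoRA rank (illustrative sizes)

W = rng.normal(size=(d, k))          # frozen pretrained weight, never updated
A = rng.normal(size=(r, k)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # zero-initialized so the update starts at 0

def lora_forward(x):
    # Effective weight is W + B @ A, but only A and B are trained:
    # r * (d + k) parameters instead of d * k for a full fine-tune.
    return x @ W.T + x @ A.T @ B.T

x = rng.normal(size=(1, k))
y = lora_forward(x)
# With B = 0, the adapted layer matches the frozen base layer exactly.
```

Because B starts at zero, fine-tuning begins from the pretrained model's behavior and only gradually learns the task-specific correction; this is what makes LoRA cheap enough for commodity hardware.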

Demo of h2oGPT!
https://gpt-gm.h2o.ai/
https://gpt.h2o.ai/

Disclaimer: subject to modification and updates.

Overview

Table of Contents
1. Metrics for Evaluating LLM Performance
2. Common Datasets and Evaluation Challenges
3. Comparison of Different LLM Architectures
4. Intro to H2O LLM Eval

Our exploration will encompass fundamental concepts such as:
- key metrics
- different architectures
- challenges inherent in developing state-of-the-art natural language models

Free-form text generation

- For evaluating free-form text generation, A/B testing and Elo scoring are key.
- A/B testing blindly compares responses from different models to determine which is superior.
- Elo scoring treats A/B comparisons as matches, providing deeper insight into model performance.

Fair Evaluation Protocol for A/B Testing with GPT-4

- GPT-4-0613 is responsible for evaluating all A/B tests.
- To maintain fairness, the sequence of all potential games is shuffled, and the positions of Models A and B are randomized before evaluation by GPT-4.
- 1,000 bootstrap rounds are conducted, and the median Elo score from these rounds determines the final score for each model.
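The protocol above (Elo over shuffled A/B matches, bootstrap resampling, median as the final score) can be sketched in plain Python; the match outcomes, K-factor, and model names below are hypothetical stand-ins for real judge verdicts:

```python
import random
import statistics

def elo_scores(matches, k=32, base=1000):
    """Run standard Elo updates over a list of (winner, loser) matches."""
    ratings = {}
    for winner, loser in matches:
        rw = ratings.setdefault(winner, base)
        rl = ratings.setdefault(loser, base)
        expected_w = 1 / (1 + 10 ** ((rl - rw) / 400))
        ratings[winner] = rw + k * (1 - expected_w)
        ratings[loser] = rl - k * (1 - expected_w)
    return ratings

def bootstrap_median_elo(matches, rounds=1000, seed=0):
    """Resample matches with replacement; report each model's median Elo."""
    rng = random.Random(seed)
    samples = {}
    for _ in range(rounds):
        resampled = [rng.choice(matches) for _ in matches]
        rng.shuffle(resampled)  # match order affects Elo, so shuffle each round
        for model, score in elo_scores(resampled).items():
            samples.setdefault(model, []).append(score)
    return {m: statistics.median(s) for m, s in samples.items()}

# Hypothetical judge verdicts: ("model_a", "model_b") means model_a won.
matches = [("model_a", "model_b")] * 6 + [("model_b", "model_a")] * 4
final = bootstrap_median_elo(matches)
```

Taking the median over many bootstrap rounds makes the final score robust to the order-dependence of Elo and to a handful of noisy judge decisions.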

Concerns of Data Contamination
- Many evaluations are found online, causing concerns
about data contamination and biased results.

- Models with a large number of parameters may
struggle with handling multiple tasks effectively.

A Look at Llama 2's Testing Framework

★ Code Proficiency
★ Commonsense Reasoning
★ World Knowledge
★ Reading Comprehension
★ Math Skills
★ BIG-bench Assessment
★ Safety Metrics

Optimal Model Selection in Reinforcement Learning

- Researchers used human-preference reward modeling for optimal model selection in reinforcement learning.
- Thorough human evaluations assessed helpfulness and safety.
- Contamination analysis identified potential data leaks.

Comprehensive Evaluation of Llama 2

❖ Academic Testing: Rigorous evaluations across various academic benchmarks, including code proficiency, commonsense reasoning, world knowledge, reading comprehension, and mathematical skills.
❖ Safety Metrics: Dedicated safety benchmarks to assess truthfulness, toxicity, and bias mitigation.
❖ Automated Metrics: Objective measurements to gauge performance and efficiency.
❖ Human Ratings: Expert human evaluators provided valuable insights.
❖ Contamination Checks: Ensuring model integrity by detecting and addressing potential issues.

Evaluating Large Language Models (LLMs)

Quantitative Metrics
● Perplexity, accuracy, BLEU, and ROUGE scores.
● Useful for tasks with ground-truth criteria.

Comprehensive Frameworks
● HELM and MT-Bench.
● Widely used for benchmarking LLMs.

Free-Form Text Generation
● Automated metrics may be ineffective.
● A/B testing and Elo scoring.
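Perplexity, the first quantitative metric above, is simply the exponentiated average negative log-likelihood of the reference tokens. A minimal sketch with made-up per-token probabilities:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-probability of each observed token)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Made-up probabilities a model might assign to the tokens of a reference text.
confident = perplexity([0.9, 0.8, 0.95])   # low perplexity: model is confident
uncertain = perplexity([0.1, 0.2, 0.05])   # high perplexity: model is surprised
```

Lower is better: a perplexity of N roughly means the model is as uncertain as if it were choosing uniformly among N tokens at each step.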

Comparing LLM Architectures

➢ Tokens Trained On: language capacity.
➢ GPU Power: training efficiency.
➢ Parameters: model complexity.

H2O LLM Eval Studio

❏ Unified Platform: Comprehensive suite of tools for evaluating and understanding LLMs.
❏ Assess Performance: Evaluate models across dimensions: natural language inference, question answering, and text generation.
❏ Model Comparison: Compare with state-of-the-art LLMs and benchmarks; gain insights into strengths and weaknesses.

Explore H2O EvalGPT

01 Foundation: Powerful language models trained on extensive text data, forming the basis for various language tasks.
02 DataPrep: Converting documents into instruction pairs, like QA pairs, facilitating fine-tuning and downstream tasks.
03 Fine-tuning: Refining pre-trained models using task-specific data, enhancing their performance on targeted tasks.
04 Eval LLMs: Thoroughly assessing and comparing LLMs is increasingly vital due to their heightened significance and complexity.
05 Database: Effectively utilize company data with a database that seamlessly integrates new PDFs, eliminating the need for model retraining.
06 Applications: Elevate interactions with advanced language comprehension and LLM-driven response generation for enriched user experiences.
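The DataPrep step above, turning documents into instruction pairs, might look like this in outline. The chunking rule and record fields are illustrative assumptions, not h2oGPT's actual pipeline; in practice the `output` field would be filled by an annotator or a teacher model:

```python
def to_qa_pairs(doc, max_chars=200):
    """Split a document into chunks and wrap each chunk as an
    instruction-tuning record (instruction / input / output)."""
    chunks = [doc[i:i + max_chars] for i in range(0, len(doc), max_chars)]
    return [
        {
            "instruction": f"Summarize the following passage (part {n}).",
            "input": chunk,
            "output": "",  # to be filled by a human or a teacher model
        }
        for n, chunk in enumerate(chunks, start=1)
    ]

pairs = to_qa_pairs("h2oGPT is an open-source LLM stack. " * 20)
```

Each record has the three-field shape commonly used for instruction tuning, so the resulting list can be serialized to JSON Lines and fed directly to a fine-tuning job.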
