Large Language Models (LLMs) - Level 3 Slides

0xdata · 41 slides · Jul 17, 2024

About This Presentation

Large Language Models (LLMs) - Level 3: Presentation Slides

Welcome to the Large Language Models (LLMs) - Level 3 course!
These presentation slides have been crafted by H2O.ai University to complement the course content. You can access the course directly using the link below: https:...


Slide Content

H2O.ai Confidential

Intro to h2oGPT
by Andreea Turcu

Agenda
● A bit of context
● What are GPTs?
● Why know what LLMs are?
● LLM origins
● What is h2oGPT?
● Boosting your productivity with h2oGPT
● Limitations of existing models
● Benefits of open-source models
● Demo of h2oGPT

What are GPTs?

Why should I know what LLMs are?
Large language models like GPT have diverse business uses:

● Automating content creation
● Extracting insights from data
● Personalizing marketing
● Enabling virtual assistants
● Analyzing data
● Facilitating voice-based interactions and translations

What are LLMs?

- LLMs (Large Language Models) are computational models for understanding and generating human language.
- They are trained on vast amounts of text data.
- LLMs learn grammar, vocabulary, and contextual relationships.
- They can generate coherent and contextually relevant text from a given prompt.
- They make collaboration with AI systems more efficient.
- They enable responsible use and enhanced user experiences.

LLM Origins
Transformers are deep feed-forward neural networks that leverage a machine-learning mechanism called (self-)attention; they have seen wild success on natural language processing problems.

2017: Encoder-Decoder (Seq2Seq). The original Transformer architecture, for machine translation and sequence-to-sequence problems.
2019: BERT (Bidirectional Encoder Representations from Transformers). A model designed to recover masked tokens.
2020: GPT. Auto-regressive language modeling, where the goal is to predict the next token.
2022: ChatGPT. An interactive interface for users to interact directly with the GPT-3 and GPT-4 modeling frameworks.
2023: h2oGPT. The world's best completely open-source LLM, permissively licensed for commercial use.

Reference: https://arxiv.org/pdf/2207.09238.pdf
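The auto-regressive objective behind GPT-style models (predict the next token, scored by negative log-likelihood) can be sketched with a toy bigram model; the vocabulary and probabilities below are invented purely for illustration:

```python
import math

# Toy next-token predictor: a bigram table standing in for a trained model.
# These probabilities are made up for illustration, not from any real model.
BIGRAM_PROBS = {
    ("the", "cat"): 0.4,
    ("cat", "sat"): 0.5,
    ("sat", "down"): 0.6,
}

def next_token_loss(tokens):
    """Average negative log-likelihood of each token given its predecessor."""
    nll = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = BIGRAM_PROBS.get((prev, cur), 1e-6)  # tiny prob for unseen pairs
        nll += -math.log(p)
    return nll / (len(tokens) - 1)

loss = next_token_loss(["the", "cat", "sat", "down"])
```

Training a real GPT minimizes exactly this quantity, averaged over billions of tokens, with a neural network in place of the lookup table.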

What is h2oGPT?


AI Will Boost Productivity by 10x

Up to 2021: Continuous but slow improvements in automation and productivity. Productivity in the US has increased by 250% in 70 years.*
2022: No-code and AutoML enable all companies to build and use highly accurate models for specialized tasks.
2023: In addition to small specialized models, LLMs are supporting employees in their daily tasks: brainstorming, coding, summarization, analysis.
2024: 1-click to solve complex business goals.
2025: AI is used in automated mode. Employees are supervising their AI co-workers. Robotics leaps forward by incorporating LLMs.

*2020 | MIT Work of the Future


Limitations of Existing Models

Popular models such as OpenAI's ChatGPT/GPT-4, Anthropic's Claude, Microsoft's Bing AI Chat, Google's Bard, and Cohere are powerful and effective, but they have certain limitations compared to open-source LLMs:

1. Data Privacy and Security: Using hosted LLMs requires sending data to external servers. This can raise concerns about data privacy, security, and compliance, especially for sensitive information or industries with strict regulations.
2. Dependency and Customization: Hosted LLMs often limit the extent of customization and control, as users rely on the service provider's infrastructure and predefined models.
3. Cost and Scalability: Hosted LLMs usually come with usage fees, which can increase significantly for large-scale applications.
4. Access and Availability: Hosted LLMs may be subject to downtime or limited availability, affecting users' access to the models.

Benefits of Open Source Models

1. Cost-effective: Users can scale the models on their own infrastructure without incurring additional costs from a service provider.
2. Flexible: Deployed on-premises or on private clouds, ensuring uninterrupted access and reducing reliance on external providers.
3. Tunable: Users can tailor the models to their specific needs, deploy on their own infrastructure, and even modify the underlying code.

Overall, open-source LLMs offer greater flexibility, control, and cost-effectiveness, while addressing data privacy and security concerns. They foster a competitive landscape in the AI industry and empower users to innovate and customize models to suit their specific needs.

h2oGPT
● Released as open source under the Apache-2.0 license
● Active development: h2oai/h2ogpt
● See a demo
○ gpt.h2o.ai
○ 🤗 Hugging Face Spaces

What is it?
● Commercially usable code, data, and models
● Prompt engineering: ability to prepare open-source datasets for tuning LLMs
● Tuning: code for fine-tuning large language models (currently up to 20B parameters) on commodity hardware and enterprise GPU servers (single or multi-node)
● Optimizations:
■ LoRA (low-rank adaptation)
■ 4-bit and 8-bit quantization for memory-efficient fine-tuning and generation
● Deployable: chatbot with UI and Python API
● Evaluation: LLM performance evaluation

The world's best open source GPT
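The LoRA optimization mentioned above freezes the pretrained weights and trains only a small low-rank update. A minimal NumPy sketch of the idea, with arbitrary illustrative dimensions and rank (not h2oGPT's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 64, 4  # layer dimensions and LoRA rank (illustrative sizes)

W = rng.normal(size=(d, k))          # frozen pretrained weight, never updated
A = rng.normal(size=(r, k)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # zero-initialized so the update starts at 0

def lora_forward(x):
    # Effective weight is W + B @ A, but only A and B are trained:
    # r * (d + k) parameters instead of d * k for a full fine-tune.
    return x @ W.T + x @ A.T @ B.T

x = rng.normal(size=(1, k))
y = lora_forward(x)
# With B = 0, the adapted layer matches the frozen base layer exactly.
```

Because B starts at zero, fine-tuning begins from the pretrained model's behavior and only gradually learns the task-specific correction; this is what makes LoRA cheap enough for commodity hardware.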

Demo of h2oGPT!
https://gpt-gm.h2o.ai/
https://gpt.h2o.ai/

Disclaimer: subject to modification and updates.

Overview

Table of Contents
1. Metrics for Evaluating LLM Performance
2. Common Datasets and Evaluation Challenges
3. Comparison of Different LLM Architectures
4. Intro to H2O LLM Eval

Our exploration will encompass fundamental concepts such as:
- key metrics
- different architectures
- challenges inherent in developing state-of-the-art natural language models

Free-form text generation

- For evaluating free-form text generation, A/B testing and Elo scoring are key.
- A/B testing blindly compares responses from different models to determine which is superior.
- Elo scoring treats A/B comparisons as matches, providing deeper insight into model performance.

Fair Evaluation Protocol for A/B Testing with GPT-4

- GPT-4-0613 is responsible for evaluating all A/B tests.
- To maintain fairness, the sequence of all potential games is shuffled, and the positions of Models A and B are randomized before evaluation by GPT-4.
- 1,000 bootstrap rounds are conducted, and the median Elo score from these rounds determines the final score for each model.
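The protocol above (Elo over shuffled A/B matches, bootstrap resampling, median as the final score) can be sketched in plain Python; the match outcomes, K-factor, and model names below are hypothetical stand-ins for real judge verdicts:

```python
import random
import statistics

def elo_scores(matches, k=32, base=1000):
    """Run standard Elo updates over a list of (winner, loser) matches."""
    ratings = {}
    for winner, loser in matches:
        rw = ratings.setdefault(winner, base)
        rl = ratings.setdefault(loser, base)
        expected_w = 1 / (1 + 10 ** ((rl - rw) / 400))
        ratings[winner] = rw + k * (1 - expected_w)
        ratings[loser] = rl - k * (1 - expected_w)
    return ratings

def bootstrap_median_elo(matches, rounds=1000, seed=0):
    """Resample matches with replacement; report each model's median Elo."""
    rng = random.Random(seed)
    samples = {}
    for _ in range(rounds):
        resampled = [rng.choice(matches) for _ in matches]
        rng.shuffle(resampled)  # match order affects Elo, so shuffle each round
        for model, score in elo_scores(resampled).items():
            samples.setdefault(model, []).append(score)
    return {m: statistics.median(s) for m, s in samples.items()}

# Hypothetical judge verdicts: ("model_a", "model_b") means model_a won.
matches = [("model_a", "model_b")] * 6 + [("model_b", "model_a")] * 4
final = bootstrap_median_elo(matches)
```

Taking the median over many bootstrap rounds makes the final score robust to the order-dependence of Elo and to a handful of noisy judge decisions.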

Concerns of Data Contamination
- Many evaluations are found online, causing concerns
about data contamination and biased results.

- Models with a large number of parameters may
struggle with handling multiple tasks effectively.

A Look at Llama 2's Testing Framework

★ Code Proficiency
★ Commonsense Reasoning
★ World Knowledge
★ Reading Comprehension
★ Math Skills
★ BIG-bench Assessment
★ Safety Metrics

Optimal Model Selection in Reinforcement Learning

- Researchers used human-preference reward modeling for optimal model selection in reinforcement learning.
- Thorough human evaluations assessed helpfulness and safety.
- Contamination analysis identified potential data leaks.

Comprehensive Evaluation of Llama 2

❖ Academic Testing: Rigorous evaluations across various academic benchmarks, including code proficiency, commonsense reasoning, world knowledge, reading comprehension, and mathematical skills.
❖ Safety Metrics: Dedicated safety benchmarks to assess truthfulness, toxicity, and bias mitigation.
❖ Automated Metrics: Objective measurements to gauge performance and efficiency.
❖ Human Ratings: Expert human evaluators provided valuable insights.
❖ Contamination Checks: Ensuring model integrity by detecting and addressing potential issues.

Evaluating Large Language Models (LLMs)

Quantitative Metrics
● Perplexity, accuracy, BLEU, and ROUGE scores.
● Useful for tasks with ground-truth criteria.

Comprehensive Frameworks
● HELM and MT-Bench.
● Widely used for benchmarking LLMs.

Free-Form Text Generation
● Automated metrics may be ineffective.
● A/B testing and Elo scoring.
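Perplexity, the first quantitative metric above, is simply the exponentiated average negative log-likelihood of the reference tokens. A minimal sketch with made-up per-token probabilities:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-probability of each observed token)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Made-up probabilities a model might assign to the tokens of a reference text.
confident = perplexity([0.9, 0.8, 0.95])   # low perplexity: model is confident
uncertain = perplexity([0.1, 0.2, 0.05])   # high perplexity: model is surprised
```

Lower is better: a perplexity of N roughly means the model is as uncertain as if it were choosing uniformly among N tokens at each step.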

Comparing LLM Architectures

➢ Tokens Trained On: language capacity.
➢ GPU Power: training efficiency.
➢ Parameters: model complexity.

H2O LLM Eval Studio

❏ Unified Platform: Comprehensive suite of tools for evaluating and understanding LLMs.
❏ Assess Performance: Evaluate models across dimensions: natural language inference, question answering, and text generation.
❏ Model Comparison: Compare with state-of-the-art LLMs and benchmarks; gain insights into strengths and weaknesses.

Explore H2O EvalGPT

01 Foundation: Powerful language models trained on extensive text data, forming the basis for various language tasks.
02 DataPrep: Converting documents into instruction pairs, like QA pairs, facilitating fine-tuning and downstream tasks.
03 Fine-tuning: Refining pre-trained models using task-specific data, enhancing their performance on targeted tasks.
04 Eval LLMs: Thoroughly assessing and comparing LLMs is increasingly vital due to their heightened significance and complexity.
05 Database: Effectively utilize company data with a database that seamlessly integrates new PDFs, eliminating the need for model retraining.
06 Applications: Elevate interactions with advanced language comprehension and LLM-driven response generation for enriched user experiences.
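The DataPrep step above, turning documents into instruction pairs, might look like this in outline. The chunking rule and record fields are illustrative assumptions, not h2oGPT's actual pipeline; in practice the `output` field would be filled by an annotator or a teacher model:

```python
def to_qa_pairs(doc, max_chars=200):
    """Split a document into chunks and wrap each chunk as an
    instruction-tuning record (instruction / input / output)."""
    chunks = [doc[i:i + max_chars] for i in range(0, len(doc), max_chars)]
    return [
        {
            "instruction": f"Summarize the following passage (part {n}).",
            "input": chunk,
            "output": "",  # to be filled by a human or a teacher model
        }
        for n, chunk in enumerate(chunks, start=1)
    ]

pairs = to_qa_pairs("h2oGPT is an open-source LLM stack. " * 20)
```

Each record has the three-field shape commonly used for instruction tuning, so the resulting list can be serialized to JSON Lines and fed directly to a fine-tuning job.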
