Large Language Models (LLMs) - Level 3: Presentation Slides
Welcome to the Large Language Models (LLMs) - Level 3 course!
These presentation slides have been meticulously crafted by H2O.ai University to complement the course content. You can access the course directly using the link below: https://h2o.ai/university/courses/large-language-models-level3/
In this course, we’ll take a deep dive into the H2O.ai Generative AI ecosystem, focusing on LLMs. Whether you’re a seasoned data scientist or just starting out, these slides will equip you with essential knowledge and practical skills.
Size: 5.02 MB
Language: en
Added: Jul 17, 2024
Slides: 41 pages
Slide Content
H2O.ai Confidential
Intro to h2oGPT
by Andreea Turcu
Agenda
● A bit of context
● What are GPTs?
● Why know what LLMs are?
● LLM origins
● What is h2oGPT?
● Boosting your productivity with h2oGPT
● Limitations of existing models
● Benefits of open-source models
● Demo of h2oGPT
What are GPTs?
Why should I know what LLMs are?
Large language models like GPT have diverse business uses:
● automating content creation,
● extracting insights from data,
● personalizing marketing,
● enabling virtual assistants,
● analyzing data,
● facilitating voice-based interactions and translations, etc.
What are LLMs?
- LLMs (Large Language Models) are computational models for understanding and generating human language.
- They are trained on vast amounts of text data.
- LLMs learn grammar, vocabulary, and contextual relationships.
- They can generate coherent and contextually relevant text based on given prompts.
- Collaboration with AI systems becomes more efficient.
- Responsible use and enhanced user experiences can be achieved.
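The idea of generating text by predicting what comes next can be illustrated with a toy bigram model. This is a sketch for intuition only: real LLMs use deep neural networks over subword tokens, and the corpus and function names here are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count how often each word follows each other word in the corpus."""
    words = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the continuation seen most often after `word`, if any."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

# Hypothetical toy corpus
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
```

Here `predict_next(model, "the")` returns `"cat"`, because "cat" follows "the" more often than "mat" in the toy corpus; an LLM does the same kind of prediction, but with learned contextual representations instead of raw counts.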
LLM Origins
Transformers are deep feed-forward neural networks that leverage a machine learning mechanism called (self-)attention and have seen wild success in natural language processing problems.

2017 - Encoder-Decoder (Seq2Seq): the original Transformer architecture for machine translation and other sequence-to-sequence problems.
2019 - BERT: Bidirectional Encoder Representations from Transformers, a model designed to recover masked tokens.
2020 - GPT: auto-regressive language modeling, where the goal is to predict the next token.
2022 - ChatGPT: an interactive interface for users to interact directly with the GPT-3 and GPT-4 modeling frameworks.
2023 - h2oGPT: the world's best completely open-source LLM, permissible for commercial use.

Reference: https://arxiv.org/pdf/2207.09238.pdf
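The (self-)attention mechanism behind Transformers can be sketched in a few lines of NumPy. This is an illustrative toy of single-head scaled dot-product attention, not the multi-head, masked attention used in production models; all names and dimensions here are invented.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Each row of `weights` says how much each token attends to every other token.
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))    # 4 tokens, model dimension 8
Wq = rng.normal(size=(8, 8))
Wk = rng.normal(size=(8, 8))
Wv = rng.normal(size=(8, 8))
out, attn = self_attention(X, Wq, Wk, Wv)
```

Each row of the attention matrix is a probability distribution over the input tokens, which is what lets every output position mix information from the whole sequence.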
What is h2oGPT?
AI Will Boost Productivity by 10x
Up to 2021: Continuous but slow improvements in automatization and productivity. Productivity in the US has increased by 250% in 70 years.*
2022: No Code and AutoML enable all companies to build and use highly accurate models for specialized tasks.
2023: In addition to small specialized models, LLMs are supporting employees in their daily tasks: brainstorming, coding, summarization, analysis.
2024: 1-Click to solve complex business goals.
2025: AI is used in automated mode. Employees are supervising their AI co-workers. Robotics leaps forward by incorporating LLMs.
*2020 | MIT Work of the Future
Limitations of Existing Models
While popular models such as OpenAI's ChatGPT/GPT-4, Anthropic's Claude, Microsoft's Bing AI Chat, Google's Bard, and Cohere are powerful and effective, they have certain limitations compared to open-source LLMs:
1. Data Privacy and Security: Using hosted LLMs requires sending data to external servers. This can raise concerns about data privacy, security, and compliance, especially for sensitive information or industries with strict regulations.
2. Dependency and Customization: Hosted LLMs often limit the extent of customization and control, as users rely on the service provider's infrastructure and predefined models.
3. Cost and Scalability: Hosted LLMs usually come with usage fees, which can increase significantly with large-scale applications.
4. Access and Availability: Hosted LLMs may be subject to downtime or limited availability, affecting users' access to the models.
Benefits of Open Source Models
1. Cost-effective: Users can scale the models on their own infrastructure without incurring additional costs from the service provider.
2. Flexible: Deployed on-premises or on private clouds, ensuring uninterrupted access and reducing reliance on external providers.
3. Tunable: Users can tailor the models to their specific needs, deploy on their own infrastructure, and even modify the underlying code.
Overall, open-source LLMs offer greater flexibility, control, and cost-effectiveness, while addressing data privacy and security concerns. They foster a competitive landscape in the AI industry and empower users to innovate and customize models to suit their specific needs.
h2oGPT
The world's best open source GPT
What is it?
● Released as open source under the Apache-2.0 license
● Active development: h2oai/h2ogpt
● See a demo:
○ gpt.h2o.ai
○ 🤗 Hugging Face Spaces
● Commercially usable code, data, and models
● Prompt engineering: ability to prepare open-source datasets for tuning LLMs
● Tuning: code for fine-tuning large language models (currently up to 20B parameters) on commodity hardware and enterprise GPU servers (single or multi node)
Optimizations:
■ LoRA (Low-Rank Adaptation)
■ 4-bit and 8-bit quantization for memory-efficient fine-tuning and generation
● Deployable: chatbot with UI and Python API
● Evaluation: LLM performance evaluation
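The 4-bit/8-bit quantization bullet above can be illustrated with a minimal symmetric int8 round-trip. This is a sketch of the core idea only; h2oGPT's actual quantization relies on specialized libraries and is considerably more involved, and the weight values below are made up.

```python
def quantize_int8(weights):
    """Symmetric 8-bit quantization: map floats onto integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to approximate float values."""
    return [v * scale for v in q]

# Hypothetical weight values
weights = [0.4, -1.27, 0.03, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Storing each weight as one signed byte plus a shared scale factor is what cuts memory roughly 4x versus float32, at the cost of a small rounding error bounded by half the scale.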
Demo of h2oGPT!
https://gpt-gm.h2o.ai/
https://gpt.h2o.ai/
Disclaimer: subject to modification and updates
Table of Contents
1. Metrics for Evaluating LLM Performance
2. Common Datasets and Evaluation Challenges
3. Comparison of Different LLM Architectures
4. Intro to H2O LLM Eval
Overview
Our exploration will encompass fundamental concepts, like:
- key metrics
- different architectures
- challenges inherent in developing state-of-the-art natural language models.
Free-form text generation
- For evaluating free-form text generation, A/B testing and Elo scoring are key.
- A/B testing compares responses from different models blindly to determine superiority.
- Elo scoring treats A/B comparisons as matches, providing deeper insights into model performance.
Fair Evaluation Protocol for A/B Testing with GPT-4
- GPT-4-0613 is responsible for evaluating all A/B tests.
- To maintain fairness, the sequence of all potential games is shuffled, and the positions of Models A and B are randomized before evaluation by GPT-4.
- 1000 bootstrap rounds are conducted, and the median Elo score from these rounds determines the final score for each model.
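The Elo-over-A/B-matches idea and the bootstrap-median step described above can be sketched as follows. This is a minimal illustration with made-up match data; the real protocol uses GPT-4 judgments over shuffled game orders.

```python
import random

def elo_scores(matches, k=32, base=1000):
    """Run (winner, loser) A/B matches through standard Elo updates."""
    ratings = {}
    for winner, loser in matches:
        r_w = ratings.setdefault(winner, base)
        r_l = ratings.setdefault(loser, base)
        # Expected score of the winner before the match, per the Elo formula.
        expected_w = 1 / (1 + 10 ** ((r_l - r_w) / 400))
        ratings[winner] = r_w + k * (1 - expected_w)
        ratings[loser] = r_l - k * (1 - expected_w)
    return ratings

def bootstrap_median_elo(matches, rounds=1000, base=1000, seed=0):
    """Resample the match list `rounds` times; report each model's median Elo."""
    rng = random.Random(seed)
    samples = [
        elo_scores([rng.choice(matches) for _ in matches], base=base)
        for _ in range(rounds)
    ]
    models = {m for match in matches for m in match}
    return {
        m: sorted(s.get(m, base) for s in samples)[rounds // 2]
        for m in models
    }

# Hypothetical outcomes: model_a beats model_b in 6 of 8 blind A/B comparisons
matches = [("model_a", "model_b")] * 6 + [("model_b", "model_a")] * 2
final = bootstrap_median_elo(matches)
```

Taking the median over bootstrap resamples, rather than a single pass, makes the final score robust to the order and sampling of individual matches.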
Concerns of Data Contamination
- Many evaluations are found online, causing concerns about data contamination and biased results.
- Models with a large number of parameters may struggle with handling multiple tasks effectively.
A Look at Llama 2's Testing Framework
★ Code Proficiency
★ Commonsense Reasoning
★ World Knowledge
★ Reading Comprehension
★ Math Skills
★ Big Bench Assessment
★ Safety Metrics
Optimal Model Selection in Reinforcement Learning
- Researchers used human-preference reward modeling for optimal model selection in reinforcement learning.
- Thorough human evaluations assessed helpfulness and safety.
- Contamination analysis identified potential data leaks.
Comprehensive Evaluation of Llama 2
❖ Academic Testing: Rigorous evaluations across various academic benchmarks, including code proficiency, commonsense reasoning, world knowledge, reading comprehension, and mathematical skills.
❖ Safety Metrics: Dedicated safety benchmarks to assess truthfulness, toxicity, and bias mitigation.
❖ Automated Metrics: Objective measurements to gauge performance and efficiency.
❖ Human Ratings: Expert human evaluators provided valuable insights.
❖ Contamination Checks: Ensuring model integrity by detecting and addressing potential issues.
Evaluating Large Language Models (LLMs)
Quantitative Metrics
● Perplexity, accuracy, BLEU, and ROUGE scores.
● Useful for tasks with ground-truth criteria.
Comprehensive Frameworks
● HELM and MT-Bench.
● Widely used for benchmarking LLMs.
Free-Form Text Generation
● Automated metrics may be ineffective.
● A/B testing and Elo scoring.
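Of the quantitative metrics listed above, perplexity is the simplest to compute from per-token log-probabilities. A minimal sketch follows; the log-probabilities here are hypothetical, not taken from any real model.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(average negative log-likelihood per token)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probabilities: a model that spreads probability
# uniformly over a 100-word vocabulary assigns log(1/100) to every token.
uniform = [math.log(1 / 100)] * 20
```

A uniform model over 100 words has perplexity exactly 100, matching the intuition that perplexity measures how many choices the model is effectively "perplexed" between at each step; lower is better.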
Comparing LLM Architectures
➢ Tokens Trained On: language capacity.
➢ GPU Power: training efficiency.
➢ Parameters: model complexity.
H2O LLM Eval Studio
❏ Unified Platform: comprehensive suite of tools for evaluating and understanding LLMs.
❏ Assess Performance: evaluate models across dimensions: natural language inference, question-answering, and text generation.
❏ Model Comparison: compare with state-of-the-art LLMs and benchmarks. Gain insights into strengths and weaknesses.
Explore H2O EvalGPT
01 Foundation: Powerful language models trained on extensive text data, forming the basis for various language tasks.
02 DataPrep: Converting documents into instruction pairs, like QA pairs, facilitating fine-tuning and tasks.
03 Fine-tuning: Refining pre-trained models using task-specific data, enhancing their performance on targeted tasks.
04 Eval LLMs: Thoroughly assessing and comparing LLMs is increasingly vital due to their heightened significance and complexity.
05 Database: Effectively utilize company data with a database that seamlessly integrates new PDFs, eliminating the need for model retraining.
06 Applications: Elevate interactions with advanced language comprehension and LLM-driven response generation for enriched user experiences.