Demystify GPT-4: A Glimpse into GPT-4

YanXu646657 205 views 38 slides May 17, 2024

About This Presentation

Introduces GPT-4, a stronger GPT than GPT-3.5, with multi-modal capabilities.


Slide Content

Demystify GPT-4. Yan Xu, Houston Machine Learning LLM Reading Group, Feb 2, 2024

GPT-4: The Unknown

From Transformer to GPT-4:
- 06/2017: Attention Is All You Need (Transformer architecture)
- 06/2018: Pre-train and fine-tune (GPT-1)
- 02/2019: Zero-shot (GPT-2)
- 05/2020: In-context few-shot (GPT-3)
- 03/2022: Human alignment: training language models to follow instructions with human feedback (GPT-3.5 / InstructGPT), over 350B parameters
- 11/2022: ChatGPT release
- 03/2023: Large-scale multimodal model with better post-training alignment (GPT-4), est. 1.5T parameters; improved performance and multi-modal input

Demystify GPT-4

Demystify GPT-4

GPT-4: Mixture of Experts? Rumored to be built from multiple ~220B-parameter experts, which are easier to train and maintain.

Mixture of Experts:
- Divide a task into subtasks (language, math, coding, science, etc.).
- Develop an expert for each subtask.
- Use a gating model to decide which expert to use.
- Pool the expert predictions and the gating model output to make a final prediction.
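A minimal sketch of the gating idea described above, using toy linear experts (illustrative only; this is not GPT-4's actual architecture, and the layer sizes are arbitrary):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMixtureOfExperts(nn.Module):
    """Gating network scores each expert; the output is the gate-weighted
    pool of expert predictions (toy version with linear experts)."""
    def __init__(self, d_model: int, n_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.gate(x), dim=-1)                         # (batch, n_experts)
        expert_outs = torch.stack([e(x) for e in self.experts], dim=-1)   # (batch, d_model, n_experts)
        # Pool the expert predictions with the gating weights.
        return torch.einsum("bdn,bn->bd", expert_outs, weights)

x = torch.randn(2, 16)
print(ToyMixtureOfExperts(16)(x).shape)  # torch.Size([2, 16])
```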

Predictable Scaling for efficient model tuning: A large focus of the GPT-4 project was building a deep learning stack that scales predictably. The primary reason is that for very large training runs like GPT-4, it is not feasible to do extensive model-specific tuning. To address this, we developed infrastructure and optimization methods that have very predictable behavior across multiple scales. These improvements allowed us to reliably predict some aspects of the performance of GPT-4 from smaller models trained using 1,000× to 10,000× less compute.
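As a rough illustration of this kind of extrapolation, one can fit a simple power law to the final losses of small runs and extrapolate to the full-compute run. The functional form and the numbers below are assumptions for illustration, not OpenAI's actual fit:

```python
import numpy as np

# Hypothetical final losses from small runs at 10,000x .. 10x less compute
# than the target run (values are made up for illustration).
compute = np.array([1e-4, 1e-3, 1e-2, 1e-1])
loss    = np.array([3.9, 3.3, 2.8, 2.4])

# Assume a simple power law L(C) = a * C**b, i.e. a straight line in log-log space.
b, log_a = np.polyfit(np.log(compute), np.log(loss), deg=1)
predict = lambda c: np.exp(log_a) * c ** b

print(f"fitted exponent b = {b:.3f}")
print(f"predicted loss at full compute (C = 1): {predict(1.0):.2f}")
```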

Predictable Scaling for efficient model tuning

Inverse Scaling

Demystify GPT-4

Capabilities on Academic and Professional Exams: USA Biology Olympiad

Capabilities on Academic and Professional Exams: American Mathematics Competitions

Capabilities on Academic and Professional Exams

Impact of Data Contamination: Overall, across most exams, both contamination and vision have relatively little effect.

Impact of RLHF: Comparison between GPT-4 base and GPT-4 post-RLHF on exam benchmarks. Averaged across all exams, the base model achieves an average score of 73.7% while the RLHF model achieves an average score of 74.0%, which suggests that post-training does not substantially alter base model capability.

Capabilities on Benchmarks

GPT-4 Multi-lingual Capability

Demystify GPT-4

GPT-4 Visual Inputs

GPT-4 Visual Inputs

GPT-4 Visual Inputs

GPT-4 Visual Inputs: How? There is no official publication about it. The Flamingo visual language model, published by DeepMind in 2022, may have inspired GPT-4's approach. https://slideslive.com/38991242/flamingo-a-visual-language-model-for-fewshot-learning

Flamingo: A visual language model for few-shot learning

Flamingo Architecture Overview

Flamingo Architecture Overview

Flamingo Architecture: Gated XATTN-DENSE layers
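A hedged sketch of the gated cross-attention-plus-dense idea named above: cross-attention from text to visual features and a feed-forward block, each scaled by tanh of a learnable gate initialized at zero so the frozen language model is unchanged at initialization. Dimensions and module choices here are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class GatedCrossAttentionDense(nn.Module):
    """Sketch of a gated xattn-dense block: text attends to visual features,
    then a feed-forward block; each branch is scaled by tanh of a learnable
    gate initialized at zero, so the block is a no-op at initialization."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffw = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.attn_gate = nn.Parameter(torch.zeros(1))  # tanh(0) = 0 -> no effect at init
        self.ffw_gate = nn.Parameter(torch.zeros(1))

    def forward(self, text: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.cross_attn(query=text, key=visual, value=visual)
        text = text + torch.tanh(self.attn_gate) * attn_out
        text = text + torch.tanh(self.ffw_gate) * self.ffw(text)
        return text

text, visual = torch.randn(1, 10, 64), torch.randn(1, 5, 64)
print(GatedCrossAttentionDense(64)(text, visual).shape)  # torch.Size([1, 10, 64])
```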

Demystify GPT-4

Model Mitigations: We used a combination of dataset interventions and interventions after pre-training to mitigate harms at the model level. At the pre-training stage, we filtered our dataset mix for GPT-4 to specifically reduce the quantity of inappropriate erotic text content. We did this via a combination of internally trained classifiers and a lexicon-based approach to identify documents that were flagged as having a high likelihood of containing inappropriate erotic content. After the pre-training stage, our primary method for shaping GPT-4-launch behavior was RLHF. To steer our models at a more fine-grained level, we relied heavily on our models themselves as tools. One of our main tools for steering the model towards appropriate refusals is rule-based reward models (RBRMs).
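A hypothetical sketch of how a lexicon-plus-classifier document filter of the kind described above might look; the blocklist terms, threshold, and classifier are placeholders, not OpenAI's actual pipeline:

```python
from typing import Callable, Iterable

# Placeholder lexicon; the actual terms, classifier, and threshold used by
# OpenAI are not public.
BLOCKLIST = {"flagged_term_a", "flagged_term_b"}

def should_drop(doc: str, classifier: Callable[[str], float], threshold: float = 0.9) -> bool:
    lexicon_hit = any(term in doc.lower() for term in BLOCKLIST)
    classifier_hit = classifier(doc) >= threshold   # assumed P(inappropriate content)
    return lexicon_hit or classifier_hit

def filter_corpus(docs: Iterable[str], classifier: Callable[[str], float]) -> list[str]:
    # Keep only documents that neither the lexicon nor the classifier flags.
    return [d for d in docs if not should_drop(d, classifier)]
```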

Rule-based reward models: Our rule-based reward models (RBRMs) are a set of zero-shot GPT-4 classifiers. The RBRM takes three inputs: the prompt (optional), the output from the policy model, and a human-written rubric (e.g., a set of rules in multiple-choice style) for how this output should be evaluated. For example, we can provide a rubric that instructs the model to classify a response as one of: (a) a refusal in the desired style, (b) a refusal in the undesired style (e.g., evasive), (c) containing disallowed content, or (d) a safe non-refusal response. Then, on a subset of prompts that we know request harmful content such as illicit advice, we can reward GPT-4 for refusing these requests. Conversely, we can reward GPT-4 for not refusing requests on a subset of known-safe prompts.
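An illustrative sketch of the RBRM flow described above; `classify_with_gpt4` is a placeholder for a zero-shot GPT-4 classifier call, and the reward values are assumed for illustration:

```python
# Rubric in multiple-choice style, as described in the slide above.
RUBRIC = """Classify the assistant's response to the user prompt as one of:
(A) a refusal in the desired style
(B) a refusal in the undesired style (e.g., evasive)
(C) contains disallowed content
(D) a safe non-refusal response
Answer with a single letter."""

# Assumed reward tables: reward refusals on known-harmful prompts and
# non-refusals on known-safe prompts (values are illustrative).
REWARD_FOR_HARMFUL_PROMPT = {"A": 1.0, "B": -0.5, "C": -1.0, "D": -1.0}
REWARD_FOR_SAFE_PROMPT    = {"A": -1.0, "B": -1.0, "C": -1.0, "D": 1.0}

def rbrm_reward(prompt: str, response: str, prompt_is_harmful: bool,
                classify_with_gpt4) -> float:
    # classify_with_gpt4 is a placeholder zero-shot classifier returning "A".."D".
    label = classify_with_gpt4(rubric=RUBRIC, prompt=prompt, response=response)
    table = REWARD_FOR_HARMFUL_PROMPT if prompt_is_harmful else REWARD_FOR_SAFE_PROMPT
    return table[label]
```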

Improvements on Safety Metrics

Examples

Demystify GPT-4

Limitations. Factuality: the capacity to generate content aligned with factual information, with related issues such as hallucinations, outdated info, and domain specificity.

Limitations

Limitations

Conclusion:
- We characterize GPT-4, a large multimodal model with human-level performance on certain difficult professional and academic benchmarks.
- GPT-4 outperforms existing large language models on a collection of NLP tasks, and exceeds the vast majority of reported state-of-the-art systems (which often include task-specific fine-tuning).
- We find that improved capabilities, whilst usually measured in English, can be demonstrated in many different languages.
- We highlight how predictable scaling allowed us to make accurate predictions on the loss and capabilities of GPT-4.
- GPT-4 presents new risks due to increased capability, and we discuss some of the methods and results used to understand and improve its safety and alignment.
- Though there remains much work to be done, GPT-4 represents a significant step towards broadly useful and safely deployed AI systems.

Demystify GPT-4