Fundamentos Modernos de la IA: Conclusions


About This Presentation

Final presentation for the course Fundamentos Modernos de la IA; it presents the course conclusions and explains how LLMs work.


Slide Content

CS 188: Artificial Intelligence
Instructors: Cameron Allen and Michael Cohen --- University of California, Berkeley
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Special Thanks
Ketrina Yim
CS188 Artist

Today’s AI

Large Language Models
§Feature engineering
§Text tokenization
§Word embeddings
§Deep neural networks
§Autoregressive models
§Self-attention mechanisms
§Transformer architecture
§Multi-class classification
§Supervised learning
§Self-supervised learning
§Instruction tuning
§Reinforcement learning
§… from human feedback (RLHF)
§Policy search
§Policy gradient methods
§Beam search

Deep Neural Networks
§Input: some text
§“The dog chased the”
§Output: more text
§ … “ ball”
§Implementation:
§Linear algebra
§How??
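As a hint at the "how", here is a minimal, untrained sketch of next-word prediction reduced to linear algebra. The vocabulary, sizes, and random weights are all invented for illustration; a real model learns them.

```python
# Minimal sketch (not the lecture's actual model): next-token prediction as
# matrix multiplication. Vocabulary, sizes, and weights are made up.
import numpy as np

vocab = ["The", " dog", " chased", " the", " ball"]
d = 8                                        # embedding size (arbitrary)
rng = np.random.default_rng(0)

E = rng.normal(size=(len(vocab), d))         # embedding matrix
W = rng.normal(size=(d, len(vocab)))         # output ("un-embedding") matrix

def next_token_logits(token_ids):
    x = E[token_ids].mean(axis=0)            # crude pooling of the context
    return x @ W                             # one matrix multiply -> a score per word

logits = next_token_logits([0, 1, 2, 3])     # "The dog chased the"
print(vocab[int(np.argmax(logits))])         # untrained, so the prediction is arbitrary
```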

Text Tokenization
https://platform.openai.com/tokenizer

Text Tokenization
https://platform.openai.com/tokenizer

Text Tokenization
https://platform.openai.com/tokenizer
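Besides the web tokenizer linked above, a quick way to see tokenization in action is the tiktoken library (assuming it is installed); the exact IDs depend on the chosen encoding.

```python
# Small tokenization sketch using tiktoken (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # an encoding used by several OpenAI models
ids = enc.encode("The dog chased the")
print(ids)                                   # a short list of integer token IDs
print([enc.decode([i]) for i in ids])        # the text piece behind each ID
```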

Word Embeddings
§Input: some text
§“The”
§“ dog”
§“ chased”
§“ the”
§Output: more text
§“ ball”
[Diagram: "The", " dog", " chased", " the" are tokenized to IDs [791], [5679], [62920], [279], embedded as vectors, and used to predict the next token (a one-hot target over the vocabulary); the prediction is un-embedded and un-tokenized to produce " ball" ([5041])]
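A rough sketch of the embed / predict / un-embed path in the diagram above; the matrix sizes and random weights are placeholders, and only the token IDs come from the slide.

```python
# Sketch of the embed -> predict -> un-embed pipeline. Weights are random
# placeholders; a trained model would learn them.
import numpy as np

vocab_size, d_model = 100_000, 64            # small d_model just for illustration
rng = np.random.default_rng(0)
embed = rng.normal(size=(vocab_size, d_model)) * 0.02   # token ID -> vector
unembed = embed.T                                        # often weight-tied

token_ids = [791, 5679, 62920, 279]          # "The", " dog", " chased", " the" (per the slide)
vectors = embed[token_ids]                   # (4, d_model) embedded context

hidden = vectors[-1]                         # stand-in for what the network computes
logits = hidden @ unembed                    # a score for every token in the vocabulary
probs = np.exp(logits - logits.max())
probs /= probs.sum()                         # softmax over the vocabulary
next_id = int(probs.argmax())                # with a trained model: 5041 -> " ball"; here arbitrary
```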

What do word embeddings look like?
§Words cluster by similarity:
ig.ft.com/generative-ai

What do word embeddings look like?
§Features learned in language models:
ig.ft.com/generative-ai

What do word embeddings look like?
§Signs of sensible algebra in embedding space:
[Efficient estimation of word representations in vector space, Mikolov et al, 2013]

Aside: interactive explainer of modern language models
ig.ft.com/generative-ai

Large Language Models
§Feature engineering
§Text tokenization
§Word embeddings
§Deep neural networks
§Autoregressive models
§Self-attention mechanisms
§Transformer architectures
§Multi-class classification
§Supervised learning
§Self-supervised learning
§Instruction tuning
§Reinforcement learning
§… from human feedback (RLHF)
§Policy search
§Policy gradient methods
§Beam search

Autoregressive Models
[Diagram: the model predicts one token at a time:
"The" (+ padding) → " dog"
"The dog" (+ padding) → " chased"
"The dog chased" (+ padding) → " the"
"The dog chased the" → " ball"]

Autoregressive Models
§Predict output one piece at a time (e.g. word, token, pixel, etc.)
§Concatenate: input + output
§Feed result back in as new input
§Repeat
[Diagram: input E F G H → output M]
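A minimal sketch of that loop; predict_next below is a made-up stand-in for the real network, hard-coded to the slide's example.

```python
# Sketch of the autoregressive loop: predict, concatenate, feed back in, repeat.
def predict_next(tokens):
    # Placeholder: a real model would return the most likely next token.
    continuation = {"The dog chased the": " ball"}
    return continuation.get("".join(tokens), " ...")

def generate(tokens, n_steps):
    for _ in range(n_steps):
        nxt = predict_next(tokens)   # predict one piece
        tokens = tokens + [nxt]      # concatenate input + output
    return tokens                    # the result becomes the new input

print(generate(["The", " dog", " chased", " the"], 1))   # [..., ' ball']
```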

Self-Attention Mechanisms

Self-Attention Mechanisms
§Instead of conditioning on
all input tokens equally…
§Pay more attention to
relevant tokens!

Self-Attention Mechanisms
ig.ft.com/generative-ai

Implementing Self-Attention
[Diagram, built up over four slides: each token position (1, 2, 3) is processed by MLPs, and their outputs are combined using attention weights]
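A compact numpy sketch consistent with the idea above: each token attends more to relevant tokens via a softmax over query-key scores. The projection matrices here are random placeholders rather than learned weights.

```python
# Single-headed self-attention sketch in numpy.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                      # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])               # how relevant is each token to each other token?
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax: attention weights
    return weights @ V                                    # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                               # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)                       # (4, 8): one output per token
```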

Multi-Headed Attention

Multi-Headed Attention
[Diagram: single-headed vs. multi-headed attention: one attention pattern over tokens 1, 2, 3 versus several heads, each with its own MLP and its own attention pattern]

Multi-Headed Attention
Head 6: previous word
https://github.com/jessevig/bertviz

Multi-Headed Attention
Head 4: pronoun references
https://github.com/jessevig/bertviz
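A sketch of the multi-headed version: several smaller heads run in parallel and their outputs are concatenated. It assumes the self_attention function from the earlier sketch is in scope; all weights are again random placeholders.

```python
# Multi-headed attention sketch (reuses self_attention from the sketch above).
import numpy as np

def multi_head_attention(X, heads):
    # heads: list of (Wq, Wk, Wv) triples, one per head
    outputs = [self_attention(X, Wq, Wk, Wv) for Wq, Wk, Wv in heads]
    return np.concatenate(outputs, axis=-1)       # stack the heads' outputs side by side

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
n_heads, d_head = 2, 4
heads = [tuple(rng.normal(size=(8, d_head)) for _ in range(3)) for _ in range(n_heads)]
out = multi_head_attention(X, heads)              # (4, n_heads * d_head)
```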

Transformer Architecture

Transformer Architecture
[Diagram: a Transformer Block is built from Multi-Headed Attention, an MLP, and two LayerNorm layers; the full model stacks several Transformer Blocks]

Transformer Architecture
[Diagram: "The dog chased the" → Tokenize → Embed → Transformer Block (repeated) → Un-embed → Un-tokenize → " ball"]
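A rough sketch of one such block, assuming the multi_head_attention function from the sketch above is in scope. The pre-norm placement of LayerNorm here is one common choice, not necessarily the exact variant on the slide.

```python
# One Transformer block: attention sublayer, then MLP sublayer, each with
# LayerNorm and a residual connection.
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    sigma = x.std(-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def mlp(x, W1, W2):
    return np.maximum(x @ W1, 0.0) @ W2                   # two linear layers with a ReLU

def transformer_block(x, heads, W1, W2):
    # (assumes the heads' concatenated output width matches x's width)
    x = x + multi_head_attention(layer_norm(x), heads)    # attention sublayer + residual
    x = x + mlp(layer_norm(x), W1, W2)                    # MLP sublayer + residual
    return x
```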

Large Language Models
§Feature engineering
§Text tokenization
§Word embeddings
§Deep neural networks
§Autoregressive models
§Self-attention mechanisms
§Transformer architectures
§Multi-class classification
§Supervised learning
§Self-supervised learning
§Instruction tuning
§Reinforcement learning
§… from human feedback (RLHF)
§Policy search
§Policy gradient methods
§Beam search

Unsupervised / Self-Supervised Learning
§Do we always need human supervision to learn features?
§Can't we learn general-purpose features?
§Key hypothesis:
§IF a neural network is smart enough to:
§Predict the next frame in a video
§Predict the next word in a sentence
§Generate realistic images
§``Translate'' images
§…
§THEN the same neural network is ready to do Supervised Learning from a very small data-set
[Diagram: Task 1, Task 2]

Transfer from Unsupervised Learning
Task 1 = unsupervised
Task 2 = real task

Example Setting
Task 1 = predict next word
Task 2 = predict sentiment

Image Pre-Training: Predict Missing Patch

Pre-Training and Fine-Tuning
§Pre-Train: train a large model with a lot of data on a self-supervised task
§Predict next word / patch of image
§Predict missing word / patch of image
§Predict if two images are related (contrastive learning)
§Fine-Tune: continue training the same model on the task you care about
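To make the two-stage recipe concrete, here is a tiny, invented example: Task 1 pre-trains shared character embeddings by predicting the next character, and Task 2 fine-tunes a new classification head on a very small labeled set while reusing those embeddings. All data, sizes, and labels are made up.

```python
# Toy pre-train / fine-tune sketch with a shared embedding table E.
import numpy as np

rng = np.random.default_rng(0)
chars = "abcdefgh"
V, d = len(chars), 4
E = rng.normal(size=(V, d)) * 0.1          # shared features, trained in Task 1
W_next = rng.normal(size=(d, V)) * 0.1     # pre-training head (next-character prediction)
text = rng.integers(V, size=2000)          # made-up "unlabeled text" as character IDs

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Task 1: self-supervised pre-training (labels come from the data itself)
for a, b in zip(text[:-1], text[1:]):
    p = softmax(E[a] @ W_next)
    grad = p.copy()
    grad[b] -= 1.0                         # cross-entropy gradient w.r.t. the logits
    W_next -= 0.1 * np.outer(E[a], grad)
    E[a] -= 0.1 * (W_next @ grad)

# Task 2: fine-tune a new head on a very small labeled data-set
W_cls = np.zeros((d, 2))
labeled = [(rng.integers(V), rng.integers(2)) for _ in range(20)]
for a, y in labeled:
    p = softmax(E[a] @ W_cls)
    grad = p.copy()
    grad[y] -= 1.0
    W_cls -= 0.1 * np.outer(E[a], grad)    # reuses the pre-trained embeddings E
```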

Instruction Tuning
§Task 1 = predict next word (learns to mimic human-written text)
§Query: “What is population of Berkeley?”
§Human-like completion: “This question always fascinated me!”
§Task 2 = generate helpful text
§Query: “What is population of Berkeley?”
§Helpful completion: “It is 117,145 as of 2021 census.”
§Fine-tune on collected examples of helpful human conversations
§Also can use Reinforcement Learning

Reinforcement Learning from Human Feedback
§MDP:
§State: sequence of words seen so far (ex. “What is population of Berkeley? ”)
§$100{,}000^{1{,}000}$ possible states
§Huge, but can be processed with feature vectors or neural networks
§Action: next word (ex. “It”, “chair”, “purple”, …) (so 100,000 actions)
§Hard to compute $\max_a Q(s,a)$ when $|A|$ is over 100K actions!
§Transition T: easy, just append action word to state words
§s: “My name“   a: “is“   s’: “My name is“
§Reward R: ???
§Humans rate model completions (ex. “What is population of Berkeley? ”)
§“It is 117,145“: +1   “It is 5“: -1   “Destroy all humans“: -1
§Learn a reward model and use that (model-based RL)
§Commonly use policy search (Proximal Policy Optimization) but looking into Q-Learning
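A small sketch of the MDP pieces described above. The reward model is a hypothetical stand-in for one trained on human ratings; it is not a real API.

```python
# Sketch of the RLHF MDP: transitions append a word; rewards come from a
# learned reward model (here faked with the slide's example ratings).
def transition(state, action):
    # T is easy: just append the action word to the state words
    return state + action

def reward_model(completion):
    # Stand-in for a reward model fit to human ratings (+1 / -1)
    ratings = {
        "It is 117,145": +1.0,
        "It is 5": -1.0,
        "Destroy all humans": -1.0,
    }
    return ratings.get(completion, 0.0)

s = "My name"
s_prime = transition(s, " is")          # "My name is"
r = reward_model("It is 117,145")       # +1.0
```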

Large Language Models
§Feature engineering
§Text tokenization
§Word embeddings
§Deep neural networks
§Autoregressive models
§Self-attention mechanisms
§Transformer architectures
§Multi-class classification
§Supervised learning
§Self-supervised learning
§Instruction tuning
§Reinforcement learning
§… from human feedback (RLHF)
§Policy search
§Policy gradient methods
§Beam search

Policy Search

Policy Gradient Methods
1. Initialize the policy $\pi_\theta$ somehow
2. Estimate policy performance: $J(\theta)$
3. Improve the policy:
§Hill climbing
§Change $\theta$, evaluate the new policy, keep it if it is better
§Gradient ascent
§Estimate $\nabla_\theta J(\theta)$, change $\theta$ to ascend the gradient
4. Repeat
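A minimal sketch of the loop above using the hill-climbing variant on a toy objective; evaluate_policy is a hypothetical stand-in for rolling out episodes and averaging returns.

```python
# Hill-climbing policy search sketch on a made-up objective.
import numpy as np

rng = np.random.default_rng(0)

def evaluate_policy(theta):
    # Stand-in: a real evaluation would run the policy and average its returns.
    return -np.sum((theta - 1.0) ** 2)

theta = rng.normal(size=4)                         # 1. initialize the policy somehow
best = evaluate_policy(theta)                      # 2. estimate its performance
for _ in range(200):                               # 4. repeat
    candidate = theta + 0.1 * rng.normal(size=4)   # 3. change theta a little
    score = evaluate_policy(candidate)
    if score > best:                               # keep the change only if it is better
        theta, best = candidate, score
```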

Estimating the Policy Gradient
§Define the advantage function: $A^\pi(s,a) = Q^\pi(s,a) - V^\pi(s)$
§Note that the expected TD error equals the expected advantage:
§$E[r + \gamma V^\pi(s') - V^\pi(s)] = E[A^\pi(s,a)]$
§Policy Gradient Theorem:
§Let $\tau$ denote a trajectory from an arbitrary episode
§$\nabla_\theta J(\theta) \propto E_\tau\left[\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, A^{\pi_\theta}(s_t, a_t)\right]$
§Estimate $\nabla_\theta J(\theta)$ by sampling trajectories and using the TD error in place of the advantage
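A sketch of that estimator for a small softmax policy: sum grad(log pi(a|s)) times an advantage estimate over a sampled trajectory, then step uphill. The trajectory and advantage values are invented for illustration.

```python
# Policy gradient estimate for a tabular softmax policy.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

n_states, n_actions = 3, 4
theta = np.zeros((n_states, n_actions))          # policy parameters

def grad_log_pi(theta, s, a):
    # For a softmax policy: d/dtheta[s] log pi(a|s) = onehot(a) - pi(.|s)
    g = np.zeros_like(theta)
    g[s] = -softmax(theta[s])
    g[s, a] += 1.0
    return g

trajectory = [(0, 2, +1.0), (1, 0, -0.5), (2, 3, +0.3)]   # (state, action, advantage estimate)
grad_J = sum(grad_log_pi(theta, s, a) * adv for s, a, adv in trajectory)
theta += 0.1 * grad_J                            # ascend the estimated gradient
```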

Large Language Models
§Feature engineering
§Text tokenization
§Word embeddings
§Deep neural networks
§Autoregressive models
§Self-attention mechanisms
§Transformer architectures
§Multi-class classification
§Supervised learning
§Self-supervised learning
§Instruction tuning
§Reinforcement learning
§… from human feedback (RLHF)
§Policy search
§Policy gradient methods
§Beam search

Beam Search
[Figure: a single search path of scored nodes (scores such as 7, 9, 10) over time steps t=0 through t=11, illustrating random restarts]

Beam Search
[Figure: parallel search vs. beam search: several scored candidate sequences expanded over time steps t=0, t=1, t=2, with beam search keeping only the best-scoring candidates at each step]

Beam Search
ig.ft.com/generative-ai
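A minimal sketch of beam search itself: keep the k best partial sequences at every step. The expand and score functions below are made-up stand-ins for a model's candidate next tokens and their scores (e.g. log-probabilities).

```python
# Beam search sketch: expand every beam entry, keep only the k best candidates.
import heapq

def beam_search(start, expand, score, k=3, steps=5):
    beam = [(score([start]), [start])]                 # (score, sequence)
    for _ in range(steps):
        candidates = []
        for _, seq in beam:
            for nxt in expand(seq):                    # possible next tokens
                new_seq = seq + [nxt]
                candidates.append((score(new_seq), new_seq))
        beam = heapq.nlargest(k, candidates, key=lambda c: c[0])   # keep the k best
    return max(beam, key=lambda c: c[0])[1]

# Toy usage with made-up expansion and scoring:
out = beam_search("a", expand=lambda seq: ["a", "b"],
                  score=lambda seq: seq.count("b"), k=2, steps=3)
```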

Large Language Models
§Feature engineering
§Text tokenization
§Word embeddings
§Deep neural networks
§Autoregressive models
§Self-attention mechanisms
§Transformer architectures
§Multi-class classification
§Supervised learning
§Self-supervised learning
§Instruction tuning
§Reinforcement learning
§… from human feedback (RLHF)
§Policy search
§Policy gradient methods
§Beam search

Language models build a structured concept space

Can other data (images/audio/…) be put in this space?

Can we build a single model of all data types?

Can we build a single model of all data types?
[PaLM-E, Driess et al, 2023]

Tracking Progress
[OpenAI]
§How well AI can do human tasks

Forecasting Progress
§Scaling Laws extrapolate:
§If we [make model bigger / add more data / …]
§What would accuracy become?
[Plots: loss vs. compute and loss vs. data]
[Brown et al, 2020] [Hernandez et al, 2021]

Forecasting Progress
§Scaling Laws extrapolate:
§If we [make model bigger / add more data / …]
§What would accuracy become?
[Brown et al, 2020]
§But some capabilities emerge unexpectedly
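As a sketch of what "extrapolate" means here: fit a power law to a few measured points, then predict a larger run. The numbers below are purely invented for illustration.

```python
# Scaling-law extrapolation sketch: loss ~ a * compute^(-b), which is a
# straight line in log-log space. All data points are made up.
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21])      # made-up training compute (FLOPs)
loss = np.array([3.2, 2.8, 2.45, 2.15])           # made-up measured losses

slope, log_a = np.polyfit(np.log(compute), np.log(loss), 1)   # slope corresponds to -b
predicted_loss = np.exp(log_a + slope * np.log(1e23))         # extrapolate to a bigger run
```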

What will be AI's impact in the future?
§You get to determine that!
§As researchers / developers
§As auditors and regulators
§As informed public voices
§As you apply AI

Where to go next?

Where to go next?
§Congratulations, you’ve seen the basics of modern AI
§… and done some amazing work putting it to use!
§How to continue:
§Machine learning: cs189, cs182, stat154, ind. eng. 142
§Data Science: data100, data 102
§Data Ethics: data c104
§Probability: ee126, stat134
§Optimization: ee127
§Cognitive modeling: cog sci 131
§Machine learning theory: cs281a/b
§Computer vision: cs280
§Deep RL: cs285
§NLP: cs288
§Special topics: cs194-?
§… and more; ask if you’re interested