ML Visuals.pptx

By LInk91 · 91 views · 62 slides · Nov 09, 2022

About This Presentation

Machine learning PPT template.


Slide Content

ML Visuals by dair.ai: https://github.com/dair-ai/ml-visuals

Basic ML Visuals

Softmax, Convolve, Sharpen (operation labels; shown on two slides).
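A minimal NumPy sketch of the operations named above: a numerically stable softmax, and sharpening via 2D convolution. The 3×3 sharpen kernel values are the common choice, an assumption rather than something taken from the slides.

    import numpy as np

    def softmax(z):
        # Subtract the max for numerical stability before exponentiating.
        e = np.exp(z - np.max(z))
        return e / e.sum()

    def convolve2d(img, kernel):
        # Naive 'valid' 2D convolution (kernel flipped, as in true convolution).
        k = np.flipud(np.fliplr(kernel))
        kh, kw = k.shape
        h, w = img.shape
        out = np.zeros((h - kh + 1, w - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = (img[i:i+kh, j:j+kw] * k).sum()
        return out

    # Common 3x3 sharpen kernel (illustrative choice).
    sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])

    print(softmax(np.array([1.0, 2.0, 3.0])))                       # sums to 1
    print(convolve2d(np.arange(25.).reshape(5, 5), sharpen).shape)  # (3, 3)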

Transformer architecture (encoder-decoder), shown in two layout variants: Inputs → Input Embedding + Positional Encoding → [Multi-Head Attention → Add & Norm → Feed Forward → Add & Norm] on the encoder side; Outputs (shifted right) → Output Embedding + Positional Encoding → [Masked Multi-Head Attention → Add & Norm → Multi-Head Attention → Add & Norm → Feed Forward → Add & Norm] → Linear → Softmax on the decoder side.
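For reference, a minimal sketch of the scaled dot-product attention at the core of the multi-head attention blocks in this diagram, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V; the shapes and the causal mask below are illustrative assumptions, not from the slides.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V, mask=None):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)
        if mask is not None:
            scores = np.where(mask, scores, -1e9)  # masked positions get ~0 weight
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V

    T, d = 4, 8
    rng = np.random.default_rng(0)
    Q = K = V = rng.standard_normal((T, d))
    causal = np.tril(np.ones((T, T), dtype=bool))  # "masked" attention in the decoder
    print(scaled_dot_product_attention(Q, K, V, causal).shape)  # (4, 8)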

Tokenization example: the string "I love coding and writing" is split into the tokens I, love, coding, and, writing.
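The slide's example as code, using naive whitespace tokenization (real tokenizers, e.g. BPE, split differently):

    sentence = "I love coding and writing"
    tokens = sentence.split()  # naive whitespace tokenizer
    print(tokens)  # ['I', 'love', 'coding', 'and', 'writing']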

Fully connected network diagram (Input Layer → Hidden Layers → Output Layer), shown in three variants: X = A^[0]; hidden activations a^[l]_1, a^[l]_2, a^[l]_3, …, a^[l]_n for layers A^[1], A^[2], A^[3]; output A^[4] = a^[4] = Ŷ.
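A minimal NumPy sketch of the forward pass these diagrams depict, in the slides' notation: A^[0] = X and A^[l] = g(W^[l] A^[l-1] + b^[l]). The layer sizes and the ReLU/sigmoid choices are illustrative assumptions.

    import numpy as np

    def relu(z):
        return np.maximum(0, z)

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    rng = np.random.default_rng(0)
    layer_sizes = [5, 4, 4, 4, 1]  # input, three hidden layers, output (illustrative)
    params = [(rng.standard_normal((m, n)) * 0.1, np.zeros((m, 1)))
              for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]

    X = rng.standard_normal((5, 3))  # 3 examples
    A = X                            # A[0] = X
    for l, (W, b) in enumerate(params, start=1):
        Z = W @ A + b
        A = sigmoid(Z) if l == len(params) else relu(Z)  # ReLU hidden, sigmoid output
    Y_hat = A
    print(Y_hat.shape)  # (1, 3)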

CONV operation diagram, shown in three variants: an N×N×3 input a^[l-1] is convolved with two filters; each result gets its bias (+b1, +b2) and a ReLU, giving two M×M maps that are stacked into an M×M×2 output a^[l].
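A sketch of this CONV operation in NumPy: an N×N×3 input convolved with two filters, each with its own bias (+b1, +b2) and ReLU, stacked into an M×M×2 activation. The 3×3×3 filter size, stride 1, and 'valid' padding are assumptions.

    import numpy as np

    def conv_layer(a_prev, filters, biases):
        # a_prev: (N, N, 3); filters: list of (f, f, 3); valid convolution, stride 1.
        N = a_prev.shape[0]
        f = filters[0].shape[0]
        M = N - f + 1
        out = np.zeros((M, M, len(filters)))
        for c, (W, b) in enumerate(zip(filters, biases)):
            for i in range(M):
                for j in range(M):
                    out[i, j, c] = (a_prev[i:i+f, j:j+f, :] * W).sum() + b
        return np.maximum(0, out)  # ReLU

    rng = np.random.default_rng(0)
    a_prev = rng.standard_normal((6, 6, 3))             # N = 6
    W1, W2 = rng.standard_normal((2, 3, 3, 3))          # two 3x3x3 filters
    a_next = conv_layer(a_prev, [W1, W2], [0.1, -0.2])  # biases b1, b2
    print(a_next.shape)  # (4, 4, 2) -> M x M x 2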

Abstract backgrounds

DAIR.AI

Gradient Backgrounds

Community Contributions

Striding in CONV: the filter applied with stride s=1 vs. s=2.
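The stride s controls how far the filter jumps between positions; with input size n, filter size f, padding p, and stride s, the output size is floor((n + 2p - f)/s) + 1. A quick check of the two strides shown (f = 3, p = 0 are assumed values):

    def conv_output_size(n, f, p, s):
        return (n + 2 * p - f) // s + 1

    print(conv_output_size(7, 3, 0, 1))  # s=1 -> 5
    print(conv_output_size(7, 3, 0, 2))  # s=2 -> 3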

Inception module: an N×N×192 input goes through parallel 1×1, 3×3, and 5×5 convolutions and max pooling (all 'same' padding, s=1), producing N×N×64, N×N×128, N×N×32, and N×N×192 branch outputs.

Dynamic-network diagrams comparing a model at timestep t-1 and t, shown in three variants: retraining without expansion, no-retraining with expansion, and partial retraining with expansion.

How does a NN work? (inspired by Coursera): housing example with inputs Size, #bed, ZIP, Wealth feeding intermediate features (Family?, Walk?, School) that predict PRICE ŷ. Also: Logistic Regression (X → Ŷ = 0 / Ŷ = 1) and a Basic Neuron Model (X → Y).
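A minimal sketch of the logistic-regression neuron in this slide: ŷ = σ(wᵀx + b), with a 0.5 threshold separating Ŷ = 0 from Ŷ = 1. The weights and inputs below are illustrative.

    import numpy as np

    def predict(X, w, b):
        # Basic neuron: linear combination followed by a sigmoid.
        z = X @ w + b
        y_hat = 1 / (1 + np.exp(-z))
        return (y_hat >= 0.5).astype(int), y_hat  # class label and probability

    X = np.array([[1.0, 2.0], [-1.0, -2.0]])
    labels, probs = predict(X, w=np.array([0.5, -0.3]), b=0.1)
    print(labels, probs)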

Plots of linear regression (Size vs. $) and of ReLU(x).

Encoder-decoder training diagram: 128×128×1 inputs I and V pass through CONV 1-CONV 7; reconstructions I1 and V1 (128×128×1) are produced via Encoder → Decoder during training.

Why does deep learning work? Plot of performance vs. amount of data for a large NN, medium NN, small NN, and traditional methods (SVM, LR, etc.).

One-hidden-layer neural network: X = A^[0] → a^[1]_1 … a^[1]_4 (A^[1]) → a^[2] (A^[2]) = Ŷ.

Neural network templates, shown in two variants: inputs x^[1], x^[2], x^[3]; hidden units a^[1]_1, a^[1]_2; output a^[2].

Train/Valid/Test split vs. model fitting: three x1-x2 plots showing underfitting, a good fit, and overfitting.

Dropout diagram (inputs x^[1], x^[2], x^[3] → output a^[L]) and a normalization plot (x1 vs. x2, r = 1).
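Inverted dropout, as this kind of diagram usually shows it: each hidden unit is kept with probability keep_prob and the surviving activations are rescaled so the expected value is unchanged. keep_prob = 0.8 is an illustrative choice.

    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.standard_normal((4, 5))       # activations of some hidden layer
    keep_prob = 0.8
    mask = rng.random(a.shape) < keep_prob
    a_dropped = (a * mask) / keep_prob    # inverted dropout: rescale to keep E[a] the same
    print(a_dropped)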

Cost contours J over (w1, w2) before vs. after normalization, and an early-stopping plot (Err vs. iterations, with Dev and Train curves).

Deep neural network diagram (inputs x1, x2 through weights w^[1], w^[2], …, w^[L-2], w^[L-1], w^[L]) and Understanding Precision & Recall (TP, FP, TN, FN quadrants).
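Precision and recall follow directly from the four quadrants in the diagram: precision = TP/(TP+FP), recall = TP/(TP+FN). A sketch with illustrative counts:

    def precision_recall(tp, fp, fn):
        precision = tp / (tp + fp)  # of everything predicted positive, how much was right
        recall = tp / (tp + fn)     # of everything actually positive, how much was found
        return precision, recall

    print(precision_recall(tp=40, fp=10, fn=20))  # (0.8, 0.666...)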

Batch vs. Mini-batch Gradient Descent, and Batch Gradient Descent vs. SGD: optimization trajectories over (w1, w2).
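A sketch of the difference the trajectories illustrate: batch gradient descent uses the whole dataset per update (smooth path), while mini-batch/SGD updates on small random subsets (noisier but cheaper). Linear-regression loss is used here as a stand-in model; all numbers are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 2))
    y = X @ np.array([2.0, -1.0]) + 0.1 * rng.standard_normal(200)

    def grad(w, Xb, yb):
        # Gradient of mean squared error for linear regression.
        return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

    w_batch = np.zeros(2)
    w_mini = np.zeros(2)
    lr, batch_size = 0.1, 16
    for step in range(100):
        w_batch -= lr * grad(w_batch, X, y)          # batch GD: all 200 examples
        idx = rng.choice(len(X), batch_size, replace=False)
        w_mini -= lr * grad(w_mini, X[idx], y[idx])  # mini-batch: 16 examples
    print(w_batch, w_mini)  # both approach [2, -1]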

Softmax prediction with 2 outputs: inputs x^[1], x^[2], x^[3] → probabilities p^[1], p^[2].

Miscellaneous

U-Net-style architecture: channels 3 → 16 → 32 → 64 → 128 → 256 down the encoder, then up through skip concatenations (128+256 → 128, 64+128 → 64, 32+64 → 32, 16+32 → 16) to a 1-channel output. Legend: Convolution 3×3, Max Pooling 2×2, Convolution 1×1, Skip connection, Up Sampling 2×2, Block copied, Dropout 0.1/0.2/0.3.

VGG-style stacked CNN diagrams, shown in two variants: Input → blocks of Conv3-32/Conv3-64/Conv3-128 convolutions with Max-Pool between them (Layer1-Layer4) → FC-512 → Output/Softmax, alongside a generic Input → Conv → Max-Pool → FC → Softmax version.

Inception module with dimension reduction: Previous layer feeds four parallel branches (1×1 convolutions; 1×1 → 3×3 convolutions; 1×1 → 5×5 convolutions; 3×3 max pooling → 1×1 convolutions), joined by filter concatenation.
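A shape-level sketch of this module: each branch reduces channels with a 1×1 convolution, and the branch outputs are concatenated along the channel axis. Channel counts below are illustrative; the 3×3/5×5 convolutions and the pooling are stubbed with 1×1 projections purely to show the channel bookkeeping, not to implement them.

    import numpy as np

    def conv1x1(x, c_out, rng):
        # 1x1 convolution = per-pixel linear map across channels.
        W = rng.standard_normal((x.shape[-1], c_out)) * 0.1
        return x @ W

    rng = np.random.default_rng(0)
    x = rng.standard_normal((28, 28, 192))        # illustrative input feature map

    b1 = conv1x1(x, 64, rng)                      # 1x1 branch
    b2 = conv1x1(conv1x1(x, 96, rng), 128, rng)   # 1x1 reduce -> (stand-in for 3x3 conv)
    b3 = conv1x1(conv1x1(x, 16, rng), 32, rng)    # 1x1 reduce -> (stand-in for 5x5 conv)
    b4 = conv1x1(x, 32, rng)                      # (stand-in for 3x3 max pool) -> 1x1 proj

    out = np.concatenate([b1, b2, b3, b4], axis=-1)  # filter concatenation
    print(out.shape)  # (28, 28, 256): 64 + 128 + 32 + 32 channels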

Factorized 1-D convolution module: Previous layer feeds branches of 1×3 conv (1 padding), 1×5 conv (2 padding), and 1×7 conv (3 padding), with the larger ones built from stacked 1×3 convs (1 padding), joined by filter concatenation.

GoogLeNet-style network: Input → Conv/Max-Pool stem → stacks of Inception modules separated by Max-Pool → Avg-Pool → Conv → FC → Softmax, with two auxiliary classifiers (Avg-Pool → Conv → FC → FC → Softmax) branching off mid-network; plus an Input → ConvTranspose2d → Max-Pool → Conv → Max-Pool fragment.

Inception variants (a) and (b): Previous layer feeds parallel branches of 1×1 convs, stacked 3×3 convs, and pooling with 1×1 projection; in (b) the 3×3 convs are factorized into 1×3 and 3×1 convs. Both end in filter concatenation.

Residual learning: plain stacked layers mapping previous input x to y = F(x), vs. a residual block where the identity x is added back to give y = F(x) + x; plus region labels R1, R2, R3.
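The right-hand diagram is the residual connection: instead of learning y = F(x) directly, the stacked layers learn the residual F(x) and the input is added back, y = F(x) + x. A minimal sketch, using two linear+ReLU layers as F with illustrative sizes:

    import numpy as np

    def residual_block(x, W1, W2):
        # F(x): two stacked layers; output is F(x) + x via the identity skip.
        h = np.maximum(0, x @ W1)    # first layer + ReLU
        f = h @ W2                   # second layer (residual before the add)
        return np.maximum(0, f + x)  # add identity, then final ReLU

    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 16))
    W1, W2 = rng.standard_normal((16, 16)), rng.standard_normal((16, 16))
    print(residual_block(x, W1, W2).shape)  # (4, 16): shape preserved for the skip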

DenseNet-style architecture: Input → Conv → Dense Block 1 → transition (Conv → Avg-Pool) → Dense Block 2 → transition → Dense Block 3 → Avg-Pool → FC → Softmax.

NAS-style cell diagrams (a) and (b): hidden states h_{i-1}, h_i, …, h_{i+1} combined through parallel operations (identity, 3×3/5×5/7×7 conv, 3×3 avg pool, 3×3 max pool) merged by add nodes and a final filter concatenation.

Feature-map resolution pyramid: 224×224 → 112×112 → 56×56 → 28×28 → 14×14.

Max pooling example: a 4×4 image representation (axes X, Y) pooled with a 2×2 kernel and a stride of 2; e.g. Max(1, 1, 5, 6) = 6.
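The pooling in this slide, reproduced in NumPy: a 2×2 kernel with stride 2 keeps the maximum of each non-overlapping 2×2 window, so the top-left window (1, 1, 5, 6) yields 6. The 4×4 matrix below follows the classic CS231n example the slide appears to use.

    import numpy as np

    def max_pool_2x2(img):
        # 2x2 kernel, stride 2: max over non-overlapping 2x2 windows.
        h, w = img.shape
        return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

    # 4x4 image representation (classic max-pooling example).
    img = np.array([[1, 1, 2, 4],
                    [5, 6, 7, 8],
                    [3, 2, 1, 0],
                    [1, 2, 3, 4]])
    print(max_pool_2x2(img))  # [[6 8]
                              #  [3 4]] -- Max(1, 1, 5, 6) = 6 in the top-left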