“Diagnosing Problems and Implementing Solutions for Deep Neural Network Training,” a Presentation from Sensor Cortek


About This Presentation

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/10/diagnosing-problems-and-implementing-solutions-for-deep-neural-network-training-a-presentation-from-sensor-cortek/

Fahed Hassanat, COO and Head of Engineering at Sensor Cortek, presents the “Deep Neural Network Training: Diagnosing Problems and Implementing Solutions” tutorial at the May 2024 Embedded Vision Summit.


Slide Content

Deep Neural Network Training:
Diagnosing Problems and
Implementing Solutions
Fahed Hassanat
COO / Head of Engineering
Sensor Cortek

Training DNNs

The Goal
• Find an acceptable relationship between inputs and outputs based on patterns found in historical data.
(Plot: Y, the dependent variable, versus X, the independent variable.)

The Learning Process
• The process of minimizing the difference between the produced output and the correct output (ground truth).
• Uses mathematical techniques to minimize the error.
• Stops when the results are acceptable or the model can no longer learn.
(Plot: Y, the dependent variable, versus X, the independent variable.)

Example Forward Calculation
SUM = (0.7 × 0.2) + (0.3 × 0.9) = 0.41 ≈ 0.4
Activation (sigmoid: 1/(1 + e^-x)) = 1/(1 + e^-0.4) ≈ 0.6
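As a quick check of the arithmetic above, here is a minimal sketch of the same forward calculation in plain Python (the 0.7/0.3 inputs and 0.2/0.9 weights are taken from the slide):

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    inputs = [0.7, 0.3]   # input values from the slide
    weights = [0.2, 0.9]  # corresponding weights

    # Weighted sum: (0.7 * 0.2) + (0.3 * 0.9) = 0.41 (rounded to 0.4 on the slide)
    s = sum(i * w for i, w in zip(inputs, weights))

    output = sigmoid(s)
    print(f"sum = {s:.2f}, output = {output:.2f}")  # sum = 0.41, output = 0.60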

An Educated Network

Training Data
• A labeled collection of data.
• Variations of inputs that produce the same output.
• The larger and more diverse, the better.

Forward Pass & Backward Pass

Components of Training DNNs

Components of Training DNNs
• Framework: manages the data flow and execution of the training process.
• DNN model: an architecture that serves a certain purpose.
• Hyperparameters: control the training process.
• Training data: a labeled dataset for training the model.

Framework
• Keras: great for prototyping; high-level API and high readability.
• TensorFlow: high performance; suited for large datasets.
• PyTorch: high performance; excellent debugging.

DNN Model
• Multi-layer perceptrons: image classification, natural language processing, and regression.
• Convolutional neural networks (CNN): image classification, object detection, and segmentation.
• Recurrent neural networks (RNN): time series prediction and speech recognition.
• Transformer networks: natural language processing and computer vision.
• Generative adversarial networks (GAN): synthetic data generation.

What is a Hyperparameter?
• Hyperparameters are parameters that cannot be learned during training and must be set before training begins.
• Some components of the model architecture can be considered hyperparameters.
• Model designers can create a hyperparameter that controls multiple hyperparameters in a certain fashion.

Hyperparameters
• Non-architecture-based: learning rate, number of epochs, batch size, dropout rate, weight initialization, regularization parameters, optimizer.
• Architecture-based: number of layers, number of neurons, activation function.

Hyperparameters: Learning Rate
• A value that controls the amount by which the network weights are updated.
• Usually between 0.0 and 1.0.
• Too large or too small affects the convergence of the model.

Hyperparameters: Number of Epochs
• The number of times the dataset is passed through the network.
• Too small and the model does not converge.
• Too large and the model overfits.

Hyperparameters: Batch Size
• The number of data samples used in each iteration of the optimization algorithm.
• Too large causes the model not to generalize well.
• Too small can prevent the model from converging.
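To make these three hyperparameters concrete, here is a hedged sketch of where each typically appears in a PyTorch training loop (the toy model and random data are placeholders, not from the presentation):

    import torch
    from torch import nn, optim
    from torch.utils.data import DataLoader, TensorDataset

    # Hypothetical toy data: 256 samples, 10 features, 2 classes.
    X = torch.randn(256, 10)
    y = torch.randint(0, 2, (256,))

    learning_rate = 0.01   # step size for each weight update
    num_epochs = 20        # passes over the full dataset
    batch_size = 32        # samples per optimization step

    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
    optimizer = optim.SGD(model.parameters(), lr=learning_rate)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(num_epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()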

Dataset
• A collection of data with corresponding labels.
• Data is used as input to the model, and labels adjust and correct the output.
• The dataset is divided into training/validation/testing sets, commonly at a 60/20/20 ratio.
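A minimal sketch of the 60/20/20 split using scikit-learn's train_test_split (an assumed tooling choice; the slides do not name a library): split off 40% first, then halve that remainder into validation and test sets.

    from sklearn.model_selection import train_test_split

    # X, y: the full labeled dataset (placeholder values here).
    X = list(range(100))
    y = [i % 2 for i in X]

    # 60% train, then split the remaining 40% in half: 20% val, 20% test.
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)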

Attributes of a Good Dataset
• Sufficient data: the dataset must contain enough data to reflect the targeted population.
• Balanced data: in multiclass problems, classes must have balanced contributions to the dataset.
• Relevant data: the data must represent the targeted population and its environment.
• Proper labeling: labels must be accurate and consistent.
• Data diversity: the diversity of the data should reflect the diversity of the targeted population.
• High SNR: the data should have little to no noise that causes ambiguity.

Training Metrics

Prediction Outcomes
• True Positive (TP): the model predicts True, and the label is True.
• False Positive (FP): the model predicts True, and the label is False.
• True Negative (TN): the model predicts False, and the label is False.
• False Negative (FN): the model predicts False, and the label is True.

Intersection over Union
• Used in object detection.
• Calculates the overlap between bounding boxes.
• Determines how close a prediction is to the ground truth.
• Ranges from 0 to 1.
• A typical value for qualifying a prediction as TP is 0.5.
(Diagram: example box pairs with IoU = 0, IoU = 0.5, and IoU = 1.)
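As an illustration of the overlap calculation, a minimal IoU sketch for axis-aligned boxes; the helper and its (x1, y1, x2, y2) box format are assumptions, not from the slides:

    def iou(box_a, box_b):
        # Intersection rectangle (empty if the boxes do not overlap).
        x1 = max(box_a[0], box_b[0])
        y1 = max(box_a[1], box_b[1])
        x2 = min(box_a[2], box_b[2])
        y2 = min(box_a[3], box_b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7 ≈ 0.143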

Loss
• A measure of how far the predictions are from the actual values.
• It is the output of the loss function during training.
(Plot: loss decreasing over epochs.)

Accuracy
• The number of good predictions out of all predictions.
• Can be calculated as: Accuracy = (TP + TN) / (TP + TN + FP + FN)
• Scenario: a model to classify emails as "spam" or "not spam."
• Example: if 1,000 emails are classified and the model correctly identifies 950 of them (both spam and not spam), the accuracy of the model is 95%. This metric shows how often the model is correct across both classes but doesn't detail its performance on each class.

Precision
• The number of correctly predicted labels out of all True predictions.
• Can be calculated as: Precision = TP / (TP + FP)
• Scenario: a facial recognition system used at an airport to identify individuals on a watchlist.
• Example: out of 100 alerts generated by the system, 90 are correct identifications of individuals on the watchlist (True Positives) and 10 are false alarms (False Positives). The precision of the system is 90%, indicating how reliable the alerts are when the system identifies someone as being on the watchlist.

Recall
• The number of correctly predicted labels out of all True labels in the data.
• Can be calculated as: Recall = TP / (TP + FN)
• Scenario: a medical diagnostic tool that predicts whether patient scans indicate the presence of a specific disease.
• Example: there are 100 patients with the disease, but the tool only identifies 80 of them. The recall of the tool is 80%, reflecting its ability to find all relevant cases (True Positives) within the dataset. This is crucial in medical scenarios where missing a positive case (False Negative) can be detrimental.

F1-Score
• The harmonic mean of Precision and Recall.
• Gives a global picture of the performance.
• Can be calculated as: F1 = 2 × (Precision × Recall) / (Precision + Recall)
• Scenario: a content moderation system for a social media platform that flags posts as "appropriate" or "inappropriate."
• Example: the system needs to balance precision (not incorrectly flagging too many posts as inappropriate) and recall (not missing too many inappropriate posts).
• If the precision is 75% and the recall is 60%, the F1-Score is approximately 67%.
• F1 provides a single measure to assess the balance between precision and recall, especially when both are equally important.
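Tying the four metrics together, a small Python sketch computing them from TP/FP/TN/FN counts (the counts are invented for illustration, not taken from the slides):

    tp, fp, tn, fn = 90, 10, 860, 40

    accuracy = (tp + tn) / (tp + tn + fp + fn)         # 0.95
    precision = tp / (tp + fp)                         # 0.90
    recall = tp / (tp + fn)                            # ≈ 0.69
    f1 = 2 * precision * recall / (precision + recall) # ≈ 0.78

    print(f"acc={accuracy:.2f} prec={precision:.2f} rec={recall:.2f} f1={f1:.2f}")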

Confusion Matrix
• A window into the performance of the model on one or more classes.
• Can be used to calculate Accuracy, Precision and Recall.

                 Actual P   Actual N
    Predicted P     TP         FP
    Predicted N     FN         TN
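For real label vectors, a confusion matrix can be produced directly; below is a sketch with scikit-learn (an assumed tooling choice). Note that scikit-learn puts actual classes on rows and predicted classes on columns, with class 0 first, so its layout differs from the TP-top-left layout above.

    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

    # Rows are actual classes, columns are predicted classes.
    print(confusion_matrix(y_true, y_pred))
    # [[3 1]
    #  [1 3]]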

Example: Multi-class Confusion Matrix
(rows: actual values; columns: predicted values)

          A    B    C    D
    A    93    1    5   13
    B     4   89    5    3
    C     1    7   88    5
    D     2    1    2   79

Problems with Training DNNs

Common Problems
• Overfitting
• Underfitting
• Dataset imbalance
• Inefficient learning
• Parameter initialization
• Gradient masking
• Vanishing gradient
• Exploding gradient

Overfitting
Problem: when a network learns the training data too well, it can fail to generalize to new data.
Solutions:
• Larger dataset: either add more labeled data or use data augmentation to artificially increase the size and variation of the dataset.
• Less complex model: remove layers or reduce the width of the layers.
• Early stopping / saving checkpoints.
• Apply dropout.
• Apply regularization.
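Three of these remedies (dropout, L2 regularization via weight decay, and early stopping with checkpointing) can be sketched in PyTorch; the tiny random dataset and layer sizes below are placeholders:

    import torch
    from torch import nn, optim

    model = nn.Sequential(
        nn.Linear(10, 64), nn.ReLU(),
        nn.Dropout(p=0.5),                  # dropout: randomly zeroes activations
        nn.Linear(64, 2),
    )
    # weight_decay adds an L2 penalty on the weights (regularization).
    optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
    x_val, y_val = torch.randn(32, 10), torch.randint(0, 2, (32,))

    best_val, patience, bad_epochs = float("inf"), 5, 0
    for epoch in range(100):
        model.train()
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(x_val), y_val).item()
        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0
            torch.save(model.state_dict(), "best.pt")  # save best checkpoint
        else:
            bad_epochs += 1
            if bad_epochs >= patience:                 # early stopping
                break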

Underfitting
Problem: when a network is unable to capture the complexity of the data, it can fail to fit the training data well enough.
Solutions:
• Examine the dataset: bad or missing labels, as well as a lack of class representation, can cause underfitting.
• Increase the number of epochs: not training the model enough causes it to fail to grasp the essential patterns in the data.
• More complex model: add layers or increase the width of the layers.
• New architecture.

Data Imbalance
Problem: if the dataset used to train the network is not representative of the target population, the network may fail to generalize well to new data.
Solutions:
• Add more data: either add more labeled data to fix the imbalance or use data augmentation to artificially achieve the same effect.
• Transfer learning and fine-tuning: use pretrained model weights for training on the new data; the pretrained model would have been trained on a more balanced dataset.
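A hedged sketch of the transfer-learning remedy using PyTorch with torchvision (an assumed model zoo; the slides do not prescribe one): load pretrained weights, freeze the backbone, and replace the classification head for the new task.

    import torch
    from torch import nn
    from torchvision import models

    # Load ImageNet-pretrained weights (torchvision >= 0.13 API).
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    # Freeze the pretrained backbone.
    for p in model.parameters():
        p.requires_grad = False

    # Replace the final layer for the new task (e.g., 5 classes).
    model.fc = nn.Linear(model.fc.in_features, 5)
    # Fine-tune: only the new head's parameters are updated.
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)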

Inefficient Learning
Problem: the learning process becomes slow or stalls out during training, or the model is unable to optimize the objective function.
Solutions:
• Investigate the learning rate: a poorly chosen learning rate can cause training to get stuck at a local minimum.
• Use momentum: accelerates convergence of the model.
• New architecture.
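In PyTorch, momentum is a one-line change to the optimizer; pairing it with a plateau-based learning-rate schedule (an illustrative choice, not from the slides) addresses both points above:

    import torch
    from torch import nn, optim

    model = nn.Linear(10, 2)
    # momentum=0.9 accumulates a velocity term across updates.
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    # Optionally decay the learning rate when validation loss stalls;
    # call scheduler.step(val_loss) once per epoch.
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=3)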

Detecting Training Problems

Loss Curve
(Loss-versus-epoch plots comparing an ideal curve with problematic ones, attributed to: bad data, a model that is too simple, a large learning rate, and activation function issues.)

Overfitting Indicators
• The loss on the validation set will start to diverge.
• Happens when the model memorizes the training data.
• Happens when the epoch number is too large.
• Happens when the dataset is too small.
(Plot: training loss keeps decreasing while validation loss diverges over epochs.)

Underfitting Indicators
• Indicated by high and noisy loss values.
• The model is not able to learn from the training data.
• Happens due to bad data, a small dataset, or a small/low-complexity model.
(Plot: loss stays high and noisy over epochs.)

Accuracy Indicators
• High train accuracy, low validation accuracy: a sign of overfitting the model.
• Low train accuracy, high validation accuracy: a sign of imbalance between the training and validation data.
(Plot: training vs. validation accuracy, 0.0 to 1.0, over epochs.)

Low Accuracy
• Due to an unacceptable number of incorrectly classified instances.
• What counts as unacceptable largely depends on the application.
(Plot: accuracy, 0.0 to 1.0, over epochs.)

Low Precision
• Due to an unacceptable number of false positives.
• The model misclassifies an unacceptable number of negative instances as positive.
(Plot: precision, 0.0 to 1.0, over epochs.)

Low Recall
• Due to an unacceptable number of false negatives.
• The model misses an unacceptable number of positive instances.
(Plot: recall, 0.0 to 1.0, over epochs.)

Fixing Low Accuracy/Precision/Recall
Possible causes and remedies:
• Noisy or imbalanced data: remove noise, augment data, use class weighting, or use oversampling/undersampling to fix the imbalance of the data.
• Small dataset: use pretrained models, as they have already learned important features.
• Simple model: increase the number of layers or the width of the layers, or change to a different architecture.
• Inadequate hyperparameters: revisit the choice of hyperparameters and select appropriate parameter values.
• Inadequate loss function: change to a loss function more suitable for the task.
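For the class-weighting and oversampling remedies, a hedged PyTorch sketch (the 90/10 class split is invented to make the imbalance visible):

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

    X = torch.randn(100, 10)
    y = torch.cat([torch.zeros(90, dtype=torch.long), torch.ones(10, dtype=torch.long)])

    # Option 1: weight the loss inversely to class frequency.
    counts = torch.bincount(y).float()
    loss_fn = nn.CrossEntropyLoss(weight=counts.sum() / counts)

    # Option 2: oversample rare classes so batches are roughly balanced.
    sample_weights = (1.0 / counts)[y]
    sampler = WeightedRandomSampler(sample_weights, num_samples=len(y), replacement=True)
    loader = DataLoader(TensorDataset(X, y), batch_size=16, sampler=sampler)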

Confusion Matrix Indicators
Multi-class Confusion Matrix: IDEAL (rows: actual values; columns: predicted values)

           A    B    C    D
    A    100    0    0    0
    B      0  100    0    0
    C      0    0  100    0
    D      0    0    0  100

Confusion Matrix Indicators
Multi-class Confusion Matrix: underfitting, inefficient learning, simple model, etc. (rows: actual values; columns: predicted values)

          A    B    C    D
    A    23   29   36   24
    B    18   17   15   34
    C    38   39   30   26
    D    21   15   19   16

Confusion Matrix Indicators
Multi-class Confusion Matrix: data imbalance, insufficient dataset (rows: actual values; columns: predicted values)

          A    B    C    D
    A    93    1    5   13
    B     4   89    5    3
    C     1    7   42   39
    D     2    1   46   40

Conclusion
• Training DNNs is a complex process that is subject to many problems.
• With a proper understanding of the training process, one can efficiently resolve these problems.
• Datasets, hyperparameters and model architecture are the major contributors to problems during training.
• Precision, recall, accuracy and confusion matrices are important training evaluation metrics that can help diagnose problems and guide training to success.

Resources
• Interpreting Loss Curves: https://developers.google.com/machine-learning/testing-debugging/metrics/interpretic

2024 Embedded Vision Summit
• Join us at the Synopsys booth, #410, to see our technology in action.
• "Introduction to Modern Radar for Machine Perception", Robert Laganière, Thu, May 23, 4:15 PM – 4:45 PM

Backup Material

What is a Deep Neural Network?

Inspiration
• Deep neural networks are inspired by the function of biological neurons, the simplest unit of the nervous system in humans.
• "Deep" refers to the presence of multiple layers between the input and the output of the network.

Anatomy of the Perceptron
The basic unit of a NN is a perceptron:
• Inputs
• Weights
• Sum
• Activation
• Output

Building a Neural Network
• Input layer
• Hidden layer(s)
• Output layer
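A minimal sketch of this three-part structure in PyTorch; the layer sizes (4 inputs, 8 hidden neurons, 3 outputs) are arbitrary illustration values:

    import torch
    from torch import nn

    model = nn.Sequential(
        nn.Linear(4, 8),    # input layer: 4 features in, 8 neurons
        nn.ReLU(),
        nn.Linear(8, 8),    # hidden layer
        nn.ReLU(),
        nn.Linear(8, 3),    # output layer: 3 classes
    )
    print(model(torch.randn(1, 4)).shape)  # torch.Size([1, 3])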

Additional Hyperparameters

Number of Layers (Model Architecture)
• Determines the depth of the network.
• Too large results in more parameters to train and longer training.
• Too small may not capture the complex relationship between the input and the output.

Number of Neurons (Model Architecture)
• Determines the width of the network.
• Too large results in more parameters to train and longer training.
• Too small (shallow) may not capture the complex relationship between the input and the output.

Activation Function
• Determines if the neuron is activated and should pass on the transformed input to the next layer.
• Impacts how well the model learns.

Dropout Rate
• Controls the number of neurons that are randomly removed during training.
• Helps the model to generalize.
• Avoids overfitting.

Weight Initialization
• Controls how the weights are initialized prior to starting training.

Regularization Parameter
• Controls how strong the penalty term in the model's cost function is.
• Helps in controlling over/underfitting of the model.

Optimizer
• Controls the method used to update the weights.
• Helps minimize the loss and improve the accuracy.

Additional Training Problems

Gradient Masking
• The Problem: when the gradients of some parameters vanish because of the choice of activation function, those parameters cannot be learned.
• The Solution: use activation functions that alleviate the issue, such as leaky ReLU or Swish, or use different initialization schemes.

Parameter Initialization
• The Problem: if the weights of the network are initialized randomly and are too small or too large, the network may fail to learn.
• The Solution: use initialization techniques such as Xavier or He initialization, which ensure that the weights are initialized in a way that is appropriate for the network architecture and the activation functions used.
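Both schemes are available as one-liners in PyTorch; a hedged sketch (the layer sizes are arbitrary):

    from torch import nn

    tanh_layer = nn.Linear(128, 64)
    nn.init.xavier_uniform_(tanh_layer.weight)   # Xavier (Glorot): common with tanh/sigmoid
    nn.init.zeros_(tanh_layer.bias)

    relu_layer = nn.Linear(64, 32)
    nn.init.kaiming_normal_(relu_layer.weight, nonlinearity="relu")  # He: suited to ReLU
    nn.init.zeros_(relu_layer.bias)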

Vanishing Gradient
• The Problem: the gradients in the deeper layers of the network become very small during backpropagation, making it difficult to update the weights of those layers.
• The Solution: use activation functions that don't saturate, such as ReLU or its variants, and use normalization techniques such as Batch Normalization.
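A minimal PyTorch sketch combining the two remedies (layer sizes are illustrative):

    from torch import nn

    model = nn.Sequential(
        nn.Linear(64, 64),
        nn.BatchNorm1d(64),  # normalizes activations, stabilizing gradients
        nn.ReLU(),           # does not saturate for positive inputs
        nn.Linear(64, 10),
    )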

Exploding Gradient
• The Problem: the gradients in the deeper layers of the network become very large during backpropagation, making it difficult to update the weights of those layers.
• The Solution: use gradient clipping techniques, which limit the magnitude of the gradients.
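In PyTorch, clipping sits between the backward pass and the optimizer step; a minimal sketch (the toy model and data are placeholders):

    import torch
    from torch import nn

    model = nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    loss = model(torch.randn(8, 10)).sum()
    loss.backward()
    # Rescale gradients so their global norm is at most 1.0.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()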