“DNN Quantization: Theory to Practice,” a Presentation from AMD
embeddedvision
56 views
21 slides
Aug 20, 2024
About This Presentation
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/08/dnn-quantization-theory-to-practice-a-presentation-from-amd/
Dwith Chenna, Member of the Technical Staff and Product Engineer for AI Inference at AMD, presents the “DNN Quantization: Theory to Practice” tutorial at the May 2024 Embedded Vision Summit.
Deep neural networks, widely used in computer vision tasks, require substantial computation and memory resources, making it challenging to run these models on resource-constrained devices. Quantization modifies DNNs to use smaller data types (e.g., switching from 32-bit floating-point values to 8-bit integer values), which reduces the computation, memory bandwidth and memory footprint requirements of these models, making it easier to run them on edge devices. However, quantization can degrade model accuracy.
In this talk, Chenna surveys practical techniques for DNN quantization and shares best practices, tools and recipes to enable you to get the best results from quantization, including ways to minimize accuracy loss.
Size: 1.42 MB
Language: en
Added: Aug 20, 2024
Slides: 21 pages
Slide Content
DNN Quantization:
Theory to Practice
Dwith Chenna
MTS Product Engineer, AI Inference
AMD Inc.
•Why Quantization?
•Quantization Schemes
•DNN Model Quantization
•Quantization Aware Training (QAT)
•Post Training Quantization (PTQ)
•Quantization Analysis
•Quantization: Best Practices
Content
2
•Model compression techniques are crucial for edge computing, reducing deep learning model size for lower memory and processing needs
•Knowledge Distillation
•Pruning / Sparsity
•Quantization
•Network Architecture Search (NAS)
Why Quantization?
3
•Quantization is the process of mapping real numbers, denoted as "r", to quantized integers, represented as "q"
•Symmetric Quantization
•Asymmetric Quantization
where "S" is the scale and "Z" is the zero points
Quantization Scheme
4
[Figure: symmetric vs. asymmetric data distributions (frequency vs. data value)]
q = round(r / S)                        (symmetric)
q = round(r / S + Z)                    (asymmetric)
S = (r_max - r_min) / (q_max - q_min)
Z = round(q_max - r_max / S)
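These formulas can be exercised directly. Below is a minimal NumPy sketch; the function names, the int8/uint8 ranges and the random input are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def symmetric_quantize(r, num_bits=8):
    # q = round(r / S), with S chosen from the maximum magnitude of r
    q_max = 2 ** (num_bits - 1) - 1                 # 127 for int8
    S = np.max(np.abs(r)) / q_max
    q = np.clip(np.round(r / S), -q_max - 1, q_max).astype(np.int8)
    return q, S

def asymmetric_quantize(r, num_bits=8):
    # q = round(r / S + Z), S = (r_max - r_min) / (q_max - q_min),
    # Z = round(q_max - r_max / S)
    q_min, q_max = 0, 2 ** num_bits - 1             # [0, 255] for uint8
    S = (r.max() - r.min()) / (q_max - q_min)
    Z = int(round(q_max - r.max() / S))
    q = np.clip(np.round(r / S + Z), q_min, q_max).astype(np.uint8)
    return q, S, Z

r = np.random.randn(1000).astype(np.float32)        # example data
q_sym, S_sym = symmetric_quantize(r)
q_asym, S_asym, Z_asym = asymmetric_quantize(r)
```

Dequantization is the inverse mapping: r ≈ S * q for the symmetric scheme and r ≈ S * (q - Z) for the asymmetric one.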
•Symmetric vs asymmetric quantization
•Choice of quantization scheme depends on data distribution
•Make the best use of bit precision
•Avoid outliers in the data distribution
Quantization Scheme
5
•Deep Neural Network (DNN) model
•Weights: Symmetric per channel
•Activation: Asymmetric per tensor
DNN Model Quantization
6
[Figure: histogram distribution of weights and activations [1]]
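As an illustration of that convention, the sketch below computes symmetric per-channel scales for a convolution weight tensor and a single asymmetric per-tensor scale/zero-point for an activation map; the tensor shapes and helper names are assumptions for illustration only:

```python
import numpy as np

def per_channel_symmetric_scales(weights, num_bits=8):
    # one scale per output channel, taken from that channel's maximum magnitude
    q_max = 2 ** (num_bits - 1) - 1
    max_per_channel = np.max(np.abs(weights.reshape(weights.shape[0], -1)), axis=1)
    return max_per_channel / q_max

def per_tensor_asymmetric_params(activations, num_bits=8):
    # a single scale and zero point covering the whole activation tensor
    q_min, q_max = 0, 2 ** num_bits - 1
    S = (activations.max() - activations.min()) / (q_max - q_min)
    Z = int(round(q_max - activations.max() / S))
    return S, Z

W = np.random.randn(64, 32, 3, 3).astype(np.float32)           # hypothetical conv weights (OIHW)
A = np.abs(np.random.randn(1, 32, 56, 56)).astype(np.float32)  # hypothetical post-ReLU activations
weight_scales = per_channel_symmetric_scales(W)                 # shape (64,), one scale per channel
S_act, Z_act = per_tensor_asymmetric_params(A)
```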
•DNN model quantization
•Quantization Aware Training (QAT)
•Post Training Quantization (PTQ)
DNN Model Quantization
7
•Quantization Aware Training (QAT)
•Adds fake quantization nodes during training
•Pros:
•Fine-tune trained float model
•Improves quantized accuracy
•Cons:
•Compute intensive process
•Needs training dataset
Quantization Aware Training (QAT)
8
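The QAT flow can be sketched with PyTorch's eager-mode quantization API; this is an assumed toolchain for illustration (the talk does not prescribe a framework), with a toy model, the "fbgemm" backend and random data standing in for the real training set:

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()        # marks where activations enter the int8 region
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()    # back to float at the output

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = SmallNet().train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.prepare_qat(model, inplace=True)        # adds fake-quantization nodes

# Short fine-tuning loop on the training dataset (random data and a dummy loss stand in here).
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(10):
    x = torch.randn(4, 3, 32, 32)
    loss = model(x).abs().mean()
    loss.backward()
    opt.step()
    opt.zero_grad()

model.eval()
int8_model = tq.convert(model)             # real int8 weights and activations
```

The fake-quantization nodes simulate int8 rounding in the forward pass while gradients still flow in float, which is what allows fine-tuning to recover the accuracy lost to quantization.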
•Post Training Quantization (PTQ)
•Analyze different quantization schemes
•Pros:
•No model training
•Needs only a limited calibration dataset
•Cons:
•Degradation in accuracy
Post Training Quantization (PTQ)
9
Network               | Floating-point | Asymmetric per tensor | Asymmetric per channel
Mobilenet-v1 1 224    | 0.709          | 0.001                 | 0.704
Mobilenet-v2 1 224    | 0.719          | 0.001                 | 0.698
Nasnet-Mobile         | 0.74           | 0.722                 | 0.74
Mobilenet-v2 1.4 224  | 0.749          | 0.004                 | 0.74
Inception-v3          | 0.78           | 0.78                  | 0.78
Resnet-v1 50          | 0.752          | 0.75                  | 0.75
Resnet-v2 50          | 0.756          | 0.75                  | 0.75
Resnet-v1 152         | 0.768          | 0.766                 | 0.762
Resnet-v2 152         | 0.778          | 0.761                 | 0.77
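A matching PTQ sketch, again assuming PyTorch eager-mode static quantization as the toolchain: observers are inserted, a small calibration set is run through the float model, and the model is converted to int8 without any training.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

# Toy float model; QuantStub/DeQuantStub mark the int8 region in eager mode.
model = nn.Sequential(
    tq.QuantStub(),
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    tq.DeQuantStub(),
).eval()

model.qconfig = tq.get_default_qconfig("fbgemm")
tq.prepare(model, inplace=True)                 # insert observers

# Calibration: run a representative dataset (~100-1K samples in practice) so the
# observers can record activation ranges; random tensors stand in here.
with torch.no_grad():
    for _ in range(100):
        model(torch.randn(1, 3, 32, 32))

int8_model = tq.convert(model)                  # freeze scales/zero-points, swap in int8 ops
```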
•Calibration Dataset
•Used to define quantization parameters
•Representative dataset
•Limited dataset: ~100 to 1K images
Calibration Dataset
10
[Chart: accuracy vs. calibration dataset size, per network]
•Quantization introduces noise in the weights and activations
•Can lead to significant degradation in model accuracy
•Quantization analysis:
•Quantization error
•Visualization
•Min/max tuning
•Layer-wise analysis
•Mixed precision
•Weight equalization
Quantization Analysis
11
Loss surface of ResNet-56 by Hao Li et al. [4]
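One simple form of layer-wise analysis is to fake-quantize each weight tensor and measure its signal-to-quantization-noise ratio (SQNR); layers with low SQNR are candidates for min/max tuning or mixed precision. The sketch below is illustrative only (layer names, data and the dB metric are assumptions, not from the slides):

```python
import numpy as np

def sqnr_db(x, x_hat):
    # signal-to-quantization-noise ratio in decibels
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - x_hat) ** 2))

def fake_quant_symmetric(w, num_bits=8):
    # quantize to int8 and immediately dequantize, to expose the rounding error
    q_max = 2 ** (num_bits - 1) - 1
    s = np.max(np.abs(w)) / q_max
    return np.clip(np.round(w / s), -q_max - 1, q_max) * s

# Stand-in "layers": in practice these would be the model's actual weight tensors.
layers = {"conv1": np.random.randn(64, 3, 7, 7),
          "fc":    np.random.laplace(size=(1000, 512))}   # heavier tails -> lower SQNR

for name, w in layers.items():
    print(f"{name}: SQNR = {sqnr_db(w, fake_quant_symmetric(w)):.1f} dB")
```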
•Visualization of the weights/activations
•Nature of the distribution
•Multimodal distribution
•Long tails in data distribution
Visualization
13
[Figure: activation distribution (float) vs. activation distribution (quant), frequency vs. value]
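A histogram is usually enough to spot multimodal shapes or long tails. A small matplotlib sketch with synthetic data (illustrative only; in practice the arrays would be the model's weights and captured activations):

```python
import numpy as np
import matplotlib.pyplot as plt

weights = np.random.randn(10000)                              # roughly Gaussian weights
activations = np.concatenate([np.abs(np.random.randn(9000)),  # bulk of the activations
                              np.random.uniform(6, 8, 100)])  # synthetic long tail / outliers

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].hist(weights, bins=100)
axes[0].set_title("Weight distribution")
axes[1].hist(activations, bins=100)
axes[1].set_title("Activation distribution")
plt.tight_layout()
plt.show()
```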
•Reduce the variance of weight distribution across channels
•Adjust the scale factor across layers
•Enables use of simpler quantization schemes like per tensor instead of per channel
Quantization Analysis: Weight Equalization
17
[Figure: weight range vs. output channel index (two plots)]
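A minimal NumPy sketch of cross-layer equalization for two consecutive fully connected layers joined by a ReLU; the per-channel factor s_i = sqrt(r1_i / r2_i) follows the commonly used cross-layer-equalization recipe and is an assumption rather than a quote from the slides:

```python
import numpy as np

def equalize(W1, b1, W2):
    """Cross-layer equalization for y = W2 @ relu(W1 @ x + b1)."""
    r1 = np.max(np.abs(W1), axis=1)     # per-output-channel range of layer 1
    r2 = np.max(np.abs(W2), axis=0)     # per-input-channel range of layer 2
    s = np.sqrt(r1 / r2)                # assumed rule: s_i = sqrt(r1_i / r2_i)
    # ReLU is positively homogeneous, so dividing channel i of layer 1 by s_i and
    # multiplying the matching column of layer 2 by s_i leaves the output unchanged.
    return W1 / s[:, None], b1 / s, W2 * s[None, :], s

W1 = np.random.randn(16, 8) * np.random.uniform(0.1, 5.0, size=(16, 1))  # uneven channel ranges
b1 = np.random.randn(16)
W2 = np.random.randn(4, 16)
W1_eq, b1_eq, W2_eq, s = equalize(W1, b1, W2)

spread = lambda W: np.max(np.abs(W), axis=1).max() / np.max(np.abs(W), axis=1).min()
print("per-channel range spread before:", round(spread(W1), 1))
print("per-channel range spread after: ", round(spread(W1_eq), 1))
```

Narrowing the spread of per-channel weight ranges is what makes a simpler per-tensor scheme viable after equalization.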
References
20
[1] Raghuraman Krishnamoorthi. Quantizing deep convolutional networks for efficient inference: A whitepaper. arXiv preprint arXiv:1806.08342, 2018.
[2] From Theory to Practice: Quantizing Convolutional Neural Networks for Practical Deployment [Link]
[3] Quantization of Convolutional Neural Networks: Model Quantization [Link]
[4] Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. Visualizing the loss landscape of neural nets. In Advances in Neural Information Processing Systems, pages 6389–6399, 2018.
[5] Quantization of Convolutional Neural Networks: Quantization Analysis [Link]