“How Arm’s Machine Learning Solution Enables Vision Transformers at the Edge,” a Presentation from Arm

embeddedvision · 21 slides · Jun 11, 2024

About This Presentation

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/how-arms-machine-learning-solution-enables-vision-transformers-at-the-edge-a-presentation-from-arm/

Stephen Su, Senior Segment Marketing Manager at Arm, presents the “How Arm’s Machine Learning Solution Enables Vision Transformers at the Edge” presentation.


Slide Content

How Arm’s Machine Learning Solution Enables Vision Transformers at the Edge
Stephen Su
Sr. Segment Marketing Manager
Arm Inc.

Transformer Background
• What is a transformer? Ref. [1]: Vaswani et al., “Attention Is All You Need,” NIPS 2017
• A highly scalable network architecture based on self-attention
© 2024 Arm Inc.

Why Transformers?
• Potentially a unified architecture for text, audio, and image
• Models based on transformers perform outstandingly in natural language processing (NLP) and computer vision (CV)
• Support wide use cases: not only image classification but also applications such as super resolution, segmentation, object detection, and much more

Transformer in Vision Applications
• While CNNs have inductive biases (e.g., locality and translation equivariance), the transformer uses self-attention to capture the dependencies within the input sequences
• Hence, models based on transformers are more extendable; i.e., they work well in video understanding, image completion, multi-camera, and multi-modal domains

Challenges in Deploying Transformer Models at the Edge
• Hardware is fragmented, ranging from CPU only to CPU + GPU, CPU + accelerator, and other combinations
  – What is the most suitable hardware solution for transformers?
• Efficiency is another challenge
  – How do you run transformer models with high power efficiency and low latency?
• Model size and memory usage
  – We need a toolset (with tutorials) to compress models to a reasonable size so that they can be deployed at the edge

Arm Machine Learning Solution Supporting Vision Transformers

Introducing the Next Generation Arm NPU: What Makes It Attractive?
• Higher power efficiency: targeting 20% over the current generation
• Increased performance: configurations from 128 MACs/cycle to 2048 MACs/cycle
• Extended operator support: hardware-accelerated transformer network support
• Double MAC throughput for 2/4 sparse layers
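The "2/4 sparse" pattern (my reading: at most two non-zero weights in every group of four consecutive weights) is what lets the NPU skip the zeroed multiplications and double its effective MAC throughput. A hypothetical NumPy sketch of pruning a weight tensor to that pattern by keeping the two largest-magnitude weights per group:

```python
import numpy as np

def prune_2_of_4(w):
    """Zero out the 2 smallest-magnitude weights in every group of 4.

    w: 1-D weight array whose length is a multiple of 4.
    """
    groups = w.reshape(-1, 4).copy()
    # indices of the two smallest |w| in each group of four
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    np.put_along_axis(groups, drop, 0.0, axis=1)
    return groups.reshape(w.shape)

w = np.array([0.9, -0.1, 0.05, -0.8, 0.3, 0.2, -0.7, 0.01])
print(prune_2_of_4(w))
```

In practice such pruning is applied during or after training with fine-tuning to recover accuracy; the sketch only shows the structural constraint the hardware exploits.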

New Hardware Operators Accelerate Transformer Networks
• In addition to the operators currently supported by the original Ethos product family, the latest Arm Ethos-U85 includes native hardware support for transformer networks and the DeepLabV3 semantic segmentation network, with new operators such as: TRANSPOSE, GATHER, MATMUL, RESIZE_BILINEAR, and ARGMAX
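To see where these operators show up in a vision transformer, consider a toy NumPy classification pipeline (illustrative only, with made-up shapes): GATHER implements table lookups such as position embeddings, TRANSPOSE and MATMUL dominate the attention computation, and ARGMAX produces the final class prediction.

```python
import numpy as np

rng = np.random.default_rng(1)
pos_table = rng.standard_normal((16, 8))    # learned position embeddings
patches = rng.standard_normal((4, 8))       # 4 patch embeddings, d_model = 8
W_cls = rng.standard_normal((8, 10))        # classifier head for 10 classes

# GATHER: look up position embeddings for the token positions present
pos = pos_table[np.array([0, 1, 4, 5])]     # gather rows by index
tokens = patches + pos

# TRANSPOSE + MATMUL: the core of an attention-score computation
scores = tokens @ tokens.T                  # (4, 4) token affinities

# MATMUL: classifier head applied to the mean-pooled tokens
logits = tokens.mean(axis=0) @ W_cls        # (10,)

# ARGMAX: final class prediction
pred = int(np.argmax(logits))
print(pred)
```

RESIZE_BILINEAR plays the equivalent role in segmentation networks such as DeepLabV3, upsampling low-resolution feature maps back to the input resolution.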

Arm Example Subsystem
• Pre-integrated and verified machine learning solution
• Diagram: Cortex-M85, Ethos-U85, DMA-350, and Mali-C55 connected over an interconnect to memory and peripherals (a mix of Arm IPs and non-Arm IPs)

How to Use Ethos-U85 in a System
• End Point AI: Cortex-M based system (Cortex-M + Ethos-U85 on an interconnect with system SRAM and system flash)
• ML Island: Cortex-A based system (a Cortex-A cluster plus a Cortex-M + Ethos-U85 island, with DRAM, system SRAM, and system flash)
• Discrete NPU: Cortex-A only (a Cortex-A cluster driving Ethos-U85 directly, with DRAM, system SRAM, and system flash)

Software Flow on Arm Machine Learning Solution
• Cortex-M CPU with Ethos-U85
• Host (offline): TensorFlow framework → TF quantization tooling → TFLite converter → TFLite flatbuffer file → NN optimizer
• Target/device: the TFLite Micro runtime runs on the Cortex-M CPU, using reference kernels and CMSIS-NN optimized kernels, while the Ethos-U85 driver dispatches supported operators to the Ethos-U85 NPU
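The TF quantization tooling in the offline flow produces the int8 tensors that the runtime and NPU execute. The underlying affine mapping, real ≈ scale × (int8 − zero_point), can be sketched in NumPy (a simplified illustration of post-training quantization, not the actual tooling):

```python
import numpy as np

def quantize_int8(x):
    """Asymmetric affine quantization of a float tensor to int8."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-128 - lo / scale))   # maps x = lo to -128
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximation of the original float tensor."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.linspace(-1.0, 2.0, 8, dtype=np.float32)
q, s, zp = quantize_int8(x)
x_hat = dequantize(q, s, zp)
print(np.abs(x - x_hat).max())   # round-off error, bounded by scale / 2
```

The real tooling additionally calibrates activation ranges with a representative dataset and quantizes per-tensor or per-channel, but the scale/zero-point arithmetic is the same idea.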

Software Flow on Arm Machine Learning Solution
• Cortex-M + Cortex-A system
• Host (offline): TensorFlow framework → TF quantization tooling → TFLite converter → TFLite flatbuffer file → NN optimizer
• Target/device: a wrapper application on Linux (Cortex-A with cache and MMU) calls an inference API into the subsystem driver; the .tflite flatbuffer runs under the TFLite Micro runtime on the Cortex-M (with cache and MPU), whose Ethos-U85 driver drives the Ethos-U85 NPU over the AXI bus; DRAM is split by an address filter into a Linux/OS managed area and an NPU carveout reserved at boot time, alongside SRAM

Software Flow on Arm Machine Learning Solution
• Cortex-A based system
• Host (offline): TensorFlow framework → TF quantization tooling → TFLite converter → TFLite flatbuffer file → NN optimizer
• Target/device: the application on Linux (Cortex-A with cache and MMU) uses a TFLite delegate and NPU driver to drive the Ethos-U85 NPU over the AXI bus; DRAM is split by an address filter into a Linux/OS managed area and an NPU carveout reserved at boot time, alongside SRAM

Arm Toolset Enables the Efficient Implementation of Transformers on Ethos
• Flow: use case and data → Model Searcher (or a trained model from the Model Zoo) → trained model → Model Compressor (weight clustering, quantization) → optimized model → Arm Vela compiler (compile) → integrated with the application → device
• The Arm transformer tutorials are Jupyter notebooks (.ipynb) showing how to quantize and compress transformer encoder and encoder-decoder models
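Weight clustering, one of the Model Compressor steps named above, replaces a layer's weights with a small palette of shared values so the model compresses well. A minimal 1-D k-means sketch in NumPy (illustrative only; the actual tutorials use TensorFlow-based tooling):

```python
import numpy as np

def cluster_weights(w, n_clusters=16, iters=25):
    """Replace weights with n_clusters shared values via 1-D k-means."""
    flat = w.ravel()
    # initialize centroids evenly across the weight range
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(iters):
        # assign every weight to its nearest centroid, then re-center
        assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            members = flat[assign == c]
            if members.size:
                centroids[c] = members.mean()
    assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return centroids[assign].reshape(w.shape)

rng = np.random.default_rng(2)
w = rng.standard_normal((64, 64)).astype(np.float32)
wc = cluster_weights(w)
print(len(np.unique(wc)))   # at most 16 distinct weight values
```

With only 16 distinct values, each weight can be stored as a 4-bit index into the palette, which is what makes the clustered model small enough for edge deployment.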

Vision Transformer Example Implementation

DeiT-Tiny Runs on Ethos-U85
• The demo compares how much faster the latest Ethos-U85 runs a transformer network than the previous Ethos, since with Ethos-U85 there is no fallback for those operators
• Setup: two Arm MPS3 boards, one with the previous Ethos-U and one with the latest Ethos-U85, each running the same input images (e.g., a hummingbird) and reporting execution speed on an output display

Up to 8X Acceleration in Inference Time
• Previous Ethos vs. the latest Ethos-U85
• For more details, please visit the Arm booth at #409

Summary
• Machine learning (ML) is everywhere, and its landscape is evolving from CNNs to transformer-based models
• Arm just launched the latest NPU in the Arm Ethos product family to extend support for accelerating transformers at the edge
• Finally, “Edge AI runs on Arm.”

Resources
Please visit Arm booth #409 at the 2024 Embedded Vision Summit for more demos:
• “The Newly Launched Arm Ethos-U85 NPU”
• “Renesas RZ/V2H: Quad-core Cortex-A55 Vision AI MPU”
• “Arm-Himax: High-efficiency Embedded Computer Vision”
Arm Ethos-U product page: https://www.arm.com/products/silicon-ip-cpu?families=ethos%20npus
Arm transformer tutorials: https://github.com/ARM-software/ML-zoo/tree/master/tutorials/transformer_tutorials
Arm keyword-transformer: https://github.com/ARM-software/keyword-transformer

Reference
[1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, pp. 6000–6010.

Thank You