AI/ML Infra Meetup | Perspective on Deep Learning Framework

About This Presentation

AI/ML Infra Meetup
May 23, 2024
Organized by Alluxio

For more Alluxio Events: https://www.alluxio.io/events/

Speaker:
- Triston Cao (Senior Deep Learning Software Engineering Manager, @NVIDIA)

From Caffe to MXNet to PyTorch and more, Xiande Cao, Senior Deep Learning Software Engineering Manager,...


Slide Content

Triston Cao, for Alluxio Meetup on May 23, 2024
PERSPECTIVE ON DEEP LEARNING FRAMEWORK

2

3
COMPUTATION GRAPH AND GRADIENT DESCENT
Image credit: Deniz Yuret's homepage, Alec Radford's animations for optimization algorithms
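
To make the computation-graph idea concrete, here is a minimal sketch (assuming PyTorch, which the slide itself does not show) of gradient descent driven by automatic differentiation over the graph:

```python
import torch

# A tiny linear model y = w * x + b, with parameters tracked in the autograd graph.
w = torch.randn(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

x = torch.tensor([1.0, 2.0, 3.0])
y_true = torch.tensor([2.0, 4.0, 6.0])

lr = 0.1  # learning rate (gradient-descent step size)
for step in range(100):
    y_pred = w * x + b                      # forward pass builds the computation graph
    loss = ((y_pred - y_true) ** 2).mean()
    loss.backward()                         # backward pass: gradients via the graph
    with torch.no_grad():                   # parameter update outside the graph
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()
```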

4
OPEN-SOURCE FRAMEWORKS
Timeline of open-source frameworks from 2014 to 2024, with milestone models marked: AlexNet, ResNet, Transformer, ChatGPT

5
WHAT DOES A FRAMEWORK LOOK LIKE
A Hybrid Programming Language Environment

6
NVIDIA NGC CONTAINERS

7
OPS, TENSORS, AND PARALLEL EXECUTION
System Level Optimization
https://mxnet.apache.org/versions/1.9.1/api/architecture/note_engine
https://www.oreilly.com/library/view/elegant-scipy/9781491922927/ch01.html
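
As a sketch of what the dependency-engine references above describe (assuming PyTorch and a CUDA GPU; the slide cites MXNet's engine notes), ops on GPU tensors are queued asynchronously and independent ops can overlap:

```python
import torch

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

# Each op is enqueued on a CUDA stream and returns immediately; the framework
# tracks tensor dependencies and can run independent ops in parallel.
c = a @ b          # launched asynchronously
d = torch.relu(a)  # independent of c, so it can overlap with the matmul

torch.cuda.synchronize()  # block until all queued GPU work has finished
print(c.sum().item(), d.sum().item())
```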

8

9
CONVOLUTIONS
https://paperswithcode.com/methods/category/convolutional-neural-networks
https://cv.gluon.ai/contents.html
https://epynn.net/Convolution.html
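
A minimal convolution example (assuming PyTorch; the layer sizes are illustrative, not from the slide):

```python
import torch
import torch.nn as nn

# A single 2D convolution layer: 3 input channels -> 16 output channels, 3x3 kernel.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

x = torch.randn(8, 3, 224, 224)   # batch of 8 RGB images, NCHW layout
y = conv(x)
print(y.shape)                    # torch.Size([8, 16, 224, 224])
```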

10
CUDNN 10TH ANNIVERSARY
April 2014 – April 2024

11
Stack diagram: deep learning frameworks on top of GPU libraries (cuDNN, cuBLAS, NCCL, DALI, TensorRT) and CPU libraries, all built on CUDA

12
SYMBOLIC VS EAGER (IMPERATIVE)
Performance vs ease of use
Static (symbolic) graph vs imperative eager mode, with JIT and hybrid approaches bridging the two
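
A minimal sketch of the trade-off (assuming PyTorch; other frameworks use e.g. MXNet's HybridBlock or TensorFlow's tf.function): the same function can run eagerly or be scripted into a static graph via JIT:

```python
import torch

def f(x):
    return torch.tanh(x) * x + 1.0

x = torch.randn(1024)

# Eager (imperative): every op runs immediately, easy to debug.
y_eager = f(x)

# JIT (symbolic): compile the function into a static graph that can be optimized.
f_scripted = torch.jit.script(f)
y_graph = f_scripted(x)

print(torch.allclose(y_eager, y_graph))  # True: same math, different execution mode
```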

13
TENSOR CORES AND MIXED PRECISION
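
A minimal mixed-precision sketch (assuming PyTorch AMP and a CUDA GPU; not code from the slide): autocast runs matmuls and convolutions in FP16 on Tensor Cores, while the gradient scaler guards against FP16 underflow:

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

for _ in range(10):
    opt.zero_grad()
    with torch.cuda.amp.autocast():        # FP16/FP32 mixed-precision region
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()          # scale loss to avoid FP16 gradient underflow
    scaler.step(opt)
    scaler.update()
```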

14
MORE TENSOR CORES

15
DATA LAYOUT MATTERS TOO
Reference: Convolutional Layers User's Guide - NVIDIA Docs
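
As an illustration of the layout point (assuming PyTorch and a CUDA GPU; NHWC is the layout the cuDNN guide above recommends for Tensor Cores), a model and its inputs can be switched to channels-last memory format:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 64, kernel_size=3, padding=1).cuda().half()
x = torch.randn(32, 3, 224, 224, device="cuda", dtype=torch.half)

# Switch from NCHW to NHWC (channels_last): same logical shape, different memory
# layout, which lets cuDNN pick Tensor Core kernels that prefer NHWC.
model = model.to(memory_format=torch.channels_last)
x = x.contiguous(memory_format=torch.channels_last)

y = model(x)
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```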

16
NVIDIA DALI
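
A minimal DALI pipeline sketch (the data path and sizes are placeholders, not from the slide): JPEG decoding and resizing move off the CPU data-loader path onto the GPU:

```python
from nvidia.dali import pipeline_def, fn

@pipeline_def(batch_size=32, num_threads=4, device_id=0)
def train_pipeline(data_dir):
    jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True)
    images = fn.decoders.image(jpegs, device="mixed")     # "mixed": CPU parse + GPU decode
    images = fn.resize(images, resize_x=224, resize_y=224)
    return images, labels

pipe = train_pipeline(data_dir="/path/to/train")  # placeholder dataset path
pipe.build()
images, labels = pipe.run()
```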

17
NCCL FOR MULTI-NODE TRAINING
https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/operations.html
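
A minimal sketch of NCCL used through a framework (assuming a PyTorch setup launched with torchrun; not code from the slide): the NCCL backend performs the all-reduce across GPUs and nodes:

```python
import os
import torch
import torch.distributed as dist

# Expected to be launched with torchrun, which sets RANK/WORLD_SIZE/LOCAL_RANK.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

t = torch.ones(1024, device="cuda") * dist.get_rank()
dist.all_reduce(t, op=dist.ReduceOp.SUM)   # NCCL all-reduce across all ranks
print(f"rank {dist.get_rank()}: sum = {t[0].item()}")

dist.destroy_process_group()
```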

18
https://www.nvidia.com/en-us/data-center/resources/mlperf-benchmarks/

19
TRAINING VS INFERENCE

20
FRAMEWORK + TENSORRT FOR INFERENCE
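
One common framework-to-TensorRT path, sketched with assumed names (the slide does not prescribe a specific flow): export the trained model to ONNX, then build a TensorRT engine from it:

```python
import torch
import torchvision

model = torchvision.models.resnet50(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)

# Export the framework model to ONNX; TensorRT can then build an optimized engine
# from the file, e.g. with:  trtexec --onnx=resnet50.onnx --fp16
torch.onnx.export(model, dummy, "resnet50.onnx",
                  input_names=["input"], output_names=["output"])
```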

21
INFERENCE WITH INT8
Ref: Achieving FP32 Accuracy for INT8 Inference Using Quantization Aware Training with NVIDIA TensorRT | NVIDIA Technical Blog
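
The reference above covers quantization-aware training with TensorRT; as a much simpler illustration of INT8 inference in general (a swapped-in technique, not the TensorRT QAT flow), PyTorch's post-training dynamic quantization stores Linear weights in INT8:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

# Post-training dynamic quantization: weights stored as INT8,
# activations quantized on the fly at inference time.
qmodel = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(4, 512)
print(qmodel(x).shape)   # torch.Size([4, 10])
```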

22
COMPILER BASED FRAMEWORK
https://tvm.apache.org/docs/tutorial/relay_quick_start.html
https://www.linkedin.com/pulse/exploring-jax-googles-high-performance-python-library-nagilla-hwauc/
https://github.com/Lightning-AI/lightning-thunder
Thunder can optimize PyTorch modules with:
•torch.compile
•nvFuser
•cuDNN
•Apex
•TransformerEngine
•PyTorch eager
•Custom CUDA kernels through PyCUDA, Numba, CuPy
•Custom kernels written in OpenAI Triton
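
A minimal compiler-path sketch (assuming plain torch.compile; Thunder and TVM have their own entry points not shown here): an eager model is captured into a graph and compiled into fused kernels:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 1024), nn.GELU(), nn.Linear(1024, 1024))

# torch.compile captures the model into a graph and hands it to a compiler backend
# (Inductor by default), which can fuse ops and generate optimized kernels.
compiled_model = torch.compile(model)

x = torch.randn(32, 1024)
y = compiled_model(x)   # first call triggers compilation, later calls reuse it
print(y.shape)
```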

23
TAKEAWAYS
•Deep learning frameworks are large software projects
•NVIDIA keeps building libraries that serve deep learning frameworks for GPU acceleration
•Training and inference have different challenges
•More stabilized, but still evolving fast
•Compiler technology is getting more integrated into the frameworks