AI/ML Infra Meetup | Perspective on Deep Learning Framework

About This Presentation

AI/ML Infra Meetup
May 23, 2024
Organized by Alluxio

For more Alluxio Events: https://www.alluxio.io/events/

Speaker:
- Triston Cao (Senior Deep Learning Software Engineering Manager, @NVIDIA)

From Caffe to MXNet to PyTorch and more, Xiande Cao, Senior Deep Learning Software Engineering Manager,...


Slide Content

Triston Cao, for Alluxio Meetup on May 23, 2024
PERSPECTIVE ON DEEP LEARNING FRAMEWORK

2

3
COMPUTATION GRAPH AND GRADIENT DESCENT
Image credit: Deniz Yuret's homepage, Alec Radford's animations for optimization algorithms
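
To make the computation-graph idea concrete, here is a minimal sketch (assuming PyTorch, which the slide itself does not show) of gradient descent driven by automatic differentiation over the graph:

```python
import torch

# A tiny linear model y = w * x + b, with parameters tracked in the autograd graph.
w = torch.randn(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

x = torch.tensor([1.0, 2.0, 3.0])
y_true = torch.tensor([2.0, 4.0, 6.0])

lr = 0.1  # learning rate (gradient-descent step size)
for step in range(100):
    y_pred = w * x + b                      # forward pass builds the computation graph
    loss = ((y_pred - y_true) ** 2).mean()
    loss.backward()                         # backward pass: gradients via the graph
    with torch.no_grad():                   # parameter update outside the graph
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()
```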

4
OPEN-SOURCE FRAMEWORKS
Timeline of open-source frameworks from 2014 to 2024, with milestone models marked: AlexNet, ResNet, Transformer, ChatGPT

5
WHAT DOES A FRAMEWORK LOOK LIKE
A Hybrid Programming Language Environment

6
NVIDIA NGC CONTAINERS

7
OPS, TENSORS, AND PARALLEL EXECUTION
System Level Optimization
https://mxnet.apache.org/versions/1.9.1/api/architecture/note_engine
https://www.oreilly.com/library/view/elegant-scipy/9781491922927/ch01.html
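
As a sketch of what the dependency-engine references above describe (assuming PyTorch and a CUDA GPU; the slide cites MXNet's engine notes), ops on GPU tensors are queued asynchronously and independent ops can overlap:

```python
import torch

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

# Each op is enqueued on a CUDA stream and returns immediately; the framework
# tracks tensor dependencies and can run independent ops in parallel.
c = a @ b          # launched asynchronously
d = torch.relu(a)  # independent of c, so it can overlap with the matmul

torch.cuda.synchronize()  # block until all queued GPU work has finished
print(c.sum().item(), d.sum().item())
```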

8

9
CONVOLUTIONS
https://paperswithcode.com/methods/category/convolutional-neural-networks
https://cv.gluon.ai/contents.html
https://epynn.net/Convolution.html
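
A minimal convolution example (assuming PyTorch; the layer sizes are illustrative, not from the slide):

```python
import torch
import torch.nn as nn

# A single 2D convolution layer: 3 input channels -> 16 output channels, 3x3 kernel.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

x = torch.randn(8, 3, 224, 224)   # batch of 8 RGB images, NCHW layout
y = conv(x)
print(y.shape)                    # torch.Size([8, 16, 224, 224])
```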

10
CUDNN 10TH ANNIVERSARY
April 2014 – April 2024

11
Stack diagram: deep learning frameworks on top of GPU libraries (cuDNN, cuBLAS, NCCL, DALI, TensorRT) and CPU libraries, all built on CUDA

12
SYMBOLIC VS EAGER (IMPERATIVE)
Performance vs ease of use
Static (symbolic) graph vs imperative eager mode, with JIT and hybrid approaches bridging the two
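
A minimal sketch of the trade-off (assuming PyTorch; other frameworks use e.g. MXNet's HybridBlock or TensorFlow's tf.function): the same function can run eagerly or be scripted into a static graph via JIT:

```python
import torch

def f(x):
    return torch.tanh(x) * x + 1.0

x = torch.randn(1024)

# Eager (imperative): every op runs immediately, easy to debug.
y_eager = f(x)

# JIT (symbolic): compile the function into a static graph that can be optimized.
f_scripted = torch.jit.script(f)
y_graph = f_scripted(x)

print(torch.allclose(y_eager, y_graph))  # True: same math, different execution mode
```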

13
TENSOR CORES AND MIXED PRECISION
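
A minimal mixed-precision sketch (assuming PyTorch AMP and a CUDA GPU; not code from the slide): autocast runs matmuls and convolutions in FP16 on Tensor Cores, while the gradient scaler guards against FP16 underflow:

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

for _ in range(10):
    opt.zero_grad()
    with torch.cuda.amp.autocast():        # FP16/FP32 mixed-precision region
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()          # scale loss to avoid FP16 gradient underflow
    scaler.step(opt)
    scaler.update()
```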

14
MORE TENSOR CORES

15
DATA LAYOUT MATTERS TOO
Reference: Convolutional Layers User's Guide - NVIDIA Docs
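
As an illustration of the layout point (assuming PyTorch and a CUDA GPU; NHWC is the layout the cuDNN guide above recommends for Tensor Cores), a model and its inputs can be switched to channels-last memory format:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 64, kernel_size=3, padding=1).cuda().half()
x = torch.randn(32, 3, 224, 224, device="cuda", dtype=torch.half)

# Switch from NCHW to NHWC (channels_last): same logical shape, different memory
# layout, which lets cuDNN pick Tensor Core kernels that prefer NHWC.
model = model.to(memory_format=torch.channels_last)
x = x.contiguous(memory_format=torch.channels_last)

y = model(x)
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```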

16
NVIDIA DALI
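
A minimal DALI pipeline sketch (the data path and sizes are placeholders, not from the slide): JPEG decoding and resizing move off the CPU data-loader path onto the GPU:

```python
from nvidia.dali import pipeline_def, fn

@pipeline_def(batch_size=32, num_threads=4, device_id=0)
def train_pipeline(data_dir):
    jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True)
    images = fn.decoders.image(jpegs, device="mixed")     # "mixed": CPU parse + GPU decode
    images = fn.resize(images, resize_x=224, resize_y=224)
    return images, labels

pipe = train_pipeline(data_dir="/path/to/train")  # placeholder dataset path
pipe.build()
images, labels = pipe.run()
```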

17
NCCL FOR MULTI-NODE TRAINING
https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/operations.html
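
A minimal sketch of NCCL used through a framework (assuming a PyTorch setup launched with torchrun; not code from the slide): the NCCL backend performs the all-reduce across GPUs and nodes:

```python
import os
import torch
import torch.distributed as dist

# Expected to be launched with torchrun, which sets RANK/WORLD_SIZE/LOCAL_RANK.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

t = torch.ones(1024, device="cuda") * dist.get_rank()
dist.all_reduce(t, op=dist.ReduceOp.SUM)   # NCCL all-reduce across all ranks
print(f"rank {dist.get_rank()}: sum = {t[0].item()}")

dist.destroy_process_group()
```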

18
https://www.nvidia.com/en-us/data-center/resources/mlperf-benchmarks/

19
TRAINING VS INFERENCE

20
FRAMEWORK + TENSORRT FOR INFERENCE
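
One common framework-to-TensorRT path, sketched with assumed names (the slide does not prescribe a specific flow): export the trained model to ONNX, then build a TensorRT engine from it:

```python
import torch
import torchvision

model = torchvision.models.resnet50(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)

# Export the framework model to ONNX; TensorRT can then build an optimized engine
# from the file, e.g. with:  trtexec --onnx=resnet50.onnx --fp16
torch.onnx.export(model, dummy, "resnet50.onnx",
                  input_names=["input"], output_names=["output"])
```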

21
INFERENCE WITH INT8
Ref: Achieving FP32 Accuracy for INT8 Inference Using Quantization Aware Training with NVIDIA TensorRT | NVIDIA Technical Blog
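
The reference above covers quantization-aware training with TensorRT; as a much simpler illustration of INT8 inference in general (a swapped-in technique, not the TensorRT QAT flow), PyTorch's post-training dynamic quantization stores Linear weights in INT8:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

# Post-training dynamic quantization: weights stored as INT8,
# activations quantized on the fly at inference time.
qmodel = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(4, 512)
print(qmodel(x).shape)   # torch.Size([4, 10])
```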

22
COMPILER BASED FRAMEWORK
https://tvm.apache.org/docs/tutorial/relay_quick_start.html
https://www.linkedin.com/pulse/exploring-jax-googles-high-performance-python-library-nagilla-hwauc/
https://github.com/Lightning-AI/lightning-thunder
Thunder can optimize PyTorch modules with:
•torch.compile
•nvFuser
•cuDNN
•Apex
•TransformerEngine
•PyTorch eager
•Custom CUDA kernels through PyCUDA, Numba, CuPy
•Custom kernels written in OpenAI Triton
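
A minimal compiler-path sketch (assuming plain torch.compile; Thunder and TVM have their own entry points not shown here): an eager model is captured into a graph and compiled into fused kernels:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 1024), nn.GELU(), nn.Linear(1024, 1024))

# torch.compile captures the model into a graph and hands it to a compiler backend
# (Inductor by default), which can fuse ops and generate optimized kernels.
compiled_model = torch.compile(model)

x = torch.randn(32, 1024)
y = compiled_model(x)   # first call triggers compilation, later calls reuse it
print(y.shape)
```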

23
TAKEAWAYS
•Deep learning frameworks are large software projects
•NVIDIA keeps building libraries that serve deep learning frameworks for GPU acceleration
•Training and inference have different challenges
•More stabilized, but still evolving fast
•Compiler technology is getting more integrated into the frameworks