CUDA Architecture

Jul 17, 2018

Prof. Shashikant V. Athawale
Assistant Professor | Computer Engineering Department | AISSMS College of Engineering,
Kennedy Road, Pune, MH, India - 411001

Contents
❖CUDA Architecture
❖Applications of CUDA
❖Introduction to CUDA C: write and launch CUDA C kernels
❖Manage GPU memory
❖Manage communication and synchronization
❖Parallel programming in CUDA C

Communication and Synchronization in Threads
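A minimal sketch of how threads in one block can communicate through shared memory and synchronize with __syncthreads(). The names kBlock, reverseInBlock and reverseOnDevice are hypothetical, chosen only for this illustration, and the sketch assumes a single block of 64 threads.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int kBlock = 64;  // assumed block size for this sketch

// Each thread writes one value into shared memory, waits at the barrier,
// then reads a value that a different thread wrote.
__global__ void reverseInBlock(int* data) {
    __shared__ int tile[kBlock];
    int t = threadIdx.x;
    tile[t] = data[t];
    __syncthreads();  // every write to tile is visible after this point
    data[t] = tile[kBlock - 1 - t];
}

// Hypothetical host wrapper: reverses exactly kBlock ints in place.
// Returns true if every CUDA call succeeded.
bool reverseOnDevice(int* host) {
    assert(host != nullptr);
    size_t bytes = kBlock * sizeof(int);
    int* dData = nullptr;
    if (cudaMalloc(&dData, bytes) != cudaSuccess) return false;
    bool ok = cudaMemcpy(dData, host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        reverseInBlock<<<1, kBlock>>>(dData);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host, dData, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dData);
    return ok;
}
```

Without the barrier, a thread could read tile[kBlock - 1 - t] before the thread responsible for that slot has written it.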

CUDA Architecture

Applications of CUDA

CUDA C : The Basics
❖Based on industry-standard C
❖A handful of language extensions to allow heterogeneous programs
❖Straightforward APIs to manage devices, memory, etc.
❖Terminology:
➢Host – The CPU and its memory (host memory)
➢Device – The GPU and its memory (device memory)
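
As a small illustration of the device-management API, the host can ask the CUDA runtime which devices are present. This is a sketch only; countDevices and printDevices are hypothetical helper names, while cudaGetDeviceCount and cudaGetDeviceProperties are runtime API calls.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of CUDA devices visible to the host,
// or -1 if the runtime reports an error (for example, no driver).
int countDevices() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    return (err == cudaSuccess) ? count : -1;
}

// Hypothetical helper: host code that prints a few properties per device.
void printDevices() {
    int count = countDevices();
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaError_t err = cudaGetDeviceProperties(&prop, i);
        assert(err == cudaSuccess);
        (void)err;  // silence unused warning when asserts are disabled
        printf("Device %d: %s, %zu bytes of device memory, %d multiprocessors\n",
               i, prop.name, prop.totalGlobalMem, prop.multiProcessorCount);
    }
}
```

All of this code runs on the host; it only describes the device and launches nothing on it.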

CUDA Kernels
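A minimal sketch of writing and launching a kernel, assuming a simple element-wise vector addition. The names addVectors and addOnDevice are hypothetical. The __global__ qualifier marks a function that runs on the device and is launched from the host with the <<<blocks, threads>>> syntax.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Kernel: runs on the device; each thread adds one pair of elements.
__global__ void addVectors(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];  // guard the last, partly filled block
}

// Hypothetical host wrapper: copies inputs to the device, launches the
// kernel, copies the result back. Returns true if every call succeeded.
bool addOnDevice(const float* a, const float* b, float* out, int n) {
    assert(n > 0);
    size_t bytes = n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dOut = nullptr;
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess &&
              cudaMalloc(&dB, bytes) == cudaSuccess &&
              cudaMalloc(&dOut, bytes) == cudaSuccess &&
              cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;  // round up
        addVectors<<<blocks, threads>>>(dA, dB, dOut, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out, dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return ok;
}
```

The block count is rounded up so that every element is covered, which is why the kernel checks i < n before writing.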

GPU Memory Management
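A minimal sketch of the allocate / use / free life cycle for device memory, using cudaMalloc, cudaMemset and cudaFree from the runtime API. The helper name zeroFillOnDevice is hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper: allocates n ints on the device, zero-fills them there,
// copies them to hostOut, and releases the device buffer.
// Returns true if every CUDA call succeeded.
bool zeroFillOnDevice(int* hostOut, int n) {
    assert(n > 0);
    size_t bytes = n * sizeof(int);
    int* dBuf = nullptr;  // device pointer: must not be dereferenced on the host
    if (cudaMalloc(&dBuf, bytes) != cudaSuccess) return false;
    bool ok = cudaMemset(dBuf, 0, bytes) == cudaSuccess &&
              cudaMemcpy(hostOut, dBuf, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dBuf);
    return ok;
}
```

Device pointers look like ordinary pointers in host code, but the memory they name lives on the GPU and is reached only through API calls or from inside kernels.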

Data Transfer Directions Keywords
❖cudaMemcpyHostToHost
❖cudaMemcpyHostToDevice
❖cudaMemcpyDeviceToHost
❖cudaMemcpyDeviceToDevice
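
The four keywords are passed as the last argument of cudaMemcpy. A minimal sketch that sends a buffer through all four directions in turn; the helper name roundTrip is hypothetical.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: src -> staging buffer (host to host) -> dA (host to
// device) -> dB (device to device) -> dst (device to host).
// Returns true if every CUDA call succeeded.
bool roundTrip(const int* src, int* dst, int n) {
    assert(n > 0);
    size_t bytes = n * sizeof(int);
    std::vector<int> stage(n);  // host-side staging buffer
    int *dA = nullptr, *dB = nullptr;
    bool ok =
        cudaMalloc(&dA, bytes) == cudaSuccess &&
        cudaMalloc(&dB, bytes) == cudaSuccess &&
        cudaMemcpy(stage.data(), src, bytes, cudaMemcpyHostToHost) == cudaSuccess &&
        cudaMemcpy(dA, stage.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(dB, dA, bytes, cudaMemcpyDeviceToDevice) == cudaSuccess &&
        cudaMemcpy(dst, dB, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dA);
    cudaFree(dB);
    return ok;
}
```

In every call the destination pointer comes first and the source second, and the keyword must match where each pointer actually lives.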

Parallel Programming in CUDA C
❖CUDA brings data-parallel computing to the masses.
❖CUDA is a scalable parallel programming model.
❖A program runs on any number of processors without recompiling.
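
One way to see this scalability from the programmer's side is a grid-stride loop: the same compiled kernel gives the same result whether it is launched with one block or many, because each thread steps through the data by the total thread count. This is a sketch under that assumption; scale and scaleOnDevice are hypothetical names, and the block size of 128 is arbitrary.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Kernel: each thread handles elements i, i + stride, i + 2 * stride, ...
// so any grid size covers all n elements.
__global__ void scale(float* data, int n, float factor) {
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        data[i] *= factor;
    }
}

// Hypothetical host wrapper: scales host[0..n) in place using the given
// number of blocks. Returns true if every CUDA call succeeded.
bool scaleOnDevice(float* host, int n, float factor, int blocks) {
    assert(n > 0 && blocks > 0);
    size_t bytes = n * sizeof(float);
    float* dData = nullptr;
    if (cudaMalloc(&dData, bytes) != cudaSuccess) return false;
    bool ok = cudaMemcpy(dData, host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        scale<<<blocks, 128>>>(dData, n, factor);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host, dData, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dData);
    return ok;
}
```

The blocks of a grid are independent of one another, so the hardware is free to spread them over however many multiprocessors the GPU has.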

Architecture Of Parallel CUDA Programming

CUDA Uses Extensive Multithreading
❖CUDA threads express fine-grained data parallelism.
➢Map threads to GPU threads.
➢Virtualize the processors.
❖CUDA thread blocks express coarse-grained parallelism.
➢Blocks hold arrays of GPU threads and define shared memory boundaries.
➢Allow scaling between smaller and larger GPUs.
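
A minimal sketch of both levels together, assuming a sum over an integer array: the threads inside a block cooperate through that block's shared memory, while the blocks work independently and each writes one partial sum. The names kThreads, blockSums and sumOnDevice are hypothetical, and the block size is assumed to be a power of two.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 128;  // assumed block size (power of two)

// Kernel: each block reduces its slice of the input in shared memory and
// writes a single partial sum. Shared memory is private to the block.
__global__ void blockSums(const int* in, int* partial, int n) {
    __shared__ int cache[kThreads];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    cache[threadIdx.x] = (i < n) ? in[i] : 0;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s /= 2) {
        if (threadIdx.x < s) cache[threadIdx.x] += cache[threadIdx.x + s];
        __syncthreads();  // finish this step before halving again
    }
    if (threadIdx.x == 0) partial[blockIdx.x] = cache[0];
}

// Hypothetical host wrapper: adds up the per-block partial sums on the host.
// Returns true if every CUDA call succeeded.
bool sumOnDevice(const int* host, int n, long long* total) {
    assert(n > 0 && total != nullptr);
    int blocks = (n + kThreads - 1) / kThreads;
    std::vector<int> partial(blocks);
    int *dIn = nullptr, *dPartial = nullptr;
    bool ok = cudaMalloc(&dIn, n * sizeof(int)) == cudaSuccess &&
              cudaMalloc(&dPartial, blocks * sizeof(int)) == cudaSuccess &&
              cudaMemcpy(dIn, host, n * sizeof(int), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        blockSums<<<blocks, kThreads>>>(dIn, dPartial, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(partial.data(), dPartial, blocks * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dIn);
    cudaFree(dPartial);
    *total = 0;
    if (ok) {
        for (int p : partial) *total += p;
    }
    return ok;
}
```

A larger input simply produces more blocks, and no block needs to know how many others exist.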

CUDA Uses Extensive Multithreading
❖GPUs execute thousands of lightweight threads.
➢In graphics, each thread computes one pixel.
➢One CUDA thread computes one result (or several results).
➢Hardware multithreading and zero-overhead scheduling.
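
A minimal sketch of the one-thread-per-pixel pattern, assuming an 8-bit grayscale image stored row by row. The names invert and invertOnDevice are hypothetical, and the 16 x 16 block shape is an arbitrary choice for this illustration.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Kernel: a 2D grid of threads, one thread per pixel.
__global__ void invert(unsigned char* img, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {  // threads past the image edge do nothing
        int idx = y * width + x;
        img[idx] = static_cast<unsigned char>(255 - img[idx]);
    }
}

// Hypothetical host wrapper: inverts a width x height image in place.
// Returns true if every CUDA call succeeded.
bool invertOnDevice(unsigned char* host, int width, int height) {
    assert(width > 0 && height > 0);
    size_t bytes = static_cast<size_t>(width) * height;
    unsigned char* dImg = nullptr;
    if (cudaMalloc(&dImg, bytes) != cudaSuccess) return false;
    bool ok = cudaMemcpy(dImg, host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 threads(16, 16);
        dim3 blocks((width + 15) / 16, (height + 15) / 16);  // round up per axis
        invert<<<blocks, threads>>>(dImg, width, height);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host, dImg, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dImg);
    return ok;
}
```

Even a small image launches at least one full block of 256 threads here; the surplus threads fail the bounds check and exit immediately.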

Applications
❖High bandwidth
❖Visual computing
❖High arithmetic intensity