CUDA Architecture
Prof. Shashikant V. Athawale
Assistant Professor | Computer Engineering Department | AISSMS College of Engineering,
Kennedy Road, Pune, MH, India - 411001
Contents
❖CUDA Architecture
❖Applications of CUDA
❖Introduction to CUDA C: write and launch CUDA C
kernels
❖Manage GPU memory
❖Manage communication and synchronization
❖Parallel programming in CUDA C
Communication and Synchronization in Threads
CUDA Architecture
Applications of CUDA
CUDA C: The Basics
❖Based on industry-standard C
❖A handful of language extensions to allow heterogeneous
programs
❖Straightforward APIs to manage devices, memory, etc.
❖Terminology:
➢Host – The CPU and its memory (host memory)
➢Device – The GPU and its memory (device memory)
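The host/device split above can be sketched with a very small program. This is an illustrative sketch only: the names hello_kernel and run_hello are made up for this example, and it assumes a CUDA-capable GPU and compilation with nvcc. It shows two of the language extensions: the __global__ qualifier for device code and the <<<blocks, threads>>> launch syntax used from host code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device code: __global__ marks a function that runs on the GPU
// and is launched from host code.
__global__ void hello_kernel() {
    printf("Hello from device thread %d\n", threadIdx.x);
}

// Host code. run_hello is a hypothetical helper name for this sketch.
bool run_hello() {
    hello_kernel<<<1, 4>>>();                    // 1 block of 4 threads
    cudaError_t launch = cudaGetLastError();     // errors raised at launch time
    cudaError_t sync = cudaDeviceSynchronize();  // wait for the device to finish
    return launch == cudaSuccess && sync == cudaSuccess;
}

int main() {
    if (!run_hello()) {
        printf("CUDA error\n");
        return 1;
    }
    printf("Hello from the host\n");
    return 0;
}
```

The kernel launch returns control to the host immediately, which is why the host waits with cudaDeviceSynchronize before printing its own message.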
CUDA Kernels
GPU Memory Management
Data Transfer Direction Keywords
❖cudaMemcpyHostToHost
❖cudaMemcpyHostToDevice
❖cudaMemcpyDeviceToHost
❖cudaMemcpyDeviceToDevice
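A minimal sketch of how these direction keywords are typically used with cudaMalloc, cudaMemcpy and cudaFree. The kernel add_one and the helper copy_roundtrip are hypothetical names chosen for this example; the buffer size and the block size of 256 are arbitrary assumptions.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread increments one element of the device array.
__global__ void add_one(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: copies data host -> device -> device -> host
// and checks that the kernel ran on the second device buffer.
bool copy_roundtrip(int n) {
    size_t bytes = n * sizeof(int);
    std::vector<int> h_in(n), h_out(n, 0);
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_a = nullptr, *d_b = nullptr;
    if (cudaMalloc((void **)&d_a, bytes) != cudaSuccess) return false;
    if (cudaMalloc((void **)&d_b, bytes) != cudaSuccess) {
        cudaFree(d_a);
        return false;
    }

    // Host memory to device memory.
    cudaMemcpy(d_a, h_in.data(), bytes, cudaMemcpyHostToDevice);
    // Device memory to device memory (stays on the GPU).
    cudaMemcpy(d_b, d_a, bytes, cudaMemcpyDeviceToDevice);

    add_one<<<(n + 255) / 256, 256>>>(d_b, n);

    // Device memory back to host memory; this copy waits for the kernel.
    cudaError_t err =
        cudaMemcpy(h_out.data(), d_b, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);

    if (err != cudaSuccess) return false;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != h_in[i] + 1) return false;
    return true;
}

int main() {
    bool ok = copy_roundtrip(1000);
    printf("%s\n", ok ? "round trip OK" : "round trip FAILED");
    return ok ? 0 : 1;
}
```

cudaMemcpyHostToHost is not used here; it copies between two host buffers, much like memcpy.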
Parallel Programming in CUDA C
❖CUDA brings data-parallel computing to the masses.
❖CUDA is a scalable parallel programming model.
❖A program runs on any number of processors without
recompiling.
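One common way to see this scalability in code is a grid-stride loop, sketched below. The same compiled kernel gives the same result whether it is launched with one block or with many, so the hardware is free to spread blocks over however many multiprocessors it has. The names saxpy and run_saxpy, the input values, and the choice of 4 blocks per multiprocessor are assumptions made for this illustration only.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Grid-stride loop: correct for any number of blocks and threads.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        y[i] = a * x[i] + y[i];
}

// Hypothetical helper: runs y = 3*x + y with x = 1, y = 2 and
// checks that every element becomes 5.
bool run_saxpy(int n, int blocks) {
    size_t bytes = n * sizeof(float);
    std::vector<float> h_x(n, 1.0f), h_y(n, 2.0f);

    float *d_x = nullptr, *d_y = nullptr;
    if (cudaMalloc((void **)&d_x, bytes) != cudaSuccess) return false;
    if (cudaMalloc((void **)&d_y, bytes) != cudaSuccess) {
        cudaFree(d_x);
        return false;
    }
    cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y.data(), bytes, cudaMemcpyHostToDevice);

    saxpy<<<blocks, 256>>>(n, 3.0f, d_x, d_y);

    cudaError_t err =
        cudaMemcpy(h_y.data(), d_y, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);

    if (err != cudaSuccess) return false;
    for (int i = 0; i < n; ++i)
        if (h_y[i] != 5.0f) return false;
    return true;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;
    int sms = prop.multiProcessorCount;  // differs between GPUs

    // Same binary, two very different grid sizes, same answer.
    bool ok = run_saxpy(1 << 16, 1) && run_saxpy(1 << 16, 4 * sms);
    printf("%d multiprocessors: %s\n", sms, ok ? "OK" : "FAILED");
    return ok ? 0 : 1;
}
```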
Architecture Of Parallel CUDA Programming
CUDA Uses Extensive Multithreading
❖CUDA threads express fine-grained data parallelism.
➢Map threads to GPU threads.
➢Virtualize the processors.
❖CUDA thread blocks express coarse-grained parallelism.
➢Blocks hold arrays of GPU threads and define shared
memory boundaries.
➢Allow scaling between smaller and larger GPUs.
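The role of a block as a shared memory boundary can be sketched with a per-block sum. Threads in one block communicate through a __shared__ array and synchronize with __syncthreads(); different blocks do not share that array, so each block writes one partial result and the host adds them up. The names block_sum and sum_ones_on_gpu and the block size of 256 are assumptions for this example.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK 256  // threads per block; this sketch needs a power of two

// Each block reduces its slice of the input to one partial sum.
__global__ void block_sum(const int *in, int *partial, int n) {
    __shared__ int tile[BLOCK];  // visible only to threads of this block
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    tile[tid] = (i < n) ? in[i] : 0;
    __syncthreads();  // all loads finished before anyone reads tile

    // Tree reduction inside the block.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) tile[tid] += tile[tid + s];
        __syncthreads();  // every thread reaches this barrier each step
    }
    if (tid == 0) partial[blockIdx.x] = tile[0];
}

// Hypothetical helper: sums n ones on the GPU, returns -1 on error.
long sum_ones_on_gpu(int n) {
    int blocks = (n + BLOCK - 1) / BLOCK;
    std::vector<int> h_in(n, 1), h_partial(blocks, 0);

    int *d_in = nullptr, *d_partial = nullptr;
    if (cudaMalloc((void **)&d_in, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d_partial, blocks * sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    block_sum<<<blocks, BLOCK>>>(d_in, d_partial, n);

    cudaError_t err = cudaMemcpy(h_partial.data(), d_partial,
                                 blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_partial);
    if (err != cudaSuccess) return -1;

    long total = 0;  // combine the per-block results on the host
    for (int b = 0; b < blocks; ++b) total += h_partial[b];
    return total;
}

int main() {
    long total = sum_ones_on_gpu(1000);
    printf("sum = %ld\n", total);
    return total == 1000 ? 0 : 1;
}
```

Because blocks are independent, a larger GPU can run more of them at once and a smaller GPU can run them a few at a time, without changing the kernel.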
CUDA Uses Extensive Multithreading
❖GPUs execute thousands of lightweight threads.
➢In graphics, each thread computes one pixel.
➢One CUDA thread computes one result (or several
results).
➢Hardware multithreading & zero-overhead
scheduling.
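The "one thread computes one result" pattern can be sketched with an image kernel in which each thread handles exactly one pixel. This is an illustration under assumptions: the grayscale inversion, the names invert and run_invert, the test pattern, and the 16x16 block shape are all chosen for this example.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per pixel: thread (x, y) inverts pixel (x, y).
__global__ void invert(unsigned char *img, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {  // guard threads past the image edge
        int idx = y * width + x;
        img[idx] = 255 - img[idx];
    }
}

// Hypothetical helper: inverts a synthetic image and verifies the result.
bool run_invert(int width, int height) {
    size_t bytes = (size_t)width * height;
    std::vector<unsigned char> h_in(bytes), h_out(bytes, 0);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            h_in[y * width + x] = (unsigned char)((x + y) % 256);

    unsigned char *d_img = nullptr;
    if (cudaMalloc((void **)&d_img, bytes) != cudaSuccess) return false;
    cudaMemcpy(d_img, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // 256 threads per block
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    invert<<<grid, block>>>(d_img, width, height);

    cudaError_t err =
        cudaMemcpy(h_out.data(), d_img, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_img);
    if (err != cudaSuccess) return false;

    for (size_t i = 0; i < bytes; ++i)
        if (h_out[i] != 255 - h_in[i]) return false;
    return true;
}

int main() {
    bool ok = run_invert(70, 50);  // 3500 pixels, so 3500 useful threads
    printf("%s\n", ok ? "invert OK" : "invert FAILED");
    return ok ? 0 : 1;
}
```

Even this small image launches a few thousand threads; the hardware schedules them without the program managing any of them individually.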