The OpenCL-based FPGA accelerator data flow is as follows. First, the kernel receives the data in the input buffer; then, it stores the result in the output buffer after computing; finally, the host side reads the result into memory
chowsaj13
12 slides
May 26, 2024
About This Presentation
To alleviate the problems mentioned above, channel shuffle was employed to compensate for the accuracy of the CNN model. Since channel shuffle requires the exchange of data between groups of channels, it imposed new synchronization points while compensating for accuracy.
When using model parallelism, some neural networks need synchronization points because their network structure inherently requires communication
Slide Content
Introduction
Chowdhury Sajadul Islam
Computer Science and Engineering
Uttara University, Dhaka
Architecture & Organization
All members of the Intel x86 family share the same basic architecture
All members of the IBM System/370 family share the same basic architecture
This gives code compatibility, at least backwards
Organization differs between different versions
8/7/2020 2
Structure is the way in which components relate to each other
Function is the operation of individual components as part of the structure
Digital Computer Structure & Function
All computer functions are:
• Data processing
• Data storage
• Data movement
• Control
Functional View
Operations (a) Data movement
Simply transferring data from one peripheral or communications line to another.
Operations (b) Storage
Data transferred from the external environment to computer storage (read) and vice versa (write).
Operations (c) Processing from/to storage
Data processing performed on data held in storage.
Operations (d) Processing from storage to I/O
Data processing performed on data en route between storage and the external environment.
Forces on Computer Architecture
Digital computer architecture is shaped by:
• Technology
• Programming languages
• Operating systems
• History
• Applications
Technology => dramatic change
Processor
• Logic capacity: about 30% per year
• Clock rate: about 20% per year
So… advanced functions (e.g., multimedia functions in some Pentiums) and high-speed features (multiple pipelines, larger caches)
Memory
• DRAM capacity: about 60% per year (4x every 3 years)
• Memory speed: about 10% per year
• Cost per bit: improves about 25% per year
So… larger memory => more challenging applications (e.g., atmospheric modeling, astrophysics modeling)
Disk
• Capacity: about 60% per year
So… huge disk capacities => large data storage (video, music files, large data for various applications)
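The annual rates above compound, which is how "about 60% per year" becomes "4x every 3 years". The following is a minimal sketch of that arithmetic, assuming each rate is a constant year-over-year growth fraction. The names `ANNUAL_RATES`, `growth_factor`, and `years_to_reach` are illustrative and not part of the original slides.

```python
import math

# Annual improvement rates quoted on the slide, as fractions per year.
# Treating them as constant compound rates is an assumption of this sketch.
ANNUAL_RATES = {
    "processor logic capacity": 0.30,
    "processor clock rate": 0.20,
    "DRAM capacity": 0.60,
    "memory speed": 0.10,
    "disk capacity": 0.60,
}


def growth_factor(annual_rate, years):
    """Total multiplier after compounding annual_rate for the given years."""
    return (1.0 + annual_rate) ** years


def years_to_reach(annual_rate, target_factor):
    """Years of compounding needed to reach target_factor (e.g. 4 for 4x)."""
    return math.log(target_factor) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    for name, rate in ANNUAL_RATES.items():
        print(f"{name}: x{growth_factor(rate, 3):.2f} after 3 years")
```

Under these assumptions, 60% per year compounds to 1.6^3 ≈ 4.1x over three years, which agrees with the slide's "4x every 3 years", while memory speed at 10% per year gains only about 1.33x over the same period.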
Performance Assessment Clock Speed
1. Key parameters
   1. Performance, cost, size, security, reliability, power consumption
2. System clock speed
   1. Measured in Hz or multiples thereof
   2. Clock rate or clock speed: the rate of pulses
   3. Clock cycle or clock tick: one increment, or pulse, of the clock
   4. Cycle time: the time between pulses
3. Signals in the CPU take time to settle down to 1 or 0
4. Signals may change at different speeds
5. Operations need to be synchronised
6. Instruction execution proceeds in discrete steps
   1. Fetch, decode, load and store, arithmetic or logical
   2. Usually requires multiple clock cycles per instruction
7. Pipelining gives simultaneous execution of instructions
So, clock speed is not the whole story
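The relationship between clock rate, cycle time, and cycles per instruction can be sketched with the standard formulas: cycle time = 1 / clock rate, and execution time = instruction count × cycles per instruction × cycle time. The two processors below are hypothetical, with figures chosen only for illustration. The function names are likewise assumptions of this sketch.

```python
def cycle_time(clock_rate_hz):
    """Time between clock pulses, in seconds, for a clock rate in Hz."""
    return 1.0 / clock_rate_hz


def execution_time(instruction_count, cycles_per_instruction, clock_rate_hz):
    """Seconds needed to run a program, ignoring pipelining and stalls."""
    total_cycles = instruction_count * cycles_per_instruction
    return total_cycles * cycle_time(clock_rate_hz)


if __name__ == "__main__":
    program = 1_000_000_000  # hypothetical program of 1e9 instructions

    # Hypothetical processor A: faster clock, more cycles per instruction.
    time_a = execution_time(program, cycles_per_instruction=4, clock_rate_hz=2.0e9)
    # Hypothetical processor B: slower clock, fewer cycles per instruction.
    time_b = execution_time(program, cycles_per_instruction=2, clock_rate_hz=1.5e9)

    print(f"A: {time_a:.3f} s at 2.0 GHz, B: {time_b:.3f} s at 1.5 GHz")
```

With these made-up figures, processor B finishes first despite its lower clock rate because it needs fewer cycles per instruction, which is one way to read the slide's closing point.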
Performance Assessment Clock Speed