The OpenCL-based FPGA accelerator data flow is as follows. First, the kernel receives the data in the input buffer; then, it stores the result in the output buffer after computing; finally, the host side reads the result into memory.

chowsaj13 10 views 12 slides May 26, 2024

About This Presentation

To alleviate the problems mentioned above, channel shuffle was employed to compensate for the accuracy of the CNN model. Since channel shuffle requires the exchange of data between groups of channels, it imposed new synchronization points while compensating for accuracy.
When using model parallelism...


Slide Content

Introduction
Chowdhury Sajadul Islam
Computer Science and Engineering
Uttara University, Dhaka

Architecture & Organization
All members of the Intel x86 family share the same basic architecture
All members of the IBM System/370 family share the same basic architecture
This gives code compatibility (at least backwards)
Organization differs between different versions
8/7/2020 2

Digital Computer Structure & Function
Structure is the way in which components relate to each other
Function is the operation of individual components as part of the structure
All computer functions are:
•Data processing
•Data storage
•Data movement
•Control

Functional View

Operations (a) Data movement
Simply transferring data from one peripheral or communications line to another.

Operations (b) Storage
Data transferred from the external environment to computer storage (read) and vice versa (write).

Operations (c) Processing from/to storage
Data processing on data held in storage.

Operations (d) Processing from storage to I/O
Data processing on data either in storage or en route between storage and the external environment.

Forces on Computer Architecture
Digital Computer Architecture is shaped by:
•Technology
•Programming Languages
•Operating Systems
•History
•Applications

Technology => dramatic change
Processor
•logic capacity: about 30% per year
•clock rate: about 20% per year
So… advanced functions (e.g., multimedia functions in some Pentiums) and high-speed features (multiple pipelines, larger caches)
Memory
•DRAM capacity: about 60% per year (4x every 3 years)
•Memory speed: about 10% per year
•Cost per bit: improves about 25% per year
So… larger memory => more challenging applications (e.g., atmospheric modeling, astrophysics modeling)
Disk
•capacity: about 60% per year
So… huge disk capacities => large data storage (video, music files, large data for various applications)
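
The annual rates on this slide compound from year to year. As a rough check, the minimal Python sketch below (the helper name `growth_factor` is hypothetical, and the rates are the approximate figures quoted on the slide, not measured data) shows that about 60% per year does work out to roughly 4x over 3 years:

```python
def growth_factor(annual_rate: float, years: int) -> float:
    """Cumulative growth multiple after compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years


# Approximate annual improvement rates quoted on the slide (illustrative only).
rates = {
    "processor logic capacity": 0.30,
    "clock rate": 0.20,
    "DRAM capacity": 0.60,
    "memory speed": 0.10,
    "disk capacity": 0.60,
}

for name, rate in rates.items():
    three_years = growth_factor(rate, 3)
    ten_years = growth_factor(rate, 10)
    print(f"{name}: x{three_years:.2f} in 3 years, x{ten_years:.1f} in 10 years")
```

Under these assumed rates, DRAM capacity grows by about 4.1x in 3 years while memory speed grows by only about 1.33x, so capacity pulls ahead of speed quickly.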

Performance Assessment Clock Speed
1. Key parameters
   1. Performance, cost, size, security, reliability, power consumption
2. System clock speed
   1. Clock rate or clock speed (the rate of pulses), in Hz or multiples of Hz
   2. Clock cycle or clock tick (one increment, or pulse, of the clock)
   3. Cycle time (the time between pulses)
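
Since the cycle time is the time between pulses, it is the reciprocal of the clock rate. A minimal Python sketch of that relationship follows; the function name `cycle_time_ns` and the sample clock rates are illustrative assumptions, not values from the slides:

```python
def cycle_time_ns(clock_rate_hz: float) -> float:
    """Cycle time in nanoseconds: the reciprocal of the clock rate."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return 1e9 / clock_rate_hz


# Sample clock rates chosen only for illustration.
for label, hz in [("1 MHz", 1e6), ("1 GHz", 1e9), ("4 GHz", 4e9)]:
    print(f"{label}: cycle time = {cycle_time_ns(hz):g} ns")
```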

Performance Assessment Clock Speed
3. Signals in the CPU take time to settle down to 1 or 0
4. Signals may change at different speeds
5. Operations need to be synchronised
6. Instruction execution happens in discrete steps
   1. Fetch, decode, load and store, arithmetic or logical
   2. Usually requires multiple clock cycles per instruction
7. Pipelining gives simultaneous execution of instructions
So, clock speed is not the whole story
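
One way to see why clock speed alone does not determine performance is the standard relation: execution time = instruction count × average cycles per instruction (CPI) ÷ clock rate. The slides do not state this formula, so the sketch below is an illustration under that assumption, and both processors and their numbers are hypothetical:

```python
def execution_time_s(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """Execution time in seconds = instructions * cycles per instruction / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Two hypothetical processors running the same one-billion-instruction program.
n = 1_000_000_000
fast_clock = execution_time_s(n, cpi=4.0, clock_rate_hz=4e9)  # higher clock, more cycles per instruction
slow_clock = execution_time_s(n, cpi=1.5, clock_rate_hz=3e9)  # lower clock, fewer cycles per instruction

print(f"4 GHz, CPI 4.0: {fast_clock:.2f} s")
print(f"3 GHz, CPI 1.5: {slow_clock:.2f} s")
```

In this made-up comparison the 3 GHz design finishes first because each instruction needs fewer clock cycles, which is the kind of gain pipelining aims for.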