Reconfigurable computing represents a paradigm shift in computer architecture, enabling hardware to be dynamically modified to better suit the requirements of specific tasks. Unlike traditional fixed-function hardware, reconfigurable computing systems, such as those based on Field-Programmable Gate Arrays (FPGAs), offer a versatile platform where the hardware functionality can be reprogrammed on-the-fly. This approach bridges the gap between hardware and software, offering the performance benefits of custom hardware with the flexibility of software.
Origins and Evolution
The concept of reconfigurable computing dates back to the 1960s and 70s, but it wasn't until the development of FPGAs in the mid-1980s that the field gained practical traction. Initially, FPGAs were simple, consisting of a few thousand logic gates, but over the decades, advancements in semiconductor technology have significantly increased their complexity and capability. Modern FPGAs boast millions of gates and sophisticated features such as embedded processors, high-speed transceivers, and extensive memory resources.
Architectural Overview
At the heart of reconfigurable computing is the FPGA, a semiconductor device composed of a matrix of configurable logic blocks (CLBs) connected via programmable interconnects. Each CLB can be configured to perform a variety of logical functions, and the interconnects can be adjusted to create complex circuits and data pathways. This configuration is defined in a hardware description language (HDL) such as VHDL or Verilog, which is compiled into a bitstream that programs the FPGA.
In addition to CLBs, modern FPGAs often include:
Digital Signal Processing (DSP) Blocks: Specialized for high-speed arithmetic operations, critical for applications like signal processing and machine learning.
Block RAM: Embedded memory blocks that provide fast, on-chip storage.
Hard IP Cores: Fixed-function blocks such as processors or Ethernet controllers that provide high performance for specific tasks.
High-Speed I/O: Interfaces for high-speed data transfer, essential for applications requiring large data bandwidths.
Advantages of Reconfigurable Computing
Performance and Efficiency
One of the primary advantages of reconfigurable computing is the ability to tailor the hardware to the specific needs of an application. This customization can lead to significant performance improvements and power efficiency compared to general-purpose processors. By directly implementing computational tasks in hardware, FPGAs can execute operations in parallel, leading to substantial speed-ups for certain workloads.
Flexibility
The reconfigurability of FPGAs allows developers to update and optimize the hardware as application requirements evolve. This flexibility is particularly beneficial in fields like telecommunications, where standards and protocols frequently change. Instead of designing and fabricating a new chip, engineers can simply reprogram the FPGA to support the new standard.
Slide Content
CprE / ComS 583
Reconfigurable Computing
Prof. Joseph Zambreno
Department of Electrical and Computer Engineering
Iowa State University
Lecture #3 – FPGA Basics
Lect-03.2 CprE 583 – Reconfigurable Computing, August 28, 2007
Quick Points
•HW #1 is out
•Due Thursday, September 6 (12pm)
•Submission via WebCT
•Possible strategies
[Chart: possible submission strategies – effort level over time between the assigned and due dates, "standard" vs. "preferred"]
Recap
•FPGAs – spatial computation
•CPUs – temporal computation
•FPGAs are by their nature more computationally "dense" than CPUs
•In terms of number of computations / time / area
•Can be quantitatively measured and compared
•Capacity, cost, and ease of programming are still important issues
•Numerous challenges to reconfiguration
FPGA Taxonomy
•Programming technology – how is the FPGA programmed? Where does it store configuration bits?
•SRAM
•Anti-fuse
•EPROM
•Flash memory (EEPROM)
•Logic cell architecture – what is the granularity of the configurable component? Tradeoff between complexity and versatility
•Transistors
•Gates
•PAL/PLAs
•LUTs
•CPUs
•Interconnect architecture – how do the logic cells communicate?
•Tiled
•Hierarchical
•Local
Anti-Fuse Technology
•A dielectric that prevents current flow
•Applying a voltage melts the dielectric
•One-time programmable – not really reconfigurable computing
E(E)PROM Technology
•To program a transistor, a voltage differential between the access/floating gates accelerates electrons from the source fast enough to travel across the insulator to the floating gate
•Electrons prevent the access gate from closing
•EPROM –Erasable Programmable Read-Only Memory
•Nonvolatile
•Can be erased using UV light
•EEPROM –Electrically Erasable Programmable Read-Only Memory
•Removes the electrons by reversing the voltage differential
•Limited number of erases possible
•Precursor to Flash technology
Flash / EEPROM Devices
•Migrated from early PLD technology
•Traditionally based on AND/OR architecture
•High ratio of logic-to-registers
•Future of Flash devices?
•Logic elements (LUTs and flip flops)
•Segmented routing
•Low logic to register ratio
[Figure: Actel ProASIC Plus logic cell]
SRAM Technology
•SRAM – Static Random Access Memory
•SRAM cells are larger (6 transistors) than anti-fuse or EEPROM
•Slower
•Less computational power per λ^2
•SRAM bits can be programmed many times
Which Devices to Study?
•SRAM – ~95%, Flash/Anti-fuse – ~5%
Lookup Tables (LUTs)
•What is a Lookup Table (LUT)?
•In the most generic terms, a pre-loaded memory
•A great way of implementing some function without calculation – a cheat sheet
[Figure: memory block with Data, Addr, and R_W inputs and output Q]

int table[256] = {1,7,8,9,…14};
int Q, addr;
...
Q = table[addr];

[Figure: answer-key "cheat sheet" – Q7: What is the answer to life, the universe, and everything? A(Q7) = 42]
LUTs and Digital Logic
[Figure: k-LUT with inputs I1…Ik feeding a DFF and carry logic with Cin/Cout, producing OUT]
•Each k-LUT operates on k one-bit inputs
•Output is one data bit
•Can perform any Boolean function of k inputs
LUTs and Digital Logic (cont.)
•k inputs → 2^k possible input values
•A k-LUT corresponds to a 2^k x 1 bit memory
•How many different functions?
•Two-input (A1, A2) case:
•F(A1, A2) = {0, A1 nor A2, Ā1 and A2, Ā1, A1 and Ā2, Ā2, A1 xor A2, A1 nand A2, A1 and A2, A1 xnor A2, A2, Ā1 or A2, A1, A1 or Ā2, A1 or A2, 1} = 16 different possibilities
•What to store in the LUT?
A1 A2 | f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15
 0  0 |  0  1  0  1  0  1  0  1  0  1   0   1   0   1   0   1
 0  1 |  0  0  1  1  0  0  1  1  0  0   1   1   0   0   1   1
 1  0 |  0  0  0  0  1  1  1  1  0  0   0   0   1   1   1   1
 1  1 |  0  0  0  0  0  0  0  0  1  1   1   1   1   1   1   1
LUTs and Digital Logic
•F(input) can be 0 or 1 independently for each of the 2^k bits
•2^(2^k) possible functions
•Not all patterns are unique (k! permutations of the inputs)
•Approximately 2^(2^k) / k! unique functions
•Is this efficient?
•How to select the k value?
[Truth table of all 16 two-input functions, as on the previous slide]
LUT Mapping
•How many gates in a 2-LUT + flip-flop?
[Figure: 2-LUT storing bits x0–x3, output f feeding a DFF (ports D, Q, Q', R, S); annotated values 0, 0.5, 1, 1.5, ~8, 4]
LUT Mapping (cont.)
•Implement the following logic function with k-LUTs, for k = {2, 3, 4}:
F = A0A1A3 + A1A2Ā3 + Ā0Ā1Ā2
[Figure: mappings of F onto a network of 2-LUTs, a network of 3-LUTs, and a single 4-LUT over inputs A0–A3]
Logic Block Granularity
•Each k-LUT requires 2^k configuration bits:
•The 2-LUT implementation requires 2^2 x 7 = 28 bits
•The 3-LUT needs 2^3 x 3 = 24 bits
•The 4-LUT needs just 2^4 x 1 = 16 bits
•Using configuration bits as an area measure (area cost), the 4-LUT implementation achieves minimum logic area
Interconnect
•Problem:
•Thousands of operators producing results
•Each taking as inputs the results of other bit operators
•Initial assumption – have to connect them all simultaneously
General Routing Architecture
•A wire segment is a wire unbroken by programmable switches
•Typically one switch is attached to each end of a wire segment
•A track is a sequence of one or more wire segments in a line
•A routing channel is a group of parallel tracks
•A connection block provides connectivity from the inputs and outputs of a logic block to the wire segments in the channels
Attempt 1 –Crossbar
•Any operator may want to take as input the output of any other operator
•Let n be the number of LUTs, and l the length of wire
•Delay:
•Parasitic loads = kn
•Switches/path = l
•Wire length = O(kn)
•Delay = O(kn)
•Area:
•Bisection BW = n
•Switches = kn^2
•Area = O(n^2)
Attempt 2 –Mesh
•Put connected components together
•Transitive fan-in grows faster than the close sites
•Let w be the channel width
•Delay:
•Switches/path = l
•Wire length = O(l)
•Best delay = O(w)
•Worst delay = O(n^(1/2) w)
•Area:
•Bisection BW = wn^(1/2)
•Switches = O(nw^2)
•Area = O(nw^2)
Segmentation
•Segmentation distribution: how many of each length?
•Longer length:
•Better performance?
•Reduced routability?
[Figure: routing channel between endpoints X and Y showing wire segments of length 1, length 2, and length 4]
Interconnect Area/Delay
•Interconnect area = ~10x configuration memory = ~100x logic
•The interconnect constitutes ~90% of the total area and 70–80% of the total delay
•See [Deh96A] for details
Architectural Issues [AhmRos04A]
•What values of N, I, and K minimize the following parameters?
•Area
•Delay
•Area-delay product
•Assumptions:
•All routing wires length 4
•Fully populated IMUX
•Wiring is half pass transistor, half tri-state
•180 nm
•Routing performed with W_min + 30% tracks
Summary
•Three basic types of FPGA devices
•Antifuse
•EEPROM
•SRAM
•Key issues for SRAM FPGAs are the logic cluster, connection box, and switch box
•Most tasks have structure, which can be exploited by LUT arrays with programmable interconnect (FPGAs)
•Cannot afford full interconnect, or to make all interconnects local