All rights reserved to the original creator of the PPT.
1
World of Integrated Circuits
Integrated Circuits
•Full-Custom ASICs
•Semi-Custom ASICs
•User Programmable
  •PLD: PAL, PLA, PML
  •FPGA: LUT (Look-Up Table), MUX, Gates
2
3
4
5
6
Two competing implementation approaches
ASIC (Application Specific Integrated Circuit)
•designed all the way from behavioral description to physical layout
•designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
FPGA (Field Programmable Gate Array)
•no physical layout design; design ends with a bitstream used to configure a device
•bought off the shelf and reconfigured by designers themselves
7
8
9
10
McCollum
CPLD Summary
•Constant delay
•Shallow logic
•Great for combinatorial logic, but not sequential logic
•Less than 5000 gates
•Marginal radiation tolerance due to erasure (~20 krad)
•Can suffer SEGR (single-event gate rupture) during programming
11
What is an FPGA?
[Figure: FPGA layout showing Configurable Logic Blocks, I/O Blocks, and Block RAMs]
12
Other FPGA Advantages
•Manufacturing cycle for an ASIC is very costly and lengthy, and engages lots of manpower
•Mistakes not detected at design time have a large impact on development time and cost
•FPGAs are perfect for rapid prototyping of digital circuits
•Easy upgrades, as in the case of software
•Unique applications
  •reconfigurable computing
14
Xilinx
Primary products: FPGAs and the associated CAD software
•Programmable Logic Devices
•ISE Alliance and Foundation Series Design Software
Main headquarters in San Jose, CA
Fabless* Semiconductor and Software Company; foundries:
•UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}
•Seiko Epson (Japan)
•TSMC (Taiwan)
15
Xilinx FPGA Families
•Old families
  •XC3000, XC4000, XC5200
  •Old 0.5 µm, 0.35 µm and 0.25 µm technology. Not recommended for modern designs.
•High-performance families
  •Virtex (0.22 µm)
  •Virtex-E, Virtex-EM (0.18 µm)
  •Virtex-II, Virtex-II Pro (0.13 µm)
•Low-cost family
  •Spartan/XL – derived from XC4000
  •Spartan-II – derived from Virtex
  •Spartan-IIE – derived from Virtex-E
  •Spartan-3
16
17
Basic Spartan-II FPGA Block Diagram
18
CLB Structure
[Figure: CLB made of two slices. Each slice contains two Look-Up Tables (inputs F1-F4 and G1-G4), two Carry & Control Logic blocks, and two D flip-flops; slice signals: CIN, COUT, CLK, CE, SR, BY, F5IN, X, XB, Y, YB]
•Each slice has 2 LUT-FF pairs with associated carry logic
•Two 3-state buffers (BUFT) associated with each CLB,
accessible by all CLB outputs
19
CLB Slice
[Figure: a single slice with two Look-Up Tables (inputs F1-F4 and G1-G4), two Carry & Control Logic blocks, and two D flip-flops; slice signals: CIN, COUT, CLK, CE, SR, BY, F5IN, X, XB, Y, YB]
20
LUT (Look-Up Table) Functionality
•Look-Up tables are primary elements for logic implementation
•Each LUT can implement any function of 4 inputs
[Figure: LUT block with inputs x1, x2, x3, x4 and output y, with two example truth tables]

x1 x2 x3 x4   y (first example)   y (second example)
0  0  0  0    0                   1
0  0  0  1    1                   1
0  0  1  0    0                   1
0  0  1  1    0                   1
0  1  0  0    0                   1
0  1  0  1    1                   1
0  1  1  0    0                   1
0  1  1  1    1                   1
1  0  0  0    0                   1
1  0  0  1    1                   1
1  0  1  0    0                   1
1  0  1  1    0                   1
1  1  0  0    1                   0
1  1  0  1    1                   0
1  1  1  0    0                   0
1  1  1  1    0                   0
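To illustrate the idea that a 4-input LUT is simply a 16-entry memory addressed by its inputs, here is a minimal Python sketch. The name `make_lut` and the bit ordering (x1 as the most significant address bit) are assumptions made for this example; the two output columns are taken from the slide's example truth tables.

```python
def make_lut(truth_table):
    """Return a function y = f(x1, x2, x3, x4) backed by a 16-entry table.

    truth_table[i] is the output for the input pattern whose binary
    encoding is i, with x1 taken as the most significant bit (an
    assumption of this sketch).
    """
    if len(truth_table) != 16 or any(bit not in (0, 1) for bit in truth_table):
        raise ValueError("a 4-input LUT needs exactly 16 entries of 0 or 1")
    table = tuple(truth_table)

    def lut(x1, x2, x3, x4):
        # The four inputs form the address of the stored output bit.
        address = (x1 << 3) | (x2 << 2) | (x3 << 1) | x4
        return table[address]

    return lut


# Output columns of the two example truth tables, rows 0000 through 1111.
EXAMPLE_1 = [0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0]
EXAMPLE_2 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

lut1 = make_lut(EXAMPLE_1)
lut2 = make_lut(EXAMPLE_2)
```

Under this bit ordering the second table behaves like NOT (x1 AND x2) and ignores x3 and x4, which shows that a function of fewer inputs fits in the same 16-entry table.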
21
CLB Slice Structure
•Each slice contains two sets of the following:
  •Four-input LUT
    •Any 4-input logic function,
    •or 16-bit x 1 sync RAM
    •or 16-bit shift register
  •Carry & Control
    •Fast arithmetic logic
    •Multiplier logic
    •Multiplexer logic
  •Storage element
    •Latch or flip-flop
    •Set and reset
    •True or inverted inputs
    •Sync. or async. control
22
23
Distributed RAM
[Figure: LUTs configured as the RAM primitives RAM16X1S (D, WE, WCLK, A0-A3, O), RAM32X1S (D, WE, WCLK, A0-A4, O), RAM16X2S (D0, D1, WE, WCLK, A0-A3, O0, O1), and RAM16X1D (D, WE, WCLK, A0-A3, DPRA0-DPRA3, SPO, DPO)]
•CLB LUT configurable as distributed RAM
•A LUT equals 16x1 RAM
•Implements single- and dual-port RAM
•Cascade LUTs to increase RAM size
•Synchronous write
•Synchronous/asynchronous read
•Accompanying flip-flops used for synchronous read
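The write and read behavior listed above can be sketched with a small behavioral model. This is only an illustration in Python, not a vendor primitive: the class and method names are hypothetical, and the model covers just the single-port 16x1 case with synchronous write and asynchronous read.

```python
class DistributedRam16x1:
    """Behavioral model of a 16 x 1 single-port RAM built from one LUT."""

    def __init__(self):
        self.cells = [0] * 16

    def read(self, address):
        """Asynchronous read: the output follows the address immediately."""
        return self.cells[address & 0xF]

    def clock_edge(self, address, data_in, write_enable):
        """Synchronous write: storage changes only on a write-clock edge."""
        if write_enable:
            self.cells[address & 0xF] = data_in & 1


ram = DistributedRam16x1()
ram.clock_edge(address=5, data_in=1, write_enable=True)   # stored
ram.clock_edge(address=6, data_in=1, write_enable=False)  # ignored
```

A synchronous read would add one register on the output of `read`, which corresponds to the accompanying flip-flop mentioned in the last bullet.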
24
Shift Register
[Figure: LUT shown as a chain of D flip-flops with clock enable; ports IN, CE, CLK, DEPTH[3:0], OUT]
•Each LUT can be configured as shift register
  •Serial in, serial out
•Dynamically addressable delay up to 16 cycles
•For programmable pipeline
•Cascade for greater cycle delays
•Use CLB flip-flops to add depth
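The dynamically addressable delay can be pictured with the following Python sketch. The class name is hypothetical, and the convention that a depth value of 0 selects a one-cycle delay (so 15 selects sixteen cycles) is an assumption of this model.

```python
class ShiftRegisterLut:
    """Behavioral model of a LUT used as a 16-stage shift register."""

    def __init__(self):
        self.stages = [0] * 16  # stages[0] holds the most recent bit

    def clock_edge(self, serial_in, clock_enable=True):
        """Shift one bit in on a clock edge when the clock enable is high."""
        if clock_enable:
            self.stages = [serial_in & 1] + self.stages[:-1]

    def output(self, depth):
        """Return the tap selected by DEPTH[3:0]; depth 0 is a 1-cycle delay."""
        return self.stages[depth & 0xF]


sr = ShiftRegisterLut()
for bit in [1, 0, 1, 1, 0]:
    sr.clock_edge(bit)
```

Changing `depth` between reads changes the delay without reloading the register, which is what makes the structure useful as a programmable pipeline stage.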
25
Carry & Control Logic
[Figure: CLB slice diagram with two Look-Up Tables, two Carry & Control Logic blocks, and two D flip-flops; slice signals: CIN, COUT, CLK, CE, SR, BY, F5IN, X, XB, Y, YB]
26
Fast Carry Logic
•Each CLB contains separate logic and routing for the fast generation of sum & carry signals
  •Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
•Carry logic is independent of normal logic and routing resources
[Figure: carry chain running from LSB to MSB through dedicated carry logic routing]
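To show why the carry signal deserves a dedicated path, here is a bit-level Python sketch of a ripple-carry adder: each bit position produces a sum bit and passes a carry toward the MSB. The function name and the LSB-first list representation are choices made for this example only.

```python
def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Add two equal-length bit lists given LSB first.

    Returns (sum_bits, carry_out), with sum_bits also LSB first.
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    carry = carry_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)          # sum bit of this position
        carry = (a & b) | (carry & (a ^ b))     # carry passed to the next bit
    return sum_bits, carry


# 5 + 3 on 4 bits, operands written LSB first.
total, carry_out = ripple_carry_add([1, 0, 1, 0], [1, 1, 0, 0])
```

The carry for the MSB depends on every lower bit, so the carry chain sets the delay of the whole adder; that is the path the dedicated carry routing shortens.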
27
Block RAM
[Figure: Spartan-II true dual-port block RAM with Port A and Port B]
•Most efficient memory implementation
•Dedicated blocks of memory
•Ideal for most memory requirements
•4 to 14 memory blocks
•4096 bits per block
•Use multiple blocks for larger memories
•Builds both single and true dual-port RAMs
28
Spartan-II Block RAM Amounts
29
Block RAM Port Aspect Ratios
Configuration   Data width (bits)   Address range
4k x 1          1                   0 to 4095
2k x 2          2                   0 to 2047
1k x 4          4                   0 to 1023
512 x 8         8                   0 to 511
256 x 16        16                  0 to 255
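Each aspect ratio follows from dividing the fixed block size by the port width. The short Python sketch below works through that arithmetic; the function name is hypothetical and the 4096-bit block size is the Spartan-II figure quoted earlier in the deck.

```python
BLOCK_RAM_BITS = 4096  # bits per block, as stated for Spartan-II


def port_shape(data_width):
    """Return (depth, address_bits) for a port of the given data width."""
    if BLOCK_RAM_BITS % data_width != 0:
        raise ValueError("data width must divide the block size evenly")
    depth = BLOCK_RAM_BITS // data_width
    address_bits = (depth - 1).bit_length()  # bits needed for addresses 0..depth-1
    return depth, address_bits


shapes = {width: port_shape(width) for width in (1, 2, 4, 8, 16)}
```

The highest address of each configuration is `depth - 1`, so halving the depth removes one address bit every time the data width doubles.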
30
Basic I/O Block Structure
[Figure: IOB with three flip-flops (each with D, EC, Q, SR), one each on the three-state control path, the output path, and the input path; signals: Three-State, Output, Clock, Set/Reset, Direct Input, Registered Input, and three FF Enable inputs]
31
IOB Functionality
•IOB provides the interface between the package pins and CLBs
•Each IOB can work as a uni- or bi-directional I/O
•Outputs can be forced into high impedance
•Inputs and outputs can be registered
  •advised for high-performance I/O
•Inputs can be delayed
43
LUT
•Add a flip-flop and you're done.
George Mason University
FPGA Tools
45
Design process (1)
Specification (Lab Experiments)
Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds…..
VHDL description (Your Source Files)
library IEEE;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
entity RC5_core is
    port(
        clock, reset, encr_decr: in std_logic;
        data_input: in std_logic_vector(31 downto 0);
        data_output: out std_logic_vector(31 downto 0);
        out_full: in std_logic;
        key_input: in std_logic_vector(31 downto 0);
        key_read: out std_logic
    );
end RC5_core;
Functional simulation
Synthesis
Post-synthesis simulation
46
Design process (2)
Implementation
Configuration
Timing simulation
On chip testing
47
Simulation Tools
Many others…
48
49
50
Synthesis Tools
… and others
51
Levels of design description
Algorithmic level
Register Transfer Level (level of description most suitable for synthesis)
Logic (gate) level
Circuit (transistor) level
Physical (layout) level
52
53
Design Process for ASICs (1)
[Figure: VHDL code, functionally verified with a VHDL simulator, and a library of standard cells feed Logic Synthesis, which produces a netlist together with estimates of speed and area without routing]
54
Design Process (2)
[Figure: the netlist and the library of standard cells feed Placing & routing, which produces a layout together with area and speed with routing]
55
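Logic Synthesis
VHDL description
library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- Entity inferred from the signals used in the architecture below.
entity MLU is
    port(
        NEG_A, NEG_B, NEG_Y: in STD_LOGIC;  -- invert A, B, or the result
        A, B: in STD_LOGIC;                 -- data inputs
        L1, L0: in STD_LOGIC;               -- operation select
        Y: out STD_LOGIC
    );
end MLU;

architecture MLU_DATAFLOW of MLU is
    signal A1: STD_LOGIC;
    signal B1: STD_LOGIC;
    signal Y1: STD_LOGIC;
    signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
    signal SEL: STD_LOGIC_VECTOR(1 downto 0);
begin
    -- Optional inversion of the inputs and of the result
    A1 <= A when (NEG_A = '0') else not A;
    B1 <= B when (NEG_B = '0') else not B;
    Y  <= Y1 when (NEG_Y = '0') else not Y1;
    -- The four candidate operations
    MUX_0 <= A1 and B1;
    MUX_1 <= A1 or B1;
    MUX_2 <= A1 xor B1;
    MUX_3 <= A1 xnor B1;
    -- Named selector: a bare concatenation in "with ... select" has an ambiguous type in many tools
    SEL <= L1 & L0;
    with SEL select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;
end MLU_DATAFLOW;
Circuit netlist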
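A software reference model is a convenient way to check what the synthesized MLU should compute. The Python sketch below mirrors the dataflow description signal by signal; the function name `mlu` and the use of integers 0 and 1 for logic values are assumptions of this sketch, not part of the original design files.

```python
def mlu(a, b, neg_a, neg_b, neg_y, l1, l0):
    """Reference model of the MLU dataflow description (all values 0 or 1)."""
    a1 = a ^ neg_a          # A, inverted when NEG_A is 1
    b1 = b ^ neg_b          # B, inverted when NEG_B is 1
    select = (l1 << 1) | l0
    operations = (
        a1 & b1,            # "00": AND
        a1 | b1,            # "01": OR
        a1 ^ b1,            # "10": XOR
        1 - (a1 ^ b1),      # others: XNOR
    )
    y1 = operations[select]
    return y1 ^ neg_y       # result, inverted when NEG_Y is 1


# Exhaustive table of all 128 input combinations, usable as test vectors.
vectors = {
    bits: mlu(*bits)
    for bits in [tuple((n >> i) & 1 for i in range(6, -1, -1)) for n in range(128)]
}
```

With only seven one-bit inputs, the full table is small enough to compare against functional or post-synthesis simulation results.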
56
Features of synthesis tools
•Interpret RTL code
•Produce synthesized circuit netlist in a
standard EDIF format
•Give preliminary performance estimates
•Some can display circuit schematics
corresponding to EDIF netlist
57
Implementation
•After synthesis the entire implementation
process is performed by FPGA vendor
tools
69
Static Timing Analyzer
•Performs static analysis of the circuit
performance
•Reports critical paths with all sources of
delays
•Determines maximum clock frequency
70
Static Timing Analysis
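•Critical Path – the longest path from outputs of registers to inputs of registers
[Figure: two D flip-flops clocked by clk with combinational logic of delay t_P logic between them; input in, output out]
$t_{Critical} = t_{P\,FF} + t_{P\,logic} + t_{S\,FF}$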
71
Static Timing Analysis
•Min. Clock Period = Length of The
Critical Path
•Max. Clock Frequency = 1 / Min. Clock
Period
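The two relations above can be worked through numerically. In the Python sketch below the delay values are hypothetical and chosen only to keep the arithmetic easy to follow; real values come from the static timing analyzer's report.

```python
def min_clock_period_ns(t_p_ff_ns, t_p_logic_ns, t_s_ff_ns):
    """Critical-path length: FF propagation + logic delay + FF setup time."""
    return t_p_ff_ns + t_p_logic_ns + t_s_ff_ns


def max_clock_frequency_mhz(period_ns):
    """Maximum clock frequency in MHz for a minimum period given in ns."""
    if period_ns <= 0:
        raise ValueError("clock period must be positive")
    return 1000.0 / period_ns


# Hypothetical delays for illustration only.
period = min_clock_period_ns(t_p_ff_ns=1.0, t_p_logic_ns=8.0, t_s_ff_ns=1.0)
fmax = max_clock_frequency_mhz(period)
```

With these numbers the logic delay dominates the period, which is why shortening the combinational path between registers is the usual way to raise the clock frequency.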
72
Configuration
•Once a design is implemented, you must create a file that the FPGA can understand
  •This file is called a bitstream: a BIT file (.bit extension)
•The BIT file can be downloaded directly to the FPGA, or can be converted into a PROM file which stores the programming information
73
74
Projects 1, 2
Optimization Criteria
Maximum ratio: Throughput / Circuit Area
or
Minimum product: Latency · Circuit Area
75
76
Primary timing parameters
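Latency: time to process a single block of data
[Figure: circuit with a single input block X_i and output block Y_i]
Throughput: number of bits processed in a unit of time
[Figure: circuit with input blocks X_i, X_i+1, X_i+2 and output blocks Y_i, Y_i+1, Y_i+2]
$$\text{Throughput} = \frac{\text{Block\_size} \cdot \text{Number\_of\_blocks\_processed\_simultaneously}}{\text{Latency}}$$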
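As a worked instance of the throughput formula, the Python sketch below plugs in hypothetical numbers: a 64-bit block, 32 clock cycles of latency, and a 20 ns clock period. None of these figures come from the slides; they only illustrate how the terms combine.

```python
def throughput_bits_per_second(block_size_bits, blocks_in_flight, latency_seconds):
    """Throughput = block size * blocks processed simultaneously / latency."""
    if latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return block_size_bits * blocks_in_flight / latency_seconds


# Hypothetical design: 32 cycles per block at a 20 ns clock period.
latency = 32 * 20e-9
rate = throughput_bits_per_second(64, 1, latency)            # one block at a time
pipelined_rate = throughput_bits_per_second(64, 4, latency)  # four blocks in flight
```

Processing one block at a time gives about 100 Mbit/s here, and keeping four blocks in flight at the same latency quadruples that, which is how pipelining raises throughput without reducing latency.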