Introduction to Asic Design and VLSI Design

PratikGohel3, 76 slides, Feb 28, 2024

About This Presentation

All rights reserved to the original creator of the PPT.


Slide Content

1
World of Integrated Circuits
Integrated Circuits
•Full-Custom ASICs
•Semi-Custom ASICs
•User Programmable
  •PLD: PAL, PLA, PML
  •FPGA: LUT (Look-Up Table), MUX, Gates

2

3

4

5

6
Two competing implementation approaches
ASIC (Application Specific Integrated Circuit)
•designed all the way from behavioral description to physical layout
•designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
FPGA (Field Programmable Gate Array)
•no physical layout design; design ends with a bitstream used to configure a device
•bought off the shelf and reconfigured by designers themselves

7

8

9

10
McCollum
CPLD Summary
•Constant delay
•Shallow logic
•Great for combinational logic, but not for sequential logic
•Fewer than 5,000 gates
•Marginal radiation tolerance (~20 krad) due to erasure
•Can suffer SEGR (single-event gate rupture) during programming

11
What is an FPGA?
Figure: Configurable Logic Blocks and I/O Blocks, with columns of Block RAMs

12
Other FPGA Advantages
•The ASIC manufacturing cycle is very costly, lengthy, and labor-intensive
•Mistakes not detected at design time have a large impact on development time and cost
•FPGAs are perfect for rapid prototyping of digital circuits
•Easy upgrades, as with software
•Unique applications
  •Reconfigurable computing

13
Major FPGA Vendors
SRAM-based FPGAs
•Xilinx, Inc.
•Altera Corp.
•Atmel
•Lattice Semiconductor
Flash & antifuse FPGAs
•Actel Corp.
•QuickLogic Corp.

14
Xilinx
•Primary products: FPGAs and the associated CAD software
•Main headquarters in San Jose, CA
•Fabless* semiconductor and software company; foundry partners:
  •UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}
  •Seiko Epson (Japan)
  •TSMC (Taiwan)
•Products: Programmable Logic Devices; ISE Alliance and Foundation Series Design Software

15
Xilinx FPGA Families
•Old families
  •XC3000, XC4000, XC5200
  •Old 0.5 µm, 0.35 µm, and 0.25 µm technology; not recommended for modern designs
•High-performance families
  •Virtex (0.22 µm)
  •Virtex-E, Virtex-EM (0.18 µm)
  •Virtex-II, Virtex-II Pro (0.13 µm)
•Low-cost family
  •Spartan/XL: derived from XC4000
  •Spartan-II: derived from Virtex
  •Spartan-IIE: derived from Virtex-E
  •Spartan-3

16

17
Basic Spartan-II FPGA Block Diagram

18
Figure: Spartan-II CLB made of two slices; each slice contains two look-up tables (inputs F1-F4 and G1-G4), carry & control logic, and two D flip-flops (outputs X, XB, Y, YB), with carry chain CIN/COUT and control inputs CLK, CE, SR, BY, F5IN
CLB Structure
•Each slice has 2 LUT-FF pairs with associated carry logic
•Two 3-state buffers (BUFT) associated with each CLB,
accessible by all CLB outputs

19
Figure: one CLB slice with two look-up tables (inputs F1-F4, G1-G4), carry & control logic, and two D flip-flops (outputs X, XB, Y, YB; carry chain CIN/COUT; control inputs CLK, CE, SR, BY, F5IN)
CLB Slice

20
LUT (Look-Up Table) Functionality
•Look-up tables are the primary elements for logic implementation
•Each LUT can implement any function of 4 inputs
Figure: a 4-input function y = f(x1, x2, x3, x4) and a 2-input function y = f(x1, x2) are each implemented by one LUT holding its truth table:
x1 x2 x3 x4 | y (4-input function) | y (2-input function of x1, x2)
0  0  0  0  | 0 | 1
0  0  0  1  | 1 | 1
0  0  1  0  | 0 | 1
0  0  1  1  | 0 | 1
0  1  0  0  | 0 | 1
0  1  0  1  | 1 | 1
0  1  1  0  | 0 | 1
0  1  1  1  | 1 | 1
1  0  0  0  | 0 | 1
1  0  0  1  | 1 | 1
1  0  1  0  | 0 | 1
1  0  1  1  | 0 | 1
1  1  0  0  | 1 | 0
1  1  0  1  | 1 | 0
1  1  1  0  | 0 | 0
1  1  1  1  | 0 | 0
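The LUT idea can be sketched as a small behavioral model in Python (an illustration, not how a synthesis tool represents a LUT): a 4-input LUT is 16 stored bits addressed by its inputs. The class name `Lut4` and the x1-as-MSB bit ordering are assumptions chosen to match the row order of the truth tables on this slide.

```python
from itertools import product
from typing import Callable, List


class Lut4:
    """Behavioral model of a 4-input look-up table.

    The LUT stores 16 configuration bits. Entry i holds the output y for the
    input combination whose binary value x1x2x3x4 equals i (x1 is the MSB).
    """

    def __init__(self, table: List[int]) -> None:
        if len(table) != 16 or any(bit not in (0, 1) for bit in table):
            raise ValueError("a 4-input LUT needs exactly 16 bits of 0/1")
        self.table = list(table)

    @classmethod
    def from_function(cls, f: Callable[[int, int, int, int], int]) -> "Lut4":
        # "Programming" the LUT: evaluate f for every input combination in the
        # order 0000, 0001, ..., 1111 (the order itertools.product yields).
        return cls([int(bool(f(*bits))) for bits in product((0, 1), repeat=4)])

    def __call__(self, x1: int, x2: int, x3: int, x4: int) -> int:
        # Evaluating the LUT: the four inputs form the address of one stored bit.
        return self.table[(x1 << 3) | (x2 << 2) | (x3 << 1) | x4]


# 4-input example: the y column of the slide's first truth table.
f = Lut4([0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0])
print(f(0, 1, 0, 1))  # row 0101 -> 1

# A 2-input function fills the same 16 bits; x3 and x4 are simply ignored.
nand = Lut4.from_function(lambda x1, x2, x3, x4: not (x1 and x2))
print(nand.table)  # twelve 1s followed by four 0s
```

The 2-input NAND produces exactly the slide's second truth table, which illustrates why a LUT's delay does not depend on which function it holds: evaluation is always a single table lookup.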

21
CLB Slice Structure
•Each slice contains two sets of the following:
  •Four-input LUT
    •Any 4-input logic function,
    •or 16 x 1 synchronous RAM
    •or 16-bit shift register
  •Carry & control
    •Fast arithmetic logic
    •Multiplier logic
    •Multiplexer logic
  •Storage element
    •Latch or flip-flop
    •Set and reset
    •True or inverted inputs
    •Synchronous or asynchronous control

22

23
Figure: LUTs configured as distributed RAM primitives: RAM16X1S (ports D, WE, WCLK, A0-A3, O), RAM32X1S (address A0-A4), RAM16X2S (D0, D1, O0, O1), and dual-port RAM16X1D (SPO, DPO, read address DPRA0-DPRA3)
Distributed RAM
•CLB LUTs configurable as distributed RAM
•One LUT equals a 16 x 1 RAM
•Implements single-port and dual-port RAMs
•Cascade LUTs to increase RAM size
•Synchronous write
•Synchronous or asynchronous read
•Accompanying flip-flops used for synchronous read
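A minimal behavioral sketch of one LUT used as 16 x 1 distributed RAM, with synchronous write and asynchronous read as listed above. The class and method names are hypothetical; the port names in the comments borrow the RAM16X1S/RAM16X1D labels from the figure, but the model does not reproduce the Xilinx primitives' exact timing. A synchronous read would add a slice flip-flop after `read`.

```python
class DistributedRam16x1:
    """Behavioral sketch of one LUT used as 16 x 1 distributed RAM.

    Assumed behavior: writes are synchronous (only on a WCLK edge with WE = 1),
    reads are asynchronous (the output follows the address with no clock).
    """

    def __init__(self) -> None:
        self.mem = [0] * 16  # the LUT's 16 storage bits

    def clock_edge(self, we: int, a: int, d: int) -> None:
        # Synchronous write port: address A3..A0, data D, write enable WE.
        if we:
            self.mem[a & 0xF] = d & 1

    def read(self, a: int) -> int:
        # Asynchronous single-port read (output O in a RAM16X1S-style block).
        return self.mem[a & 0xF]

    def read_dual(self, a: int, dpra: int) -> tuple:
        # Dual-port view (RAM16X1D-style): SPO reads the write address A,
        # DPO reads an independent read address DPRA at the same time.
        return self.mem[a & 0xF], self.mem[dpra & 0xF]


ram = DistributedRam16x1()
ram.clock_edge(we=1, a=5, d=1)  # write 1 to address 5
ram.clock_edge(we=0, a=5, d=0)  # WE low: memory unchanged
print(ram.read(5))              # 1
print(ram.read_dual(5, 3))      # (1, 0)
```

Cascading LUTs for a larger RAM would correspond to several such models sharing the lower address bits, with extra address bits selecting which one is written and read.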

24
Figure: one LUT equals a chain of clock-enabled D flip-flops (inputs IN, CE, CLK) whose output OUT is selected by DEPTH[3:0]
Shift Register
•Each LUT can be configured as a shift register
  •Serial in, serial out
•Dynamically addressable delay up to 16 cycles
  •For programmable pipelines
•Cascade for greater cycle delays
•Use CLB flip-flops to add depth
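A behavioral sketch of the LUT shift register, assuming that DEPTH selects the tap so that the delay from IN to OUT is DEPTH + 1 cycles (1 to 16, matching "up to 16 cycles" above). The class name is hypothetical; cascading would simply feed one model's last stage into the next model's input.

```python
class ShiftRegisterLut16:
    """Sketch of one LUT configured as a 16-stage shift register.

    Assumption: OUT taps stage DEPTH, so data entering at IN appears at OUT
    after DEPTH + 1 enabled clock edges (a delay of 1 to 16 cycles).
    """

    def __init__(self) -> None:
        self.stages = [0] * 16  # stage 0 is loaded from IN

    def clock_edge(self, d: int, ce: int = 1) -> None:
        # Serial in: on an enabled clock edge every bit moves one stage along.
        if ce:
            self.stages = [d & 1] + self.stages[:-1]

    def out(self, depth: int) -> int:
        # Dynamically addressable output: DEPTH[3:0] selects the tap.
        return self.stages[depth & 0xF]


srl = ShiftRegisterLut16()
outputs = []
for bit in [1, 0, 0, 0, 0, 0]:
    srl.clock_edge(bit)
    outputs.append(srl.out(depth=3))
print(outputs)  # [0, 0, 0, 1, 0, 0]: the 1 arrives after 4 clock edges
```

Changing `depth` between clock edges changes the delay without touching the stored bits, which is what makes the structure useful as a programmable pipeline delay.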

25
Figure: CLB slice (two look-up tables, carry & control logic, two D flip-flops; carry chain CIN/COUT)
Carry & Control Logic

26
Fast Carry Logic
•Each CLB contains separate logic and routing for the fast generation of sum and carry signals
  •Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
•Carry logic is independent of normal logic and routing resources
Figure: dedicated carry logic and carry routing running from the LSB to the MSB
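The sum and carry signals that the dedicated carry logic produces can be sketched as a bit-level ripple-carry adder in Python. This uses the generic generate/propagate formulation as an illustrative assumption, not the exact circuit inside the Spartan-II slice; `ripple_carry_add` is a hypothetical name.

```python
def ripple_carry_add(a: int, b: int, width: int, cin: int = 0) -> tuple:
    """Bit-level model of an adder whose carry ripples from LSB to MSB.

    For each bit position i:
      sum_i     = a_i XOR b_i XOR c_i
      c_(i + 1) = (a_i AND b_i) OR (c_i AND (a_i XOR b_i))
    Returns (sum truncated to `width` bits, carry out of the MSB).
    """
    carry = cin & 1
    total = 0
    for i in range(width):  # LSB first, as along the carry chain
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        propagate = ai ^ bi
        generate = ai & bi
        total |= (propagate ^ carry) << i
        carry = generate | (carry & propagate)
    return total, carry


print(ripple_carry_add(0b1011, 0b0110, width=4))  # (1, 1): 11 + 6 = 17 = 0b10001
```

The carry for bit i + 1 depends on the carry of bit i, so the delay grows with the word width; that serial dependency is why the carry gets its own fast, dedicated path instead of going through general-purpose routing.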

27
Block RAM
Figure: Spartan-II true dual-port block RAM with Port A and Port B
•Most efficient memory implementation
•Dedicated blocks of memory
•Ideal for most memory requirements
•4 to 14 memory blocks per device
•4,096 bits per block
•Use multiple blocks for larger memories
•Builds both single-port and true dual-port RAMs

28
Spartan-II Block RAM Amounts

29
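Block RAM Port Aspect Ratios
Configuration | Width | Depth | Address range
4k x 1        | 1     | 4,096 | 0 to 4095
2k x 2        | 2     | 2,048 | 0 to 2047
1k x 4        | 4     | 1,024 | 0 to 1023
512 x 8       | 8     | 512   | 0 to 511
256 x 16      | 16    | 256   | 0 to 255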
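Every aspect ratio in the table keeps the product width x depth at 4,096 bits, so the depth and the address width follow directly from the port width. A short Python sketch (the function name is hypothetical) reproduces the table:

```python
SPARTAN2_BLOCK_BITS = 4096  # bits per Spartan-II block RAM (from the slide)


def aspect_ratios(total_bits: int = SPARTAN2_BLOCK_BITS, widths=(1, 2, 4, 8, 16)):
    """Return (width, depth, address bits, highest address) for each port width."""
    rows = []
    for width in widths:
        depth = total_bits // width
        addr_bits = depth.bit_length() - 1  # log2(depth) for a power of two
        rows.append((width, depth, addr_bits, depth - 1))
    return rows


for width, depth, addr_bits, last in aspect_ratios():
    print(f"{depth} x {width}: {addr_bits} address bits, addresses 0 to {last}")
```

Halving the depth removes one address bit, which is why the address bus shrinks from 12 bits (4k x 1) to 8 bits (256 x 16).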

30
Basic I/O Block Structure
Figure: IOB with three D flip-flops (D, EC, Q, SR), one each on the three-state control path, the output path, and the input path (registered input, plus a direct input that bypasses the flip-flop); shared clock and set/reset; a separate FF enable for each flip-flop

31
IOB Functionality
•The IOB provides the interface between the package pins and the CLBs
•Each IOB can work as unidirectional or bidirectional I/O
•Outputs can be forced into high impedance
•Inputs and outputs can be registered
  •Advised for high-performance I/O
•Inputs can be delayed

32
Routing Resources
Figure: CLBs connected through a grid of Programmable Switch Matrices (PSM)

33
Spartan-II FPGA Family Members

34

35
Virtex-II 1.5V Architecture
Figure: array of Configurable Logic Blocks surrounded by I/O Blocks, with columns of Block RAMs and 18 x 18 multipliers

36
Virtex-II 1.5V
Device   | CLB Array | Slices | Maximum I/O | Block RAMs (18 kb) | Multiplier Blocks | Distributed RAM bits
XC2V40   | 8 x 8     | 256    | 88          | 4                  | 4                 | 8,192
XC2V80   | 16 x 8    | 512    | 120         | 8                  | 8                 | 16,384
XC2V250  | 24 x 16   | 1,536  | 200         | 24                 | 24                | 49,152
XC2V500  | 32 x 24   | 3,072  | 264         | 32                 | 32                | 98,304
XC2V1000 | 40 x 32   | 5,120  | 432         | 40                 | 40                | 163,840
XC2V1500 | 48 x 40   | 7,680  | 528         | 48                 | 48                | 245,760
XC2V2000 | 56 x 48   | 10,752 | 624         | 56                 | 56                | 344,064
XC2V3000 | 64 x 56   | 14,336 | 720         | 96                 | 96                | 458,752
XC2V4000 | 80 x 72   | 23,040 | 912         | 120                | 120               | 737,280
XC2V6000 | 96 x 88   | 33,792 | 1,104       | 144                | 144               | 1,081,344
XC2V8000 | 112 x 104 | 46,592 | 1,108       | 168                | 168               | 1,490,944
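The slice and distributed-RAM columns of the Virtex-II device table follow from the CLB array size. The sketch below assumes four slices per Virtex-II CLB and 32 distributed-RAM bits per slice (two LUTs of 16 x 1 each) and checks every row; the data are copied from the table, and the helper name is hypothetical.

```python
# (device, CLB rows, CLB columns, slices, distributed RAM bits),
# copied from the Virtex-II device table.
DEVICES = [
    ("XC2V40", 8, 8, 256, 8_192),
    ("XC2V80", 16, 8, 512, 16_384),
    ("XC2V250", 24, 16, 1_536, 49_152),
    ("XC2V500", 32, 24, 3_072, 98_304),
    ("XC2V1000", 40, 32, 5_120, 163_840),
    ("XC2V1500", 48, 40, 7_680, 245_760),
    ("XC2V2000", 56, 48, 10_752, 344_064),
    ("XC2V3000", 64, 56, 14_336, 458_752),
    ("XC2V4000", 80, 72, 23_040, 737_280),
    ("XC2V6000", 96, 88, 33_792, 1_081_344),
    ("XC2V8000", 112, 104, 46_592, 1_490_944),
]
SLICES_PER_CLB = 4        # assumption: four slices per Virtex-II CLB
RAM_BITS_PER_SLICE = 32   # assumption: two LUTs per slice, each 16 x 1 RAM


def inconsistent_rows(devices):
    """Names of devices whose slice or distributed-RAM count breaks the rule."""
    bad = []
    for name, rows, cols, slices, dist_bits in devices:
        if rows * cols * SLICES_PER_CLB != slices:
            bad.append(name)
        elif slices * RAM_BITS_PER_SLICE != dist_bits:
            bad.append(name)
    return bad


print(inconsistent_rows(DEVICES))  # []
```

An empty list means that, for every device, slices = CLB rows x CLB columns x 4 and distributed RAM bits = slices x 32.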

37
Virtex-II Block SelectRAM
•Virtex-II BRAM is 18 kbits
•Additional “parity” bits available in selected configurations
Figure: dual-port block RAM with the following ports
Port A: WEA, ENA, SSRA, CLKA, ADDRA[#:0], DIA[#:0], DIPA[#:0], DOA[#:0], DOPA[#:0]
Port B: WEB, ENB, SSRB, CLKB, ADDRB[#:0], DIB[#:0], DIPB[#:0], DOB[#:0], DOPB[#:0]
Width | Depth  | Address | Data   | Parity
1     | 16,384 | [13:0]  | [0]    | N/A
2     | 8,192  | [12:0]  | [1:0]  | N/A
4     | 4,096  | [11:0]  | [3:0]  | N/A
9     | 2,048  | [10:0]  | [7:0]  | [0]
18    | 1,024  | [9:0]   | [15:0] | [1:0]
36    | 512    | [8:0]   | [31:0] | [3:0]
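The Block SelectRAM table shows that the 18 kbits of a block RAM are 16 kbits of data plus 2 kbits of parity, and that the parity bits become available only for the 9-, 18-, and 36-bit widths. A short Python sketch checks the arithmetic row by row using the table's values:

```python
# (port width, depth, data bits, parity bits) from the Block SelectRAM table
CONFIGS = [
    (1, 16_384, 1, 0),
    (2, 8_192, 2, 0),
    (4, 4_096, 4, 0),
    (9, 2_048, 8, 1),
    (18, 1_024, 16, 2),
    (36, 512, 32, 4),
]

for width, depth, data, parity in CONFIGS:
    addr_bits = depth.bit_length() - 1      # e.g. 16,384 words -> 14 bits, ADDR[13:0]
    data_bits = depth * data                # always 16,384 (16 kbits)
    parity_bits = depth * parity            # 0 or 2,048 (2 kbits)
    print(f"{width:>2}-bit port: ADDR[{addr_bits - 1}:0], "
          f"{data_bits} data + {parity_bits} parity = {data_bits + parity_bits} bits")
```

For the 1-, 2-, and 4-bit widths only the 16,384 data bits are usable; the 9-, 18-, and 36-bit widths reach the full 18,432 bits.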

38
FPGA Nomenclature

39

40

41

42

43
LUT
•Add a flip-flop and you're done.

44
George Mason University
FPGA Tools

45
Design process (1)
Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit must be able to perform the encryption algorithm by itself, executing 32 rounds…
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;

-- Interface of the RC5-like encryption core
entity RC5_core is
  port(
    clock, reset, encr_decr : in  std_logic;
    data_input              : in  std_logic_vector(31 downto 0);
    data_output             : out std_logic_vector(31 downto 0);
    out_full                : in  std_logic;
    key_input               : in  std_logic_vector(31 downto 0);
    key_read                : out std_logic
  );
end RC5_core;
Specification (Lab Experiments)
VHDL description (Your Source Files)
Functional simulation
Synthesis
Post-synthesis simulation

46
Design process (2)
Implementation
Configuration
Timing simulation
On-chip testing

47
Simulation Tools
Many others…

48

49

50
Synthesis Tools
… and others

51
Levels of design description
Algorithmic level
Register Transfer Level (the level of description most suitable for synthesis)
Logic (gate) level
Circuit (transistor) level
Physical (layout) level

52

53
Design Process for ASICs (1)
•VHDL code → VHDL simulator: functional verification
•VHDL code + library of standard cells → logic synthesis → netlist
•Estimates after synthesis: speed without routing, area without routing

54
Design Process (2)
•Netlist + library of standard cells → placing & routing → layout
•Estimates after placing & routing: area with routing, speed with routing

55
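library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- Entity declaration reconstructed from the signals used in the architecture
entity MLU is
  port(
    NEG_A, NEG_B, NEG_Y : in  STD_LOGIC;  -- invert A, B, or Y when '1'
    A, B                : in  STD_LOGIC;  -- data inputs
    L1, L0              : in  STD_LOGIC;  -- operation select
    Y                   : out STD_LOGIC
  );
end MLU;

architecture MLU_DATAFLOW of MLU is
  signal A1, B1, Y1 : STD_LOGIC;
  signal MUX_0, MUX_1, MUX_2, MUX_3 : STD_LOGIC;
  signal L : STD_LOGIC_VECTOR(1 downto 0);  -- L1 & L0 as a select bus
begin
  -- optional inversion of the inputs and of the output
  A1 <= A when (NEG_A = '0') else not A;
  B1 <= B when (NEG_B = '0') else not B;
  Y  <= Y1 when (NEG_Y = '0') else not Y1;

  -- the four candidate logic functions
  MUX_0 <= A1 and B1;
  MUX_1 <= A1 or B1;
  MUX_2 <= A1 xor B1;
  MUX_3 <= A1 xnor B1;

  -- a named vector avoids an ambiguous concatenation type in the select
  L <= L1 & L0;
  with L select
    Y1 <= MUX_0 when "00",
          MUX_1 when "01",
          MUX_2 when "10",
          MUX_3 when others;
end MLU_DATAFLOW;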
Logic Synthesis: VHDL description → circuit netlist
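When the MLU is checked by functional simulation, a golden reference model is handy. The Python function below is a hypothetical bit-level model of MLU_DATAFLOW written from the VHDL statements (it is not part of the original lab material); it could be used to generate the expected outputs for a testbench.

```python
from itertools import product


def mlu(a, b, neg_a, neg_b, neg_y, l1, l0):
    """Bit-level reference model of architecture MLU_DATAFLOW (all args 0/1)."""
    a1 = a ^ neg_a            # A1 <= A when NEG_A = '0' else not A
    b1 = b ^ neg_b            # B1 <= B when NEG_B = '0' else not B
    select = (l1 << 1) | l0   # L1 & L0
    if select == 0b00:
        y1 = a1 & b1          # MUX_0: and
    elif select == 0b01:
        y1 = a1 | b1          # MUX_1: or
    elif select == 0b10:
        y1 = a1 ^ b1          # MUX_2: xor
    else:
        y1 = 1 - (a1 ^ b1)    # MUX_3: xnor
    return y1 ^ neg_y         # Y <= Y1 when NEG_Y = '0' else not Y1


# Expected output for all 2^7 = 128 input combinations, e.g. for a testbench.
vectors = {bits: mlu(*bits) for bits in product((0, 1), repeat=7)}
print(len(vectors))  # 128
```

Because the MLU has only seven 1-bit inputs, exhaustive checking of all 128 combinations is cheap, in both functional and post-synthesis simulation.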

56
Features of synthesis tools
•Interpret RTL code
•Produce synthesized circuit netlist in a
standard EDIF format
•Give preliminary performance estimates
•Some can display circuit schematics
corresponding to EDIF netlist

57
Implementation
•After synthesis the entire implementation
process is performed by FPGA vendor
tools

58

59
Translation
•Inputs: the circuit netlist produced by synthesis (EDIF, Electronic Design Interchange Format), the NCF (Native Constraint File), and the timing constraints in the UCF (User Constraint File), which can be edited with the Constraint Editor
•Output: an NGD (Native Generic Database) file

60
Sample UCF File
#
# Constraints generated by Synplify Pro 7.3.3, Build 039R
#
# Period Constraints
#Begin clock constraints
#End clock constraints
# Output Constraints
# Input Constraints
# Location Constraints
# End of generated constraints
NET "clock" LOC = "P88";
NET "control(0)" LOC = "P50";
NET "control(1)" LOC = "P48";
NET "control(2)" LOC = "P42";
NET "reset" LOC = "P93";
NET "segments(0)" LOC = "P67";
NET "segments(1)" LOC = "P39";
NET "segments(2)" LOC = "P62";
NET "segments(3)" LOC = "P60";
NET "segments(4)" LOC = "P46";
NET "segments(5)" LOC = "P57";
NET "segments(6)" LOC = "P49";
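Because the LOC lines in the sample file follow a fixed `NET "name" LOC = "pin";` pattern, the pin assignment can be extracted with a small script, for example to cross-check it against a board's pinout. This is a hypothetical helper that handles only the LOC syntax shown in the sample file, not the full UCF grammar.

```python
import re

# Matches lines such as:  NET "clock" LOC = "P88";
LOC_PATTERN = re.compile(r'^\s*NET\s+"([^"]+)"\s+LOC\s*=\s*"([^"]+)"\s*;')


def parse_loc_constraints(ucf_text: str) -> dict:
    """Map each constrained net name to its package pin (LOC lines only)."""
    pins = {}
    for line in ucf_text.splitlines():
        line = line.split("#", 1)[0]  # '#' starts a comment in the sample file
        match = LOC_PATTERN.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


sample = '''
# Location Constraints
NET "clock" LOC = "P88";
NET "reset" LOC = "P93";
NET "segments(0)" LOC = "P67";
'''
print(parse_loc_constraints(sample))
# {'clock': 'P88', 'reset': 'P93', 'segments(0)': 'P67'}
```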

61
Pin Assignment
Figure: ports of design LAB2 (CLOCK, RESET, CONTROL(0)-CONTROL(2), SEGMENTS(0)-SEGMENTS(6)) assigned to FPGA pins P39, P42, P46, P48, P49, P50, P57, P60, P62, P67, P88, P93

62
Constraints Editor

63
Circuit netlist

64
Mapping
Figure: circuit netlist mapped into look-up tables (LUT1 to LUT5) and flip-flops (FF1, FF2)

65
Placing
Figure: mapped LUTs and flip-flops placed into the CLB slices of the FPGA

66

67
Routing
Figure: placed CLB slices connected through the programmable connections of the FPGA

68

69
Static Timing Analyzer
•Performs static analysis of the circuit
performance
•Reports critical paths with all sources of
delays
•Determines maximum clock frequency

70
Static Timing Analysis
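•Critical path: the longest path from the outputs of registers to the inputs of registers
Figure: two registers (D Q) clocked by clk, with combinational logic of delay t_P,logic between them; input in, output out
$t_{Critical} = t_{P,FF} + t_{P,logic} + t_{S,FF}$
where $t_{P,FF}$ is the flip-flop clock-to-output propagation delay, $t_{P,logic}$ is the propagation delay of the logic between the registers, and $t_{S,FF}$ is the flip-flop setup time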

71
Static Timing Analysis
•Min. clock period = length (delay) of the critical path
•Max. clock frequency = 1 / min. clock period
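The critical-path computation can be sketched as a longest-path search over the acyclic combinational logic between registers, followed by the period and frequency formulas. The graph, delays, and function names below are hypothetical, and a real static timing analyzer accounts for effects not modeled here, such as clock skew and separate rise and fall delays.

```python
def longest_logic_delay(edges, sources, sinks):
    """Largest summed delay from any register output to any register input.

    edges: {node: [(next_node, delay_ns), ...]} describing acyclic
    combinational logic between registers.
    """
    memo = {}

    def longest_from(node):
        if node in sinks:           # reached a register input
            return 0.0
        if node not in memo:
            memo[node] = max(
                (delay + longest_from(nxt) for nxt, delay in edges.get(node, [])),
                default=float("-inf"),  # dead end: no path to a register
            )
        return memo[node]

    return max(longest_from(src) for src in sources)


def timing_report(edges, sources, sinks, t_p_ff, t_s_ff):
    # t_critical = t_P,FF + t_P,logic + t_S,FF ; f_max = 1 / t_critical
    t_critical = t_p_ff + longest_logic_delay(edges, sources, sinks) + t_s_ff
    return t_critical, 1000.0 / t_critical   # ns -> MHz


# Hypothetical circuit: FF1.Q drives two paths that both end at FF2.D.
edges = {
    "ff1_q": [("and1", 1.2), ("xor1", 0.8)],
    "and1": [("or1", 1.5)],
    "or1": [("ff2_d", 1.1)],
    "xor1": [("ff2_d", 0.9)],
}
t_crit, f_max = timing_report(edges, ["ff1_q"], {"ff2_d"}, t_p_ff=0.6, t_s_ff=0.4)
print(f"critical path {t_crit:.1f} ns, max clock {f_max:.1f} MHz")
# critical path 4.8 ns, max clock 208.3 MHz
```

Only the slower of the two paths (1.2 + 1.5 + 1.1 = 3.8 ns of logic) sets the clock period; speeding up the faster XOR path would not raise the maximum frequency.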

72
Configuration
•Once a design is implemented, you must create a
file that the FPGA can understand
•This file is called a bit stream: a BIT file (.bit extension)
•The BIT file can be downloaded directly to the
FPGA, or can be converted into a PROM file
which stores the programming information

73

74
Projects 1, 2
Optimization Criteria
•Maximum ratio: Throughput / Circuit Area
or
•Minimum product: Latency · Circuit Area

75

76
Primary timing parameters
•Latency: time to process a single block of data
  Figure: circuit with input X_i and output Y_i
•Throughput: number of bits processed in a unit of time
  Figure: circuit processing X_i, X_i+1, X_i+2 into Y_i, Y_i+1, Y_i+2
$\text{Throughput} = \frac{\text{Block\_size} \cdot \text{Number\_of\_blocks\_processed\_simultaneously}}{\text{Latency}}$
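The throughput formula and the two optimization criteria from the project slide can be evaluated with a few lines of Python. The two designs, their latencies, and their areas (in CLB slices) are made-up numbers for illustration only, and the function name is hypothetical.

```python
def throughput_mbit_s(block_size_bits, blocks_in_parallel, latency_ns):
    # Throughput = Block_size * Number_of_blocks_processed_simultaneously / Latency
    # bits per ns = Gbit/s, so multiply by 1000 to get Mbit/s.
    return block_size_bits * blocks_in_parallel * 1000 / latency_ns


# Hypothetical designs: (name, block size, blocks in parallel, latency ns, area in slices)
designs = [
    ("iterative", 64, 1, 320, 400),
    ("pipelined", 64, 4, 400, 1400),
]
for name, block, parallel, latency, area in designs:
    tp = throughput_mbit_s(block, parallel, latency)
    print(f"{name}: {tp:.0f} Mbit/s, throughput/area = {tp / area:.3f}, "
          f"latency*area = {latency * area}")
# iterative: 200 Mbit/s, throughput/area = 0.500, latency*area = 128000
# pipelined: 640 Mbit/s, throughput/area = 0.457, latency*area = 560000
```

With these numbers the pipelined design has the higher raw throughput, but the iterative one wins under both criteria, which is why the projects judge designs by ratio or product rather than by speed alone.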