3. Single Cycle Data Path in computer architecture
Feb 13, 2024
11/17/2019 1
A. Computer Architecture
Single Cycle Datapath
The CPU
•Processor (CPU): the active part of the computer,
which does all the work (data manipulation and
decision-making)
–Datapath: portion of the processor which contains
hardware necessary to perform all operations
required by the computer
–Control: portion of the processor (also in
hardware) which tells the datapath what needs to
be done (the brain)
The Processor: Datapath & Control
Abstract View of the DataPath
•The data path contains 2 types of logic elements:
–Combinational: Elements that operate on data values. Their
outputs depend only on their current inputs. The ALU is a
combinational element.
–State: Elements with internal storage. Their state is defined
by the values they contain (memory and registers).
[Figure: abstract view of the datapath: PC, instruction memory, registers, ALU, and data memory]
Clocking Methodology
Our Implementation
Clocking Methodology
[Figure: clocking diagram; labels: Registers, Register, ALU, Read, Write]
Clocking Methodology
[Figure: clocking diagram; labels: Read, Write]
Instruction Datapath
[Figure: instruction fetch datapath: the PC drives the instruction memory read address and an adder that adds 4]
•Instructions will be held in
the instruction memory
•The instruction to fetch is at
the location specified by the
PC
–Instr. = M[PC]
Note: Regular instruction width
(32 bits for MIPS) makes this easy
•After we fetch one
instruction, the PC must be
incremented to the next
instruction
•All instructions are 4 bytes
•PC = PC + 4
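The fetch step above can be sketched in a few lines of Python. This is only an illustration: the dictionary-based `instruction_memory`, the two sample instruction words, and the function name `fetch` are assumptions made for the example, not part of the slides.

```python
# Hypothetical instruction memory: byte address -> 32-bit instruction word.
instruction_memory = {
    0x0000: 0x012A4020,  # add $t0, $t1, $t2
    0x0004: 0x8E690004,  # lw  $t1, 4($s3)
}

def fetch(pc):
    """Return (instruction, next_pc) for the given program counter."""
    instruction = instruction_memory[pc]   # Instr. = M[PC]
    next_pc = (pc + 4) & 0xFFFFFFFF        # PC = PC + 4, kept to 32 bits
    return instruction, next_pc
```

Because every instruction is 4 bytes wide, the next sequential address never depends on the instruction that was fetched.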
R-type Instruction Datapath
[Figure: register file (read reg. num A and B, write reg num, write reg data, read reg data A and B) feeding the ALU, which outputs Result and Zero]
•R-type Instructions have three registers
–Two read (Rs, Rt) to provide data to the ALU
–One write (Rd) to receive data from the ALU
•We’ll need to specify the operation to the ALU (later...)
•We might be interested if the result of the ALU is zero (later...)
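A minimal sketch of the R-type path, assuming a register file modeled as a Python list and an ALU operation passed in as a function. The names `registers` and `execute_r_type` are made up for the example; the special handling of register 0 reflects the MIPS convention that $0 always reads as zero.

```python
import operator

MASK32 = 0xFFFFFFFF
registers = [0] * 32  # MIPS has 32 general-purpose registers

def execute_r_type(rs, rt, rd, operation):
    """Read Rs and Rt, apply the ALU operation, write the result to Rd."""
    data_a = registers[rs]                       # Read reg data A
    data_b = registers[rt]                       # Read reg data B
    result = operation(data_a, data_b) & MASK32  # ALU Result, kept to 32 bits
    zero = result == 0                           # ALU Zero output
    if rd != 0:                                  # $0 always reads as zero
        registers[rd] = result                   # Write reg data
    return result, zero
```

For example, after setting `registers[10] = 5` and `registers[11] = 7`, calling `execute_r_type(10, 11, 9, operator.add)` leaves 12 in register 9.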
Memory Operations
[Figure: register file, 16-to-32-bit sign extend, ALU, and data memory (read address, write address, write data, read data)]
•Memory operations first need to compute the effective address
–LW $t1, 450($s3) # E.A. = 450 + $s3
–Add together one register and 16 bits of immediate data
–Immediate data needs to be converted from 16-bit to 32-bit
•Memory then performs the load or store: a load writes the data read into register Rt, a store writes the value of register Rt to memory
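The effective address computation can be sketched as follows. The function names `sign_extend16` and `effective_address` are assumptions for the example; the arithmetic is the 16-to-32-bit sign extension and add described above.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def effective_address(base, imm16):
    """E.A. = base register + sign-extended 16-bit immediate (32-bit wrap)."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF
```

With `$s3` holding 1000, `LW $t1, 450($s3)` would access `effective_address(1000, 450)`, that is, address 1450. A negative offset such as 0xFFFC (which is -4) moves the address below the base.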
Branches
[Figure: branch datapath: the sign-extended 16-bit offset is shifted left 2 and added to PC + 4; the ALU Zero output goes to the control logic]
•Branches conditionally
change the next instruction
–BEQ $2, $1, 42
–The offset is specified as
the number of words to
be added to the next
instruction (PC+4)
•Control logic has to decide if
the branch is taken
–Uses ‘zero’ output of ALU
•Take offset, multiply by 4
–Shift left two
•Add this to PC+4 (from PC
logic)
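A short sketch of the branch address logic described in these bullets. The function name `next_pc` and the starting address 0x1000 in the usage note are assumptions for illustration only.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def next_pc(pc, offset16, zero):
    """Pick the branch target when the ALU reports zero, else PC + 4."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    byte_offset = sign_extend16(offset16) << 2        # words -> bytes (x4)
    target = (pc_plus_4 + byte_offset) & 0xFFFFFFFF   # add to PC + 4
    return target if zero else pc_plus_4
```

If `BEQ $2, $1, 42` sat at the assumed address 0x1000 and the two registers were equal, the next PC would be 0x1004 + 42*4 = 0x10AC; otherwise it would be 0x1004.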
Integrating the R-types and Memory
•R-types and Load/Stores are similar in many respects
•Differences:
–2nd ALU source: R-types use register, I-types use
Immediate
–Write Data: R-types use ALU result, I-types use memory
•Mux the conflicting datapaths together
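The two muxes can be sketched as plain selection functions. The select names `alu_src` and `mem_to_reg` follow the ALUSrc and MemToReg signals introduced later in the deck (0 selects the register or ALU result, 1 selects the immediate or memory data); the function names themselves are assumptions for the example.

```python
def mux2(select, in0, in1):
    """Two-input multiplexer: pass in0 when select is 0, in1 when it is 1."""
    return in1 if select else in0

def alu_second_operand(alu_src, reg_data_b, imm32):
    """R-types use the register value, loads and stores use the immediate."""
    return mux2(alu_src, reg_data_b, imm32)

def register_write_data(mem_to_reg, alu_result, mem_data):
    """R-types write back the ALU result, loads write back memory data."""
    return mux2(mem_to_reg, alu_result, mem_data)
```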
[Figure: combined R-type and memory datapath with two muxes: one selects the 2nd ALU source (register or sign-extended immediate), the other selects the register write data (ALU result or memory read data)]
Adding the instruction memory
[Figure: instruction memory, PC, and the PC + 4 adder placed in front of the combined datapath; Instruction [31-0] feeds the register file]
Simply add the instruction memory
and PC to the beginning of the datapath.
Adding the Branch Datapath
[Figure: combined datapath with the branch adder (shift left 2, add to PC + 4) and a mux selecting the next PC]
Now we have the datapath for R-type, I-type, and branch instructions.
On to the control logic!
When does everything happen?
[Figure: full single-cycle datapath with clk inputs on the state elements]
Combinational logic: just does it! Its outputs are always a function
of its inputs (with some delay).
Registers: written at the end of the clock cycle (rising edge triggered).
Single-Cycle Design
What do we need to control?
[Figure: full single-cycle datapath annotated with the control questions listed here]
•ALU: what is the operation?
•Memory: read, write, or neither?
•Mux: are we branching or not?
•Mux: where does the 2nd ALU operand come from?
•Registers: should we write data?
•Mux: does the result come from the ALU or memory?
Almost all of the information we need is in the instruction!
The ALU
•The ALU is stuck right in the middle of everything...
•It must:
–Add, Subtract, And, or Or for arithmetic instructions
–Subtract for a branch on equal
–Subtract and set for a SLT
–Add for a memory access
[Figure: one-bit ALU slice with inputs A, B, Less, BInvert, and Carry In, a 4-input Operation mux, and outputs Result and Carry Out]
Function  BInvert  Op  Carry In  Result
And       0        00  0         R = A AND B
Or        0        01  0         R = A OR B
Add       0        10  0         R = A + B
Subtract  1        10  1         R = A - B
SLT       1        11  1         R = 1 if A < B, 0 if A ≥ B
BInvert and Carry In are always the same: combine them into one signal called “sub”
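A behavioral sketch of this ALU, assuming 32-bit operands and the combined sub signal plus a 2-bit op select (00 And, 01 Or, 10 Add/Subtract, 11 SLT). The function names are made up for the example, and SLT is modeled with a direct signed comparison rather than the sign bit of the subtraction, which is a simplification.

```python
MASK32 = 0xFFFFFFFF

def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def alu(sub, op, a, b):
    """Return (result, zero) for a 1-bit sub signal and a 2-bit op select."""
    a &= MASK32
    b &= MASK32
    b_in = (~b & MASK32) if sub else b   # BInvert
    total = (a + b_in + sub) & MASK32    # Carry In = sub, so sub=1 gives a - b
    if op == 0b00:
        result = a & b_in
    elif op == 0b01:
        result = a | b_in
    elif op == 0b10:
        result = total
    elif op == 0b11:
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError("op must be a 2-bit value")
    return result, result == 0
```

Subtraction falls out of the adder: inverting B and setting the carry in computes A + ~B + 1, which is A - B in two's complement.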
Setting the ALU controls
•The instruction Opcode and Function give us the info we need
–For R-type instructions, Opcode is zero, function code
determines ALU controls
–For I-type instructions, Opcode determines ALU controls
New control signal: ALUOp is 00 for memory, 01 for Branch, and 10 for R-type
Instruction   Opcode  ALUOp  Funct. Code  ALU action  ALU control (sub, op)
add           R-type  10     100000       add         010
sub           R-type  10     100010       subtract    110
and           R-type  10     100100       and         000
or            R-type  10     100101       or          001
SLT           R-type  10     101010       SLT         111
load word     LW      00     xxxxxx       add         010
store word    SW      00     xxxxxx       add         010
branch equal  BEQ     01     xxxxxx       subtract    110
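The ALU control table can be written as a small decoder. This is a sketch under the table's own encodings; the names `FUNCT_TO_CONTROL` and `alu_control` are assumptions for the example.

```python
# Function code -> 3-bit ALU control (sub bit followed by the 2-bit op).
FUNCT_TO_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}

def alu_control(alu_op, funct):
    """Map the 2-bit ALUOp and 6-bit function code to the ALU control bits."""
    if alu_op == 0b00:   # memory access: always add
        return 0b010
    if alu_op == 0b01:   # branch on equal: always subtract
        return 0b110
    if alu_op == 0b10:   # R-type: the function code decides
        return FUNCT_TO_CONTROL[funct]
    raise ValueError("unsupported ALUOp")
```

The function code is ignored for loads, stores, and branches, which matches the xxxxxx entries in the table.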
Decoding the Instruction -Data
The instruction holds the key to all of the data signals
R-type:
Bits     31-26          25-21        20-16        15-11       10-6      5-0
Field    Opcode         RS           RT           RD          ShAmt     Function
Used as  To ctrl logic  Read reg. A  Read reg. B  Write reg.  Not used  To ALU control
Memory, Branch:
Bits     31-26          25-21        20-16                   15-0
Field    Opcode         RS           RT                      Immediate Data
Used as  To ctrl logic  Read reg. A  Write reg./Read reg. B  Memory address or branch offset
One problem: the write register number must come from two different places (Rd for R-type, Rt for loads).
Instruction Decoding
[Figure: full datapath with the instruction bus split into Op [31-26], Rs [25-21], Rt [20-16], Rd [15-11], and Imm [15-0], plus a mux selecting the write register]
Read Reg A: Rs
Read Reg B: Rt
Write Reg: Either Rd or Rt
Immediate Data: [15-0]
Opcode: [31-26]
We can decode the data simply
by dividing up the instruction bus
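Dividing up the instruction bus amounts to shifting and masking. A minimal sketch, using the bit ranges listed above; the function name `decode` and the dictionary keys are assumptions for the example.

```python
def decode(instruction):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op":    (instruction >> 26) & 0x3F,  # bits 31-26
        "rs":    (instruction >> 21) & 0x1F,  # bits 25-21
        "rt":    (instruction >> 16) & 0x1F,  # bits 20-16
        "rd":    (instruction >> 11) & 0x1F,  # bits 15-11
        "shamt": (instruction >> 6) & 0x1F,   # bits 10-6
        "funct": instruction & 0x3F,          # bits 5-0
        "imm":   instruction & 0xFFFF,        # bits 15-0
    }
```

All fields are extracted for every instruction, just as the wires are always present in hardware; the control logic decides which of them are actually used.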
Control Signals
[Figure: full single-cycle datapath with the control unit (Ctrl), the ALU control block, and the instruction fields Op [31-26], Rs [25-21], Rt [20-16], Rd [15-11], Imm [15-0], FC [5-0]]
ALU Control: a function of ALUOp and the function code (6 bits)
•Control signals and when each is asserted:
–RegWrite: Load, R-type
–MemToReg: Load
–MemWrite: Store
–MemRead: Load
–ALUSrc: Memory (load or store)
–PCSrc: BEQ and zero
–RegDest: R-type
–ALUOp: 00: Memory, 01: Branch, 10: R-type
Inside the control oval
•This control logic can be decoded in several ways:
–Random logic, PLA, PAL
•Just build hardware that looks for the 4 opcodes
–For each opcode, assert the appropriate signals
Instruction  Opcode  RegWrite  ALUSrc  MemToReg  RegDest  MemRead  MemWrite  PCSrc  ALUOp
BEQ          000100  0         0       x         x        0        0         1      01
R-format     000000  1         0       0         1        0        0         0      10
LW           100011  1         1       1         0        1        0         0      00
SW           101011  0         1       x         x        0        1         0      00
Legend: RegDest 0: Rt, 1: Rd; ALUSrc 0: Reg, 1: Imm; MemToReg 0: ALU, 1: Mem; PCSrc 1: Branch; ALUOp 00: Mem, 01: Branch, 10: R-type
Note: BEQ must also check the zero output of the ALU...
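The control table can be sketched as a lookup keyed by opcode. In this illustration `None` stands for a don't-care (x) entry, the per-opcode branch bit is stored under the name `BEQ`, and PCSrc is derived by combining it with the ALU zero output. The names `CONTROL_TABLE` and `control` are assumptions for the example.

```python
# Opcode -> control signals; None marks a don't-care (x) entry.
CONTROL_TABLE = {
    0b000100: dict(RegWrite=0, ALUSrc=0, MemToReg=None, RegDest=None,
                   MemRead=0, MemWrite=0, BEQ=1, ALUOp=0b01),  # BEQ
    0b000000: dict(RegWrite=1, ALUSrc=0, MemToReg=0, RegDest=1,
                   MemRead=0, MemWrite=0, BEQ=0, ALUOp=0b10),  # R-format
    0b100011: dict(RegWrite=1, ALUSrc=1, MemToReg=1, RegDest=0,
                   MemRead=1, MemWrite=0, BEQ=0, ALUOp=0b00),  # LW
    0b101011: dict(RegWrite=0, ALUSrc=1, MemToReg=None, RegDest=None,
                   MemRead=0, MemWrite=1, BEQ=0, ALUOp=0b00),  # SW
}

def control(opcode, zero):
    """Return the control signals for one instruction."""
    signals = dict(CONTROL_TABLE[opcode])  # copy so the table stays intact
    # The branch is taken only when the instruction is BEQ and the ALU says zero.
    signals["PCSrc"] = 1 if signals["BEQ"] and zero else 0
    return signals
```

A lookup table corresponds most closely to the PLA option; random logic would implement the same truth table with gates.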
Control Unit Implementation
Control Signals
[Figure: full single-cycle datapath with control signals RegWrite, MemToReg, MemWrite, MemRead, ALUSrc, PCSrc, RegDest, ALUOp, and BEQ]
We must AND the BEQ control signal with the ALU Zero output
Jumping
[Figure: datapath extended for jumps: the 26-bit field J [25-0] is shifted left 2 to give 28 bits, concatenated with the 4 bits [31-28] of PC + 4 to form the 32-bit jump address, and selected by a mux controlled by the Jump signal]
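A sketch of the jump address formation, assuming the standard MIPS J-format where the low 26 bits of the instruction hold the word address. The function name `jump_target` and the sample values in the usage note are assumptions for the example.

```python
def jump_target(pc, instruction):
    """Concatenate PC+4 bits [31-28] with the J field shifted left by 2."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    j_field = instruction & 0x03FFFFFF     # J: bits 25-0 (26 bits)
    low_28 = j_field << 2                  # shift left 2 -> 28 bits
    upper_4 = pc_plus_4 & 0xF0000000       # bits 31-28 of PC + 4
    return upper_4 | low_28                # 4 + 28 = 32 bits
```

For instance, a jump word 0x08000100 fetched at the assumed address 0x40000000 would give the target 0x40000400.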
Complete Control
Operation of the Datapath
•Let's see the stages of execution of an R-type instruction
add $t1,$t2,$t3:
1. An instruction is fetched from memory, the PC is incremented.
2. Two registers, $t2 and $t3, are read from the register file.
3. The ALU operates on the data read from the register file.
4. The result of the ALU is written into register $t1.
•Let's look at lw $t1,offset($t2)
1. An instruction is fetched from memory, the PC is incremented.
2. The register $t2 is read from the register file.
3. The ALU computes the sum of $t2 and the sign-extended offset.
4. The sum from the ALU is used as the address for the data memory.
5. The data from memory is written into register $t1.
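The two walkthroughs can be traced in code. This is a sketch only: instruction fetch (step 1) is left out, the register file and data memory are modeled as a list and a dictionary, and the names `exec_add` and `exec_lw` are made up for the example. The register numbers 9, 10, and 11 are the MIPS numbers for $t1, $t2, and $t3.

```python
MASK32 = 0xFFFFFFFF
T1, T2, T3 = 9, 10, 11   # MIPS register numbers for $t1, $t2, $t3
registers = [0] * 32
data_memory = {}         # hypothetical data memory: byte address -> word

def sign_extend16(imm16):
    """Interpret the low 16 bits as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def exec_add(rd, rs, rt):
    a, b = registers[rs], registers[rt]   # step 2: read two registers
    result = (a + b) & MASK32             # step 3: ALU operates on the data
    registers[rd] = result                # step 4: write result to Rd

def exec_lw(rt, offset, rs):
    base = registers[rs]                               # step 2: read base
    address = (base + sign_extend16(offset)) & MASK32  # step 3: ALU add
    data = data_memory[address]                        # step 4: memory read
    registers[rt] = data                               # step 5: write to Rt
```

The load needs one more step than the add because the ALU result is only an address, and the data memory must still be read before the register write.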
Performance of Single-Cycle
Machines
•Let's assume that the operation time for the following units is:
Memory: 2 nanoseconds (ns), ALU and adders: 2 ns, Register
file: 1 ns. We will assume that MUXs, control, sign-extension,
PC accesses, and wires have no delays.
•Which implementation is faster?
1. Every instruction operates in 1 clock cycle of fixed length.
2. Every instruction operates in a varying length clock cycle.
•Let's look at the time needed by each instruction:
Instr.   Inst. Fetch  Reg. Rd  ALU op  Memory  Reg. Wr  Total
R-Type   2            1        2       0       1        6 ns
Load     2            1        2       2       1        8 ns
Store    2            1        2       2       0        7 ns
Branch   2            1        2       0       0        5 ns
Jump     2            0        0       0       0        2 ns
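The totals in the table can be reproduced by summing the delays of the units each instruction class uses. The dictionary names below are assumptions for the example; the delays are the ones stated on this slide.

```python
# Unit delays in nanoseconds, as assumed on the slide.
UNIT_NS = {"fetch": 2, "reg_read": 1, "alu": 2, "memory": 2, "reg_write": 1}

# Functional units used by each instruction class.
UNITS_USED = {
    "R-type": ["fetch", "reg_read", "alu", "reg_write"],
    "Load":   ["fetch", "reg_read", "alu", "memory", "reg_write"],
    "Store":  ["fetch", "reg_read", "alu", "memory"],
    "Branch": ["fetch", "reg_read", "alu"],
    "Jump":   ["fetch"],
}

latency_ns = {name: sum(UNIT_NS[unit] for unit in units)
              for name, units in UNITS_USED.items()}

# A fixed-length clock must accommodate the slowest instruction.
fixed_cycle_ns = max(latency_ns.values())
```

Here `fixed_cycle_ns` is 8, set by the load, which is the only instruction that uses all five units.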
Fixed vs. Variable Cycle Length
•Let's assume a program has the following instruction mix: 24%
loads, 12% stores, 44% R-type, 18% branches, 2% jumps.
•For the fixed cycle length the cycle time is 8 ns, long enough for
the longest instruction (load). Thus each instruction takes 8 ns
to execute.
•For the variable cycle time the average CPU clock cycle is:
8*24% + 7*12% + 6*44% + 5*18% + 2*2% = 6.34 ns (about 6.3 ns)
•The variable clock implementation is therefore faster, but
it is extremely hard to implement.
•Variable clock implementation is 8/6.34 = 1.26 times faster
•When adding instructions such as multiply and divide, which can
take tens of cycles, the fixed-length single-cycle scheme is too slow.
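The weighted average can be checked numerically. The variable names are assumptions for the example; the latencies and instruction mix are the ones given in the deck. Unrounded, the average comes to 6.34 ns, which rounds to 6.3 ns, and the speedup to about 1.26 (dividing by the rounded 6.3 instead gives 1.27).

```python
latency_ns = {"Load": 8, "Store": 7, "R-type": 6, "Branch": 5, "Jump": 2}
mix = {"Load": 0.24, "Store": 0.12, "R-type": 0.44, "Branch": 0.18, "Jump": 0.02}

# Average cycle time for the variable-length clock: sum of latency x frequency.
average_ns = sum(latency_ns[name] * mix[name] for name in mix)

# The fixed-length clock always takes as long as the slowest instruction.
fixed_ns = max(latency_ns.values())
speedup = fixed_ns / average_ns

print(f"average = {average_ns:.2f} ns, speedup = {speedup:.2f}x")
```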
Observations on the Single Cycle
Design
•The single-cycle datapath is straightforward, but...
–It has to use 3 separate ALUs (the main ALU plus two adders)
–It has separate Instruction and Data memories
–Cycle time is determined by worst-case path
•A multi-cycle datapath might be better
–We can reuse some of the hardware
–We can combine the memories
–Cycle time is still constant, but instructions may
take differing numbers of cycles