Dr. Anil Kumar Lamba (Professor ) Geeta University
Even 2024
Part B
Tasks of Superscalar Processing
Specific Tasks of Superscalar Processing (I)
°Parallel decoding
°Superscalar instruction issue
°Parallel instruction execution
°Preserving the sequential consistency of execution
°Preserving the sequential consistency of exception processing
Part C
Parallel Decoding
Sequential Decoding vs. Parallel Decoding
Basic Ideas of Parallel Decoding
°Parallel decoding = decoding multiple instructions per cycle.
°Hardware complexity increases with the issue rate.
°Dependencies must be checked with respect to:
•Instructions currently being executed.
•Instruction candidates to be issued next.
°Because multiple instructions are decoded in one clock cycle:
•The decode-issue path becomes critical for the clock frequency.
°Solutions:
•Multiple pipeline cycles for decoding
-E.g. PowerPC 601/604, UltraSPARC: 2 cycles; Alpha 21064: 3 cycles; Pentium Pro: 4.5 cycles
•Predecoding
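The dependency check that makes parallel decoding expensive can be illustrated with a minimal sketch. It assumes a hypothetical `Instr` record with one destination and a tuple of source registers, and it only looks for read-after-write conflicts; real decoders perform this comparison in parallel hardware, not in a loop.

```python
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):          # hypothetical instruction record
    dest: Optional[str]
    srcs: Tuple[str, ...]

def raw_hazards(group, in_flight_dests):
    """Return positions in `group` that read a register still to be written
    by an executing instruction or by an earlier instruction of the group."""
    blocked = []
    pending = set(in_flight_dests)
    for i, ins in enumerate(group):
        if any(s in pending for s in ins.srcs):
            blocked.append(i)
        if ins.dest is not None:
            pending.add(ins.dest)   # later group members depend on this result
    return blocked

group = [Instr("r1", ("r2", "r3")),
         Instr("r4", ("r1", "r5")),   # reads r1 produced by the first instruction
         Instr("r6", ("r7",))]        # reads r7 produced by an executing instruction
print(raw_hazards(group, in_flight_dests={"r7"}))  # [1, 2]
```

The number of register comparisons grows with both the group size and the number of instructions in flight, which is why hardware complexity rises with the issue rate.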
Principle of Pre-Decoding
°Part of the decode task is performed while instructions are loaded into the on-chip instruction cache.
°This shortens the overall decoding time or reduces the number of cycles needed for decoding and instruction issue.
°A number of decode bits are appended to each instruction, indicating:
•Instruction class
•Type of resources required for execution
•Calculation of the branch address (for some processors)
°CISC processors require more bits, for information such as variable instruction length (e.g. where each instruction starts and ends).
°Extra storage is required. E.g. the K5 adds 5 extra bits to each byte.
°Predecoding is common to most predominant processor lines.
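A minimal sketch of the idea follows. The opcode table, the tag names and the function `predecode` are hypothetical and chosen only for illustration; they do not describe the predecode format of any particular processor. The last line works out the storage overhead implied by the K5 figure above.

```python
# Hypothetical table: mnemonic -> (instruction class, required execution unit).
OPCODE_INFO = {
    "add":  ("alu",    "integer"),
    "fmul": ("alu",    "floating-point"),
    "lw":   ("load",   "load/store"),
    "beq":  ("branch", "branch"),
}

def predecode(line):
    """Attach predecode tags to each instruction when a cache line is filled."""
    return [(mnemonic, *OPCODE_INFO[mnemonic]) for mnemonic in line]

icache_line = predecode(["lw", "add", "beq"])
# The decoder can later read class and unit type directly from the tags.
for entry in icache_line:
    print(entry)

# 5 predecode bits per 8-bit byte: storage grows by 5/8 = 62.5 %.
K5_OVERHEAD = 5 / 8
```

The work is done once per cache fill instead of once per decode, at the price of a wider instruction cache.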
Example of Pre-Decoding
Pre-decode Bits
Part D
Superscalar Instruction Issue
Design Space
°Issue policy specifies how dependencies are handled during the issue process.
°Issue rate specifies the max. no. of instructions a superscalar processor is able to issue in each cycle.
Design Space of Issue Policies (I)
Design Space of Issue Policies (II)
°Four main aspects:
•False data dependencies
-E.g. WAR, WAW (between register operands only, not memory operands).
-Solution: register renaming, i.e. renaming the destination register. The result is written into a dynamically allocated “spare register” instead of the specified register.
•Unresolved control dependencies
-Solution: speculative branch processing. A guess is made about the outcome of the unresolved conditional branch.
•Use of shelving
-Separates issue and dispatch into two stages.
-Blockages are handled either directly (with an issue window) or by decoupling (no dependency checking at issue).
•Handling of issue blockages
-Preserving issue order: in-order vs. out-of-order
-Alignment of issue: aligned vs. unaligned issue
Principle of Blocking Issue Mode
Principle of Shelving
Design Aspects Related to Handling of Blockages
Issue Order of Instructions (I)
Issue Order of Instructions (II)
°In-order:
•A dependent instruction blocks the issue of all subsequent instructions until the dependency is resolved.
°Out-of-order:
•An independent instruction can be issued even if a dependent instruction is still in the issue window.
•Some processors allow partial out-of-order issue. E.g. the PowerPC 601 issues branches and FP instructions out of order; the MC 88100 does so only for FP instructions.
°Not many processors employ out-of-order issue because
•Preserving sequential consistency requires much more effort.
•Shelving reduces the need for out-of-order issue.
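The difference between the two issue orders can be sketched for a single cycle. The function `issue` and the `ready` flags are hypothetical; `ready[i]` simply states whether instruction i in the issue window is free of unresolved dependencies.

```python
def issue(window, ready, in_order):
    """Return the window positions issued in this cycle."""
    issued = []
    for i in range(len(window)):
        if ready[i]:
            issued.append(i)
        elif in_order:
            break   # a dependent instruction blocks everything behind it
    return issued

window = ["i0", "i1", "i2", "i3"]
ready = [True, False, True, True]    # i1 waits for a dependency

print(issue(window, ready, in_order=True))    # [0]
print(issue(window, ready, in_order=False))   # [0, 2, 3]
```

With in-order issue the single dependent instruction i1 holds back i2 and i3, although both are independent.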
Aligned Issue of Instructions (I)
Aligned Issue of Instructions (II)
°Aligned issue:
•No instructions of the next window are considered as candidates for issue until all instructions in the current window have been issued.
°Unaligned issue:
•A gliding window whose width equals the issue rate is employed.
•In every cycle, all instructions in the window are checked for dependencies. The independent ones are issued, either in-order or out-of-order, and the window is then refilled.
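A simplified cycle count shows why unaligned issue can be faster. The model below is an assumption made for illustration: issue is in-order, and `earliest[i]` is a hypothetical value giving the first cycle in which instruction i is free of dependencies.

```python
def cycles_aligned(earliest, rate):
    """Cycles needed with aligned, in-order issue."""
    cycle = 0
    for start in range(0, len(earliest), rate):
        window = earliest[start:start + rate]
        done = max([cycle] + window)   # cycle in which the last one issues
        cycle = done + 1               # only then is the next window considered
    return cycle

def cycles_unaligned(earliest, rate):
    """Cycles needed with a gliding window refilled every cycle."""
    cycle, pos = 0, 0
    while pos < len(earliest):
        issued = 0
        while issued < rate and pos < len(earliest) and earliest[pos] <= cycle:
            pos += 1
            issued += 1
        cycle += 1
    return cycle

earliest = [0, 1, 0, 0, 0]   # the second instruction is blocked for one cycle
print(cycles_aligned(earliest, rate=2))     # 4
print(cycles_unaligned(earliest, rate=2))   # 3
```

In the aligned case the slot left empty by the blocked instruction is lost; the gliding window fills it with the next instruction in sequence.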
Issue Rate (I)
°Issue rate (or degree of superscalarity) refers to the maximum number of instructions a superscalar processor can issue in one cycle.
°A higher issue rate potentially offers higher performance, at the cost of more complex circuitry. A balance between the two is needed.
Issue Rate (II)
Part E
Superscalar Instruction Issue: Shelving
Introduction
°Shelving eliminates issue blockages due to dependencies.
°It uses dedicated instruction buffers, called shelving buffers, in front of the EU(s).
°Shelving decouples dependency checking from instruction issue and defers it to instruction dispatch.
°Decoded instructions are issued to the shelving buffers without any checks for data or control dependencies or for busy EU(s).
°Processors with shelving usually employ in-order, aligned issue policies, together with register renaming and speculative conditional branch execution, so that only true dependencies can block instruction execution. (Why in-order, aligned issue?)
°The dependency check is done during the instruction dispatch phase (from shelving buffer to EU). Dependency-free instructions whose operands are available become eligible for execution: the dataflow principle of operation.
Principle of Straightforward Issue Policy
Principle of Shelving
Design Space of Shelving
Part E-1
Design Space Topic of Shelving:
Scope of Shelving
Scope of Shelving
°Scope of shelving specifies whether shelving is restricted to a few instruction types or is performed for all instructions.
Part E-2
Design Space Topic of Shelving:
Layout of Shelving Buffers
Layout of Shelving Buffers
Part E-2-1
Design Space Topic of Shelving:
Layout of Shelving Buffers
Type of Buffers
Type of Shelving Buffers (I)
°Standalone buffers are buffers used exclusively for shelving.
°Combined buffers are those with multiple functionalities.
Type of Shelving Buffers (II)
°Standalone buffers implemented as reservation stations (RS):
•Individual
-Earliest type to be adopted
-One in front of each EU
-Usually small (2-4 entries)
•Group
-Holds instructions for a group of EUs that execute instructions of the same type
-More reliable
-Larger (8-16 entries)
-Can shelve or dispatch more than one instruction per cycle
Type of Shelving Buffers (III)
°Standalone buffers implemented as reservation stations (RS) (cont’d):
•Central
-Most flexible
-Disadvantages:
–Needs a word length equal to the longest possible data word
–Much more complex
–Size of about 20 entries
°Combined buffers (reorder buffer, ROB) are used for shelving, renaming and reordering.
•Expected to be the future trend
Type of Shelving Buffers (IV)
Combined Buffer for Shelving, Renaming and Reordering
Part E-2-2
Design Space Topic of Shelving:
Layout of Shelving Buffers
Number of Buffer Entries
Shelving Buffer Entries in Superscalar Processors
What types of RSs should be expected?
Part E-2-3
Design Space Topic of Shelving:
Layout of Shelving Buffers
Number of Read/Write Ports
Number of Read/Write Ports for Shelving Buffers
°Individual reservation stations only need to forward a
single instruction per cycle.
°Group/Central reservation stations need to deliver
multiple instructions per cycle, ideally as many as the
number of EU(s) connected.
°Study the relationship between the number of read/write ports and the number of shelving buffer entries.
Part E-3
Design Space Topic of Shelving:
Operand Fetch Policy
Types of Operand Fetch Policies (I)
°Two types:
•Issue bound
-Operands fetched during instruction issue.
-Shelving buffers provide entries long enough to hold
source operands.
•Dispatch bound
-Operands fetched during instruction dispatch.
-Shelving buffers contain short register identifiers.
Types of Operand Fetch Policies (II)
Operand Fetch During Instr. Issue w/ Single Register File
Operand Fetch During Instr. Dispatch w/ Single Register File
Comparison of Operand Fetch Policies
°Policy comparison:
•Issue bound
-The register file supplies all operands for all issued instructions.
-The register file needs twice as many read ports as the max. issue rate.
-Size of RS is relatively larger.
•Dispatch bound
-The no. of read ports should equal twice the dispatch rate (note that the max. dispatch rate is usually higher than the max. issue rate; why?).
-The critical decode/issue path is shorter.
-Shelving buffers are relatively less complex.
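The port requirement is simple arithmetic, sketched below. The factor of two assumes two source operands per instruction, and the rates 4 and 6 are hypothetical values chosen only to show the effect of a dispatch rate that exceeds the issue rate.

```python
def read_ports_needed(rate, operands_per_instr=2):
    """Register file read ports needed to supply every operand in one cycle."""
    return rate * operands_per_instr

issue_bound_ports = read_ports_needed(4)      # operands fetched at issue
dispatch_bound_ports = read_ports_needed(6)   # operands fetched at dispatch

print(issue_bound_ports, dispatch_bound_ports)   # 8 12
```

Dispatch-bound fetch therefore tends to need more read ports, while keeping the shelving buffer entries short.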
Issue Bound Operand Fetch with Multiple Register Files
Dispatch Bound Operand Fetch with Multiple Register Files
MFU Shelving Buffer Types & Operand Fetch Policies
Part E-4
Design Space Topic of Shelving:
Instruction Dispatch Scheme
Design Space of Inst. Dispatch
°Instruction dispatch scheme:
•Dispatch policy
•Dispatch rate
•Scheme for checking the availability of operands
•Treatment of an empty reservation station
°Instruction dispatch involves two basic tasks: scheduling the instructions held in a particular RS for execution, and disseminating the scheduled instruction(s) to the allocated EU(s).
Part E-4-1
Design Space Topic of Shelving:
Instruction Dispatch Scheme
Dispatch policy
Design Space of Dispatch Policy
°Dispatch policy:
•Selection rule
•Arbitration rule
•Dispatch order
Consideration of Dispatch Policy (I)
°Dispatch policy specifies how instructions are selected for execution and how dispatch blockages are handled.
•Selection rule:
-Specifies when instructions are considered executable.
•Arbitration rule:
-Chooses a subset of instructions when more instructions are eligible for execution than can be disseminated in the next cycle.
-Usually, “older” instructions are preferred over “younger” ones.
Consideration of Dispatch Policy (II)
°Dispatch policy (cont’d)
•Dispatch order:
-Determines whether a non-executable instruction blocks all subsequent instructions from being dispatched.
-Three types:
–In-order: simple (only the last inst. needs to be inspected)
–Partially out-of-order (for certain instruction types)
–Out-of-order
»Complex
»All instructions in the shelving buffer must be checked for executable instructions.
»Expected to be used in group or central RS.
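The selection and arbitration rules can be sketched for an out-of-order dispatch from one reservation station. The entry layout `(age, name, srcs)` and the function name are hypothetical; a smaller age means an older instruction, and an instruction counts as executable when all its source registers are available.

```python
def select_and_arbitrate(station, available, dispatch_rate):
    """Pick the instructions dispatched from one reservation station this cycle."""
    # Selection rule: all source operands must be available.
    executable = [e for e in station if all(s in available for s in e[2])]
    # Arbitration rule: prefer older instructions when too many are eligible.
    executable.sort(key=lambda e: e[0])
    return [name for _, name, _ in executable[:dispatch_rate]]

station = [
    (3, "sub", ("r1",)),
    (1, "add", ("r2", "r3")),
    (2, "mul", ("r9",)),        # r9 not yet available: not executable
    (4, "or",  ("r2",)),
]
print(select_and_arbitrate(station, available={"r1", "r2", "r3"}, dispatch_rate=2))
# ['add', 'sub']
```

Every entry of the station is inspected, which reflects the extra cost of out-of-order dispatch compared with checking a single entry.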
Part E-4-2
Design Space Topic of Shelving:
Instruction Dispatch Scheme
Dispatch rate
Considerations of Dispatch Rate
°Dispatch rate is defined as the no. of instructions that can be dispatched from each reservation station per cycle.
°The ideal dispatch rate is one instruction per EU.
°This is easier to achieve with individual and group RS.
°Future dispatch rates are expected to get higher because of fewer restrictions imposed on data paths, ports, and transistor count.
°Note that very often, the max. issue rate is less than the max. dispatch rate.
Multiplicity of Dispatched Instructions
Max. Issue and Dispatch Rates of Superscalar Proc.
°Study relationship between issue rate and dispatch rate.
Part E-4-3
Design Space Topic of Shelving:
Instruction Dispatch Scheme
Checking for Operand Availability
Intro. to Checking for Operand Availability
°Availability checking is done:
•when operands are fetched from the register file, and
•during dispatch, to determine whether the operands of instructions in the shelving buffers are available.
°Solution: scoreboard
•Direct check of the scoreboard bits
-The RS does not hold any explicit status information indicating whether source operands are available.
-Employed when operands are fetched during instruction dispatch.
•Check of explicit status bits
-Availability is indicated in the RS through status bits.
-Employed if operands are fetched during instruction issue.
-An additional associative search is needed to update values in the RS.
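The direct check of scoreboard bits can be sketched as follows. This is a minimal model assuming one availability bit per register; the class and method names are hypothetical.

```python
class Scoreboard:
    """One status bit per register: True means the value is available."""

    def __init__(self, num_regs):
        self.valid = [True] * num_regs

    def issue(self, dest):
        self.valid[dest] = False      # result is now pending

    def write_back(self, dest):
        self.valid[dest] = True       # result has been produced

    def operands_available(self, srcs):
        return all(self.valid[r] for r in srcs)

sb = Scoreboard(num_regs=8)
sb.issue(dest=3)                          # an instruction will write r3
print(sb.operands_available([1, 3]))      # False: r3 is still pending
sb.write_back(dest=3)
print(sb.operands_available([1, 3]))      # True
```

With explicit status bits, the same information would instead be copied into each RS entry and updated there when a result is written.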
Principle of Scoreboarding
Scheme for Checking Operand Availability
Use of Multiple Buses for Updating Multiple RSs
°If multiple RSs exist, their updating must be done globally.
Updating RSs in case of Multiple Register Files
Part E-4-4
Design Space Topic of Shelving:
Instruction Dispatch Scheme
Treatment of Empty Reservation Station
Treatment of Empty Reservation Station
Part E-4-5
Design Space Topic of Shelving:
Instruction Dispatch Scheme
Typical Dispatch Schemes
Typical Approaches in Dispatching (I)
°Assumptions for typical solutions:
•Register renaming and speculative execution are usually employed.
•If operands are fetched during instruction dispatch, the direct checking method is used.
•If operands are fetched during instruction issue, explicit status bits are used to maintain and check operand availability.
•An empty RS is usually bypassed.
Typical Approaches in Dispatching (II)
Part F
Superscalar Instruction Issue: Register Renaming
Introduction to Register Renaming
°Standard technique for removing false data dependencies (i.e. WAR, WAW).
°Instructions are always handled in three-operand form; the destination operand is the one that is renamed.
°Two implementations:
•Static
-Done by the compiler.
•Dynamic
-Takes place in hardware at execution time.
-Requires extra circuitry: supplementary register space, additional data paths and logic.
Implementation of Register Renaming
Design Space of Register Renaming
Part F-1
Design Space Topic: Register Renaming
Scope of Register Renaming
Scope of Renaming
Part F-2
Design Space Topic: Register Renaming
Layout of Rename Buffers
Layout of Rename Buffers
Types of Rename Buffers
Architecture of Rename Buffers
°For the merged architectural and rename register file:
•A free physical register is allocated to each destination register specified in an instruction.
•A mapping table tracks all pairs of architectural and allocated physical registers.
•A scheme is required to reclaim physical registers that are no longer in use.
°In the other three cases, intermediate results are held in the respective rename buffer until retirement. At retirement, the content of the rename buffer is written back to the architectural register file.
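Renaming with a mapping table and a free list, as in the merged register file scheme, can be sketched as follows. The register names, the sizes and the function `rename` are hypothetical, and the reclaiming of physical registers is deliberately left out of this sketch.

```python
def rename(program, num_arch, num_phys):
    """Rename destinations to fresh physical registers.

    `program` is a list of (dest, srcs) pairs using architectural names r0..;
    the result uses physical names p0.. only."""
    mapping = {f"r{i}": f"p{i}" for i in range(num_arch)}   # mapping table
    free = [f"p{i}" for i in range(num_arch, num_phys)]     # free list
    renamed = []
    for dest, srcs in program:
        new_srcs = tuple(mapping[s] for s in srcs)  # sources use current mapping
        phys = free.pop(0)                          # allocate a free register
        mapping[dest] = phys
        renamed.append((phys, new_srcs))
    return renamed

program = [
    ("r1", ("r2", "r3")),
    ("r2", ("r1", "r4")),   # WAR on r2 with the first instruction
    ("r1", ("r5", "r6")),   # WAW on r1 with the first instruction
]
for line in rename(program, num_arch=8, num_phys=12):
    print(line)
# ('p8', ('p2', 'p3'))
# ('p9', ('p8', 'p4'))
# ('p10', ('p5', 'p6'))
```

After renaming, every destination is distinct, so only the true dependency of the second instruction on p8 remains.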
Example of Renaming Architecture Register (I)
Example of Renaming Architecture Register (II)
Number of Rename Buffers
Access Mechanism of Rename Buffers (I)
°Rename buffers need to be accessed in order to:
•Fetch operands
•Update rename registers
•Deallocate rename registers
°Two distinct mechanisms:
•Associative mechanism
•Indexed access mechanism
Access Mechanism of Rename Buffers (II)
Part F-3
Design Space Topic: Register Renaming
Operand Fetch Policy
Operand Fetch Policies of Rename Buffers
°Two policies:
•Rename bound
-Fetch referenced operands during renaming
•Dispatch bound
-Defer operand fetch until dispatching
Part F-4
Design Space Topic: Register Renaming
Rename Rate
Rename Rate
°Rename rate is the max. number of renames per cycle that a processor is able to perform.
°To avoid bottlenecks, the rename rate should equal the issue rate.
°HW requirements: a large number of ports on the register files and the mapping tables.
Part F-5
Design Space Topic: Register Renaming
Most Frequently Used Renaming
Most Frequently Used Basic Renaming
Part G
Parallel Execution
Concept of Parallel Execution
°Regardless of whether instructions are issued or dispatched in-order or out-of-order, they generally finish out of program order.
°Three terms:
•“to finish”: operation is completed except for writing back the
result into the architectural register or memory (and status bits).
•“to complete”: the last action of instruction execution (i.e. write
back to arch. registers) is finished.
•“to retire”: write back to arch. registers and delete completed
instruction from ROB (Reorder Buffer).
Part H
Preserving Sequential Consistency
of Instruction Execution
Sequential Consistency (I)
°Two aspects:
•Order in which instructions are completed.
•Order in which memory is accessed due to LD/ST.
°Processor consistency indicates the consistency of instruction completion with sequential instruction execution.
°Two possible processor consistencies:
•Weak: instructions may complete out of order, provided that no data dependencies are sacrificed.
•Strong: instructions are forced to complete in strict program order. Usually achieved with a ROB.
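How a ROB enforces strong processor consistency can be sketched with a queue held in program order. The entry layout and the function `retire` are hypothetical; the point is only that instructions leave from the head, so a finished instruction waits behind an unfinished older one.

```python
from collections import deque

def retire(rob):
    """Retire finished instructions from the head of the ROB, in program order."""
    retired = []
    while rob and rob[0]["finished"]:
        retired.append(rob.popleft()["name"])
    return retired

rob = deque([
    {"name": "i0", "finished": True},
    {"name": "i1", "finished": False},   # still executing
    {"name": "i2", "finished": True},    # finished early, must wait for i1
])

print(retire(rob))           # ['i0']
rob[0]["finished"] = True    # i1 finishes later
print(retire(rob))           # ['i1', 'i2']
```

Execution may finish out of order, yet the architectural state is updated strictly in program order.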
Sequential Consistency (II)
°Memory consistency indicates whether memory accesses are performed in the same order as in a sequential processor.
°Two possible memory access consistencies:
•Weak: memory accesses may be out of order compared with a strict sequential program execution, provided that data dependencies are not violated.
•Strong: memory accesses occur strictly in program order.
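Under weak memory consistency, a load may be performed ahead of older stores as long as no data dependency is violated. The check below is a conservative sketch: the function name is hypothetical, and treating a store with an unknown address as a conflict is an assumption of this example (processors may instead speculate).

```python
def load_may_bypass(load_addr, pending_store_addrs):
    """True if a load may be performed ahead of older, still pending stores.

    `None` marks a store whose address has not been computed yet."""
    return all(addr is not None and addr != load_addr
               for addr in pending_store_addrs)

print(load_may_bypass(0x100, [0x200, 0x300]))   # True: no address matches
print(load_may_bypass(0x100, [0x200, 0x100]))   # False: store to the same address
print(load_may_bypass(0x100, [None]))           # False: store address unknown
```

Under strong memory consistency the check is unnecessary, because the load simply waits for all older stores.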
Sequential Consistency (III)
Sequential Consistency Model
Concept of Load/Store Reordering
Principle of Reorder Buffer
Use of Reorder Buffer in Commercial Processors
Design Space of Reorder Buffers
Basic Layout of Reorder Buffers
Part I
Preserving Sequential Consistency
of Exception Processing
Sequential Consistency of Exception Processing
END