Module II:
Memory: Organization, Memory segmentation,
Multithreading, Symmetric multiprocessing.
Processor Design flow: Capturing requirements,
Instruction coding, Exploration of architecture
organizations, hardware and software development.
Extreme CISC and extreme RISC, Very long instruction
word (VLIW).
Module II: B: Processor Design Flow
Extreme CISC and Extreme RISC
Towards CISC
•From wired logic to microcode control
▫Temptingly easy extensibility
•Performance tuning
▫HW implementation of some high-level functions
•Marketing
▫Add successful instructions of competitors
▫“New feature” hype
▫Compatibility: only extensions are possible
CISC Problems
•Performance tuning unsuccessful
▫Rarely used high-level instructions
▫Sometimes slower than equivalent sequence
•High complexity
▫Pipelining bottlenecks lower clock rates
▫Interrupt handling can complicate even more
•Marketing
▫Prolonged design time and frequent microcode errors
hurt competitiveness
RISC Features
•Low complexity
▫Generally results in overall speedup
▫Less error-prone implementation by hardwired logic
or simple microcodes
•VLSI implementation advantages
▫Fewer transistors
▫Extra space: more registers, cache
•Marketing
▫Reduced design time, fewer errors, and more options
increase competitiveness
RISC vs. CISC misconceptions
•Arguments favoring RISC: simple design,
short design time, speed, price…
•Study of RISC should include
hardware/software tradeoffs, factors
influencing computer performance and
industry-side evaluation.
•Incorrect implication from the two acronyms:
RISC and CISC.
▫They are not bifurcations between which designers
have to choose
•Carelessly leaving out the ‘participation’ of
Operating System
•Reduced design time?
▫Academic vs. industrial design efforts are not directly comparable
•Performance claims of RISC proponents do not
separate out the effect of design features such as
multiple register sets (MRSs).
▫MRSs can have a remarkable effect on program
execution
Conclusion – RISC vs. CISC?
•CISC
▫Effectively realizes one particular High Level
Language Computer System in HW - recurring
HW development costs when change needed
•RISC
▫Allows effective realization of any High Level
Language Computer System in SW - recurring SW
development costs when change needed
Conclusion – Optimum?
•Hybrid solutions
▫RISC core & CISC interface
▫Still has specific performance tuning
•Optimal ISA
▫Between RISC & CISC
▫Few, carefully chosen, useful complex instructions
▫Still has complexity handling problems
VLIW Processors
VLIW (“very long instruction word”) processors
•instructions are scheduled by the compiler
•a fixed number of operations are formatted as one big
instruction (called a bundle)
usually LIW (3 operations)
change in the instruction set architecture,
i.e., 1 program counter points to 1 bundle (not 1
operation)
•operations in a bundle issue in parallel
fixed format so could decode operations in parallel
enough FUs for types of operations that can issue in
parallel
pipelined FUs
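The bundle idea above can be sketched as a toy interpreter. This is only an illustration under stated assumptions: the 3-slot format, the slot-to-functional-unit mapping `SLOT_FUS`, the `Op` record and the register names are all hypothetical, and the model assumes every operation in a bundle reads its sources before any operation in that bundle writes a result.

```python
from collections import namedtuple

# Hypothetical operation record: which FU type it needs, one destination,
# two source registers. The FU type also selects the arithmetic performed.
Op = namedtuple("Op", "fu dest src1 src2")

# Hypothetical fixed bundle format: slot 0 and 1 feed adders, slot 2 a multiplier.
SLOT_FUS = ("add", "add", "mul")


def execute_bundle(bundle, regs):
    """Issue every operation of one bundle in parallel.

    All sources are read from the register state *before* the bundle,
    then all results are written back together.
    """
    assert len(bundle) == len(SLOT_FUS), "fixed format: one entry per slot"
    results = {}
    for slot, op in enumerate(bundle):
        if op is None:  # empty slot, i.e. an explicit no-op
            continue
        assert op.fu == SLOT_FUS[slot], "operation placed in the wrong slot"
        a, b = regs[op.src1], regs[op.src2]
        results[op.dest] = a + b if op.fu == "add" else a * b
    new_regs = dict(regs)
    new_regs.update(results)
    return new_regs


def run(program, regs):
    """One program counter steps over bundles, not over single operations."""
    pc = 0
    while pc < len(program):
        regs = execute_bundle(program[pc], regs)
        pc += 1
    return regs


program = [
    (Op("add", "r3", "r1", "r2"), Op("add", "r4", "r1", "r1"), Op("mul", "r5", "r1", "r2")),
    (Op("add", "r6", "r3", "r4"), None, Op("mul", "r7", "r5", "r5")),
]
final = run(program, {"r1": 2, "r2": 3})
```

Two bundles carry five operations here; the second bundle has an empty slot because the compiler found nothing else independent to place there.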
Goal of the hardware design:
•reduce hardware complexity
•to shorten the cycle time for better performance
•to reduce power requirements
How VLIW designs reduce hardware complexity
•less multiple-issue hardware
no dependence checking for instructions within a bundle
can be fewer paths between instruction issue slots &
FUs
•simpler instruction dispatch
no out-of-order execution, no instruction grouping
•ideally no structural hazard checking logic
•Reduction in hardware complexity affects cycle time &
power consumption
Compiler support to increase ILP
•compiler creates each VLIW word
•need for good code scheduling greater than with in-order
issue superscalars
a bundle doesn’t issue if any one of its operations can’t
•techniques for increasing ILP
loop unrolling
software pipelining (schedules instructions from different
iterations together)
aggressive inlining (function becomes part of the caller
code)
trace scheduling (schedule beyond basic block
boundaries)
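As an illustration of the first technique, loop unrolling, here is a source-level sketch. A real VLIW compiler performs this on its intermediate code, not on Python; the function names and the unroll factor of 4 are assumptions chosen for the example.

```python
def dot(a, b):
    """Original loop: every iteration depends on the single accumulator."""
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def dot_unrolled4(a, b):
    """Same computation, unrolled by 4 with independent accumulators.

    The four multiply-adds in the loop body do not depend on each other,
    so a scheduler is free to place them in the same bundle.
    """
    n = len(a)
    s0 = s1 = s2 = s3 = 0
    i = 0
    while i + 4 <= n:
        s0 += a[i] * b[i]
        s1 += a[i + 1] * b[i + 1]
        s2 += a[i + 2] * b[i + 2]
        s3 += a[i + 3] * b[i + 3]
        i += 4
    while i < n:  # remainder iterations when n is not a multiple of 4
        s0 += a[i] * b[i]
        i += 1
    return s0 + s1 + s2 + s3
```

With integers both versions return the same value. With floating-point data the split accumulators change the order of additions, so rounding may differ slightly.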
More compiler support to increase ILP
•detects hazards & hides latencies
structural hazards
•no 2 operations to the same functional unit
•no 2 operations to the same memory bank
hiding latencies
•data prefetching
•hoisting loads above stores
data hazards
•no data hazards among instructions in a bundle
control hazards
•predicated execution
•static branch prediction
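The structural- and data-hazard rules above can be sketched as a very small greedy scheduler. It is a simplified illustration, not a production algorithm: the machine description `FU_LIMIT`, the `Op` record, the one-bundle result latency and the assumption that each register is written only once (so only true dependences exist) are all hypothetical.

```python
from collections import namedtuple

# Hypothetical operation record: name, FU type needed, destination, sources.
Op = namedtuple("Op", "name fu dest srcs")

# Hypothetical machine: two ALUs and one memory port per bundle.
FU_LIMIT = {"alu": 2, "mem": 1}


def schedule(ops):
    """Pack operations (given in program order) into bundles.

    Data hazards: an operation goes no earlier than the bundle after the
    one that produces each of its sources.
    Structural hazards: a bundle never holds more operations of one FU
    type than the machine provides.
    Assumes each register is written exactly once.
    """
    bundles = []   # each bundle is a list of Ops
    ready_at = {}  # register -> index of the first bundle allowed to read it
    for op in ops:
        b = max((ready_at.get(s, 0) for s in op.srcs), default=0)
        while True:
            if b == len(bundles):
                bundles.append([])
            used = sum(1 for o in bundles[b] if o.fu == op.fu)
            if used < FU_LIMIT[op.fu]:
                break
            b += 1  # FU type is full in this bundle, try the next one
        bundles[b].append(op)
        ready_at[op.dest] = b + 1
    return bundles


example_ops = [
    Op("ld1", "mem", "r1", ()),
    Op("ld2", "mem", "r2", ()),
    Op("add", "alu", "r3", ("r1", "r2")),
    Op("sub", "alu", "r4", ("r1",)),
    Op("mul", "alu", "r5", ("r3", "r4")),
]
packed = [[o.name for o in bundle] for bundle in schedule(example_ops)]
```

In this example the second load is pushed to a later bundle because the single memory port is taken, while `sub` is moved up next to it because its only source is already available.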
Superscalars vs. VLIW
Superscalar has more complex hardware for instruction
scheduling
•instruction slotting or out-of-order hardware
•more paths between instruction issue structure & functional units
•possible consequences:
slower cycle times
more chip real estate
more power consumption
but VLIW has more functional units if it supports full predication
•possible consequences:
slower cycle times
more chip real estate
more power consumption
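Why full predication asks for more functional units can be seen from a small sketch of if-conversion. This models the idea in ordinary Python and is not how any particular machine encodes predicates; the function names are made up for the example.

```python
def abs_diff_branch(a, b):
    """Branching version: only one arm executes, but there is a control hazard."""
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a, b):
    """If-converted version: no branch, both arms are computed.

    The compare sets a predicate, both subtractions occupy functional
    units, and only the result whose predicate is true is kept.
    """
    p = int(a > b)  # predicate: 1 or 0
    t = a - b       # would commit under predicate p
    f = b - a       # would commit under predicate not-p
    return p * t + (1 - p) * f
```

The predicated form removes the branch but does the work of both arms, which is the extra functional-unit demand the comparison above refers to.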