ADNSU Computer Architecture Topic Presentation.pptx

KamranGasanov1 4 views 15 slides Mar 10, 2025



Slide Content

Instruction Pipelining Structure for Modern Processors

Introduction

Pipelining

- Increasing the number of pipeline stages increases the pipeline intensity (the rate at which instructions flow through the pipeline).
- Higher pipeline intensity allows a very high core frequency in the CPU.
- The input stream of an arithmetic pipeline consists of RISC instructions directed for execution.
- The input stream of the instruction pipeline consists of CISC instructions fetched from memory.
- To sustain the high speed of the instruction pipeline, the intermediate stream between stages must not be interrupted (it must be continual).
- To provide a continual instruction stream at the early stages of the pipeline, the instruction fetch block must have a very short time duration.
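The throughput argument above can be sketched with a toy timing model. This is an idealized, stall-free calculation, not a model of any real CPU: with k stages of equal latency, n instructions complete in (k + n - 1) cycles instead of n * k, so the speedup approaches k for long instruction streams.

```python
# Toy model of ideal pipeline timing (illustrative numbers, no stalls assumed).

def pipelined_cycles(n_instructions: int, k_stages: int) -> int:
    """Cycles to run n instructions through a k-stage pipeline with no stalls."""
    return k_stages + n_instructions - 1

def speedup(n_instructions: int, k_stages: int) -> float:
    """Speedup over a non-pipelined machine taking k cycles per instruction."""
    return (n_instructions * k_stages) / pipelined_cycles(n_instructions, k_stages)

# Deeper pipelines also shorten each stage, which is what permits a higher clock.
print(pipelined_cycles(1000, 5))    # 1004 cycles
print(round(speedup(1000, 5), 2))   # 4.98, approaching 5x for large n
```

The model also shows why the stream "must not be interrupted": every stall cycle adds directly to the total and erodes the near-k speedup.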

Pipelining

- The main reason for providing a large instruction L2 cache is to keep the instruction fetch time on the pipeline low.
- A 1 MB L2 cache was implemented in the Zen 4 CPU.
- CISC instructions are fetched from the L2 cache over wide cache lines, which delivers a large number of instruction bytes at each access.
- Frequent conditional branch operations in the source program slow down the instruction pipeline.
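A small back-of-the-envelope calculation illustrates the wide-cache-line point. The line width and average instruction length below are assumptions for illustration (64-byte lines are common; x86 instructions average roughly 3-4 bytes), not figures from the slides.

```python
# Sketch: how many instructions one wide cache-line fetch can deliver
# (hypothetical but typical numbers, not taken from a specific datasheet).

LINE_BYTES = 64         # assumed cache-line width
AVG_INSN_BYTES = 4      # assumed average CISC (x86) instruction length
L2_BYTES = 1 * 1024 * 1024  # 1 MB L2, as on Zen 4

insns_per_access = LINE_BYTES // AVG_INSN_BYTES
lines_in_l2 = L2_BYTES // LINE_BYTES

print(insns_per_access)  # ~16 instructions arrive per L2 access
print(lines_in_l2)       # 16384 lines of instructions kept close to the fetcher
```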

Branch Prediction

- The greatest difficulty in meeting these requirements arises when conditional jump operations are processed on the pipeline, because the address of the next instruction (or branch) can be determined only after the condition given in the operation has been evaluated.
- Therefore, to determine the jump address of conditional jump operations early, the pipeline uses a special stage called "branch prediction".

Branch Prediction

- Evaluation of the given condition starts at the same time as the branch prediction process.
- The result of the prediction may be "true" or "false": the prediction is "true" when evaluating the condition confirms the predicted address; otherwise it is "false".
- The jump address determined by the prediction block is used directly to fetch the next branch from the instruction cache, and the instructions on that branch begin to be processed on the following stages of the pipeline.
- The results of these speculatively executed instructions are saved in temporary (physical) registers of the CPU and are loaded into the logical registers only when the branch prediction is true.
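The commit rule in the last bullet can be sketched as follows. This is a deliberately simplified model, not a real ISA: speculative results live in a copy of the register state ("physical" registers) and are copied into the architectural ("logical") registers only when the prediction turns out to be true; on a misprediction they are discarded.

```python
# Sketch of speculative execution with commit/squash (illustrative names only).

logical_regs = {"r1": 10, "r2": 20}   # architecturally visible state

def execute_speculative(pred_taken: bool, actual_taken: bool) -> str:
    # Speculative results are written to physical registers, not logical ones.
    physical_regs = dict(logical_regs)
    physical_regs["r1"] = physical_regs["r1"] + physical_regs["r2"]  # r1 += r2

    if pred_taken == actual_taken:           # prediction was "true"
        logical_regs.update(physical_regs)   # commit speculative state
        return "commit"
    else:                                    # misprediction: discard results
        return "squash"

print(execute_speculative(pred_taken=True, actual_taken=True))   # commit
print(logical_regs["r1"])                                        # 30
print(execute_speculative(pred_taken=True, actual_taken=False))  # squash
print(logical_regs["r1"])                                        # still 30
```

The point of the indirection is that a squash costs nothing architecturally: the logical registers were never touched.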

Branch and Decode Stage

- To hide the latency of a very long pipeline caused by occasional branch operations, operands are saved in the L1 data cache before they might be needed.
- At the decoder stage of the instruction pipeline in a superscalar microarchitecture, a single CISC instruction stream is converted into multiple streams of RISC instructions.
- Op cache: recent AMD and Intel high-performance CPUs implement an op cache that remembers decoder output and functions like an L0 instruction cache. Compared to the traditional L1i fetch-and-decode path, the op cache provides higher bandwidth while saving power by allowing the decoders to idle.
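The op-cache idea is essentially memoization of decoder output. The sketch below uses a hypothetical decode function and micro-op encoding to show the mechanism: a loop that re-fetches the same address hits the op cache and skips the expensive decoders.

```python
# Sketch: an op cache keyed by fetch address, storing already-decoded micro-ops.
# The decode function and "2 uops per instruction" ratio are made up for the demo.

op_cache: dict[int, list[str]] = {}
decoder_invocations = 0

def decode(addr: int) -> list[str]:
    """Stand-in for the expensive CISC -> micro-op decode step."""
    global decoder_invocations
    decoder_invocations += 1
    return [f"uop@{addr}.0", f"uop@{addr}.1"]

def fetch_uops(addr: int) -> list[str]:
    if addr not in op_cache:        # op-cache miss: run the decoders
        op_cache[addr] = decode(addr)
    return op_cache[addr]           # hit: decoders stay idle, saving power

for addr in [0x100, 0x104, 0x100, 0x100]:   # a loop re-fetches the same address
    fetch_uops(addr)
print(decoder_invocations)  # only 2 decodes for 4 fetches
```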

Branch and Decode Stage

- In this microarchitecture there are three decoders for simple instructions and one decoder for complex instructions, all working simultaneously; for decoding the most complex instructions, a microinstruction ROM is used.
- Branch target buffer (BTB): the probability of correctly choosing the true branch depends on the number of entries in a special table called the branch target buffer, which saves information about the results of previous branch operations.
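A BTB can be sketched as a small table indexed by the branch's address that remembers the last observed target, so the fetch stage can redirect before the condition is computed. The entry count and direct-mapped, last-target policy below are illustrative choices, not those of any specific CPU.

```python
# Sketch of a direct-mapped branch target buffer (illustrative size and policy).

class BTB:
    def __init__(self, entries: int = 4096):
        self.entries = entries       # more entries -> branch more likely found
        self.table: dict[int, tuple[int, int]] = {}  # index -> (tag, target)

    def _index(self, pc: int) -> int:
        return pc % self.entries

    def predict(self, pc: int):
        """Predicted target if this branch was seen before, else None (fall through)."""
        entry = self.table.get(self._index(pc))
        if entry is not None and entry[0] == pc:
            return entry[1]
        return None

    def update(self, pc: int, target: int) -> None:
        """Record the resolved target of a taken branch for future fetches."""
        self.table[self._index(pc)] = (pc, target)

btb = BTB()
print(btb.predict(0x400))        # None: never seen, so predict fall-through
btb.update(0x400, 0x900)         # branch at 0x400 resolved to target 0x900
print(hex(btb.predict(0x400)))   # 0x900 available on the very next fetch
```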

Instruction Scheduler

- To decrease data dependencies between instructions processed simultaneously on the pipeline, a register rename block is used: it maps each logical register to several physical-register copies, as needed by the simultaneously executing instructions.
- The logical registers defined by the user in the source program are replaced by corresponding physical registers, which serve as the same logical register for different instructions.
- The register rename block contains separate physical register files for floating-point and integer operations.
- Because the number of physical registers is much larger than the number of logical ones, they can also be used for 64-bit operations.
- The instruction scheduler temporarily saves multiple RISC instructions and directs them to the corresponding arithmetic pipelines when possible.
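Renaming can be sketched with a toy free list: every write to a logical register allocates a fresh physical register, so two instructions that reuse the same logical name no longer falsely depend on each other. The register names and pool size are illustrative only.

```python
# Sketch of register renaming with a free list (toy model, not a real design).

class RenameMap:
    def __init__(self, n_physical: int = 8):
        self.free = list(range(n_physical))  # far more physical than logical regs
        self.map: dict[str, int] = {}        # logical name -> current physical reg

    def read(self, logical: str) -> int:
        """Readers see whichever physical register currently holds the name."""
        return self.map[logical]

    def write(self, logical: str) -> int:
        """Allocate a fresh physical register for a new value of `logical`."""
        phys = self.free.pop(0)
        self.map[logical] = phys
        return phys

rm = RenameMap()
p0 = rm.write("eax")   # first producer of eax -> physical reg 0
p1 = rm.write("eax")   # an unrelated later write -> physical reg 1
print(p0, p1, rm.read("eax"))  # 0 1 1
```

Because the two writes now target different physical registers, the scheduler may execute them (and their consumers) out of order without corrupting either value.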

Instruction Scheduler

- The instruction scheduler in the Prescott microarchitecture can save three RISC instructions with their operands in the corresponding physical register files.
- There are four ALUs and three AGUs (address generation units) for operands, plus several FLP units for floating-point and MMX operations.
- Each of these execution units is internally pipelined, and they work simultaneously as different arithmetic pipelines.

Instruction Scheduler

- The integer unit schedulers can accept up to six micro-ops per cycle, which feed into the 224-entry reorder buffer (up from 192).
- The integer unit has seven execution ports, comprising four ALUs (arithmetic logic units) and three AGUs (address generation units).
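The two figures quoted above (six micro-ops accepted per cycle, 224 reorder-buffer entries) can be combined in a simple counting model. This is purely arithmetic on the slide's numbers, assuming nothing retires while the buffer fills.

```python
# Counting model: how long the front end can run ahead before the ROB is full.

ACCEPT_PER_CYCLE = 6    # micro-ops accepted by the schedulers per cycle
ROB_ENTRIES = 224       # reorder-buffer capacity (up from 192)

def cycles_to_fill_rob() -> int:
    """Cycles to fill an empty ROB if no micro-op retires in the meantime."""
    in_flight, cycles = 0, 0
    while in_flight < ROB_ENTRIES:
        in_flight += min(ACCEPT_PER_CYCLE, ROB_ENTRIES - in_flight)
        cycles += 1
    return cycles

print(cycles_to_fill_rob())  # 38 cycles: ceil(224 / 6)
```

In other words, the larger ROB lets the core keep dispatching for roughly 38 cycles past a long-latency instruction before it must stall, which is what "more instructions in flight" buys.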

Load/Store

- The three AGUs feed into the load/store unit, which can support two 256-bit reads and one 256-bit write per cycle.
- Not all three AGUs are equal, judging by the diagram: AGU2 can only handle stores, whereas AGU0 and AGU1 can do both loads and stores.
- The store queue has increased from 44 to 48 entries, and the TLBs for the data cache have also grown.
- The key metric here, though, is the load/store bandwidth: the core can now support 32 bytes per clock, up from 16.
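The asymmetric AGU ports can be sketched as a per-cycle assignment problem: a greedy scheduler that respects "AGU2 is store-only" shows why a mix of two loads and one store fills all three ports, while three loads cannot. The function and naming are a toy model of the constraint, not the real scheduler.

```python
# Sketch: assigning one cycle's memory ops to asymmetric AGU ports.
# Constraint from the text: AGU0/AGU1 do loads or stores, AGU2 does stores only.

def schedule_cycle(ops: list[str]) -> dict[str, str]:
    """Greedily assign this cycle's 'load'/'store' ops to AGUs; leftovers wait."""
    assignment: dict[str, str] = {}
    free_any = ["AGU0", "AGU1"]       # load or store
    free_store_only = ["AGU2"]        # store only
    for i, op in enumerate(ops):
        name = f"{op}{i}"
        if op == "store" and free_store_only:
            assignment[name] = free_store_only.pop()
        elif free_any:
            assignment[name] = free_any.pop(0)
        # anything unassigned must wait for the next cycle
    return assignment

print(schedule_cycle(["load", "load", "store"]))
# {'load0': 'AGU0', 'load1': 'AGU1', 'store2': 'AGU2'} - all three ports busy

print(schedule_cycle(["load", "load", "load"]))
# {'load0': 'AGU0', 'load1': 'AGU1'} - the third load cannot use AGU2
```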