ARM PROCESSING BASICS PPT FOR 4TH SEM ENGINEERING

prajwalshivaiah 32 views 63 slides Jul 12, 2024

About This Presentation

ARMPP


Slide Content

Unit - I ARM Processor Fundamentals

Introduction
ARM processors are a family of central processing units (CPUs) based on a reduced instruction set computer (RISC) architecture. ARM originally stood for Acorn RISC Machine and was later renamed Advanced RISC Machine.

History of ARM processors:
- x86 is an older architectural approach (CISC, complex instruction set computer); the first x86 CPU design launched in 1978.
- As microcomputers (PCs) evolved, combining high performance with a smaller design became a challenge.
- In the early 1980s, Acorn Computers designed microcomputers and ran into performance limitations with the available chip designs.
- Around 1981, a University of California, Berkeley project examined resource usage in computer chips.
- Processing units have certain predefined operations, collectively called the instruction set. Most programs used only a small subset of the instruction set.
- Reducing the number of predefined instructions, by cutting out complex, hard-to-implement and little-used instructions, lets the remaining simple instructions run faster and take up much less power and space on the chip. This is the RISC approach.

Dr. Shachi P, Dept. of ECE, BMSCE 2

History of ARM processors
- x86 (Intel) designs take a modular approach based on a motherboard with swappable components. The CPU and other components, such as graphics cards and GPUs, memory controllers, storage, or processing cores, are optimized for specific functions and can be easily swapped out or expanded.
- However, these hardware components typically form more homogenized system architectures, which can allow attackers to breach systems quickly with "write once, run anywhere" exploits.
- In an ARM-based processor, the CPU cores and other hardware functions (such as I/O bus controllers like peripheral component interconnect) are on the same physical platform, and all of the different functions are integrated through an internal bus. This is a system on chip (SoC).
- x86 chips are designed to optimize performance; ARM-based processors are designed to balance cost with smaller size, lower power consumption, lower heat generation, speed, and potentially longer battery life.

ARM Partnership Model

Microprocessors vs. Microcontrollers

| Microprocessors | Microcontrollers |
| --- | --- |
| A silicon chip representing a central processing unit (CPU), capable of performing arithmetic and logical operations according to a predefined set of instructions. | A highly integrated chip that contains a CPU, RAM, special- and general-purpose register arrays, on-chip ROM/flash memory for program storage, timer and interrupt control units, and dedicated I/O ports. |
| A dependent unit: it needs other chips such as timers, program and data memory chips, and interrupt controllers in order to function. | A self-contained unit: it does not need an external interrupt controller, timer, UART, etc. in order to function. |
| Mostly general purpose in design and operation. | Mostly application oriented or domain specific. |
| Does not contain a built-in I/O port. I/O port functionality has to be implemented with external programmable peripheral interface chips. | Most microcontrollers contain multiple built-in I/O ports, which can be operated as a single 8-, 16- or 32-bit port or as individual port pins. |
| Targeted at the high-end market, where performance is important. | Targeted at the embedded market, where performance is not so critical. |
| Limited power-saving options. | Includes many power-saving features. |

RISC vs. CISC Processors/Controllers

| RISC Processors/Controllers | CISC Processors/Controllers |
| --- | --- |
| Fewer instructions. | More instructions. |
| Instruction pipelining and increased execution speed. | Generally no instruction pipelining feature. |
| Operations are performed on registers only; the only memory operations are load and store. | Operations are performed on registers or memory, depending on the instruction. |
| A large number of registers are available. | Limited number of general-purpose registers. |
| The programmer needs to write more code to execute a task, since the instructions are simpler. | The programmer can achieve the desired functionality with a single instruction. |
| Single, fixed-length instructions. | Variable-length instructions. |
| Less silicon usage and pin count. | More silicon usage. |
| Harvard architecture. | Harvard or Von Neumann architecture. |

CISC vs. RISC - Instruction set
The terms CISC and RISC refer to design principles and techniques. In the comparison below, N is the number of instructions executed to complete a program, and S is the average number of basic steps needed to execute one instruction.

RISC: Reduced instruction set computers
- Simple instructions require a small number of basic steps to execute.
- For a processor that has only simple instructions, a large number of instructions may be needed to perform a given programming task. This could lead to a large value of N and a small value of S.
- It is much easier to implement efficient pipelining in processors with simple instruction sets.

CISC: Complex instruction set computers
- Complex instructions involve a large number of steps.
- If individual instructions perform more complex operations, fewer instructions will be needed, leading to a lower value of N and a larger value of S.
- Complex instructions combined with pipelining would achieve good performance.

Harvard vs. Von Neumann Processor/Controller Architecture

| Harvard Architecture (many ARM cores) | Von Neumann Architecture (x86) |
| --- | --- |
| Separate data bus and instruction bus, which allows data transfer and program fetching to occur simultaneously on both buses. | A single bus is shared for fetching both instructions and data. Program instructions and data are stored in a common main memory. |
| Separate buses for instruction and data fetching. | Single shared bus for instruction and data fetching. |
| Easier to pipeline, so high performance can be achieved. | Lower performance compared to the Harvard architecture. |
| Comparatively high cost. | Cheaper. |
| No memory alignment problems. | Allows self-modifying code. |

Note: the ARM7TDMI core covered later in this unit uses a Von Neumann architecture; the Harvard organisation appears in ARM9 and later cores.

Unit - I
Basic Structure of Computers: Von Neumann and Harvard Architecture, Basic Processing Unit, Bus Structure, RISC and CISC Architecture, RISC and ARM Design Philosophy, ARM Core Dataflow Model, Programming Model, Processor States and Operating Modes, ARM Pipeline.

Computer Types
Since their introduction in the 1940s, digital computers have evolved into many different types that vary widely in size, cost, computational power, and intended use. Modern computers can be divided roughly into four general categories:

1. Embedded computers are integrated into a larger device or system in order to automatically monitor and control a physical process or environment. They are used for a specific purpose rather than for general processing tasks. Examples: industrial and home automation, appliances, telecommunication products, and vehicles.

Computer Types
2. Personal computers have achieved widespread use in homes, educational institutions, and business and engineering office settings, primarily for dedicated individual use. They support a variety of applications such as general computation, document preparation, computer-aided design, audiovisual entertainment, interpersonal communication, and Internet browsing. A number of classifications are used for personal computers:
- Desktop computers serve general needs and fit within a typical personal workspace.
- Workstation computers offer higher computational capacity and more powerful graphical display capabilities for engineering and scientific work.
- Portable and notebook computers provide the basic features of a personal computer in a smaller, lightweight package. They can operate on batteries to provide mobility.

Computer Types
3. Servers and enterprise systems are large computers that are meant to be shared by a potentially large number of users, who access them from some form of personal computer over a public or private network. Such computers may host large databases and provide information processing for a government agency or a commercial organization.

4. Supercomputers and grid computers normally offer the highest performance. They are the most expensive and physically the largest category of computers. Supercomputers are used for the highly demanding computations needed in weather forecasting, engineering design and simulation, and scientific work.

Functional Units of a Computer
A computer consists of five functionally independent main parts:
- Input unit
- Memory unit
- Arithmetic and logic unit
- Output unit
- Control unit

Functional Units of a Computer
- The input unit accepts coded information from human operators using devices such as keyboards, or from other computers over digital communication lines.
- The information received is stored in the computer's memory, either for later use or to be processed immediately by the arithmetic and logic unit.
- The processing steps are specified by a program that is also stored in the memory.
- Finally, the results are sent back to the outside world through the output unit.
- All of these actions are coordinated by the control unit.
- An interconnection network provides the means for the functional units to exchange information and coordinate their actions.

Functional Units of a Computer
The information handled by a computer is categorized as either instructions or data.

Instructions, or machine instructions, are explicit commands that:
- govern the transfer of information within a computer as well as between the computer and its I/O devices;
- specify the arithmetic and logic operations to be performed.

A program is a list of instructions that performs a task. Programs are stored in the memory. The processor fetches the program instructions from the memory, one after another, and performs the desired operations. The computer is controlled by the stored program, except for possible external interruption by an operator or by I/O devices connected to it.

Data are numbers and characters that are used as operands by the instructions. Data are also stored in the memory. The instructions and data handled by a computer must be encoded in a suitable format.

Functional Units of a Computer: Memory Unit
The function of the memory unit is to store programs and data. There are two classes of storage, called primary and secondary.

Primary Memory
- Primary memory (main memory) is a fast memory that operates at electronic speeds. Programs must be stored in this memory while they are being executed.
- It consists of semiconductor storage cells, each capable of storing one bit of information.
- These cells are handled in groups of fixed size called words.
- The memory is organized so that one word can be stored or retrieved in one basic operation.
- The number of bits in each word is the word length (typically 16, 32, or 64 bits).

Functional Units of a Computer: Memory Unit
- To provide easy access to any word in the memory, a distinct address is associated with each word location. Addresses are consecutive numbers, starting from 0, that identify successive locations.
- A memory in which any location can be accessed in a short and fixed amount of time after specifying its address is called a random-access memory (RAM).
- The time required to access one word is called the memory access time. This time is independent of the location of the word being accessed. It typically ranges from a few nanoseconds (ns) to about 100 ns for current RAM units.

Functional Units of a Computer: Cache Memory
- As an adjunct to the main memory, a smaller, faster RAM unit, called a cache, is used to hold sections of a program that are currently being executed, along with any associated data.
- The cache is tightly coupled with the processor and is usually contained on the same integrated-circuit chip.
- The purpose of the cache is to facilitate high instruction execution rates.
- As execution proceeds, instructions are fetched into the processor chip, and a copy of each is placed in the cache.
- When the required data are located in the main memory, the data are fetched and copies are also placed in the cache.

Functional Units of a Computer: Secondary Storage
- Although primary memory is essential, it tends to be expensive and does not retain information when power is turned off.
- Thus additional, less expensive, permanent secondary storage is used when large amounts of data and many programs have to be stored, particularly for information that is accessed infrequently.
- Access times for secondary storage are longer than for primary memory.
- Examples: magnetic disks, optical disks (DVD and CD), and flash memory devices.

Video: https://www.youtube.com/watch?v=7J7X7aZvMXQ (up to 3:17)

Functional Units of a Computer
Registers
- When operands are brought into the processor, they are stored in high-speed storage elements called registers. Each register can store one word of data.
- Access times to registers are even shorter than access times to the cache unit on the processor chip.

Control Unit
- Control circuits are responsible for generating the timing signals that govern the transfers and determine when a given action is to take place.
- Data transfers between the processor and the memory are also managed by the control unit through timing signals.

The Basic Operational Concepts of a Computer
To perform a given task, an appropriate program consisting of a list of instructions is stored in the memory. Individual instructions are brought from the memory into the processor, which executes the specified operations. Data to be used as operands are also stored in the memory.

Example:
    Add LOCA, R0
This instruction adds the operand at memory location LOCA to the operand in register R0 and places the sum into register R0. It requires several steps:
1. The instruction is fetched from the memory into the processor.
2. The operand at LOCA is fetched and added to the contents of R0.
3. The resulting sum is stored in register R0.

The preceding Add instruction combines a memory access operation with an ALU operation. In some other types of computers, these two operations are performed by separate instructions for performance reasons:
    Load LOCA, R1
    Add R1, R0

Transfers between the memory and the processor are started by sending the address of the memory location to be accessed to the memory unit and issuing the appropriate control signals. The data are then transferred to or from the memory.
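The difference between the combined Add and the separate Load/Add pair can be sketched in a few lines of Python. This is only an illustrative model: the dictionaries standing in for memory and registers, the function names, and the starting values (LOCA = 25, R0 = 10) are assumptions, not part of the slides.

```python
# Hypothetical model: memory and registers are plain dictionaries.

def add_memory_to_register(memory, registers, location, reg):
    """Add LOCA, R0: one instruction does the memory access and the ALU operation."""
    registers[reg] = registers[reg] + memory[location]

def load(memory, registers, location, reg):
    """Load LOCA, R1: memory access only."""
    registers[reg] = memory[location]

def add_registers(registers, src, dst):
    """Add R1, R0: ALU operation on registers only."""
    registers[dst] = registers[dst] + registers[src]

memory = {"LOCA": 25}  # illustrative operand value

# Single combined instruction.
single = {"R0": 10, "R1": 0}
add_memory_to_register(memory, single, "LOCA", "R0")

# Same work split into a load followed by a register-to-register add.
split = {"R0": 10, "R1": 0}
load(memory, split, "LOCA", "R1")
add_registers(split, "R1", "R0")

print(single["R0"], split["R0"])
```

Both sequences leave the same sum in R0; the split version additionally uses R1 as a temporary, which is the price of keeping memory accesses and ALU operations in separate instructions.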

Connections between the processor and the memory
Besides the IR and PC, there are n general-purpose registers, R0 through Rn-1.
- Memory Address Register (MAR): holds the address of the location to be accessed.
- Memory Data Register (MDR): contains the data to be written into or read out of the addressed location.
- Instruction Register (IR): holds the instruction that is currently being executed. Its output is available to the control circuits, which generate the timing signals that control the various processing elements involved in executing the instruction.
- Program Counter (PC): another specialized register that keeps track of the execution of a program. It contains the memory address of the next instruction to be fetched and executed.

Operating steps for program execution
- Execution of the program (stored in memory) starts when the PC is set to point to the first instruction of the program.
- The contents of the PC are transferred to the MAR and a Read control signal is sent to the memory.
- The addressed word is read out of the memory and loaded into the MDR.
- Next, the contents of the MDR are transferred to the IR. At this point, the instruction is ready to be decoded and executed.
- If the instruction involves an operation to be performed by the ALU, it is necessary to obtain the required operands.
- If an operand resides in memory (it could also be in a general-purpose register in the processor), it has to be fetched by sending its address to the MAR and initiating a Read cycle.
- When the operand has been read from the memory into the MDR, it is transferred from the MDR to the ALU.
- After one or more operands are fetched in this way, the ALU can perform the desired operation.
- If the result of the operation is to be stored in the memory, the result is entered into the MDR.

Operating steps for program execution (contd.)
- The address of the location where the result is to be stored is sent to the MAR, and a Write cycle is initiated.
- At some point during the execution of the current instruction, the contents of the PC are incremented so that the PC points to the next instruction to be executed. Thus, as soon as the execution of the current instruction is completed, a new instruction fetch may be started.
- In addition to transferring data between the memory and the processor, the computer accepts data from input devices and sends data to output devices. Thus, some machine instructions with the ability to handle I/O transfers are provided.
- Normal execution of a program may be preempted (temporarily interrupted) if some device requires urgent servicing. To request this, the device raises an interrupt signal.
- An interrupt is a request signal from an I/O device for service by the processor. The processor provides the requested service by executing an appropriate interrupt service routine.
- The diversion may change the internal state of the processor, so its state must be saved in memory before the interrupt is serviced.
- When the interrupt service routine is completed, the state of the processor is restored so that the interrupted program may continue.
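The fetch, operand-read and result-write steps can be traced with a small Python model. It is a sketch under stated assumptions: the `Machine` class, the tuple encoding of instructions, the `ADD`/`STORE` operation names and the memory layout are all invented for illustration, and interrupts are left out.

```python
class Machine:
    """Toy processor with PC, MAR, MDR, IR and one register (hypothetical model)."""

    def __init__(self, memory, r0=0, word_size=4):
        self.memory = memory          # address -> instruction tuple or data word
        self.pc = 0                   # address of the next instruction
        self.mar = None               # memory address register
        self.mdr = None               # memory data register
        self.ir = None                # instruction register
        self.regs = {"R0": r0}
        self.word_size = word_size

    def read(self, address):
        self.mar = address                  # address goes to the MAR
        self.mdr = self.memory[self.mar]    # Read cycle loads the MDR
        return self.mdr

    def write(self, address, value):
        self.mar = address                  # address goes to the MAR
        self.mdr = value                    # result is entered into the MDR
        self.memory[self.mar] = self.mdr    # Write cycle

    def step(self):
        self.ir = self.read(self.pc)        # fetch: PC -> MAR, MDR -> IR
        self.pc += self.word_size           # PC now points to the next instruction
        op, address, reg = self.ir
        if op == "ADD":                     # memory operand added to a register
            self.regs[reg] += self.read(address)
        elif op == "STORE":                 # register written back to memory
            self.write(address, self.regs[reg])


memory = {
    0: ("ADD", 100, "R0"),
    4: ("STORE", 104, "R0"),
    100: 7,
    104: 0,
}
m = Machine(memory, r0=5)
m.step()
m.step()
print(m.regs["R0"], m.memory[104], m.pc)
```

After the two steps the sum has been written to address 104, and the MAR and MDR still hold the address and data of that last Write cycle.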

Bus Structures
Bus: a group of lines (wires) that serves as a connecting path for several devices of a computer. The main types of bus are:
1. Address bus
2. Data bus
3. Control bus

- The data bus carries (transfers) data from one component (the source) to another component (the destination) connected to it. It consists of 8, 16, 32 or more parallel signal lines. The data bus lines are bidirectional: the CPU can read data on these lines from memory or from a port, and can also send data out on these lines to a memory location.
- The address bus is the set of lines that carries the address of the memory location to or from which the data is to be transferred. It is a unidirectional bus consisting of 16, 20, 24 or more parallel signal lines. On these lines the CPU sends out the address of the memory location.
- The control bus carries the control and timing information.
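The address bus widths mentioned above translate directly into the number of locations the CPU can address, since n address lines can carry 2^n distinct addresses. A minimal Python sketch (the function name is an assumption for illustration):

```python
def addressable_locations(address_lines: int) -> int:
    """Number of distinct addresses that fit on a bus with the given line count."""
    return 2 ** address_lines

for lines in (16, 20, 24):
    print(lines, "lines ->", addressable_locations(lines), "locations")
```

With one byte per location, 16 lines give 64 KB, 20 lines give 1 MB and 24 lines give 16 MB of addressable memory.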

Bus Structures
The following are the other types of bus:
- System bus: usually a combination of the address bus, data bus, and control bus.
- Internal bus: a bus that operates only within the internal circuitry of the CPU.
- External bus: a bus that connects the computer to external devices.
- I/O bus: the bus used by I/O devices to communicate with the CPU.
- Synchronous bus: data transmission between the source and destination units takes place in a given time slot that is already known to these units.

Bus Structures
- Asynchronous bus: data transmission is governed by handshaking control signals rather than by a fixed time slot.
- Handshaking (either software codes or hardware signals) is used to halt transmission of data from the sending computer until the receiving computer has emptied its buffer.
- Handshaking is an I/O control method used to synchronize I/O devices with the microprocessor. Because many I/O devices accept or release information at a much slower rate than the microprocessor, this method makes the microprocessor work with an I/O device at that device's data transfer rate.

The Bus Interconnection Scheme
- A bus is a connecting path for several devices of a computer.
- In addition to the lines that carry the data, the bus must have lines for address and control purposes.

Single bus structure
- The simplest way to interconnect functional units is to use a single bus, to which all units are connected.
- The bus can be used for only one transfer at a time.
- Bus control lines are used to arbitrate multiple requests for use of the bus.

Advantage: low cost and flexibility for attaching peripheral devices.

Disadvantages:
- Low performance, because only one transfer can take place at a time.
- Scalability: as computer systems become more complex and require higher bandwidth for data transfer, a single bus structure may struggle to scale efficiently.
- Contention: contention for the bus can occur when multiple components attempt to access it simultaneously, leading to delays and potential performance issues.

Traditional / Multiple bus structure
- Advantages: better performance, scalable, less contention.
- Disadvantage: increased cost and complexity.

Traditional / Multiple bus structure
- A local bus connects the processor to the cache memory and may support one or more local devices.
- A cache memory controller connects the cache not only to this local bus but also to the system bus.
- The main memory modules are attached to the system bus.
- I/O transfers to and from the main memory across the system bus do not interfere with the processor's activity.
- An expansion bus interface buffers data transfers between the system bus and the I/O controllers on the expansion bus.
- I/O devices that might be attached to the expansion bus include network cards (LAN), SCSI (Small Computer System Interface) devices, modems, etc.

Basic Processing Unit
- A computing task consists of a series of operations specified by a sequence of machine-language instructions that constitute a program.
- The processor fetches one instruction at a time and performs the operation specified.
- Instructions are fetched from successive memory locations until a branch or a jump instruction is encountered.
- The processor uses the program counter, PC, to keep track of the address of the next instruction to be fetched and executed.
- After fetching an instruction, the contents of the PC are updated to point to the next instruction in sequence. A branch instruction may cause a different value to be loaded into the PC.
- When an instruction is fetched, it is placed in the instruction register, IR, from where it is interpreted, or decoded, by the processor's control circuitry. The IR holds the instruction until its execution is completed.
- The discussion that follows considers a 32-bit RISC-style instruction set architecture.

Basic Processing Unit
To execute an instruction, the processor has to perform the following steps:
1. Fetch the contents of the memory location pointed to by the PC. The contents of this location are the instruction to be executed; hence they are loaded into the IR. In register transfer notation, the required action is
       IR ← [[PC]]
2. Increment the PC to point to the next instruction. Assuming that the memory is byte addressable, the PC is incremented by 4; that is
       PC ← [PC] + 4
3. Carry out the operation specified by the instruction in the IR.
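The two register transfers IR ← [[PC]] and PC ← [PC] + 4 can be sketched against a byte-addressable memory in Python. The little-endian byte order, the `fetch` function name and the two instruction words are assumptions made for this illustration; the slides do not specify them.

```python
WORD_BYTES = 4  # 32-bit instructions occupy four consecutive byte addresses

def fetch(memory: bytearray, pc: int):
    """Return (IR, new PC): IR <- [[PC]], PC <- [PC] + 4."""
    ir = int.from_bytes(memory[pc:pc + WORD_BYTES], "little")  # assumed byte order
    return ir, pc + WORD_BYTES

# Byte-addressable memory holding two arbitrary 32-bit words.
memory = bytearray(16)
memory[0:4] = (0x11223344).to_bytes(4, "little")
memory[4:8] = (0x55667788).to_bytes(4, "little")

pc = 0
ir_first, pc = fetch(memory, pc)
ir_second, pc = fetch(memory, pc)
print(hex(ir_first), hex(ir_second), pc)
```

The PC advances by 4 rather than 1 because each address names a byte while each instruction is four bytes wide.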

Basic Processing Unit
The operation specified by an instruction can be carried out by performing one or more of the following actions:
- Read the contents of a given memory location and load them into a processor register.
- Read data from one or more processor registers.
- Perform an arithmetic or logic operation and place the result into a processor register.
- Store data from a processor register into a given memory location.

The processor communicates with the memory through the processor-memory interface, which transfers data from and to the memory during Read and Write operations. The instruction address generator updates the contents of the PC after every instruction is fetched. The register file is a memory unit whose storage locations are organized to form the processor's general-purpose registers.

Basic Processing Unit
- During execution, the contents of the registers named in an instruction that performs an arithmetic or logic operation are sent to the arithmetic and logic unit (ALU), which performs the required computation.
- The results of the computation are stored in a register in the register file.
- The clock period, which is the time between two successive rising edges, must be long enough to allow the combinational circuit to produce the correct result.

RISC and ARM Design Philosophy

The RISC Design Philosophy
- Instructions: reduced in number and simpler.
- Pipeline.
- Registers: a large number of general-purpose registers (storing data or addresses).
- Load/store architecture: any data in memory that is to be processed is first moved to one or more registers and then processed.

ARM Design Philosophy
- Power efficiency
- High code density
- Memory footprint / die area
- Hardware debug technology

The RISC Design Philosophy

Nomenclature: ARM7TDMI-S

ARM7TDMI Features
- 32-bit data bus / ALU
- 32-bit instructions / address bus
- Aligned memory
- Von Neumann architecture
- 3-stage pipeline
- 37 registers, 32 bits each
- Load-store model
- 7 operating modes
- 7 exceptions
- 7 addressing modes
- 3 data formats

ARM ISA Features
The ARM ISA differs from pure RISC in the following ways:
- Variable execution cycles for certain instructions
- In-line barrel shifter, leading to more complex instructions
- Thumb instruction set
- Conditional execution
- Enhanced instructions with DSP extensions

Data Sizes and Instruction Sets
The ARM is a 32-bit architecture. When used in relation to the ARM:
- Byte means 8 bits
- Halfword means 16 bits (two bytes)
- Word means 32 bits (four bytes)

Most ARM cores implement two instruction sets:
- 32-bit ARM instruction set
- 16-bit Thumb instruction set

Jazelle cores can also execute Java bytecode.

ARM core Dataflow model
Example instructions:
    MOVS r7, r5, LSL #2
    MLA{<cond>}{S} R0, R1, R2, R3
    LDR r0, [r1, #4]!
    STRH r0, [r1, #0x4]!
    LDRSB r0, [r1]
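The first example, MOVS r7, r5, LSL #2, passes r5 through the barrel shifter before the result is written to r7 and the flags are updated. A rough Python model of that one instruction is sketched below; the function name, the restriction to shift amounts 1 to 31, and the sample register value are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF

def movs_lsl(rm: int, shift: int):
    """Model MOVS Rd, Rm, LSL #shift for 1 <= shift <= 31.

    Returns (result, flags). V is not listed because MOVS leaves it unchanged.
    """
    if not 1 <= shift <= 31:
        raise ValueError("this sketch only handles shift amounts 1..31")
    rm &= MASK32
    result = (rm << shift) & MASK32          # barrel shifter output, kept to 32 bits
    flags = {
        "N": (result >> 31) & 1,             # sign bit of the result
        "Z": int(result == 0),               # result is zero
        "C": (rm >> (32 - shift)) & 1,       # last bit shifted out of Rm
    }
    return result, flags

r7, flags = movs_lsl(5, 2)   # r5 = 5 is an illustrative value
print(r7, flags)
```

A left shift by 2 multiplies by 4, so the shifter gives the ALU a scaled operand without needing a separate multiply instruction.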

Registers
What is a register?
- A data holding place that is part of the computer processor.
- A high-speed memory storage unit.
- A memory location that can be accessed by the CPU directly.

Difference between memory and register
- A register holds the instruction or data that the CPU is currently processing.
- Memory stores the data and instructions that the processor may require during operation.

Registers (contd.)
ARM has 37 registers, all 32 bits long:
- 1 dedicated program counter
- 1 dedicated current program status register (CPSR)
- 5 dedicated saved program status registers (SPSR)
- 30 general-purpose registers

Of these 37, at most 18 are active at any one time:
- 16 data registers (r0-r15), which hold either data or an address
- 2 processor status registers (cpsr and spsr)

Three of the data registers have special roles:
- r13: stack pointer
- r14: link register
- r15: program counter

Registers (contd.)
- Register r13 is used as the stack pointer (sp). It stores the head of the stack in the current processor mode.
- Register r14 is the link register (lr). The core puts the return address here whenever it calls a subroutine.
- Register r15 is the program counter (pc). It holds the address of the next instruction to be fetched by the processor.

These registers are distributed across several register banks; their usage depends on the mode in which the ARM processor is operating.

Banked Registers
- Banked registers are registers that are hidden from a program at different times. They are identified by the shading in the register diagram.
- They are available only when the processor is in a particular mode.
- The mode can be selected by writing directly to the mode bits of the cpsr (the core must be in a privileged mode).
- The mode can also be changed by hardware when the core responds to an exception or interrupt.
- A banked register maps one-to-one onto a user mode register.
- If the processor mode is changed, a banked register from the new mode replaces an existing register.
- The saved program status register (SPSR) stores the value the CPSR had when an exception was taken, so that the CPSR can be restored after the exception has been handled.
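The one-to-one mapping of banked registers onto user mode registers can be sketched as a lookup table in Python. The table follows the usual ARM7TDMI arrangement (r8-r14 banked in FIQ mode, r13-r14 banked in the other exception modes); the mode abbreviations, the `r13_irq`-style names and the function names are conventions assumed for this sketch.

```python
# Registers that each exception mode replaces with its own private copy.
BANKED = {
    "fiq": ["r8", "r9", "r10", "r11", "r12", "r13", "r14"],
    "irq": ["r13", "r14"],
    "svc": ["r13", "r14"],
    "abt": ["r13", "r14"],
    "und": ["r13", "r14"],
}

def physical_register(mode: str, name: str) -> str:
    """Map a register name used by a program to the physical register for that mode."""
    if name in BANKED.get(mode, []):
        return f"{name}_{mode}"   # banked copy hides the user mode register
    return name                   # shared with user mode

def total_registers() -> int:
    shared = 16 + 1                                   # r0-r15 plus the cpsr
    banked = sum(len(regs) for regs in BANKED.values())
    spsrs = len(BANKED)                               # one spsr per exception mode
    return shared + banked + spsrs

print(physical_register("irq", "r13"), physical_register("irq", "r0"), total_registers())
```

Counting the table this way gives 37 registers in total, of which a program sees only the set selected by the current mode.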

Exceptions and interrupts
Exceptions and interrupts suspend the normal execution of sequential instructions and jump to a specific location. The following exceptions and interrupts cause a mode change:
- reset
- interrupt request
- fast interrupt request
- software interrupt
- data abort
- prefetch abort
- undefined instruction

When the core enters interrupt request mode, a new register appears: the saved program status register (spsr), which stores the previous mode's cpsr.
- The spsr can only be modified and read in a privileged mode. There is no spsr available in user mode.
- The cpsr is not copied into the spsr when a mode change is forced by a program writing directly to the cpsr. The saving of the cpsr only occurs when an exception or interrupt is raised.

Current Program Status Register
- Used to monitor and control internal operations.
- A 32-bit register that resides in the register file.
- The CPSR is divided into four fields, each 8 bits wide: flags, status, extension, and control.
- The control field contains the processor mode, state, and interrupt mask bits.
- The flags field contains the condition flags.
- Some ARM processor cores have extra bits allocated. The J bit, in the flags field, is used in Jazelle-enabled processors.
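A CPSR value can be unpacked into its flag, mask, state and mode bits with a few shifts. The sketch below uses the bit positions defined by the ARM architecture (N, Z, C, V, Q at bits 31 to 27, J at bit 24, I, F, T at bits 7 to 5, mode in bits 4 to 0), which are not spelled out on the slide; the function name, the mode abbreviations and the output format (uppercase letter for a set bit, lowercase for a clear bit) are choices made for illustration.

```python
MODES = {
    0b10000: "USER",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "SVC",
    0b10111: "ABT",
    0b11011: "UND",
    0b11111: "SYS",
}

# (letter, bit position) for the single-bit fields of the cpsr.
BITS = [("N", 31), ("Z", 30), ("C", 29), ("V", 28), ("Q", 27),
        ("J", 24), ("I", 7), ("F", 6), ("T", 5)]

def describe_cpsr(cpsr: int) -> str:
    """Uppercase letter means the bit is 1, lowercase means it is 0."""
    letters = "".join(
        name if (cpsr >> bit) & 1 else name.lower() for name, bit in BITS
    )
    mode = MODES.get(cpsr & 0x1F, "UNKNOWN")   # control field, bits 4..0
    return f"{letters}_{mode}"

print(describe_cpsr(0x000000D3))
print(describe_cpsr(0x60000010))
```

The value 0xD3 corresponds to supervisor mode with both interrupt masks set and the core in ARM state, so only the control field carries any set bits.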

Interrupt Masks
- Used to stop specific interrupt requests from interrupting the processor.
- There are two interrupt request levels in the ARM core: Interrupt Request (IRQ) and Fast Interrupt Request (FIQ).
- The CPSR has two interrupt mask bits:
  - I: when set to 1, it masks requests made by IRQ.
  - F: when set to 1, it masks requests made by FIQ.

Condition Flags
There are four condition flags in the ARM7TDMI, held in the CPSR:
- N: Negative flag (result is negative)
- Z: Zero flag
- C: Carry flag
- V: Overflow flag

Condition Flags: Updated by comparisons and by the results of ALU operations. A data processing instruction updates the flags only if it carries the S suffix; comparison instructions such as CMP always update them. E.g., the SUBS instruction sets Z=1 if the result is zero. Q flag: used in cores with DSP extensions; it indicates an overflow or saturation caused by an enhanced DSP instruction. It is a sticky flag: only hardware can set it, and it is cleared by writing to the CPSR directly. ARM instructions support conditional execution, which is based on the values stored in the condition flags [ref. table on the next slide]. Note 1: a capital letter means the bit is 1; a lower-case letter means the bit is 0. Figure: CPSR with both Jazelle and DSP extensions set. Note 2: for condition flags, a capital letter indicates that the flag is set; for interrupts, a capital letter indicates that the interrupt is disabled (masked).
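
To make the flag updates concrete, here is a hedged Python sketch of what a SUBS-style subtraction does to N, Z, C, and V on 32-bit operands, together with the capital/lower-case notation from Note 1. The helpers `subs` and `flag_string` are hypothetical, and the carry rule (C means "no borrow" after a subtraction) follows the standard ARM convention rather than anything stated on the slide.

```python
MASK32 = 0xFFFFFFFF


def subs(a, b):
    """Model a 32-bit subtract that also updates the condition flags."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    flags = {
        "N": bool(result >> 31),                     # sign bit of the result
        "Z": result == 0,
        "C": a >= b,                                 # set when no borrow occurs
        "V": bool(((a ^ b) & (a ^ result)) >> 31),   # signed overflow
    }
    return result, flags


def flag_string(flags):
    """Capital letter when the flag is 1, lower case when it is 0."""
    return "".join(n if flags[n] else n.lower() for n in "NZCV")


_, equal = subs(5, 5)
_, borrow = subs(1, 2)
_, overflow = subs(0x80000000, 1)
notation = [flag_string(f) for f in (equal, borrow, overflow)]
# notation == ["nZCv", "Nzcv", "nzCV"]
```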

Conditional Execution: Controls whether or not the core will execute an instruction. Before execution, the processor compares the instruction's condition attribute with the condition flags in the CPSR. If they match, the instruction is executed; if not, the instruction is ignored. The condition attribute is postfixed to the instruction mnemonic [refer to the table]. If no condition attribute is present, the default is AL (always).
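
The comparison step can be sketched in Python as a lookup from condition attribute to a test on the flags. The referenced table is not reproduced in this transcript, so the entries below are the standard ARM condition codes, stated here as an assumption; `CONDITIONS` and `should_execute` are hypothetical names.

```python
# Each condition attribute maps to a test on the N, Z, C, V flags.
CONDITIONS = {
    "EQ": lambda f: f["Z"],
    "NE": lambda f: not f["Z"],
    "CS": lambda f: f["C"],
    "CC": lambda f: not f["C"],
    "MI": lambda f: f["N"],
    "PL": lambda f: not f["N"],
    "VS": lambda f: f["V"],
    "VC": lambda f: not f["V"],
    "HI": lambda f: f["C"] and not f["Z"],
    "LS": lambda f: (not f["C"]) or f["Z"],
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GT": lambda f: (not f["Z"]) and f["N"] == f["V"],
    "LE": lambda f: f["Z"] or f["N"] != f["V"],
    "AL": lambda f: True,
}


def should_execute(condition, flags):
    """Return True if an instruction with this condition attribute runs."""
    return bool(CONDITIONS[condition or "AL"](flags))   # no attribute -> AL


# Flags as they would be after comparing two equal values.
after_equal_compare = {"N": False, "Z": True, "C": True, "V": False}
runs_addeq = should_execute("EQ", after_equal_compare)
runs_addne = should_execute("NE", after_equal_compare)
runs_plain_add = should_execute("", after_equal_compare)
```

With these flags, an instruction carrying EQ would execute, one carrying NE would be ignored, and one with no attribute always executes.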

On power-up, the processor operates in supervisor mode by default, which is a privileged mode.

Processor Modes: The processor mode determines which registers are active and the access rights to the cpsr register itself. Each processor mode is either privileged or nonprivileged. A privileged mode allows full read-write access to the cpsr. A nonprivileged mode allows only read access to the control field of the cpsr but still allows read-write access to the condition flags.
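
The access rule can be modelled with a few lines of Python. This is a simplified sketch: it assumes user mode (encoding 0b10000) is the only nonprivileged mode and that a nonprivileged write may change only the flags field (bits 31 to 24); `write_cpsr` is a hypothetical helper, not an ARM instruction.

```python
USER_MODE = 0b10000          # assumed encoding of the nonprivileged mode
FLAGS_FIELD = 0xFF000000     # bits 31..24, writable from any mode
MASK32 = 0xFFFFFFFF


def write_cpsr(cpsr, value):
    """Return the cpsr that results from a program writing `value` to it."""
    privileged = (cpsr & 0x1F) != USER_MODE
    if privileged:
        return value & MASK32                     # full read-write access
    # Nonprivileged: keep everything except the flags field unchanged.
    return (cpsr & ~FLAGS_FIELD & MASK32) | (value & FLAGS_FIELD)


from_user = write_cpsr(0x00000010, 0xF00000D3)    # tries to change mode too
from_svc = write_cpsr(0x00000013, 0xF00000D0)
```

From user mode only the flag bits change (`from_user` is `0xF0000010`), while the same kind of write from supervisor mode replaces the whole register.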

State and Instruction Sets: The state defines which instruction set is being executed and is selected using the T and J bits of the CPSR register. There are three states. ARM: the default state, selected when T=J=0; 32-bit ARM instructions are executed. Thumb: selected when T=1; 16-bit Thumb instructions are executed. Jazelle: selected when J=1; the 8-bit Jazelle instruction set is selected, which is used to execute Java bytecodes. The state can be changed by executing a branch instruction (for example, BX switches between ARM and Thumb).
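
A minimal Python sketch of the state selection, assuming the standard bit positions (T = bit 5, J = bit 24). The function `instruction_state` is hypothetical, and the combination T=1 with J=1 is rejected here simply because the slide does not describe it.

```python
T_BIT = 1 << 5    # Thumb state bit (assumed position)
J_BIT = 1 << 24   # Jazelle state bit (assumed position)


def instruction_state(cpsr):
    """Return (state name, instruction width in bits) for a cpsr value."""
    t = bool(cpsr & T_BIT)
    j = bool(cpsr & J_BIT)
    if t and j:
        raise ValueError("T=1 with J=1 is not covered by this sketch")
    if j:
        return ("Jazelle", 8)
    if t:
        return ("Thumb", 16)
    return ("ARM", 32)          # default state: T = J = 0


arm_state = instruction_state(0x00000013)
thumb_state = instruction_state(0x00000033)
jazelle_state = instruction_state(0x01000013)
```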

Pipelining in ARM7TDMI: ARM devices use pipelining as part of the RISC approach, which keeps the hardware simple and puts the emphasis on compiler complexity. Each stage takes one cycle, so n stages take n cycles. The ARM7 uses a 3-stage pipeline. Pipelining speeds up execution: the next instruction is fetched while the earlier instructions are being decoded and executed. The pipeline stages are FETCH (loads an instruction from memory into the instruction pipeline), DECODE (identifies the instruction to be executed), and EXECUTE (processes the instruction and writes the result back to a register).

Pipelining in ARM7TDMI: Three instructions are in the pipeline; instructions are placed in the pipeline sequentially. Cycle 1: the core fetches the ADD instruction from memory and puts it in the instruction pipeline. Cycle 2: the core fetches the SUB instruction and decodes the ADD instruction. Cycle 3: the core fetches the CMP instruction, decodes the SUB instruction, and executes the ADD instruction. This procedure is called filling the pipeline. Once filled, the pipeline allows the core to complete an instruction every cycle: latency is 3 cycles, but throughput is one instruction per cycle.
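
The cycle-by-cycle filling can be reproduced with a small Python simulation. It is an idealised sketch that assumes every stage takes exactly one cycle and that nothing stalls or flushes the pipeline (no branches, no memory wait states); `pipeline_trace` is a hypothetical helper name.

```python
STAGES = ("fetch", "decode", "execute")


def pipeline_trace(program):
    """Return, for each cycle, which instruction occupies each stage."""
    trace = []
    total_cycles = len(program) + len(STAGES) - 1
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth            # instruction that is `depth` stages in
            row[stage] = program[index] if 0 <= index < len(program) else None
        trace.append(row)
    return trace


trace = pipeline_trace(["ADD", "SUB", "CMP"])
for number, row in enumerate(trace, start=1):
    print(number, row)
```

The third row of the trace matches Cycle 3 above (CMP in fetch, SUB in decode, ADD in execute), and the three instructions finish in five cycles in total rather than nine.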

EXTRAS

Barrel Shifter: A barrel shifter is a digital circuit that can shift a binary number by a specified number of bits in one clock cycle. It can be implemented with a combination of multiplexers. There are two types: arithmetic and logical shifters. A few examples of barrel shifter applications follow. In digital signal processing, barrel shifters are used to perform fast multiplication and division by powers of two; for example, in an FIR filter implementation, a barrel shifter can be used to shift the filter coefficients based on the filter order. In cryptography, barrel shifters are used to perform bitwise operations during encryption and decryption; for example, a barrel shifter can perform a circular shift on a binary value as one step of an encryption algorithm. In microprocessor architectures, barrel shifters are used to shift the contents of registers, allowing for efficient data manipulation; for example, in the ARM architecture, the barrel shifter performs shift and rotate operations on the contents of registers.
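
The shift and rotate operations can be mimicked in Python on 32-bit values. This is a behavioural sketch only (it says nothing about the multiplexer implementation), it assumes shift amounts from 0 to 31, and the function names simply borrow the common ARM mnemonics LSL, LSR, ASR, and ROR.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, n):
    """Logical shift left: zeros enter from the right."""
    return (value << n) & MASK32


def lsr(value, n):
    """Logical shift right: zeros enter from the left."""
    return (value & MASK32) >> n


def asr(value, n):
    """Arithmetic shift right: the sign bit is copied in from the left."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32          # reinterpret as a negative number
    return (value >> n) & MASK32


def ror(value, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    value &= MASK32
    n %= 32
    return ((value >> n) | (value << (32 - n))) & MASK32


times_16 = lsl(3, 4)                    # 3 * 2**4
sign_kept = asr(0x80000000, 4)          # top bits fill with ones
rotated = ror(0x12345678, 8)            # low byte moves to the top
```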

Extras: Load-Store Architecture. A load-store architecture is a type of computer architecture in which all data processing operations (such as arithmetic, logical, and control operations) are performed only on data that has been loaded from memory into registers, and the results are stored back into memory. In other words, the only operations that directly access memory are load and store operations. A CISC machine typically has a register-memory architecture, in which operands may come from registers or from memory; a RISC machine, by contrast, has a register-register (load-store) architecture.
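
A toy interpreter in Python can illustrate the load-store discipline: only `LDR` and `STR` touch memory, while `ADD` works purely on registers. The tuple-based instruction format, the four-register file, and the function `run` are all hypothetical simplifications, not real ARM encodings.

```python
def run(program, memory):
    """Execute a tiny load-store program; returns the final register values."""
    regs = [0] * 4
    for op, *args in program:
        if op == "LDR":                    # load: memory -> register
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "STR":                  # store: register -> memory
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "ADD":                  # data processing: registers only
            rd, rn, rm = args
            regs[rd] = (regs[rn] + regs[rm]) & 0xFFFFFFFF
        else:
            raise ValueError(f"unknown operation: {op}")
    return regs


memory = {0x100: 7, 0x104: 5, 0x108: 0}
program = [
    ("LDR", 0, 0x100),     # r0 = mem[0x100]
    ("LDR", 1, 0x104),     # r1 = mem[0x104]
    ("ADD", 2, 0, 1),      # r2 = r0 + r1
    ("STR", 2, 0x108),     # mem[0x108] = r2
]
regs = run(program, memory)
```

Adding two values held in memory takes four instructions here, whereas a register-memory machine could name a memory operand directly in the add; the trade-off is that each load-store instruction stays simple.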