ARM_InstructionSet.pdf; For VTU 22 regulation course code BCS402

DrBAMASCSE 107 views 100 slides Jul 07, 2024

About This Presentation

As a faculty member, I prepared this PDF file using the book ARM System Developer's Guide: Designing and Optimizing System Software.


Slide Content

ARM Instruction Set
Computer Organization and Assembly Languages
Yung-Yu Chuang, with slides by Peng-Sheng Chen

Introduction
• The ARM processor is easy to program at the assembly level. (It is a RISC.)
• We will learn ARM assembly programming at the user level and run it on a GBA emulator.

ARM programmer model
• The state of an ARM system is determined by the content of visible registers and memory.
• A user-mode program can see 15 32-bit general-purpose registers (R0-R14), the program counter (PC) and the CPSR.
• The instruction set defines the operations that can change the state.

Memory system
• Memory is a linear array of bytes addressed from 0 to 2^32-1
• Word, half-word, byte
• Little-endian

address      data
0x00000000   00
0x00000001   10
0x00000002   20
0x00000003   30
0x00000004   FF
0x00000005   FF
0x00000006   FF
…
0xFFFFFFFD   00
0xFFFFFFFE   00
0xFFFFFFFF   00

Byte ordering
• Big Endian
  – Least significant byte has highest address
    Word address 0x00000000
    Value: 00102030
• Little Endian
  – Least significant byte has lowest address
    Word address 0x00000000
    Value: 30201000
(memory figure: bytes 00 10 20 30 stored at addresses 0x00000000 to 0x00000003)
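
The two readings of the same four bytes can be checked with a short Python sketch. This is only an illustration, not ARM code; the helper name read_word is made up for this example.

```python
# Bytes 00 10 20 30 stored at addresses 0, 1, 2, 3, as on the slide.
memory = bytes([0x00, 0x10, 0x20, 0x30])

def read_word(mem: bytes, little_endian: bool) -> int:
    """Assemble a 32-bit word from the four bytes at addresses 0..3.

    read_word is a hypothetical helper for this sketch only.
    """
    order = "little" if little_endian else "big"
    return int.from_bytes(mem[:4], order)

big_value = read_word(memory, little_endian=False)     # 0x00102030
little_value = read_word(memory, little_endian=True)   # 0x30201000
```

With big-endian ordering the byte at the lowest address is the most significant one, so the word reads 00102030; with little-endian ordering it is the least significant one, so the word reads 30201000.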

ARM programmer model
(figure: registers R0-R14 and PC shown next to the byte-addressed memory, addresses 0x00000000 to 0xFFFFFFFF)

Instruction set
ARM instructions are all 32-bit long (except for Thumb mode).
There are 2^32 possible machine instructions. Fortunately, they are structured.

Features of ARM instruction set
• Load-store architecture
• 3-address instructions
• Conditional execution of every instruction
• Possible to load/store multiple registers at once
• Possible to combine shift and ALU operations in a single instruction

Instruction set
• Data processing
• Data movement
• Flow control

Data processing
• They are move, arithmetic, logical, comparison and multiply instructions.
• Most data processing instructions can process one of their operands using the barrel shifter.
• General rules:
  – All operands are 32-bit, coming from registers or literals.
  – The result, if any, is 32-bit and placed in a register (with the exception of long multiply, which produces a 64-bit result).
  – 3-address format

Instruction set
MOV<cc><S> Rd, <operands>
MOVCS R0, R1   @ if carry is set
               @ then R0:=R1
MOVS R0, #0    @ R0:=0
               @ Z=1, N=0
               @ C, V unaffected

Conditional execution
• Almost all ARM instructions have a condition field which allows them to be executed conditionally.
movcs R0, R1

Register movement
(operand forms: immediate, register, shift)
• MOV R0, R2   @ R0 = R2
• MVN R0, R2   @ R0 = ~R2   (move negated)

Addressing modes
• Register operands
  ADD R0, R1, R2
• Immediate operands
  ADD R3, R3, #1      @ R3:=R3+1
    (#1 is a literal; most can be represented by (0..255)x2^(2n), 0<=n<=12)
  AND R8, R7, #0xff   @ R8=R7[7:0]
    (#0xff is a hexadecimal literal. This is assembler-dependent syntax.)

Shifted register operands
• One operand to the ALU is routed through the barrel shifter. Thus, the operand can be modified before it is used. Useful for fast multiplication and dealing with lists, tables and other complex data structures. (Similar to the displacement addressing mode in CISC.)
• Some instructions (e.g. MUL, CLZ, QADD) do not read the barrel shifter.

Shifted register operands

Logical shift left
(figure: C <- register <- 0)
MOV R0, R2, LSL #2   @ R0:=R2<<2
                     @ R2 unchanged
Example: 0…0 0011 0000
Before R2=0x00000030
After  R0=0x000000C0
       R2=0x00000030

Logical shift right
(figure: 0 -> register -> C)
MOV R0, R2, LSR #2   @ R0:=R2>>2
                     @ R2 unchanged
Example: 0…0 0011 0000
Before R2=0x00000030
After  R0=0x0000000C
       R2=0x00000030

Arithmetic shift right
(figure: MSB -> register -> C)
MOV R0, R2, ASR #2   @ R0:=R2>>2
                     @ R2 unchanged
Example: 1010 0…0 0011 0000
Before R2=0xA0000030
After  R0=0xE800000C
       R2=0xA0000030

Rotate right
(figure: register rotated right)
MOV R0, R2, ROR #2   @ R0:=R2 rotate
                     @ R2 unchanged
Example: 0…0 0011 0001
Before R2=0x00000031
After  R0=0x4000000C
       R2=0x00000031

Rotate right extended
(figure: C -> register -> C)
MOV R0, R2, RRX      @ R0:=R2 rotate
                     @ R2 unchanged
Example: 0…0 0011 0001
Before R2=0x00000031, C=1
After  R0=0x80000018, C=1
       R2=0x00000031
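
The five shift types can be reproduced in a small Python model to check the worked examples. This is a sketch only: the function names are invented here, and the model covers just the 32-bit result, plus the shifted-out bit for RRX.

```python
MASK32 = 0xFFFFFFFF

def lsl(x: int, n: int) -> int:
    """Logical shift left: zeros enter from the right."""
    return (x << n) & MASK32

def lsr(x: int, n: int) -> int:
    """Logical shift right: zeros enter from the left."""
    return (x & MASK32) >> n

def asr(x: int, n: int) -> int:
    """Arithmetic shift right: the sign bit (MSB) is replicated."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32          # reinterpret as a negative Python int
    return (x >> n) & MASK32

def ror(x: int, n: int) -> int:
    """Rotate right: bits leaving the right re-enter on the left."""
    n %= 32
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32

def rrx(x: int, carry: int):
    """Rotate right by one through carry; returns (result, bit shifted out)."""
    x &= MASK32
    return (carry << 31) | (x >> 1), x & 1
```

Each function reproduces the "After" value of the matching slide when given its "Before" value.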

Shifted register operands

Shifted register operands

Shifted register operands
• It is possible to use a register to specify the number of bits to be shifted; only the bottom 8 bits of the register are significant.
@ array index calculation
ADD R0, R1, R2, LSL R3   @ R0:=R1+R2*2^R3
@ fast multiply R2=35xR0
ADD R0, R0, R0, LSL #2   @ R0’=5xR0
RSB R2, R0, R0, LSL #3   @ R2 =7xR0’

Multiplication
MOV R1, #35
MUL R2, R0, R1
or
ADD R0, R0, R0, LSL #2   @ R0’=5xR0
RSB R2, R0, R0, LSL #3   @ R2 =7xR0’
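
The shift-and-add sequence works because 35 = 5 x 7, with 5x = x + 4x and 7y = 8y - y. A minimal Python sketch, assuming 32-bit wrap-around arithmetic (the function name times_35 is made up for this example):

```python
MASK32 = 0xFFFFFFFF

def times_35(r0: int) -> int:
    """Mirror the ADD/RSB pair from the slide using 32-bit arithmetic."""
    r0 = (r0 + (r0 << 2)) & MASK32   # ADD R0, R0, R0, LSL #2  -> R0' = 5 x R0
    r2 = ((r0 << 3) - r0) & MASK32   # RSB R2, R0, R0, LSL #3  -> R2 = 7 x R0'
    return r2
```

The result agrees with an ordinary multiply by 35 modulo 2^32, which is also what MUL would leave in R2.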

Shifted register operands

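Encoding data processing instructions
bits 31-28   cond
bits 27-26   0 0
bit  25      #        (selects the form of operand 2)
bits 24-21   opcode   (arithmetic/logic function)
bit  20      S        (set condition codes)
bits 19-16   Rn       (first operand register)
bits 15-12   Rd       (destination register)
bits 11-0    operand 2

operand 2 when bit 25 = 1 (immediate):
bits 11-8    #rot     (immediate alignment)
bits 7-0     8-bit immediate

operand 2 when bit 25 = 0 and bit 4 = 0 (immediate shift length):
bits 11-7    #shift   (immediate shift length)
bits 6-5     Sh       (shift type)
bit  4       0
bits 3-0     Rm       (second operand register)

operand 2 when bit 25 = 0 and bit 4 = 1 (register shift length):
bits 11-8    Rs       (register shift length)
bit  7       0
bits 6-5     Sh       (shift type)
bit  4       1
bits 3-0     Rm       (second operand register)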
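
To make the field layout concrete, here is a hedged Python sketch that splits a 32-bit instruction word into the fields listed on this slide. The function name and dictionary keys are invented for the example, and the two sample words (0xE0810002 for ADD R0, R1, R2 and 0xE3A00000 for MOV R0, #0) are standard encodings supplied here, not taken from the slides.

```python
def decode_data_processing(word: int) -> dict:
    """Split a data processing instruction word into its bit fields."""
    fields = {
        "cond":     (word >> 28) & 0xF,   # bits 31-28
        "imm_flag": (word >> 25) & 0x1,   # bit 25 (#)
        "opcode":   (word >> 21) & 0xF,   # bits 24-21
        "S":        (word >> 20) & 0x1,   # bit 20
        "Rn":       (word >> 16) & 0xF,   # bits 19-16
        "Rd":       (word >> 12) & 0xF,   # bits 15-12
    }
    op2 = word & 0xFFF                    # bits 11-0
    if fields["imm_flag"]:
        fields["rot"] = op2 >> 8          # bits 11-8
        fields["imm8"] = op2 & 0xFF       # bits 7-0
    else:
        fields["Sh"] = (op2 >> 5) & 0x3   # bits 6-5
        fields["Rm"] = op2 & 0xF          # bits 3-0
        if op2 & 0x10:                    # bit 4 = 1: shift length in Rs
            fields["Rs"] = op2 >> 8
        else:                             # bit 4 = 0: immediate shift length
            fields["shift"] = op2 >> 7
    return fields

add_fields = decode_data_processing(0xE0810002)   # ADD R0, R1, R2
mov_fields = decode_data_processing(0xE3A00000)   # MOV R0, #0
```

For the ADD word this gives Rn = 1, Rd = 0 and Rm = 2 with a zero shift length; for the MOV word the immediate form is selected and both rot and imm8 are 0.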

Arithmetic
• Add and subtraction

Arithmetic
• ADD R0, R1, R2   @ R0 = R1+R2
• ADC R0, R1, R2   @ R0 = R1+R2+C
• SUB R0, R1, R2   @ R0 = R1-R2
• SBC R0, R1, R2   @ R0 = R1-R2-!C
• RSB R0, R1, R2   @ R0 = R2-R1
• RSC R0, R1, R2   @ R0 = R2-R1-!C
(figure: 8-bit number circles, unsigned 0..255 and signed -128..127)
3-5=3+(-5) → sum<=255 → C=0 → borrow
5-3=5+(-3) → sum > 255 → C=1 → no borrow
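
The carry rule for subtraction can be tried out on the 8-bit example from the figure. The Python sketch below assumes the usual two's complement formulation a + ~b + 1; the function name sub8 is invented, and real ARM registers are 32 bits wide rather than 8.

```python
def sub8(a: int, b: int):
    """8-bit subtract a - b done as a + (~b) + 1; returns (result, carry)."""
    total = (a & 0xFF) + (~b & 0xFF) + 1
    result = total & 0xFF
    carry = 1 if total > 0xFF else 0   # C=1 means "no borrow"
    return result, carry
```

sub8(3, 5) gives a sum of 254, so C=0 (a borrow occurred); sub8(5, 3) gives a sum of 258, so C=1 (no borrow). This is why SBC and RSC subtract !C rather than C.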

Arithmetic

Arithmetic

Setting the condition codes
• Any data processing instruction can set the condition codes if the programmer wishes it to.
64-bit addition: (R3:R2) := (R3:R2) + (R1:R0)
ADDS R2, R2, R0
ADC  R3, R3, R1
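
The ADDS/ADC pair can be modelled in Python to show how the carry links the two halves. This is a sketch; the function names are invented, and registers are passed as plain integers.

```python
MASK32 = 0xFFFFFFFF

def adds(a: int, b: int):
    """32-bit add that also returns the carry out (like ADDS)."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32

def adc(a: int, b: int, carry: int) -> int:
    """32-bit add with carry in (like ADC)."""
    return (a + b + carry) & MASK32

def add64(r3: int, r2: int, r1: int, r0: int):
    """(R3:R2) := (R3:R2) + (R1:R0), returned as (high word, low word)."""
    r2, carry = adds(r2, r0)   # ADDS R2, R2, R0
    r3 = adc(r3, r1, carry)    # ADC  R3, R3, R1
    return r3, r2
```

Adding 1 to 0x00000000FFFFFFFF, for instance, produces a carry out of the low word, and ADC folds it into the high word.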

Logical

Logical
• AND R0, R1, R2   @ R0 = R1 and R2
• ORR R0, R1, R2   @ R0 = R1 or R2
• EOR R0, R1, R2   @ R0 = R1 xor R2
• BIC R0, R1, R2   @ R0 = R1 and (~R2)
bit clear: R2 is a mask identifying which bits of R1 will be cleared to zero
R1=0x11111111 R2=0x01100101
BIC R0, R1, R2
R0=0x10011010
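
The bit-clear example can be verified with a one-line Python model (the function name bic is used here only for illustration):

```python
def bic(r1: int, r2: int) -> int:
    """R1 and (~R2), truncated to 32 bits."""
    return r1 & ~r2 & 0xFFFFFFFF

r0 = bic(0x11111111, 0x01100101)   # the values used on the slide
```

Every bit set in the mask R2 is cleared in the result, and all other bits of R1 pass through unchanged.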

Logical

Comparison
• These instructions do not generate a result, but set condition code bits (N, Z, C, V) in CPSR. Often, a branch operation follows to change the program flow.

Comparison
• CMP R1, R2   @ set cc on R1-R2       (compare)
• CMN R1, R2   @ set cc on R1+R2       (compare negated)
• TST R1, R2   @ set cc on R1 and R2   (bit test)
• TEQ R1, R2   @ set cc on R1 xor R2   (test equal)

Comparison

Multiplication

Multiplication
• MUL R0, R1, R2   @ R0 = (R1xR2)[31:0]
• Features:
  – Second operand can’t be immediate
  – The result register must be different from the first operand
  – Cycles depend on core type
  – If S bit is set, C flag is meaningless
• See the reference manual (4.1.33)

Multiplication
• Multiply-accumulate (2D array indexing)
  MLA R4, R3, R2, R1   @ R4 = R3xR2+R1
• Multiply with a constant can often be more efficiently implemented using shifted register operand
  MOV R1, #35
  MUL R2, R0, R1
  or
  ADD R0, R0, R0, LSL #2   @ R0’=5xR0
  RSB R2, R0, R0, LSL #3   @ R2 =7xR0’

Multiplication

Multiplication

Flow control instructions
• Determine the instruction to be executed next
(branch encoding figure: pc-relative offset within 32MB)

Flow control instructions
• Branch instruction
        B label
        …
label:  …
• Conditional branches
        MOV R0, #0
loop:   …
        ADD R0, R0, #1
        CMP R0, #10
        BNE loop

Branch conditions

Branches

Branch and link
• BL instruction saves the return address to R14 (lr)
        BL sub          @ call sub
        CMP R1, #5      @ return to here
        MOVEQ R1, #0
        …
sub:    …               @ sub entry point
        …
        MOV PC, LR      @ return

Branch and link
        BL sub1         @ call sub1
        …
use stack to save/restore the return address and registers
sub1:   STMFD R13!, {R0-R2,R14}
        BL sub2
        …
        LDMFD R13!, {R0-R2,PC}
sub2:   …
        …
        MOV PC, LR

Conditional execution
        CMP R0, #5
        BEQ bypass        @ if (R0!=5) {
        ADD R1, R1, R0    @   R1=R1+R0-R2
        SUB R1, R1, R2    @ }
bypass: …

smaller and faster:
        CMP R0, #5
        ADDNE R1, R1, R0
        SUBNE R1, R1, R2

Rule of thumb: if the conditional sequence is three instructions or less, it is better to use conditional execution than a branch.

Conditional execution
if ((R0==R1) && (R2==R3)) R4++
        CMP R0, R1
        BNE skip
        CMP R2, R3
        BNE skip
        ADD R4, R4, #1
skip:   …

        CMP R0, R1
        CMPEQ R2, R3
        ADDEQ R4, R4, #1

Data transfer instructions
• Move data between registers and memory
• Three basic forms
  – Single register load/store
  – Multiple register load/store
  – Single register swap: SWP(B), atomic instruction for semaphore

Single register load/store

Single register load/store
No STRSB/STRSH since STRB/STRH stores both signed/unsigned ones

Single register load/store
• The data items can be an 8-bit byte, 16-bit half-word or 32-bit word. Addresses must be boundary aligned. (e.g. 4’s multiple for LDR/STR)
LDR R0, [R1]   @ R0 := mem32[R1]
STR R0, [R1]   @ mem32[R1] := R0
LDR, LDRH, LDRB for 32, 16, 8 bits
STR, STRH, STRB for 32, 16, 8 bits

Addressing modes
• Memory is addressed by a register and an offset.
  LDR R0, [R1]                @ mem[R1]
• Three ways to specify offsets:
  – Immediate
    LDR R0, [R1, #4]          @ mem[R1+4]
  – Register
    LDR R0, [R1, R2]          @ mem[R1+R2]
  – Scaled register
    LDR R0, [R1, R2, LSL #2]  @ mem[R1+4*R2]

Addressing modes
• Pre-index addressing (LDR R0, [R1, #4])
  without a writeback
• Auto-indexing addressing (LDR R0, [R1, #4]!)
  Pre-index with writeback: calculation before accessing, with a writeback
• Post-index addressing (LDR R0, [R1], #4)
  calculation after accessing, with a writeback

Pre-index addressing
LDR R0, [R1, #4]    @ R0=mem[R1+4]
                    @ R1 unchanged
(figure: the offset is added to R1 to form the address; the loaded word goes to R0)

Auto-indexing addressing
LDR R0, [R1, #4]!   @ R0=mem[R1+4]
                    @ R1=R1+4
No extra time; Fast;
(figure: the offset is added to R1 to form the address; the loaded word goes to R0 and the sum is written back to R1)

Post-index addressing
LDR R0, [R1], #4    @ R0=mem[R1]
                    @ R1=R1+4
(figure: R1 alone forms the address; the loaded word goes to R0, then the offset is added to R1)

Comparisons
• Pre-indexed addressing
  LDR R0, [R1, R2]    @ R0=mem[R1+R2]
                      @ R1 unchanged
• Auto-indexing addressing
  LDR R0, [R1, R2]!   @ R0=mem[R1+R2]
                      @ R1=R1+R2
• Post-indexed addressing
  LDR R0, [R1], R2    @ R0=mem[R1]
                      @ R1=R1+R2
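
The three indexing modes differ only in which address is used for the access and whether the base register is updated. A small Python model makes this explicit. It is a sketch with invented names: memory is a dictionary from byte address to word, and the mode strings "pre", "auto" and "post" are labels chosen for this example.

```python
def ldr(mem: dict, regs: dict, rd: str, rn: str, offset: int, mode: str) -> None:
    """Model LDR Rd, [Rn, #offset] / [Rn, #offset]! / [Rn], #offset."""
    base = regs[rn]
    # Post-index accesses the unmodified base; the other two add the offset first.
    addr = base if mode == "post" else base + offset
    regs[rd] = mem[addr]
    # Auto-index and post-index write the updated address back to Rn.
    if mode in ("auto", "post"):
        regs[rn] = base + offset

def run(mode: str) -> dict:
    """Run one load from a fresh state (hypothetical test values)."""
    mem = {0x100: 11, 0x104: 22}
    regs = {"R0": 0, "R1": 0x100}
    ldr(mem, regs, "R0", "R1", 4, mode)
    return regs
```

With R1 = 0x100, pre-index loads the word at 0x104 and leaves R1 alone, auto-index loads the same word and leaves R1 = 0x104, and post-index loads the word at 0x100 and then sets R1 = 0x104.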

Example

Example

Example

Summary of addressing modes

Summary of addressing modes

Summary of addressing modes

Summary of addressing modes

Load an address into a register
• Note that all addressing modes are register-offset. Can we issue LDR R0, Table? The pseudo instruction ADR loads a register with an address.
  table:  .word 10
          …
          ADR R0, table
• The assembler translates the pseudo instruction into a sequence of appropriate instructions
          sub r0, pc, #12

Application
        ADR R1, table
loop:   LDR R0, [R1]
        ADD R1, R1, #4
        @ operations on R0
        …
(figure: R1 pointing into table)

        ADR R1, table
loop:   LDR R0, [R1], #4
        @ operations on R0
        …

Multiple register load/store
• Transfer a block of data more efficiently.
• Used for procedure entry and exit for saving and restoring workspace registers and the return address
• For ARM7, 2+Nt cycles (N: #words, t: time for a word for sequential access). Increases interrupt latency since it can’t be interrupted.
LDMIA R1, {R0, R2, R5}   @ R0 = mem[R1]
                         @ R2 = mem[R1+4]
                         @ R5 = mem[R1+8]
(registers are arranged in increasing order; see manual)

Multiple load/store register
LDM load multiple registers
STM store multiple registers

suffix   meaning
IA       increase after
IB       increase before
DA       decrease after
DB       decrease before

Addressing modes

Multiple load/store register
LDM<mode> Rn, {<registers>}
IA: addr:=Rn
IB: addr:=Rn+4
DA: addr:=Rn-#<registers>*4+4
DB: addr:=Rn-#<registers>*4
For each Ri in <registers>
    IB: addr:=addr+4
    DB: addr:=addr-4
    Ri:=M[addr]
    IA: addr:=addr+4
    DA: addr:=addr-4
<!>: Rn:=addr
(figure: Rn pointing into memory next to registers R1, R2, R3)

Multiple load/store register
LDMIA R0, {R1,R2,R3}
or
LDMIA R0, {R1-R3}

addr    data
0x010   10   <- R0
0x014   20
0x018   30
0x01C   40
0x020   50
0x024   60

R1: 10
R2: 20
R3: 30
R0: 0x10

Multiple load/store register
LDMIA R0!, {R1,R2,R3}
(same memory; R0 initially points to 0x010)
R1: 10
R2: 20
R3: 30
R0: 0x01C

Multiple load/store register
LDMIB R0!, {R1,R2,R3}
(same memory; R0 initially points to 0x010)
R1: 20
R2: 30
R3: 40
R0: 0x01C

Multiple load/store register
LDMDA R0!, {R1,R2,R3}
(same memory; R0 initially points to 0x024)
R1: 40
R2: 50
R3: 60
R0: 0x018

Multiple load/store register
LDMDB R0!, {R1,R2,R3}
(same memory; R0 initially points to 0x024)
R1: 30
R2: 40
R3: 50
R0: 0x018
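
The four LDM examples can be reproduced with a short Python model. It is a sketch that assumes the architectural behaviour of LDM: the lowest-numbered register is loaded from the lowest address, the start address depends on the mode, and writeback moves the base by four bytes per register. The names ldm and run, and the use of integer register numbers as dictionary keys, are choices made for this example.

```python
def ldm(mode: str, mem: dict, regs: dict, rn: int, reglist: list, writeback: bool) -> None:
    """Model LDM<mode> Rn{!}, {reglist} on word-sized memory cells."""
    n = len(reglist)
    base = regs[rn]
    # Lowest address touched by the transfer, per addressing mode.
    start = {
        "IA": base,
        "IB": base + 4,
        "DA": base - 4 * n + 4,
        "DB": base - 4 * n,
    }[mode]
    for i, reg in enumerate(sorted(reglist)):
        regs[reg] = mem[start + 4 * i]
    if writeback:
        regs[rn] = base + 4 * n if mode in ("IA", "IB") else base - 4 * n

# Memory contents from the slides.
MEMORY = {0x010: 10, 0x014: 20, 0x018: 30, 0x01C: 40, 0x020: 50, 0x024: 60}

def run(mode: str, r0: int, writeback: bool):
    """Return (R1, R2, R3, R0) after LDM<mode> R0{!}, {R1-R3}."""
    regs = {0: r0, 1: 0, 2: 0, 3: 0}
    ldm(mode, MEMORY, regs, 0, [1, 2, 3], writeback)
    return regs[1], regs[2], regs[3], regs[0]
```

Calling run("IB", 0x010, True), for example, returns (20, 30, 40, 0x01C), matching the LDMIB slide.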

Example

Example LDMIA r0!, {r1-r3}

Example LDMIB r0!, {r1-r3}

Application
• Copy a block of memory
  – R9: address of the source
  – R10: address of the destination
  – R11: end address of the source
loop:   LDMIA R9!, {R0-R7}
        STMIA R10!, {R0-R7}
        CMP R9, R11
        BNE loop

Application
• Stack (full: pointing to the last used; ascending: grow towards increasing memory addresses)

mode                    POP     =LDM    PUSH    =STM
Full ascending (FA)     LDMFA   LDMDA   STMFA   STMIB
Full descending (FD)    LDMFD   LDMIA   STMFD   STMDB
Empty ascending (EA)    LDMEA   LDMDB   STMEA   STMIA
Empty descending (ED)   LDMED   LDMIB   STMED   STMDA
STMFD R13!, {R2-R9}   @ used for ATPCS
…                     @ modify R2-R9
LDMFD R13!, {R2-R9}

Example

Swap instruction
• Swap between memory and register. Atomic operation preventing any other instruction from reading/writing to that location until it completes.

Example

Application
Process A                 OS        Process B
While (1) {               S=0/1     While (1) {
    if (s==0) {                         if (s==0) {
        s=1;                                s=1;
    }                                   }
}                                   }
// use the resource                 // use the resource

Software interrupt
• A software interrupt instruction causes a software interrupt exception, which provides a mechanism for applications to call OS routines.

Example

Load constants
• No single ARM instruction can load an arbitrary 32-bit constant into a register, because ARM instructions are themselves 32-bit long. There is a pseudo instruction for this.

Immediate numbers
(figure: encoding for data processing instructions, with the same field layout as on the "Encoding data processing instructions" slide)
For the immediate form of operand 2 (bit 25 = 1), with 8-bit immediate n and 4-bit #rot field r: v = n ror 2r

Load constants
• Assemblers usually implement this with two options, depending on the number you try to load.

Load constants
• Assume that you want to load 511 into R0
  – Construct in multiple instructions
        mov r0, #256
        add r0, #255
  – Load from memory; declare L511 .word 511
        ldr r0, L511      →  ldr r0, [pc, #0]
• Guideline: if you can construct it in two instructions, do it; otherwise, load it.
• The assembler decides for you
        ldr r0, =255      →  mov r0, #255
        ldr r0, =511      →  ldr r0, [pc, #4]

PC-relative modes
Impossible to use direct addressing
(figure: encoding for data transfer instructions)

PC-relative addressing
main:
        MOV R0, #0
        ADR R1, a         @ add r1, pc, #4
        STR R0, [R1]
        SWI #11
a:      .word 100
        .end
(figure: pipeline stages fetch, decode, exec for successive instructions, with the PC marked)
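
The "#4" in the expanded ADR follows from the pipeline: in ARM state, an instruction that reads PC sees its own address plus 8. A minimal Python sketch of the offset calculation, assuming main is at byte offset 0 and every instruction or .word occupies 4 bytes (the function name adr_offset is made up for this example):

```python
def adr_offset(instr_addr: int, target_addr: int) -> int:
    """Offset the assembler adds to PC so that ADR yields target_addr."""
    pc_as_read = instr_addr + 8   # PC is two instructions ahead when read
    return target_addr - pc_as_read

# Layout relative to main: MOV at 0, ADR at 4, STR at 8, SWI at 12, a at 16.
offset = adr_offset(instr_addr=4, target_addr=16)
```

Here the ADR sits at offset 4 and the label a at offset 16, so the offset is 16 - (4 + 8) = 4, matching the comment "add r1, pc, #4". A negative result corresponds to the sub form shown on the "Load an address into a register" slide.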

Instruction set