x86 architecture

ssuserf217c2 3,168 views 42 slides Sep 30, 2017

About This Presentation

intro to x86 instruction set architecture


Slide Content

x86 Features and Instruction Set Architecture

Stored-program computer Stores program instructions in electronic memory, where programs and data in memory can be treated interchangeably or uniformly. von Neumann architecture also known as the von Neumann model and Princeton architecture , after 1945 work by John von Neumann and others in the First Draft of a Report on the EDVAC stores program data and instruction data in the same memory consists of: processing unit (arithmetic logic unit and processor registers) control unit (instruction register and program counter) memory (for data and instructions ) external mass storage input and output mechanisms instruction fetch and a data operation cannot occur concurrently because they share a common bus; referred to as the von Neumann bottleneck which often limits system performance.

Harvard architecture Based on the Harvard Mark I Data and instruction are stored in entirely separate memory systems CPU can fetch next instruction and load or store data simultaneously and independently

Modified Harvard architecture loose separation between code and data contents of the instruction memory can be accessed as if it were data. implemented on most modern CPU architectures Implementation Modifications Split-cache (or Almost-von-Neumann) architecture builds memory hierarchy with a CPU cache separating instructions and data; unifies all except small portions of the data and instruction address spaces, providing the von Neumann model cache coherency issues matter since it can greatly affect performance Instruction-memory-as-data architecture Preserves Harvard memory separation, but provides special machine operations to access the contents of the instruction memory as data. Data-memory-as-instruction architecture can execute instructions fetched from any memory segment can read an instruction and read a data value simultaneously if they're in separate memory segments with independent data buses (like Harvard). when executing an instruction from one memory segment, the same memory segment cannot be simultaneously accessed as data

Three characteristics to distinguish modified Harvard machines from pure Harvard and von Neumann machines: Pure Harvard Von Neumann Modified Harvard Instruction and data memories occupy different address spaces Separate address "zero" in instruction space and in data space store both instructions and data in a single address space Separate address "zero" in instruction space and in data space Instruction and data memories have separate hardware pathways to the central processing unit (CPU) Separate pathways for instruction and data memories to CPU unified address space such separate access paths for CPU caches or other tightly coupled memories, but a unified address space covers the rest of the memory hierarchy Instruction and data memories may be accessed in different ways stored instructions on a punched paper tape and data in electro-mechanical counters provides uniform access to flash memory and SRAM

Basic properties of the x86 architecture
• General consensus suggests that x86 is a modified Harvard architecture.
• Instructions are variable length: typically 2 or 3 bytes, some a single byte, others up to 15 bytes.
• Primarily a "CISC" design with emphasis on backward compatibility. The instruction set is not typical CISC, but an extended version of the simple eight-bit 8008 and 8080 architectures.
• Memory is byte-addressable, and words are stored in little-endian byte order (least significant byte first).
• Memory access to unaligned addresses is allowed for all valid word sizes.
• Native integer sizes for arithmetic and memory addresses (or offsets) are 16, 32 or 64 bits, depending on the architecture generation.
• Multiple scalar values can be handled simultaneously via the SIMD unit (starting with the Pentium III).
• Floating point: instructions and registers for floating-point operations (a separate coprocessor prior to the 80486, built in ever since).
• SIMD (single instruction, multiple data) instructions work on (one or two) 128-bit words, each containing two or four floating-point numbers (each 64 or 32 bits wide respectively) or, alternatively, 2, 4, 8 or 16 integers (each 64, 32, 16 or 8 bits wide respectively).
• Pipelining and superscalar features (starting with the Pentium) added extra decoding steps to split most instructions into micro-operations, which are buffered and scheduled by a control unit and executed, partly in parallel, by one of several execution units.
• Out-of-order and speculative execution uses branch prediction, register renaming, and memory dependence prediction to execute multiple x86 instructions simultaneously and not necessarily in the order given in the instruction stream.
• Simultaneous multithreading.
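The little-endian byte order mentioned above can be illustrated with a small Python sketch. The helper name to_little_endian is made up for this example; it only shows the order in which the bytes of a value would appear at increasing memory addresses.

```python
def to_little_endian(value: int, size: int) -> bytes:
    """Return the bytes of an unsigned integer in the order x86 stores them."""
    return value.to_bytes(size, byteorder="little")

# A 32-bit value: the least significant byte (0x78) comes first in memory.
dword = to_little_endian(0x12345678, 4)
print(dword.hex(" "))  # 78 56 34 12

# Reading the bytes back in the same order recovers the value.
print(hex(int.from_bytes(dword, byteorder="little")))  # 0x12345678
```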

x86 REGISTERS 16-bit The original Intel 8086 and 8088 have fourteen 16-bit registers. Four are general-purpose registers (GPRs): AX, BX, CX, DX; Each can be accessed as two separate bytes (the high byte and low byte) Two pointer registers have special roles: SP (stack pointer) points to the "top" of the stack, BP (base pointer) is used to point anywhere on the stack. The address/index registers: SI, DI, BX and BP Four segment registers : CS, DS, SS and ES (used to form a memory address in segmented memory mode) The FLAGS register contains, among others, carry flag (CF) , overflow flag (OF) and zero flag (ZF) . The instruction pointer (IP) points to the next instruction that will be fetched from memory and then executed; is read-only to the software. three special registers (GDTR, LDTR, IDTR) hold descriptor table addresses to support protected mode in 80286 and a fourth task register (TR) is used for task switching.

32-bit
• 32-bit processors (starting with the 80386) expanded the 16-bit GPRs, base and index registers, instruction pointer, and FLAGS register to 32 bits (the segment registers were not affected).
• The expanded registers are named by prefixing an "E" (for "extended") to the register names in x86 assembly language.
• The general-purpose, base, and index registers can all be used as the base in addressing modes, and all of those registers except the stack pointer can be used as the index.
• Two new segment registers (FS and GS) were added.
• The machine code format was expanded to accommodate the expanded registers.
• A 32-bit control/status register (MXCSR) was added with the Streaming SIMD Extensions (SSE), starting with the Pentium III.
64-bit
• The 32-bit registers are expanded into 64-bit registers (introduced with the AMD Opteron), and addressing is extended to 64 bits.
• An "R" prefix identifies the 64-bit registers (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, RFLAGS, RIP).
• Eight additional 64-bit general registers (R8-R15) were also introduced. They are usable only in 64-bit mode, which is one of the two sub-modes available in long mode.
• An extra addressing mode allows memory references relative to RIP (the instruction pointer), to ease the implementation of position-independent code, used in shared libraries in some operating systems.
Miscellaneous/special purpose
32-bit x86 processors (starting with the 80386) also include various special/miscellaneous registers:
• control registers (CR0 through CR4; CR8 in 64-bit mode only)
• debug registers (DR0 through DR3, plus DR6 and DR7)
• test registers (TR3 through TR7; 80486 only)
• model-specific registers (MSRs, appearing with the Pentium)

80-bit
• Available in all floating point units (FPUs), also known as math coprocessors.
• The FPU is a separate chip for early CPUs: the 8087 (for the 8086, 8088, 80186, and 80188), the 80287 (80286), and the 80387 (80386). It is built into the CPU starting with the 80486.
• Eight 80-bit wide registers: st(0) to st(7).
• Each register holds numeric data in one of seven formats: 32-, 64-, or 80-bit floating point; 16-, 32-, or 64-bit (binary) integer; and 80-bit packed decimal integer.
• The Pentium MMX added eight 64-bit MMX integer registers (MMX0 to MMX7, which share lower bits with the 80-bit-wide FPU stack).
128-bit
• SIMD registers XMM0–XMM15.
256-bit
• SIMD registers YMM0–YMM15. Introduced with Intel's Sandy Bridge processors, when the SIMD registers were widened to 256 bits and the AVX (Advanced Vector Extensions) instructions were introduced.
512-bit
• SIMD registers ZMM0–ZMM31. Used by Knights Corner (on Intel Xeon Phi coprocessors).

General Purpose Registers (A, B, C and D)

  Name   Bits
  R?X    63-0
  E?X    31-0
  ?X     15-0
  ?H     15-8
  ?L     7-0

• AL/AH/AX/EAX/RAX: Accumulator
• BL/BH/BX/EBX/RBX: Base index (for use with arrays)
• CL/CH/CX/ECX/RCX: Counter (for use with loops and strings)
• DL/DH/DX/EDX/RDX: Extend the precision of the accumulator (e.g. combine 32-bit EAX and EDX for 64-bit integer operations in 32-bit code)

64-bit mode-only General Purpose Registers (R8, R9, R10, R11, R12, R13, R14, R15)

  Name   Bits
  ?      63-0
  ?D     31-0
  ?W     15-0
  ?B     7-0
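The overlap between R?X, E?X, ?X, ?H and ?L can be sketched as one 64-bit value viewed through different masks. The class LegacyGPR and its method names are hypothetical and exist only for this illustration; the sketch covers reads and a write to ?H, nothing more.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

class LegacyGPR:
    """Toy model of one legacy GPR (for example RAX) and its narrower views."""

    def __init__(self, value: int = 0) -> None:
        self.r = value & MASK64          # R?X: bits 63-0

    @property
    def e(self) -> int:                  # E?X: bits 31-0
        return self.r & 0xFFFFFFFF

    @property
    def x(self) -> int:                  # ?X: bits 15-0
        return self.r & 0xFFFF

    @property
    def h(self) -> int:                  # ?H: bits 15-8
        return (self.r >> 8) & 0xFF

    @property
    def l(self) -> int:                  # ?L: bits 7-0
        return self.r & 0xFF

    def set_h(self, value: int) -> None:
        """Write ?H; only bits 15-8 of the full register change."""
        self.r = (self.r & (MASK64 ^ 0xFF00)) | ((value & 0xFF) << 8)

rax = LegacyGPR(0x1122334455667788)
print(hex(rax.e), hex(rax.x), hex(rax.h), hex(rax.l))  # 0x55667788 0x7788 0x77 0x88
rax.set_h(0xAB)                                        # like "mov ah, 0ABh"
print(hex(rax.x))                                      # 0xab88
```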

Address/Index Registers SI/ESI/RSI: Source index for string operations. DI/EDI/RDI: Destination index for string operations. Index Registers (S and D) 64 56 48 40 32 24 16 8 R?I E?I  ?I  ?IL Note: The ?IL registers are only available in 64-bit mode. Stack Pointer Register SP/ESP/RSP: Stack pointer for top address of the stack. BP/EBP/RBP: Stack base pointer for holding the address of the current stack frame. Pointer Registers (S and B) 64 56 48 40 32 24 16 8 R?P E?P ?P  ?PL Note: The ?PL registers are only available in 64-bit mode.

Instruction Pointer Register IP/EIP/RIP: Instruction pointer. Holds the program counter, the current instruction address. Instruction Pointer Register (I) 64 56 48 40 32 24 16 8 RIP EIP IP Segment registers CS: Code DS: Data SS: Stack ES: Extra data FS: Extra data #2 GS: Extra data #3 Segment Registers (C, D, S, E, F and G) 16 8  ?S

MODERN x86 REGISTER MAP

x86 INSTRUCTION SET ARCHITECTURE
• First introduced with the Intel 8086 and 8088 16-bit CPUs.
• Used by Intel, AMD, Cyrix, NEC, and Zilog.
• Inherited many characteristics and instructions from the previous generation of 8-bit CPUs such as the 8080.
• The modern x86 instruction set is a superset of the 8086 instructions and a series of extensions to this instruction set that began with the Intel 8008 microprocessor.
• Nearly full binary backward compatibility (from the Intel 8086 chip through to the current generation of x86 processors, with certain exceptions).
• In practice, code generally uses instructions that will execute on anything later than an Intel 80386 (or fully compatible clone) or on anything later than an Intel Pentium (or compatible clone). In recent years, various software has required at least support for later specific extensions to the instruction set, e.g., MMX or SIMD.

Basic Instruction Format most registers are expressed in opcodes using three or four bits to conserve encoding space; at most one operand to an instruction can be a memory location memory operand may also be the destination (or a combined source and destination), while the other operand, the source , can be either register or immediate . The relatively small number of general registers (also inherited from its 8-bit ancestors) has made register-relative addressing (using small immediate offsets) an important method of accessing operands, especially on the stack, making such accesses as fast as register accesses, i.e. a one cycle instruction throughput, in most circumstances where the accessed data is available in the top-level cache.

IA-32E Mode
Sub-modes:
• Compatibility mode (runs legacy 16-bit and 32-bit protected-mode code under a 64-bit operating system)
• 64-bit mode (full access to the 64-bit address space)
REX Prefixes
REX prefixes are instruction-prefix bytes used in 64-bit mode. They do the following:
• Specify GPRs and SSE registers.
• Specify 64-bit operand size.
• Specify extended control registers.
Not all instructions require a REX prefix in 64-bit mode. A prefix is necessary only if an instruction references one of the extended registers or uses a 64-bit operand. If a REX prefix is used when it has no meaning, it is ignored. Only one REX prefix is allowed per instruction. If used, the prefix must immediately precede the opcode byte or the two-byte opcode escape prefix (if present). Other placements are ignored. The instruction-size limit of 15 bytes still applies to instructions with a REX prefix.
Instruction format for protected mode, real-address mode, and virtual-8086 mode
The Intel 64 and IA-32 architectures instruction encodings are subsets of the format shown. Instructions consist of optional instruction prefixes (in any order), primary opcode bytes (up to three bytes), an addressing-form specifier (if required) consisting of the ModR/M byte and sometimes the SIB (Scale-Index-Base) byte, a displacement (if required), and an immediate data field (if required).

Mnemonics and opcodes Each x86 assembly instruction is represented by a mnemonic which, often combined with one or more operands, translates to one or more bytes called an opcode; NOP : 0x90 HLT : 0xF4 There are potential opcodes with no documented mnemonic which different processors may interpret differently, making a program using them behave inconsistently or even generate an exception on some processors. These opcodes often turn up in code writing competitions as a way to make the code smaller, faster, more elegant or just show off the author's prowess. Demonstrates how to find undocumented opcodes in x86 CPUs: https://www.youtube.com/watch?v=KrksBdWcZgQ

Syntax
x86 assembly language has two main syntax branches:
• Intel syntax, originally used for documentation of the x86 platform, is dominant in the MS-DOS and Windows world. Many x86 assemblers use Intel syntax, including NASM, FASM, MASM, TASM, and YASM.
• AT&T syntax is dominant in the Unix world, since Unix was created at AT&T Bell Labs.
Summary of the main differences between Intel syntax and AT&T syntax:
• Parameter order
  AT&T: source before the destination. Example: mov $5, %eax
  Intel: destination before source. Example: mov eax, 5
• Parameter size
  AT&T: mnemonics are suffixed with a letter indicating the size of the operands: q for qword, l for long (dword), w for word, and b for byte. Example: addl $4, %esp
  Intel: derived from the name of the register that is used (e.g. rax, eax, ax, al imply q, l, w, b, respectively). Example: add esp, 4
• Sigils
  AT&T: immediate values are prefixed with a "$", registers with a "%".
  Intel: the assembler automatically detects the type of symbols, i.e., whether they are registers, constants or something else.
• Effective addresses
  AT&T: general syntax of DISP(BASE,INDEX,SCALE). Example: movl mem_location(%ebx,%ecx,4), %eax
  Intel: arithmetic expressions in square brackets; additionally, size keywords like byte, word, or dword have to be used if the size cannot be determined from the operands. Example: mov eax, [ebx + ecx*4 + mem_location]

Execution modes
• Real mode (16-bit): the original operating mode of early-generation x86 CPUs.
• Protected mode (16-bit and 32-bit): a 16-bit subset of instructions is available on the 16-bit x86 processors. These instructions are available in real mode on all x86 processors and in 16-bit protected mode (80286 onwards), where additional instructions relating to protected mode are also available. On the 80386 and later, 32-bit instructions (including later extensions) are also available in all modes, including real mode. The protected mode of the 80286 was extended to allow the 80386 to address up to 4 GB of memory. The 32-bit flat memory model of the 80386 helped drive large-scale adoption of Windows 3.1 (which relied on protected mode), since Windows could now run many applications at once, including DOS applications, by using virtual memory and simple multitasking.
• Virtual 8086 mode (16-bit): virtual 8086 mode (VM86) made it possible to run one or more real-mode programs in a protected environment that emulated real mode (some programs were not fully compatible).
• System Management Mode (16-bit): SMM, with some of its own special instructions, is available on some Intel i386SL, i486 and later CPUs.
• Long mode (64-bit): 64-bit instructions and more registers are also available.
The instruction set is similar in each mode, but memory addressing and word size vary, requiring different programming strategies.

Segmented addressing (real, vm86, 80286 protected modes)
• Memory is addressed using a process known as segmentation.
• Segmentation composes a memory address from two parts, a segment and an offset: the segment points to the beginning of a 64 KB group of addresses, and the offset determines how far from this beginning address the desired address is.
• In segmented addressing, two registers are required for a complete memory address: one to hold the segment, the other to hold the offset.
• To translate back into a flat address, the segment value is shifted four bits left (equivalent to multiplication by 2^4, or 16) and then added to the offset to form the full address. This allows breaking the 64 KB barrier through clever choice of addresses, though it makes programming considerably more complex.
• Example: DS = 0xDEAD, DX = 0xCAFE gives memory address = 0xDEAD * 0x10 + 0xCAFE = 0xEB5CE.
• Combining segment and offset values yields a 20-bit address. Therefore, the CPU can address up to 1,048,576 bytes (1 MB) in real mode.
• When referring to an address with a segment and an offset, the notation segment:offset is used, so in the above example the flat address 0xEB5CE can be written as 0xDEAD:0xCAFE or as a segment and offset register pair, DS:DX.
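The real-mode translation above can be written as a short Python sketch. The function name linear_address is made up here, and masking the result to 20 bits is an assumption of the sketch that models a CPU with only 20 address lines.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode translation: (segment << 4) + offset, kept to 20 bits."""
    segment &= 0xFFFF
    offset &= 0xFFFF
    return ((segment << 4) + offset) & 0xFFFFF

# The example from the slide: DS = 0xDEAD, DX = 0xCAFE.
print(hex(linear_address(0xDEAD, 0xCAFE)))  # 0xeb5ce

# Different segment:offset pairs can name the same flat address.
print(hex(linear_address(0xEB5C, 0x000E)))  # 0xeb5ce
```

The second call shows why segment:offset pairs are not unique: 0xDEAD:0xCAFE and 0xEB5C:0x000E refer to the same byte.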

There are some special combinations of segment registers and general registers that point to important addresses:
• CS:IP (CS is Code Segment, IP is Instruction Pointer) points to the address where the processor will fetch the next byte of code.
• SS:SP (SS is Stack Segment, SP is Stack Pointer) points to the address of the top of the stack, i.e. the most recently pushed byte.
• DS:SI (DS is Data Segment, SI is Source Index) is often used to point to string data that is about to be copied to ES:DI.
• ES:DI (ES is Extra Segment, DI is Destination Index) is typically used to point to the destination for a string copy, as mentioned above.
In 80286 protected mode (utilized by OS/2)
• The 80286 had 16-bit address registers, which limit an offset to 2^16 bytes (64 kilobytes) of addressable space.
• In protected mode, the CPU can use 24-bit addressing to access 2^24 bytes of memory (16 megabytes).
• In protected mode, the segment selector can be broken down into three parts: a 13-bit index, a Table Indicator bit that determines whether the entry is in the GDT or LDT, and a 2-bit Requested Privilege Level.
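The three selector fields can be extracted with shifts and masks. The sketch below assumes the usual layout (index in bits 15-3, Table Indicator in bit 2 with 0 meaning GDT and 1 meaning LDT, Requested Privilege Level in bits 1-0); the names Selector and decode_selector are invented for the example.

```python
from typing import NamedTuple

class Selector(NamedTuple):
    index: int   # 13-bit descriptor table index
    table: str   # "GDT" or "LDT", from the Table Indicator bit
    rpl: int     # 2-bit Requested Privilege Level

def decode_selector(selector: int) -> Selector:
    """Split a 16-bit protected-mode segment selector into its fields."""
    selector &= 0xFFFF
    return Selector(
        index=(selector >> 3) & 0x1FFF,
        table="LDT" if selector & 0x4 else "GDT",
        rpl=selector & 0x3,
    )

print(decode_selector(0x002B))  # Selector(index=5, table='GDT', rpl=3)
print(decode_selector(0x000F))  # Selector(index=1, table='LDT', rpl=3)
```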

BASIC x86 INSTRUCTIONS

Stack instructions
• PUSH src/immed
  Decrements SP by the size of the operand (two or four; byte values are sign extended) and transfers one word from the source to the stack top (SS:SP).
• POP dest
  Transfers the word at the current stack top (SS:SP) to the destination, then increments SP by two (or by four for a doubleword operand) to point to the new stack top. CS is not a valid destination.
• PUSHA / PUSHAD (386+)
  Pushes all general purpose registers onto the stack in the following order: (E)AX, (E)CX, (E)DX, (E)BX, (E)SP, (E)BP, (E)SI, (E)DI. The value of SP pushed is the value before the actual push of SP.
• POPA / POPAD (386+)
  Pops the top 8 words off the stack into the 8 general purpose 16/32-bit registers. Registers are popped in the following order: (E)DI, (E)SI, (E)BP, (E)SP, (E)BX, (E)DX, (E)CX and (E)AX. The (E)SP value popped from the stack is actually discarded.
• POPF / POPFD (386+)
  Pops a word/doubleword from the stack into the Flags Register and then increments SP by 2 (for POPF) or 4 (for POPFD).
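The way PUSH and POP move SP can be modeled with a toy 16-bit stack. The class Stack16 is hypothetical: it ignores the SS segment and treats memory as one flat 64 KB array, which is an assumption made only to keep the sketch short.

```python
class Stack16:
    """Toy model of 16-bit PUSH/POP on a flat 64 KB memory (SS ignored)."""

    def __init__(self, sp: int = 0x0100) -> None:
        self.mem = bytearray(0x10000)
        self.sp = sp

    def push(self, word: int) -> None:
        # SP is decremented first, then the word is stored little-endian.
        self.sp = (self.sp - 2) & 0xFFFF
        self.mem[self.sp:self.sp + 2] = (word & 0xFFFF).to_bytes(2, "little")

    def pop(self) -> int:
        # The word at the stack top is read, then SP is incremented.
        word = int.from_bytes(self.mem[self.sp:self.sp + 2], "little")
        self.sp = (self.sp + 2) & 0xFFFF
        return word

stack = Stack16()
stack.push(0x1234)
print(hex(stack.sp))     # 0xfe
print(hex(stack.pop()))  # 0x1234
print(hex(stack.sp))     # 0x100
```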

Integer ALU instructions
Standard mathematical operations:
• ADD dest,src
  Modifies flags: AF CF OF PF SF ZF
  Adds "src" to "dest", replacing the original contents of "dest". Both operands are binary.
• SUB dest,src
  Modifies flags: AF CF OF PF SF ZF
  The source is subtracted from the destination and the result is stored in the destination.
• MUL src
  Modifies flags: CF OF (AF,PF,SF,ZF undefined)
  Unsigned multiply of the accumulator by the source. If "src" is a byte value, then AL is used as the other multiplicand and the result is placed in AX. If "src" is a word value, then AX is multiplied by "src" and DX:AX receives the result. If "src" is a doubleword value, then EAX is multiplied by "src" and EDX:EAX receives the result. The 386+ uses an early-out algorithm which makes multiplying any size value in EAX as fast as in the 8- or 16-bit registers.
• DIV src
  Modifies flags: (AF,CF,OF,PF,SF,ZF undefined)
  Unsigned binary division of the accumulator by the source. If the source divisor is a byte value, then AX is divided by "src" and the quotient is placed in AL and the remainder in AH. If the source operand is a word value, then DX:AX is divided by "src" and the quotient is stored in AX and the remainder in DX.
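The word-sized forms of MUL and DIV, which spread a 32-bit value across DX:AX, can be sketched as follows. The function names mul16 and div16 are invented for this example, and the Python exceptions stand in for the divide error that the processor raises; they are not part of any real API.

```python
def mul16(ax: int, src: int):
    """Unsigned 16-bit MUL: returns (DX, AX, carry) with the product in DX:AX."""
    product = (ax & 0xFFFF) * (src & 0xFFFF)
    dx, ax = product >> 16, product & 0xFFFF
    carry = dx != 0          # CF and OF are set when the upper half is nonzero
    return dx, ax, carry

def div16(dx: int, ax: int, src: int):
    """Unsigned 16-bit DIV of DX:AX by src: returns (quotient AX, remainder DX)."""
    src &= 0xFFFF
    if src == 0:
        raise ZeroDivisionError("divide error: divisor is zero")
    dividend = ((dx & 0xFFFF) << 16) | (ax & 0xFFFF)
    quotient, remainder = divmod(dividend, src)
    if quotient > 0xFFFF:
        raise OverflowError("divide error: quotient does not fit in AX")
    return quotient, remainder

dx, ax, carry = mul16(0x1234, 0x0100)
print(hex(dx), hex(ax), carry)            # 0x12 0x3400 True
print([hex(v) for v in div16(dx, ax, 0x0100)])  # ['0x1234', '0x0']
```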

logical operators: AND dest,src Modifies flags: CF OF PF SF ZF (AF undefined) Performs a logical AND of the two operands replacing the destination with the result. OR dest,src Modifies flags: CF OF PF SF ZF (AF undefined) Logical inclusive OR of the two operands returning the result in the destination. Any bit set in either operand will be set in the destination. XOR dest,src Modifies flags: CF OF PF SF ZF (AF undefined) Performs a bitwise exclusive OR of the operands and returns the result in the destination. NEG dest Modifies flags: AF CF OF PF SF ZF Subtracts the destination from 0 and saves the 2s complement of " dest " back into " dest "

Bit shifts, arithmetic and logical:
• SAL dest,count / SHL dest,count
  Modifies flags: CF OF PF SF ZF (AF undefined)

    .-.     .---------------.     .-.
    |C|<----|7 <---------- 0|<----|0|
    '-'     '---------------'     '-'

  Shifts the destination left by "count" bits with zeroes shifted in on the right. The Carry Flag contains the last bit shifted out.
• SAR dest,count
  Modifies flags: CF OF PF SF ZF (AF undefined)

       .---------------.     .-.
    .--|7 ----------> 0|---->|C|
    |  '---------------'     '-'
    '---^

  Shifts the destination right by "count" bits with the current sign bit replicated in the leftmost bit. The Carry Flag contains the last bit shifted out.
• SHR dest,count
  Modifies flags: CF OF PF SF ZF (AF undefined)

    .-.     .---------------.     .-.
    |0|---->|7 ----------> 0|---->|C|
    '-'     '---------------'     '-'

  Shifts the destination right by "count" bits with zeroes shifted in on the left. The Carry Flag contains the last bit shifted out.
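The three shifts can be modeled on an 8-bit value, one bit position per step, to show where the Carry Flag comes from. The function names shl8, shr8 and sar8 are made up for this sketch; each returns the new value and the carry, and the cf parameter is the incoming carry that is returned unchanged when count is 0.

```python
def shl8(value: int, count: int, cf: int = 0):
    """SHL/SAL on 8 bits: zeroes enter on the right, bit 7 goes to CF."""
    value &= 0xFF
    for _ in range(count):
        cf = (value >> 7) & 1
        value = (value << 1) & 0xFF
    return value, cf

def shr8(value: int, count: int, cf: int = 0):
    """SHR on 8 bits: zeroes enter on the left, bit 0 goes to CF."""
    value &= 0xFF
    for _ in range(count):
        cf = value & 1
        value >>= 1
    return value, cf

def sar8(value: int, count: int, cf: int = 0):
    """SAR on 8 bits: the sign bit is replicated, bit 0 goes to CF."""
    value &= 0xFF
    for _ in range(count):
        cf = value & 1
        value = (value >> 1) | (value & 0x80)
    return value, cf

print(shl8(0b1000_0001, 1))  # (2, 1)
print(shr8(0b1000_0001, 1))  # (64, 1)
print(sar8(0b1000_0001, 2))  # (224, 0)
```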

Rotate with and without carry:
• RCL dest,count
  Modifies flags: CF OF

       .-.     .---------------.
    .--|C|<----|7 <---------- 0|<-.
    |  '-'     '---------------'  |
    '-----------------------------'

  Rotates the bits in the destination to the left "count" times through the Carry Flag: each bit pushed out the left side enters the Carry Flag, and the previous Carry Flag value re-enters on the right. The Carry Flag holds the last bit rotated out.
• RCR dest,count
  Modifies flags: CF OF

       .---------------.     .-.
    .->|7 ----------> 0|---->|C|--.
    |  '---------------'     '-'  |
    '-----------------------------'

  Rotates the bits in the destination to the right "count" times through the Carry Flag: each bit pushed out the right side enters the Carry Flag, and the previous Carry Flag value re-enters on the left. The Carry Flag holds the last bit rotated out.
• ROL dest,count
  Modifies flags: CF OF

    .-.     .---------------.
    |C|<-.--|7 <---------- 0|<-.
    '-'  |  '---------------'  |
         '---------------------'

  Rotates the bits in the destination to the left "count" times with all data pushed out the left side re-entering on the right. The Carry Flag will contain the value of the last bit rotated out.
• ROR dest,count
  Modifies flags: CF OF

       .---------------.     .-.
    .->|7 ----------> 0|--.->|C|
    |  '---------------'  |  '-'
    '---------------------'

  Rotates the bits in the destination to the right "count" times with all data pushed out the right side re-entering on the left. The Carry Flag will contain the value of the last bit rotated out.
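The difference between ROL (carry receives a copy of the bit) and RCL (carry is part of the rotation) can be seen in a small 8-bit model. The names rol8 and rcl8 are hypothetical; both return the new value and the resulting Carry Flag.

```python
def rol8(value: int, count: int, cf: int = 0):
    """ROL on 8 bits: bit 7 wraps around to bit 0 and is also copied to CF."""
    value &= 0xFF
    for _ in range(count):
        msb = (value >> 7) & 1
        value = ((value << 1) & 0xFF) | msb
        cf = msb
    return value, cf

def rcl8(value: int, cf: int, count: int):
    """RCL on 8 bits: a 9-bit rotation through CF (old CF enters bit 0)."""
    value &= 0xFF
    cf &= 1
    for _ in range(count):
        msb = (value >> 7) & 1
        value = ((value << 1) & 0xFF) | cf
        cf = msb
    return value, cf

print(rol8(0x81, 1))     # (3, 1): bit 7 re-enters at bit 0
print(rcl8(0x81, 0, 1))  # (2, 1): the old carry (0) enters at bit 0
print(rcl8(0x81, 0, 9))  # (129, 0): nine steps restore value and carry
```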

BCD arithmetic instructions / others
• AAA
  Modifies flags: AF CF (OF,PF,SF,ZF undefined)
  Changes the contents of AL to valid unpacked decimal. The high-order nibble is zeroed.
• AAD
  Modifies flags: SF ZF PF (AF,CF,OF undefined)
  Used before dividing unpacked decimal numbers. Multiplies AH by 10 and then adds the result into AL. Sets AH to zero. This instruction is also known to have an undocumented behavior.
    AL := 10*AH+AL
    AH := 0
• AAM
  Modifies flags: PF SF ZF (AF,CF,OF undefined)
    AH := AL / 10
    AL := AL mod 10
  Used after multiplication of two unpacked decimal numbers, this instruction adjusts an unpacked decimal number. The high-order nibble of each byte must be zeroed before using this instruction. This instruction is also known to have an undocumented behavior.
• AAS
  Modifies flags: AF CF (OF,PF,SF,ZF undefined)
  Corrects the result of a previous unpacked decimal subtraction in AL. The high-order nibble is zeroed.
• DAA
  Modifies flags: AF CF PF SF ZF (OF undefined)
  Corrects the result (in AL) of a previous BCD addition operation. The contents of AL are changed to a pair of packed decimal digits.
• DAS
  Modifies flags: AF CF PF SF ZF (OF undefined)
  Corrects the result (in AL) of a previous BCD subtraction operation. The contents of AL are changed to a pair of packed decimal digits.
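The AAM and AAD formulas above translate directly into Python. This is only a sketch of the documented base-10 behavior; the function names aam and aad are made up, and flags are not modeled.

```python
def aam(al: int):
    """AAM: split a binary product in AL into unpacked decimal digits (AH, AL)."""
    al &= 0xFF
    return al // 10, al % 10

def aad(ah: int, al: int):
    """AAD: combine unpacked digits AH:AL into binary in AL and clear AH."""
    return 0, (10 * (ah & 0xFF) + (al & 0xFF)) & 0xFF

# 7 * 9 = 63: after MUL, AL holds 63 (0x3F); AAM turns it into digits 6 and 3.
ah, al = aam(7 * 9)
print(ah, al)       # 6 3

# AAD reverses the step before a division: digits 6 and 3 become binary 63.
print(aad(ah, al))  # (0, 63)
```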

Data manipulation instructions
Data transfer instructions:
• MOV dest,src
  Copies a byte or word from the source operand to the destination operand. If the destination is SS, interrupts are disabled for the following instruction, except on early buggy 808x CPUs. Some CPUs disable interrupts if the destination is any of the segment registers.
• XCHG dest,src
  Exchanges the contents of source and destination.
• MOVSX dest,src
  Copies the value of the source operand to the destination register, sign-extended.
• MOVZX dest,src
  Copies the value of the source operand to the destination register, zero-extended.
• CMPXCHG dest,src (486+)
  Modifies flags: AF CF OF PF SF ZF
  Compares the accumulator (8-32 bits) with "dest". If equal, "dest" is loaded with "src"; otherwise the accumulator is loaded with "dest".
• CWD
  Extends the sign of the word in register AX throughout register DX, forming a doubleword quantity in DX:AX.
• CDQ
  Converts the signed doubleword in EAX to a signed quadword in EDX:EAX by extending the high-order bit of EAX throughout EDX.

String/array instructions
• MOVS dest,src / MOVSB / MOVSW / MOVSD (386+)
  Copies data from the location addressed by DS:SI (even if operands are given) to the destination at ES:DI, and updates SI and DI based on the size of the operand or instruction used. SI and DI are incremented when the Direction Flag is cleared and decremented when the Direction Flag is set. Use with REP prefixes.
• CMPS dest,src / CMPSB / CMPSW / CMPSD (386+)
  Modifies flags: AF CF OF PF SF ZF
  Subtracts the destination value from the source without saving the result. Updates the flags based on the subtraction, and the index registers (E)SI and (E)DI are incremented or decremented depending on the state of the Direction Flag. CMPSB increments or decrements the index registers by 1, CMPSW by 2, and CMPSD by 4. The REP prefixes can be used to process entire data items.

• SCAS string / SCASB / SCASW / SCASD (386+)
  Modifies flags: AF CF OF PF SF ZF
  Compares the value at ES:DI (even if an operand is specified) with the accumulator and sets the flags as for a subtraction. DI is incremented or decremented based on the instruction format (or operand size) and the state of the Direction Flag. Use with REP prefixes.
• LODS src / LODSB / LODSW / LODSD (386+)
  Transfers the string element addressed by DS:SI (even if an operand is supplied) to the accumulator. SI is adjusted based on the size of the operand or on the instruction used. If the Direction Flag is set SI is decremented; if the Direction Flag is clear SI is incremented. Use with REP prefixes.
• STOS dest / STOSB / STOSW / STOSD
  Stores the value in the accumulator to the location at ES:(E)DI (even if an operand is given). (E)DI is incremented or decremented based on the size of the operand (or instruction format) and the state of the Direction Flag. Use with REP prefixes.

• REP
  Repeats execution of string instructions while CX != 0. After each string operation, CX is decremented and the Zero Flag is tested. The combination of a repeat prefix and a segment override on CPUs before the 386 may result in errors if an interrupt occurs before CX=0. The following code shows an instruction that is susceptible to this and how to avoid it:

    again:  rep movs byte ptr ES:[DI],ES:[SI]   ; vulnerable instr.
            jcxz    next                        ; continue if REP successful
            loop    again                       ; interrupt goofed count
    next:

• REPE / REPZ
  Repeats execution of string instructions while CX != 0 and the Zero Flag is set. CX is decremented and the Zero Flag tested after each string operation. The combination of a repeat prefix and a segment override on processors other than the 386 may result in errors if an interrupt occurs before CX=0.
• REPNE / REPNZ
  Repeats execution of string instructions while CX != 0 and the Zero Flag is clear. CX is decremented and the Zero Flag tested after each string operation. The combination of a repeat prefix and a segment override on processors other than the 386 may result in errors if an interrupt occurs before CX=0.
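How REP drives a string instruction can be sketched with a model of REP MOVSB. The function rep_movsb is invented for this illustration; it assumes a single flat memory array (segments are ignored) and does not model interrupts.

```python
def rep_movsb(mem: bytearray, si: int, di: int, cx: int, df: int = 0):
    """Model of REP MOVSB: copy CX bytes from [SI] to [DI].

    SI and DI move up when the Direction Flag is clear and down when it is set.
    Returns the final (SI, DI, CX).
    """
    step = -1 if df else 1
    while cx != 0:
        mem[di] = mem[si]            # one MOVSB
        si = (si + step) & 0xFFFF
        di = (di + step) & 0xFFFF
        cx -= 1                      # REP decrements CX after each iteration
    return si, di, cx

memory = bytearray(64)
memory[0:5] = b"hello"
print(rep_movsb(memory, si=0, di=16, cx=5))  # (5, 21, 0)
print(bytes(memory[16:21]))                  # b'hello'
```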

Program flow
Conditional jumps:

  Mnemonic  Meaning                                Jump Condition
  JA        Jump if Above                          CF=0 and ZF=0
  JAE       Jump if Above or Equal                 CF=0
  JB        Jump if Below                          CF=1
  JBE       Jump if Below or Equal                 CF=1 or ZF=1
  JC        Jump if Carry                          CF=1
  JCXZ      Jump if CX Zero                        CX=0
  JE        Jump if Equal                          ZF=1
  JG        Jump if Greater (signed)               ZF=0 and SF=OF
  JGE       Jump if Greater or Equal (signed)      SF=OF
  JL        Jump if Less (signed)                  SF != OF
  JLE       Jump if Less or Equal (signed)         ZF=1 or SF != OF
  JNA       Jump if Not Above                      CF=1 or ZF=1
  JNAE      Jump if Not Above or Equal             CF=1
  JNB       Jump if Not Below                      CF=0
  JNBE      Jump if Not Below or Equal             CF=0 and ZF=0
  JNC       Jump if Not Carry                      CF=0
  JNE       Jump if Not Equal                      ZF=0
  JNG       Jump if Not Greater (signed)           ZF=1 or SF != OF
  JNGE      Jump if Not Greater or Equal (signed)  SF != OF
  JNL       Jump if Not Less (signed)              SF=OF
  JNLE      Jump if Not Less or Equal (signed)     ZF=0 and SF=OF
  JNO       Jump if Not Overflow (signed)          OF=0
  JNP       Jump if No Parity                      PF=0
  JNS       Jump if Not Signed (signed)            SF=0
  JNZ       Jump if Not Zero                       ZF=0
  JO        Jump if Overflow (signed)              OF=1
  JP        Jump if Parity                         PF=1
  JPE       Jump if Parity Even                    PF=1
  JPO       Jump if Parity Odd                     PF=0
  JS        Jump if Signed (signed)                SF=1
  JZ        Jump if Zero                           ZF=1
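The split between unsigned conditions (JA, JB) and signed conditions (JG, JL) follows from which flags they test. The sketch below computes the flags that an 8-bit CMP would set and evaluates four conditions from the table; the function names cmp8, ja, jb, jg and jl are made up for the example.

```python
def cmp8(a: int, b: int) -> dict:
    """Flags produced by an 8-bit CMP a,b (computes a - b, result discarded)."""
    a &= 0xFF
    b &= 0xFF
    result = (a - b) & 0xFF
    return {
        "CF": int(a < b),                                  # unsigned borrow
        "ZF": int(result == 0),
        "SF": result >> 7,                                 # sign of the result
        "OF": int(((a ^ b) & (a ^ result) & 0x80) != 0),   # signed overflow
    }

def ja(f: dict) -> bool:   # unsigned "above"
    return f["CF"] == 0 and f["ZF"] == 0

def jb(f: dict) -> bool:   # unsigned "below"
    return f["CF"] == 1

def jg(f: dict) -> bool:   # signed "greater"
    return f["ZF"] == 0 and f["SF"] == f["OF"]

def jl(f: dict) -> bool:   # signed "less"
    return f["SF"] != f["OF"]

# 0xFF is 255 when unsigned but -1 when signed, so JA and JG disagree.
flags = cmp8(0xFF, 0x01)
print(ja(flags), jg(flags))  # True False
```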

• JCXZ label / JECXZ label (386+)
  Causes execution to branch to "label" if register CX is zero. Uses unsigned comparison.
• JMP target
  Unconditionally transfers control to "target". Jumps by default are within -32768 to 32767 bytes from the instruction following the jump. NEAR and SHORT jumps cause the IP to be updated, while FAR jumps cause CS and IP to be updated.
• LEAVE
  Releases the local variables created by the previous ENTER instruction by restoring SP and BP to their condition before the procedure stack frame was initialized.
• ENTER locals,level
  Modifies the stack for entry to a procedure for a high-level language. Operand "locals" specifies the amount of storage to be allocated on the stack. "Level" specifies the nesting level of the routine. Paired with the LEAVE instruction, this is an efficient method of entry and exit to procedures.

• LOOP label
  Decrements CX by 1 and transfers control to "label" if CX is not zero. The "label" operand must be within -128 to +127 bytes of the instruction following the loop instruction.
• LOOPE label / LOOPZ label
  Decrements CX by 1 (without modifying the flags) and transfers control to "label" if CX != 0 and the Zero Flag is set. The "label" operand must be within -128 to +127 bytes of the instruction following the loop instruction.
• LOOPNZ label / LOOPNE label
  Decrements CX by 1 (without modifying the flags) and transfers control to "label" if CX != 0 and the Zero Flag is clear. The "label" operand must be within -128 to +127 bytes of the instruction following the loop instruction.
• INT num
  Modifies flags: TF IF
  Initiates a software interrupt by pushing the flags, clearing the Trap and Interrupt Flags, pushing CS followed by IP, and loading CS:IP with the value found in the interrupt vector table. Execution then begins at the location addressed by the new CS:IP.
• CALL destination
  Pushes the Instruction Pointer (and Code Segment for far calls) onto the stack and loads the Instruction Pointer with the address of "destination". Code continues with execution at CS:IP.
• RET/RETF/RETN nBytes
  Transfers control from a procedure back to the instruction address saved on the stack. "nBytes" is an optional number of bytes to release. Far returns pop the IP followed by the CS, while near returns pop only the IP register.

Segment Register Instructions
The segment register instructions allow far pointers (segment:offset addresses) to be loaded into a general register and a segment register in one step. For each of the four load instructions, the word at the lower memory address must contain the offset and the word at the higher address must contain the segment. This simplifies the loading of far pointers from the stack and the interrupt vector table.
LDS dest,src: Loads a 32-bit pointer from the memory source into the destination register and DS. The offset is placed in the destination register and the segment is placed in DS.
LFS dest,src (386+): Loads a 32-bit pointer from the memory source into the destination register and FS. The offset is placed in the destination register and the segment is placed in FS.
LEA dest,src: Transfers the offset address of "src" to the destination register. No segment register is loaded.
LES dest,src: Loads a 32-bit pointer from the memory source into the destination register and ES. The offset is placed in the destination register and the segment is placed in ES.
LSS dest,src (386+): Loads a 32-bit pointer from the memory source into the destination register and SS. The offset is placed in the destination register and the segment is placed in SS.
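
The memory layout that LDS, LES, LFS and LSS expect (offset word at the lower address, segment word at the higher address, both little-endian) can be illustrated with a short Python sketch. The function name and the sample bytes are assumptions chosen for the example.

```python
import struct

def load_far_pointer(memory, addr):
    """Read a 16:16 far pointer the way LDS/LES/LFS/LSS do.

    Returns (offset, segment): the offset goes to the destination
    register, the segment to DS, ES, FS or SS respectively.
    """
    offset, segment = struct.unpack_from("<HH", memory, addr)
    return offset, segment

# Four bytes in memory: offset 0x1234 first, then segment 0xB800.
memory = bytes([0x34, 0x12, 0x00, 0xB8])
offset, segment = load_far_pointer(memory, 0)
```

Here `offset` is 0x1234 and `segment` is 0xB800, matching the ordering stated on the slide.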

I/O INSTRUCTIONS
These instructions move data between the processor's I/O ports and a register or memory.
IN accum,port: A byte, word or dword is read from "port" and placed in AL, AX or EAX respectively. If the port number is in the range 0-255 it can be specified as an immediate; otherwise the port number must be specified in DX. Valid port ranges on the PC are 0-1023, though values through 65535 may be specified and are recognized by third-party vendor equipment and PS/2 systems.
OUT port,accum: Transfers the byte in AL, word in AX or dword in EAX to the specified hardware port address. If the port number is in the range 0-255 it can be specified as an immediate. If greater than 255, the port number must be specified in DX. Since the PC only decodes 10 bits of the port address, values over 1023 can only be decoded by third-party vendor equipment and otherwise map onto the port range 0-1023.
INS dest,port / INSB / INSW / INSD (386+): Loads data from the port specified in DX to the destination ES:(E)DI (even if a destination operand is supplied). (E)DI is adjusted by the size of the operand: increased if the Direction Flag is clear and decreased if the Direction Flag is set. For INSB, INSW and INSD no operands are allowed and the size is determined by the mnemonic.
OUTS port,src / OUTSB / OUTSW / OUTSD (386+): Transfers a byte, word or doubleword from "src" to the hardware port specified in DX. For the forms with no operands, "src" is located at DS:(E)SI and the size is dictated by the mnemonic. (E)SI is adjusted by the size of the operand: decremented when the Direction Flag is set, incremented when it is clear. Unlike OUT, the port cannot be given as an immediate; it is always taken from DX. Since the PC only decodes 10 bits of the port address, values over 1023 can only be decoded by third-party vendor equipment and otherwise map onto the port range 0-1023.
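
The two port-number rules for IN and OUT above (immediate form only for ports 0-255, and only 10 address bits decoded on the original PC) can be written out as a small Python sketch. Both helper names are hypothetical and exist only for this illustration.

```python
def can_use_immediate(port):
    """IN/OUT accept an immediate port number only in the range 0-255."""
    return 0 <= port <= 0xFF

def decoded_port(port):
    """Port seen by hardware that decodes only the low 10 address bits."""
    return port & 0x3FF

com1 = 0x3F8
alias = 0x7F8                     # differs from 0x3F8 only above bit 9
same_device = decoded_port(alias) == decoded_port(com1)
```

With 10-bit decoding, `alias` and `com1` select the same port, which is the aliasing onto 0-1023 that the slide describes. A port such as 0x3F8 is above 255, so it has to be placed in DX.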

Flag Control (EFLAGS) Instructions
The flag control instructions operate on the flags in the EFLAGS register.
STC: Modifies flags: CF. Sets the Carry Flag to 1.
STD: Modifies flags: DF. Sets the Direction Flag to 1, causing string instructions to auto-decrement SI and DI instead of auto-incrementing them.
STI: Modifies flags: IF. Sets the Interrupt Flag to 1, which enables recognition of maskable hardware interrupts. If an interrupt is generated by a hardware device, an End of Interrupt (EOI) must also be issued to enable other hardware interrupts of the same or lower priority.
SAHF: Modifies flags: AF CF PF SF ZF. Transfers bits 0-7 of AH into the Flags Register. This includes AF, CF, PF, SF and ZF.
LAHF: Copies bits 0-7 of the flags register into AH. This includes flags AF, CF, PF, SF and ZF; the other bits are undefined. From bit 7 down to bit 0: AH := SF ZF xx AF xx PF xx CF
CLC: Modifies flags: CF. Clears the Carry Flag.
CLD: Modifies flags: DF. Clears the Direction Flag, causing string instructions to increment the SI and DI index registers.
CLI: Modifies flags: IF. Disables the maskable hardware interrupts by clearing the Interrupt Flag. NMIs and software interrupts are not inhibited.
CLTS: Clears the Task Switched Flag in the Machine Status Word (CR0). This is a privileged operation and is generally used only by operating system code.
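
The AH layout given for LAHF (SF ZF xx AF xx PF xx CF, from bit 7 down to bit 0) can be decoded with a few lines of Python. This is a sketch for illustration; `FLAG_BITS`, `decode_ah` and `sahf` are hypothetical names, and the bits marked xx are simply ignored here.

```python
# Bit positions of the flags that LAHF/SAHF move between AH and the flags.
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7}

def decode_ah(ah):
    """Split an AH value produced by LAHF into individual flag bits."""
    return {name: (ah >> bit) & 1 for name, bit in FLAG_BITS.items()}

def sahf(flags, ah):
    """SAHF: replace SF, ZF, AF, PF and CF in 'flags' with the bits from AH."""
    mask = sum(1 << bit for bit in FLAG_BITS.values())   # 0xD5
    return (flags & ~mask) | (ah & mask)

decoded = decode_ah(0x41)            # bits 6 and 0 set: ZF and CF
new_flags = sahf(0x0200, 0xFF)       # IF (bit 9) is left untouched
```

Only the five listed flags change in `sahf`; higher bits such as IF keep their previous value.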

Miscellaneous Instructions
The miscellaneous instructions provide such functions as loading an effective address, executing a "no-operation," and retrieving processor identification information.
NOP: This is a do-nothing instruction. It occupies both space and time and is most useful for patching code segments. (It is encoded as the original XCHG AX,AX instruction.)
XLAT translation-table / XLATB (MASM 5.x): Replaces the byte in AL with a byte from a user table addressed by BX. The original value of AL is the index into the translate table. The best way to describe this is MOV AL,[BX+AL], although that addressing form cannot be written as a real MOV.
CPUID: Processor identification.
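
The table lookup performed by XLAT can be mimicked in Python. This is only a sketch of the behaviour described above; `xlat` and the sample table are illustrative assumptions, with the `table` argument standing in for the memory that BX points to.

```python
def xlat(table, al):
    """XLAT: AL = table[AL], where 'table' is the memory addressed by BX."""
    return table[al & 0xFF]

# A 16-entry table that turns a nibble value into its ASCII hex digit.
hex_digits = b"0123456789ABCDEF"
al = xlat(hex_digits, 0x0B)
```

Here `al` ends up holding the ASCII code for "B", a typical use of XLAT for character translation.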

OTHERS
Floating-point instructions
Instructions for a stack-based floating-point unit (FPU).
The FPU instructions include: addition, subtraction, negation, multiplication, division, remainder, square roots, integer truncation, fraction truncation, and scale by power of two.
The operations also include conversion instructions, which can load or store a value from memory in any of the following formats: binary-coded decimal, 32-bit integer, 64-bit integer, 32-bit floating-point, 64-bit floating-point or 80-bit floating-point (upon loading, the value is converted to the currently used floating-point mode).
Transcendental functions: sine, cosine, tangent, arctangent, exponentiation with the base 2 and logarithms to bases 2, 10, or e.
The stack register to stack register format of the instructions is fop st, st(n) or fop st(n), st, where st is equivalent to st(0), and st(n) is one of the 8 stack registers (st(0), st(1), ..., st(7)). As with the integer instructions, the first operand is both the first source operand and the destination operand. fsubr and fdivr should be singled out as first swapping the source operands before performing the subtraction or division.
The addition, subtraction, multiplication, division, store and comparison instructions include instruction modes that pop the top of the stack after their operation is complete. So, for example, faddp st(1), st performs the calculation st(1) = st(1) + st(0), then removes st(0) from the top of the stack, thus making what was the result in st(1) the top of the stack in st(0).
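
The faddp and fsubr behaviour described above can be traced with a Python list standing in for the FPU register stack, where index 0 plays the role of st(0). This is a simplified sketch: the helper names are hypothetical, Python floats replace the 80-bit registers, and the 8-register limit is not modelled.

```python
def faddp(stack, n):
    """faddp st(n), st: st(n) = st(n) + st(0), then pop st(0)."""
    stack[n] = stack[n] + stack[0]
    stack.pop(0)                 # the old st(1) becomes the new st(0)

def fsubr(stack, n):
    """fsubr st, st(n): operands swapped, so st(0) = st(n) - st(0)."""
    stack[0] = stack[n] - stack[0]

fpu = [1.5, 2.0, 10.0]           # st(0), st(1), st(2)
faddp(fpu, 1)
after_faddp = list(fpu)          # the sum now sits in st(0)
fsubr(fpu, 1)
after_fsubr = list(fpu)
```

After `faddp` the sum 3.5 is the new top of stack, and `fsubr` then computes 10.0 - 3.5 rather than 3.5 - 10.0.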

SIMD instructions
Modern x86 CPUs contain SIMD instructions, which largely perform the same operation in parallel on many values encoded in a wide SIMD register.
Various instruction technologies support different operations on different register sets, but taken as a whole (from MMX to SSE4.2) they include general computations on integer or floating-point arithmetic (addition, subtraction, multiplication, shift, minimum, maximum, comparison, division or square root). So, for example, paddw mm0, mm1 performs 4 parallel 16-bit (indicated by the w) integer adds (indicated by the padd) of mm0 values to mm1 and stores the result in mm0.
Streaming SIMD Extensions (SSE) also includes a floating-point mode in which only the very first value of the registers is actually modified (expanded in SSE2).
Some other unusual instructions have been added, including a sum of absolute differences (used for motion estimation in video compression, such as is done in MPEG) and a 16-bit multiply-accumulate instruction (useful for software-based alpha-blending and digital filtering).
SSE (since SSE3) and 3DNow! extensions include addition and subtraction instructions for treating paired floating-point values like complex numbers.
These instruction sets also include numerous fixed sub-word instructions for shuffling, inserting and extracting values within the registers. In addition, there are instructions for moving data between the integer registers and the XMM (used in SSE) or FPU (used in MMX) registers.
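
The paddw example above can be imitated in Python by treating each 64-bit MMX register as a list of four 16-bit lanes. This is a sketch under that simplifying assumption; `paddw` here is an ordinary Python function named after the instruction, not a binding to it.

```python
def paddw(dst, src):
    """paddw dst, src: four independent 16-bit adds with wraparound."""
    return [(a + b) & 0xFFFF for a, b in zip(dst, src)]

mm0 = [1, 2, 0xFFFF, 40000]
mm1 = [10, 20, 1, 30000]
mm0 = paddw(mm0, mm1)            # result is stored back into mm0
```

Each lane is added on its own, so an overflow in one lane (the last two above) wraps around without carrying into its neighbour.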

Sources:
https://en.wikipedia.org/wiki/Stored-program_computer
https://en.wikipedia.org/wiki/Von_Neumann_architecture
https://en.wikipedia.org/wiki/Harvard_architecture
https://en.wikipedia.org/wiki/X86
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf
http://www.masm32.com/