How to read the Intel Opcode notation

10,414

Solution 1

My favorite source is Intel itself: Intel® 64 and IA-32 Architectures Software Developer Manuals. And unlike versions past, all of the volumes are now nicely wrapped up in a single (3044 page) PDF.

It looks like the section that will help you most is 3.1.1.1 in Chapter 3 of Volume 2 (page 432 of the latest PDF as of the date I am writing this).

Solution 2

3.1.1.1 Opcode Column in the Instruction Summary Table (Instructions without VEX Prefix)

The “Opcode” column in the table above shows the object code produced for each form of the instruction. When possible, codes are given as hexadecimal bytes in the same order in which they appear in memory. Definitions of entries other than hexadecimal bytes are as follows:

• REX.W — Indicates the use of a REX prefix that affects operand size or instruction semantics. The ordering of the REX prefix and other optional/mandatory instruction prefixes are discussed in Chapter 2. Note that REX prefixes that promote legacy instructions to 64-bit behavior are not listed explicitly in the opcode column.

• /digit — A digit between 0 and 7 indicates that the ModR/M byte of the instruction uses only the r/m (register or memory) operand. The reg field contains the digit that provides an extension to the instruction's opcode.

• /r — Indicates that the ModR/M byte of the instruction contains a register operand and an r/m operand.

• cb, cw, cd, cp, co, ct — A 1-byte (cb), 2-byte (cw), 4-byte (cd), 6-byte (cp), 8-byte (co) or 10-byte (ct) value following the opcode. This value is used to specify a code offset and possibly a new value for the code segment register.

• ib, iw, id, io — A 1-byte (ib), 2-byte (iw), 4-byte (id) or 8-byte (io) immediate operand to the instruction that follows the opcode, ModR/M bytes or scale-indexing bytes. The opcode determines if the operand is a signed value. All words, doublewords and quadwords are given with the low-order byte first.

• +rb, +rw, +rd, +ro — Indicates the lower 3 bits of the opcode byte is used to encode the register operand without a modR/M byte. The instruction lists the corresponding hexadecimal value of the opcode byte with low 3 bits as 000b. In non-64-bit mode, a register code, from 0 through 7, is added to the hexadecimal value of the opcode byte. In 64-bit mode, indicates the four bit field of REX.b and opcode[2:0] field encodes the register operand of the instruction. “+ro” is applicable only in 64-bit mode. See Table 3-1 for the codes.

• +i — A number used in floating-point instructions when one of the operands is ST(i) from the FPU register stack. The number i (which can range from 0 to 7) is added to the hexadecimal byte given at the left of the plus sign to form a single opcode byte.

3.1.1.3 Instruction Column in the Opcode Summary Table

The “Instruction” column gives the syntax of the instruction statement as it would appear in an ASM386 program.

The following is a list of the symbols used to represent operands in the instruction statements:

• rel8 — A relative address in the range from 128 bytes before the end of the instruction to 127 bytes after the end of the instruction.

• rel16, rel32 — A relative address within the same code segment as the instruction assembled. The rel16 symbol applies to instructions with an operand-size attribute of 16 bits; the rel32 symbol applies to instructions with an operand-size attribute of 32 bits.

• ptr16:16, ptr16:32 — A far pointer, typically to a code segment different from that of the instruction. The notation 16:16 indicates that the value of the pointer has two parts. The value to the left of the colon is a 16-bit selector or value destined for the code segment register. The value to the right corresponds to the offset within the destination segment. The ptr16:16 symbol is used when the instruction's operand-size attribute is 16 bits; the ptr16:32 symbol is used when the operand-size attribute is 32 bits.

• r8 — One of the byte general-purpose registers: AL, CL, DL, BL, AH, CH, DH, BH, BPL, SPL, DIL and SIL; or one of the byte registers (R8L - R15L) available when using REX.R and 64-bit mode.

• r16 — One of the word general-purpose registers: AX, CX, DX, BX, SP, BP, SI, DI; or one of the word registers (R8-R15) available when using REX.R and 64-bit mode.

• r32 — One of the doubleword general-purpose registers: EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI; or one of the doubleword registers (R8D - R15D) available when using REX.R in 64-bit mode.

• r64 — One of the quadword general-purpose registers: RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, R8–R15. These are available when using REX.R and 64-bit mode.

• imm8 — An immediate byte value. The imm8 symbol is a signed number between –128 and +127 inclusive. For instructions in which imm8 is combined with a word or doubleword operand, the immediate value is sign-extended to form a word or doubleword. The upper byte of the word is filled with the topmost bit of the immediate value.

• imm16 — An immediate word value used for instructions whose operand-size attribute is 16 bits. This is a number between –32,768 and +32,767 inclusive.

• imm32 — An immediate doubleword value used for instructions whose operand-size attribute is 32 bits. It allows the use of a number between +2,147,483,647 and –2,147,483,648 inclusive.

• imm64 — An immediate quadword value used for instructions whose operand-size attribute is 64 bits. The value allows the use of a number between +9,223,372,036,854,775,807 and –9,223,372,036,854,775,808 inclusive.

• r/m8 — A byte operand that is either the contents of a byte general-purpose register (AL, CL, DL, BL, AH, CH, DH, BH, BPL, SPL, DIL and SIL) or a byte from memory. Byte registers R8L - R15L are available using REX.R in 64-bit mode.

• r/m16 — A word general-purpose register or memory operand used for instructions whose operand-size attribute is 16 bits. The word general-purpose registers are: AX, CX, DX, BX, SP, BP, SI, DI. The contents of memory are found at the address provided by the effective address computation. Word registers R8W - R15W are available using REX.R in 64-bit mode.

• r/m32 — A doubleword general-purpose register or memory operand used for instructions whose operand size attribute is 32 bits. The doubleword general-purpose registers are: EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI. The contents of memory are found at the address provided by the effective address computation. Doubleword registers R8D - R15D are available when using REX.R in 64-bit mode.

• r/m64 — A quadword general-purpose register or memory operand used for instructions whose operand-size attribute is 64 bits when using REX.W. Quadword general-purpose registers are: RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, R8–R15; these are available only in 64-bit mode. The contents of memory are found at the address provided by the effective address computation.

• m — A 16-, 32- or 64-bit operand in memory.

• m8 — A byte operand in memory, usually expressed as a variable or array name, but pointed to by the DS:(E)SI or ES:(E)DI registers. In 64-bit mode, it is pointed to by the RSI or RDI registers.

• m16 — A word operand in memory, usually expressed as a variable or array name, but pointed to by the DS:(E)SI or ES:(E)DI registers. This nomenclature is used only with the string instructions.

• m32 — A doubleword operand in memory, usually expressed as a variable or array name, but pointed to by the DS:(E)SI or ES:(E)DI registers. This nomenclature is used only with the string instructions.

• m64 — A memory quadword operand in memory.

• m128 — A memory double quadword operand in memory.

• m16:16, m16:32 & m16:64 — A memory operand containing a far pointer composed of two numbers. The number to the left of the colon corresponds to the pointer's segment selector. The number to the right corresponds to its offset.

• m16&32, m16&16, m32&32, m16&64 — A memory operand consisting of data item pairs whose sizes are indicated on the left and the right side of the ampersand. All memory addressing modes are allowed. The m16&16 and m32&32 operands are used by the BOUND instruction to provide an operand containing an upper and lower bounds for array indices. The m16&32 operand is used by LIDT and LGDT to provide a word with which to load the limit field, and a doubleword with which to load the base field of the corresponding GDTR and IDTR registers. The m16&64 operand is used by LIDT and LGDT in 64-bit mode to provide a word with which to load the limit field, and a quadword with which to load the base field of the corresponding GDTR and IDTR registers.

• moffs8, moffs16, moffs32, moffs64 — A simple memory variable (memory offset) of type byte, word, or doubleword used by some variants of the MOV instruction. The actual address is given by a simple offset relative to the segment base. No ModR/M byte is used in the instruction. The number shown with moffs indicates its size, which is determined by the address-size attribute of the instruction.

• Sreg — A segment register. The segment register bit assignments are ES = 0, CS = 1, SS = 2, DS = 3, FS = 4, and GS = 5.

• m32fp, m64fp, m80fp — A single-precision, double-precision, and double extended-precision (respectively) floating-point operand in memory. These symbols designate floating-point values that are used as operands for x87 FPU floating-point instructions.

• m16int, m32int, m64int — A word, doubleword, and quadword integer (respectively) operand in memory. These symbols designate integers that are used as operands for x87 FPU integer instructions.

• ST or ST(0) — The top element of the FPU register stack.

• ST(i) — The ith element from the top of the FPU register stack (i ← 0 through 7).

• mm — An MMX register. The 64-bit MMX registers are: MM0 through MM7.

• mm/m32 — The low order 32 bits of an MMX register or a 32-bit memory operand. The 64-bit MMX registers are: MM0 through MM7. The contents of memory are found at the address provided by the effective address computation.

• mm/m64 — An MMX register or a 64-bit memory operand. The 64-bit MMX registers are: MM0 through MM7. The contents of memory are found at the address provided by the effective address computation.

• xmm — An XMM register. The 128-bit XMM registers are: XMM0 through XMM7; XMM8 through XMM15 are available using REX.R in 64-bit mode.

• xmm/m32— An XMM register or a 32-bit memory operand. The 128-bit XMM registers are XMM0 through XMM7; XMM8 through XMM15 are available using REX.R in 64-bit mode. The contents of memory are found at the address provided by the effective address computation.

• xmm/m64 — An XMM register or a 64-bit memory operand. The 128-bit SIMD floating-point registers are XMM0 through XMM7; XMM8 through XMM15 are available using REX.R in 64-bit mode. The contents of memory are found at the address provided by the effective address computation.

• xmm/m128 — An XMM register or a 128-bit memory operand. The 128-bit XMM registers are XMM0 through XMM7; XMM8 through XMM15 are available using REX.R in 64-bit mode. The contents of memory are found at the address provided by the effective address computation.

• — Indicates implied use of the XMM0 register. When there is ambiguity, xmm1 indicates the first source operand using an XMM register and xmm2 the second source operand using an XMM register. Some instructions use the XMM0 register as the third source operand, indicated by . The use of the third XMM register operand is implicit in the instruction encoding and does not affect the ModR/M encoding.

• ymm — A YMM register. The 256-bit YMM registers are: YMM0 through YMM7; YMM8 through YMM15 are available in 64-bit mode.

• m256 — A 32-byte operand in memory. This nomenclature is used only with AVX instructions.

• ymm/m256 — A YMM register or 256-bit memory operand.

• — Indicates use of the YMM0 register as an implicit argument.

• bnd — A 128-bit bounds register. BND0 through BND3.

• mib — A memory operand using SIB addressing form, where the index register is not used in address calculation, Scale is ignored. Only the base and displacement are used in effective address calculation.

• m512 — A 64-byte operand in memory.

• zmm/m512 — A ZMM register or 512-bit memory operand.

• {k1}{z} — A mask register used as instruction writemask. The 64-bit k registers are: k1 through k7. Writemask specification is available exclusively via EVEX prefix. The masking can either be done as a merging masking, where the old values are preserved for masked out elements or as a zeroing masking. The type of masking is determined by using the EVEX.z bit.

• {k1} — Without {z}: a mask register used as instruction writemask for instructions that do not allow zeroing masking but support merging-masking. This corresponds to instructions that require the value of the aaa field to be different than 0 (e.g., gather) and store-type instructions which allow only merging-masking.

• k1 — A mask register used as a regular operand (either destination or source). The 64-bit k registers are: k0 through k7.

• mV — A vector memory operand; the operand size is dependent on the instruction.

• vm32{x,y, z} — A vector array of memory operands specified using VSIB memory addressing. The array of memory addresses are specified using a common base register, a constant scale factor, and a vector index register with individual elements of 32-bit index value in an XMM register (vm32x), a YMM register (vm32y) or a ZMM register (vm32z).

• vm64{x,y, z} — A vector array of memory operands specified using VSIB memory addressing. The array of memory addresses are specified using a common base register, a constant scale factor, and a vector index register with individual elements of 64-bit index value in an XMM register (vm64x), a YMM register (vm64y) or a ZMM register (vm64z).

• zmm/m512/m32bcst — An operand that can be a ZMM register, a 512-bit memory location or a 512-bit vector loaded from a 32-bit memory location.

• zmm/m512/m64bcst — An operand that can be a ZMM register, a 512-bit memory location or a 512-bit vector loaded from a 64-bit memory location.

• — Indicates use of the ZMM0 register as an implicit argument.

• {er} — Indicates support for embedded rounding control, which is only applicable to the register-register form of the instruction. This also implies support for SAE (Suppress All Exceptions).

• {sae} — Indicates support for SAE (Suppress All Exceptions). This is used for instructions that support SAE, but do not support embedded rounding control.

• SRC1 — Denotes the first source operand in the instruction syntax of an instruction encoded with the VEX/EVEX prefix and having two or more source operands.

• SRC2 — Denotes the second source operand in the instruction syntax of an instruction encoded with the VEX/EVEX prefix and having two or more source operands.

• SRC3 — Denotes the third source operand in the instruction syntax of an instruction encoded with the VEX/EVEX prefix and having three source operands.

• SRC — The source in a single-source instruction.

• DST — the destination in an instruction. This field is encoded by reg_field.

Solution 3

Many opcodes for immediate versions of instructions, including 83, use the 3-bit /r field in the ModR/M byte as 3 extra opcode bits. Intel's vol.2 manual documents this, and the opcode table in an appendix includes it, I think.

This is why most original-8086 immediate instructions, like and r/m, imm still only allow 2 operands, unlike shrd eax, edx, 4 or imul edx, [rdi], 12345 where both ModRM fields are used to encode dst/src operands, as well as the opcode implying an immediate operand.

SHRD/SHLD and were added with 386, and imul-immediate was added with 186. It's maybe unfortunate that copy-and-AND (and eax, edx, 0xf) isn't encodeable, but at least x86 can use LEA for very common copy-and-add or sub operations.

But if every immediate and one-operand instruction (like push or not) needed a full opcode to itself, 8086 would have run out of 1-byte opcodes. (Especially because the designer chose to spend a lot of coding space on short forms with no modrm byte for AL and AX, like cmp ax, 12345 being only 3 bytes instead of 4 in 16-bit mode, or cmp eax, imm32 being only 5 bytes instead of 6 for cmp r/m32, imm32 in 32-bit mode. And for single-byte xchg-with-ax, and one-byte inc/dec register.)


Example: decoding 48 83 C4 38. (from How does one opcode byte decode to different instructions depending on the "register/opcode" field? What is that?, a duplicate of this Q)

48 is a REX.W prefix (REX with only the W bit set, so it indicates 64-bit operand size, but no high registers).

Opcode 83 says it can be 7 different instructions depending on a field called "register/opcode field"

Each instruction's own docs, e.g. add (html extract of the vol2 manual), shows encodings like
REX.W + 83 /0 ib for ADD r/m64, imm8, which is what you have.

diagram of the ModRM bit fields from wiki.osdev.org

  7                           0
+---+---+---+---+---+---+---+---+
|  mod  |    reg    |     rm    |
+---+---+---+---+---+---+---+---+

0xc4 = 0b11000100, so the reg field = 0. Thus our opcode is 83 /0, in Intel's notation.

The rest of the ModRM fields are:

So the instruction is add rsp, 0x38

ndisasm -b64 agrees:

$ cat > foo.asm
db 0x48, 0x83, 0xC4, 0x38
$ nasm foo.asm     # create a flat binary with those bytes, not an object file
$ ndisasm -b64 foo
00000000  4883C438          add rsp,byte +0x38
Share:
10,414
asher
Author by

asher

Updated on June 07, 2022

Comments

  • asher
    asher almost 2 years

    I am reading some material about Intel Opcodes of assembly instructions, but I cannot understand what does it mean that follows the opcode byte. For example: cw, cd, /2, cp, /3.

    Please give me a hint what does it mean or where can I find the complete reference ?

    E8 cw CALL rel16 Call near, relative, displacement relative to next instruction
    E8 cd CALL rel32 Call near, relative, displacement relative to next instruction
    FF /2 CALL r/m16 Call near, absolute indirect, address given in r/m16
    FF /2 CALL r/m32 Call near, absolute indirect, address given in r/m32
    9A cd CALL ptr16:16 Call far, absolute, address given in operand
    9A cp CALL ptr16:32 Call far, absolute, address given in operand
    FF /3 CALL m16:16 Call far, absolute indirect, address given in m16:16
    FF /3 CALL m16:32 Call far, absolute indirect, address given in m16:32