How to get the address of a variable and dereference it in NASM x86 assembly?

There are no variables in assembly. (*)

variable db 'A'

This line does several things. It defines an assembly-time symbol variable, which is like a bookmark into memory: it holds the address of *here*, the current position, as determined at assembly time. That part is the same as putting a label on an otherwise empty line:

variable:

The db 'A' directive means "define byte": you give it a single byte value, and it emits one byte with the value 0x41 (65 in decimal) into the resulting machine code. That is the value of the capital letter A in ASCII encoding.

Then:

mov ecx , [variable]

This loads 4 bytes from memory starting at address variable, so the low 8 bits of ecx will contain the value 65, and the upper 24 bits will contain whatever junk happens to reside in the 3 bytes following the 'A'. (Had you used db 'ABCD', ecx would equal 0x44434241: the letters 'D' 'C' 'B' 'A', with the bytes in "reversed" order due to the little-endian encoding of dword values on x86.)
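To see that byte order for yourself, here is a minimal sketch, assuming 32-bit Linux and the int 0x80 interface used in the question. It stores 'ABCD', does the dword load, and reports the comparison through the exit status; the local label .done and the use of the exit status as a check are just illustration choices, not part of the original code.

```nasm
section .data
variable db 'ABCD'          ; four bytes in memory: 0x41 0x42 0x43 0x44

section .text
global _start
_start:
    mov ecx, [variable]     ; dword load, lowest address ends up in lowest byte
    xor ebx, ebx            ; assume success: exit status 0
    cmp ecx, 0x44434241     ; 'D' 'C' 'B' 'A' when read as one 32-bit number
    je  .done
    mov ebx, 1              ; exit status 1 if the value differs
.done:
    mov eax, 1              ; sys_exit
    int 0x80
```

Assembled with something like nasm -f elf32 and linked with ld -m elf_i386 (file names are up to you), the program should exit with status 0, which you can check with echo $? in the shell.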

But sys_write expects ecx to hold the address of the memory where the bytes to be written are stored, so you need this instead:

mov ecx, variable

In NASM, that loads the address of the data into ecx.
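Put together, a corrected version of the program from the question might look like the sketch below. It assumes 32-bit Linux, as the question does. Setting edx to 1 instead of 8 and zeroing ebx before sys_exit are my own adjustments: only one byte was defined, so a length of 8 would also write whatever bytes happen to follow the 'A'.

```nasm
section .data
variable db 'A'             ; one byte, value 0x41

section .text
global _start
_start:
    mov eax, 4              ; sys_write
    mov ebx, 1              ; file descriptor 1 = stdout
    mov ecx, variable       ; address of the data, not its content
    mov edx, 1              ; number of bytes to write
    int 0x80

    mov eax, 1              ; sys_exit
    xor ebx, ebx            ; exit status 0
    int 0x80
```

One way to build it is nasm -f elf32 prog.asm -o prog.o followed by ld -m elf_i386 prog.o -o prog, where prog is just a placeholder name. Running it prints a single A with no trailing newline.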

(In MASM/TASM, mov ecx, variable would instead assemble as mov ecx, [variable], and to get the address you have to use mov ecx, OFFSET variable. If you happen to find a MASM/TASM example, be aware of this syntax difference.)


*) Some more info about "no variables". Keep in mind that in assembly you are at the machine level. At the machine level there is computer memory, which is addressable by bytes (on the x86 platform! There are platforms where memory is addressable in units of a different size; they are not common, but you may find some in the micro-controller world). So by using a memory address, you can access particular byte(s) in the physical memory chip. Which physical place in the memory chip is addressed depends on your platform: a modern OS will usually give a user application a virtual address space, which the CPU translates to physical addresses on the fly, transparently, without bothering user code about the translation.

All the advanced logical concepts like "variables", "arrays", "strings", etc. are just a bunch of byte values in memory, and all of that logical meaning is given to the data by the instructions being executed. When you look at the data without the context of the instructions, they are just some byte values in memory, nothing more.

So if you are not precise with your code, and you access a single-byte "variable" with an instruction fetching a dword, as you did in your mov ecx,[variable] example, there is nothing wrong with that from the machine's point of view: it will happily fetch 4 bytes of memory into the ecx register. Nor does NASM bother to warn you that you are probably accessing memory out of bounds, beyond your original variable definition. This is rather stupid behaviour if you think in terms of "variables" and other high-level programming language concepts. But assembly is not intended for that kind of work. Having full control over the machine is the main purpose of assembly, and if you want to fetch 4 bytes, you can; it's all up to the programmer. It just requires a tremendous amount of precision and attention to detail: staying aware of the layout of your memory structures, and using correct instructions with the desired memory operand sizes, like movzx ecx,byte [variable] to load only a single byte from memory and zero-extend that value into a full 32-bit value in the target ecx register.
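As a rough illustration of picking the operand size yourself, the sketch below places three 0xFF bytes after the 'A' (my addition, so that the "junk" is predictable) and loads only the first byte with movzx. It again assumes 32-bit Linux, and it returns the loaded value as the exit status purely as a convenient way to inspect it.

```nasm
section .data
variable db 'A'             ; the one byte we care about (0x41)
         db 0xFF, 0xFF, 0xFF ; stand-in for the junk a dword load would pick up

section .text
global _start
_start:
    movzx ecx, byte [variable] ; load 1 byte, zero-extend: ecx = 0x00000041
    mov ebx, ecx            ; use the loaded value as the exit status
    mov eax, 1              ; sys_exit
    int 0x80
```

After running it, echo $? should show 65. With a plain mov ecx,[variable] the register would instead hold 0xFFFFFF41, because the three following bytes come along for the ride.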

Updated on July 23, 2022

Comments

  • Naveen prakash
    Naveen prakash over 1 year

    In C we use & to get the address of a variable and * to dereference a pointer.

    
        int variable=10; 
        int *pointer;
        pointer = &variable;

    How do I do this in NASM x86 assembly language?
    I read the NASM manual and found that [ variable_address ] works like dereferencing (I may be wrong).

    section .data
    variable db 'A'
    section .text
    global _start
    _start:
    mov eax , 4
    mov ebx , 1
    mov ecx , [variable]  
    mov edx , 8
    int 0x80
    mov eax ,1
    int 0x80

    I executed this code and it prints nothing. I can't understand what is wrong with my code. I need your help to understand pointers and dereferencing in NASM x86.