Asm CALL instruction - how does it work?


Solution 1

It (that is, directly calling an import with a normal relative call) doesn't work, and that's why that's not how it's done.

To call an imported function, you go through something called the Import Address Table (IAT). In short, entries in the IAT first point to function names (i.e., it starts out as a copy of the Import Name Table), and the loader then changes those pointers to point to the actual functions.

The IAT is at a fixed address (which changes only if the image has been rebased), so calling through it involves just a single indirection: call r/m is used with a memory operand, which is a simple constant, to call imported functions, for example call [0x40206C].

Solution 2

22 Jan 2013: added simpler, more concrete examples and discussion, since (A) an incorrect answer has been selected as solution, and (B) my original answer was evidently not understood by some readers, including the OP. Sorry about that, mea culpa. I just posted an answer in a hurry then, adding a code example that I already had on hand.

How I interpret the question.

You ask,

“I've been studying the PE format but I'm quite confused about the relationship between the CALL ADDRESS instruction, the importing of a function from a dll and how does the CALL ADDRESS reach out the code in a DLL.”

The term CALL ADDRESS does not make much sense at the C++ level, so I’m assuming that you mean CALL ADDRESS at the assembly language or machine code level.

The problem is then, when a DLL is loaded at some address other than the preferred one, how are the call instructions connected to the DLL functions?

The short of it.

  • At the machine code level a call with specified address works by calling a minimal forwarding routine that consists of a single jmp instruction. The jmp instruction jumps to the DLL function via a table lookup. Typically an import library for a DLL provides both the table entry that points to the DLL function itself, with an __imp__ name prefix, and the wrapper routine without such name prefix, e.g. __imp__MessageBoxA@16 and _MessageBoxA@16.

I.e., except that I’ve invented the names below, the assembler usually translates

    call MessageBox

to


    call MessageBox_forwarder
     ; whatever here
    MessageBox_forwarder: jmp ds:[MessageBox_tableEntry]

When the DLL is loaded the loader places the relevant addresses in the table(s).

  • At the assembly language level a call with the routine specified as just an identifier can map to either a call to a forwarder, or a call directly to the DLL function via a table lookup, depending on the type declared for the identifier.

  • There can be more than one table of DLL function addresses, even for imports from the same DLL. But in general they’re thought of as one big table, then called “the” Import Address Table, or IAT for short. Each IAT is at a fixed place in the image, i.e. it moves along with the code when the image is loaded somewhere other than the preferred address; it is not at a fixed address.

The currently selected solution answer is incorrect in these ways:

  • The answer maintains that “It doesn't work, and that's why that's not how it's done.”, where presumably the “It” refers to a CALL ADDRESS. But using CALL ADDRESS, in assembly or at the machine code level, works just fine for calling a DLL function. Provided it’s done correctly.

  • The answer maintains that the IAT is at a fixed address. But it isn’t.

CALL ADDRESS works just fine.

Let’s consider a concrete CALL ADDRESS instruction where the address is of a very well known DLL function, namely a call of the MessageBoxA Windows API function from the [user32.dll] DLL:

call MessageBoxA

There is no problem with using this instruction.

As you will see below, at the machine code level this call instruction itself just contains an offset that causes the call to go to a jmp instruction, which looks up the DLL routine address in an Import Address Table of function pointers, which is usually fixed up by the loader when it loads the DLL in question.

In order to be able to inspect the machine code, here’s a complete 32-bit x86 assembly language program using that concrete example instruction:

.model flat, stdcall
option casemap :none        ; Case sensitive identifiers, please.
_as32bit        textequ <DWord ptr>

public  start

ExitProcess                     proto stdcall :DWord

MessageBoxA_t                   typedef proto stdcall :DWord, :DWord, :DWord, :DWord
extern MessageBoxA              : MessageBoxA_t
extern _imp__MessageBoxA@16     : ptr MessageBoxA_t

MB_ICONINFORMATION      equ     0040h
MB_SETFOREGROUND        equ     00010000h
infoBoxOptions          equ     MB_ICONINFORMATION or MB_SETFOREGROUND

.const
boxtitle_1  db  "Just FYI 1 (of 3):", 0
boxtitle_2  db  "Just FYI 2 (of 3):", 0
boxtitle_3  db  "Just FYI 3 (of 3):", 0
boxtext     db  "There’s intelligence somewhere in the universe", 0

.code
start:
    push infoBoxOptions
    push offset boxtitle_1
    push offset boxtext
    push 0
    call MessageBoxA                    ; Call #1 - to jmp to DLL-func.

    push infoBoxOptions
    push offset boxtitle_2
    push offset boxtext
    push 0
    call ds:[_imp__MessageBoxA@16]      ; Call #2 - directly to DLL-func.

    push infoBoxOptions
    push offset boxtitle_3
    push offset boxtext
    push 0
    call _imp__MessageBoxA@16           ; Call #3 - same as #2, due to type of identifier.

    push 0  ; Exit code, 0 indicates success.
    call ExitProcess

end

Assembling and linking using Microsoft’s toolchain, where the /debug linker option asks the linker to produce a PDB debug info file for use with the Visual Studio debugger:

> ml /nologo /c asm_call.asm
 Assembling: asm_call.asm

> link /nologo asm_call.obj kernel32.lib user32.lib /entry:start /subsystem:windows /debug

> dir asm* /b

> _

One easy way to debug this is now to fire up Visual Studio (the [devenv.exe] program) and, in Visual Studio, click [Debug → Step Into], or just press F11:

> devenv asm_call.exe

> _

[Figure: screenshot of the Visual Studio 2012 debugger stepping into the first call instruction]

In the figure above, showing the Visual Studio 2012 debugger in action, the leftmost big red arrow shows you the address information within the machine code instruction, namely 0000004E hex (note: the least significant byte is at the lowest address, i.e. first in memory), and the other big red arrow shows you that, incredible as it may seem, this rather small magic number somehow designates the _MessageBoxA@16 function that, as far as the debugger knows, resides at address 01161064 hex.

  • The address data in the CALL ADDRESS instruction is an offset, which is relative to the address of the next instruction, and so it doesn't need any fixup for changed DLL placement.

  • The address that the call goes to just contains a jmp ds:[IAT_entry_for_MessageBoxA].

  • This forwarder code comes from the import library, not from the DLL, so it does not need fixups either (but apparently it does get some special treatment, as does the DLL function address).

The second call instruction does directly what the jmp does for the first, namely looking up the DLL function address in the IAT table.

The third call instruction can now be seen to be identical to the second one at the machine code level. Apparently it is not well known how to emulate Visual C++ __declspec( dllimport ) in assembly. The above kind of declaration is one way, perhaps combined with a textequ.

The IAT is not at a fixed address.

The following C++ program reports the address where it has been loaded, what DLL functions it imports from what modules, and where the various IAT tables reside.

When it’s built with a modern version of Microsoft’s toolchain, just using the defaults, it is generally loaded at a different address each time it’s run.

You can prevent this behavior by using the linker option /dynamicbase:no.

#include <assert.h>         // assert
#include <stddef.h>         // ptrdiff_t
#include <sstream>
using std::ostringstream;

#undef UNICODE
#define UNICODE
#include <windows.h>

template< class Result, class SomeType >
Result as( SomeType const p ) { return reinterpret_cast<Result>( p ); }

template< class Type >
class OffsetTo
{
private:
    ptrdiff_t offset_;
public:
    ptrdiff_t asInteger() const { return offset_; }
    explicit OffsetTo( ptrdiff_t const offset ): offset_( offset ) {}
};

template< class ResultPointee, class SourcePointee >
ResultPointee* operator+(
    SourcePointee* const            p,
    OffsetTo<ResultPointee> const   offset
    )
{
    return as<ResultPointee*>( as<char const*>( p ) + offset.asInteger() );
}

int main()
{
    auto const pImage =
        as<IMAGE_DOS_HEADER const*>( ::GetModuleHandle( nullptr ) );
    assert( pImage->e_magic == IMAGE_DOS_SIGNATURE );

    auto const pNTHeaders =
        pImage + OffsetTo<IMAGE_NT_HEADERS const>( pImage->e_lfanew );
    assert( pNTHeaders->Signature == IMAGE_NT_SIGNATURE );

    // The import directory entry of the optional header's data directory.
    auto const& importDir =
        pNTHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT];

    auto const pImportDescriptors = pImage + OffsetTo<IMAGE_IMPORT_DESCRIPTOR const>(
        importDir.VirtualAddress //+ importSectionHeader.PointerToRawData
        );

    ostringstream stream;
    stream << "I'm loaded at " << pImage << ", and I'm using...\n";
    for( int i = 0;  pImportDescriptors[i].Name != 0;  ++i )
    {
        auto const pModuleName = pImage + OffsetTo<char const>( pImportDescriptors[i].Name );

        DWORD const offsetNameTable = pImportDescriptors[i].OriginalFirstThunk;
        DWORD const offsetAddressTable = pImportDescriptors[i].FirstThunk;  // The module "IAT"

        auto const pNameTable = pImage + OffsetTo<IMAGE_THUNK_DATA const>( offsetNameTable );
        auto const pAddressTable = pImage + OffsetTo<IMAGE_THUNK_DATA const>( offsetAddressTable );

        stream << "\n* '" << pModuleName << "'";
        stream << " with IAT at " << pAddressTable << "\n";
        stream << "\t";
        for( int j = 0; pNameTable[j].u1.AddressOfData != 0; ++j )
        {
            // Skip the 2-byte hint that precedes each imported function name.
            auto const pFuncName =
                pImage + OffsetTo<char const>( 2 + pNameTable[j].u1.AddressOfData );
            stream << pFuncName << " ";
        }
        stream << "\n";
    }

    // Display step reconstructed (an assumption): show the report in a message box.
    MessageBoxA( 0, stream.str().c_str(), "FYI:", MB_ICONINFORMATION | MB_SETFOREGROUND );
}


A self-replicating Windows machine code program.

Finally, from my original answer, here's a Microsoft assembler (MASM) program I made for another purpose that illustrates some of the issues, because by its nature (it produces as output source code that when assembled and run produces that same source code, and so on) it has to be completely relocatable code and with just the barest little help from the ordinary program loader:

.model flat, stdcall
option casemap :none        ; Case sensitive identifiers, please.
dword_aligned textequ <4>   ; Just for readability.

    ; Windows API functions:
    extern  ExitProcess@4: proc         ; from [kernel32.dll]
    extern  GetStdHandle@4: proc        ; from [kernel32.dll]
    extern  WriteFile@20: proc          ; from [kernel32.dll]
    extern  wsprintfA: proc             ; from [user32.dll]

    STD_OUTPUT_HANDLE       equ     -11

        ; The main code.
GlobalsStruct   struct  dword_aligned
    codeStart               dword   ?
    outputStreamHandle      dword   ?
GlobalsStruct   ends
globals         textequ     <(GlobalsStruct ptr [edi])>

    .code
startup:
    jmp     code_start

    ; Trampolines to add references to these functions.
myExitProcess:    jmp ExitProcess@4
myGetStdHandle:   jmp GetStdHandle@4     
myWriteFile:      jmp WriteFile@20
mywsprintfA:      jmp wsprintfA

;               The code below is reproduced, so it's all relative.

code_start:
    jmp     main

prologue:

byte    ".model flat, stdcall", 13, 10
byte    "option casemap :none", 13, 10
byte    13, 10
byte    "    extern  ExitProcess@4: proc", 13, 10
byte    "    extern  GetStdHandle@4: proc", 13, 10
byte    "    extern  WriteFile@20: proc", 13, 10
byte    "    extern  wsprintfA: proc", 13, 10
byte    13, 10
byte    "    .code", 13, 10
byte    "startup:", 13, 10
byte    "    jmp     code_start", 13, 10
byte    13, 10
byte    "jmp ExitProcess@4", 13, 10
byte    "jmp GetStdHandle@4", 13, 10
byte    "jmp WriteFile@20", 13, 10
byte    "jmp wsprintfA", 13, 10
byte    13, 10
byte    "code_start:", 13, 10
prologue_nBytes         equ     $ - prologue

epilogue:
byte    "code_end:", 13, 10
byte    "    end startup", 13, 10
epilogue_nBytes         equ     $ - epilogue

dbDirective             byte    4 dup( ' ' ), "byte       "
dbDirective_nBytes      equ     $ - dbDirective

numberFormat            byte    " 0%02Xh", 0
numberFormat_nBytes     equ     $ - numberFormat

comma                   byte    ","
windowsNewline          byte    13, 10

write:
    push    0           ; space for nBytesWritten
    mov     ecx, esp    ; &nBytesWritten

    push    0           ; lpOverlapped
    push    ecx         ; &nBytesWritten
    push    ebx         ; nBytes
    push    eax         ; &s[0]
    push    globals.outputStreamHandle
    call    myWriteFile

    pop     eax         ; nBytesWritten
    ret

displayMachineCode:
    dmc_LocalsStruct    struct  dword_aligned
        numberStringLen     dword   ?
        numberString        byte    16*4 DUP( ? )
        fileHandle          dword   ?
        nBytesWritten       dword   ?
        byteIndex           dword   ?
    dmc_LocalsStruct    ends
    dmc_locals          textequ     <[ebp - sizeof dmc_LocalsStruct].dmc_LocalsStruct>

    mov     ebp, esp
    sub     esp, sizeof dmc_LocalsStruct

    ; Output prologue that makes MASM happy (placing machine code data in context):
    ; lea     eax, prologue
        mov     eax, globals.codeStart
        add     eax, prologue - code_start
    mov     ebx, prologue_nBytes
    call    write

    ; Output the machine code bytes.
    mov     dmc_locals.byteIndex, 0

    ; loop start
dmc_lineLoop:
            ; Output a db directive
        ;lea     eax, dbDirective
            mov     eax, globals.codeStart
            add     eax, dbDirective - code_start
        mov     ebx, dbDirective_nBytes
        call    write

        ; loop start
dmc_byteIndexingLoop:
                ; Create string representation of a number
            mov     ecx, dmc_locals.byteIndex
            mov     eax, 0
            ;mov     al, byte ptr [code_start + ecx]
                mov     ebx, globals.codeStart
                mov     al, [ebx + ecx]
            push    eax
            ;push    offset numberFormat
                mov     eax, globals.codeStart
                add     eax, numberFormat - code_start
                push    eax
            lea     eax, dmc_locals.numberString
            push    eax
            call    mywsprintfA
            add     esp, 3*(sizeof dword)
            mov     dmc_locals.numberStringLen, eax

                ; Output string representation of number
            lea     eax, dmc_locals.numberString
            mov     ebx, dmc_locals.numberStringLen
            call    write

                ; Are we finished looping yet?
            inc     dmc_locals.byteIndex
            mov     ecx, dmc_locals.byteIndex
            cmp     ecx, code_end - code_start
            je      dmc_finalNewline
            and     ecx, 07h
            jz      dmc_after_byteIndexingLoop

                ; Output a comma
            ; lea     eax, comma
                mov     eax, globals.codeStart
                add     eax, comma - code_start
            mov     ebx, 1
            call    write
            jmp dmc_byteIndexingLoop
        ; loop end

dmc_after_byteIndexingLoop:

            ; New line
        ; lea     eax, windowsNewline
            mov     eax, globals.codeStart
            add     eax, windowsNewline - code_start
        mov     ebx, 2
        call    write
        jmp     dmc_lineLoop;
    ; loop end

dmc_finalNewline:

        ; New line
    ; lea     eax, windowsNewline
        mov     eax, globals.codeStart
        add     eax, windowsNewline - code_start
    mov     ebx, 2
    call    write

    ; Output epilogue that makes MASM happy:
    ; lea     eax, epilogue
        mov     eax, globals.codeStart
        add     eax, epilogue - code_start
    mov     ebx, epilogue_nBytes
    call    write

    mov     esp, ebp
    ret

main:
    sub esp, sizeof GlobalsStruct
    mov edi, esp

    call    main_knownAddress
main_knownAddress:
    pop     eax
    sub     eax, main_knownAddress - code_start
    mov     globals.codeStart, eax

    push    STD_OUTPUT_HANDLE
    call    myGetStdHandle
    mov     globals.outputStreamHandle, eax

    call displayMachineCode

    ; Well behaved process exit:
    push 0                          ; Process exit code, 0 indicates success.
    call myExitProcess
code_end:

    end startup

And here's the self-reproducing output:

.model flat, stdcall
option casemap :none

    extern  ExitProcess@4: proc
    extern  GetStdHandle@4: proc
    extern  WriteFile@20: proc
    extern  wsprintfA: proc

    .code
startup:
    jmp     code_start

jmp ExitProcess@4
jmp GetStdHandle@4
jmp WriteFile@20
jmp wsprintfA

code_start:
    byte        0E9h, 03Bh, 002h, 000h, 000h, 02Eh, 06Dh, 06Fh
    byte        064h, 065h, 06Ch, 020h, 066h, 06Ch, 061h, 074h
    byte        02Ch, 020h, 073h, 074h, 064h, 063h, 061h, 06Ch
    byte        06Ch, 00Dh, 00Ah, 06Fh, 070h, 074h, 069h, 06Fh
    byte        06Eh, 020h, 063h, 061h, 073h, 065h, 06Dh, 061h
    byte        070h, 020h, 03Ah, 06Eh, 06Fh, 06Eh, 065h, 00Dh
    byte        00Ah, 00Dh, 00Ah, 020h, 020h, 020h, 020h, 065h
    byte        078h, 074h, 065h, 072h, 06Eh, 020h, 020h, 045h
    byte        078h, 069h, 074h, 050h, 072h, 06Fh, 063h, 065h
    byte        073h, 073h, 040h, 034h, 03Ah, 020h, 070h, 072h
    byte        06Fh, 063h, 00Dh, 00Ah, 020h, 020h, 020h, 020h
    byte        065h, 078h, 074h, 065h, 072h, 06Eh, 020h, 020h
    byte        047h, 065h, 074h, 053h, 074h, 064h, 048h, 061h
    byte        06Eh, 064h, 06Ch, 065h, 040h, 034h, 03Ah, 020h
    byte        070h, 072h, 06Fh, 063h, 00Dh, 00Ah, 020h, 020h
    byte        020h, 020h, 065h, 078h, 074h, 065h, 072h, 06Eh
    byte        020h, 020h, 057h, 072h, 069h, 074h, 065h, 046h
    byte        069h, 06Ch, 065h, 040h, 032h, 030h, 03Ah, 020h
    byte        070h, 072h, 06Fh, 063h, 00Dh, 00Ah, 020h, 020h
    byte        020h, 020h, 065h, 078h, 074h, 065h, 072h, 06Eh
    byte        020h, 020h, 077h, 073h, 070h, 072h, 069h, 06Eh
    byte        074h, 066h, 041h, 03Ah, 020h, 070h, 072h, 06Fh
    byte        063h, 00Dh, 00Ah, 00Dh, 00Ah, 020h, 020h, 020h
    byte        020h, 02Eh, 063h, 06Fh, 064h, 065h, 00Dh, 00Ah
    byte        073h, 074h, 061h, 072h, 074h, 075h, 070h, 03Ah
    byte        00Dh, 00Ah, 020h, 020h, 020h, 020h, 06Ah, 06Dh
    byte        070h, 020h, 020h, 020h, 020h, 020h, 063h, 06Fh
    byte        064h, 065h, 05Fh, 073h, 074h, 061h, 072h, 074h
    byte        00Dh, 00Ah, 00Dh, 00Ah, 06Ah, 06Dh, 070h, 020h
    byte        045h, 078h, 069h, 074h, 050h, 072h, 06Fh, 063h
    byte        065h, 073h, 073h, 040h, 034h, 00Dh, 00Ah, 06Ah
    byte        06Dh, 070h, 020h, 047h, 065h, 074h, 053h, 074h
    byte        064h, 048h, 061h, 06Eh, 064h, 06Ch, 065h, 040h
    byte        034h, 00Dh, 00Ah, 06Ah, 06Dh, 070h, 020h, 057h
    byte        072h, 069h, 074h, 065h, 046h, 069h, 06Ch, 065h
    byte        040h, 032h, 030h, 00Dh, 00Ah, 06Ah, 06Dh, 070h
    byte        020h, 077h, 073h, 070h, 072h, 069h, 06Eh, 074h
    byte        066h, 041h, 00Dh, 00Ah, 00Dh, 00Ah, 063h, 06Fh
    byte        064h, 065h, 05Fh, 073h, 074h, 061h, 072h, 074h
    byte        03Ah, 00Dh, 00Ah, 063h, 06Fh, 064h, 065h, 05Fh
    byte        065h, 06Eh, 064h, 03Ah, 00Dh, 00Ah, 020h, 020h
    byte        020h, 020h, 065h, 06Eh, 064h, 020h, 073h, 074h
    byte        061h, 072h, 074h, 075h, 070h, 00Dh, 00Ah, 020h
    byte        020h, 020h, 020h, 062h, 079h, 074h, 065h, 020h
    byte        020h, 020h, 020h, 020h, 020h, 020h, 020h, 030h
    byte        025h, 030h, 032h, 058h, 068h, 000h, 02Ch, 00Dh
    byte        00Ah, 06Ah, 000h, 08Bh, 0CCh, 06Ah, 000h, 051h
    byte        053h, 050h, 0FFh, 077h, 004h, 0E8h, 074h, 0FEh
    byte        0FFh, 0FFh, 058h, 0C3h, 08Bh, 0ECh, 083h, 0ECh
    byte        050h, 08Bh, 007h, 005h, 005h, 000h, 000h, 000h
    byte        0BBh, 036h, 001h, 000h, 000h, 0E8h, 0D7h, 0FFh
    byte        0FFh, 0FFh, 0C7h, 045h, 0FCh, 000h, 000h, 000h
    byte        000h, 08Bh, 007h, 005h, 057h, 001h, 000h, 000h
    byte        0BBh, 00Fh, 000h, 000h, 000h, 0E8h, 0BFh, 0FFh
    byte        0FFh, 0FFh, 08Bh, 04Dh, 0FCh, 0B8h, 000h, 000h
    byte        000h, 000h, 08Bh, 01Fh, 08Ah, 004h, 019h, 050h
    byte        08Bh, 007h, 005h, 066h, 001h, 000h, 000h, 050h
    byte        08Dh, 045h, 0B4h, 050h, 0E8h, 02Ah, 0FEh, 0FFh
    byte        0FFh, 083h, 0C4h, 00Ch, 089h, 045h, 0B0h, 08Dh
    byte        045h, 0B4h, 08Bh, 05Dh, 0B0h, 0E8h, 08Fh, 0FFh
    byte        0FFh, 0FFh, 0FFh, 045h, 0FCh, 08Bh, 04Dh, 0FCh
    byte        081h, 0F9h, 068h, 002h, 000h, 000h, 074h, 02Bh
    byte        083h, 0E1h, 007h, 074h, 013h, 08Bh, 007h, 005h
    byte        06Eh, 001h, 000h, 000h, 0BBh, 001h, 000h, 000h
    byte        000h, 0E8h, 06Bh, 0FFh, 0FFh, 0FFh, 0EBh, 0AAh
    byte        08Bh, 007h, 005h, 06Fh, 001h, 000h, 000h, 0BBh
    byte        002h, 000h, 000h, 000h, 0E8h, 058h, 0FFh, 0FFh
    byte        0FFh, 0EBh, 086h, 08Bh, 007h, 005h, 06Fh, 001h
    byte        000h, 000h, 0BBh, 002h, 000h, 000h, 000h, 0E8h
    byte        045h, 0FFh, 0FFh, 0FFh, 08Bh, 007h, 005h, 03Bh
    byte        001h, 000h, 000h, 0BBh, 01Ch, 000h, 000h, 000h
    byte        0E8h, 034h, 0FFh, 0FFh, 0FFh, 08Bh, 0E5h, 0C3h
    byte        083h, 0ECh, 008h, 08Bh, 0FCh, 0E8h, 000h, 000h
    byte        000h, 000h, 058h, 02Dh, 04Ah, 002h, 000h, 000h
    byte        089h, 007h, 06Ah, 0F5h, 0E8h, 098h, 0FDh, 0FFh
    byte        0FFh, 089h, 047h, 004h, 0E8h, 023h, 0FFh, 0FFh
    byte        0FFh, 06Ah, 000h, 0E8h, 084h, 0FDh, 0FFh, 0FFh
code_end:
    end startup

    I'd love to have a clear explanation on, in a Windows environment (PE executables), how do CALL XXXXXXXXXXXXXXX instructions work. I've been studying the PE format but I'm quite confused about the relationship between the CALL ADDRESS instruction, the importing of a function from a dll and how does the CALL ADDRESS reach out the code in a DLL. Besides ASLR and other security functions may move around DLLs, how do executables cope with this?

