How to read / write .exe machine code manually?

Solution 1

OllyDbg is an awesome tool that disassembles an EXE into readable instructions and allows you to execute the instructions one-by-one. It also tells you what API functions the program uses and if possible, the arguments that it provides (as long as the arguments are found on the stack).

Generally speaking, x86 CPU instructions are of variable length: some are one byte, others are two, three, four or more. It mostly depends on the kind of operands that the instruction takes. Some mnemonics are generalised, like "mov", which tells the CPU to copy data between registers, or between a register and a place in memory. In reality, there are many different "mov" encodings: ones for handling 8-bit, 16-bit and 32-bit data, ones for moving data to and from different registers, and so on.
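To make the variable-length point concrete, here is a small sketch that lists a few encodings of "mov" in 32-bit mode together with their byte lengths. The table is hand-written for illustration and is not a decoder; the names `MOV_ENCODINGS` and `encoding_lengths` are made up for this example.

```python
# A few hand-picked x86 (32-bit mode) encodings of "mov", for illustration only.
# Each entry is (assembly text, encoded bytes).
MOV_ENCODINGS = [
    ("mov al, bl",          bytes([0x88, 0xD8])),                    # 8-bit register to register
    ("mov eax, ebx",        bytes([0x89, 0xD8])),                    # 32-bit register to register
    ("mov al, 0x09",        bytes([0xB0, 0x09])),                    # 8-bit immediate
    ("mov ax, 0x4C01",      bytes([0x66, 0xB8, 0x01, 0x4C])),        # 16-bit immediate, needs a prefix byte
    ("mov eax, 0x12345678", bytes([0xB8, 0x78, 0x56, 0x34, 0x12])),  # 32-bit immediate, little-endian
]


def encoding_lengths(table):
    """Map each assembly line to the number of bytes in its encoding."""
    return {text: len(code) for text, code in table}


if __name__ == "__main__":
    for text, code in MOV_ENCODINGS:
        hex_bytes = " ".join(f"{b:02X}" for b in code)
        print(f"{text:<22} {hex_bytes:<16} {len(code)} byte(s)")
```

The same mnemonic ends up as two, four or five bytes here, depending only on the operand size and kind.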

You could pick up Dr. Paul Carter's PC Assembly Language Tutorial which is a free entry level book that talks about assembly and how the Intel 386 CPU operates. Most of it is applicable even to modern day consumer Intel CPUs.

The EXE format is specific to Windows. The entry point (i.e. the first instruction that gets executed) is not at a fixed position in the file; instead, its address is recorded in a field of the EXE's header, and that field is always found the same way. It's all kind of difficult to explain at once, but the resources I've provided should help cure at least some of your curiosity! :)
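As a rough sketch of how a program could look up the entry point, the snippet below follows the header layout from Microsoft's PE documentation: the 4-byte offset stored at 0x3C leads to the "PE" signature, a 20-byte COFF header follows, and `AddressOfEntryPoint` sits 16 bytes into the optional header. The helper `make_fake_headers` is hypothetical; it only builds a synthetic, non-runnable header blob so the example is self-contained.

```python
import struct


def read_entry_point(image: bytes) -> int:
    """Return AddressOfEntryPoint (a relative virtual address) from PE headers."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: file offset of the PE signature, stored at offset 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    optional_header = pe_offset + 4 + 20  # skip signature and 20-byte COFF header
    # AddressOfEntryPoint is 16 bytes into the optional header.
    (entry_rva,) = struct.unpack_from("<I", image, optional_header + 16)
    return entry_rva


def make_fake_headers(entry_rva: int) -> bytes:
    """Hypothetical helper: build minimal synthetic headers for the demo."""
    dos = bytearray(64)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 64)  # PE signature right after the DOS header
    # Machine=0x014C (i386), no sections, optional header size 0xE0.
    coff = struct.pack("<HHIIIHH", 0x014C, 0, 0, 0, 0, 0xE0, 0x0102)
    optional = bytearray(0xE0)
    struct.pack_into("<H", optional, 0, 0x010B)      # PE32 magic
    struct.pack_into("<I", optional, 16, entry_rva)  # AddressOfEntryPoint
    return bytes(dos) + b"PE\0\0" + coff + bytes(optional)


if __name__ == "__main__":
    blob = make_fake_headers(0x1000)
    print(hex(read_entry_point(blob)))
```

Note that the value is relative to where the image gets loaded in memory, not a file offset, so turning it into a position in the file also requires reading the section table. On a real file you would pass in the bytes read with `open(path, "rb").read()`.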

Solution 2

The executable file you see is in Microsoft's PE (Portable Executable) format. It is essentially a container that holds some operating-system-specific data about a program, plus the program data itself, split into several sections. For example, code, resources and static data are stored in separate sections.

The format of each section depends on what is in it. The code section holds machine code for the executable's target architecture. For Microsoft PE binaries this is most commonly Intel x86 or AMD64 (the same instruction set as Intel's EM64T). The machine code format is CISC and traces back to the 8086 and earlier. An important consequence of CISC is that the instruction size is not constant, so you have to start reading at the right place to get anything meaningful out of it. Intel publishes good manuals on the x86/x64 instruction set.

You can use a disassembler to view the machine code as assembly instructions. In combination with the manuals, you can often work out roughly what the original source code did.

And then there are MSIL EXEs: .NET executables holding Microsoft Intermediate Language. These do not contain machine-specific code, but .NET CIL code. The specification for that is available online from ECMA (ECMA-335).

These can be viewed with a tool such as Reflector.

Solution 3

The contents of the EXE file are described in Portable Executable. It contains code, data, and instructions to the OS on how to load the file.

There is an essentially 1:1 mapping between machine code and assembly: an assembler turns each instruction into bytes, and a disassembler performs the reverse operation.

There isn't a fixed number of bytes per instruction on i386. Some are a single byte, some are much longer.
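To illustrate the mapping, here is a toy decoder for 16-bit code. It knows only the six opcodes that appear at the start of the `debug` listing in Solution 4, so it is a sketch of the idea and nothing like a real disassembler; the names `OPCODES` and `disassemble` are made up for this example.

```python
# Toy table: opcode byte -> (text template, number of immediate bytes that follow).
OPCODES = {
    0x0E: ("PUSH CS", 0),
    0x1F: ("POP DS", 0),
    0xB4: ("MOV AH,{:02X}", 1),
    0xB8: ("MOV AX,{:04X}", 2),
    0xBA: ("MOV DX,{:04X}", 2),
    0xCD: ("INT {:02X}", 1),
}


def disassemble(code: bytes):
    """Return a list of (offset, text) pairs for the supported opcodes."""
    result = []
    pos = 0
    while pos < len(code):
        opcode = code[pos]
        if opcode not in OPCODES:
            raise ValueError(f"opcode {opcode:02X} is not in the toy table")
        template, imm_size = OPCODES[opcode]
        # Immediates are stored little-endian right after the opcode byte.
        imm = int.from_bytes(code[pos + 1:pos + 1 + imm_size], "little")
        result.append((pos, template.format(imm)))
        pos += 1 + imm_size  # instruction length varies with the opcode
    return result


if __name__ == "__main__":
    stub = bytes.fromhex("0E1FBA0E00B409CD21B8014CCD21")
    for offset, text in disassemble(stub):
        print(f"{offset:04X}  {text}")
```

Each opcode byte decides how many more bytes belong to the instruction, which is why the decoder has to walk the bytes in order from a known starting point.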

Solution 4

You can use debug from the command line, but that's hard.

C:\WINDOWS>debug taskman.exe
-u
0D69:0000 0E            PUSH    CS
0D69:0001 1F            POP     DS
0D69:0002 BA0E00        MOV     DX,000E
0D69:0005 B409          MOV     AH,09
0D69:0007 CD21          INT     21
0D69:0009 B8014C        MOV     AX,4C01
0D69:000C CD21          INT     21
0D69:000E 54            PUSH    SP
0D69:000F 68            DB      68
0D69:0010 69            DB      69
0D69:0011 7320          JNB     0033
0D69:0013 7072          JO      0087
0D69:0015 6F            DB      6F
0D69:0016 67            DB      67
0D69:0017 7261          JB      007A
0D69:0019 6D            DB      6D
0D69:001A 206361        AND     [BP+DI+61],AH
0D69:001D 6E            DB      6E
0D69:001E 6E            DB      6E
0D69:001F 6F            DB      6F
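
The lines from offset 000E onward are not really instructions. The code before them loads DX with 000E and calls INT 21 with AH=09, the DOS "print string" service, so those bytes are the start of the text it prints, and debug is simply trying to disassemble them as code. A short sketch that decodes the same bytes as ASCII instead (the variable names are my own):

```python
# Bytes at offsets 000E..001F, copied from the debug listing above.
raw = bytes.fromhex("54 68 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F")

# Interpret them as ASCII text rather than as machine code.
text = raw.decode("ascii")
print(text)  # prints: This program canno
```

The message is cut off only because the listing stops at offset 001F.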

Solution 5

Just relating to this question: does anyone still read things like CD 21?

I remember Sandra Bullock in one show actually reading a screenful of hex numbers and figuring out what the program does. Sort of like the current version of reading Matrix code.

If you do read stuff like CD 21, how do you remember all the different combinations?

Author by

Peter Perháč

Updated on June 17, 2022

Comments

  • Peter Perháč
    Peter Perháč almost 2 years

    I am not well acquainted with the compiler magic. The act of transforming human-readable code (or the not really readable Assembly instructions) into machine code is, for me, rocket science combined with sorcery.

    I will narrow down the subject of this question to Win32 executables (.exe). When I open these files up in a specialized viewer, I can find strings (usually 16 bits per character) scattered at various places, but the rest is just garbage. I suppose the unreadable part (the majority) is the machine code (or maybe resources, such as images etc.).

    Is there any straightforward way of reading the machine code? Opening the exe as a file stream and reading it byte by byte, how could one turn these individual bytes into Assembly? Is there a straightforward mapping between these instruction bytes and the Assembly instruction?

    How is the .exe written? Four bytes per instruction? More? Less? I have noticed some applications can create executable files just like that: for example, in ACDSee you can export a series of images into a slideshow. But this does not necessarily have to be a SWF slideshow; ACDSee is also capable of producing EXEcutable presentations. How is that done?

    How can I understand what goes on inside an EXE file?

  • Peter Perháč
    Peter Perháč about 15 years
    This is a mighty nice answer. You're right about my curiosity. It's not that I NEED to disassemble executables, I am just very interested, and would like to toy with executables a little. Get that wooow feeling when I get to understand something beyond my current horizon :)
  • Peter Perháč
    Peter Perháč about 15 years
    Learned something new today. I hope I won't break my OS soon. Happened once when I got over-excited about tweaking registry entries... Never saw my desktop again.
  • Dead account
    Dead account about 15 years
    You can also write new code and save it back to the file. Only a madman [or hacker] would use Debug
  • Peter Perháč
    Peter Perháč about 15 years
    +1, I am using Delphi on a casual basis and I have been intrigued by the CPU, FPU, etc. windows, where one can step from one instruction to another and see what's going on. I was wondering how these instructions are then made into an EXE file, and how EXE files can be generated (see the ACDSee example). I especially like the idea introduced by BCS :)
  • Marco van de Voort
    Marco van de Voort about 15 years
    stack.nl/~marcov/compiler.pdf is a PDF version of the almost impossible to miss Crenshaw's tutorial. Unfortunately for a different CPU (m68k), but it illustrates the fundamentals of a compiler quite nicely.
  • Coding With Style
    Coding With Style over 14 years
    The same way programmers who don't understand English learn to code in languages with English syntax. I think anyone who's coded low level in DOS would remember CD 21, though.
  • Coding With Style
    Coding With Style over 14 years
    Count me among the few who still use debug. FYI: Microsoft's DEBUG only unassembles 16-bit real mode. If you want a 32-bit DPMI capable debug, try out japheth's version: japheth.de/debxxf.html
  • trincot
    trincot over 5 years
    This is not really an answer. Moreover, the machine code that the OP refers to is present in .exe files and is loaded into standard memory. Not "within the cpu".