How to write an executable Windows .exe manually (machine code with a hex editor)?

35,611

Solution 1

There's a quite minimalistic but fully working (on Win7, too) exe on corkami/wiki/PE101; every byte of it is explained in a nice graphic. You can type it all by hand in a hex editor, but the padding may make that a little tedious.

As for the history: yes, someone at Microsoft invented the exe format (the old DOS MZ exe format), and he (or someone else at Microsoft) wrote a loader for it and a linker, the tool that traditionally turns the output of a compiler ("object files") into executable files. It's possible (and even likely, I would say) that the first exe programs were written by hand; after all, they were only meant to test the new loader.

Later, Microsoft extended AT&T's COFF format into the PE format, which still starts with the MZ header and typically includes a small DOS program that just prints the message "This program cannot be run in DOS mode". That DOS stub is optional (the corkami example doesn't have one), and it can really be anything.
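To make the MZ/PE relationship concrete, here is a minimal Python sketch (not taken from corkami) of how a loader gets from the DOS header to the PE header: it reads the 32-bit little-endian e_lfanew field at offset 0x3C of the MZ header and checks that it points at the "PE\0\0" signature. The function name find_pe_header is made up for this example, and it validates only those two signatures, nothing else in the PE.

```python
import struct


def find_pe_header(image: bytes) -> int:
    """Return the file offset of the 'PE\\0\\0' signature, or raise ValueError."""
    # The MZ header must be present and long enough to contain e_lfanew.
    if len(image) < 0x40 or image[:2] != b"MZ":
        raise ValueError("missing MZ (DOS) header")
    # e_lfanew: 32-bit little-endian file offset of the PE header, stored at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("e_lfanew does not point at a PE signature")
    return e_lfanew
```

Everything between the end of the MZ header and e_lfanew is where the optional DOS stub program lives; a tiny PE can shrink or drop that gap entirely.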

Solution 2

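1) A .com file is the simplest place to start, and it will run in DOSBox. The format has no header at all: DOS copies the file to offset 0x100 of a memory segment (the first 0x100 bytes of the segment hold the Program Segment Prefix, which DOS fills in) and starts executing at the first byte of the file. So the code starts at offset 0 in the file but runs at address 0x100, and any absolute address you encode has to account for that.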
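A small Python sketch of that load-offset point: it emits the bytes of an illustrative "add two numbers and store the result" COM program and patches the store address as file offset + 0x100. The opcodes are standard 16-bit x86 encodings; the function and constant names are made up. The program prints nothing, so you would need a debugger (e.g. DOS's DEBUG) to look at the result byte.

```python
COM_ORIGIN = 0x100  # DOS loads a COM file at offset 0x100, right after the 256-byte PSP


def build_add_com() -> bytes:
    """Bytes of a tiny COM program: result = 2 + 3, stored in memory, then exit."""
    code = bytearray()
    code += b"\xB0\x02"          # mov al, 2
    code += b"\x04\x03"          # add al, 3
    hole = len(code) + 1         # where the 16-bit address of 'result' goes
    code += b"\xA2\x00\x00"      # mov [result], al   (address patched below)
    code += b"\xCD\x20"          # int 20h            ; terminate
    result_offset = len(code)    # file offset of the 'result' byte
    code += b"\x00"              # result: db 0
    # The CPU sees file offset + 0x100, so that is the address to encode.
    code[hole:hole + 2] = (COM_ORIGIN + result_offset).to_bytes(2, "little")
    return bytes(code)
```

The result is the 10 bytes B0 02 04 03 A2 09 01 CD 20 00. Encoding the raw file offset 0x0009 instead of 0x0109 would make the store land inside the PSP rather than in your own data.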

2) Although it's true that first programs are often written and assembled into machine code by hand, we are talking about adding two numbers, saving the result in memory, and being so happy that you take the rest of the day off. A "hello world" program that prints stuff to a video card is significantly more complicated. You can make a very simple one using DOS system calls; perhaps that is what you are interested in, perhaps not.
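Here is a sketch of such a DOS-call hello world as a COM file, generated from Python so every byte can carry a comment. It assumes DOS INT 21h function 09h (print the '$'-terminated string at DS:DX) and function 4Ch (exit); in a COM program DS already equals CS at startup, so no segment setup is needed. The helper name build_hello_com is hypothetical.

```python
COM_ORIGIN = 0x100  # COM files run at offset 0x100 of their segment


def build_hello_com(message: str = "Hello, World!") -> bytes:
    """Bytes of a COM program that prints message via DOS and exits.

    The message must not contain '$', because function 09h stops there.
    """
    text = message.encode("ascii") + b"$"  # AH=09h prints up to the '$'
    code_len = 12                          # size of the five instructions below
    msg_addr = COM_ORIGIN + code_len       # where the string lands in memory
    code = (
        b"\xB4\x09"                                 # mov ah, 09h   ; DOS: print string
        + b"\xBA" + msg_addr.to_bytes(2, "little")  # mov dx, msg   ; DS:DX -> string
        + b"\xCD\x21"                               # int 21h
        + b"\xB8\x00\x4C"                           # mov ax, 4C00h ; DOS: exit, code 0
        + b"\xCD\x21"                               # int 21h
    )
    assert len(code) == code_len
    return code + text
```

Writing the result with pathlib's Path("HELLO.COM").write_bytes(build_hello_com()) and running HELLO.COM in DOSBox should print the message; equivalently, you can type the 26 bytes into a hex editor yourself.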

3) Based on 2: even back in the 1960s or 1970s, for anything more complicated than one or a few instructions at a time for testing, when hand-assembling a program you wrote the program in assembly language by hand, then assembled it to machine code, then loaded it. Basically, learn assembly language first, then learn how to generate the machine code for it, then start typing those bytes into a hex editor. It is not the 1960s, so unless you enjoy excessive pain, learn the above by writing asm, using an assembler to generate the machine code, then using a disassembler to disassemble it, and examining the assembly language and the machine code side by side; that will significantly reduce the time it takes you to get a working program. If you worked for a chip company before there were operating systems and tools for its instruction set, you would still take advantage of other members of the team (the chip designers, etc.) to understand how to make the machine code and arrange it. You wouldn't be coming at this with only high-level-language experience, doing it all on your own with a hope of success.
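As a toy illustration of viewing machine code and assembly side by side, here is a Python sketch of a disassembler for just a handful of 16-bit x86 opcodes (mov reg, imm; int; ret). It is nowhere near a real disassembler such as ndisasm or objdump; anything it doesn't recognize, or an instruction cut off at the end of the input, is shown as a raw db byte.

```python
# Register names in x86 encoding order (the low 3 bits of the opcode).
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def decode_one(code: bytes, i: int) -> tuple[int, str]:
    """Decode one instruction at code[i]; return (length, assembly text)."""
    op = code[i]
    if 0xB0 <= op <= 0xB7 and i + 2 <= len(code):  # mov r8, imm8
        return 2, f"mov {REG8[op - 0xB0]}, {code[i + 1]:02X}h"
    if 0xB8 <= op <= 0xBF and i + 3 <= len(code):  # mov r16, imm16 (little-endian)
        imm = int.from_bytes(code[i + 1:i + 3], "little")
        return 3, f"mov {REG16[op - 0xB8]}, {imm:04X}h"
    if op == 0xCD and i + 2 <= len(code):          # int imm8
        return 2, f"int {code[i + 1]:02X}h"
    if op == 0xC3:                                 # near ret
        return 1, "ret"
    return 1, f"db {op:02X}h"                      # unknown or truncated: raw byte


def disassemble(code: bytes, origin: int = 0x100) -> list[str]:
    """Return 'address  bytes  assembly' lines; origin 0x100 matches a COM file."""
    lines = []
    i = 0
    while i < len(code):
        size, text = decode_one(code, i)
        raw = " ".join(f"{b:02X}" for b in code[i:i + size])
        lines.append(f"{origin + i:04X}  {raw:<9} {text}")
        i += size
    return lines
```

For example, disassemble(bytes.fromhex("B409BA0C01CD21")) yields three lines, mov ah, 09h / mov dx, 010Ch / int 21h, at addresses 0100, 0102 and 0105. Note that data bytes (such as a message string) would also be "decoded" as instructions; real disassemblers face the same ambiguity.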

4) x86 is a horrible instruction set; if you don't know assembly, I strongly discourage you from learning it first. Having an x86 machine is the worst excuse I have heard for learning x86 first. You already mentioned DOSBox, so you are already planning to emulate/simulate; so use a good instruction set and simulate it, or buy that hardware (under $50, even under $20, will buy you a board with a much better instruction set). I recommend simulating/emulating first, and in parallel with the hardware if you choose to buy some. If you really want an education, write your own simulator; it is not difficult at all. Perhaps invent your own instruction set.
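In the spirit of writing your own simulator and inventing your own instruction set, here is a Python sketch of a made-up 8-bit accumulator machine. The four opcodes (HALT, LDI, ADDI, ST) and their encodings are invented for this example; every instruction except HALT is two bytes, opcode then operand, and memory is exactly 256 bytes so that one byte can address all of it.

```python
# Opcodes of an invented accumulator machine (hypothetical, for illustration only).
HALT, LDI, ADDI, ST = 0x00, 0x01, 0x02, 0x03


def run(memory: bytearray, max_steps: int = 10_000) -> int:
    """Run from address 0 until HALT and return the accumulator."""
    if len(memory) != 256:
        raise ValueError("memory must be exactly 256 bytes")
    pc, acc = 0, 0
    for _ in range(max_steps):
        op = memory[pc]
        if op == HALT:
            return acc
        arg = memory[(pc + 1) & 0xFF]  # operand byte follows the opcode
        if op == LDI:                  # acc = operand
            acc = arg
        elif op == ADDI:               # acc = acc + operand, wrapping at 8 bits
            acc = (acc + arg) & 0xFF
        elif op == ST:                 # memory[operand] = acc
            memory[arg] = acc
        else:
            raise ValueError(f"illegal opcode {op:#04x} at address {pc:#04x}")
        pc = (pc + 2) & 0xFF
    raise RuntimeError("step limit reached (missing HALT?)")
```

The "add two numbers and save them in memory" program is then seven bytes: mem = bytearray(256); mem[:7] = bytes([LDI, 2, ADDI, 3, ST, 0x10, HALT]); after run(mem), mem[0x10] holds 5. Keeping every instruction the same width is a deliberate simplification that makes both the simulator and hand-assembly trivial.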

5) None of this will help you understand what a compiler does. Knowing assembly language and then disassembling the compiler's output is your best path toward that knowledge; machine code is not involved, and there is no need to actually run the programs. A compiler goes from a higher-level language to a lower-level language (C to asm or C++ to asm, for example). Then understand what an assembler does; there are many different solutions, both for historical and for other reasons. The typical solution today is a separate compiler, assembler and linker (your compiler calls the assembler and linker for you unless you tell it not to, so the three steps are hidden from view; in fact, the compile step itself may be more than one program run to complete the task). Assemblers that output a binary have to resolve the whole program, while assemblers that output an object file leave holes in the machine code for the linker to fill in: things like branches to or calls of items in another object, which cannot be encoded until the linker places everything in the binary and knows the spacing/addressing, and also accesses to variables that live in other objects.
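To illustrate the "holes" an assembler leaves for the linker, here is a Python sketch using a made-up object format (a code blob, a symbol table, and a list of fixups). The linker lays the objects out back to back, computes each symbol's address, then fills each 16-bit hole with target minus the address just past the hole, which is how an x86 call rel16 (opcode E8) displacement is encoded. Real object formats such as OMF, COFF and ELF carry far more information; ObjectFile and link are hypothetical names.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """A made-up object format: raw code, exported symbols, and holes to patch."""
    code: bytes
    symbols: dict[str, int]  # symbol name -> offset within this object's code
    fixups: list[tuple[int, str]] = field(default_factory=list)  # (offset of a rel16 hole, symbol)


def link(objects: list[ObjectFile], base: int = 0x100) -> bytes:
    # Pass 1: place objects back to back and compute absolute symbol addresses.
    addresses: dict[str, int] = {}
    starts = []
    addr = base
    for obj in objects:
        starts.append(addr)
        for name, offset in obj.symbols.items():
            if name in addresses:
                raise ValueError(f"duplicate symbol {name!r}")
            addresses[name] = addr + offset
        addr += len(obj.code)

    # Pass 2: fill each hole with target - (address just past the 2-byte field).
    # For call rel16 (E8 lo hi) that is the address of the next instruction.
    out = bytearray()
    for obj, start in zip(objects, starts):
        code = bytearray(obj.code)
        for hole, name in obj.fixups:
            if name not in addresses:
                raise ValueError(f"undefined symbol {name!r}")
            disp = (addresses[name] - (start + hole + 2)) & 0xFFFF
            code[hole:hole + 2] = struct.pack("<H", disp)
        out += code
    return bytes(out)
```

For example, an object whose code is E8 00 00 C3 (call foo; ret) with a fixup at offset 1, linked in front of a second object defining foo as a single C3, comes out as E8 01 00 C3 C3: the assembler could not know that foo would end up one byte past the call.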

You are likely not seeing actual examples of hex-editing a program because, first of all, it is such a broad question that there isn't a simple answer (what operating system, what system calls or are you creating those, what file format, what hex editor, etc.). Also, because it is a high-level question and problem, the real questions are: where do I learn assembly; where do I learn about the relationship between assembly and machine code; where do I learn about system calls (which are not an assembly question; they are unrelated to learning asm: you learn assembly language itself, then you learn to USE it as a tool to perform system calls if you cannot perform the system calls directly using a higher-level language); where do I learn about executable file formats like .com, .exe, COFF, ELF, etc.; and what is a good (or easy, or some other adjective) hex editor that runs on operating system or environment xyz. Ask those questions separately and you will find the answers and examples, and once you have those answers you will know how to make a program using a hex editor by typing in machine code. A shorter answer is that you ARE seeing hex examples of complete programs when you see the disassembly of a program posted on SO; some of those are complete programs shown in hex, and if you know the file format you can simply type that stuff into a hex editor.
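Once you have a hex listing of a complete program, turning it into a file is mechanical. A Python sketch, assuming a hypothetical listing format where each line has an optional address followed by a colon, then hex byte pairs, then an optional ';' comment:

```python
def parse_hex_listing(text: str) -> bytes:
    """Turn lines like '0100: B4 09 ; mov ah, 9' into raw bytes (hypothetical format)."""
    out = bytearray()
    for line in text.splitlines():
        line = line.split(";", 1)[0]      # drop a trailing comment
        if ":" in line:
            line = line.split(":", 1)[1]  # drop the address column
        out += bytes.fromhex(line)        # whitespace between byte pairs is ignored
    return bytes(out)
```

Path("PROG.COM").write_bytes(parse_hex_listing(listing)) (pathlib) then does what typing the bytes into a hex editor would do. Listings copied from other tools may use different column layouts, so the address/comment handling will likely need adjusting.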

Solution 3

I make binaries by hand, but I think it's easier in assembly itself than a pure hex editor, where updating anything would be difficult.

  • The easiest is surely the DOS COM format, which you can even type in Notepad; at the very least, it's very easy, even for a normal Hello World.

  • The EXE (non DOS format) doesn't require much either; see here.

  • If you're trying to make a PE, you can make a TinyPE.
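A Python sketch of how little a DOS MZ EXE needs: a 28-byte header (padded to 32 bytes, i.e. two 16-byte paragraphs), no relocations, then a hello world using the same DOS calls as a COM version. Unlike a COM program, an EXE starts with DS pointing at the PSP, so the code copies CS into DS first. The field values (stack pointer, e_minalloc, etc.) are my own choices for this example and the builder name is hypothetical; treat it as a sketch to check in DOSBox, since 64-bit Windows cannot run 16-bit DOS programs.

```python
import struct


def build_hello_mz(message: str = "Hello, World!") -> bytes:
    """Bytes of a minimal DOS MZ executable that prints message and exits."""
    text = message.encode("ascii") + b"$"  # DOS function 09h stops at '$'
    code = (
        b"\x0E"            # push cs
        + b"\x1F"          # pop ds          ; DS = CS so DS:DX reaches the string
        + b"\xB4\x09"      # mov ah, 09h     ; DOS: print string
        + b"\xBA\x0E\x00"  # mov dx, 000Eh   ; string starts right after these 14 code bytes
        + b"\xCD\x21"      # int 21h
        + b"\xB8\x00\x4C"  # mov ax, 4C00h   ; DOS: exit, code 0
        + b"\xCD\x21"      # int 21h
    )
    assert len(code) == 0x0E
    header_size = 32  # 28 bytes of fields, padded to 2 paragraphs
    file_size = header_size + len(code) + len(text)
    header = struct.pack(
        "<2s13H",
        b"MZ",                     # e_magic
        file_size % 512,           # e_cblp: bytes used in the last 512-byte page
        (file_size + 511) // 512,  # e_cp: number of 512-byte pages
        0,                         # e_crlc: no relocations
        header_size // 16,         # e_cparhdr: header size in paragraphs
        0x10,                      # e_minalloc: extra paragraphs (room for the stack)
        0xFFFF,                    # e_maxalloc
        0,                         # e_ss, relative to the load segment
        0x100,                     # e_sp
        0,                         # e_csum
        0,                         # e_ip: entry at the start of the load module
        0,                         # e_cs
        0x1C,                      # e_lfarlc: offset of the (empty) relocation table
        0,                         # e_ovno
    ).ljust(header_size, b"\0")
    return header + code + text
```

With the default message the file is 60 bytes: 32 bytes of header followed by 28 bytes that DOS loads as the program image.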

Most binaries should be available as PE, and EXE and COM.

Solution 4

Not spot on, but this tutorial should give you better insight into how assembly maps to machine code (x86 ELF): http://timelessname.com/elfbin/ (look especially at the lower half of the page)

This page is [...] about my attempts at creating the smallest x86 ELF binary that would execute saying Hello World on Ubuntu Linux. My first attempts started with C then progressed to x86 assembly and finally to a hexeditor.

It's great to analyze really small executables like these because the mapping between assembly and machine code will be easier to spot. This is also a really interesting article on the subject (not exactly related to your question though): http://www.phreedom.org/research/tinype/ (x86 PE)
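For comparison with the ELF tutorial, here is a Python sketch that emits a small 32-bit x86 Linux ELF by hand: one ELF header, one PT_LOAD program header, and code that calls write and exit through int 0x80 (32-bit syscall numbers 4 and 1). It is not the binary from either linked page and is not size-optimized; the load address 0x08048000 is just the conventional one for 32-bit Linux executables, and build_hello_elf is a hypothetical name.

```python
import struct

BASE = 0x08048000  # conventional load address for 32-bit x86 Linux executables


def build_hello_elf(message: bytes = b"Hello, World!\n") -> bytes:
    """Bytes of a 32-bit x86 Linux ELF that writes message to stdout and exits."""
    ehdr_size, phdr_size = 52, 32
    code_off = ehdr_size + phdr_size  # code follows the two headers
    code_len = 31
    msg_addr = BASE + code_off + code_len
    code = (
        b"\xB8" + struct.pack("<I", 4)               # mov eax, 4   ; sys_write
        + b"\xBB" + struct.pack("<I", 1)             # mov ebx, 1   ; fd = stdout
        + b"\xB9" + struct.pack("<I", msg_addr)      # mov ecx, msg
        + b"\xBA" + struct.pack("<I", len(message))  # mov edx, len
        + b"\xCD\x80"                                # int 0x80
        + b"\xB8" + struct.pack("<I", 1)             # mov eax, 1   ; sys_exit
        + b"\x31\xDB"                                # xor ebx, ebx ; status 0
        + b"\xCD\x80"                                # int 0x80
    )
    assert len(code) == code_len
    file_size = code_off + code_len + len(message)
    ehdr = struct.pack(
        "<16sHHIIIIIHHHHHH",
        b"\x7fELF\x01\x01\x01".ljust(16, b"\0"),  # 32-bit, little-endian, version 1
        2, 3, 1,                  # e_type=ET_EXEC, e_machine=EM_386, e_version
        BASE + code_off,          # e_entry
        ehdr_size, 0, 0,          # e_phoff, e_shoff (no sections), e_flags
        ehdr_size, phdr_size, 1,  # e_ehsize, e_phentsize, e_phnum
        0, 0, 0,                  # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<8I",
        1, 0, BASE, BASE,         # p_type=PT_LOAD, p_offset, p_vaddr, p_paddr
        file_size, file_size,     # p_filesz, p_memsz
        5, 0x1000,                # p_flags=R+X, p_align
    )
    return ehdr + phdr + code + message
```

Write the bytes to a file, chmod +x it, and run it on x86 Linux; a 64-bit kernel needs its 32-bit (IA-32) compatibility support enabled. The whole file is mapped at BASE, so the entry point is simply BASE plus the 84 bytes of headers.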

Solution 5

I wrote an article on creating executable DOS binary files just by using the ECHO command at the command prompt. No other third-party hex utilities or x86 IDEs required!

The technique uses keypad ALT codes (holding ALT and typing a decimal character code on the numeric keypad) to enter the opcode bytes directly, producing a binary readable directly under MS-DOS. The output is a fully runnable binary *.com file.

http://colinord.blogspot.co.uk/2015/02/extreme-programming-hand-coded.html

Excerpt: Type the following key commands at the DOS prompt, remembering to hold Left ALT while typing each number on the numeric keypad.

c:\>Echo LALT-178 LALT-36 LALT-180 LALT-2 LALT-205 LALT-33 LALT-205 LALT-32 > $.com

The codes above are actually opcode values describing an x86 program that prints a dollar sign to the screen.

Your prompt should look similar to the following when finished. Press Enter to build!

c:\>Echo ▓$┤☻═!═  > $.com

Run the file '$.com' and you will see a single dollar ($) character displayed on the screen.

c:\>$.com
$
c:\> 

Congratulations! You just created your first hand coded executable file called $.com.
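The ALT codes in the excerpt are just the program's bytes written in decimal. A short Python sketch that rebuilds them and pairs each instruction with its 16-bit x86 meaning (DOS INT 21h function 02h writes the character in DL; INT 20h terminates a COM program). Any bytes echo appends after these, such as the line ending, are never executed because INT 20h ends the program first.

```python
# The decimal ALT codes from the excerpt, in the order they are typed.
ALT_CODES = [178, 36, 180, 2, 205, 33, 205, 32]

# The same bytes grouped by instruction, with their 16-bit x86 meaning.
LISTING = [
    (bytes([178, 36]), "mov dl, 24h   ; DL = '$' (ASCII 36)"),
    (bytes([180, 2]), "mov ah, 02h   ; DOS function 02h: write the character in DL"),
    (bytes([205, 33]), "int 21h       ; call DOS"),
    (bytes([205, 32]), "int 20h       ; terminate the program"),
]

program = bytes(ALT_CODES)
assert program == b"".join(chunk for chunk, _ in LISTING)

# Print a hex/assembly listing, e.g. "B2 24  mov dl, 24h ..."
for chunk, asm in LISTING:
    print(chunk.hex(" ").upper().ljust(6), asm)
```

So ALT-178 is byte 0xB2, ALT-180 is 0xB4, ALT-205 is 0xCD, and so on; the odd characters on screen are just how code page 437 displays those bytes.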

Author: petersaints

Updated on December 27, 2020

Comments

  • petersaints over 3 years

    I'd like to know how it is possible to write something as simple as a Hello World program just by using a hex editor. I know that I could use an assembler and assembly language to do this at a near-machine level, but I just want to experiment with really writing machine code in a toy example such as Hello World.

    This could be a simple DOS .COM file that I can run on DOSBox. But it would be nice if someone could provide an example for an .EXE file for running it directly on my Windows PC.

    This is just pure curiosity. No... I'm not thinking of writing programs directly in binary machine code (I don't even usually write assembly code; I mostly use C/C++ as my lowest-level tools). I just want to see if it's possible, because someone probably had to do it in the very early days of computers.

    P.S.: I know that there are similar questions about this topic around but none provide a working example. I just want a simple example so that it can help me understand how compilers and assemblers generate an executable file. I mean... someone must have done this by hand in the past for the very first programs. Also, for the Windows EXE format there must have been someone at Microsoft that wrote the first tools to generate the format and the way that Windows itself reads it and then executes it.

    • old_timer over 11 years
      There is nothing simple about a hello world program; it is extremely complicated and an advanced topic. Adding a couple of numbers is a simple program.
    • Ciro Santilli OurBigBook.com almost 9 years
      possible duplicate of Reading/Writing machine code, both of which come down to: how does EXE work? They are also too broad :-)
    • B''H Bi'ezras -- Boruch Hashem about 4 years
      @old_timer how complicated is it to write the program mov eax,4 ret as machine language into an exe file? It can be done for ELF pretty easily (with a template), why not EXE?
    • old_timer about 4 years
      It is trivial to write some machine code into any binary file format: you just look up the format, examine some examples made by other tools to verify the format documentation, and go from scratch or tweak. But a hello world program with printf has a massive amount of code behind it to get text out onto a window. Yes, you can make some system calls, sure, and simplify that into a loop over a string.
    • old_timer about 4 years
      if I remember right .exe is a lot simpler than elf and elf files are simple (simple enough to not need libraries). .com is even simpler...
  • Alexey Frunze almost 12 years
    AFAIK, the PE-parsing code in Windows is now (in Vista/7) more strict and the bare minimum PE that worked on 9x/XP won't work on Vista/7. One should really study the format specification and begin with a valid PE image first and only then try to trim it.
  • petersaints almost 12 years
    Exactly. I found some examples that worked on older versions of Windows but not on Windows 7. Thanks for the tips. I already knew the first link (I found it during my research); the second one is new, though (and it's interesting because it focuses on the PE format and I'm on Windows :P)
  • petersaints almost 12 years
    Nice graphical explanation. Something like this is close to what I was looking for.
  • harold almost 12 years
    @petersaints I added some history that you might be interested in
  • n611x007 over 10 years
    Does a similar byte-by-byte visual explanation exist for networking topics? I know I can view packets in e.g. Wireshark, but all the visual possibilities add up in a poster.
  • harold over 10 years
    @naxa maybe, I don't know where to find it though
  • flarn2006 over 9 years
    It doesn't start at 0x100 in the file; code starts executing from the very beginning of the file. It gets loaded into address 0100 in memory, however, in whatever segment is selected as the code segment.
  • Eduardo Wada about 4 years
    the link to source and binary is broken now
  • harold about 4 years
    @EduardoWada I've changed it to something less broken, but now it doesn't link to source or binary at all..