Are binaries portable across different CPU architectures?

Solution 1

No. Binaries must be (re)compiled for the target architecture, and Linux offers nothing like fat binaries out of the box. This is because the code is compiled down to machine code for a specific architecture, and machine code differs greatly between most processor families (ARM and x86, for instance, are very different).
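
To see this concretely, the target architecture is recorded in the binary itself and can be inspected with file; running a foreign binary simply fails (a sketch, with hypothetical file names and abridged output):

$ file ./hello-x86_64
./hello-x86_64: ELF 64-bit LSB executable, x86-64, ...
$ file ./hello-arm
./hello-arm: ELF 32-bit LSB executable, ARM, EABI5, ...
$ ./hello-arm        # on an x86-64 machine
bash: ./hello-arm: cannot execute binary file: Exec format error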

EDIT: it is worth noting that some architectures offer levels of backwards compatibility (and, more rarely, compatibility with other architectures); on 64-bit CPUs, it's common to have backwards compatibility with 32-bit editions (but remember: your dependent libraries must also be 32-bit, including your C standard library, unless you statically link). Also worth mentioning is Itanium, where it was possible to run x86 code (32-bit only), albeit very slowly; the poor execution speed of x86 code was at least part of the reason it wasn't very successful in the market.
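
For instance, on an x86-64 host with 32-bit support installed (e.g. gcc-multilib and the 32-bit libc on Debian-style systems; a sketch, not a full recipe), you can build and run a 32-bit binary directly:

# Build a 32-bit binary on a 64-bit x86 host
gcc -m32 -o hello32 hello.c
file hello32    # reports: ELF 32-bit LSB executable, Intel 80386, ...
./hello32       # runs on the 64-bit kernel thanks to its 32-bit compatibility support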

Bear in mind that you still cannot use binaries compiled with newer instructions on older CPUs, even in compatibility modes (for example, you cannot use AVX in a 32-bit binary on Nehalem x86 processors; the CPU just doesn't support it).
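
You can check which extensions the running CPU actually advertises via /proc/cpuinfo (x86 flag names shown; purely illustrative):

# List the instruction-set extensions the CPU reports
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -Ex 'sse4_2|avx|avx2'
# If avx is missing here, a binary built with -mavx will die with SIGILL (illegal instruction)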

Note that kernel modules must be compiled for the relevant architecture; in addition, 32-bit kernel modules will not work on 64-bit kernels or vice versa.
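
You can see that a module is tied to one architecture (and one kernel build) by inspecting it; the module name and output here are hypothetical:

# A kernel module is just an ELF relocatable object for a single architecture
file mymodule.ko                   # -> ELF 64-bit LSB relocatable, x86-64 ... (on an x86-64 build)
modinfo -F vermagic mymodule.ko    # -> e.g. "6.1.0 SMP mod_unload ..."; must match the running kernel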

For information on cross-compiling binaries (so you don't have to have a toolchain on the target ARM device), see grochmal's comprehensive answer below.

Solution 2

Elizabeth Myers is correct: each architecture requires a binary compiled for that architecture. To build binaries for a different architecture than the one your system runs on, you need a cross-compiler.


In most cases you need to compile the cross-compiler yourself. I only have experience with gcc (but I believe that llvm and other compilers have similar parameters). A gcc cross-compiler is built by adding --target to the configure invocation:

./configure --build=i686-arch-linux-gnu --target=arm-none-linux-gnueabi

You need to compile gcc, glibc and binutils with these parameters (and provide the kernel headers of the kernel at the target machine).
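
The kernel-headers part, for example, uses the kernel's own headers_install target (a sketch; the source path, ARCH and install prefix are placeholders):

# Install the target kernel's sanitised headers so glibc can be built against them
make -C /path/to/linux-source ARCH=arm INSTALL_HDR_PATH=/opt/cross/arm-none-linux-gnueabi headers_install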

In practice this is considerably more complicated, and different build errors pop up on different systems.

There are several guides out there on how to compile the GNU toolchain, but I'll recommend Linux From Scratch, which is continuously maintained and does a very good job of explaining what the presented commands do.

Another option is a bootstrap compilation of a cross-compiler. Out of the struggle of compiling cross-compilers for different architectures on different architectures, crosstool-ng was created. It provides a bootstrap over the toolchain needed to build a cross-compiler.

crosstool-ng supports several target triplets on different architectures; basically, it is a bootstrap where people dedicate their time to sorting out the problems that occur while compiling a cross-compiler toolchain.
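
A typical crosstool-ng session looks roughly like this (the sample name is chosen purely for illustration):

ct-ng list-samples                 # show the pre-tested toolchain configurations
ct-ng arm-unknown-linux-gnueabi    # start from one of the samples
ct-ng menuconfig                   # optionally tweak it
ct-ng build                        # build binutils, gcc, glibc, ... for the target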


Several distros provide cross-compilers as packages; check what your distro has available in terms of cross-compilers. If your distro does not have a cross-compiler for your needs, you can always compile it yourself.
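
For example, on Debian/Ubuntu such a packaged ARM hard-float toolchain can be installed and used directly (package and triplet names differ between distros; this is a sketch):

sudo apt-get install gcc-arm-linux-gnueabihf
arm-linux-gnueabihf-gcc -o hello hello.c
file hello    # -> ELF 32-bit LSB executable, ARM, EABI5 ...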

Kernel modules note

If you are compiling your cross-compiler by hand, you have everything you need to compile kernel modules. This is because you need the kernel headers to compile glibc.

But, if you are using a cross-compiler provided by your distro, you will need the kernel headers of the kernel that runs on the target machine.
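
An out-of-tree module is then cross-built against the target's kernel source tree with the usual kbuild invocation (paths and triplet are placeholders):

# Cross-compile a module against the *target's* configured kernel tree
make -C /path/to/target-kernel-source M=$PWD ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- modules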

Solution 3

Note that as a last resort (i.e. when you don't have the source code), you can run binaries on a different architecture using emulators like qemu, dosbox or exagear. Some emulators are designed to emulate systems other than Linux (e.g. dosbox is designed to run MS-DOS programs, and there are plenty of emulators for popular gaming consoles). Emulation has a significant performance overhead: emulated programs run 2-10 times slower than their native counterparts.
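
With qemu's user-mode emulation, for example, a dynamically linked ARM binary can be run on an x86 host (a sketch; the sysroot path depends on how your cross-libraries are installed):

# Run an ARM Linux binary on an x86 host, pointing qemu at the ARM libraries
qemu-arm -L /usr/arm-linux-gnueabihf ./hello-arm
# qemu-system-arm would instead emulate a whole machine, kernel included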

If you need to run kernel modules on a non-native CPU, you'll have to emulate the whole OS, including a kernel for the same architecture. AFAIK it's impossible to run foreign code inside the Linux kernel.

Solution 4

Not only are binaries not portable between x86 and ARM; there are also different flavours of ARM.

The one you are likely to encounter in practice is ARMv6 vs ARMv7. The Raspberry Pi 1 is ARMv6, while later models are ARMv7 or newer, so it's possible to compile code on the later ones that does not work on the Pi 1.
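
With gcc this difference shows up in the -march (and -mfpu) flags; roughly how the two variants are targeted (a sketch, using a hard-float cross-toolchain):

# ARMv7 build: may contain instructions a Pi 1 (ARMv6) cannot execute
arm-linux-gnueabihf-gcc -march=armv7-a -o hello7 hello.c
# ARMv6 build: runs on the Pi 1 and, thanks to backwards compatibility, on later cores too
arm-linux-gnueabihf-gcc -march=armv6 -mfpu=vfp -o hello6 hello.c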

Fortunately, one benefit of open source and Free software is having the source, so you can rebuild it on any architecture, although this may require some work.

(ARM versioning is confusing, but if there's a V before the number it refers to the instruction set architecture (ISA). If there isn't, it's a model number like "Cortex-M0" or "ARM926EJ-S". Model numbers have nothing to do with ISA numbers.)

Solution 5

You always need to target a platform. In the simplest case, the target CPU directly runs the code compiled in the binary (this roughly corresponds to MS DOS's COM executables). Let's consider two different platforms I just invented - Armistice and Intellio. In both cases, we'll have a simple hello world program that outputs 42 on the screen. I'll also assume that you're using a multi-platform language in a platform-agnostic manner, so the source code is the same for both:

Print(42)

On Armistice, you have a simple device driver that takes care of printing numbers, so all you have to do is output to a port. In our portable assembly language, this would correspond to something like this:

out 1234h, 42

However, our Intellio system has no such thing, so it has to go through other layers:

mov a, 10h
mov c, 42
int 13h

Oops, we already have a significant difference between the two, before we even get to machine code! This would roughly correspond to the kind of difference you have between Linux and MS DOS, or an IBM PC and an Xbox (even though both may use the same CPU).

But that's what OSes are for. Let's assume we have a HAL that makes sure that all different hardware configurations are handled the same way on the application layer - basically, we'll use the Intellio approach even on the Armistice, and our "portable assembly" code ends up the same. This is used by both modern Unix-like systems and Windows, often even in embedded scenarios. Good - now we can have the same truly portable assembly code on both Armistice and Intellio. But what about the binaries?

As we've assumed, the CPU needs to execute the binary directly. Let's look at the first line of our code, mov a, 10h, on Intellio:

20 10

Oh. Turns out that mov a, constant is so popular it has its own instruction, with its own opcode. How does Armistice handle this?

36 01 00 10

Hmm. There's the opcode for mov.reg.imm, so we need another argument to select the register we're assigning to. And the constant is always a 2-byte word, in big-endian notation - that's just how Armistice was designed, in fact, all instructions in Armistice are 4 bytes long, no exceptions.

Now imagine running the binary from Intellio on Armistice: the CPU starts decoding the instruction and finds opcode 20h. On Armistice, this corresponds, say, to the and.imm.reg instruction. It tries to read the 2-byte word constant (which reads 10XX, already a problem), and then the register number (another XX). We're executing the wrong instruction, with the wrong arguments. And worse, the next instruction will be complete nonsense, because we actually ate another instruction, thinking it was data.

The application has no chance of working, and it will most likely crash or hang almost immediately.

Now, this doesn't mean that an executable always needs to say it runs on Intellio or Armistice. You just need to define a platform that's independent of the CPU (like bash on Unix), or both the CPU and OS (like Java or .NET, and nowadays even JavaScript, kind of). In this case, the application can use one executable for all the different CPUs and OSes, while there's some application or service on the target system (which targets the correct CPU and/or OS directly) that translates the platform-independent code into something the CPU can actually execute. This may or may not come with a hit to performance, cost or capability.

CPUs usually come in families. For example, all CPUs from the x86 family have a common set of instructions that are encoded in exactly the same way, so every x86 CPU can run every x86 program, as long as it doesn't try to use any extensions (for example, floating point operations or vector operations). On x86, the most common examples today are Intel and AMD, of course. Atmel is a well-known company designing CPUs in the ARM family, quite popular for embedded devices. Apple also has ARM CPUs of their own, for example.

But ARM is utterly incompatible with x86 - they have very different design requirements, and have very little in common. The instructions have entirely different opcodes, they are decoded in a different manner, the memory addresses are treated differently... It might be possible to make a binary that runs on both an x86 CPU and an ARM CPU, by using some safe operations to distinguish between the two and jumping to two completely different sets of instructions, but it still means you have separate instructions for both versions, with just a bootstrapper that picks the correct set at runtime.
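
On real systems that "which platform is this for?" information is stored in the executable itself; for ELF binaries the header records the target machine, and the kernel refuses to exec a binary whose machine type doesn't match (file name hypothetical, output abridged):

$ readelf -h ./hello-arm | grep -E 'Class|Machine'
  Class:                             ELF32
  Machine:                           ARM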


Comments

  • Rui F Ribeiro
    Rui F Ribeiro about 1 year

    My goal is to be able to develop for embedded Linux. I have experience on bare-metal embedded systems using ARM.

    I have some general questions about developing for different cpu targets. My questions are as below:

    1. If I have an application compiled to run on a 'x86 target, linux OS version x.y.z', can I just run the same compiled binary on another system 'ARM target, linux OS version x.y.z'?

    2. If above is not true, the only way is to get the application source code to rebuild/recompile using the relevant toolchain 'for example, arm-linux-gnueabi'?

    3. Similarly, if I have a loadable kernel module (device driver) that works on a 'x86 target, linux OS version x.y.z', can I just load/use the same compiled .ko on another system 'ARM target, linux OS version x.y.z'?

    4. If above is not true, the only way is to get the driver source code to rebuild/recompile using the relevant toolchain 'for example, arm-linux-gnueabi'?

    • Admin
      Admin over 7 years
      no, yes, no, yes.
    • Admin
      Admin over 7 years
      It helps to realize that we don't have an AMD target and an Intel target, just a single x86 target for both. That is because Intel and AMD are sufficiently compatible. It then becomes obvious that the ARM target exists for a specific reason, namely because ARM CPU's aren't compatible with Intel/AMD/x86.
    • Admin
      Admin over 7 years
      No, unless it's bytecode designed to run on a portable runtime environment like the Java Runtime. If you're writing code for embedded use, your code will likely rely on low-level processor-specific optimizations or features and will be very difficult to port, requiring more than just compilation for the target platform (e.g. assembly code changes, possibly rewriting several modules or the entire program).
    • Admin
      Admin over 7 years
      @MSalters: Actually, we do have an AMD target: amd64 which is often labeled x86-64 (while x86 is usually a re-labelling of i386). Fortunately Intel copied (and later expanded) the AMD architecture so any 64 bit x86 can run amd64 binaries.
  • mattdm
    mattdm over 7 years
    FWIW Fedora includes cross-compilers as well.
  • grochmal
    grochmal over 7 years
@mattdm - thanks, answer tweaked; I believe I got the right part of the Fedora wiki linked.
  • jpmc26
    jpmc26 over 7 years
    It may be worth clarifying about any compatibility (or lack thereof) between x86 and x64, given that some x86 binaries can run on x64 platforms. (I'm not sure this is the case on Linux, but it is on Windows, for instance.)
  • Dan Is Fiddling By Firelight
    Dan Is Fiddling By Firelight over 7 years
    @jpmc26 it's possible on Linux; but you might need to install compatibility libraries first. x86 support is a non-optional part of Win64 installs. In Linux it's optional; and because the Linux world's much farther along in making 64bit versions of everything available some distros don't default to having (all?) 32bit libraries installed. (I'm not sure how common it is; but have seen a few queries about it from people running mainstreamish distros before.)
  • Matteo Italia
    Matteo Italia over 7 years
    ... and then there are even different subflavors for the same ARM flavour, and even different ABIs for the exact same hardware (I'm thinking about the whole ARM soft/softfp/hard floating point mess).
  • Elizafox
    Elizafox over 7 years
    @jpmc26 I updated my answer with your notes; I thought about mentioning that but didn't want to complicate the answer.
  • Iwillnotexist Idonotexist
    Iwillnotexist Idonotexist over 7 years
    An easier way than Linux From Scratch to get a Linux and toolchain for another architecture is crosstool-ng. You might want to add that to the list. Also, configuring and compiling a GNU cross-toolchain by hand for any given architecture is incredibly involved and far more tedious than just --target flags. I suspect that's part of why LLVM is gaining popularity; It's architected in such a way that you don't need a rebuild to target another architecture - instead you can target multiple backends using the same frontend and optimizer libraries.
  • supercat
    supercat over 7 years
    The speed penalty for emulation is often even higher than 10x, but if one is trying to run code written for a 16Mhz machine on a 4GHz machine (a 250:1 difference in speed) an emulator that has a 50:1 speed penalty may still run code much faster than it would have run on the original platform.
  • grochmal
    grochmal over 7 years
@IwillnotexistIdonotexist - thanks, I have tweaked the answer further. I had never heard of crosstool-ng before, and it looks very useful. Your comment has actually been pretty useful for me.
  • Iwillnotexist Idonotexist
    Iwillnotexist Idonotexist over 7 years
    @MatteoItalia Ugh. The multiple ABIs were a snafu, a cure to something that was worse than the disease. Some ARMs didn't have VFP or NEON registers at all, some had 16, some 32. On Cortex-A8 and earlier the NEON engine ran a dozen CCs behind the rest of the core, so transferring a vector output to a GPR cost a lot. ARM has gotten round to doing the right thing - mandating a large common subset of features.