Difference between In-order and Out-of-order execution in ARM architecture

arm cpu-architecture


Solution 1

That's pretty much it. Out-of-order execution "greedily" executes every instruction it can as quickly as possible without waiting for previous instructions to finish unless they depend on the result of an as-yet unfinished instruction.

This is obviously most useful when an instruction is waiting for memory to be read. An in-order implementation would just stall until the data becomes available, whereas an out-of-order implementation can (provided there are instructions ahead that can be executed independently) get something else done while the processor waits for the data to be delivered from memory.
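A toy cycle-count model can make this concrete. The sketch below is a hypothetical, heavily simplified single-issue model, not how any particular ARM core works. With `window=1` it behaves like an in-order core that stalls when an instruction needs a load result that is not ready yet, while a larger `window` lets it issue later independent instructions in the meantime. The 10-cycle load latency, the register names and the windowing rule are assumptions for illustration, and destination registers are never reused, standing in for register renaming.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    text: str        # assembly text, for display only
    dst: str         # destination register (never reused, as if renamed)
    srcs: list[str]  # source registers
    latency: int     # cycles until the result can be used


def simulate(prog: list[Instr], window: int) -> int:
    """Return the cycle in which the last result becomes available.

    Toy single-issue model: each cycle, look at program positions
    [head, head + window), where head is the oldest instruction not yet
    issued, and issue the oldest one in that range whose sources are ready.
    window=1 behaves like an in-order core that stalls only when it needs
    a result that is not ready yet.
    """
    # Results that have not been produced yet are unavailable; registers
    # that prog never writes (inputs such as r0) are ready at cycle 0.
    ready_at = {ins.dst: float("inf") for ins in prog}
    issued = [False] * len(prog)
    cycle = 0
    finish = 0
    while not all(issued):
        head = issued.index(False)
        for i in range(head, min(head + window, len(prog))):
            ins = prog[i]
            if issued[i]:
                continue
            if all(ready_at.get(s, 0) <= cycle for s in ins.srcs):
                issued[i] = True
                ready_at[ins.dst] = cycle + ins.latency
                finish = max(finish, cycle + ins.latency)
                break  # single issue: at most one instruction per cycle
        cycle += 1
    return finish


prog = [
    Instr("ldr r1, [r0]", "r1", ["r0"], 10),         # assumed 10-cycle cache miss
    Instr("add r2, r1, #1", "r2", ["r1"], 1),        # needs the load result
    Instr("mul r3, r4, r5", "r3", ["r4", "r5"], 1),  # independent
    Instr("add r6, r4, #2", "r6", ["r4"], 1),        # independent
    Instr("sub r7, r5, #3", "r7", ["r5"], 1),        # independent
]

for window in (1, 4):
    print(f"window={window}: {simulate(prog, window)} cycles")
```

Under these assumptions the in-order run (`window=1`) takes 14 cycles and the `window=4` run takes 11, because the three independent instructions execute while the load is still outstanding.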

Note that both compilers and (if the compiler is not clever enough) programmers can take advantage of this by moving potentially expensive reads from memory as far away as possible from the point where the data is actually used. This makes no difference for a simple in-order implementation that stalls on every load, but it can help hide memory latency in an out-of-order implementation (and also in in-order cores that only stall when the loaded register is actually read), and therefore makes the code run faster.

The downside is of course that out-of-order implementations tend to be more complex and more power hungry because of all the book-keeping involved.

Solution 2

The architecture has little to do with it. In ARM, one of the more significant differences is that memory ordering can be quite relaxed (possibly under the control of the user). Even an in-order Cortex-M with a three-stage pipeline has scenarios that require the ISB and DSB barrier instructions.

Executes instructions in sequential order

This is the view presented to the programmer at all times, so it doesn't really describe much.

Until current instruction is completed, it will not execute next instruction.

Incorrect. All modern processors are pipelined, and fetch/decode/branch predict can all occur in an in-order machine whilst earlier instructions are still in flight. There are likely to be places where state is cached in case it needs to be reverted.

Have slower execution speed.

Not guaranteed. A wide in-order machine can have a higher IPC than an out-of-order machine, though it won't necessarily make sense to build one.

Executes instructions in non-sequential order

This is called 'out-of-order dispatch', or 'speculative execution' (which is a different thing, working at a higher level). In actual ARM cores, 'out-of-order completion' is more common. This is where the addresses of loads and stores are computed, and the accesses are then issued to a set of buffers. Even a single-issue machine with a single memory interface can have multiple store buffers that let stores queue up whilst ALU operations continue in the processor. With more than one memory interface (or a bus like AXI), a slow load can be in progress whilst any number of other transactions complete.

Out-of-order completion is much simpler to implement than any form of out-of-order dispatch, and it is facilitated in the ARM architecture by 'precise aborts' (which occur at the logical place in program order) and 'imprecise aborts' (which occur late, when the memory system finally fails to resolve a transaction).

A further example of ordering is a scenario where there are 2 integer pipelines and one float pipeline. Not only are the pipelines of potentially different length, but there is nothing to say that they must map onto incoming instructions in a set order - provided the dependencies are handled.
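As an illustration, the following hypothetical sketch issues instructions strictly in program order, one per cycle, to pipelines of different assumed depths, and reports the order in which results come out. The pipeline names and latencies are made up, and the instructions are assumed to be independent, so the dependency handling mentioned above never has to intervene.

```python
# Hypothetical pipeline depths: two short integer pipelines and a longer
# floating-point pipeline.
LATENCY = {"int0": 1, "int1": 1, "fp": 4}


def completion_order(prog: list[tuple[str, str]]) -> list[str]:
    """Issue one instruction per cycle in program order and return the
    instructions in the order their results complete.

    Assumes the instructions are independent, so issue never stalls.
    """
    done = []
    for issue_cycle, (text, pipe) in enumerate(prog):
        done.append((issue_cycle + LATENCY[pipe], issue_cycle, text))
    # Sort by completion cycle; ties are broken by program order.
    return [text for _, _, text in sorted(done)]


prog = [
    ("vadd.f64 d0, d1, d2", "fp"),
    ("add r0, r1, r2", "int0"),
    ("sub r3, r4, r5", "int1"),
]
print(completion_order(prog))
```

Even though issue is in order, the floating-point result completes last, after the two integer instructions that were issued behind it.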

Even if current instruction is NOT completed, it will execute next instruction. (This is done only if the next instruction does not depend on the result of current instruction)

This is generally true of all pipelined processors. Any stage could stall when it depends on some earlier instruction making progress.

Faster execution speed.

Maybe, depending on the constraints. Significantly, a compiler will benefit from understanding the optimal ordering, and it can make a difference whether a binary needs to be optimal for a single target device or for a wide range of devices.

Solution 3

It could also be mentioned that out-of-order processors have a "window" over the incoming instruction stream. This follows naturally from reordering the instructions. Suppose the following letters are instructions the processor has to work through: C B D A E F, and the optimal schedule would be AB CD EF. A processor with a window of only 3 instructions would do CB DE A F instead, because it cannot see the whole incoming instruction stream. The size of this window is one of many qualities that make a good processor.

(In my example, letters adjacent in the alphabet are independent of each other and can be executed simultaneously, while others cannot.)
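The letter example can be reproduced with a small sketch. The scheduling rule here is an assumption, since the answer does not spell one out: each cycle the processor takes the oldest instruction it can see and pairs it with the first independent instruction inside its window, where, as in the example, letters adjacent in the alphabet count as independent. `window_schedule` is a hypothetical helper name.

```python
def window_schedule(stream: str, window: int) -> list[str]:
    """Group instructions (letters) into cycles of at most two.

    Each cycle, only the next `window` pending instructions are visible.
    The oldest visible one is paired with the first visible instruction
    that is independent of it (here: adjacent in the alphabet).
    """
    pending = list(stream)
    cycles = []
    while pending:
        visible = pending[:window]
        first = visible[0]
        group = [first]
        for other in visible[1:]:
            if abs(ord(other) - ord(first)) == 1:  # can run alongside `first`
                group.append(other)
                break  # at most two instructions per cycle
        for ins in group:
            pending.remove(ins)
        cycles.append("".join(group))
    return cycles


print(window_schedule("CBDAEF", 3))
```

With a window of 3 this gives `['CB', 'DE', 'A', 'F']`, the four-cycle schedule described above, compared with three cycles for AB CD EF.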


Author by

rohit

Updated on August 02, 2022

Comments

rohit over 1 year

As I understand it, in ARM processors the following are the features of in-order execution: (1) Executes instructions in sequential order (2) Until current instruction is completed, it will not execute next instruction. (3) Have slower execution speed.

Out-of-order execution is just the opposite behaviour of in-order: (1) Executes instructions in non-sequential order (2) Even if current instruction is NOT completed, it will execute next instruction. (This is done only if the next instruction does not depend on the result of current instruction) (3) Faster execution speed.

Are there any other differences besides the ones mentioned above?
Peter Cordes over 2 years

In-order execution just means instructions start in program order. Modern in-order ARM CPUs track incoming loads and don't actually stall until something tries to read a load-result register that isn't ready yet. This allows memory-level parallelism which is critical for performance on modern CPUs where DRAM is hundreds of cycles of latency away. (Like hit-under-miss to hide one cache miss and miss-under-miss to have multiple cache misses in flight.) Out-of-order exec can hide (some) memory latency even when a result is used soon after loading, so it's still very powerful in real code.
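The gap between stalling on every load and stalling only on use can be put into rough numbers. This hypothetical sketch assumes two independent loads that each miss with a made-up 100-cycle latency, followed by an add that uses both results; the class, function names and latencies are illustrative and not taken from any ARM core.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    dst: str         # destination register
    srcs: list[str]  # source registers
    latency: int     # cycles until the result can be used


def blocking_cycles(prog: list[Instr]) -> int:
    """Naive in-order model: every instruction waits for the previous one
    to finish, so the latencies simply add up."""
    return sum(ins.latency for ins in prog)


def stall_on_use_cycles(prog: list[Instr]) -> int:
    """In-order, single-issue, stall-on-use: an instruction issues (in program
    order) once its source registers are ready, so a load that is still in
    flight does not block later independent instructions. Returns the cycle
    in which the last result becomes available."""
    ready_at: dict[str, int] = {}
    cycle = 0
    finish = 0
    for ins in prog:
        # Stall only until every source register has its value.
        cycle = max([cycle] + [ready_at.get(s, 0) for s in ins.srcs])
        ready_at[ins.dst] = cycle + ins.latency
        finish = max(finish, ready_at[ins.dst])
        cycle += 1  # the next instruction can issue one cycle later
    return finish


prog = [
    Instr("r1", ["r0"], 100),      # ldr r1, [r0]  (assumed cache miss)
    Instr("r2", ["r3"], 100),      # ldr r2, [r3]  (independent miss)
    Instr("r4", ["r1", "r2"], 1),  # add r4, r1, r2
]
print(blocking_cycles(prog), stall_on_use_cycles(prog))
```

Under these assumptions the blocking model needs 201 cycles and the stall-on-use model 102, because the second miss overlaps the first (miss-under-miss).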