What does "rep; nop;" mean in x86 assembly? Is it the same as the "pause" instruction?
Solution 1
`rep; nop` is indeed the same as the `pause` instruction (opcode F3 90). It might be used for assemblers which don't support the `pause` instruction yet. On older processors, this simply did nothing, just like `nop` but in two bytes. On newer processors which support hyperthreading, it is used as a hint to the processor that you are executing a spin loop, which lets it improve performance. From Intel's instruction reference:
Improves the performance of spin-wait loops. When executing a “spin-wait loop,” a Pentium 4 or Intel Xeon processor suffers a severe performance penalty when exiting the loop because it detects a possible memory order violation. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops.
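As a rough illustration of where the hint goes, here is a minimal C sketch of a spin-wait loop. The helper names `cpu_relax` and `spin_until_set` are made up for this example, and the inline-assembly form assumes GCC or Clang on x86; on other targets the hint compiles to nothing.

```c
#include <stdatomic.h>

/* Hypothetical helper: emit the spin-loop hint (bytes F3 90) on x86
 * with GCC/Clang, and do nothing on other targets. */
static inline void cpu_relax(void)
{
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
    /* "rep; nop" assembles to the same bytes as "pause". */
    __asm__ __volatile__("rep; nop" ::: "memory");
#endif
}

/* Hypothetical helper: busy-wait until another thread stores a
 * non-zero value into *flag, then return the value that was observed. */
static int spin_until_set(atomic_int *flag)
{
    int v;
    while ((v = atomic_load_explicit(flag, memory_order_acquire)) == 0)
        cpu_relax();   /* tell the CPU this is a spin-wait loop */
    return v;
}
```

In a real program another thread would eventually store to the flag; if the flag is already non-zero, the function returns immediately without executing the hint.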
Solution 2
`rep nop` = F3 90 = the encoding for `pause`, as well as how it decodes on older CPUs that don't support `pause`.
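To make the byte-level claim concrete, here is a small C sketch that checks whether a buffer starts with the F3 90 sequence, that is, the REP prefix byte followed by the one-byte NOP opcode. The function and constant names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

enum { PREFIX_REP = 0xF3, OPCODE_NOP = 0x90 };

/* Hypothetical helper: returns 1 if buf starts with F3 90, the bytes
 * emitted for both "rep; nop" and "pause", and 0 otherwise. */
static int is_pause_encoding(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == PREFIX_REP && buf[1] == OPCODE_NOP;
}
```

A lone 0x90 byte is a plain `nop`, so the check only succeeds when the prefix byte is present.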
Prefixes (other than `lock`) that don't apply to an instruction are ignored in practice by existing CPUs.
The documentation says using `rep` with instructions it doesn't apply to is "reserved and can cause unpredictable behaviour" because future CPUs might recognize it as part of some new instruction. Once they establish any specific new instruction encoding using `f3 xx`, they document how it runs on older CPUs. (Yes, the x86 opcode space is so limited that they do crazy stuff like this, and yes it makes the decoders complicated.)
In this case, it means you can use `pause` in spinloops without breaking backwards compat. Old CPUs that don't know about `pause` will decode it as a NOP with no harm done, as guaranteed by Intel's ISA ref manual entry for `pause`. On new CPUs, you get the benefit of power-saving / HT friendliness, and avoiding memory-ordering mis-speculation when the memory you're spinning on does change and you leave the spin loop.
Links to Intel's manuals and tons of other good stuff are on the x86 tag wiki info page.
Another case of a meaningless `rep` prefix becoming a new instruction on new CPUs: `lzcnt` is `F3 0F BD /r`. On CPUs that don't support that instruction (missing the LZCNT feature flag in their CPUID), it decodes as `rep bsr`, which runs the same as `bsr`. So on old CPUs, it produces `31 - expected_result` (for 32-bit operand size), and the result is undefined when the input was zero.
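The relationship between the two results can be checked with a small software model. For a 32-bit operand and a non-zero input, `bsr` returns the index of the highest set bit, which is 31 minus the leading-zero count. The function names below are hypothetical, and the loops only model the instructions' results; they are not how a CPU computes them.

```c
#include <stdint.h>

/* Model of 32-bit lzcnt: number of leading zero bits; 32 for x == 0. */
static unsigned lzcnt32_model(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && !(x & (UINT32_C(1) << (31 - n))))
        n++;
    return n;
}

/* Model of 32-bit bsr for non-zero x: index of the highest set bit.
 * The real instruction's result is undefined for x == 0, so this
 * model must not be called with 0. */
static unsigned bsr32_model(uint32_t x)
{
    unsigned i = 31;
    while (!(x & (UINT32_C(1) << i)))
        i--;
    return i;
}
```

For example, with `x = 0x00010000` the model gives a leading-zero count of 15 and a highest-set-bit index of 16, and 31 - 15 = 16.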
But `tzcnt` and `bsf` do the same thing with non-zero inputs, so compilers can and do use `tzcnt` even when it's not guaranteed that the target CPU will run it as `tzcnt`. AMD CPUs have fast `tzcnt`, slow `bsf`, and on Intel they're both fast. As long as it doesn't matter for correctness (you're not relying on flag-setting, or on leaving the destination unmodified behaviour in the input=0 case), having it decode as `tzcnt` on CPUs that support it is helpful.
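A small software model may help show where the two instructions agree and where they differ. The function names are hypothetical; the zero-input branch of the `bsf` model assumes the "destination left unmodified" behaviour mentioned above, which portable code should not rely on.

```c
#include <stdint.h>

/* Model of 32-bit tzcnt: number of trailing zero bits; 32 for x == 0. */
static unsigned tzcnt32_model(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && !(x & (UINT32_C(1) << n)))
        n++;
    return n;
}

/* Model of 32-bit bsf: index of the lowest set bit. For x == 0 this
 * model returns the old destination value (an assumption, see the
 * lead-in); flag results are not modelled at all. */
static unsigned bsf32_model(uint32_t x, unsigned old_dest)
{
    if (x == 0)
        return old_dest;
    unsigned i = 0;
    while (!(x & (UINT32_C(1) << i)))
        i++;
    return i;
}
```

For any non-zero input both models return the same value, which is why the substitution is safe when only that case matters.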
One case of a meaningless `rep` prefix that will probably never decode differently: `rep ret` is used by default by gcc when targeting "generic" CPUs (i.e. not targeting a specific CPU with `-march` or `-mtune`, and not targeting AMD K8 or K10). It will be decades before anyone could make a CPU that decodes `rep ret` as anything other than `ret`, because it's present in most binaries in most Linux distros. See What does `rep ret` mean?
Denilson Sá Maia
Updated on July 08, 2022
Comments
-
Denilson Sá Maia almost 2 years
- What does `rep; nop` mean?
- Is it the same as `pause` instruction?
- Is it the same as `rep nop` (without the semi-colon)?
- What's the difference to the simple `nop` instruction?
- Does it behave differently on AMD and Intel processors?
- (bonus) Where is the official documentation for these instructions?
Motivation for this question
After some discussion in the comments of another question, I realized that I don't know what `rep; nop;` means in x86 (or x86-64) assembly. I also couldn't find a good explanation on the web.
I know that `rep` is a prefix that means "repeat the next instruction `cx` times" (or at least it was, in old 16-bit x86 assembly). According to this summary table at Wikipedia, it seems `rep` can only be used with `movs`, `stos`, `cmps`, `lods`, `scas` (but maybe this limitation was removed on newer processors). Thus, I would think `rep nop` (without semi-colon) would repeat a `nop` operation `cx` times.
However, after further searching, I got even more confused. It seems that `rep; nop` and `pause` map to exactly the same opcode, and `pause` has slightly different behavior than just `nop`. Some old mail from 2005 said different things:
- "try not to burn too much power"
- "it is equivalent to 'nop' just with 2 byte encoding."
- "it is magic on intel. Its like 'nop but let the other HT sibling run'"
- "it is pause on intel and fast padding on Athlon"
With these different opinions, I couldn't understand the correct meaning.
It's being used in Linux kernel (on both i386 and x86_64), together with this comment:
/* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
It is also being used in BeRTOS, with the same comment.
-
Denilson Sá Maia over 12 years
Is spin-wait loop the same as busy-wait loop? Does this "improvement" only apply to hyperthreading processors? (and why?)
-
Brendan over 12 years
Yes, spin-wait loop is the same as busy-wait loop. The benefit also applies to CPUs that don't support hyper-threading. It can be thought of as limiting the number of (unnecessary) instructions in the pipeline (rather than attempting to do many iterations of the loop in parallel)
-
Prof. Falken over 12 years
@Brendan, thanks! I didn't understand at all, until you said the thing about iterations of the loop in parallel.
-
Denilson Sá Maia over 12 years
@Brendan, Oh, now I get it! These modern processors are superscalar, and thus they will attempt to run multiple instructions at the same time. If this is a busy-wait loop, running more instructions won't make it faster, as it is just waiting for another condition.
-
Paul A. Clayton over 8 years
The `rep` prefix was also used by Intel to add lock elision.
-
Peter Cordes almost 7 years
@Denilson: Yes, hyperthreading-friendliness (or just power-saving without HT) is one big benefit, but the other is avoiding a memory-ordering mis-speculation when leaving the spin-loop. Without `pause`, your spin-loop is effectively one pipeline-clear slower to notice the state-change of the memory location written by another core.
-
St.Antario over 4 years
"Prefixes that don't apply to an instruction are ignored." But Table 11-3, "Effect of Prefixes on SSE, SSE2, and SSE3 Instructions", says that the repeat prefixes (`F2H` and `F3H`) are "Reserved and may result in unpredictable behavior". So the prefix is ignored for some of the instructions, not for all. So is this feature considered undocumented?
-
Peter Cordes over 4 years
@St.Antario: They phrase it that way because future CPUs might recognize it as part of some new instruction. On all real CPUs that's been the case, and once they establish an encoding using `f3 xx` they document how it runs on older CPUs.
-
St.Antario over 4 years
"Prefixes (other than lock) that don't apply to an instruction are ignored in practice by existing CPUs." It is documented that `rep movbe` causes `#UD`, so `rep` is not always ignored, even if it does not apply to an instruction in the sense specified in the `REP/REPE/REPZ/REPNE/REPNZ` manual entry.
-
Peter Cordes over 4 years
@St.Antario: Interesting! In general though, for older instructions non-applicable prefixes are ignored. When introducing a new instruction it's possible to add stricter rules if they choose. IDK why they would choose that for this specific case.