Setting and clearing the zero flag in x86

11,398

Solution 1

ZF=0

This is harder. cmp between any two regs known to be not equal. Or cmp reg,imm with any value some reg couldn't possibly have. e.g. cmp reg,1 with any known-zero register.

In general test reg,reg is good with any known-non-0 register value, e.g. a pointer.
test rsp, rsp is probably a good choice, or even test esp, esp to save a byte will work except if your stack is in the unusual location of spanning the 4G boundary.

I don't see a way to create ZF=0 in one instruction without a false dependency on some input reg. xor eax,eax / inc eax or dec will do the trick in 2 uops if you don't mind destroying a register, breaking false dependencies. (not doesn't set FLAGS, and neg will just do 0-0 = 0.)

or eax, -1 doesn't need any pre-condition for the register value. (False dependency, but not a true dependency so you can pick any register even if it might be zero.) It doesn't have to be -1, it's not gaining you anything so if you can make it something useful so much the better.

or eax,-1 FLAG results: ZF=0 PF=1 SF=1 CF=0 OF=0 (AF=undefined).

If you need to do this in a loop, you can obviously set up for it outside the loop, if you can dedicate a register to being non-zero for use with test.


ZF=1

Least destructive: cmp eax,eax - but has a false dependency (I assume) and needs a back-end uop: not a zeroing idiom. RSP doesn't usually change much so cmp esp, esp could be a good choice. (Unless that forces a stack-sync uop).

Most efficient: xor-zeroing (like xor eax,eax using any free register) is definitely the most efficient way on SnB-family (same cost as a 2-byte nop, or 3-byte if it needs a REX because you want to zero one of r8d..r15d): 1 front-end uop, zero back-end uops on SnB-family, and the FLAGS result is ready in the same cycle it issues. (Relevant only in case the front-end was stalled, or some other case where a uop depending on it issues in the same cycle and there aren't any older uops in the RS with ready inputs, otherwise such uops would have priority for whichever execution port.)

Flag results: ZF=1 PF=1 SF=0 CF=0 OF=0 (AF=undefined). (Or use sub eax,eax to get well-defined AF=0. In practice modern CPUs pick AF=0 for xor-zeroing, too, so they can decode both zeroing idioms the same way. Silvermont only recognizes 32-bit operand-size xor as a zeroing idiom, not sub.)

xor-zero is very cheap on all other uarches as well, of course: no input dependencies, and doesn't need any pre-existing register value. (And thus doesn't contribute to P6-family register-read stalls). So it will be at worst tied with anything else you could do on any other uarch (where it does require an execution unit.)

(On early P6-family, before Pentium M, xor-zeroing does not break dependencies; it only triggers the special al=eax state that avoids partial-register stuff. But none of those CPUs are x86-64, all 32-bit only.)

It's pretty common to want a zeroed register for something anyway, e.g. as a sub destination for 0 - x to copy-and-negate, so take advantage of it by putting the xor-zeroing where you need it to also create a useful FLAG condition.

Interesting but probably not useful: test al, 0 is 2 bytes long. But so is cmp esp,esp.


As @prl suggested, cmp same,same with any register will work without disturbing a value. I suspect this is not special-cased as dependency breaking the way sub same,same is on some CPUs, so pick a "cold" register. Again 2 or 3 bytes, 1 uop. It can micro-fuse with a JCC, but that would be dumb (unless the JCC is also a branch target from some other condition?)

Flag results: same as xor-zeroing.

Downsides:

  • (probably) false dependency
  • on P6-family can contribute to a register-read stall, so pick a cold register you're already reading in nearby instructions.
  • needs a back-end execution unit on SnB-family

Just for fun, other as-cheap alternatives include test al, 0. 2 bytes for AL, 3 or 4 bytes for any other 8-bit register. (REX) + opcode + modrm + imm8. The original register value doesn't matter because an imm8 of zero guarantees that reg & 0 = 0.


If you happen to have a 1 or -1 in a register you can destroy, 32-bit mode inc or dec would set ZF in only 1 byte. But in x86-64 that's at least 2 bytes. Nothing comes to mind for a 1-byte instruction in 64-bit mode that's actually efficient and sets FLAGS.


ZF=!CF

sbb same,same can set ZF=!CF (leaving CF unmodified), and setting the reg to 0 (CF=0) or -1 (CF=1). On AMD since Bulldozer (BD-family and Zen-family), this has no dependency on the GP register, only CF. But on other uarches it's not special cased and there is a false dep on the reg. And it's 2 uops on Intel before Broadwell.


ZF=bool(integer register)

To set ZF=integer_reg, obviously the normal test reg,reg is your best bet. (Better than and reg,reg or or reg,reg, unless you're intentionally rewriting the register to avoid P6 register-read stalls.)


Other FLAGS conditions:

CF has clc/stc/cmc instructions. (clc is as efficient as xor-zeroing on SnB-family.)

Solution 2

Assuming you don’t need to preserve the values of the other flags,

cmp eax, eax
Share:
11,398
BeeOnRope
Author by

BeeOnRope

Performance Matters

Updated on June 04, 2022

Comments

  • BeeOnRope
    BeeOnRope almost 2 years

    What's the most efficient way to set and also to clear the zero flag (ZF) in x86-64?

    Methods that work without the need for a register with a known value, or without any free registers at all are preferred, but if a better method is available when those or other assumptions are true it is also worth mentioning.

  • ecm
    ecm over 2 years
    I believe sbb reg, reg sets ZF = ! CF. As CF = 1 leads to FFFFh, CY, NZ and CF = 0 leads to 0000h, NC, ZR.
  • Peter Cordes
    Peter Cordes over 2 years
    @ecm: Right, thanks, fixed. I probably got mixed up between register = 0 but ZF=1 when thinking that through originally.
  • 640KB
    640KB almost 2 years
    What about complimenting/inverting ZF = !ZF (similar to CMC)? Is there a "reasonable" way to do that?
  • Peter Cordes
    Peter Cordes almost 2 years
    @640KB: Not with a single instruction, but perhaps setz al / test al, al