What does ordered / unordered comparison mean?

assembly x86 floating-point sse


Solution 1

An ordered comparison checks if neither operand is NaN. Conversely, an unordered comparison checks if either operand is a NaN.

This page gives some more information on this:

http://csapp.cs.cmu.edu/public/waside/waside-sse.pdf (section 5)

The idea here is that comparisons with NaN are indeterminate: the result can't be decided. So an ordered/unordered comparison checks whether this is (or isn't) the case.

#include <emmintrin.h>  //  SSE2 intrinsics
#include <iostream>
using namespace std;

int main() {
    double a = 0.;
    double b = 0.;

    __m128d x = _mm_set1_pd(a / b);     //  NaN
    __m128d y = _mm_set1_pd(1.0);       //  1.0
    __m128d z = _mm_set1_pd(1.0);       //  1.0

    __m128d c0 = _mm_cmpord_pd(x,y);    //  NaN vs. 1.0
    __m128d c1 = _mm_cmpunord_pd(x,y);  //  NaN vs. 1.0
    __m128d c2 = _mm_cmpord_pd(y,z);    //  1.0 vs. 1.0
    __m128d c3 = _mm_cmpunord_pd(y,z);  //  1.0 vs. 1.0
    __m128d c4 = _mm_cmpord_pd(x,x);    //  NaN vs. NaN
    __m128d c5 = _mm_cmpunord_pd(x,x);  //  NaN vs. NaN

    //  m128i_i64 is an MSVC-specific member of __m128i.
    //  An all-ones mask prints as -1, an all-zeros mask as 0.
    cout << _mm_castpd_si128(c0).m128i_i64[0] << endl;
    cout << _mm_castpd_si128(c1).m128i_i64[0] << endl;
    cout << _mm_castpd_si128(c2).m128i_i64[0] << endl;
    cout << _mm_castpd_si128(c3).m128i_i64[0] << endl;
    cout << _mm_castpd_si128(c4).m128i_i64[0] << endl;
    cout << _mm_castpd_si128(c5).m128i_i64[0] << endl;
}

Result:

0
-1
-1
0
0
-1

An ordered comparison returns true if the operands are comparable (neither number is NaN):

Ordered comparison of 1.0 and 1.0 gives true.
Ordered comparison of NaN and 1.0 gives false.
Ordered comparison of NaN and NaN gives false.

Unordered comparison is the exact opposite:

Unordered comparison of 1.0 and 1.0 gives false.
Unordered comparison of NaN and 1.0 gives true.
Unordered comparison of NaN and NaN gives true.

Solution 2

This Intel guide: http://intel80386.com/simd/mmx2-doc.html contains examples of the two which are fairly straight-forward:

CMPORDPS Compare Ordered Parallel Scalars

Opcode        Cycles   Instruction
0F C2 .. 07   2 (3)    CMPORDPS xmm reg,xmm reg/mem128

CMPORDPS op1, op2

op1 contains 4 single precision 32-bit floating point values
op2 contains 4 single precision 32-bit floating point values
op1[0] = (op1[0] != NaN) && (op2[0] != NaN)
op1[1] = (op1[1] != NaN) && (op2[1] != NaN)
op1[2] = (op1[2] != NaN) && (op2[2] != NaN)
op1[3] = (op1[3] != NaN) && (op2[3] != NaN)

TRUE  = 0xFFFFFFFF
FALSE = 0x00000000

CMPUNORDPS Compare Unordered Parallel Scalars

Opcode        Cycles   Instruction
0F C2 .. 03   2 (3)    CMPUNORDPS xmm reg,xmm reg/mem128

CMPUNORDPS op1, op2

op1 contains 4 single precision 32-bit floating point values
op2 contains 4 single precision 32-bit floating point values
op1[0] = (op1[0] == NaN) || (op2[0] == NaN)
op1[1] = (op1[1] == NaN) || (op2[1] == NaN)
op1[2] = (op1[2] == NaN) || (op2[2] == NaN)
op1[3] = (op1[3] == NaN) || (op2[3] == NaN)

TRUE  = 0xFFFFFFFF
FALSE = 0x00000000

The difference is AND (ordered) vs OR (unordered).
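Note that `!= NaN` and `== NaN` in the listing above have to be read as "is not a NaN" and "is a NaN": a literal equality test against a NaN value never reports a match. A minimal C++ sketch of the per-element logic, assuming `std::isnan` as the NaN test; the helper names `lane_ord` and `lane_unord` are hypothetical and not part of any Intel API:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helpers modelling one 32-bit lane of CMPORDPS / CMPUNORDPS.
// TRUE is all-ones and FALSE is all-zeros, as in the listing above.
std::uint32_t lane_ord(float a, float b) {
    return (!std::isnan(a) && !std::isnan(b)) ? 0xFFFFFFFFu : 0x00000000u;
}

std::uint32_t lane_unord(float a, float b) {
    return (std::isnan(a) || std::isnan(b)) ? 0xFFFFFFFFu : 0x00000000u;
}

// Checks the 1.0 vs. 1.0, NaN vs. 1.0 and NaN vs. NaN cases.
void check_lane_model() {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    const float other_nan = nan;
    assert(lane_ord(1.0f, 1.0f) == 0xFFFFFFFFu);
    assert(lane_ord(nan, 1.0f) == 0x00000000u);
    assert(lane_unord(nan, nan) == 0xFFFFFFFFu);
    // A literal equality test cannot detect NaN: NaN == NaN is false.
    assert(!(nan == other_nan));
}
```

Calling `check_lane_model()` from `main` should pass silently. This only models a single element; the real instructions apply the same predicate to all four lanes at once.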

Solution 3

Short version: Unordered is a relation two FP values can have. Scalar compares set FLAGS so you can check any condition you want afterwards (e.g. ucomisd xmm0, xmm1 / jp unordered), but SIMD compares need the condition (predicate) encoded into the instruction, so it can be checked in parallel to produce a vector whose elements are all-zeros or all-ones (0 / 0xFF...). There is nowhere to put a separate FLAGS result for each element.

The "Unordered" in FUCOM means it doesn't raise an FP "invalid" exception when the comparison result is unordered, while FCOM does. This is the same as the distinction between OQ and OS cmpps predicates, not the "unordered" predicate. See the "Signals #IA on QNAN" column in the cmppd docs in Intel's asm manuals; cmppd is alphabetically first and has more complete docs than cmpps / cmpss / cmpsd.

(FP exceptions are masked by default so they don't cause the CPU to trap to a hardware exception handler, just set sticky flags in MXCSR, or the legacy x87 status word for x87 instructions.)

ORD and UNORD are two choices of predicate for the cmppd / cmpps / cmpss / cmpsd insns (full tables in the cmppd entry which is alphabetically first). That html extract has readable table formatting, but Intel's official PDF original is somewhat better. (See the x86 tag wiki for links).

Two floating point operands are ordered with respect to each other if neither is NaN. They're unordered if either is NaN, i.e. ordered = (x>y) | (x==y) | (x<y). That's right, with floating point it's possible for none of those things to be true. For more Floating Point madness, see Bruce Dawson's excellent series of articles.
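The "none of those things is true" case can be checked directly in plain C++. This is only a sketch; `is_ordered` is a hypothetical helper name, and it assumes the compiler is not run with options such as -ffast-math that let it ignore NaN:

```cpp
#include <cassert>
#include <limits>

// Ordered means that at least one of >, ==, < holds between x and y.
bool is_ordered(double x, double y) {
    return (x > y) || (x == y) || (x < y);
}

void check_trichotomy() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    assert(is_ordered(1.0, 2.0));    // less than
    assert(is_ordered(2.0, 2.0));    // equal
    assert(is_ordered(3.0, 2.0));    // greater than
    assert(!is_ordered(nan, 2.0));   // none of the three relations holds
    assert(!is_ordered(nan, nan));   // NaN is not even equal to itself
}
```

Calling `check_trichotomy()` from `main` should pass silently.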

cmpps takes a predicate and produces a vector of results, instead of doing a comparison between two scalars and setting flags so you can check any predicate you want after the fact. So it needs specific predicates for everything you can check.

The scalar equivalent is comiss / ucomiss to set ZF/PF/CF from the FP comparison result (which works like the x87 compare instructions (see the last section of this answer), but on the low element of XMM regs).

To check for unordered, look at PF. If the comparison is ordered, you can look at the other flags to see whether the operands were greater, equal, or less (using the same conditions as for unsigned integers, like jae for Above or Equal).
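As a rough illustration of the flag logic, here is a C++ model of the ZF/PF/CF settings that Intel documents for comiss / ucomiss (and the double-precision comisd / ucomisd). The `Flags` struct and the function names are hypothetical; this simulates the flag results and does not execute the real instructions:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical model of the three flags written by ucomisd / comisd.
struct Flags {
    bool zf, pf, cf;
};

Flags model_ucomisd(double a, double b) {
    if (std::isnan(a) || std::isnan(b)) return {true, true, true};  // unordered
    if (a > b) return {false, false, false};                        // greater
    if (a < b) return {false, false, true};                         // less
    return {true, false, false};                                    // equal
}

// Conditions as the corresponding jcc instructions would evaluate them.
bool jp(Flags f)  { return f.pf; }            // unordered
bool ja(Flags f)  { return !f.cf && !f.zf; }  // ordered and greater
bool jae(Flags f) { return !f.cf; }           // ordered and greater-or-equal
bool jb(Flags f)  { return f.cf; }            // less, or unordered

void check_flag_model() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    assert(jp(model_ucomisd(nan, 1.0)));
    assert(!jp(model_ucomisd(1.0, 2.0)) && jb(model_ucomisd(1.0, 2.0)));
    assert(ja(model_ucomisd(2.0, 1.0)));
    // CF alone cannot tell "less" from "unordered", so test PF first.
    assert(jb(model_ucomisd(nan, 1.0)));
}
```

The last assertion shows why PF has to be checked before the below/equal style conditions: the unordered case sets CF and ZF as well.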

The COMISS instruction differs from the UCOMISS instruction in that it signals a SIMD floating-point invalid operation exception (#I) when a source operand is either a QNaN or SNaN. The UCOMISS instruction signals an invalid numeric exception only if a source operand is an SNaN.

(SNaN is not naturally occurring; operations like sqrt(-1) or inf - inf will produce QNaN if exceptions are masked, else trap and not produce a result.)

Normally FP exceptions are masked, so this doesn't actually interrupt your program; it just sets the bit in the MXCSR which you can check later.

This is the same as the O/UQ vs. O/US flavours of predicate for cmpps / vcmpps. The AVX versions of the cmp[ps][sd] instructions have an expanded choice of predicates, so they needed a naming convention to keep track of them.

The O vs. U tells you whether the predicate is true when the operands are unordered.

The Q vs. S tells you whether #I will be raised if either operand is a Quiet NaN. #I will always be raised if either operand is a Signalling NaN, but those are not "naturally occurring". You don't get them as outputs from other operations, only by creating the bit pattern yourself (e.g. as an error-return value from a function, to ensure detection of problems later).

The x87 equivalent is using fcom or fucom to set the FPU status word -> fstsw ax -> sahf, or preferably fucomi to set EFLAGS directly like ucomiss.

The U / non-U distinction is the same with x87 instructions as for comiss / ucomiss.

Solution 4

You can understand the meaning of 'ordered CC' and 'unordered CC' through the LLVM CC definition, where 'CC' means CondCode. In 'llvm/include/llvm/CodeGen/ISDOpcodes.h' (my source code version is llvm-10.0.1), you can see the CondCode enum as below:

enum CondCode {
// Opcode          N U L G E       Intuitive operation
SETFALSE,      //    0 0 0 0       Always false (always folded)
SETOEQ,        //    0 0 0 1       True if ordered and equal
SETOGT,        //    0 0 1 0       True if ordered and greater than
SETOGE,        //    0 0 1 1       True if ordered and greater than or equal
SETOLT,        //    0 1 0 0       True if ordered and less than
SETOLE,        //    0 1 0 1       True if ordered and less than or equal
SETONE,        //    0 1 1 0       True if ordered and operands are unequal
SETO,          //    0 1 1 1       True if ordered (no nans)
SETUO,         //    1 0 0 0       True if unordered: isnan(X) | isnan(Y)
SETUEQ,        //    1 0 0 1       True if unordered or equal
SETUGT,        //    1 0 1 0       True if unordered or greater than
SETUGE,        //    1 0 1 1       True if unordered, greater than, or equal
SETULT,        //    1 1 0 0       True if unordered or less than
SETULE,        //    1 1 0 1       True if unordered, less than, or equal
SETUNE,        //    1 1 1 0       True if unordered or not equal
SETTRUE,       //    1 1 1 1       Always true (always folded)
// Don't care operations: undefined if the input is a nan.
SETFALSE2,     //  1 X 0 0 0       Always false (always folded)
SETEQ,         //  1 X 0 0 1       True if equal
SETGT,         //  1 X 0 1 0       True if greater than
SETGE,         //  1 X 0 1 1       True if greater than or equal
SETLT,         //  1 X 1 0 0       True if less than
SETLE,         //  1 X 1 0 1       True if less than or equal
SETNE,         //  1 X 1 1 0       True if not equal
SETTRUE2,      //  1 X 1 1 1       Always true (always folded)
SETCC_INVALID       // Marker value.
};

That means: for floating-point condition comparison, 'ordered CC' means 'ordered & CC', while 'unordered CC' means 'unordered | CC'.

In other words, in floating-point comparison, where NaN is 'Not A Number',

'ordered CC' returns true if: 'both operands are not NaN' AND 'CC is true'
'unordered CC' returns true if: 'one or more operands are NaN' OR 'CC is true'

You can also see that 'ordered CC' is exactly the logical negation of 'unordered !CC'.
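That relationship can be checked for one pair of conditions with a small C++ sketch. The functions `olt` and `uge` are hypothetical stand-ins written to mirror the comments on SETOLT and SETUGE; they are not LLVM APIs:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

bool unordered(double x, double y) { return std::isnan(x) || std::isnan(y); }

// "Ordered less-than" (like SETOLT): ordered AND x < y.
bool olt(double x, double y) { return !unordered(x, y) && x < y; }

// "Unordered greater-or-equal" (like SETUGE): unordered OR x >= y.
bool uge(double x, double y) { return unordered(x, y) || x >= y; }

// For every pair, including those with NaN, olt is the negation of uge.
void check_negation() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    const double values[] = {-1.0, 0.0, 1.0, nan};
    for (double x : values)
        for (double y : values)
            assert(olt(x, y) == !uge(x, y));
}
```

Here CC is "less than" and !CC is "greater than or equal"; the same pattern holds for the other pairs in the table (for example SETOEQ and SETUNE).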


Author by

Dan


Updated on June 10, 2022

Comments

Dan almost 2 years
Looking at the SSE operators
```
CMPORDPS - ordered compare packed singles
CMPUNORDPS - unordered compare packed singles
```
What do ordered and unordered mean? I looked for equivalent instructions in the x86 instruction set, and it only seems to have unordered (FUCOM).
Bram almost 11 years

Thanks. What about signalling versus non-signalling compares? e.g. _CMP_LE_OS versus _CMP_LE_OQ from avxintrin.h
Mysticial almost 11 years

@Bram Bleh... I've honestly never even heard of those. So I wouldn't know. Might be better to ask that as a separate question so someone else could answer.
Mark Lakata almost 8 years

I added NaN vs NaN to your Answer to make it complete.
Mysticial almost 8 years

@MarkLakata Thanks!
Peter Cordes almost 8 years

@Mysticial: OQ vs. OS controls whether it will raise #I (FP Invalid) when there are QNaNs, or only if there are SNaNs (which AFAIK are not naturally occurring; you don't get SNaN from any kind of divide by zero or inf-inf or anything.)
Peter Cordes almost 8 years

that just shows how the predicate is applied to each element of the vector operands. It doesn't say anything about what the predicate condition IS.
Peter Cordes about 3 years

Note that those flag bit columns do not reflect the bit-patterns you can use as an immediate in asm / machine code. (Or the immintrin.h values of constants like _CMP_EQ_UQ, which is 0). But yes, useful table, although I think the one in Intel's asm manual already covers the 4 cases of possible relations for every predicate: felixcloutier.com/x86/cmppd
Jason Yang about 3 years

Thank you for your point. Indeed the LLVM CondCode enum is not actual encoding or immediate in asm programs. It's just a representation of the condition code of llvm SelectionDAG used in instruction selection and lowering. The Intel's asm manual lists the encoding of comparison predicate operands and their corresponding comparison results. It's also a good source to understand ordered / unordered comparison.