How could I implement logical implication with bitwise or other efficient code in C?

11,091

Solution 1

FYI, with gcc-4.3.3:

int foo(int a, int b) { return !a || b; }
int bar(int a, int b) { return ~a | b; }

Gives (from objdump -d):

0000000000000000 <foo>:
   0:   85 ff                   test   %edi,%edi
   2:   0f 94 c2                sete   %dl
   5:   85 f6                   test   %esi,%esi
   7:   0f 95 c0                setne  %al
   a:   09 d0                   or     %edx,%eax
   c:   83 e0 01                and    $0x1,%eax
   f:   c3                      retq   

0000000000000010 <bar>:
  10:   f7 d7                   not    %edi
  12:   09 fe                   or     %edi,%esi
  14:   89 f0                   mov    %esi,%eax
  16:   c3                      retq   

So, no branches, but twice as many instructions.

And even better, with _Bool (thanks @litb):

_Bool baz(_Bool a, _Bool b) { return !a || b; }
0000000000000020 <baz>:
  20:   40 84 ff                test   %dil,%dil
  23:   b8 01 00 00 00          mov    $0x1,%eax
  28:   0f 45 c6                cmovne %esi,%eax
  2b:   c3                      retq   

So, using _Bool instead of int is a good idea.

Since I'm updating today, I've confirmed gcc 8.2.0 produces similar, though not identical, results for _Bool:

0000000000000020 <baz>:
  20:   83 f7 01                xor    $0x1,%edi
  23:   89 f8                   mov    %edi,%eax
  25:   09 f0                   or     %esi,%eax
  27:   c3                      retq   

Solution 2

!p || q

is plenty fast. seriously, don't worry about it.

Solution 3

~p | q

For visualization:

perl -e'printf "%x\n", (~0x1100 | 0x1010) & 0x1111'
1011

In tight code, this should be faster than "!p || q" because the latter has a branch, which might cause a stall in the CPU due to a branch prediction error. The bitwise version is deterministic and, as a bonus, can do 32 times as much work in a 32-bit integer than the boolean version!

Solution 4

You can read up on deriving boolean expressions from truth Tables (also see canonical form), on how you can express any truth table as a combination of boolean primitives or functions.

Solution 5

Another solution for C booleans (a bit dirty, but works):

((unsigned int)(p) <= (unsigned int)(q))

It works since by the C standard, 0 represents false, and any other value true (1 is returned for true by boolean operators, int type).

The "dirtiness" is that I use booleans (p and q) as integers, which contradicts some strong typing policies (such as MISRA), well, this is an optimization question. You may always #define it as a macro to hide the dirty stuff.

For proper boolean p and q (having either 0 or 1 binary representations) it works. Otherwise T->T might fail to produce T if p and q have arbitrary nonzero values for representing true.

If you need to store the result only, since the Pentium II, there is the cmovcc (Conditional Move) instruction (as shown in Derobert's answer). For booleans, however even the 386 had a branchless option, the setcc instruction, which produces 0 or 1 in a result byte location (byte register or memory). You can also see that in Derobert's answer, and this solution also compiles to a result involving a setcc (setbe: Set if below or equal).

Derobert and Chris Dolan's ~p | q variant should be the fastest for processing large quantities of data since it can process the implication on all bits of p and q individually.

Notice that not even the !p || q solution compiles to branching code on the x86: it uses setcc instructions. That's the best solution if p or q may contain arbitrary nonzero values representing true. If you use the _Bool type, it will generate very few instructions.

I got the following figures when compiling for the x86:

__attribute__((fastcall)) int imp1(int a, int b)
{
 return ((unsigned int)(a) <= (unsigned int)(b));
}

__attribute__((fastcall)) int imp2(int a, int b)
{
 return (!a || b);
}

__attribute__((fastcall)) _Bool imp3(_Bool a, _Bool b)
{
 return (!a || b);
}

__attribute__((fastcall)) int imp4(int a, int b)
{
 return (~a | b);
}

Assembly result:

00000000 <imp1>:
   0:   31 c0                   xor    %eax,%eax
   2:   39 d1                   cmp    %edx,%ecx
   4:   0f 96 c0                setbe  %al
   7:   c3                      ret    

00000010 <imp2>:
  10:   85 d2                   test   %edx,%edx
  12:   0f 95 c0                setne  %al
  15:   85 c9                   test   %ecx,%ecx
  17:   0f 94 c2                sete   %dl
  1a:   09 d0                   or     %edx,%eax
  1c:   0f b6 c0                movzbl %al,%eax
  1f:   c3                      ret    

00000020 <imp3>:
  20:   89 c8                   mov    %ecx,%eax
  22:   83 f0 01                xor    $0x1,%eax
  25:   09 d0                   or     %edx,%eax
  27:   c3                      ret    

00000030 <imp4>:
  30:   89 d0                   mov    %edx,%eax
  32:   f7 d1                   not    %ecx
  34:   09 c8                   or     %ecx,%eax
  36:   c3                      ret    

When using the _Bool type, the compiler clearly exploits that it only has two possible values (0 for false and 1 for true), producing a very similar result to the ~a | b solution (the only difference being that the latter performs a complement on all bits instead of just the lowest bit).

Compiling for 64 bits gives just about the same results.

Anyway, it is clear, the method doesn't really matter from the point of avoiding producing conditionals.

Share:
11,091
alvatar
Author by

alvatar

Hi there.

Updated on June 05, 2022

Comments

  • alvatar
    alvatar almost 2 years

    I want to implement a logical operation that works as efficient as possible. I need this truth table:

    p    q    p → q
    T    T      T
    T    F      F
    F    T      T
    F    F      T
    

    This, according to wikipedia is called "logical implication"

    I've been long trying to figure out how to make this with bitwise operations in C without using conditionals. Maybe someone has got some thoughts about it.

    Thanks

  • Ehab Developer
    Ehab Developer about 15 years
    who cares.. a bitwise operation will not be any faster than that.
  • alvatar
    alvatar about 15 years
    Yes, actually I wanted to know also if this would be as fast as bitwise. Thanks for claryfing me that.
  • Chris Dolan
    Chris Dolan about 15 years
    For a 32-bit int, "~p | q" does 32 times as much work as "!p || q" and doesn't require a jump.
  • Chris Dolan
    Chris Dolan about 15 years
    litb: jumps are slower than arithmetic in just about every CPU due to branch prediction misses.
  • Dinah
    Dinah about 15 years
    Agreed. I know I was unaware of it. Also, what do you mean by "jump"?
  • Johannes Schaub - litb
    Johannes Schaub - litb about 15 years
    Chris, yeah that's what i was on. || sequences so that q is evaluated after !p in any case. tho if p and q are normal bools, chances are good that it compiles down to code as fast as ~p | q i think.
  • Chris Dolan
    Chris Dolan about 15 years
    Dinah: By jump, I mean that the CPU has to decide whether to execute the next opcode or branch to another point in the program. See en.wikipedia.org/wiki/Branch_prediction
  • alvatar
    alvatar about 15 years
    Thanks! I would like to ask where could I get more information about these? I mean about the C implementation of these operations, so I can get to know the details of || vs | and so on...
  • Ehab Developer
    Ehab Developer about 15 years
    The bitwise operator is not 32 times as fast. It's one clock cycle per calculation either way. No difference in performance.
  • Johannes Schaub - litb
    Johannes Schaub - litb about 15 years
    unless q is volatile, there is no need for a branch if both variables are simple integers.
  • Johannes Schaub - litb
    Johannes Schaub - litb about 15 years
    i've tested both versions. GCC at least on x86 insists on using a branch returning 0/1 for the bool version in every scenario i could imagine. bitwise ops did not.
  • ephemient
    ephemient about 15 years
    Or you could just use gcc -S *.c; cat *.s and skip the objdump -d *.o step? ;-)
  • derobert
    derobert about 15 years
    Yeah, but I remembered the objdump option but not the gcc one :-p
  • derobert
    derobert about 15 years
    Actually, testing gcc -S it gives /much/ more output, all of the extra stuff irrelevant.
  • Johannes Schaub - litb
    Johannes Schaub - litb about 15 years
    I get much shorter code for the || version with _Bool (c99) or bool (c++)
  • Johannes Schaub - litb
    Johannes Schaub - litb about 15 years
    this is insane. with optimizations off, the bitwise way stays that short, while the || way bloats all the way up :p
  • Chris Dolan
    Chris Dolan about 15 years
    eduffy: by 32 times as fast, I meant that you get 32 boolean results in parallel instead of one boolean result.
  • derobert
    derobert about 15 years
    well, if you want fast code, it'd be pretty silly to compile without optimizations. I used -O3
  • alvatar
    alvatar about 15 years
    Very interesting link. Thanks!
  • Lundin
    Lundin over 7 years
    In addition to the potential branch caused by ||, it also mandates a sequence point between the 1st and 2nd operand. This might in some cases work as a memory barrier, meaning that the compiler will not be able to re-order the instructions as it pleases. So it might very well be far less effective than bitwise | for that reason too. But all such micro-optimizations is an advanced topic and indeed nothing the average programmer should even begin to consider.
  • derobert
    derobert over 5 years
    @vasili111 Added. It was just a trivial type change of the first one.
  • étale-cohomology
    étale-cohomology over 2 years
    This is actually quite beautiful, and it reflects the fact that logical implication encodes the SUBSET OF relation on the powerset, and, (apparently) also the SMALLER THAN OR EQUAL relation on a poset. Good job.