Assembly ADC (Add with carry) to C++

Solution 1

ADC is the same as ADD, but it adds an extra 1 if the processor's carry flag is set.
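As a rough illustration of that rule, here is a minimal C++ sketch that models a 32-bit ADC by widening to 64 bits. The helper name `adc32` and its parameter layout are made up for this example; they are not part of any standard library.

```cpp
#include <cstdint>

// Hypothetical helper (not a standard function): models what a 32-bit ADC computes.
// Returns (a + b + carry_in) modulo 2^32 and writes the carry-out (0 or 1).
inline std::uint32_t adc32(std::uint32_t a, std::uint32_t b,
                           unsigned carry_in, unsigned& carry_out) {
    // Widen to 64 bits so the true 33-bit sum cannot wrap.
    std::uint64_t wide = std::uint64_t{a} + b + (carry_in & 1u);
    carry_out = static_cast<unsigned>(wide >> 32);  // bit 32 is the carry-out
    return static_cast<std::uint32_t>(wide);        // low 32 bits are the result
}
```

Calling `adc32(0xFFFFFFFFu, 0u, 1u, c)` yields 0 and sets `c` to 1, which is the case where only the carry-in causes the wraparound.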

Solution 2

From here (broken) or here

Intel processors have a dedicated instruction called adc. It behaves like the add instruction, except that it also adds the value of the carry flag. This makes it very handy for adding large integers. Suppose you'd like to add two 32-bit integers using 16-bit registers. How can we do that? Well, let's say that the first integer is held in the register pair DX:AX, and the second one in BX:CX. This is how:

add  ax, cx
adc  dx, bx

So first, the lower 16 bits are added by add ax, cx. Then the higher 16 bits are added using adc instead of add, because if the low addition overflows, the carry bit is automatically added into the higher 16 bits. So, no cumbersome checking. This method can be extended to 64 bits and beyond. Note that if the 32-bit addition as a whole also overflows out of the higher 16 bits, the result will not be correct and the carry flag is set, e.g. when adding 3 billion to 3 billion as unsigned values.
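The same two-step addition can be sketched in portable C++. This is only an illustration under assumptions: the `Pair16` struct and the function `add32_via16` are hypothetical names standing in for the register pairs, and the carry is recovered from bit 16 of a widened sum rather than from a real flag.

```cpp
#include <cstdint>

// Hypothetical struct mirroring a register pair such as DX:AX or BX:CX.
struct Pair16 {
    std::uint16_t hi;  // e.g. DX or BX
    std::uint16_t lo;  // e.g. AX or CX
};

// Adds two 32-bit values held as 16-bit halves. carry_out receives the
// final carry, i.e. what CF would hold after the adc.
inline Pair16 add32_via16(Pair16 x, Pair16 y, bool& carry_out) {
    // "add ax, cx": add the low halves; bit 16 of the widened sum is the carry.
    std::uint32_t lo = std::uint32_t{x.lo} + y.lo;
    std::uint32_t carry = lo >> 16;
    // "adc dx, bx": add the high halves plus the carry from the low halves.
    std::uint32_t hi = std::uint32_t{x.hi} + y.hi + carry;
    carry_out = (hi >> 16) != 0;
    return Pair16{static_cast<std::uint16_t>(hi), static_cast<std::uint16_t>(lo)};
}
```

For example, 0x0001FFFF + 0x00000001 gives hi = 0x0002, lo = 0x0000 with no final carry, while 3 billion + 3 billion (0xB2D05E00 twice) leaves 0x65A0BC00 and sets the final carry.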

From here on, remember that everything falls pretty much into the zone of implementation-defined behavior.

Here's a small sample that works for VS 2010 (32-bit, WinXp)

Caveat: §7.4/1: "The asm declaration is conditionally-supported; its meaning is implementation-defined. [ Note: Typically it is used to pass information through the implementation to an assembler. —end note ]"

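int main(){
   bool carry = false;
   unsigned x = 0xffffffff;
   __asm {
      mov eax, x
      add eax, x        ; do the add inside the asm block so CF is reliable
      setc carry        ; carry = CF (0 or 1)
      mov x, eax        ; store the wrapped sum back into x
   }
}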

Solution 3

The ADC behaviour can be simulated in both C and C++. The following example adds two numbers (stored as arrays of unsigned as they are too large to fit into a single unsigned).

unsigned first[10];
unsigned second[10];
unsigned result[11];

....   /* first and second get defined */

unsigned carry = 0;
for (int i = 0; i < 10; i++) {
    result[i] = first[i] + second[i] + carry;
    carry = (first[i] > result[i]);
}
result[10] = carry;

Hope this helps.

Solution 4

The C++ language doesn't have any concept of a carry flag, so making an intrinsic function wrapper around the ADC instruction is clunky. However, Intel did it anyway: unsigned char _addcarry_u32 (unsigned char c_in, unsigned a, unsigned b, unsigned * out);. Last I checked, gcc did a poor job with this (saving the carry result into an integer register, instead of leaving it in CF), but hopefully Intel's own compiler does better.
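A minimal sketch of how this intrinsic is usually chained, assuming an x86 compiler that provides `_addcarry_u32` through `<immintrin.h>`. The function name `add_limbs4` and the macro `HAVE_ADDCARRY_U32` are invented for this example, and the non-x86 branch is a plain widening fallback so the snippet still compiles elsewhere.

```cpp
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define HAVE_ADDCARRY_U32 1  // hypothetical macro name used only in this sketch
#endif

// Adds two 128-bit numbers stored as four 32-bit limbs, least significant
// limb first. Returns the final carry-out (0 or 1).
inline unsigned add_limbs4(const std::uint32_t a[4], const std::uint32_t b[4],
                           std::uint32_t out[4]) {
#ifdef HAVE_ADDCARRY_U32
    unsigned char c = 0;  // carry-in for the first limb is 0
    for (int i = 0; i < 4; ++i) {
        unsigned tmp;
        // Each call takes the previous carry-out as its carry-in.
        c = _addcarry_u32(c, a[i], b[i], &tmp);
        out[i] = tmp;
    }
    return c;
#else
    // Portable fallback: widen each limb addition to 64 bits.
    std::uint64_t c = 0;
    for (int i = 0; i < 4; ++i) {
        std::uint64_t wide = std::uint64_t{a[i]} + b[i] + c;
        out[i] = static_cast<std::uint32_t>(wide);
        c = wide >> 32;
    }
    return static_cast<unsigned>(c);
#endif
}
```

Whether the compiler keeps the carry in CF across iterations or spills it to a register depends on the compiler and version, so inspect the generated asm if that matters.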

See also the tag wiki for assembly documentation.


The compiler will use ADC for you when adding integers wider than a single register, e.g. adding int64_t in 32bit code, or __int128_t in 64bit code.

#include <stdint.h>
#ifdef __x86_64__
__int128_t add128(__int128_t a, __int128_t b) { return a+b; }
#endif
    # clang 3.8 -O3  for x86-64, SystemV ABI.
    # __int128_t args passed in 2 regs each, and returned in rdx:rax
    add     rdi, rdx
    adc     rsi, rcx
    mov     rax, rdi
    mov     rdx, rsi
    ret

asm output from the Godbolt compiler explorer. clang's -fverbose-asm isn't very verbose, but gcc 5.3 / 6.1 wastes two mov instructions so it's less readable.

You can sometimes hand-hold compilers into emitting an adc or otherwise using the carry-out of add using the idiom uint64_t sum = a+b; / carry = sum < a;. But extending this to get a carry-out from an adc instead of add is not possible with current compilers; c+d+carry_in can wrap all the way around, and compilers don't manage to optimize the multiple checks for carry out on each + in c+d+carry if you do it safely.
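To make the idiom concrete, here is a small sketch of a two-limb (128-bit) addition written with `sum < a`. The `U128` struct and `add_u128` function are hypothetical names for this example; whether a given compiler actually emits add/adc for it is not guaranteed.

```cpp
#include <cstdint>

// Hypothetical two-limb unsigned type for this sketch.
struct U128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

inline U128 add_u128(U128 a, U128 b) {
    U128 r;
    r.lo = a.lo + b.lo;
    // Unsigned addition wraps, so the sum is smaller than an operand
    // exactly when there was a carry-out.
    std::uint64_t carry = (r.lo < a.lo) ? 1u : 0u;
    // This is the carry-in step a compiler may lower to adc.
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

Note that this only detects the carry-out of the first addition. The high-limb addition has a carry-in, which is exactly the case the simple `sum < a` test cannot handle on its own for longer chains.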


Clang _ExtInt

There is one way I'm aware of to get a chain of add/adc/.../adc: Clang's new _ExtInt(width) feature that provides fixed-bit-width types of any size up to 16,777,215 bits (blog post). It was added to clang's development version on April 21, 2020, so it's not yet in any released version.

This will hopefully show up in ISO C and/or C++ at some point; the N2472 proposal is apparently "being actively considered by the ISO WG14 C Language Committee".

typedef _ExtInt(256) wide_int;

wide_int add ( wide_int a, wide_int b) {
    return a+b;
}

compiles as follows with clang trunk -O2 for x86-64 (Godbolt):

add(int _ExtInt<256>, int _ExtInt<256>):
        add     rsi, r9
        adc     rdx, qword ptr [rsp + 8]
        adc     rcx, qword ptr [rsp + 16]
        mov     rax, rdi                        # return the retval pointer
        adc     r8, qword ptr [rsp + 24]        # chain of ADD / 3x ADC!

        mov     qword ptr [rdi + 8], rdx        # store results to mem
        mov     qword ptr [rdi], rsi
        mov     qword ptr [rdi + 16], rcx
        mov     qword ptr [rdi + 24], r8
        ret

Apparently _ExtInt is passed by value in integer registers until the calling convention runs out of registers. (At least in this early version; perhaps x86-64 SysV should class it as "memory" when it's wider than 2 or maybe 3 registers, like structs larger than 16 bytes. Although more so than for structs, having it in registers is likely to be useful. Just put other args first so they're not displaced.)

The first _ExtInt arg is in R8:RCX:RDX:RSI, and the second has its low qword in R9, with the rest in memory.

A pointer to the return-value object is passed as a hidden first arg in RDI; x86-64 System V only ever returns in up to 2 integer registers (RDX:RAX) and this doesn't change that.

Solution 5

There is a bug in the loop from Solution 3. Try this input:

unsigned first[10] =  {0x00000001};
unsigned second[10] = {0xffffffff, 0xffffffff};

The result should be {0, 0, 1, ...}, but the code produces {0, 0, 0, ...} because the carry out of the second element is lost.

Changing this line:

carry = (first[i] > result[i]);

to this:

if (carry)
    carry = (first[i] >= result[i]);
else
    carry = (first[i] > result[i]);

fixes it.
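Putting the fix into the whole loop, a minimal sketch might look like the following. The function name `add_multi` and its pointer-plus-length signature are assumptions made for this example, and the comparison trick relies on unsigned wraparound.

```cpp
#include <cstddef>

// Adds two n-limb numbers (least significant limb first).
// result must have room for n + 1 limbs; the last one receives the final carry.
inline void add_multi(const unsigned* first, const unsigned* second,
                      unsigned* result, std::size_t n) {
    unsigned carry = 0;
    for (std::size_t i = 0; i < n; ++i) {
        unsigned sum = first[i] + second[i] + carry;
        // With a carry-in, sum == first[i] also means the addition wrapped
        // (second[i] was all ones), so use <= in that case.
        carry = carry ? (sum <= first[i]) : (sum < first[i]);
        result[i] = sum;
    }
    result[n] = carry;
}
```

With the input above and 32-bit unsigned, `add_multi(first, second, result, 10)` leaves result as {0, 0, 1, 0, ...}.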

Author: Martijn Courteaux


Updated on April 15, 2021

Comments

  • Martijn Courteaux
    Martijn Courteaux about 3 years

    There is an x86 assembly instruction ADC. I've found this means "Add with carry". What does this mean/do? How would one implement the behavior of this instruction in C++?

    INFO:
    Compiled on Windows. I'm using a 32-bit Windows Installation. My processor is Core 2 Duo from Intel.

  • Martijn Courteaux
    Martijn Courteaux over 13 years
    Alright! Thank you. But now, I have to know IF the flag is set. Is this possible in C++?
  • Simone
    Simone over 13 years
    Not with standard C++, you have to use an "asm" code block. I don't remember the exact syntax, but you'll lose code portability.
  • Chubsdad
    Chubsdad over 13 years
    I can't block quote the 'Caveat....' portion in my response. Sometimes this formatting just doesn't behave right.
  • stefaanv
    stefaanv over 13 years
    The idea of ADC is not to know the carry flag, but to do an ADD before ADC, so the carry will be set when the ADD overflows
  • Simone
    Simone over 13 years
    @Martijn, if you want to know the carry flag status you may do something like this: pushfd; pop eax; now carry flag is at bit 0 of eax.
  • jww
    jww almost 13 years
    0xffffffff is either -1 or UINT_MAX, which is being stored in an int. Perhaps 'x' should be an unsigned int, or the summands should be INT_MAX (0x7fffffff). If we take the summands to be the same type as the result (ie, signed integer), then OVERFLOW flag is not set - the result is -2 (0xfffffffe).
  • Hassedev
    Hassedev almost 11 years
    carry=(carry&&first[i]>=result[i])||(!carry&&first[i]>result[i]) avoids branching and does the same thing, if anyone is interested.
  • Stephane Hockenhull
    Stephane Hockenhull over 8 years
    This code will fail to set the carry if second==~0U && carry==1. e.g.: with 32bits unsigned that would be second[i]==0xFFFFFFFF && carry==1. In this case first[i] == result[i] even though an overflow (carry) has happened.
  • Stephane Hockenhull
    Stephane Hockenhull over 8 years
    Actually || and && cause branching since they only evaluate the right side as necessary. There is more branching in the one-liner than with the easy-to-read if() statement.
  • Stephane Hockenhull
    Stephane Hockenhull over 8 years
    Working code is unsigned tmp = second[i] + carry; result[i] = first[i] + tmp; carry = (first[i] > result[i]) | (second[i] > tmp);
  • Peter Cordes
    Peter Cordes almost 8 years
    That code is ridiculous; you can't depend on CF being set or not from a C statement outside the asm block. It might happen to work in debug mode, but that's not going to be useful with optimization enabled. Also, use setc carry to set carry to 0 or 1, according to CF.
  • Peter Cordes
    Peter Cordes almost 8 years
    Downvoted for not really answering the C++ aspect of the question. Also, that asm sequence kinda sucks compared to setc al (and movzx eax, al if desired). pushf is a 3-uop instruction on Intel SnB-family CPUs. The push/pop store-forwarding round-trip adds ~5 cycles of latency to the dependency chain involving CF.
  • Peter Cordes
    Peter Cordes almost 8 years
    I was going to edit this answer to expand on it and link the insn set reference manual, but it ended up being too big an edit, so I posted my change as a new answer.
  • vpalmu
    vpalmu over 7 years
    And the fact this is maddeningly slow is why it's written in assembly right now.
  • madhur4127
    madhur4127 about 4 years
    It's almost 4 years now, I still can't make the compilers (gcc, clang) work to generate add, adc instruction chain. Do you think it's possible now?
  • Peter Cordes
    Peter Cordes about 4 years
    @madhur4127: For just two instructions, yes that's possible from pure C with the sum=a+b; / carry = sum<a; trick, or __int128. But for longer chains compilers are still terrible AFAIK, even with _addcarry_u32.
  • Peter Cordes
    Peter Cordes about 4 years
    @madhur4127: update: clang _ExtInt can let you use a fixed-width integer type of any width up to 16,777,215 bits. (blog.llvm.org/2020/04/…). godbolt.org/z/bsDCvh shows that + on _ExtInt(256) compiles to add / adc / adc / adc.
  • madhur4127
    madhur4127 about 4 years
    The current version of ExtInt just unrolls everything, I see 4000+ lines of assembly with 2048*2048 multiplication without any loops. At (1<<20) bits, compiler explorer killed the process because of timeout.
  • Peter Cordes
    Peter Cordes about 4 years
    @madhur4127: lol, that's amusing, thanks for checking on that. So not currently practical for very large integers, especially for multiply.
  • Peter Cordes
    Peter Cordes about 3 years
    Building a 16-bit add-with-carry out of wider operations kind of defeats the purpose. Just right-shift result >> 16 to get the high half (carry out), or just use that wider type directly and let the compiler implement it with adc or whatever is efficient on the target ISA.
  • Peter Cordes
    Peter Cordes about 3 years
    Also, if unsigned int actually is a 16-bit type like you seem to be assuming, first + second already wraps before you assign to result. Perhaps you meant first + (unsigned long)second?
  • Peter Cordes
    Peter Cordes about 3 years
    You can edit your answer to replace bad code with good code. But note that the hard part of implementing ADC in pure C is handling carry-in. Your result < second is sufficient to detect carry-out, but you shouldn't add it to this element, you should return it separately. That's like add eax, ecx / adc eax, 0 which is almost never what you want.
  • Peter Cordes
    Peter Cordes about 3 years
    I think this new version fails for cases like 0xffffffff + 0 + carry=1. The while loop does 0xffffffff + 1 = 0, producing a carry-out, which makes the while loop run again producing res=1 and carry=0. Then it tailcalls itself to do 1 + 0, returning 1 and leaving carry=0, when the correct result is 0 with carry=1. Carry-out in either + operation in first + second + *carry needs to send a carry-out to the final *carry output, not add back into the return value. Remember, carry-out is bit 33 of a 32+32 => 33-bit addition, but carry-in has a place value of just 1.