Assembly ADC (Add with carry) to C++

c++ assembly x86 bigint carryflag

46,412

Solution 1

ADC is the same as ADD but adds an extra 1 if the processor's carry flag is set.

Solution 2

From here (broken) or here

However, Intel processors have a special instruction called adc. It behaves like the add instruction, except that it also adds the value of the carry flag. This makes it very handy for adding large integers. Suppose you'd like to add two 32-bit integers using 16-bit registers. How can we do that? Well, let's say that the first integer is held in the register pair DX:AX, and the second one in BX:CX. This is how:
add  ax, cx
adc  dx, bx
So first, the lower 16 bits are added by add ax, cx. Then the upper 16 bits are added using adc instead of add, because if the low-half addition overflows, the carry bit is automatically added into the upper 16 bits. So, no cumbersome checking. This method can be extended to 64 bits and so on. Note that if the 32-bit addition also overflows out of the upper 16 bits, the result will not be correct and the carry flag will be set, e.g. when adding 3 billion to 3 billion (5 billion would not even fit in 32 bits).
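To make the add/adc pairing above concrete, here is a minimal C++ sketch that models it with 16-bit halves. The names Sum16, adc16 and add32_via_halves are made up for this illustration, and the carry is returned as an ordinary value because C++ has no carry flag; treat it as a model of the instruction pair, not as what a compiler would emit.

```cpp
#include <cassert>
#include <cstdint>

// Result of a 16-bit add: the wrapped sum plus the carry-out (a stand-in for CF).
struct Sum16 {
    std::uint16_t value;
    unsigned carry;  // 0 or 1
};

// Models `adc`: a + b + carry_in, truncated to 16 bits.
// With carry_in == 0 it models a plain `add`.
inline Sum16 adc16(std::uint16_t a, std::uint16_t b, unsigned carry_in) {
    assert(carry_in <= 1);
    // Compute in 32 bits so that bit 16 holds the carry-out.
    std::uint32_t wide = std::uint32_t{a} + std::uint32_t{b} + carry_in;
    return { static_cast<std::uint16_t>(wide & 0xFFFFu),
             static_cast<unsigned>(wide >> 16) };
}

// Adds two 32-bit values as 16-bit halves, mirroring
//     add ax, cx
//     adc dx, bx
// carry_out receives what CF would hold after the adc.
inline std::uint32_t add32_via_halves(std::uint32_t x, std::uint32_t y,
                                      unsigned& carry_out) {
    Sum16 lo = adc16(static_cast<std::uint16_t>(x & 0xFFFFu),
                     static_cast<std::uint16_t>(y & 0xFFFFu), 0);
    Sum16 hi = adc16(static_cast<std::uint16_t>(x >> 16),
                     static_cast<std::uint16_t>(y >> 16), lo.carry);
    carry_out = hi.carry;
    return (std::uint32_t{hi.value} << 16) | lo.value;
}
```

Calling add32_via_halves(0x0001FFFF, 1, c) carries out of the low half into the high half and leaves c at 0; a sum that does not fit in 32 bits leaves c at 1.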

From here on, remember that everything falls pretty much into the zone of implementation-defined behavior.

Here's a small sample that works for VS 2010 (32-bit, WinXp)

Caveat: §7.4/1: "The asm declaration is conditionally-supported; its meaning is implementation-defined. [ Note: Typically it is used to pass information through the implementation to an assembler. —end note ]"

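int main(){
   bool carry = false;
   unsigned x = 0xffffffff;
   __asm {
      mov eax, x
      add eax, 0xffffffff   ; do the add inside the asm block so CF is well defined
      mov x, eax            ; mov leaves the flags untouched
      setc carry            ; carry = CF (0 or 1)
   }
}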

Solution 3

The ADC behaviour can be simulated in both C and C++. The following example adds two numbers (stored as arrays of unsigned as they are too large to fit into a single unsigned).

unsigned first[10];
unsigned second[10];
unsigned result[11];

....   /* first and second get defined */

unsigned carry = 0;
for (int i = 0; i < 10; i++) {
    result[i] = first[i] + second[i] + carry;
    /* Note: this check misses the carry-out when second[i] + carry wraps to 0 */
    carry = (first[i] > result[i]);
}
result[10] = carry;

Hope this helps.

Solution 4

The C++ language doesn't have any concept of a carry flag, so making an intrinsic function wrapper around the ADC instruction is clunky. However, Intel did it anyway: unsigned char _addcarry_u32 (unsigned char c_in, unsigned a, unsigned b, unsigned * out);. Last I checked, gcc did a poor job with this (saving the carry result into an integer register, instead of leaving it in CF), but hopefully Intel's own compiler does better.
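As a rough illustration of how that intrinsic is typically chained, here is a minimal sketch. The function name add_limbs and the macro HAVE_ADDCARRY_U32 are hypothetical, limbs are assumed to be stored least significant first, and a portable 64-bit fallback is included so the code also builds on non-x86 targets. Whether the compiler turns the loop into an actual add/adc chain is not guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define HAVE_ADDCARRY_U32 1  // hypothetical macro, local to this sketch
#endif

// Adds two equal-length little-endian limb vectors (limb 0 is least
// significant), writes the sum to `out`, and returns the final carry-out.
inline unsigned add_limbs(const std::vector<std::uint32_t>& a,
                          const std::vector<std::uint32_t>& b,
                          std::vector<std::uint32_t>& out) {
    assert(a.size() == b.size());
    out.resize(a.size());
    unsigned char carry = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
#ifdef HAVE_ADDCARRY_U32
        unsigned int limb;
        // carry-in goes in, carry-out comes back as the return value
        carry = _addcarry_u32(carry, a[i], b[i], &limb);
        out[i] = limb;
#else
        // Fallback: do the add in 64 bits; bit 32 is the carry-out.
        std::uint64_t wide = std::uint64_t{a[i]} + b[i] + carry;
        out[i] = static_cast<std::uint32_t>(wide);
        carry = static_cast<unsigned char>(wide >> 32);
#endif
    }
    return carry;
}
```

For example, adding the limbs {0xFFFFFFFF, 0xFFFFFFFF} and {1, 0} yields {0, 0} with a returned carry of 1.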

See also the x86 tag wiki for assembly documentation.

The compiler will use ADC for you when adding integers wider than a single register, e.g. adding int64_t in 32-bit code, or __int128_t in 64-bit code.

#include <stdint.h>
#ifdef __x86_64__
__int128_t add128(__int128_t a, __int128_t b) { return a+b; }
#endif
    # clang 3.8 -O3  for x86-64, SystemV ABI.
    # __int128_t args passed in 2 regs each, and returned in rdx:rax
    add     rdi, rdx
    adc     rsi, rcx
    mov     rax, rdi
    mov     rdx, rsi
    ret

asm output from the Godbolt compiler explorer. clang's -fverbose-asm isn't very verbose, but gcc 5.3 / 6.1 wastes two mov instructions so it's less readable.

You can sometimes hand-hold compilers into emitting an adc or otherwise using the carry-out of add using the idiom uint64_t sum = a+b; / carry = sum < a;. But extending this to get a carry-out from an adc instead of add is not possible with current compilers; c+d+carry_in can wrap all the way around, and compilers don't manage to optimize the multiple checks for carry out on each + in c+d+carry if you do it safely.
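For reference, the idiom and the "safe" two-check version of a carry-in add might look like the sketch below. The function names add_carry_out and adc_safe are invented for this example. This only shows the source-level pattern; as noted above, compilers may not turn the second function into a single adc.

```cpp
#include <cassert>
#include <cstdint>

// Carry-out of a plain add: the wrapped sum is smaller than an operand
// exactly when the addition overflowed.
inline std::uint64_t add_carry_out(std::uint64_t a, std::uint64_t b,
                                   unsigned& carry_out) {
    std::uint64_t sum = a + b;
    carry_out = sum < a;
    return sum;
}

// Add with carry-in, checking for carry-out after each of the two additions.
// At most one of them can wrap, so OR-ing the two checks is sufficient.
inline std::uint64_t adc_safe(std::uint64_t a, std::uint64_t b,
                              unsigned carry_in, unsigned& carry_out) {
    assert(carry_in <= 1);
    std::uint64_t partial = a + b;
    unsigned c1 = partial < a;        // carry out of a + b
    std::uint64_t sum = partial + carry_in;
    unsigned c2 = sum < partial;      // carry out of adding the carry-in
    carry_out = c1 | c2;
    return sum;
}
```

The case that a single comparison gets wrong is covered here: adc_safe(UINT64_MAX, 0, 1, c) returns 0 and sets c to 1.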

Clang `_ExtInt`

There is one way I'm aware of to get a chain of add/adc/.../adc: Clang's new _ExtInt(width) feature that provides fixed-bit-width types of any size up to 16,777,215 bits (blog post). It was added to clang's development version on April 21, 2020, so it's not yet in any released version.

This will hopefully show up in ISO C and/or C++ at some point; the N2472 proposal is apparently "being actively considered by the ISO WG14 C Language Committee".

typedef _ExtInt(256) wide_int;

wide_int add ( wide_int a, wide_int b) {
    return a+b;
}

compiles as follows with clang trunk -O2 for x86-64 (Godbolt):

add(int _ExtInt<256>, int _ExtInt<256>):
        add     rsi, r9
        adc     rdx, qword ptr [rsp + 8]
        adc     rcx, qword ptr [rsp + 16]
        mov     rax, rdi                        # return the retval pointer
        adc     r8, qword ptr [rsp + 24]        # chain of ADD / 3x ADC!

        mov     qword ptr [rdi + 8], rdx        # store results to mem
        mov     qword ptr [rdi], rsi
        mov     qword ptr [rdi + 16], rcx
        mov     qword ptr [rdi + 24], r8
        ret

Apparently _ExtInt is passed by value in integer registers until the calling convention runs out of registers. (At least in this early version; perhaps x86-64 SysV should class it as "memory" when it's wider than 2 or maybe 3 registers, like structs larger than 16 bytes. Although more so than with structs, having it in registers is likely to be useful. Just put other args first so they're not displaced.)

The first _ExtInt arg is in R8:RCX:RDX:RSI, and the second has its low qword in R9, with the rest in memory.

A pointer to the return-value object is passed as a hidden first arg in RDI; x86-64 System V only ever returns in up to 2 integer registers (RDX:RAX) and this doesn't change that.

Solution 5

There is a bug in the carry check of the array-addition loop from Solution 3. Try this input:

unsigned first[10] =  {0x00000001};
unsigned second[10] = {0xffffffff, 0xffffffff};

The result should be {0, 0, 1, ...} but the result is {0, 0, 0, ...}

Changing this line:

carry = (first[i] > result[i]);

to this:

if (carry)
    carry = (first[i] >= result[i]);
else
    carry = (first[i] > result[i]);

fixes it.
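Put together, the loop with that change applied could be wrapped up as in the following sketch. The function name add_bigint and the use of std::vector are choices made for this example only; limbs are assumed to be stored least significant first, as in the arrays above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds two equal-length limb vectors (least significant limb first).
// The result has one extra limb that receives the final carry.
inline std::vector<unsigned> add_bigint(const std::vector<unsigned>& first,
                                        const std::vector<unsigned>& second) {
    assert(first.size() == second.size());
    std::vector<unsigned> result(first.size() + 1);
    unsigned carry = 0;
    for (std::size_t i = 0; i < first.size(); ++i) {
        result[i] = first[i] + second[i] + carry;
        // With a carry-in, a wrapped sum can equal first[i], so use >=.
        if (carry)
            carry = (first[i] >= result[i]);
        else
            carry = (first[i] > result[i]);
    }
    result[first.size()] = carry;
    return result;
}
```

With the input shown above, limb 2 of the result comes out as 1, as expected.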


Author by

Martijn Courteaux

I'm writing Java, C/C++ and some Objective-C. I started programming in 2007 (when I was 11). Right now, I'm working on my magnum opus: an iOS, Android, OS X, Linux, Windows game to be released soon on all relevant stores. The game is written in C++ using SDL and OpenGL. A couple of seeds for my name (for java.util.Random, radix 26): 4611686047252874006 -9223372008029289706 -4611685989601901802 28825486102

Updated on April 15, 2021

Comments

Martijn Courteaux about 3 years

There is an x86 assembly instruction ADC. I've found this means "Add with carry". What does this mean/do? How would one implement the behavior of this instruction in C++?

INFO:
Compiled on Windows. I'm using a 32-bit Windows Installation. My processor is Core 2 Duo from Intel.
Martijn Courteaux over 13 years

Alright! Thank you. But now, I have to know IF the flag is set. Is this possible in C++?
Simone over 13 years

Not with standard C++, you have to use an "asm" code block. I don't remember the exact syntax, but you'll lose code portability.
Chubsdad over 13 years

I can't block quote the 'Caveat....' portion in my response. Sometimes this formatting just doesn't behave right.
stefaanv over 13 years

The idea of ADC is not to know the carry flag, but to do an ADD before ADC, so the carry will be set when the ADD overflows
Simone over 13 years

@Martijn, if you want to know the carry flag status you may do something like this: pushfd; pop eax; now carry flag is at bit 0 of eax.
jww almost 13 years

0xffffffff is either -1 or UINT_MAX, which is being stored in an int. Perhaps 'x' should be an unsigned int, or the summands should be INT_MAX (0x7fffffff). If we take the summands to be the same type as the result (ie, signed integer), then OVERFLOW flag is not set - the result is -2 (0xfffffffe).
Hassedev almost 11 years

carry=(carry&&first[i]>=result[i])||(!carry&&first[i]>result[i]) avoids branching and does the same thing, if anyone is interested.
Stephane Hockenhull over 8 years

This code will fail to set the carry if second==~0U && carry==1. e.g.: with 32bits unsigned that would be second[i]==0xFFFFFFFF && carry==1. In this case first[i] == result[i] even though an overflow (carry) has happened.
Stephane Hockenhull over 8 years

Actually || and && cause branching since they only evaluate the right side as necessary. There is more branching in the one-liner than with the easy-to-read if() statement.
Stephane Hockenhull over 8 years

Working code is unsigned tmp = second[i] + carry; result[i] = first[i] + tmp; carry = (first[i] > result[i]) | (second[i] > tmp);
Peter Cordes almost 8 years

That code is ridiculous; you can't depend on CF being set or not from a C statement outside the asm block. It might happen to work in debug mode, but that's not going to be useful with optimization enabled. Also, use setc carry to set carry to 0 or 1, according to CF.
Peter Cordes almost 8 years

Downvoted for not really answering the C++ aspect of the question. Also, that asm sequence kinda sucks compared to setc al (and movzx eax, al if desired). pushf is a 3-uop instruction on Intel SnB-family CPUs. The push/pop store-forwarding round-trip adds ~5 cycles of latency to the dependency chain involving CF.
Peter Cordes almost 8 years

I was going to edit this answer to expand on it and link the insn set reference manual, but it ended up being too big an edit, so I posted my change as a new answer.
vpalmu over 7 years

And the fact this is maddeningly slow is why it's written in assembly right now.
madhur4127 about 4 years

It's almost 4 years now, I still can't make the compilers (gcc, clang) work to generate add, adc instruction chain. Do you think it's possible now?
Peter Cordes about 4 years

@madhur4127: For just two instructions, yes that's possible from pure C with the sum=a+b; / carry = sum<a; trick, or __int128. But for longer chains compilers are still terrible AFAIK, even with _addcarry_u32.
Peter Cordes about 4 years

@madhur4127: update: clang _ExtInt can let you use a fixed-width integer type of any width up to 16,777,215 bits. (blog.llvm.org/2020/04/…). godbolt.org/z/bsDCvh shows that + on _ExtInt(256) compiles to add / adc / adc / adc.
madhur4127 about 4 years

The current version of ExtInt just unrolls everything, I see 4000+ lines of assembly with 2048*2048 multiplication without any loops. At (1<<20) bits, compiler explorer killed the process because of timeout.
Peter Cordes about 4 years

@madhur4127: lol, that's amusing, thanks for checking on that. So not currently practical for very large integers, especially for multiply.
Peter Cordes about 3 years

Building a 16-bit add-with-carry out of wider operations kind of defeats the purpose. Just right-shift result >> 16 to get the high half (carry out), or just use that wider type directly and let the compiler implement it with adc or whatever is efficient on the target ISA.
Peter Cordes about 3 years

Also, if unsigned int actually is a 16-bit type like you seem to be assuming, first + second already wraps before you assign to result. Perhaps you meant first + (unsigned long)second?
Peter Cordes about 3 years

You can edit your answer to replace bad code with good code. But note that the hard part of implementing ADC in pure C is handling carry-in. Your result < second is sufficient to detect carry-out, but you shouldn't add it to this element, you should return it separately. That's like add eax, ecx / adc eax, 0 which is almost never what you want.
Peter Cordes about 3 years

I think this new version fails for cases like 0xffffffff + 0 + carry=1. The while loop does 0xffffffff + 1 = 0, producing a carry-out, which makes the while loop run again producing res=1 and carry=0. Then it tailcalls itself to do 1 + 0, returning 1 and leaving carry=0, when the correct result is 0 with carry=1. Carry-out in either + operation in first + second + *carry needs to send a carry-out to the final *carry output, not add back into the return value. Remember, carry-out is bit 33 of a 32+32 => 33-bit addition, but carry-in has a place value of just 1.