Assembly ADC (Add with carry) to C++
Solution 1
ADC is the same as ADD, but it adds an extra 1 if the processor's carry flag is set.
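As a rough model of that description, here is a minimal C++ sketch that computes the sum in a wider type so the carry-out is simply the bit above the low 32. The names AdcResult and adc32 are hypothetical helpers, not part of any standard library, and this only models the arithmetic, not the CPU flag itself.

```cpp
#include <cstdint>

// Result of a modeled 32-bit add-with-carry: low 32 bits plus the carry-out.
struct AdcResult {
    std::uint32_t sum;
    bool carry_out;
};

// Hypothetical helper (an assumption for illustration): models ADC on
// 32-bit operands by widening to 64 bits so the true sum cannot wrap.
inline AdcResult adc32(std::uint32_t a, std::uint32_t b, bool carry_in) {
    std::uint64_t wide = std::uint64_t{a} + b + (carry_in ? 1u : 0u);
    // Bit 32 of the widened sum is the carry-out.
    return { static_cast<std::uint32_t>(wide), (wide >> 32) != 0 };
}
```

With carry_in false this behaves like a plain ADD; with carry_in true it adds the extra 1.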
Solution 2
Intel processors have a special instruction called adc. It behaves like the add instruction, except that it also adds the value of the carry flag. This makes it very handy for adding large integers. Suppose you'd like to add two 32-bit integers using 16-bit registers. How can we do that? Let's say the first integer is held in the register pair DX:AX and the second in BX:CX. This is how:
add ax, cx
adc dx, bx
First, the lower 16 bits are added by add ax, cx. Then the higher 16 bits are added using adc instead of add, because if the low addition overflows, the carry bit is automatically added into the higher 16 bits. So, no cumbersome checking. This method can be extended to 64 bits and so on. Note that if the 32-bit addition also overflows at the higher 16 bits, the result will not be correct and the carry flag is set, e.g. when adding 3 billion to 3 billion.
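The add/adc pair above can be mimicked in C++ as a sketch. Pair16, Add32Result and add32_via_16 are hypothetical names chosen for illustration; each 16-bit add is done in a 32-bit temporary so the carry-out shows up as bit 16.

```cpp
#include <cstdint>

// Hypothetical model of a register pair such as DX:AX (hi:lo).
struct Pair16 {
    std::uint16_t hi;
    std::uint16_t lo;
};

struct Add32Result {
    Pair16 value;
    bool carry;  // models CF after the final adc
};

inline Add32Result add32_via_16(Pair16 x, Pair16 y) {
    // add ax, cx: low halves; carry-out is bit 16 of the widened sum.
    std::uint32_t lo = std::uint32_t{x.lo} + y.lo;
    bool cf = (lo >> 16) != 0;
    // adc dx, bx: high halves plus the carry from the low add.
    std::uint32_t hi = std::uint32_t{x.hi} + y.hi + (cf ? 1u : 0u);
    cf = (hi >> 16) != 0;
    return { { static_cast<std::uint16_t>(hi), static_cast<std::uint16_t>(lo) }, cf };
}
```

If carry is true on return, the 32-bit result has wrapped, which is the overflow case described in the note.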
Remember that everything from here on falls pretty much into the zone of implementation-defined behavior.
Here's a small sample that works for VS 2010 (32-bit, WinXp)
Caveat: $7.4/1- "The asm declaration is conditionally-supported; its meaning is implementation-defined. [ Note: Typically it is used to pass information through the implementation to an assembler. —end note ]"
int main(){
    bool carry = false;
    int x = 0xffffffff + 0xffffffff;
    __asm {
        jc setcarry
    setcarry:
        mov carry, 1
    }
}
Solution 3
The ADC behaviour can be simulated in both C and C++. The following example adds two numbers (stored as arrays of unsigned as they are too large to fit into a single unsigned).
unsigned first[10];
unsigned second[10];
unsigned result[11];
.... /* first and second get defined */
unsigned carry = 0;
for (int i = 0; i < 10; i++) {
    result[i] = first[i] + second[i] + carry;
    carry = (first[i] > result[i]);
}
result[10] = carry;
Hope this helps.
Solution 4
The C++ language doesn't have any concept of a carry flag, so making an intrinsic function wrapper around the ADC instruction is clunky. However, Intel did it anyway: unsigned char _addcarry_u32 (unsigned char c_in, unsigned a, unsigned b, unsigned * out);. Last I checked, gcc did a poor job with this (saving the carry result into an integer register, instead of leaving it in CF), but hopefully Intel's own compiler does better.
See also the x86 tag wiki for assembly documentation.
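For reference, a minimal sketch of how the _addcarry_u32 intrinsic can be chained over an array of 32-bit limbs. The helper name add_limbs and the macro HAVE_ADDCARRY_U32 are hypothetical. It assumes GCC or Clang on x86, where the intrinsic comes from <immintrin.h>. The #else branch is a portable stand-in added only so the sketch builds on other targets, and none of this guarantees that the compiler emits an actual adc chain.

```cpp
#include <cstddef>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_ADDCARRY_U32 1  // hypothetical macro, used only in this sketch
#endif

// Hypothetical helper: adds two n-limb numbers (least significant limb
// first) into out and returns the final carry-out (0 or 1).
inline unsigned char add_limbs(const unsigned* a, const unsigned* b,
                               unsigned* out, std::size_t n) {
    unsigned char carry = 0;
    for (std::size_t i = 0; i < n; ++i) {
#ifdef HAVE_ADDCARRY_U32
        // Carry-in goes in as the first argument, carry-out is the return value.
        carry = _addcarry_u32(carry, a[i], b[i], &out[i]);
#else
        // Portable stand-in: check each addition for wraparound separately.
        unsigned tmp = b[i] + carry;
        out[i] = a[i] + tmp;
        carry = static_cast<unsigned char>((out[i] < a[i]) | (tmp < b[i]));
#endif
    }
    return carry;
}
```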
The compiler will use ADC for you when adding integers wider than a single register, e.g. adding int64_t in 32-bit code, or __int128_t in 64-bit code.
#include <stdint.h>
#ifdef __x86_64__
__int128_t add128(__int128_t a, __int128_t b) { return a+b; }
#endif
# clang 3.8 -O3 for x86-64, SystemV ABI.
# __int128_t args passed in 2 regs each, and returned in rdx:rax
add rdi, rdx
adc rsi, rcx
mov rax, rdi
mov rdx, rsi
ret
asm output from the Godbolt compiler explorer. clang's -fverbose-asm isn't very verbose, but gcc 5.3 / 6.1 wastes two mov instructions so it's less readable.
You can sometimes hand-hold compilers into emitting an adc or otherwise using the carry-out of add using the idiom uint64_t sum = a+b; / carry = sum < a;. But extending this to get a carry-out from an adc instead of add is not possible with current compilers; c+d+carry_in can wrap all the way around, and compilers don't manage to optimize the multiple checks for carry out on each + in c+d+carry if you do it safely.
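The sum < a idiom and the "safe" handling of a carry-in described above might look like the following sketch. The function names add_carry_out and adc_carry_out are hypothetical, and whether a given compiler turns either one into add or adc is not guaranteed.

```cpp
#include <cstdint>

// Carry-out of a plain add: the sum wrapped if and only if it ended up
// smaller than one of the operands.
inline bool add_carry_out(std::uint64_t a, std::uint64_t b, std::uint64_t& sum) {
    sum = a + b;
    return sum < a;
}

// Add with carry-in, done "safely": each + is checked on its own, because
// c + d + carry_in can wrap in either step (but never in both).
inline bool adc_carry_out(std::uint64_t c, std::uint64_t d, bool carry_in,
                          std::uint64_t& sum) {
    std::uint64_t tmp = d + (carry_in ? 1u : 0u);
    bool wrapped_first = tmp < d;    // d was all-ones and carry_in was set
    sum = c + tmp;
    bool wrapped_second = sum < c;   // ordinary carry-out of c + tmp
    return wrapped_first | wrapped_second;
}
```

A single sum < c check after c + d + carry_in would miss the case where d is all-ones and carry_in is set, since the sum then equals c.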
Clang _ExtInt
There is one way I'm aware of to get a chain of add/adc/.../adc: Clang's new _ExtInt(width) feature that provides fixed-bit-width types of any size up to 16,777,215 bits (blog post). It was added to clang's development version on April 21, 2020, so it's not yet in any released version.
This will hopefully show up in ISO C and/or C++ at some point; the N2472 proposal is apparently "being actively considered by the ISO WG14 C Language Committee".
typedef _ExtInt(256) wide_int;
wide_int add ( wide_int a, wide_int b) {
return a+b;
}
compiles as follows with clang trunk -O2 for x86-64 (Godbolt):
add(int _ExtInt<256>, int _ExtInt<256>):
add rsi, r9
adc rdx, qword ptr [rsp + 8]
adc rcx, qword ptr [rsp + 16]
mov rax, rdi # return the retval pointer
adc r8, qword ptr [rsp + 24] # chain of ADD / 3x ADC!
mov qword ptr [rdi + 8], rdx # store results to mem
mov qword ptr [rdi], rsi
mov qword ptr [rdi + 16], rcx
mov qword ptr [rdi + 24], r8
ret
Apparently _ExtInt is passed by value in integer registers until the calling convention runs out of registers. (At least in this early version; perhaps x86-64 SysV should class it as "memory" when it's wider than 2 or maybe 3 registers, like structs larger than 16 bytes. Although more so than with structs, having it in registers is likely to be useful. Just put other args first so they're not displaced.)
The first _ExtInt arg is in R8:RCX:RDX:RSI, and the second has its low qword in R9, with the rest in memory.
A pointer to the return-value object is passed as a hidden first arg in RDI; x86-64 System V only ever returns in up to 2 integer registers (RDX:RAX) and this doesn't change that.
Solution 5
There is a bug in this. Try this input:
unsigned first[10] = {0x00000001};
unsigned second[10] = {0xffffffff, 0xffffffff};
The result should be {0, 0, 1, ...} but the result is {0, 0, 0, ...}
Changing this line:
carry = (first[i] > result[i]);
to this:
if (carry)
    carry = (first[i] >= result[i]);
else
    carry = (first[i] > result[i]);
fixes it.
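Putting the corrected carry test back into the loop gives something like the sketch below. The wrapper add_with_fix and the constant kLimbs are hypothetical names introduced here. The arithmetic is the loop from the earlier answer with only the carry line changed as described.

```cpp
#include <cstddef>

const std::size_t kLimbs = 10;  // same size as the arrays in the example

// Hypothetical wrapper around the loop with the corrected carry test.
// result must have room for kLimbs + 1 elements.
inline void add_with_fix(const unsigned* first, const unsigned* second,
                         unsigned* result) {
    unsigned carry = 0;
    for (std::size_t i = 0; i < kLimbs; ++i) {
        result[i] = first[i] + second[i] + carry;
        if (carry)
            carry = (first[i] >= result[i]);  // with carry-in, equality also means wraparound
        else
            carry = (first[i] > result[i]);
    }
    result[kLimbs] = carry;
}
```

With the input shown above, the carry out of element 1 now propagates into element 2.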
Martijn Courteaux
Updated on April 15, 2021
Comments
-
Martijn Courteaux about 3 years
There is an x86 assembly instruction ADC. I've found this means "Add with carry". What does this mean/do? How would one implement the behavior of this instruction in C++?
INFO: Compiled on Windows. I'm using a 32-bit Windows installation. My processor is a Core 2 Duo from Intel.
-
Martijn Courteaux over 13 years
Alright! Thank you. But now, I have to know IF the flag is set. Is this possible in C++?
-
Simone over 13 years
Not with standard C++, you have to use an "asm" code block. I don't remember the exact syntax, but you'll lose code portability.
-
Chubsdad over 13 years
I can't block quote the 'Caveat....' portion in my response. Sometimes this formatting just doesn't behave right.
-
stefaanv over 13 years
The idea of ADC is not to know the carry flag, but to do an ADD before ADC, so the carry will be set when the ADD overflows.
-
Simone over 13 years
@Martijn, if you want to know the carry flag status you may do something like this: pushfd; pop eax; now carry flag is at bit 0 of eax.
-
jww almost 13 years
0xffffffff is either -1 or UINT_MAX, which is being stored in an int. Perhaps 'x' should be an unsigned int, or the summands should be INT_MAX (0x7fffffff). If we take the summands to be the same type as the result (i.e., signed integer), then the OVERFLOW flag is not set - the result is -2 (0xfffffffe).
-
Hassedev almost 11 years
carry=(carry&&first[i]>=result[i])||(!carry&&first[i]>result[i]) avoids branching and does the same thing, if anyone is interested.
-
Stephane Hockenhull over 8 years
This code will fail to set the carry if second==~0U && carry==1, e.g. with 32-bit unsigned that would be second[i]==0xFFFFFFFF && carry==1. In this case first[i] == result[i] even though an overflow (carry) has happened.
-
Stephane Hockenhull over 8 years
Actually || and && cause branching since they only evaluate the right side as necessary. There is more branching in the one-liner than with the easy-to-read if() statement.
-
Stephane Hockenhull over 8 years
Working code is
unsigned tmp = second[i] + carry;
result[i] = first[i] + tmp;
carry = (first[i] > result[i]) | (second[i] > tmp);
-
Peter Cordes almost 8 years
That code is ridiculous; you can't depend on CF being set or not from a C statement outside the asm block. It might happen to work in debug mode, but that's not going to be useful with optimization enabled. Also, use setc carry to set carry to 0 or 1, according to CF.
-
Peter Cordes almost 8 years
Downvoted for not really answering the C++ aspect of the question. Also, that asm sequence kinda sucks compared to setc al (and movzx eax, al if desired). pushf is a 3-uop instruction on Intel SnB-family CPUs. The push/pop store-forwarding round-trip adds ~5 cycles of latency to the dependency chain involving CF.
-
Peter Cordes almost 8 years
I was going to edit this answer to expand on it and link the insn set reference manual, but it ended up being too big an edit, so I posted my change as a new answer.
-
vpalmu over 7 years
And the fact this is maddeningly slow is why it's written in assembly right now.
-
madhur4127 about 4 years
It's almost 4 years now, I still can't make the compilers (gcc, clang) generate an add, adc instruction chain. Do you think it's possible now?
-
Peter Cordes about 4 years
@madhur4127: For just two instructions, yes that's possible from pure C with the sum=a+b; / carry = sum<a; trick, or __int128. But for longer chains compilers are still terrible AFAIK, even with _addcarry_u32.
-
Peter Cordes about 4 years
@madhur4127: update: clang _ExtInt can let you use a fixed-width integer type of any width up to 16,777,215 bits. (blog.llvm.org/2020/04/…). godbolt.org/z/bsDCvh shows that + on _ExtInt(256) compiles to add/adc/adc/adc.
-
madhur4127 about 4 years
The current version of ExtInt just unrolls everything, I see 4000+ lines of assembly with 2048*2048 multiplication without any loops. At (1<<20) bits, compiler explorer killed the process because of timeout.
-
Peter Cordes about 4 years
@madhur4127: lol, that's amusing, thanks for checking on that. So not currently practical for very large integers, especially for multiply.
-
Peter Cordes about 3 years
Building a 16-bit add-with-carry out of wider operations kind of defeats the purpose. Just right-shift result >> 16 to get the high half (carry out), or just use that wider type directly and let the compiler implement it with adc or whatever is efficient on the target ISA.
-
Peter Cordes about 3 years
Also, if unsigned int actually is a 16-bit type like you seem to be assuming, first + second already wraps before you assign to result. Perhaps you meant first + (unsigned long)second?
-
Peter Cordes about 3 years
You can edit your answer to replace bad code with good code. But note that the hard part of implementing ADC in pure C is handling carry-in. Your result < second is sufficient to detect carry-out, but you shouldn't add it to this element, you should return it separately. That's like add eax, ecx / adc eax, 0 which is almost never what you want.
-
Peter Cordes about 3 years
I think this new version fails for cases like 0xffffffff + 0 + carry=1. The while loop does 0xffffffff + 1 = 0, producing a carry-out, which makes the while loop run again producing res=1 and carry=0. Then it tailcalls itself to do 1 + 0, returning 1 and leaving carry=0, when the correct result is 0 with carry=1. Carry-out in either + operation in first + second + *carry needs to send a carry-out to the final *carry output, not add back into the return value. Remember, carry-out is bit 33 of a 32+32 => 33-bit addition, but carry-in has a place value of just 1.