Need an efficient subtraction algorithm modulo a number

10,871

Solution 1

As many have pointed out, in current C and C++ standards, x % n is no longer implementation-defined for any values of x and n. It is undefined behaviour in the cases where x / n is undefined [1]. Also, x - y is undefined behaviour in the case of integer overflow, which is possible if the signs of x and y might differ.

So the main problem for a general solution is avoiding integer overflow, either in the division or the subtraction. If we know that x and y are non-negative and n is positive, then overflow and division by zero are not possible, and we can confidently say that (x - y) % n is defined. Unfortunately, x - y might be negative, in which case so will be the result of the % operator.

It's easy to correct for the result being negative if we know that n is positive; all we have to do is unconditionally add n and do another modulo operation. That's unlikely to be the best solution, unless you have a computer where division is faster than branching.

If a conditional load instruction is available (pretty common these days), then the compiler will probably do well with the following code, which is portable and well-defined, subject to the constraints that x,y ≥ 0 ∧ n > 0:

((x - y) % n) + ((x >= y) ? 0 : n)

For example, gcc produces this code for my core I5 (although it's generic enough to work on any non-Paleozoic intel chip):

    idivq   %rcx
    cmpq    %rsi, %rdi
    movl    $0, %eax
    cmovge  %rax, %rcx
    leaq    (%rdx,%rcx), %rax

which is cheerfully branch-free. (Conditional move is usually a lot faster than branching.)

Another way of doing this would be (except that the function sign needs to be written):

((x - y) % n) + (sign(x - y) & (unsigned long)n)

where sign is all 1s if its argument is negative, and otherwise 0. One possible implementation of sign (adapted from bithacks) is

unsigned long sign(unsigned long x) {
  return x >> (sizeof(long) * CHAR_BIT - 1);
}

This is portable (casting negative integer values to unsigned is defined), but it may be slow on architectures which lack high-speed shift. It's unlikely to be faster than the previous solution, but YMMV. TIAS.

Neither of these produce correct results for the general case where integer overflow is possible. It's very difficult to deal with integer overflow. (One particularly annoying case is n == -1, although you can test for that and return 0 without any use of %.) Also, you need to decide your preference for the result of modulo of negative n. I personally prefer the definition where x%n is either 0 or has the same sign as n -- otherwise why would you bother with a negative divisor -- but applications differ.

The three-modulo solution proposed by Tom Tanner will work if n is not -1 and n + n does not overflow. n == -1 will fail if either x or y is INT_MIN, and the simple fix of using abs(n) instead of n will fail if n is INT_MIN. The cases where n has a large absolute value could be replaced with comparisons, but there are a lot of corner cases, and made more complicated by the fact that the standard does not require 2's complement arithmetic, so it's not easily predictable what the corner cases are [2].

As a final note, some tempting solutions do not work. You cannot just take the absolute value of (x - y):

(-z) % n == -(z % n) == n - (z % n) ≠ z % n (unless z % n happens to be n / 2)

And, for the same reason, you cannot just take the absolute value of the result of modulo.

Also, you cannot just cast (x - y) to unsigned:

(unsigned)z == z + 2k (for some k) if z < 0
(z + 2k) % n == (z % n) + (2k % n) ≠ z % n unless (2k % n) == 0


[1] x/n and x%n are both undefined if n==0. But x%n is also undefined if x/n is "not representable" (i.e. there was integer overflow), which will happen on twos-complement machines (that is, all the ones you care about) if x is most negative representable number and n == -1. It's clear why x/n should be undefined in this case, but slightly less so in the case of x%n, since that value is (mathematically) 0.

[2] Most people who complain about the difficulty of predicting the results of floating-point arithmetic haven't spent much time trying to write truly portable integer arithmetic code :)

Solution 2

If you want to avoid undefined behaviour, without an if, the following would work

return (x % n - y % n + n) % n;

The efficiency depends on the implementation of the modulo operation, but I'd suspect algorithms involving if would be rather faster.

Alternatively you could treat x and y as unsigned. In which case there are no negative numbers involved and no undefined behaviour.

Solution 3

With C++11 the undefined behavior was removed. Depending on the the exact behavior you want you can there just stick with

return (x-y) % n;

For a full explanation read this answer:

https://stackoverflow.com/a/13100805/1149664

You still get undefined behavior for n==0 or if x-y can not be stored in the type you are using.

Solution 4

Whether branching is going to matter will depend on the CPU to some degree. According to the documentation abs (on MSDN) has intrinsic behavior and it might not be a bottleneck at all. This you'll have to test.

If you wan't unconditionally compute things there are several nice methods that can be adapted from the Bit Twiddling Hacks site.

int v;           // we want to find the absolute value of v
unsigned int r;  // the result goes here 
int const mask = v >> sizeof(int) * CHAR_BIT - 1;

r = (v + mask) ^ mask;

However, I don't know if this will be helpful to your situation without more information about hardware targets and testing.

Just out of curiosity I had to test this myself and when you look at the assembly generated by the compiler we can see there's no real overhead in the use of abs.

unsigned r = abs(i);
====
00381006  cdq              
00381007  xor         eax,edx 
00381009  sub         eax,edx

The following is just an alternate form of the above example which according to the Bit Twiddling Site is not patented (while the version used by the Visual C++ 2008 compiler is).

Throughout my answer I have been using MSDN and Visual C++ but I would assume that any sane compiler has similar behavior.

Solution 5

Assuming 0 <= x < n and 0 <= y < n, how about (x + n - y) % n? Then x + n will certainly be larger than y, subtracting y will always result in a positive integer, and the final mod n reduces the result if necessary.

Share:
10,871
Johannes
Author by

Johannes

Updated on June 25, 2022

Comments

  • Johannes
    Johannes almost 2 years

    For given numbers x,y and n, I would like to calculate x-y mod n in C. Look at this example:

    int substract_modulu(int x, int y, int n)
    {
        return (x-y) % n;
    }
    

    As long as x>y, we are fine. In the other case, however, the modulu operation is undefined.

    You can think of x,y,n>0. I would like the result to be positive, so if (x-y)<0, then ((x-y)-substract_modulu(x,y,n))/ n shall be an integer.

    What is the fastest algorithm you know for that? Is there one which avoids any calls of if and operator??

    • Admin
      Admin over 11 years
      You can use abs(), but that won't be more efficient by any means.
    • John Leidegren
      John Leidegren over 11 years
      How do you wish to define the negative result? What is the desirable outcome if y>x?
    • Tristram Gräbener
      Tristram Gräbener over 11 years
      Out of curiosity, in what situation is an IF to expensive? If it's cryptography, then implementing it by yourself is a very dangerous idea (excepted if it is only for learning, and not production code). If you have lots of calculations, maybe you can speed up thing with SSE
    • John Leidegren
      John Leidegren over 11 years
      IBM's PowerPC processor (XBOX360) and the PS3 PPU/SPU really don't like branching. It can be really useful in console programming scenarios. It's sometimes more efficient just to do other things even if it results in more instructions.
    • Johan Lundberg
      Johan Lundberg over 11 years
      You tagged with C++. Is it C++ or C? If C++, have a look at my answer.
  • Johannes
    Johannes over 11 years
    Thanks, the first seems good. But what do you mean with the unsigned method?
  • KillianDS
    KillianDS over 11 years
    I think you can drop the modulo op on x.
  • Tom Tanner
    Tom Tanner over 11 years
    @Johannes - if x and y are unsigned ints, then x - y can never be negative. n should be unsigned in any case, modulo a negative number doesn't make a lot of sense
  • Johannes
    Johannes over 11 years
    @TomTanner Okay, let's say x-y=-1, then in unsigned (suppose we have char instead of int, for easiness), x-y=255. But if n=3, then -1%3=2!=0=255%3. Are you sure this works?
  • Mats Petersson
    Mats Petersson over 11 years
    Doing TWO modulo operations is almost certainly BAD. Modulo of arbitrary numbers (not power of two) is done via divide. Divide is a fairly slow operation in EVERY processor that I know of. And there are usually a small number of divide units, so it can't be done in parallel. Foir example, AMD's optimisation guide gives latency times of 71 clock cycles for IDIV (64-bit number, a 32-bit number is 39 clock cycles). I believe Intel processors are similar, and just about all processors are worse than that.
  • Mats Petersson
    Mats Petersson over 11 years
    And further to the above, there is no integer SSE divide instruction.
  • Johannes
    Johannes over 11 years
    Good point. In my case, it this is the case indeed. Otherwise, I could just calculate (x+n-(y%n))%n. Which is indeed the same as in TomTanner's post.
  • Johannes
    Johannes over 11 years
    In my case, however, I can not use C++11. But really interesting!
  • Johannes
    Johannes over 11 years
    Thanks. Your post spotted some issues in TomTanners post. One thing, though: if x-y < -n , I think your "+n"-solution won't work, since x-y+n<0?
  • rici
    rici over 11 years
    @Johannes: I'm adding n to (x-y)%n, not to x-y. (x-y)%n > (-n) in all cases where (x-y) doesn't overflow.
  • rici
    rici over 11 years
    @Johannes, by the way, you could do the test and add solution in two steps, such as: int tmp = (x - y) % n; return tmp + (tmp < 0) ? n : 0;, which will work even with the pre-C++11 guarantees. In fact, I believe that both C++11 and C++03 use the same division op-code to do the modulo calculation; the improved guarantee in C++11 simply codifies the reality that all modern hardware works that way. C99 has the same guarantee, and that was over a decade ago.
  • rici
    rici over 11 years
    @Johannes, by the way, I forgot to mention this before. If n is known, you can replace the division by a multiplication. By known, I mean either it's a compile-time constant, or you're going to use it often enough that it's worth doing some computation to figure out what to use as a multiplier. I'm sure you can find the details on the internet somewhere. If n is known at compile-time, both gcc and clang are capable of working it out for you, so you don't really have to know the details.
  • Tom Tanner
    Tom Tanner over 11 years
    @MatsPetersson I did say that using 'if' would probably be faster. I'm not sure of the expense of 39 clock cycles vs a missed branch
  • Tom Tanner
    Tom Tanner over 11 years
    @Johannes I wasn't suggesting casting x - y to unsigned, but both x and y
  • Tom Tanner
    Tom Tanner over 11 years
    some clarifications: I wasn't suggesting casting x - y to unsigned, but casting both x and y to unsigned. Also, I don't believe this function makes any sense if n is -ve.
  • rici
    rici over 11 years
    @TomTanner, presumably the OP wants (x - y) mod n also when y > x, and not (2^32 + x - y) mod n, which is what he would get if he casted x and y to unsigned. It doesn't make any difference if you cast to unsigned before or after doing the subtraction (unless the subtraction causes integer overflow).
  • Edward
    Edward over 7 years
    Your first proposed solution seems to fail on the edge case of x == y, in which the result of (x - y) % n should be 0 (for n > 0). The C++ % operator correctly returns 0, but ((x - y) % n) + ((x > y) ? 0 : n) returns n (an invalid value modulo n) because (x > y) is false.
  • rici
    rici over 7 years
    @edward: i think you're right, and it wasn't the only mistake. Thanks.