Is there any alternative to using % (modulus) in C/C++?
Solution 1
Ah, the joys of bitwise arithmetic. A side effect of many division routines is the modulus - so in few cases should division actually be faster than modulus. I'm interested to see the source you got this information from. Processors with multipliers have interesting division routines using the multiplier, but you can get from division result to modulus with just another two steps (multiply and subtract) so it's still comparable. If the processor has a built in division routine you'll likely see it also provides the remainder.
Still, there is a small branch of number theory devoted to Modular Arithmetic which requires study if you really want to understand how to optimize a modulus operation. Modular arithmatic, for instance, is very handy for generating magic squares.
So, in that vein, here's a very low level look at the math of modulus for an example of x, which should show you how simple it can be compared to division:
Maybe a better way to think about the problem is in terms of number bases and modulo arithmetic. For example, your goal is to compute DOW mod 7 where DOW is the 16-bit representation of the day of the week. You can write this as:
DOW = DOW_HI*256 + DOW_LO
DOW%7 = (DOW_HI*256 + DOW_LO) % 7
= ((DOW_HI*256)%7 + (DOW_LO % 7)) %7
= ((DOW_HI%7 * 256%7) + (DOW_LO%7)) %7
= ((DOW_HI%7 * 4) + (DOW_LO%7)) %7
Expressed in this manner, you can separately compute the modulo 7 result for the high and low bytes. Multiply the result for the high by 4 and add it to the low and then finally compute result modulo 7.
Computing the mod 7 result of an 8-bit number can be performed in a similar fashion. You can write an 8-bit number in octal like so:
X = a*64 + b*8 + c
Where a, b, and c are 3-bit numbers.
X%7 = ((a%7)*(64%7) + (b%7)*(8%7) + c%7) % 7
= (a%7 + b%7 + c%7) % 7
= (a + b + c) % 7
since 64%7 = 8%7 = 1
Of course, a, b, and c are
c = X & 7
b = (X>>3) & 7
a = (X>>6) & 7 // (actually, a is only 2-bits).
The largest possible value for a+b+c
is 7+7+3 = 17
. So, you'll need
one more octal step. The complete (untested) C version could be
written like:
unsigned char Mod7Byte(unsigned char X)
{
X = (X&7) + ((X>>3)&7) + (X>>6);
X = (X&7) + (X>>3);
return X==7 ? 0 : X;
}
I spent a few moments writing a PIC version. The actual implementation is slightly different than described above
Mod7Byte:
movwf temp1 ;
andlw 7 ;W=c
movwf temp2 ;temp2=c
rlncf temp1,F ;
swapf temp1,W ;W= a*8+b
andlw 0x1F
addwf temp2,W ;W= a*8+b+c
movwf temp2 ;temp2 is now a 6-bit number
andlw 0x38 ;get the high 3 bits == a'
xorwf temp2,F ;temp2 now has the 3 low bits == b'
rlncf WREG,F ;shift the high bits right 4
swapf WREG,F ;
addwf temp2,W ;W = a' + b'
; at this point, W is between 0 and 10
addlw -7
bc Mod7Byte_L2
Mod7Byte_L1:
addlw 7
Mod7Byte_L2:
return
Here's a liitle routine to test the algorithm
clrf x
clrf count
TestLoop:
movf x,W
RCALL Mod7Byte
cpfseq count
bra fail
incf count,W
xorlw 7
skpz
xorlw 7
movwf count
incfsz x,F
bra TestLoop
passed:
Finally, for the 16-bit result (which I have not tested), you could write:
uint16 Mod7Word(uint16 X)
{
return Mod7Byte(Mod7Byte(X & 0xff) + Mod7Byte(X>>8)*4);
}
Scott
Solution 2
If you are calculating a number mod some power of two, you can use the bit-wise and operator. Just subtract one from the second number. For example:
x % 8 == x & 7
x % 256 == x & 255
A few caveats:
- This only works if the second number is a power of two.
- It's only equivalent if the modulus is always positive. The C and C++ standards don't specify the sign of the modulus when the first number is negative (until C++11, which does guarantee it will be negative, which is what most compilers were already doing). A bit-wise and gets rid of the sign bit, so it will always be positive (i.e. it's a true modulus, not a remainder). It sounds like that's what you want anyway though.
- Your compiler probably already does this when it can, so in most cases it's not worth doing it manually.
Solution 3
There is an overhead most of the time in using modulo that are not powers of 2. This is regardless of the processor as (AFAIK) even processors with modulus operators are a few cycles slower for divide as opposed to mask operations.
For most cases this is not an optimisation that is worth considering, and certainly not worth calculating your own shortcut operation (especially if it still involves divide or multiply).
However, one rule of thumb is to select array sizes etc. to be powers of 2.
so if calculating day of week, may as well use %7 regardless if setting up a circular buffer of around 100 entries... why not make it 128. You can then write % 128 and most (all) compilers will make this & 0x7F
Solution 4
Unless you really need high performance on multiple embedded platforms, don't change how you code for performance reasons until you profile!
Code that's written awkwardly to optimize for performance is hard to debug and hard to maintain. Write a test case, and profile it on your target. Once you know the actual cost of modulus, then decide if the alternate solution is worth coding.
Solution 5
@Matthew is right. Try this:
int main() {
int i;
for(i = 0; i<=1024; i++) {
if (!(i & 0xFF)) printf("& i = %d\n", i);
if (!(i % 0x100)) printf("mod i = %d\n", i);
}
}
JeffV
Electronics Technologist, currently working as a Software Engineer on VR Simulation Systems. Primarily work in Modern C++, C (embedded), Java, C#. Involved in full engineering life cycle including requirements analysis, system decomposition, high level and low level design, implementation and system verification. Designed and implemented sonar processing systems for beam forming line arrays using distributed services and message passing with ZeroMQ. Designed and implemented VR training simulators with hardware stimulation. Designed and implemented wireless blasting receivers for through the earth communications. Extreme low power and reliability design goals: 6mos on a single C-cell battery. Spend my spare time with my wife and daughter, and enjoy mountain biking, boating, scuba, photography and flying. I like to ask the good questions so all can benefit from the answers. Hard to beat the fastest guns in the west around here. ;-)
Updated on July 09, 2022Comments
-
JeffV almost 2 years
I read somewhere once that the modulus operator is inefficient on small embedded devices like 8 bit micro-controllers that do not have integer division instruction. Perhaps someone can confirm this but I thought the difference is 5-10 time slower than with an integer division operation.
Is there another way to do this other than keeping a counter variable and manually overflowing to 0 at the mod point?
const int FIZZ = 6; for(int x = 0; x < MAXCOUNT; x++) { if(!(x % FIZZ)) print("Fizz\n"); // slow on some systems }
vs:
The way I am currently doing it:
const int FIZZ = 6; int fizzcount = 1; for(int x = 1; x < MAXCOUNT; x++) { if(fizzcount >= FIZZ) { print("Fizz\n"); fizzcount = 0; } }
-
JeffV over 15 yearsHow would you handle the case where it wasn't a neat power of two number?
-
JeffV over 15 yearsI agree with you 100%, however, do you have an alternative to modulus in the case where performance matters?
-
JeffV over 15 yearsI meant division and modulus is 5-10 times faster on a micro with an integer division op code in the instruction set.
-
Rob Rolnick over 15 yearsOh, sorry, I misunderstood. (Clearly.) Have you profiled the other suggestion?
-
JeffV over 15 yearsYes, I see that too. I changed it to 7 because 6 evently divisible by two. I'm looking for an answer that could be used for any number known up front.
-
JeffV over 15 yearsWow, if I ever have to do it for a number other than 7, I'll have to come back and re-read this post.
-
shmuelp over 15 yearsI don't have a specific alternative. If I couldn't use a bitmask or right shift like @Paul-Tomblin suggested, I'd keep a counter, just like you suggested in your question. Of course, I'm used to working with more overhead (we're running CORBA on our processors).
-
Paul Tomblin over 15 yearsJeff V: Interesting fact about math: x * 7 == x + (x * 2) + (x * 4), or x + x >> 1 + x >> 2. Integer addition is usually pretty cheap.
-
JeffV almost 12 yearsPower of two is the easy case.
-
Johan Lundberg almost 11 years"The C and C++ standards don't specify the sign of the modulus when the first number is negative." This has now been resolved with C++11. stackoverflow.com/questions/12710801/c-operator-guarantees
-
étale-cohomology over 7 yearsThis is precisely what I was looking for!!
-
MSalters almost 4 yearsModulus is slow on microcontrollers that lack a hardware division instruction. So this doesn't actually solve the problem.