Fast way to calculate n! mod m where m is prime?


Solution 1

Expanding my comment to an answer:

Yes, there are more efficient ways to do this, but they are extremely messy.

So unless you really need the extra performance, I don't suggest trying to implement them.


The key is to note that the modulus (which is essentially a division) is going to be the bottleneck operation. Fortunately, there are some very fast algorithms that allow you to perform modulus over the same number many times.

These methods are fast because they essentially eliminate the modulus.
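The answer's original links are lost in this copy, but one such method is Barrett-style reduction (a sketch, not necessarily what the answer linked): precompute a scaled reciprocal of the modulus once, after which each reduction is a multiply, a shift, and at most a couple of subtractions instead of a division. Shown here on Python big integers for clarity; real implementations work at machine-word width.

```python
K = 64  # assumed word size; requires m < 2^K

def barrett_setup(m):
    # Precompute mu = floor(2^(2K) / m) once per modulus.
    return (1 << (2 * K)) // m

def barrett_reduce(x, m, mu):
    # Valid for 0 <= x < m*m; q underestimates x // m by at most 2,
    # so a couple of conditional subtractions finish the job.
    q = (x * mu) >> (2 * K)
    r = x - q * m
    while r >= m:
        r -= m
    return r
```

Inside the factorial loop, `ans = barrett_reduce(ans * i, modulus, mu)` would then stand in for `ans = ans * i % modulus`.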


Those methods alone should give you a moderate speedup. To be truly efficient, you may need to unroll the loop to allow for better IPC:

Something like this:

ans0 = 1
ans1 = 1
for i in range(1, (n + 1) // 2):   # integer division; odd n handled separately
    ans0 = ans0 * (2*i + 0) % modulus
    ans1 = ans1 * (2*i + 1) % modulus

return ans0 * ans1 % modulus

but accounting for an odd number of iterations and combining it with one of the methods mentioned above.

Some may argue that loop-unrolling should be left to the compiler. I will counter-argue that compilers are currently not smart enough to unroll this particular loop. Have a closer look and you will see why.


Note that although my answer is language-agnostic, it is meant primarily for C or C++.

Solution 2

n can be arbitrarily large

Well, n can't be arbitrarily large - if n >= m, then n! ≡ 0 (mod m) (because m is one of the factors, by the definition of factorial).


Assuming n << m and you need an exact value, your algorithm can't get any faster, to my knowledge. However, if n > m/2, you can use the following identity (Wilson's theorem - Thanks @Daniel Fischer!)

(m-1)! ≡ -1 (mod m)

to cap the number of multiplications at about m-n:

(m-1)! ≡ -1 (mod m)
1 * 2 * 3 * ... * (n-1) * n * (n+1) * ... * (m-2) * (m-1) ≡ -1 (mod m)
n! * (n+1) * ... * (m-2) * (m-1) ≡ -1 (mod m)
n! ≡ -[(n+1) * ... * (m-2) * (m-1)]^(-1) (mod m)

This gives us a simple way to calculate n! (mod m) in m-n-1 multiplications, plus a modular inverse:

def factorialMod(n, modulus):
    ans=1
    if n <= modulus//2:
        #calculate the factorial normally (right argument of range() is exclusive)
        for i in range(1,n+1):
            ans = (ans * i) % modulus   
    else:
        #Fancypants method for large n
        for i in range(n+1,modulus):
            ans = (ans * i) % modulus
        ans = modinv(ans, modulus)
        ans = -1*ans + modulus
    return ans % modulus
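The code calls `modinv()`, which isn't defined in the answer; since the modulus is prime, Fermat's little theorem supplies one possible implementation:

```python
def modinv(a, m):
    # For prime m, a^(m-2) ≡ a^(-1) (mod m) by Fermat's little theorem.
    # On Python 3.8+, pow(a, -1, m) computes the same thing directly.
    return pow(a, m - 2, m)
```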

We can rephrase the above equation in another way that may or may not perform slightly faster. Using the following identity:

a ≡ a - m (mod m)

we can rephrase the equation as

n! ≡ -[(n+1) * ... * (m-2) * (m-1)]^(-1) (mod m)
n! ≡ -[(n+1-m) * ... * (m-2-m) * (m-1-m)]^(-1) (mod m)
       (reverse order of terms)
n! ≡ -[(-1) * (-2) * ... * -(m-n-2) * -(m-n-1)]^(-1) (mod m)
n! ≡ -[(1) * (2) * ... * (m-n-2) * (m-n-1) * (-1)^(m-n-1)]^(-1) (mod m)
n! ≡ [(m-n-1)!]^(-1) * (-1)^(m-n) (mod m)

This can be written in Python as follows:

def factorialMod(n, modulus):
    ans=1
    if n <= modulus//2:
        #calculate the factorial normally (right argument of range() is exclusive)
        for i in range(1,n+1):
            ans = (ans * i) % modulus   
    else:
        #Fancypants method for large n
        for i in range(1,modulus-n):
            ans = (ans * i) % modulus
        ans = modinv(ans, modulus)

        #Since m is an odd-prime, (-1)^(m-n) = -1 if n is even, +1 if n is odd
        if n % 2 == 0:
            ans = -1*ans + modulus
    return ans % modulus

If you don't need an exact value, life gets a bit easier - you can use Stirling's approximation to calculate an approximate value in O(log n) time (using exponentiation by squaring).
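For example, the magnitude can be read off the log-gamma function in the standard library (a different route to the same Stirling-quality estimate); note this approximates n! itself, not the residue n! mod m, which it cannot recover:

```python
import math

def log_factorial(n):
    # ln(n!) = lgamma(n + 1); accurate even for huge n, in O(1) time.
    return math.lgamma(n + 1)
```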


Finally, I should mention that if this is time-critical and you're using Python, try switching to C++. From personal experience, you should expect about an order-of-magnitude increase in speed or more, simply because this is exactly the sort of CPU-bound tight-loop that natively-compiled code excels at (also, for whatever reason, GMP seems much more finely-tuned than Python's Bignum).

Solution 3

n! mod m can be computed in O(n^(1/2+ε)) operations instead of the naive O(n). This requires use of FFT polynomial multiplication, and is only worthwhile for very large n, e.g. n > 10^4.

An outline of the algorithm and some timings can be seen here: http://fredrikj.net/blog/2012/03/factorials-mod-n-and-wilsons-theorem/

Solution 4

If we want to calculate M = a*(a+1) * ... * (b-1) * b (mod p), we can use the following approach, assuming we can add, subtract and multiply quickly (mod p), and get a running time of O( sqrt(b-a) * polylog(b-a) ).

For simplicity, assume (b-a+1) = k^2 is a perfect square. Now, we can divide our product into k parts, i.e. M = [a*..*(a+k-1)] *...* [(b-k+1)*..*b]. Each of the factors in this product is of the form q(x) = x*..*(x+k-1), for an appropriate x.

By using a fast polynomial multiplication algorithm, such as the Schönhage–Strassen algorithm, in a divide & conquer manner, one can find the coefficients of the polynomial q(x) in O( k * polylog(k) ). There is also an algorithm for evaluating the same degree-k polynomial at k points in O( k * polylog(k) ), which means we can calculate q(a), q(a+k), ..., q(b-k+1) fast.

This algorithm of substituting many points into one polynomial is described in the book "Prime numbers" by C. Pomerance and R. Crandall. Eventually, when you have these k values, you can multiply them in O(k) and get the desired value.

Note that all of our operations were performed (mod p). The exact running time is O( sqrt(b-a) * log(b-a)^2 * log(log(b-a)) ).
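To make the block structure concrete, here is a sketch using naive polynomial arithmetic. It is O(k^2) = O(b-a) overall, so the asymptotic gain only appears once the coefficient computation and the multipoint evaluation are replaced with the FFT-based O( k * polylog(k) ) algorithms described above:

```python
import math

def range_product_mod(a, b, p):
    """Compute a*(a+1)*...*b (mod p) via ~sqrt(b-a+1) blocks."""
    n = b - a + 1
    k = max(1, math.isqrt(n))
    # Build coefficients of q(x) = x*(x+1)*...*(x+k-1), lowest degree first.
    coeffs = [1]
    for j in range(k):
        new = [0] * (len(coeffs) + 1)
        for d, c in enumerate(coeffs):
            new[d + 1] = (new[d + 1] + c) % p      # the x * q(x) term
            new[d] = (new[d] + c * j) % p          # the j * q(x) term
        coeffs = new
    ans, x = 1, a
    while x + k - 1 <= b:
        # Horner evaluation of q at the start of each block
        val = 0
        for c in reversed(coeffs):
            val = (val * x + c) % p
        ans = ans * val % p
        x += k
    while x <= b:                                  # leftover factors
        ans = ans * x % p
        x += 1
    return ans
```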

Solution 5

Expanding on my comment, this takes about 50% of the time for all n in [100, 100007] where m=(117 | 1117):

Function facmod(n As Integer, m As Integer) As Integer
    Dim f As Integer = 1
    For i As Integer = 2 To n
        f = f * i
        If f >= m Then   ' >= so that f = m reduces to 0 rather than surviving
            f = f Mod m
        End If
    Next
    Return f
End Function
Author: John Smith

Updated on July 09, 2022

Comments

  • John Smith
    John Smith almost 2 years

    I was curious if there was a good way to do this. My current code is something like:

    def factorialMod(n, modulus):
        ans=1
        for i in range(1,n+1):
            ans = ans * i % modulus    
        return ans % modulus
    

    But it seems quite slow!

    I also can't calculate n! and then apply the prime modulus because sometimes n is so large that n! is just not feasible to calculate explicitly.

    I also came across http://en.wikipedia.org/wiki/Stirling%27s_approximation and wonder if this can be used at all here in some way?

    Or, how might I create a recursive, memoized function in C++?

    • Fred Foo
      Fred Foo about 12 years
      How slow is slow? From your pseudocode, I infer you're computing this in Python, is that right?
    • John Smith
      John Smith about 12 years
      Any language, really; it's pretty much the same in C++ in terms of syntax. I chose Python here because it's easy to read. Even in C++, though, I need a faster function.
    • Mysticial
      Mysticial about 12 years
      There's a very fast way to do this using invariant multiplication or possibly Montgomery reduction. Both methods eliminate the modulus and will allow for loop-unrolling techniques.
    • davin
      davin about 12 years
      You can break down modulus into prime factors to identify cases that will be zero more easily, although that won't help for large prime factors - how helpful this is depends on what you know about the modulus, if anything, and if prime factorisation tickles your fancy.
    • Andrew Morton
      Andrew Morton about 12 years
      You can shave a bit of time off by only doing the mod if ans > modulus (credit: tech.groups.yahoo.com/group/primenumbers/messages/… )
    • John Smith
      John Smith about 12 years
      @kilotaras: Yes, M is prime (sorry, did not know this was relevant; will change post)
    • BlueRaja - Danny Pflughoeft
      BlueRaja - Danny Pflughoeft about 12 years
      @John: I've improved my answer, and included some Python code. This version should improve performance greatly when n >> m/2 *(performance will be the same when n <= m/2). I really don't see how memoization could be implemented with this problem - there is nothing to memoize.
    • Thomas Ahle
      Thomas Ahle about 11 years
      If n! mod m could be calculated in polynomial time (in the number of bits), factoring would be in P. At least so they say on this complexity blog: rjlipton.wordpress.com/2013/05/06/a-most-perplexing-mystery
    • Alex M.
      Alex M. over 9 years
      In fact, the link that @ThomasAhle meant to give is: rjlipton.wordpress.com/2009/02/23/factoring-and-factorials . As a side note, I think that the ideas expressed there are flawed, as noticed in a comment on that page. No wonder there is no other mention of this fact on the whole internet.
  • John Smith
    John Smith about 12 years
    however, for large n, calculating n! and then performing mod is not feasible
  • cdeszaq
    cdeszaq about 12 years
    Not feasible...why? Due to memory constraints? From the question, speed was the issue, not memory. If you are looking to have as small a memory footprint as possible and then optimize for speed, please update your question to reflect this.
  • John Smith
    John Smith about 12 years
    Is there usually a tradeoff between speed and memory when it comes to factorials? I'll update the question either way
  • cdeszaq
    cdeszaq about 12 years
    @JohnSmith - There's always a speed/memory trade-off with any non-trivial calculation. I have updated my answer with some other non-approximate ways of calculating the values in a less time-dependent way. Typically, you run into issues dealing with gigantic numbers when you deal with factorials, which can lead to memory problems in some cases, but there are ways of handling it. (See Java's BigInt class)
  • sdcvvc
    sdcvvc about 12 years
    -1 Computing n! and then mod is very slow, please try to compute 2000000! mod 5250307 that way. OP is doing it better in the question, you should interleave multiplication and taking modulo.
  • Mysticial
    Mysticial about 12 years
It might be nice to get a comment from whoever just downvoted the 3 top answers.
  • BlueRaja - Danny Pflughoeft
    BlueRaja - Danny Pflughoeft about 12 years
@cdeszaq: What you seem to be missing is that multiplying two extremely large numbers (larger than the size of a register) is not O(1) on a computer: it's closer to O(m log m) (m = #bits). Multiplying two m-bit numbers results in (m+m)-bits, so your method takes approximately m log(m) + 2m log(m) + 3m log(m) + ... + nm log(m) = m log(m) * n(n+1)/2 = O(m n^2 log(m)) operations. Taking a modulus after each operation, however, would cost about 2(m log(m)) + 2(m log(m)) + ... (n terms) ... + 2(m log(m)) = 2mn log(m) = O(mn log(m)), which is significantly faster, even for small n.
  • John Smith
    John Smith about 12 years
    How might recursion + memoization be done in C++ for factoral mod m?
  • Mysticial
    Mysticial about 12 years
    @JohnSmith TBH, Memoization is probably not going to help at all - there's nothing to memoize. The only way it might become helpful is if you try the prime-factorization approach and use the windowing algorithm for exponentiation by squaring. (The windowing algorithm is a memoization algorithm.) But prime factorizing all integers from 1 to n will probably be slower than your current algorithm.
  • John Smith
    John Smith about 12 years
    Well in my case I am iterating from low n to high n, so doesn't that mean I can save time by storing values I've already calculated? For large n it seems like it'd save a lot of time by only doing a couple iterations rather than go from i=1 to n or n/2
  • Mysticial
    Mysticial about 12 years
    Well... There's nothing to "save". Knowing a couple iterations won't help you with the rest of them.
  • John Smith
    John Smith about 12 years
    But if I am calculating a bunch of numbers on the order of something like (20 million)! and greater, don't these iterations matter?
  • John Smith
    John Smith about 12 years
    I'll just save this as answer and make another question
  • Mysticial
    Mysticial about 12 years
    Ah ok, I was going to ask to clarify what you mean by "iterations matter". But I'll wait for your new question.
  • Daniel Fischer
    Daniel Fischer about 12 years
    "Thus, when m/2 < n < m, you only need to calculate (m/2)! * (-2)^(n-m/2-1) (mod m)" You can do better then. By Wilson's theorem, (m-1)! ≡ -1 (mod m) if m is prime. Now (m-1)! = n! * (m - (m-n-1)) * ... * (m - 1) ≡ (-1)^(m-n-1) * n! * (m-n-1)! (mod m), so n! ≡ (-1)^(m-n) * ((m-n-1)!)^(-1) (mod m). So you need to calculate (m-n-1)! mod m, find its modular inverse (O(log m) steps), and adjust the sign if necessary. Not much difference when n is close to m/2, but nice when n > 3m/4 or so.
  • BlueRaja - Danny Pflughoeft
    BlueRaja - Danny Pflughoeft about 12 years
    @DanielFischer: Thanks! I've included that in the answer.
  • ohad
    ohad over 9 years
    The algorithm of "substituting many points into one polynomial" is described also in the well known book "introduction to algorithms" by H. Cormen and others (in the FFT chapter).
  • Douglas Zare
    Douglas Zare about 9 years
    This is not helpful. BlueRaja-Danny-Pflughoeft already mentioned Wilson's theorem, and it doesn't do much because you can't count on needing just (m-1)!, or (m-k)! for small k, which his answer covered but yours didn't.
  • Bogdan Alexandru
    Bogdan Alexandru about 9 years
    Computing n! for very large n is not only slow, but quite impossible because the numbers get so large you can't address them any more.
  • Mercado
    Mercado over 7 years
    It's exactly the same as the naive algorithm implemented as a recursive function.
  • Robin Houston
    Robin Houston over 6 years
    This is a much better answer than the accepted answer.
  • Spencer
    Spencer about 2 years
I came to this question wondering about the specific case where m=4k+1 and n=2k, k possibly VERY large.