Fast prime factorization module

65,353

Solution 1

If you don't want to reinvent the wheel, use the library sympy

pip install sympy

Use the function sympy.ntheory.factorint

Given a positive integer n, factorint(n) returns a dict containing the prime factors of n as keys and their respective multiplicities as values. For example:

Example:

>>> from sympy.ntheory import factorint
>>> factorint(10**20+1)
{73: 1, 5964848081: 1, 1676321: 1, 137: 1}

You can factor some very large numbers:

>>> factorint(10**100+1)
{401: 1, 5964848081: 1, 1676321: 1, 1601: 1, 1201: 1, 137: 1, 73: 1, 129694419029057750551385771184564274499075700947656757821537291527196801: 1}

Solution 2

There is no need to calculate smallprimes using primesbelow, use smallprimeset for that.

smallprimes = (2,) + tuple(n for n in xrange(3,1000,2) if n in smallprimeset)

Divide your primefactors into two functions for handling smallprimes and other for pollard_brent, this can save a couple of iterations as all the powers of smallprimes will be divided from n.

def primefactors(n, sort=False):
    factors = []

    limit = int(n ** .5) + 1
    for checker in smallprimes:
        print smallprimes[-1]
        if checker > limit: break
        while n % checker == 0:
            factors.append(checker)
            n //= checker


    if n < 2: return factors
    else : 
        factors.extend(bigfactors(n,sort))
        return factors

def bigfactors(n, sort = False):
    factors = []
    while n > 1:
        if isprime(n):
            factors.append(n)
            break
        factor = pollard_brent(n) 
        factors.extend(bigfactors(factor,sort)) # recurse to factor the not necessarily prime factor returned by pollard-brent
        n //= factor

    if sort: factors.sort()    
    return factors

By considering verified results of Pomerance, Selfridge and Wagstaff and Jaeschke, you can reduce the repetitions in isprime which uses Miller-Rabin primality test. From Wiki.

  • if n < 1,373,653, it is enough to test a = 2 and 3;
  • if n < 9,080,191, it is enough to test a = 31 and 73;
  • if n < 4,759,123,141, it is enough to test a = 2, 7, and 61;
  • if n < 2,152,302,898,747, it is enough to test a = 2, 3, 5, 7, and 11;
  • if n < 3,474,749,660,383, it is enough to test a = 2, 3, 5, 7, 11, and 13;
  • if n < 341,550,071,728,321, it is enough to test a = 2, 3, 5, 7, 11, 13, and 17.

Edit 1: Corrected return call of if-else to append bigfactors to factors in primefactors.

Solution 3

Even on the current one, there are several spots to be noticed.

  1. Don't do checker*checker every loop, use s=ceil(sqrt(num)) and checher < s
  2. checher should plus 2 each time, ignore all even numbers except 2
  3. Use divmod instead of % and //

Solution 4

You should probably do some prime detection which you could look here, Fast algorithm for finding prime numbers?

You should read that entire blog though, there is a few algorithms that he lists for testing primality.

Solution 5

There's a python library with a collection of primality tests (including incorrect ones for what not to do). It's called pyprimes. Figured it's worth mentioning for posterity's purpose. I don't think it includes the algorithms you mentioned.

Share:
65,353
orlp
Author by

orlp

Computer Science PhD student at CWI Amsterdam. Cryptography, information theory, compression, computer graphics, low-level optimization, discrete mathematics, algorithms &amp; data structures - it all interests me! Favourite languages: Rust, Python, C++, C. Achievements: Gold C++ badge #238 Gold Python badge #236 Gold C badge #225 #1021 to reach 100,000 reputation Pattern-defeating quicksort

Updated on July 17, 2022

Comments

  • orlp
    orlp almost 2 years

    I am looking for an implementation or clear algorithm for getting the prime factors of N in either python, pseudocode or anything else well-readable. There are a few requirements/constraints:

    • N is between 1 and ~20 digits
    • No pre-calculated lookup table, memoization is fine though
    • Need not to be mathematically proven (e.g. could rely on the Goldbach conjecture if needed)
    • Need not to be precise, is allowed to be probabilistic/deterministic if needed

    I need a fast prime factorization algorithm, not only for itself, but for usage in many other algorithms like calculating the Euler phi(n).

    I have tried other algorithms from Wikipedia and such but either I couldn't understand them (ECM) or I couldn't create a working implementation from the algorithm (Pollard-Brent).

    I am really interested in the Pollard-Brent algorithm, so any more information/implementations on it would be really nice.

    Thanks!

    EDIT

    After messing around a little I have created a pretty fast prime/factorization module. It combines an optimized trial division algorithm, the Pollard-Brent algorithm, a miller-rabin primality test and the fastest primesieve I found on the internet. gcd is a regular Euclid's GCD implementation (binary Euclid's GCD is much slower then the regular one).

    Bounty

    Oh joy, a bounty can be acquired! But how can I win it?

    • Find an optimization or bug in my module.
    • Provide alternative/better algorithms/implementations.

    The answer which is the most complete/constructive gets the bounty.

    And finally the module itself:

    import random
    
    def primesbelow(N):
        # http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
        #""" Input N>=6, Returns a list of primes, 2 <= p < N """
        correction = N % 6 > 1
        N = {0:N, 1:N-1, 2:N+4, 3:N+3, 4:N+2, 5:N+1}[N%6]
        sieve = [True] * (N // 3)
        sieve[0] = False
        for i in range(int(N ** .5) // 3 + 1):
            if sieve[i]:
                k = (3 * i + 1) | 1
                sieve[k*k // 3::2*k] = [False] * ((N//6 - (k*k)//6 - 1)//k + 1)
                sieve[(k*k + 4*k - 2*k*(i%2)) // 3::2*k] = [False] * ((N // 6 - (k*k + 4*k - 2*k*(i%2))//6 - 1) // k + 1)
        return [2, 3] + [(3 * i + 1) | 1 for i in range(1, N//3 - correction) if sieve[i]]
    
    smallprimeset = set(primesbelow(100000))
    _smallprimeset = 100000
    def isprime(n, precision=7):
        # http://en.wikipedia.org/wiki/Miller-Rabin_primality_test#Algorithm_and_running_time
        if n < 1:
            raise ValueError("Out of bounds, first argument must be > 0")
        elif n <= 3:
            return n >= 2
        elif n % 2 == 0:
            return False
        elif n < _smallprimeset:
            return n in smallprimeset
    
    
        d = n - 1
        s = 0
        while d % 2 == 0:
            d //= 2
            s += 1
    
        for repeat in range(precision):
            a = random.randrange(2, n - 2)
            x = pow(a, d, n)
        
            if x == 1 or x == n - 1: continue
        
            for r in range(s - 1):
                x = pow(x, 2, n)
                if x == 1: return False
                if x == n - 1: break
            else: return False
    
        return True
    
    # https://comeoncodeon.wordpress.com/2010/09/18/pollard-rho-brent-integer-factorization/
    def pollard_brent(n):
        if n % 2 == 0: return 2
        if n % 3 == 0: return 3
    
        y, c, m = random.randint(1, n-1), random.randint(1, n-1), random.randint(1, n-1)
        g, r, q = 1, 1, 1
        while g == 1:
            x = y
            for i in range(r):
                y = (pow(y, 2, n) + c) % n
    
            k = 0
            while k < r and g==1:
                ys = y
                for i in range(min(m, r-k)):
                    y = (pow(y, 2, n) + c) % n
                    q = q * abs(x-y) % n
                g = gcd(q, n)
                k += m
            r *= 2
        if g == n:
            while True:
                ys = (pow(ys, 2, n) + c) % n
                g = gcd(abs(x - ys), n)
                if g > 1:
                    break
    
        return g
    
    smallprimes = primesbelow(1000) # might seem low, but 1000*1000 = 1000000, so this will fully factor every composite < 1000000
    def primefactors(n, sort=False):
        factors = []
    
        for checker in smallprimes:
            while n % checker == 0:
                factors.append(checker)
                n //= checker
            if checker > n: break
    
        if n < 2: return factors
    
        while n > 1:
            if isprime(n):
                factors.append(n)
                break
            factor = pollard_brent(n) # trial division did not fully factor, switch to pollard-brent
            factors.extend(primefactors(factor)) # recurse to factor the not necessarily prime factor returned by pollard-brent
            n //= factor
    
        if sort: factors.sort()
    
        return factors
    
    def factorization(n):
        factors = {}
        for p1 in primefactors(n):
            try:
                factors[p1] += 1
            except KeyError:
                factors[p1] = 1
        return factors
    
    totients = {}
    def totient(n):
        if n == 0: return 1
    
        try: return totients[n]
        except KeyError: pass
    
        tot = 1
        for p, exp in factorization(n).items():
            tot *= (p - 1)  *  p ** (exp - 1)
    
        totients[n] = tot
        return tot
    
    def gcd(a, b):
        if a == b: return a
        while b > 0: a, b = b, a % b
        return a
    
    def lcm(a, b):
        return abs((a // gcd(a, b)) * b)
    
  • orlp
    orlp over 13 years
    I need to do checker*checker because num decreases constantly. I'll implement the even numbers skip though. The divmod decreases the function a lot (it will calculate the // on every loop, instead of only when checker divides n).
  • John La Rooy
    John La Rooy over 13 years
    @night, you just need to recalcuate s whenever you alter num then
  • orlp
    orlp over 13 years
    True, figured that while messing around :) Seems to be faster to recalculate sqrt then checker*checker.
  • Kabie
    Kabie over 13 years
    @nightcracker: Let N=n*n+1, ceil(sqrt(N)) cost about 2~4 times than n*n, num does not change that frequently.
  • orlp
    orlp over 13 years
    Do you know of an ceil/floor sqrt algorithm, because int(num ** .5) + 1 seems to be overkill (first calculate in float precision and then chop off).
  • orlp
    orlp over 13 years
    Enjoy your +100 (you're the only one who answered since the bounty). Your bigfactors is horribly inefficient though, because factors.extend(bigfactors(factor)) recurses back to bigfactors which is just plain wrong (what if pollard-brent finds the factor 234892, you don't want to factorize that with pollard-brent again). If you change factors.extend(bigfactors(factor)) to factors.extend(primefactors(factor, sort)) then it's fine.
  • Rozuur
    Rozuur over 13 years
    One primefactors calls bigfactors then its clear that there is will no power of small prime in the next factors obtained by pollard-brent.
  • Rozuur
    Rozuur over 13 years
    If its inefficient I would not have answered this. Once call goes from primefactors to bigfactors there will be no factor in n which is lessthan 1000 hence pollard-brent cannot return a number whose factors will be lessthan 1000. If its not clear, reply such that i will edit my answer with more explanations
  • orlp
    orlp over 13 years
    Shit NVM, ofcourse. If N doesn't contain any factors found by smallprimes then factor F of N won't either >.<
  • Rozuur
    Rozuur over 13 years
    Also you should use smallprimeset instead of smallprime and remove smallprimeset from miller-rabin.
  • orlp
    orlp over 13 years
    @Rozuur, I didn't get your last comment. Do you mean I should use smallprimeset in primefactors? I did get the part of miller-rabin though (replace it with the Pomerance, Selfridge and Wagstaff and Jaeschke data).
  • Rozuur
    Rozuur over 13 years
    As the numbers are 10**20 you should use smallprimeset in primefactors and remove if statement which returns n in smallprimeset.
  • hynekcer
    hynekcer over 11 years
    Your primefactors is really inefficient because bigfactors is called for all numbers with small factors where the biggest factor greather than 2. This is because the last factor is skipped due to n > int(n ** .5) + 1. It can be easy fixed before calling bigfactorc by if n in smallprimes: ` factors.append(n)` else: ` ...bigfactors.... The second bug is that return any_list.extend(...)` returns None. It can be also easy fixed.
  • Rozuur
    Rozuur over 11 years
    Well I haven't changed the primefactors implementation much from the given code. Nyways it was a great find.
  • Quuxplusone
    Quuxplusone over 9 years
    Multiplication is faster than square-root, and Python has difficulty taking the square-root of bignums anyway. I get this error with the current version of the code: limit = int(n ** .5) + 1 ... OverflowError: long int too large to convert to float
  • programmer
    programmer about 5 years
    Thanks for sharing this, @Colonel Panic . This is exactly what I was looking for : an integer factorization module in a well-maintained library, rather than snippets of code.
  • user26742873
    user26742873 almost 3 years
    4. write it in C/Fortran and link it as a Python library, not in native Python. :-D
  • Daniel Donnelly
    Daniel Donnelly over 2 years
    Yes, sympy is simple!