Efficient Prime Factorization for large numbers

c++ multithreading algorithm primes prime-factoring

13,047

Solution 1

Trial division is only suitable for factoring small numbers. For n up to 2^64, you'll need a better algorithm: I recommend starting with wheel factorization to get the small factors, followed by Pollard's rho algorithm to get the rest. Where trial division is O(sqrt(n)), rho is O(sqrt(sqrt(n))), so it's much faster. For 2^64, sqrt(n) = 2^32, but sqrt(sqrt(n)) = 2^16, which is a huge improvement. You should expect to factor your numbers in a few milliseconds, at most.

I don't have C++ code for factoring, but I do have readable Python code. Let me know if you want me to post it. If you want to know more about wheel factorization and the rho algorithm, I have lots of prime number stuff at my blog.

Solution 2

for(int i  = 2; i * i <= n; ++i) //no sqrt, please
{
    while(n%i == 0) //while, not if
    {
         factors.push_back(i);
         n/=i;
    }
}
if(n != 1)
{
    factors.push_back(n);
}

This is basically a neater implementation of your algorithm. Its complexity is sqrt of N. It will work pretty quickly even for a 18-digit number, but only if the prime factors are all small. If it's a product of two large prime numbers, or worse, is prime itself, this will run for approximately 10 seconds.

Solution 3

A simple speedup of two can easily be achieved by changing your loop:

if (n % 2) {
    return vec.push_back(i), getPrimes(n / i, vec);
}

for (int i = 3; i <= sqrt_of_n; i += 2)
{
    if (n % i == 0) 
    {
        return vec.push_back(i), getPrimes(n / i, vec); //cause recursion
    }
}

You first should test the number by two. Then, starting from 3 you test again incrementing your loop by two at a time. You already know thay 4, 6, 8, ... are even numbers and have 2 as a factor. Testing against even numbers you're reducing your complexity by half.

To factor a number N you only need the prime numbers <= sqrt(N). For a 18 digit number you only need to test against all primes less than 1e9, and since there are 98 millon primes less than 2e9 you can easily store 100 millon numbers on today's computers and run the factoring in parallel. If each number takes 8 bytes of RAM (int64_t), 100 millon primes would take 800 MB of memory. This algorithm is the classic solution to SPOJ problem #2, Prime Generator.

The best way to list all the small primes that can fit on a 32-bit int is to build a Sieve of Eratostenes. I told you that we need the primes less than sqrt(N) to factor any N, so to factor 64 bit integers you need all the primes that fit as a 32-bit number.

13,047

Author by

booky99

My name is Alexander "Alex" Miller. I am a Computer Science Major at DePauw University (class of 2016). While receiving technical knowledge, I am exploring Mobile Development outlets to better assist myself and others. I find stackoverflow an enjoyable place to collaborate and help fellow programmers in there coding issues.

Updated on June 04, 2022

Comments

booky99 almost 2 years

I've been working on a little problem where I need to compute 18-digit numbers into their respective prime factorization. Everything compiles and it runs just fine, considering that it actually works, but I am looking to reduce the run time of the prime factorization. I have implemented recursion and threading but I think I might need some help in understanding possible algorithms for large number computation.

Every time I run this on the 4 numbers I have pre-made, it takes about 10 seconds. I would like to reduce this to possibly 0.06 seconds if there are any ideas out there.

I noticed a few algorithms like Sieve of Eratosthenes and producing a list of all the prime numbers prior to computing. I'm just wondering if someone could elaborate on it. For instance, I'm having issues understanding how to implement Sieve of Eratosthenes into my program or if it would even be a good idea. Any and all pointers on how to approach this better would be really helpful!

Here is my code:

#include <iostream>
#include <thread>
#include <vector>
#include <chrono>

using namespace std;
using namespace std::chrono;

vector<thread> threads;
vector<long long> inputVector;
bool developer = false; 
vector<unsigned long long> factor_base;
vector<long long> primeVector;

class PrimeNumber
{
    long long initValue;        // the number being prime factored
    vector<long long> factors;  // all of the factor values
public:
    void setInitValue(long long n)
    {
        initValue = n;
    }
    void addToVector(long long m)
    {
        factors.push_back(m);
    }
    void setVector(vector<long long> m)
    {
        factors = m;
    }
    long long getInitValue()
    {
        return initValue;
    }
    vector<long long> getVector()
    {
        return factors;
    }
};

vector<PrimeNumber> primes;

// find primes recursively and have them returned in vectors
vector<long long> getPrimes(long long n, vector<long long> vec)
{
    double sqrt_of_n = sqrt(n);

    for (int i = 2; i <= sqrt_of_n; i++)
    {
        if (n % i == 0) 
        {
            return vec.push_back(i), getPrimes(n / i, vec); //cause recursion
        }
    }

    // pick up the last prime factorization number
    vec.push_back(n);

    //return the finished vector
    return vec;
}

void getUserInput()
{
    long long input = -1;
    cout << "Enter all of the numbers to find their prime factors. Enter 0 to compute" << endl;
    do
    {
        cin >> input;
        if (input == 0)
        {
            break;
        }
        inputVector.push_back(input);
    } while (input != 0);
}

int main() 
{

    vector<long long> temp1;   // empty vector
    vector<long long> result1; // temp vector

    if (developer == false)
    {
        getUserInput();
    }
    else
    {
        cout << "developer mode active" << endl;
        long long a1 = 771895004973090566;
        long long b1 = 788380500764597944;
        long long a2 = 100020000004324000;
        long long b2 = 200023423420000000;
        inputVector.push_back(a1);
        inputVector.push_back(b2);
        inputVector.push_back(b1);
        inputVector.push_back(a2);
    }

    high_resolution_clock::time_point time1 = high_resolution_clock::now();

    // give each thread a number to comput within the recursive function
    for (int i = 0; i < inputVector.size(); i++)
    {   
        PrimeNumber prime;
        prime.setInitValue(inputVector.at(i));
        threads.push_back(thread([&]{
            prime.setVector(result1 = getPrimes(inputVector.at(i), temp1));
            primes.push_back(prime);
        }));
    }

    // allow all of the threads to join back together.
    for (auto& th : threads)
    {
        cout << th.get_id() << endl;
        th.join();
    }

    high_resolution_clock::time_point time2 = high_resolution_clock::now();

    // print all of the information
    for (int i = 0; i < primes.size(); i++)
    {
        vector<long long> temp = primes.at(i).getVector();

        for (int m = 0; m < temp.size(); m++)
        {
            cout << temp.at(m) << " ";
        }
        cout << endl;
    }

    cout << endl;

    // so the running time
    auto duration = duration_cast<microseconds>(time2 - time1).count();

    cout << "Duration: " << (duration / 1000000.0) << endl;

    return 0;
}