Segmented Sieve of Eratosthenes?


Solution 1

The basic idea of a segmented sieve is to choose the sieving primes less than the square root of n, choose a reasonably large segment size that nevertheless fits in memory, and then sieve each of the segments in turn, starting with the smallest. At the first segment, the smallest multiple of each sieving prime that is within the segment is calculated, then multiples of the sieving prime are marked as composite in the normal way; when all the sieving primes have been used, the remaining unmarked numbers in the segment are prime. Then, for the next segment, for each sieving prime you already know the first multiple in the current segment (it was the multiple that ended the sieving for that prime in the prior segment), so you sieve on each sieving prime, and so on until you are finished.

The size of n doesn't matter, except that a larger n will take longer to sieve than a smaller n; the size that matters is the size of the segment, which should be as large as convenient (say, the size of the primary memory cache on the machine).
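
The first step the paragraph above mentions, collecting the sieving primes below the square root, needs only a small, ordinary sieve. A minimal sketch (the function name is mine, not from any particular implementation):

```cpp
#include <vector>

// Ordinary (non-segmented) sieve, used only up to sqrt(n); the handful of
// primes it returns then drives the sieving of every segment.
std::vector<long long> sievingPrimes(long long limit) {
    std::vector<char> isPrime(limit + 1, 1);
    std::vector<long long> ps;
    for (long long p = 2; p <= limit; ++p) {
        if (!isPrime[p]) continue;
        ps.push_back(p);
        for (long long m = p * p; m <= limit; m += p)
            isPrime[m] = 0;
    }
    return ps;
}
```

Even for n = 10^12 this auxiliary sieve only runs to 10^6, so it is tiny compared to the range being sieved.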

You can see a simple implementation of a segmented sieve here. Note that a segmented sieve will be very much faster than O'Neill's priority-queue sieve mentioned in another answer; if you're interested, there's an implementation here.

EDIT: I wrote this for a different purpose, but I'll show it here because it might be useful:

Though the Sieve of Eratosthenes is very fast, it requires O(n) space. That can be reduced to O(sqrt(n)) for the sieving primes plus O(1) for the bitarray by performing the sieving in successive segments. At the first segment, the smallest multiple of each sieving prime that is within the segment is calculated, then multiples of the sieving prime are marked composite in the normal way; when all the sieving primes have been used, the remaining unmarked numbers in the segment are prime. Then, for the next segment, the smallest multiple of each sieving prime is the multiple that ended the sieving in the prior segment, and so the sieving continues until finished.

Consider the example of sieving from 100 to 200 in segments of 20. The five sieving primes are 3, 5, 7, 11 and 13. In the first segment from 100 to 120, the bitarray has ten slots, with slot 0 corresponding to 101, slot k corresponding to 100+2k+1, and slot 9 corresponding to 119. The smallest multiple of 3 in the segment is 105, corresponding to slot 2; slots 2+3=5 and 5+3=8 are also multiples of 3. The smallest multiple of 5 is 105 at slot 2, and slot 2+5=7 is also a multiple of 5. The smallest multiple of 7 is 105 at slot 2, and slot 2+7=9 is also a multiple of 7. And so on.
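
The slot arithmetic in that example can be checked mechanically. This small helper (my own, assuming the odd-only layout lo + 2i + 1 described above, with lo even and p odd) returns the slot of the smallest multiple of p inside the segment:

```cpp
// Slot of the smallest odd multiple of p in the segment starting at even lo,
// where slot i represents the odd number lo + 2*i + 1.
long long firstSlot(long long lo, long long p) {
    long long m = ((lo + 1 + p - 1) / p) * p;  // first multiple of p >= lo + 1
    if (m % 2 == 0) m += p;                    // p is odd, so m + p is the next odd multiple
    return (m - lo - 1) / 2;
}
```

For lo = 100 it gives slot 2 for p = 3, 5 and 7, slot 10 for p = 11 and slot 8 for p = 13, matching the walkthrough.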

Function primesRange takes arguments lo, hi and delta; lo and hi must be even, with lo < hi, and lo must be greater than sqrt(hi). The segment size is twice delta. Ps is a linked list containing the sieving primes less than sqrt(hi), with 2 removed since even numbers are ignored. Qs is a linked list containing the offset into the sieve bitarray of the smallest multiple in the current segment of the corresponding sieving prime. After each segment, lo advances by twice delta, so the number corresponding to an index i of the sieve bitarray is lo + 2i + 1.

function primesRange(lo, hi, delta)
    function qInit(p)
        return (-1/2 * (lo + p + 1)) % p
    function qReset(p, q)
        return (q - delta) % p
    sieve := makeArray(0..delta-1)
    ps := tail(primes(sqrt(hi)))
    qs := map(qInit, ps)
    while lo < hi
        for i from 0 to delta-1
            sieve[i] := True
        for p,q in ps,qs
            for i from q to delta step p
                sieve[i] := False
        qs := map(qReset, ps, qs)
        for i,t from 0,lo+1 to delta-1,hi step 1,2
            if sieve[i]
                output t
        lo := lo + 2 * delta

When called as primesRange(100, 200, 10), the sieving primes ps are [3, 5, 7, 11, 13]; qs is initially [2, 2, 2, 10, 8] corresponding to smallest multiples 105, 105, 105, 121 and 117, and is reset for the second segment to [1, 2, 6, 0, 11] corresponding to smallest multiples 123, 125, 133, 121 and 143.
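
A direct C++ rendering of the pseudocode (my own translation, with the same preconditions: lo and hi even, lo < hi, lo > sqrt(hi)) might look like this:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Segmented sieve over the odd numbers in (lo, hi]: slot i of each segment
// represents lo + 2*i + 1, and the segment covers 2*delta numbers.
std::vector<long long> primesRange(long long lo, long long hi, long long delta) {
    // sieving primes: odd primes <= sqrt(hi), found with a plain sieve
    long long limit = static_cast<long long>(std::sqrt(static_cast<double>(hi)));
    std::vector<char> small(limit + 1, 1);
    std::vector<long long> ps, qs;
    for (long long p = 2; p * p <= limit; ++p)
        if (small[p])
            for (long long m = p * p; m <= limit; m += p) small[m] = 0;
    for (long long p = 3; p <= limit; ++p)
        if (small[p]) {
            ps.push_back(p);
            qs.push_back((p - ((lo + p + 1) / 2) % p) % p);  // qInit, kept non-negative
        }

    std::vector<char> sieve(delta);
    std::vector<long long> out;
    while (lo < hi) {
        std::fill(sieve.begin(), sieve.end(), 1);
        for (std::size_t j = 0; j < ps.size(); ++j)
            for (long long i = qs[j]; i < delta; i += ps[j])
                sieve[i] = 0;
        for (std::size_t j = 0; j < ps.size(); ++j)
            qs[j] = ((qs[j] - delta) % ps[j] + ps[j]) % ps[j];  // qReset
        for (long long i = 0; i < delta && lo + 2 * i + 1 <= hi; ++i)
            if (sieve[i]) out.push_back(lo + 2 * i + 1);
        lo += 2 * delta;
    }
    return out;
}
```

Called as primesRange(100, 200, 10) it returns the 21 primes between 100 and 200, beginning 101, 103, 107 and ending 199, in agreement with the worked example.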

You can see this program in action at http://ideone.com/iHYr1f. And in addition to the links shown above, if you are interested in programming with prime numbers I modestly recommend this essay at my blog.

Solution 2

There's a version of the Sieve based on priority queues that yields as many primes as you request, rather than all of them up to an upper bound. It's discussed in the classic paper "The Genuine Sieve of Eratosthenes" and googling for "sieve of eratosthenes priority queue" turns up quite a few implementations in various programming languages.
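
As a sketch of the idea (my own C++, not O'Neill's Haskell): each prime discovered so far owns a stream of its composites, and a min-heap keyed on the next composite of each stream tells us whether the current candidate is prime:

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Incremental sieve with a priority queue: when candidate n is smaller than
// every pending composite, n is prime and starts its own stream at n*n;
// otherwise all streams whose head equals n are advanced by their prime.
std::vector<std::uint64_t> firstPrimes(std::size_t count) {
    using Entry = std::pair<std::uint64_t, std::uint64_t>;  // (next composite, prime)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    std::vector<std::uint64_t> primes;
    for (std::uint64_t n = 2; primes.size() < count; ++n) {
        if (heap.empty() || heap.top().first > n) {
            primes.push_back(n);       // no stream hits n, so n is prime
            heap.push({n * n, n});     // its composites start at n*n
        } else {
            while (!heap.empty() && heap.top().first == n) {
                Entry e = heap.top();
                heap.pop();
                heap.push({e.first + e.second, e.second});  // advance the stream
            }
        }
    }
    return primes;
}
```

Unlike a bounded sieve, this yields primes for as long as you keep asking; the paper's refinements (lazy stream creation, wheels) improve the constant factors but not the basic shape.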

Solution 3

It's just a matter of doing segments with the sieve we already have. The basic idea: say we have to find the prime numbers between 85 and 100. We apply the traditional sieve, but in the fashion described below:

We take the first prime number, 2, divide the starting number by 2 (85/2) and round down to get 42; multiplying by 2 again gives p = 84, and from there we keep adding 2 up to the last number. So we have removed all the multiples of 2 (86, 88, 90, 92, 94, 96, 98, 100) in the range.

We take the next prime number, 3, divide the starting number by 3 (85/3) and round down to get 28; multiplying by 3 again gives p = 84, and from there we keep adding 3 up to the last number. So we have removed all the multiples of 3 (87, 90, 93, 96, 99) in the range.

We take the next prime number, 5, and so on, repeating the steps above. The primes themselves (2, 3, 5, 7, ...) come from the traditional sieve up to sqrt(n), and are then used for the segmented sieve.
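
Putting those steps into code (a sketch of my own following the description; names are illustrative): for the 85..100 example it leaves exactly 89 and 97.

```cpp
#include <vector>

// For each prime p up to sqrt(hi): divide lo by p, round down, multiply back,
// then step by p through [lo, hi] striking out multiples, as described above.
std::vector<int> segmentPrimes(int lo, int hi) {
    // traditional sieve for the primes up to sqrt(hi)
    int limit = 1;
    while ((limit + 1) * (limit + 1) <= hi) ++limit;
    std::vector<char> small(limit + 1, 1);
    std::vector<int> primes;
    for (int p = 2; p <= limit; ++p) {
        if (!small[p]) continue;
        primes.push_back(p);
        for (int m = p * p; m <= limit; m += p) small[m] = 0;
    }
    // segmented part over [lo, hi]
    std::vector<char> isPrime(hi - lo + 1, 1);
    for (int p : primes) {
        int start = (lo / p) * p;    // divide, round down, multiply back
        if (start < lo) start += p;  // first multiple of p inside the range
        if (start == p) start += p;  // never strike the prime itself
        for (int m = start; m <= hi; m += p) isPrime[m - lo] = 0;
    }
    std::vector<int> out;
    for (int m = lo; m <= hi; ++m)
        if (m > 1 && isPrime[m - lo]) out.push_back(m);
    return out;
}
```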

Solution 4

If someone would like to see a C++ implementation, here is mine:

#include <cmath>
#include <iostream>
#include <memory>
#include <vector>

void sito_delta(int delta, std::vector<int> &res)
{
    std::unique_ptr<int[]> results(new int[delta + 1]);
    for (int i = 0; i <= delta; ++i)
        results[i] = 1;

    int pierw = static_cast<int>(std::sqrt(delta));
    for (int j = 2; j <= pierw; ++j)
    {
        if (results[j])
        {
            for (int k = 2 * j; k <= delta; k += j)
                results[k] = 0;
        }
    }

    for (int m = 2; m <= delta; ++m)
        if (results[m])
        {
            res.push_back(m);
            std::cout << "," << m;
        }
}

void sito_segment(int n, std::vector<int> &fiPri)
{
    int delta = static_cast<int>(std::sqrt(n));

    if (delta > 10)
    {
        sito_segment(delta, fiPri);
        // compute using fiPri as primes
        std::vector<int> prime = fiPri;
        int offset = delta;
        int low = offset;
        int high = offset * 2;
        while (low < n)
        {
            if (high >= n) high = n;
            std::vector<int> mark(offset + 1, 1);

            for (std::size_t j = 0; j < prime.size(); ++j)
            {
                // find the smallest multiple of prime[j] in [low..high]
                int lowMinimum = (low / prime[j]) * prime[j];
                if (lowMinimum < low)
                    lowMinimum += prime[j];

                // mark multiples of prime[j] in [low..high]
                for (int k = lowMinimum; k <= high; k += prime[j])
                    mark[k - low] = 0;
            }

            for (int i = low; i <= high; i++)
                if (mark[i - low])
                {
                    fiPri.push_back(i);
                    std::cout << "," << i;
                }
            low = low + offset;
            high = high + offset;
        }
    }
    else
    {
        std::vector<int> prime;
        sito_delta(delta, prime);
        fiPri = prime;
        int offset = delta;
        int low = offset;
        int high = offset * 2;
        // process segments one by one
        while (low < n)
        {
            if (high >= n) high = n;
            std::vector<int> mark(offset + 1, 1);

            for (std::size_t j = 0; j < prime.size(); ++j)
            {
                // find the smallest multiple of prime[j] in [low..high]
                int lowMinimum = (low / prime[j]) * prime[j];
                if (lowMinimum < low)
                    lowMinimum += prime[j];

                // mark multiples of prime[j] in [low..high]
                for (int k = lowMinimum; k <= high; k += prime[j])
                    mark[k - low] = 0;
            }

            for (int i = low; i <= high; i++)
                if (mark[i - low])
                {
                    fiPri.push_back(i);
                    std::cout << "," << i;
                }
            low = low + offset;
            high = high + offset;
        }
    }
}

int main()
{
    std::vector<int> fiPri;
    sito_segment(1013, fiPri);
}

Solution 5

Based on Swapnil Kumar's answer, I wrote the following algorithm in C. It was built with mingw32-make.exe.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
    const int MAX_PRIME_NUMBERS = 5000000;//The number of prime numbers we are looking for
    long long *prime_numbers = malloc(sizeof(long long) * MAX_PRIME_NUMBERS);
    prime_numbers[0] = 2;
    prime_numbers[1] = 3;
    prime_numbers[2] = 5;
    prime_numbers[3] = 7;
    prime_numbers[4] = 11;
    prime_numbers[5] = 13;
    prime_numbers[6] = 17;
    prime_numbers[7] = 19;
    prime_numbers[8] = 23;
    prime_numbers[9] = 29;
    const int BUFFER_POSSIBLE_PRIMES = 29 * 29;//Because the greatest prime number we have is 29 in the 10th position, I started with a block of 841 numbers
    int qt_calculated_primes = 10;//10 because we initialized the array with the first ten primes
    int possible_primes[BUFFER_POSSIBLE_PRIMES];//Will store the booleans to check valid primes
    long long iteration = 0;//Used as multiplier to the range of the buffer possible_primes
    int i;//Simple counter for loops
    while (qt_calculated_primes < MAX_PRIME_NUMBERS)
    {
        for (i = 0; i < BUFFER_POSSIBLE_PRIMES; i++)
            possible_primes[i] = 1;//set the number as prime

        int biggest_possible_prime = (int)sqrt((iteration + 1) * BUFFER_POSSIBLE_PRIMES);

        int k = 0;

        long long prime = prime_numbers[k];//First prime to be used in the check

        while (prime <= biggest_possible_prime)//We don't need to check primes bigger than the square root
        {
            for (i = 0; i < BUFFER_POSSIBLE_PRIMES; i++)
                if ((iteration * BUFFER_POSSIBLE_PRIMES + i) % prime == 0)
                    possible_primes[i] = 0;

            if (++k == qt_calculated_primes)
                break;

            prime = prime_numbers[k];
        }
        for (i = 0; i < BUFFER_POSSIBLE_PRIMES; i++)
            if (possible_primes[i])
            {
                if ((qt_calculated_primes < MAX_PRIME_NUMBERS) && ((iteration * BUFFER_POSSIBLE_PRIMES + i) != 1))
                {
                    prime_numbers[qt_calculated_primes] = iteration * BUFFER_POSSIBLE_PRIMES + i;
                    printf("%lld\n", prime_numbers[qt_calculated_primes]);//%lld because the values are long long
                    qt_calculated_primes++;
                } else if (!(qt_calculated_primes < MAX_PRIME_NUMBERS))
                    break;
            }

        iteration++;
    }

    free(prime_numbers);
    return 0;
}

It sets a maximum number of primes to be found, then initializes an array with the known primes 2, 3, 5, ..., 29. We then make a buffer that stores the segments of possible primes; this buffer can't be greater than the square of the greatest initial prime, which here is 29 (giving a block of 29 × 29 = 841 numbers).

I'm sure there are plenty of optimizations that could improve the performance, such as parallelizing the analysis of the segments or skipping numbers that are multiples of 2, 3 and 5, but it serves as an example of low memory consumption.

Author: John Smith

Updated on January 18, 2021

Comments

  • John Smith
    John Smith over 3 years

    It's easy enough to make a simple sieve:

    for (int i=2; i<=N; i++){
        if (sieve[i]==0){
            cout << i << " is prime" << endl;
            for (int j = i; j<=N; j+=i){
                sieve[j]=1;
            }
        }
        cout << i << " has " << sieve[i] << " distinct prime factors\n";
    }
    

    But what about when N is very large and I can't hold that kind of array in memory? I've looked up segmented sieve approaches and they seem to involve finding primes up until sqrt(N) but I don't understand how it works. What if N is very large (say 10^18)?

  • John Smith
    John Smith about 12 years
    I've come across the implementations but the problem is that I don't understand them. Those papers are always quite dense. I'm mainly looking for examples because I think those are easiest to work with and understand. Technically I am using the sieve to acquire # of unique prime factors per value k for large N.
  • John Smith
    John Smith about 12 years
    I've seen lots of implementations already -- again I don't understand them fully because it's all too abstract for me. I am looking for examples.
  • user448810
    user448810 about 12 years
    Did you look? The implementation I pointed to includes a pretty good explanation.
  • John Smith
    John Smith about 12 years
    Yes I have been to that site already but I start to glaze over when they start throwing variables around willy-nilly. I am just after something purely numerical so I can gain the intuition
  • user448810
    user448810 about 12 years
    You asked for examples. The referenced site shows precisely how to sieve the range 100 to 200 in five segments, including how to choose the sieving primes and how to reset the sieving primes for each segment. Did you work out the example for yourself, by hand? What is it that you still don't understand?
  • John Smith
    John Smith about 12 years
    When they start getting into this P Q B stuff and I can't keep it all straight, let alone do it by hand. I don't know why they choose the values they do, what exactly they're doing, etc.
  • user448810
    user448810 about 12 years
    Looking at the example. The sieving primes less than the square root of 200 are 3, 5, 7, 11 and 13. Let's consider the first segment, which has the ten values {101 103 105 107 109 111 113 115 117 119}. The smallest multiple of 3 in the segment is 105, so strike 105 and each third number after: 111, 117. The smallest multiple of 5 in the segment is 105, so strike 105 and the fifth number after: 115. The smallest multiple of 7 in the segment is 105, so strike 105 and the seventh number after: 119. There is no multiple of 11 in the segment, so there is nothing to do. The smallest multiple of 13
  • user448810
    user448810 about 12 years
    in the segment is 117, so strike it. The numbers that are left {101 103 107 109 113} are prime. For the second segment {121 123 125 127 129 131 133 135 137 139} the smallest multiples of each prime are 123, 125, 133 (beyond the segment), 121 and 143 (beyond the segment), which can all be calculated by counting the next multiple of the sieving prime after the end of the first segment. Does that help?
  • John Smith
    John Smith about 12 years
    So basically it uses the primes less than the square root of the upperbound to strike out entries within each segment? Does this work if I am after the prime factorizations too? (say I change the sieve[j]=1 in my code to sieve[j]+=1)
  • NealB
    NealB about 12 years
    +1 for an excellent description of segmented sieves and the programmingpraxis link.
  • user448810
    user448810 about 12 years
    Yes, it uses the primes less than the square root of the upper bound to strike out composites. To compute prime factorizations, store the first sieving prime that strikes out each number, which is the least prime factor of the number, then divide by the least prime factor to get the cofactor, and look up the cofactor in your table of least prime factors. And so on, recursively. But that won't get you to 10^18, as the table of least prime factors will be much too large.
  • user448810
    user448810 about 12 years
    NeilB: Thank you. By the way, I write Programming Praxis.
  • John Smith
    John Smith about 12 years
    Is there a way to get to 10^18 with sieving? Does my initial code made sense? I edited it to provide some output. EDIT: Whoa I didn't know that was your site! Awesome
  • user448810
    user448810 about 12 years
    Yes. But you will need lots of memory (for all the sieving primes less than 10^9) and will have to store the output on disk. Daniel Fischer, who hangs out here on Stack Overflow, has done it. Tomas Oliveira e Silva has done it, and has a web site about it. But computing large amounts of primes is almost certainly the wrong way to do what you want -- it's the wrong way to do just about anything. Why don't you tell us what you are really trying to do, and maybe we can suggest a better method.
  • user448810
    user448810 about 12 years
    I assume you mean omega(k) for 2<=k<=N; that is, you want the result for all k in the range, not just for some of the k in the range. In that case, a sieve is fine: just run through the sieving primes to the square root of n, adding 1 at each location in the underlying array. It works fine if the sieve is segmented; for instance, in the example we found 3 sieving primes that divided 105. Seems like an odd hobby.
  • user448810
    user448810 about 12 years
    Of course, if you just want omega(k) for a single value of k, you should just factor k and strip duplicate factors.
  • John Smith
    John Smith about 12 years
    Is the way I did it in my original post the right way to do it? Underlying array for 2<=k<=square root of N. And then I figure I can use this for the rest of N?
  • user448810
    user448810 about 12 years
    I assume your sieve is elsewhere. The counting you gave should work. Try it for N = 1000. Test it by writing a function that factors k and counts the non-duplicate factors, then compare the two. Let me know how you get on (email on my profile).
  • John Smith
    John Smith about 12 years
    What is a good rule of thumb for selecting the size of the segmenting block?
  • user448810
    user448810 about 12 years
    As big as possible. But it should fit in memory, preferably in your smallest cache. Remember that you need bits, not bytes.
  • John Smith
    John Smith about 12 years
    I think I hit my first stumbling block, here. What about prime factors higher than sqrt(N) that are factors of k higher than sqrt(N)? Those would get skipped over.
  • user448810
    user448810 about 12 years
    There are no factors of k < N greater than the square root of N. However, there are primes between the square root of N and N, which you need to add to your count.
  • John Smith
    John Smith about 12 years
    Well what I mean is consider N=100 case. Primes in our initial sieve would be 2,3,5,7. Say we take B=10. When we get to the segment 40-50, 46 will have 23 (a prime) as one of its factors even though it's not part of our initial prime list.
  • user448810
    user448810 about 12 years
    I erred. When sieving primes, you can stop at the least prime factor. When counting factors, you need to consider all the prime factors, not just the least prime factor. That means you need to sieve to n/2 instead of sqrt(n). Which will make it slower to iterate to 10^18. Code is available at http://ideone.com/x1Sxw, but it uses a normal sieve, not a segmented sieve.
  • tomi.lee.jones
    tomi.lee.jones over 10 years
    How to improve this algorithm to make it work for small lo? For example, testing it on the range 4 to 50 lists only primes greater than 11, and on the range 4 to 10 it also lists 11. Moreover, passing an odd integer as lo makes this algorithm print random values.
  • user448810
    user448810 over 10 years
    The text gives various restrictions about lo and hi, which your examples violate. For ranges that small, it is better to use a standard Sieve of Eratosthenes and discard any primes you don't want.
  • sanchit.h
    sanchit.h over 10 years
    What is the logic behind the Q = (-1/2 * (L + Pk + 1)) % Pk step? How does it calculate the first number (L + Q) greater than L which is divisible by Pk?
  • user448810
    user448810 over 10 years
    It's not the first number greater than L divisible by P, but the first offset into the sieve. Consider a window of length P sliding along the number line; at some point one endpoint will be greater than L while the other is less than or equal to L. The distance from the left endpoint to L is L%P, and the remaining -L%P above L is the offset where we want to start sieving. Add 1 and divide by 2 because our sieve has only odd numbers, and rearrange terms to move the mod P operation outside the parentheses.
  • sanchit.h
    sanchit.h over 10 years
    Thanks! Makes sense now! Also, the sieve currently skips even numbers, how can we modify it to include a bigger wheel of 2,3,5 etc..
  • user448810
    user448810 over 10 years
    Since the only even prime is 2, you can ignore all even numbers in your sieve and handle 2 specially.
  • sanchit.h
    sanchit.h over 10 years
    I meant to ask how we can ignore multiples of 2, 3 and 5 the same way we are ignoring multiples of 2 here. Then we only need to examine 8 numbers out of every 30, instead of 15 (when we ignore multiples of 2 alone).
  • user448810
    user448810 over 10 years
    I posted a simple wheel sieve at http://codepad.org/i22jgU9W. Don't expect any speed improvement; the bookkeeping needed to count the gaps costs more than the savings of reduced sieving. Also search for Pritchard's wheel sieve at my blog. I'll leave you to segment it yourself. Between this question and your previous one, you probably owe me some bounty.
  • sanchit.h
    sanchit.h over 10 years
    Thanks, I'll try segmenting my sieve again, I'm actually stuck implementing a sieve with wheel factorization and segmentation both. In your sieve, I see that you compressed the usual {6, 4, 2, 4, 2, 4, 6, 2} into a 'halves' array to decrease space but its speed can be improved by marking (p*x-7)/2 [x=p] as composite and incrementing x by whlptrn[m] instead of skipping in multiples of p,2p,3p(whose logic I didn't get). My sieve which uses this computes pi(10^8) in less than a second, I was hoping to improve it to pi(10^9) in less than 1s by also including segmentation.
  • sanchit.h
    sanchit.h over 10 years
    I'm not aware of bounties here, do I need to start a new question for that ?
  • GordonBGood
    GordonBGood about 10 years
    I wouldn't recommend sieving to 10^18 for the whole range to complete anytime soon on current desktop computers: Storage is not a problem as one can just keep the base primes bit array, which when compressed using wheel factorization is only about 28 Megabytes; the problem is execution time - even if it took 3 CPU clock cycles per composite cull operation, about 4 × 10^17 culls will take years to compute just the count of primes, let alone using those primes to do something such as brute-force factoring. Although one only needs the primes to a billion to factor up to 10^18...
  • GordonBGood
    GordonBGood about 10 years
    An incremental sieve as used by Melissa O'Neill in the linked paper is quite slow as compared to an array-based sieve, and worse, has asymptotic computational complexity that grows considerably faster than linearly with range, so may not be suitable for this problem. As to the "no upper bound necessary" qualification, a page-segmented sieve also doesn't have to have a specified upper bound if the base primes (less than the square root of the current range) are implemented as an "expandable array" or as a form of list.
  • GordonBGood
    GordonBGood about 10 years
    cont'd: the above job in interpreted Python is even slower if one sieves the whole range to 10^18, as there will be 100's of clock cycles per composite number cull rather than just three as in my above estimate made based on a native-code compiler implementation. However, I see no need to sieve up to an N of 10^18 if one just wants the prime factors to that limit, although finding the number of factors by brute-force division for each number in that range is still a big job even given the 50 million primes to a billion - a million trillion divisions at 10's of CPU clocks per...
  • Adam Stelmaszczyk
    Adam Stelmaszczyk almost 7 years
    @sanchit.h No, you can just hit "start a bounty" button under the question and then assign that bounty to any answer you like. More on bounties here. Before asking a question it's good to Google it first (as it saves everybody's time), this link comes on the top of results.
  • Albert van der Horst
    Albert van der Horst over 5 years
    GordonBGood: Now you have all those primes under 10^17, what are you going to do with them? Realistically, most mathematicians use those tables to count primes and relate the counts to e.g. the Riemann zeta function. If you merely count primes, there are other solutions. I've collected some in github.com/albertvanderhorst/primecounting
  • GordonBGood
    GordonBGood almost 4 years
    @AlbertvanderHorst, Sorry I missed your comment reply. Your point questioning the usefulness of making all the primes over some huge range available is also my point (as well as the near impossibility of actually accomplishing the task using commonly available computers or even supercomputers using sieving techniques). As to just counting primes, other than solutions listed in your GitHub repo, the most useful for huge ranges is using numerical analysis techniques as done in Kim Walisch's primecount, and this can be extended to summing primes as well.
  • mtraceur
    mtraceur about 3 years
    @gordonbgood it is not obviously correct to me that the iterators-and-priority-queue-based sieve "is quite slow as compared to an array-based sieve". The runtime is, near as I can tell: O(the sum from i=2 to n of log(π(i)-1) ω(i)) (where π is number of primes less than or equal to a given number, and ω is the number of unique prime factors of a given number). A comparably naive implementation of an array-based sieve is O(the sum from i=2 to n of (n/i if i is prime, or 1 if i is not prime)).
  • mtraceur
    mtraceur about 3 years
    @gordonbgood Basically, I don't have the math chops to quickly think it through, but currently I do think that you're right that the former is slower than the latter, and that the former has worse asymptotic growth than the latter. But that's not a very trivial analysis; my initial intuition was to doubt your comment. I had to make the complexity of each iteration explicit like this for me to see that you seem to be generally right (subjective fuzzy strengthening words like "quite" aside).
  • mtraceur
    mtraceur about 3 years
    @gordonbgood But upon further analysis it gets even less clear. Let's look at those terms in the sum: n/i in array-based vs log(π(i)-1) ω(i). The former trends from n divided by a small constant, towards one. The latter trends from one, towards log(π(n)-1) ω(n). That tempts the intuition into "the former shrinks, the latter grows, so clearly the former is faster and the latter is slower". But I found it useful to notice that if we take all the terms of those sums for a given n, and sort them from smallest to largest, both start at 1 and climb to n/2 and log(π(n)-1) ω(n), respectively.
  • mtraceur
    mtraceur about 3 years
    So then the first realization is that n/2 is obviously much bigger than log(π(n)-1) ω(n) for large n. We can do obvious optimizations like ignoring all even numbers to turn that two into three, and I think we can generalize to wheel factorization to make that number bigger. But even if we treat wheel factorization as free, it's still only going to make it n/C for some constant C, and no matter how big we make C, log and ω will produce smaller results than that for big enough n. So if it is faster, it is because we only pay the n/i for i that are prime, but we pay log(π(i)-1) ω(i) for all i.
  • mtraceur
    mtraceur about 3 years
    @gordonbgood And again, to be clear, I'm not saying you're wrong, I'm just saying that I cannot prove your assertion right and that from the perspective of just my own limited capabilities to analyze this problem space, I would not be comfortable with confidently accepting the conclusion that it is necessarily slower.
  • GordonBGood
    GordonBGood about 3 years
    @mtraceur, I'll try to "boil it down" for you in two respects: 1) the "O" asymptotic computational complexity with which you seem to be mostly concerned, and 2) the number of CPU cycles = time it takes to perform each operation. As to the first, it is well known that the Sieve of Eratosthenes takes O(n log log n) operations, where n is the sieving range, for very large ranges as per the Wikipedia article, and from the more exact derivations in that article there is a large constant offset that makes practical ranges even less.
  • GordonBGood
    GordonBGood about 3 years
    @mtraceur, Also, it is well known that the access time for a heap-based priority queue is O(log n), from its binary-tree-based structure, where n is the number of items in the queue. Now, in Melissa O'Neill's implementation, the number of elements in the queue is the number of base primes up to the square root of the current range (from later improved versions) and therefore covered by the Wikipedia article, so it has added a "log n" factor to the complexity of the basic SoE; O'Neill confirms this in her article. As you say, levels of wheel optimization only add constant-factor improvements.
  • GordonBGood
    GordonBGood about 3 years
    @mtraceur, So these "functional" "incremental" sieves take many times the number of operations, and increasingly so with range. As well, they can't be multi-threaded, as each step depends on the previous one, so they can't take advantage of modern multi-core CPU's. 2) Finally and most seriously, their time per operation is slooow, at hundreds of CPU clock cycles per composite-number elimination, whereas an array-based implementation can take as little as one and average just a few clock cycles per cull, such as Kim Walisch's primesieve.
  • GordonBGood
    GordonBGood about 3 years
    @mtraceur, Finally, as to a perhaps more understandable series of tutorial answers, please see my answer to a Page-Segmented SoE in JavaScript and the answers leading up to that, which uses array culling to sieve to billions in only five or six CPU clock cycles per culling operation (run on the Google Chrome browser). The snippet will even run on a smartphone by viewing it as the desktop site. For both reasons of reduced number of operations and reduced cost per operation, no functional sieve can ever come close to this in a given language.
  • mtraceur
    mtraceur about 3 years
    @gordonbgood I agree with you regarding the huge difference in numbers of actual CPU instructions per number eliminated, but as for the algorithmic complexity... you just treated two different n from two different contexts as if they were the same n. When n is the number up to which we are sieving, that π is crucial to account for "the number of primes less than or equal to n" which actually goes in the heap. We can't just drop it (unless you want to make the argument that the prime counting function is approximately linear or is effectively overpowered by other factors).
  • mtraceur
    mtraceur about 3 years
    @gordonbgood I also get the impression from your remarks that you... either didn't realize, or didn't find it worthwhile to acknowledge, that the O(n log log n) which is well known for the array-based sieve is actually the asymptotic approximation of the more precise sum which I gave in my comments. I mean, maybe I'm wrong, but I've got a book written by people much smarter than me saying that the runtime is O(n/3 + n/5 + n/7 + n/11 + ...) and that through some fancy math this can be shown to tend towards O(n log log n). I merely expanded the sum back out to make the comparison easier.
  • GordonBGood
    GordonBGood about 3 years
    @mtraceur, if this goes on, maybe we better move it to a chat. However, the second is correct and I summarized by referring to the full derivation in the Wikipedia article. The SoE complexity has nothing to do with why implementations like O'Neill's are slower, and even if one eliminates the extra log n factor by using a hash table with amortized O(1) access time, the slowness is due to the 100's or 1000's more CPU clock cycles required to implement them as compared to raw bit culling in an array. Sorry if my previous comments gave any other impression.