nᵗʰ ugly number

29,842

Solution 1

A simple fast solution in Java. Uses approach described by Anon..
Here TreeSet is just a container capable of returning smallest element in it. (No duplicates stored.)

    int n = 20;
    SortedSet<Long> next = new TreeSet<Long>();
    next.add((long) 1);

    long cur = 0;
    for (int i = 0; i < n; ++i) {
        cur = next.first();
        System.out.println("number " + (i + 1) + ":   " + cur);

        next.add(cur * 2);
        next.add(cur * 3);
        next.add(cur * 5);
        next.remove(cur);
    }

Since 1000th ugly number is 51200000, storing them in bool[] isn't really an option.

edit
As a recreation from work (debugging stupid Hibernate), here's completely linear solution. Thanks to marcog for idea!

    int n = 1000;

    int last2 = 0;
    int last3 = 0;
    int last5 = 0;

    long[] result = new long[n];
    result[0] = 1;
    for (int i = 1; i < n; ++i) {
        long prev = result[i - 1];

        while (result[last2] * 2 <= prev) {
            ++last2;
        }
        while (result[last3] * 3 <= prev) {
            ++last3;
        }
        while (result[last5] * 5 <= prev) {
            ++last5;
        }

        long candidate1 = result[last2] * 2;
        long candidate2 = result[last3] * 3;
        long candidate3 = result[last5] * 5;

        result[i] = Math.min(candidate1, Math.min(candidate2, candidate3));
    }

    System.out.println(result[n - 1]);

The idea is that to calculate a[i], we can use a[j]*2 for some j < i. But we also need to make sure that 1) a[j]*2 > a[i - 1] and 2) j is smallest possible.
Then, a[i] = min(a[j]*2, a[k]*3, a[t]*5).

Solution 2

I am working on finding nth ugly number. Note that these numbers are extremely sparsely distributed as n gets large.

I wrote a trivial program that computes if a given number is ugly or not.

This looks like the wrong approach for the problem you're trying to solve - it's a bit of a shlemiel algorithm.

Are you familiar with the Sieve of Eratosthenes algorithm for finding primes? Something similar (exploiting the knowledge that every ugly number is 2, 3 or 5 times another ugly number) would probably work better for solving this.

With the comparison to the Sieve I don't mean "keep an array of bools and eliminate possibilities as you go up". I am more referring to the general method of generating solutions based on previous results. Where the Sieve gets a number and then removes all multiples of it from the candidate set, a good algorithm for this problem would start with an empty set and then add the correct multiples of each ugly number to that.

Solution 3

My answer refers to the correct answer given by Nikita Rybak. So that one could see a transition from the idea of the first approach to that of the second.

from collections import deque
def hamming():
    h=1;next2,next3,next5=deque([]),deque([]),deque([])
    while True:
        yield h
        next2.append(2*h)
        next3.append(3*h)
        next5.append(5*h)
        h=min(next2[0],next3[0],next5[0])
        if h == next2[0]: next2.popleft()
        if h == next3[0]: next3.popleft()
        if h == next5[0]: next5.popleft()

What's changed from Nikita Rybak's 1st approach is that, instead of adding next candidates into single data structure, i.e. Tree set, one can add each of them separately into 3 FIFO lists. This way, each list will be kept sorted all the time, and the next least candidate must always be at the head of one ore more of these lists.

If we eliminate the use of the three lists above, we arrive at the second implementation in Nikita Rybak' answer. This is done by evaluating those candidates (to be contained in three lists) only when needed, so that there is no need to store them.

Simply put:

In the first approach, we put every new candidate into single data structure, and that's bad because too many things get mixed up unwisely. This poor strategy inevitably entails O(log(tree size)) time complexity every time we make a query to the structure. By putting them into separate queues, however, you will see that each query takes only O(1) and that's why the overall performance reduces to O(n)!!! This is because each of the three lists is already sorted, by itself.

Solution 4

I believe you can solve this problem in sub-linear time, probably O(n^{2/3}).

To give you the idea, if you simplify the problem to allow factors of just 2 and 3, you can achieve O(n^{1/2}) time starting by searching for the smallest power of two that is at least as large as the nth ugly number, and then generating a list of O(n^{1/2}) candidates. This code should give you an idea how to do it. It relies on the fact that the nth number containing only powers of 2 and 3 has a prime factorization whose sum of exponents is O(n^{1/2}).

def foo(n):
  p2 = 1  # current power of 2
  p3 = 1  # current power of 3
  e3 = 0  # exponent of current power of 3
  t = 1   # number less than or equal to the current power of 2
  while t < n:
    p2 *= 2
    if p3 * 3 < p2:
      p3 *= 3
      e3 += 1
    t += 1 + e3
  candidates = [p2]
  c = p2
  for i in range(e3):
    c /= 2
    c *= 3
    if c > p2: c /= 2
    candidates.append(c)
  return sorted(candidates)[n - (t - len(candidates))]

The same idea should work for three allowed factors, but the code gets more complex. The sum of the powers of the factorization drops to O(n^{1/3}), but you need to consider more candidates, O(n^{2/3}) to be more precise.

Solution 5

A lot of good answers here, but I was having trouble understanding those, specifically how any of these answers, including the accepted one, maintained the axiom 2 in Dijkstra's original paper:

Axiom 2. If x is in the sequence, so is 2 * x, 3 * x, and 5 * x.

After some whiteboarding, it became clear that the axiom 2 is not an invariant at each iteration of the algorithm, but actually the goal of the algorithm itself. At each iteration, we try to restore the condition in axiom 2. If last is the last value in the result sequence S, axiom 2 can simply be rephrased as:

For some x in S, the next value in S is the minimum of 2x, 3x, and 5x, that is greater than last. Let's call this axiom 2'.

Thus, if we can find x, we can compute the minimum of 2x, 3x, and 5x in constant time, and add it to S.

But how do we find x? One approach is, we don't; instead, whenever we add a new element e to S, we compute 2e, 3e, and 5e, and add them to a minimum priority queue. Since this operations guarantees e is in S, simply extracting the top element of the PQ satisfies axiom 2'.

This approach works, but the problem is that we generate a bunch of numbers we may not end up using. See this answer for an example; if the user wants the 5th element in S (5), the PQ at that moment holds 6 6 8 9 10 10 12 15 15 20 25. Can we not waste this space?

Turns out, we can do better. Instead of storing all these numbers, we simply maintain three counters for each of the multiples, namely, 2i, 3j, and 5k. These are candidates for the next number in S. When we pick one of them, we increment only the corresponding counter, and not the other two. By doing so, we are not eagerly generating all the multiples, thus solving the space problem with the first approach.

Let's see a dry run for n = 8, i.e. the number 9. We start with 1, as stated by axiom 1 in Dijkstra's paper.

+---------+---+---+---+----+----+----+-------------------+
| #       | i | j | k | 2i | 3j | 5k | S                 |
+---------+---+---+---+----+----+----+-------------------+
| initial | 1 | 1 | 1 | 2  | 3  | 5  | {1}               |
+---------+---+---+---+----+----+----+-------------------+
| 1       | 1 | 1 | 1 | 2  | 3  | 5  | {1,2}             |
+---------+---+---+---+----+----+----+-------------------+
| 2       | 2 | 1 | 1 | 4  | 3  | 5  | {1,2,3}           |
+---------+---+---+---+----+----+----+-------------------+
| 3       | 2 | 2 | 1 | 4  | 6  | 5  | {1,2,3,4}         |
+---------+---+---+---+----+----+----+-------------------+
| 4       | 3 | 2 | 1 | 6  | 6  | 5  | {1,2,3,4,5}       |
+---------+---+---+---+----+----+----+-------------------+
| 5       | 3 | 2 | 2 | 6  | 6  | 10 | {1,2,3,4,5,6}     |
+---------+---+---+---+----+----+----+-------------------+
| 6       | 4 | 2 | 2 | 8  | 6  | 10 | {1,2,3,4,5,6}     |
+---------+---+---+---+----+----+----+-------------------+
| 7       | 4 | 3 | 2 | 8  | 9  | 10 | {1,2,3,4,5,6,8}   |
+---------+---+---+---+----+----+----+-------------------+
| 8       | 5 | 3 | 2 | 10 | 9  | 10 | {1,2,3,4,5,6,8,9} |
+---------+---+---+---+----+----+----+-------------------+

Notice that S didn't grow at iteration 6, because the minimum candidate 6 had already been added previously. To avoid this problem of having to remember all of the previous elements, we amend our algorithm to increment all the counters whenever the corresponding multiples are equal to the minimum candidate. That brings us to the following Scala implementation.

def hamming(n: Int): Seq[BigInt] = {
  @tailrec
  def next(x: Int, factor: Int, xs: IndexedSeq[BigInt]): Int = {
    val leq = factor * xs(x) <= xs.last
    if (leq) next(x + 1, factor, xs)
    else x
  }

  @tailrec
  def loop(i: Int, j: Int, k: Int, xs: IndexedSeq[BigInt]): IndexedSeq[BigInt] = {
    if (xs.size < n) {
      val a = next(i, 2, xs)
      val b = next(j, 3, xs)
      val c = next(k, 5, xs)
      val m = Seq(2 * xs(a), 3 * xs(b), 5 * xs(c)).min

      val x = a + (if (2 * xs(a) == m) 1 else 0)
      val y = b + (if (3 * xs(b) == m) 1 else 0)
      val z = c + (if (5 * xs(c) == m) 1 else 0)

      loop(x, y, z, xs :+ m)
    } else xs
  }

  loop(0, 0, 0, IndexedSeq(BigInt(1)))
}
Share:
29,842
Anil Katti
Author by

Anil Katti

Updated on July 09, 2022

Comments

  • Anil Katti
    Anil Katti almost 2 years

    Numbers whose only prime factors are 2, 3, or 5 are called ugly numbers.

    Example:

    1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, ... 
    

    1 can be considered as 2^0.

    I am working on finding nth ugly number. Note that these numbers are extremely sparsely distributed as n gets large.

    I wrote a trivial program that computes if a given number is ugly or not. For n > 500 - it became super slow. I tried using memoization - observation: ugly_number * 2, ugly_number * 3, ugly_number * 5 are all ugly. Even with that it is slow. I tried using some properties of log - since that will reduce this problem from multiplication to addition - but, not much luck yet. Thought of sharing this with you all. Any interesting ideas?

    Using a concept similar to Sieve of Eratosthenes (thanks Anon)

        for (int i(2), uglyCount(0); ; i++) {
            if (i % 2 == 0)
                continue;
            if (i % 3 == 0)
                continue;
            if (i % 5 == 0)
                continue;
            uglyCount++;
            if (uglyCount == n - 1)
                break;
        }
    

    i is the nth ugly number.

    Even this is pretty slow. I am trying to find the 1500th ugly number.

    • SLaks
      SLaks over 13 years
      Why are these numbers called ugly numbers?
    • SLaks
      SLaks over 13 years
      Are you trying to find the nth ugly number or trying to determine whether a number is ugly?
    • ruslik
      ruslik over 13 years
      In problems with integer arithmetics, avoid using floating point.
    • Ben Jackson
      Ben Jackson over 13 years
      I bet the people on math would be better suited to answering this.
    • Khaled Alshaya
      Khaled Alshaya over 13 years
      +1 Interesting question :) These are called Hamming Numbers: en.wikipedia.org/wiki/Regular_number#Algorithms
    • President James K. Polk
      President James K. Polk over 13 years
      I think the problem is equivalent to iterating over the exponents (x1, x2, x3) in 2x1 * 3x2 * 5**x3 in such a way so that the products come out in numerical order.
    • orangepips
      orangepips over 13 years
      Python solution using generators that's very fast: code.activestate.com/recipes/576961
    • jonderry
      jonderry over 13 years
      I think you can get a sub-linear solution if you don't need the whole sequence (see my answer below).
    • starblue
      starblue over 13 years
    • Will Ness
      Will Ness about 9 years
      the working O(n^(2/3)) solution is at stackoverflow.com/a/10160054/849891. see also rosettacode.org/wiki/….
    • cregox
      cregox almost 9 years
    • boateng
      boateng over 5 years
      You don’t have to start from 1, you can go from last one you’ve found out..
  • moinudin
    moinudin over 13 years
    +1 This solves the problem of finding the nth number fast. You should also add that going through the multiples of 2,3,5 in parallel will remove the need for a bool array.
  • Anil Katti
    Anil Katti over 13 years
    I was familiar with Sieve of Eratosthenes.. First I started thinking about generating a sorted list of all the ugly number - which was not quite clean. Then I ventures into the trivial solution (which was damn slow obviously). Sieve of Eratosthenes should help me solve the problem in O(U(n)) where U(n) is the nth ugly number.
  • Nikita Rybak
    Nikita Rybak over 13 years
    @Anil You don't have to store elements in array, you can use any other type of container, like heap. This can give you O(n*logn) easily. There's also an approach described by marcog: it'll give O(n), but it's a bit trickier.
  • Anon.
    Anon. over 13 years
    @Anil: When I made the comparison to the Sieve, I didn't really mean "keep an array of bools and eliminate possibilities as you go up" - I was more referring to the general method of generating solutions based on previous results. Where the Sieve gets a result and the removes all multiples of it from the candidate set, a good algorithm for this problem would start with an empty set and then add the correct multiples of each ugly number to that.
  • Nikita Rybak
    Nikita Rybak almost 13 years
    @vardhan If you don't understand something, ask. Don't just 'fix' things.
  • Jim Balter
    Jim Balter about 11 years
    @vardhan "The 2nd solution is not completely linear -- The 3 while loops inside the for loops cannot be described as constant time." -- Um, utterly wrong. Each lasti ranges from 0 to at most n, once total, so they're O(n) total. Put another way, per iteration of the for loop, the average number of iterations of each of the 3 inner loops is <= 1, which is indeed constant time.
  • Will Ness
    Will Ness about 10 years
    yes, the n^{2/3} is correct, though I didn't follow your arguments here. This is done by enumerating the i,j,k triples to not reach above an estimated value of n-th member of the sequence (since ln2, ln3, ln5 are known). Code and links in this answer.
  • interjay
    interjay over 9 years
    There are 17 ugly numbers between 2 and 30, not 22. And adding 30 will not make another one. For example, 3 is ugly but 33 isn't.
  • guidothekp
    guidothekp over 9 years
    Oops. I should have read the question more carefully. The problem that needs to be solved should be for numbers of the form 2^a*3^b*5^c. What I solved was for numbers that are multiples of 2, 3, and 5 and these include primes such as 7, 11, etc.
  • Kakira
    Kakira almost 9 years
    Is the while loop necessary though? Won't prev be one of the three last? Like the top solution here: stackoverflow.com/questions/5505894/…
  • gnasher729
    gnasher729 over 8 years
    It's a shame that the only fast solution has so few votes. It will easily find the one millionth ugly number around 10^253 by my estimate.
  • Teepeemm
    Teepeemm over 8 years
    Please format your answer properly. Python is meaningless if you don't.
  • gnasher729
    gnasher729 over 8 years
    That's wrong. Ugly numbers include for example 60 = 2^2 * 3^1 * 5^1 which is not on any of the lists.
  • Zhan
    Zhan over 8 years
    no, i think the function covers the ugly number 60. try the the function: nthuglynumber(26) in python. it will return 60.
  • Will Ness
    Will Ness over 8 years
    finding n first members of Hamming sequence is an O(n) time calculation. n log n is totally unnecessary. the accepted answer's second version (under "edit") is O(n). (it is also what Dijkstra wrote, down to the whiles -- ifs are enough really, but he wrote that using while leaves no room for doubt, correctness-wise).
  • Will Ness
    Will Ness over 8 years
    @gnasher729 1000000-th Hamming number is 5.19312780448E+83, actually.
  • Will Ness
    Will Ness over 8 years
    @Kakira if is enough ; but no, sometimes two or even all three have to be advanced at once ; in the linked solution the two ifs are sequential, not exclusive; I think Dijkstra himself wrote this algo down with the whiles, so as not to leave room for any doubt correctness-wise, I think his reasoning was.
  • Will Ness
    Will Ness over 8 years
    @gnasher729 no, 60 is on all three lists: 60 = 30 * 2 = 10 * 3 = 12 * 5.
  • derek
    derek over 6 years
    The explanation is wrong. Suppose we add "7*2", "7*3", "7*5" to the 3 lists.
  • Will Ness
    Will Ness over 5 years
    @derek the explanation might be incomplete but the code is actually right. too bad I can't upvote several times to compensate for the wrong downvotes. I asked for such capability on meta, and was voted down into oblivion, too.
  • Will Ness
    Will Ness over 5 years
    @derek the explanation was incomplete but the code is actually right. I've updated the explanation and the example, for clarity.
  • Will Ness
    Will Ness over 5 years
    this code is exponential in the (qubic root of the) number k of ugly numbers it produces: n ~ exp (k ^ (1/3)). Dijkstra's algo is linear in k. It is shown in several answers here, e.g. this.
  • Will Ness
    Will Ness over 5 years
    what is the value of Iterator.from(6).drop(1).next()? isn't it 7? if so, it would mean this code is wrong. as a test, what is the 1000th hamming number produced by this code, please? is it 51200000?
  • Will Ness
    Will Ness over 5 years
    this answer makes absolutely no sense to me at all. you "can simply build a list of ugly numbers"?? the question is how?
  • Will Ness
    Will Ness over 5 years
    this code is wrong. it produces e.g. 14=7*2, 21 = 7*3, 22 = 11*2...
  • Will Ness
    Will Ness over 5 years
    works for 100, 10000 (I verified that the results are correct -- the value returned is at the index n in the sequence, zero-based), but fails for 1000 with "list index out of range" error. ideone.com/6hnIxg
  • Abhijit Sarkar
    Abhijit Sarkar over 5 years
    @WillNess fixed, thanks for finding the bug. I didn't try to generate the 1000 number, but I tested up to 15. Also, if I was going to use this code for generating a large sequence, I'd probably use a mutable sequence, and also try not to repeat BigInt multiplications.
  • Alex Peter
    Alex Peter almost 5 years
    The internal while is making this a bad reading, it looks like we can move last2 last3 or last5 for more than 1, which we cannot. :( It is confused if last2 is pointer or it is a power of 2 on first reading. :( Really no reason for that. We do not loop for more than 1.
  • Will Ness
    Will Ness over 4 years
    @AlexPeter correct, but needs knowledge / mathematical reasoning external to code. as per Dijkstra (and the guy is pretty famous), with the whiles the code's correctness is self-evident, redundancy notwithstanding.
  • chqrlie
    chqrlie about 4 years
    The first fragment will fail to find ugly numbers greater than 1845281250000000000, which is the 11389-th ugly number because cur * 5 will overflow the range of type long.
  • Will Ness
    Will Ness about 4 years
    are you aware of this? the linked answer's code calculates 1 billionth H.N. in 0.02s and 1 trillionth in about 2s on Ideone.
  • chqrlie
    chqrlie about 4 years
    @WillNess: Amazing contribution! but Haskell is so alien to non-afficionados. Do your published timings include the computation of the exact values, and the conversion to base 10?
  • Will Ness
    Will Ness about 4 years
    the code calculates (2,3,5) exponents triples; exact values is a matter of simple BIGNUM arithmetic. it shows also its decimal approximation, e.g. 1B --> (1334,335,404) --> "6.216075755562335E+843". there's nothing especially haskelly about the algorithm.
  • Will Ness
    Will Ness about 4 years
    I mean, the triples are exact, of course. exponentiation and printing (in decimal) is already provided by Haskell, so I haven't bothered reimplementing it. the interpreter responds to 2^1334*3^335*5^404 printing the result without delay (it says 0.02s after printing). It is easy to add this to the code on Ideone, I just didn't want to clutter up the output.
  • Will Ness
    Will Ness about 4 years
    I've added the full exact number printout to the Ideone entry; the run time didn't change for the 1Bth number. for the 1Tth though the time grew by almost a second on top the previous 2 seconds.
  • chqrlie
    chqrlie about 4 years
    @WillNess: The algorithm is simple and elegant, and Louis Klauder's top band idea is brilliant. Computing with base 2 logs saves about 10%. Just one remark on the algorithm: you sort the band at the end, but you just need to find the m-th entry in the sorted array, which can be computed faster with a custom partitioning function. Does Haskell provide such a primitive? The sorting time is probably negligible compared to the enumeration loops...
  • Will Ness
    Will Ness about 4 years
    Louis Klauder's code in that (vanished?) DDJ blog thread was using an arbitrary top logvalue; then it would find the top index of the enumerated band that it chanced upon (this I don't exactly remember; maybe it was just printing few big HN's). My contribution was to add the estimation of that top logvalue and figuring out the empirical corrections (and band indexing). then, switching to logbase 2 and realizing that only one top i index needs to be used for each (j,k) pair if the width is < 1 (base 2).
  • Will Ness
    Will Ness about 4 years
    re: finding the m-th entry, yes, but the band sorting time is indeed negligible AFAIR, complexity-wise for sure. re: Haskell's quick-partition, I don't know. the built-in sort is mergesort, so with it, laziness doesn't help and counting n-1 elements in the prefix to get to the nth is actually sorting the whole prefix AFAICS. taking just the head element is O(n), due to laziness.
  • Will Ness
    Will Ness about 4 years
    er, Haskell's *quickselect... searching for "quickselect Haskell" does produce some hits, e.g. on RosettaCode, but that one's O(n^2) in worst case. there's also a proper package on hackage but it's for vectors (though lists can be converted to and from, of course).