How to generate a random 4 digit number not starting with 0 and having unique digits?

python python-3.x random sampling

17,263

Solution 1

We generate the first digit in the 1 - 9 range, then take the next 3 from the remaining digits:

import random

# We create a set of digits: {0, 1, .... 9}
digits = set(range(10))
# We generate a random integer, 1 <= first <= 9
first = random.randint(1, 9)
# We remove it from our set, then take a sample of
# 3 distinct elements from the remaining values
last_3 = random.sample(digits - {first}, 3)
print(str(first) + ''.join(map(str, last_3)))

The generated numbers are equiprobable, and we get a valid number in one step.

Solution 2

Just loop until you have something you like:

import random

numbers = [0]
while numbers[0] == 0:
    numbers = random.sample(range(10), 4)

print(''.join(map(str, numbers)))

Solution 3

This is very similar to the other answers but instead of sample or shuffle you could draw a random integer in the range 1000-9999 until you get one that contains only unique digits:

import random

val = 0  # initial value - so the while loop is entered.
while len(set(str(val))) != 4:  # check if it's duplicate free
    val = random.randint(1000, 9999)

print(val)

As @Claudio pointed out in the comments the range actually only needs to be 1023 - 9876 because the values outside that range contain duplicate digits.

Generally random.randint will be much faster than random.shuffle or random.choice so even if it's more likely one needs to draw multiple times (as pointed out by @karakfa) it's up to 3 times faster than any shuffle, choice approach that also needs to join the single digits.

Solution 4

I do not know Python well, but something like

digits=[1,2,3,4,5,6,7,8,9] <- no zero
random.shuffle(digits)
first=digits[0] <- first digit, obviously will not be zero
digits[0]=0 <- used digit can not occur again, zero can
random.shuffle(digits)
lastthree=digits[0:3] <- last three digits, no repeats, can contain zero, thanks @Dubu

A more useful iteration, actually creating a number:

digits=[1,2,3,4,5,6,7,8,9]   # no zero
random.shuffle(digits)
val=digits[0]                # value so far, not zero for sure
digits[0]=0                  # used digit can not occur again, zero becomes a valid pick
random.shuffle(digits)
for i in range(0,3):
  val=val*10+digits[i]       # update value with further digits
print(val)

After stealing pieces from other solutions, plus applying the tip from @DavidHammen:

val=random.randint(1,9)
digits=[1,2,3,4,5,6,7,8,9]
digits[val-1]=0
for i in random.sample(digits,3):
  val=val*10+i
print(val)

Solution 5

rejection sampling method. Create a 4 digit random combination from 10 digits and resample if it doesn't match the criteria.

r4=0    
while r4 < 1000:
    r4=int(''.join(map(str,random.sample(range(10),4))))

noticed that this is essentially the same as @Austin Haskings's answer

View more solutions

17,263

Author by

Menon A.

Updated on June 18, 2022

Comments

Menon A. almost 2 years
This works almost fine but the number starts with 0 sometimes:
```
import random
numbers = random.sample(range(10), 4)
print(''.join(map(str, numbers)))
```
I've found a lot of examples but none of them guarantee that the sequence won't start with 0.
Mark Ransom almost 7 years

It will also allow a single repeated digit.
Tim Peters almost 7 years

Except the allowable outcomes aren't equally likely then; e.g., there are two ways to get 1230, but only one way to get 1234.
Jean-François Fabre almost 7 years

@TimPeters you're right. Added an alternate solution.
janscas almost 7 years

if you get 0023, then you have the same problem
Dima almost 7 years

Would you need to multiply the first digit by 1000 before adding the two? Or does the plus sign just concatenate them?
karakfa almost 7 years

with this method, you need to reject more than half of the samples. Target range: 9*9*8*7 = 4536, sample space 8999.
MSeifert almost 7 years

@karakfa But it avoids shuffling a "range" and joining the list and checking if the first item is 0. I think that's worth it. A superficial timing shows that this approach using randint is roughly 3 times faster compared to the random.sample approaches here.
Jean-François Fabre almost 7 years

but you cannot have 0023. There's only one 0
miradulo almost 7 years

@MSeifert Of course, that's for 4 digits. Your rejection space is going to blow up and slow you down with say, 7 digits (I know that's not the question :) ). Practically speaking though, I like this answer best.
karakfa almost 7 years

Yes, it might be faster. What I meant to comment is it's going to throw away more than half of the samples. It might still very well be the most efficient one.
Thierry Lathuille almost 7 years

@Claudio Independence of the numbers generated is one of the fundamental properties a good pseudo-random generator must satisfy. I suppose that Python's random is good enough that subsequent calls can be considered independent.
muddyfish almost 7 years

@Claudio the probabilities still work out to be equal for all results so I'm not sure what you mean by not as random as it should be
miradulo almost 7 years

@Claudio I'm sorry but you should probably consult some kind of reference on accept-reject sampling and review conditional probability before suggesting all of the answers are incorrect.
Claudio almost 7 years

To make it even more perfect you can limit further the range for choices using for example random.randint(1023, 9876), right?
MSeifert almost 7 years

But the not every possible value is drawn with the same probability. Because for example 1234 and 01234 produce the same result while numbers containing a not-leading zero (e.g. 1023) have only one possibility to be drawn.
miradulo almost 7 years

@MSeifert Ahh you're correct.. How about swapping zero with a random location in the list should a zero-leading number be drawn? Then we redistribute evenly back across the sample space in that case.
miradulo almost 7 years

e.g. random.shuffle(l); if l[0] == 0: pos = random.choice(range(1, 10)); l[0], l[pos] = l[pos], l[0]
miradulo almost 7 years

@foobar No, MSeifert has a point. Your current approach doesn't draw from the sample space uniformly. (I think) my suggestion fixes this though.
FooBar167 almost 7 years

@Mitch -- Yes, you're right. I'll fix my note according to your suggestion. And check it on [0,1,2] list with 2 unique digits not starting with 0. There are only 3! == 6 possibilities (easy to check).
FooBar167 almost 7 years

@MSeifert -- Yes. For example, take list [0,1,2] and combine only two unique random digits not starting from zero. There are only 3! == 6 possibilities: [0,1,2]; [0,2,1]; [1,0,2]; [1,2,0]; [2,0,1]; [2,1,0]. If I shift on one position, I'll have [1,2] and [2,1] twice as often than [1,0] and [2,0]. So I should use suggestion of Mitch and randomly swap leading zero with the second or the third position. This will give the same probabilities.
MSeifert almost 7 years

I'm not really sure if that garantuees an even distribution. For example 1234 wouldn't hit the branch (say it would have a normalized probability of 1). But numbers like 1203 could be produced either by shuffling once (probability 1) and by drawing 0213 (granted it's a chance of 1/9th). Thus these would have a probability of ~1.1. This reasoning may be wrong (I'm not a statistician) and I personally liked the original approach (it was very simple and therefore elegant!). Even distribution wasn't part of the question so a skewed distribution would answer the question just fine.
FooBar167 almost 7 years

@MSeifert -- Lets take simpler example: list [0,1,2] and two unique digits not starting with 0. There are 3!==6 possibilities. Each possibility has probability 1/6. There are 2 leading zeros: [0,1,2] and [0,2,1]. If randomly swap leading 0 with the rest 2 positions, there are four combinations: [0,1,2]==>([1,0,2] and [2,1,0]); [0,2,1]==>([2,0,1] and [1,2,0]). Each of these four combinations have the same probability equal to 1/6 * 1/2 == 1/12. But these 4 combinations coincide with the rest 4 possibilities. So each possibility after random swap should have probability == 1/6 + 1/12.
MSeifert almost 7 years

Yes, you're totally right. :) I forgot that 02341 could also produce 1234. I'm glad I did not remove my upvote based on my flawed argument. (even though I really liked your first approach). :)
jpmc26 almost 7 years

I think I'd just put val = 0 for the start. I don't like having to think about repeated digits that hard. +1, though.
toddkaufmann almost 7 years

I don't know why this was downvoted, it's short and combinatorical. It's exactly how I thought of doing it when reading the title. Setting first to 0 after getting first digit is "clever", I would have just added zero back to the list. This looks to be identical to Claudio's, which is tl;dr and less readable. This can generate all 4536 answers, without unnecessary looping or testing.
Dubu almost 7 years

This is good, except that it should read lastthree=digits[0:3] because of Python's unusual array indexing.
tevemadar almost 7 years

@toddkaufmann I wanted to do that first, just I did not know the syntax for appending an item, and then I realized it was not needed anyway
FooBar167 almost 7 years

@MSeifert -- added mathematical proof.
MSeifert almost 7 years

return outside a function is a syntaxerror in python. Also note that you can use a range from 1023 to 9876 as pointed out here.
Lightness Races in Orbit almost 7 years

This could loop forever. Hardly a versatile solution.
Lightness Races in Orbit almost 7 years

This could loop forever. Not useful.
karakfa almost 7 years

Probability of that is zero... You'll be surprised how common this technique is used in practice.
Lightness Races in Orbit almost 7 years

The probability is certainly not zero.
karakfa almost 7 years

lim 10^(-n) = 0; for n -> Infinity (aka forever)
Lightness Races in Orbit almost 7 years

Okay how about it could loop for two years. Is that more useful?
Ilmari Karonen almost 7 years

Note that, while this method does indeed generate each possible output with the same probability, this fact isn't trivially obvious. The key reason why it does produce equiprobable reasons is that, after picking the first digit, the number of remaining possible digits is the same regardless of which first digit was chosen. In particular, if you tried to reverse the procedure (i.e. first pick the lowest three distinct digits, and then pick a non-zero first digit out of the remaining seven), the outputs would not be equiprobable. (Exercise to readers: why?)
karakfa almost 7 years

The probability is negligible for all practical purposes. For example, to loop 1000 times without producing a valid sample (which takes 16 msecs on my computer) is probability 10^(-1000) event.
David Hammen almost 7 years

@BoundaryImposition, and all the other downvoters: This is a very versatile solution. It even has a name, "rejection sampling." Most implementations of a normal distribution use rejection sampling.
David Hammen almost 7 years

@BoundaryImposition - You are barking up the wrong tree here. Rejection sampling is a very widely used technique. The only problems with this answer is that Austin Hasking's wrote a very similar answer two minutes earlier (probably while karakfa was writing this answer) and that Austin Hasking's answer is a bit better.
David Hammen almost 7 years

@BoundaryImposition, and all the other downvoters: Of the answers by Austin Hastings, MSeifert, karakfa, andThierry Lathuille, this is the fastest in both python 2.7 and python 3.4. In fact, the accepted answer is the slowest, 40% slower than is this answer (python 2.7).
David Hammen almost 7 years

While random.randint is much faster than is random.sample, len(set(str(val))) is very slow. The end result is that this is slower than Austin Hasting's algorithm in both python 2.7 and python 3.4, but not by much. This answer's algorithm is considerably faster than is that of the accepted answer, and because of that, +1.
David Hammen almost 7 years

Downvoted because this is the slowest of the top answers. This combines set arithmetic with random.sample, both of which are slow.
David Hammen almost 7 years

@BoundaryImposition and MSeifert -- Fun fact: This algorithm can be sped up by over 10% by writing it as an infinite loop, with val = random.randint(1023, 9876) and ` if len(set(str(val))) == 4: break` as the body of the loop. This still uses the very expensive len(set(str(val))), but does so one less time than does the code in the answer.
David Hammen almost 7 years

@tevemadar -- While correct, this is quite slow, with two calls to random.shuffle on a nine element list.
MSeifert almost 7 years

@DavidHammen You're right about the break being faster (however randint is slower than len(set(str(val))) so the difference wasn't that much. However I would be interested in how you did the benchmarks where my solution was slower, because I ran it on three machines and the results where roughly consistent: 2-4 times faster compared to the other 5 top-voted answers. So if you have any valid benchmarks proving me wrong I'll be happy to correct the answer :)
tevemadar almost 7 years

@DavidHammen I was not aware of other random functions, like random.sample (despite of using almost identical variable names, I have not spotted the later-accepted answer, there were much-much more failed answer attempts around). Adding a 3rd variant, by the way
David Hammen almost 7 years

@tevemadar - Much better. That puts you in fourth place on my computer, behind Austin Hastings, foo bar, MSeifert, but ahead of karakfa and Thierry Lathuille. So +1.
David Hammen almost 7 years

You mentioned that you don't know python well. One of the key things you need to unlearn is loop based on indices. Just loop over the thing you want to loop over. Do this and you will be on par with MSeifert's answer: Replace tail=random.sample(digits,3) and for i in range(0,3): val=val*10+tail[i] with for n in random.sample(digits,3): val = val*10 + n.
MSeifert almost 7 years

@DavidHammen Note that with python-3 the timings look very different: ideone.com/ous7zZ. Am I right that it seems my solution is a bit slower on python2 but much faster in python3?
David Hammen almost 7 years

@MSeifert - I didn't notice the python3 tag. I have experimented with both python2 and python3 and have noticed that performance is in general quite abysmal with python3. Your's is the one exception, being only 10% slower in python3 rather than python2 (other algorithms take twice as long, or longer). One thing that is common across both versions of the language is that the accepted answer is lousy.
David Hammen almost 7 years

@MSeifert -- Even your algorithm is a bit slower in python3 than it is in python2. I reduced the timeit number on ideone so as not to overly abuse their computers. Expand that number and it's clear than every algorithm is slower in python3. I suspect that python3 has drastically improved the performance of len, set, and str, but has somehow disimproved the performance of random, with particular attention paid to disimproving the performance of random.sample and random.shuffle.
David Hammen almost 7 years

@MSeifert - Wow. On my computer, python -m timeit -s 'import random' 'random.sample(range(10),3)' takes 2.41 usec per loop. With python3, it takes 6.38 usec per loop.
MSeifert almost 7 years

@DavidHammen Yeah, that was why I wrote that randint is much faster than sample (and shuffle). I'm not sure what changed, it could be that range doesn't return a list in python3 or that list indexing got slower but maybe they even fixed problems with the PRNG that resulted in slowdowns. But yeah, it's definetly interesting. Might be worth a follow-up question, we probably can't solve this here in the comments. :)
David Hammen almost 7 years

On the other hand, python -m timeit -s 'n=123345678' 'len(set(str(n)))' takes 0.804 usec per loop with python, 0.74 usec per loop with python3.
David Hammen almost 7 years

Another problem we can't solve in comments: The accepted and most highly up-voted answer has incredibly lousy performance in both versions of the language. I don't understand the upvotes or the acceptance. This is a people problem rather than a language problem.
MSeifert almost 7 years

@DavidHammen Performance wasn't part of the question so I wouldn't judge on that point (even though it's good to point out). However the accepted answer doesn't need "sample rejection". Maybe that was the reason that made it the top-voted and accepted answer. :)
David Hammen almost 7 years

The (invalid) comments about rejecting the notion of rejection sampling were all about performance.
David Hammen almost 7 years

@MSeifert -- Question asked.
tevemadar almost 7 years

@DavidHammen Thank you, added the idea in the final code. If you have a strong stomach, please find a fast, but assembly-like approach in stackoverflow.com/a/43631936/7916438
miradulo almost 7 years

@DavidHammen Why is that worthy of a downvote? On my machine, this answer is less than 4 microseconds slower than Austin Hastings answer, and performance wasn't even mentioned in the question. I would hesitate to call a constructive approach alongside rejection sampling techniques "not useful".
David Hammen almost 7 years

@Mitch - Mostly to counter what I consider to be completely invalid downvotes on the other answers due to lack of understanding of the concept of rejection sampling. An easy way to understand rejection sampling: Suppose I asked you to produce all of the four digit numbers that have no repeated digits and that start with 1. You could come up with a complex algorithm to do that, or you could use [n for n in range(1000,10000) if len(set(str(n)))==4]. (continued)
David Hammen almost 7 years

The if in the list comprehension that filters out values containing replicated digits is the equivalent of rejection sampling. Sometimes it's faster to intentionally generate invalid values and then filter them out. The exact same concept applies to rejection sampling.
miradulo almost 7 years

I understand rejection sampling quite fine, thank you, and prefer Austin's answer to this answer as well. Downvoting a perfectly fine answer just seems... odd.
David Hammen almost 7 years

@Mitch -- The probability of looping is even smaller than that. A call to random.sample(range(10), 4) will produce one of 5040 different lists, of which 504 have a leading zero. In other words, exactly one out of ten of the lists generated by that call to random.sample fails the criteria. The probability of having to loop at all is 10%, having to make two or more loops is 1%, four or more loops, 0.01%. The probability drops very quickly. The key problem with this approach is how poorly random.sample performs in python3.
miradulo almost 7 years

@DavidHammen Ahh I thought he was using pure rejection sampling like MSeifert, didn't take a close look at the answer. You are correct.
Eric Duminil almost 7 years

Interesting. Did you check that the results are uniformly distributed?
Eric Duminil almost 7 years

Nice answer. You might be interested in a one-liner (after imports) variant of your code.
tevemadar almost 7 years

@EricDuminil the distribution over the set of valid outputs follows the distribution of randint(). What I checked is the correctness of mapping inputs to outputs, added some lines about it now