How to generate a random 4 digit number not starting with 0 and having unique digits?
Solution 1
We generate the first digit in the 1 - 9 range, then take the next 3 from the remaining digits:
import random
# We create a set of digits: {0, 1, .... 9}
digits = set(range(10))
# We generate a random integer, 1 <= first <= 9
first = random.randint(1, 9)
# We remove it from our set, then take a sample of
# 3 distinct elements from the remaining values
last_3 = random.sample(digits - {first}, 3)
print(str(first) + ''.join(map(str, last_3)))
The generated numbers are equiprobable, and we get a valid number in one step.
Solution 2
Just loop until you have something you like:
import random
numbers = [0]
while numbers[0] == 0:
numbers = random.sample(range(10), 4)
print(''.join(map(str, numbers)))
Solution 3
This is very similar to the other answers but instead of sample
or shuffle
you could draw a random integer in the range 1000-9999 until you get one that contains only unique digits:
import random
val = 0 # initial value - so the while loop is entered.
while len(set(str(val))) != 4: # check if it's duplicate free
val = random.randint(1000, 9999)
print(val)
As @Claudio pointed out in the comments the range actually only needs to be 1023 - 9876 because the values outside that range contain duplicate digits.
Generally random.randint
will be much faster than random.shuffle
or random.choice
so even if it's more likely one needs to draw multiple times (as pointed out by @karakfa) it's up to 3 times faster than any shuffle
, choice
approach that also needs to join
the single digits.
Solution 4
I do not know Python well, but something like
digits=[1,2,3,4,5,6,7,8,9] <- no zero
random.shuffle(digits)
first=digits[0] <- first digit, obviously will not be zero
digits[0]=0 <- used digit can not occur again, zero can
random.shuffle(digits)
lastthree=digits[0:3] <- last three digits, no repeats, can contain zero, thanks @Dubu
A more useful iteration, actually creating a number:
digits=[1,2,3,4,5,6,7,8,9] # no zero
random.shuffle(digits)
val=digits[0] # value so far, not zero for sure
digits[0]=0 # used digit can not occur again, zero becomes a valid pick
random.shuffle(digits)
for i in range(0,3):
val=val*10+digits[i] # update value with further digits
print(val)
After stealing pieces from other solutions, plus applying the tip from @DavidHammen:
val=random.randint(1,9)
digits=[1,2,3,4,5,6,7,8,9]
digits[val-1]=0
for i in random.sample(digits,3):
val=val*10+i
print(val)
Solution 5
rejection sampling method. Create a 4 digit random combination from 10 digits and resample if it doesn't match the criteria.
r4=0
while r4 < 1000:
r4=int(''.join(map(str,random.sample(range(10),4))))
noticed that this is essentially the same as @Austin Haskings's answer
Menon A.
Updated on June 18, 2022Comments
-
Menon A. almost 2 years
This works almost fine but the number starts with 0 sometimes:
import random numbers = random.sample(range(10), 4) print(''.join(map(str, numbers)))
I've found a lot of examples but none of them guarantee that the sequence won't start with
0
. -
Mark Ransom almost 7 yearsIt will also allow a single repeated digit.
-
Tim Peters almost 7 yearsExcept the allowable outcomes aren't equally likely then; e.g., there are two ways to get 1230, but only one way to get 1234.
-
Jean-François Fabre almost 7 years@TimPeters you're right. Added an alternate solution.
-
janscas almost 7 yearsif you get 0023, then you have the same problem
-
Dima almost 7 yearsWould you need to multiply the first digit by 1000 before adding the two? Or does the plus sign just concatenate them?
-
karakfa almost 7 yearswith this method, you need to reject more than half of the samples. Target range:
9*9*8*7 = 4536
, sample space8999
. -
MSeifert almost 7 years@karakfa But it avoids shuffling a "range" and joining the list and checking if the first item is 0. I think that's worth it. A superficial timing shows that this approach using
randint
is roughly 3 times faster compared to therandom.sample
approaches here. -
Jean-François Fabre almost 7 yearsbut you cannot have
0023
. There's only one0
-
miradulo almost 7 years@MSeifert Of course, that's for 4 digits. Your rejection space is going to blow up and slow you down with say, 7 digits (I know that's not the question :) ). Practically speaking though, I like this answer best.
-
karakfa almost 7 yearsYes, it might be faster. What I meant to comment is it's going to throw away more than half of the samples. It might still very well be the most efficient one.
-
Thierry Lathuille almost 7 years@Claudio Independence of the numbers generated is one of the fundamental properties a good pseudo-random generator must satisfy. I suppose that Python's random is good enough that subsequent calls can be considered independent.
-
muddyfish almost 7 years@Claudio the probabilities still work out to be equal for all results so I'm not sure what you mean by not as random as it should be
-
miradulo almost 7 years@Claudio I'm sorry but you should probably consult some kind of reference on accept-reject sampling and review conditional probability before suggesting all of the answers are incorrect.
-
Claudio almost 7 yearsTo make it even more perfect you can limit further the range for choices using for example random.randint(1023, 9876), right?
-
MSeifert almost 7 yearsBut the not every possible value is drawn with the same probability. Because for example
1234
and01234
produce the same result while numbers containing a not-leading zero (e.g.1023
) have only one possibility to be drawn. -
miradulo almost 7 years@MSeifert Ahh you're correct.. How about swapping zero with a random location in the list should a zero-leading number be drawn? Then we redistribute evenly back across the sample space in that case.
-
miradulo almost 7 yearse.g.
random.shuffle(l); if l[0] == 0: pos = random.choice(range(1, 10)); l[0], l[pos] = l[pos], l[0]
-
miradulo almost 7 years@foobar No, MSeifert has a point. Your current approach doesn't draw from the sample space uniformly. (I think) my suggestion fixes this though.
-
FooBar167 almost 7 years@Mitch -- Yes, you're right. I'll fix my note according to your suggestion. And check it on [0,1,2] list with 2 unique digits not starting with 0. There are only 3! == 6 possibilities (easy to check).
-
FooBar167 almost 7 years@MSeifert -- Yes. For example, take list [0,1,2] and combine only two unique random digits not starting from zero. There are only 3! == 6 possibilities: [0,1,2]; [0,2,1]; [1,0,2]; [1,2,0]; [2,0,1]; [2,1,0]. If I shift on one position, I'll have [1,2] and [2,1] twice as often than [1,0] and [2,0]. So I should use suggestion of Mitch and randomly swap leading zero with the second or the third position. This will give the same probabilities.
-
MSeifert almost 7 yearsI'm not really sure if that garantuees an even distribution. For example
1234
wouldn't hit the branch (say it would have a normalized probability of 1). But numbers like1203
could be produced either by shuffling once (probability 1) and by drawing0213
(granted it's a chance of 1/9th). Thus these would have a probability of~1.1
. This reasoning may be wrong (I'm not a statistician) and I personally liked the original approach (it was very simple and therefore elegant!). Even distribution wasn't part of the question so a skewed distribution would answer the question just fine. -
FooBar167 almost 7 years@MSeifert -- Lets take simpler example: list [0,1,2] and two unique digits not starting with 0. There are 3!==6 possibilities. Each possibility has probability 1/6. There are 2 leading zeros: [0,1,2] and [0,2,1]. If randomly swap leading 0 with the rest 2 positions, there are four combinations: [0,1,2]==>([1,0,2] and [2,1,0]); [0,2,1]==>([2,0,1] and [1,2,0]). Each of these four combinations have the same probability equal to 1/6 * 1/2 == 1/12. But these 4 combinations coincide with the rest 4 possibilities. So each possibility after random swap should have probability == 1/6 + 1/12.
-
MSeifert almost 7 yearsYes, you're totally right. :) I forgot that
02341
could also produce1234
. I'm glad I did not remove my upvote based on my flawed argument. (even though I really liked your first approach). :) -
jpmc26 almost 7 yearsI think I'd just put
val = 0
for the start. I don't like having to think about repeated digits that hard. +1, though. -
toddkaufmann almost 7 yearsI don't know why this was downvoted, it's short and combinatorical. It's exactly how I thought of doing it when reading the title. Setting
first
to 0 after getting first digit is "clever", I would have just added zero back to the list. This looks to be identical to Claudio's, which is tl;dr and less readable. This can generate all 4536 answers, without unnecessary looping or testing. -
Dubu almost 7 yearsThis is good, except that it should read
lastthree=digits[0:3]
because of Python's unusual array indexing. -
tevemadar almost 7 years@toddkaufmann I wanted to do that first, just I did not know the syntax for appending an item, and then I realized it was not needed anyway
-
FooBar167 almost 7 years@MSeifert -- added mathematical proof.
-
MSeifert almost 7 years
return
outside a function is a syntaxerror in python. Also note that you can use a range from1023
to9876
as pointed out here. -
Lightness Races in Orbit almost 7 yearsThis could loop forever. Hardly a versatile solution.
-
Lightness Races in Orbit almost 7 yearsThis could loop forever. Not useful.
-
karakfa almost 7 yearsProbability of that is zero... You'll be surprised how common this technique is used in practice.
-
Lightness Races in Orbit almost 7 yearsThe probability is certainly not zero.
-
karakfa almost 7 years
lim 10^(-n) = 0
; forn -> Infinity
(aka forever) -
Lightness Races in Orbit almost 7 yearsOkay how about it could loop for two years. Is that more useful?
-
Ilmari Karonen almost 7 yearsNote that, while this method does indeed generate each possible output with the same probability, this fact isn't trivially obvious. The key reason why it does produce equiprobable reasons is that, after picking the first digit, the number of remaining possible digits is the same regardless of which first digit was chosen. In particular, if you tried to reverse the procedure (i.e. first pick the lowest three distinct digits, and then pick a non-zero first digit out of the remaining seven), the outputs would not be equiprobable. (Exercise to readers: why?)
-
karakfa almost 7 yearsThe probability is negligible for all practical purposes. For example, to loop 1000 times without producing a valid sample (which takes 16 msecs on my computer) is probability 10^(-1000) event.
-
David Hammen almost 7 years@BoundaryImposition, and all the other downvoters: This is a very versatile solution. It even has a name, "rejection sampling." Most implementations of a normal distribution use rejection sampling.
-
David Hammen almost 7 years@BoundaryImposition - You are barking up the wrong tree here. Rejection sampling is a very widely used technique. The only problems with this answer is that Austin Hasking's wrote a very similar answer two minutes earlier (probably while karakfa was writing this answer) and that Austin Hasking's answer is a bit better.
-
David Hammen almost 7 years@BoundaryImposition, and all the other downvoters: Of the answers by Austin Hastings, MSeifert, karakfa, andThierry Lathuille, this is the fastest in both python 2.7 and python 3.4. In fact, the accepted answer is the slowest, 40% slower than is this answer (python 2.7).
-
David Hammen almost 7 yearsWhile
random.randint
is much faster than israndom.sample
,len(set(str(val)))
is very slow. The end result is that this is slower than Austin Hasting's algorithm in both python 2.7 and python 3.4, but not by much. This answer's algorithm is considerably faster than is that of the accepted answer, and because of that, +1. -
David Hammen almost 7 yearsDownvoted because this is the slowest of the top answers. This combines set arithmetic with
random.sample
, both of which are slow. -
David Hammen almost 7 years@BoundaryImposition and MSeifert -- Fun fact: This algorithm can be sped up by over 10% by writing it as an infinite loop, with
val = random.randint(1023, 9876)
and ` if len(set(str(val))) == 4: break` as the body of the loop. This still uses the very expensivelen(set(str(val)))
, but does so one less time than does the code in the answer. -
David Hammen almost 7 years@tevemadar -- While correct, this is quite slow, with two calls to
random.shuffle
on a nine element list. -
MSeifert almost 7 years@DavidHammen You're right about the
break
being faster (howeverrandint
is slower thanlen(set(str(val)))
so the difference wasn't that much. However I would be interested in how you did the benchmarks where my solution was slower, because I ran it on three machines and the results where roughly consistent: 2-4 times faster compared to the other 5 top-voted answers. So if you have any valid benchmarks proving me wrong I'll be happy to correct the answer :) -
tevemadar almost 7 years@DavidHammen I was not aware of other random functions, like random.sample (despite of using almost identical variable names, I have not spotted the later-accepted answer, there were much-much more failed answer attempts around). Adding a 3rd variant, by the way
-
David Hammen almost 7 years@tevemadar - Much better. That puts you in fourth place on my computer, behind Austin Hastings, foo bar, MSeifert, but ahead of karakfa and Thierry Lathuille. So +1.
-
David Hammen almost 7 yearsYou mentioned that you don't know python well. One of the key things you need to unlearn is loop based on indices. Just loop over the thing you want to loop over. Do this and you will be on par with MSeifert's answer: Replace
tail=random.sample(digits,3)
andfor i in range(0,3): val=val*10+tail[i]
withfor n in random.sample(digits,3): val = val*10 + n
. -
MSeifert almost 7 years@DavidHammen Note that with python-3 the timings look very different: ideone.com/ous7zZ. Am I right that it seems my solution is a bit slower on python2 but much faster in python3?
-
David Hammen almost 7 years@MSeifert - I didn't notice the python3 tag. I have experimented with both python2 and python3 and have noticed that performance is in general quite abysmal with python3. Your's is the one exception, being only 10% slower in python3 rather than python2 (other algorithms take twice as long, or longer). One thing that is common across both versions of the language is that the accepted answer is lousy.
-
David Hammen almost 7 years@MSeifert -- Even your algorithm is a bit slower in python3 than it is in python2. I reduced the timeit
number
on ideone so as not to overly abuse their computers. Expand that number and it's clear than every algorithm is slower in python3. I suspect that python3 has drastically improved the performance oflen
,set
, andstr
, but has somehow disimproved the performance ofrandom
, with particular attention paid to disimproving the performance ofrandom.sample
andrandom.shuffle
. -
David Hammen almost 7 years@MSeifert - Wow. On my computer,
python -m timeit -s 'import random' 'random.sample(range(10),3)'
takes 2.41 usec per loop. With python3, it takes 6.38 usec per loop. -
MSeifert almost 7 years@DavidHammen Yeah, that was why I wrote that
randint
is much faster thansample
(andshuffle
). I'm not sure what changed, it could be thatrange
doesn't return a list in python3 or that list indexing got slower but maybe they even fixed problems with the PRNG that resulted in slowdowns. But yeah, it's definetly interesting. Might be worth a follow-up question, we probably can't solve this here in the comments. :) -
David Hammen almost 7 yearsOn the other hand,
python -m timeit -s 'n=123345678' 'len(set(str(n)))'
takes 0.804 usec per loop with python, 0.74 usec per loop with python3. -
David Hammen almost 7 yearsAnother problem we can't solve in comments: The accepted and most highly up-voted answer has incredibly lousy performance in both versions of the language. I don't understand the upvotes or the acceptance. This is a people problem rather than a language problem.
-
MSeifert almost 7 years@DavidHammen Performance wasn't part of the question so I wouldn't judge on that point (even though it's good to point out). However the accepted answer doesn't need "sample rejection". Maybe that was the reason that made it the top-voted and accepted answer. :)
-
David Hammen almost 7 yearsThe (invalid) comments about rejecting the notion of rejection sampling were all about performance.
-
David Hammen almost 7 years@MSeifert -- Question asked.
-
tevemadar almost 7 years@DavidHammen Thank you, added the idea in the final code. If you have a strong stomach, please find a fast, but assembly-like approach in stackoverflow.com/a/43631936/7916438
-
miradulo almost 7 years@DavidHammen Why is that worthy of a downvote? On my machine, this answer is less than 4 microseconds slower than Austin Hastings answer, and performance wasn't even mentioned in the question. I would hesitate to call a constructive approach alongside rejection sampling techniques "not useful".
-
David Hammen almost 7 years@Mitch - Mostly to counter what I consider to be completely invalid downvotes on the other answers due to lack of understanding of the concept of rejection sampling. An easy way to understand rejection sampling: Suppose I asked you to produce all of the four digit numbers that have no repeated digits and that start with 1. You could come up with a complex algorithm to do that, or you could use
[n for n in range(1000,10000) if len(set(str(n)))==4]
. (continued) -
David Hammen almost 7 yearsThe
if
in the list comprehension that filters out values containing replicated digits is the equivalent of rejection sampling. Sometimes it's faster to intentionally generate invalid values and then filter them out. The exact same concept applies to rejection sampling. -
miradulo almost 7 yearsI understand rejection sampling quite fine, thank you, and prefer Austin's answer to this answer as well. Downvoting a perfectly fine answer just seems... odd.
-
David Hammen almost 7 years@Mitch -- The probability of looping is even smaller than that. A call to
random.sample(range(10), 4)
will produce one of 5040 different lists, of which 504 have a leading zero. In other words, exactly one out of ten of the lists generated by that call torandom.sample
fails the criteria. The probability of having to loop at all is 10%, having to make two or more loops is 1%, four or more loops, 0.01%. The probability drops very quickly. The key problem with this approach is how poorlyrandom.sample
performs in python3. -
miradulo almost 7 years@DavidHammen Ahh I thought he was using pure rejection sampling like MSeifert, didn't take a close look at the answer. You are correct.
-
Eric Duminil almost 7 yearsInteresting. Did you check that the results are uniformly distributed?
-
Eric Duminil almost 7 yearsNice answer. You might be interested in a one-liner (after imports) variant of your code.
-
tevemadar almost 7 years@EricDuminil the distribution over the set of valid outputs follows the distribution of randint(). What I checked is the correctness of mapping inputs to outputs, added some lines about it now