Can I make a random mask with NumPy?
Solution 1
Create an array of False values, then set the first 1000 elements to True:
import numpy as np
a = np.full(10000, False)
a[:1000] = True
Afterwards, simply shuffle the array:
np.random.shuffle(a)
For a slightly faster solution, you can instead create an array of integer zeros, set some values to 1, shuffle, and cast it to bool:
a = np.zeros(10000, dtype=int)
a[:1000] = 1
np.random.shuffle(a)
a = a.astype(bool)
In both cases you will have an array a with exactly 1000 True elements at random positions.
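The same idea can be wrapped in a small helper and reshaped to index an image, which is the use case described in the question. This is only a sketch: the helper name `random_mask`, the seed, and the dummy 100x100 image are assumptions made for illustration, and it uses NumPy's newer `Generator` API rather than the legacy `np.random.shuffle` shown above.

```python
import numpy as np

def random_mask(size, num_true, rng=None):
    """Return a 1-D boolean array of length `size` with exactly `num_true` True values."""
    rng = np.random.default_rng() if rng is None else rng
    mask = np.zeros(size, dtype=bool)
    mask[:num_true] = True
    rng.shuffle(mask)  # shuffles in place
    return mask

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
# Stand-in for a real grayscale image (hypothetical data).
image = rng.integers(0, 256, size=(100, 100), dtype=np.uint8)

# Build a flat mask with 1000 True entries and give it the image's shape.
mask = random_mask(image.size, 1000, rng).reshape(image.shape)

# Boolean indexing returns the selected pixels as a 1-D array in row-major order.
pixels = image[mask]
```

Indexing with the 2-D boolean mask keeps the pixels in their original scan order, which is often what you want in image processing.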
If instead you want each element to be picked independently from [True, False], you could use
np.random.choice([True, False], size=10000, p=[0.1, 0.9])
but note that you cannot predict the number of True elements in your array. You only know that on average you will have 1000 of them.
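A minimal sketch of the independent-draw variant, assuming the `Generator` API: comparing uniform random numbers against a threshold is a common way to get the same kind of mask as `choice` with probabilities. The seed and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
size, p_true = 10_000, 0.1

# Each element is True independently with probability p_true.
mask = rng.random(size) < p_true

# The count is not fixed: it fluctuates around size * p_true (1000 here).
n_true = int(mask.sum())
print(n_true)
```

If you need exactly 1000 True values, use the shuffle-based approach instead.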
Solution 2
A common solution is to create an array of random integer indices, which can be done efficiently with NumPy's random choice.
With this setup:
n_dim = 10_000 # size of the original array
n = 100 # size of the random mask
rng = np.random.default_rng(123)
To create the array of random indices we can use NumPy's choice, passing the array size as the first argument:
In [5]: %%timeit
...: m = rng.choice(n_dim, replace=False, size=n)
...:
...:
21.9 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
As a comparison, the boolean array approach mentioned in other answers (which requires shuffling an array of 0s and 1s) is considerably slower (more than 10x slower in this example):
In [7]: %%timeit
...: m = np.hstack([np.ones(n, dtype=bool), np.zeros(n_dim - n, dtype=bool)])
...: rng.shuffle(m)
...:
...:
261 µs ± 604 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
NOTE: Integer indexing works best in the sparse case, i.e. when selecting a small fraction of samples from the original array. In that case the RAM usage of an integer index is much lower than that of a boolean mask. Once the fraction of samples exceeds roughly 10 to 20% of the original array, the boolean mask approach becomes more efficient.
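The memory argument can be checked directly with `nbytes`. This is a rough sketch using the setup values from above; the break-even estimate assumes the index uses 8-byte integers, which is typical but platform dependent.

```python
import numpy as np

n_dim, n = 10_000, 100
rng = np.random.default_rng(123)

# Integer index: one integer per selected sample.
idx = rng.choice(n_dim, size=n, replace=False)

# Equivalent boolean mask: one byte per element of the original array.
mask = np.zeros(n_dim, dtype=bool)
mask[idx] = True

idx_bytes = idx.nbytes    # n * idx.itemsize
mask_bytes = mask.nbytes  # n_dim * 1

# The two sizes are equal when n / n_dim == 1 / idx.itemsize,
# e.g. 1/8 = 12.5% if the index dtype is 8 bytes wide.
break_even_fraction = 1 / idx.itemsize
print(idx_bytes, mask_bytes, break_even_fraction)
```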
NOTE 2: Integer indexing returns samples in random order. To randomly sample an array while maintaining the original order, you need to sort the index. The boolean mask naturally returns samples in their original order.
To conclude, if you are performing sparse sampling and you don't care about order of the sampled items, the integer indexing shown here is likely to outperform other approaches.
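If order does matter, the index can be sorted in place before use. The sketch below assumes a hypothetical 1-D array `data` and checks that a sorted integer index selects the same samples, in the same order, as the corresponding boolean mask.

```python
import numpy as np

rng = np.random.default_rng(123)
data = np.arange(10_000) * 2  # hypothetical data to sample from

# Draw 100 distinct positions, then sort them to restore the original order.
idx = rng.choice(data.size, size=100, replace=False)
idx.sort()
ordered_samples = data[idx]

# The equivalent boolean mask selects the same elements in the same order.
mask = np.zeros(data.size, dtype=bool)
mask[idx] = True
same = np.array_equal(ordered_samples, data[mask])
```

The sort adds O(n log n) work on the index, so the speed advantage shrinks as the number of selected samples grows.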
Solution 3
In [7]: import numpy as np
In [8]: mask=np.array( [False]*10000)
In [9]: inds=np.random.choice(np.arange(10000),size=1000,replace=False)  # replace=False avoids duplicate indices
In [10]: mask[inds]=True
Now the first 100 elements of your mask are:
In [11]: print(mask[:100])
[False False False False False True False False False False False False
False False False False False False False False False False True False
False False False False False False False True True False True False
False False False False False False False False False False True False
True False False False False False False False False False False False
False False False False False False True False False False False False
False False True False False False False False False False False False
False False True False False False False False False False False False
False False False False]
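One thing to watch for with this approach: `choice` samples with replacement by default, so duplicate indices reduce the number of True entries unless `replace=False` is passed. A short sketch of the difference, using the `Generator` API and an arbitrary seed (both assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
size, num_true = 10_000, 1_000

# Default behaviour (replace=True): the same index can be drawn more than once.
with_rep = np.zeros(size, dtype=bool)
with_rep[rng.choice(size, size=num_true)] = True

# replace=False: all indices are distinct, so exactly num_true entries are set.
without_rep = np.zeros(size, dtype=bool)
without_rep[rng.choice(size, size=num_true, replace=False)] = True

n_with = int(with_rep.sum())        # at most num_true, usually fewer
n_without = int(without_rep.sum())  # exactly num_true
print(n_with, n_without)
```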
Solution 4
Similar to Nils Werner's answer, but more directly:
import numpy as np
size = 10000
num_true = 1000
mask = np.concatenate([np.ones(num_true, dtype=bool), np.zeros(size - num_true, dtype=bool)])
np.random.shuffle(mask)
It is equally fast, as measured with IPython's %%timeit magic:
%%timeit
a = np.zeros(size, dtype=int)
a[:num_true] = 1
np.random.shuffle(a)
a = a.astype(bool)
Out: 217 µs ± 2.33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
mask = np.concatenate([np.ones(num_true, dtype=bool), np.zeros(size - num_true, dtype=bool)])
np.random.shuffle(mask)
Out: 201 µs ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
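Outside IPython, a similar comparison can be run with the standard library's `timeit` module. This is a sketch only: the function names and the loop count of 200 are assumptions, and the absolute numbers will differ from machine to machine.

```python
import timeit
import numpy as np

size, num_true = 10_000, 1_000
loops = 200  # arbitrary; increase for more stable numbers

def int_then_cast():
    # Integer array, shuffled, then cast to bool.
    a = np.zeros(size, dtype=int)
    a[:num_true] = 1
    np.random.shuffle(a)
    return a.astype(bool)

def concat_bool():
    # Boolean array built directly with concatenate, then shuffled.
    mask = np.concatenate([np.ones(num_true, dtype=bool),
                           np.zeros(size - num_true, dtype=bool)])
    np.random.shuffle(mask)
    return mask

t_int = timeit.timeit(int_then_cast, number=loops)
t_bool = timeit.timeit(concat_bool, number=loops)
print(f"int + cast:  {t_int / loops * 1e6:.1f} µs per call")
print(f"concat bool: {t_bool / loops * 1e6:.1f} µs per call")
```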
Admin
Updated on September 06, 2022
Comments
-
Admin over 1 year
I'm doing image processing using Python.
I am trying to randomly extract some pixels from the image.
Is it impossible to make a random mask with NumPy?
What I have in mind is setting 1000 elements of a 10000-element array to True and all the others to False. Is this possible?
If not, is there any other way to make a random mask? Thank you.
-
Anton vBR over 6 years
I liked this solution. However, just a thought: we could change the np.full(..) to np.zeros() and set a[:1000] to 1 instead. This should give a speed improvement. Could add an a.astype(bool) at the end too.
-
Nils Werner over 6 years
They are about as fast, with the purely boolean array being slightly faster most of the time.
-
Nils Werner over 6 years
Ah, I misread the timings. The int solution is indeed the fastest! Will adjust my answer.
-
Anakhand almost 5 years
Could also do np.concatenate([np.ones(1000, dtype=bool), np.zeros(10000 - 1000, dtype=bool)]) to save some lines and the conversion to bool.
-
Anakhand over 3 years
This only works if you don't care about maintaining the original order of the elements after indexing with the mask m. If you do care (as is probably the OP's case, since they are extracting pixels from an image), you have to sort the mask after creating it, with m.sort(). The overall complexity of that is worse than that of the boolean mask method, i.e., it performs worse for larger mask sizes. See this benchmark.
-
user2304916 over 3 years
Dear Anakhand, that's a fair point. The requirement of sorted samples is not mentioned in the OP's question. The integer index approach works better in the sparse case, i.e. when the selected samples are a small fraction of the total samples. In this case, it will be both more efficient (less RAM) and faster than a bool mask. When the fraction of sampled points is larger than ~1/4, I would use a bool mask, as the advantage of an integer mask in terms of RAM would vanish.
-
Safron over 3 years
np.random.choice([True, False], size=10000, p=[0.1, 0.9]) is about 10 times slower than the equivalent np.random.random_sample(10000) < 0.1.