How to generate random pairs of numbers in Python, including pairs with one entry being the same and excluding pairs with both entries being the same?

python numpy random

13,887

Solution 1

A generator that yields random unique coordinates:

from random import randint

def gencoordinates(m, n):
    seen = set()

    x, y = randint(m, n), randint(m, n)

    while True:
        seen.add((x, y))
        yield (x, y)
        x, y = randint(m, n), randint(m, n)
        while (x, y) in seen:
            x, y = randint(m, n), randint(m, n)

Output:

>>> g = gencoordinates(1, 100)
>>> next(g)
(42, 98)
>>> next(g)
(9, 5)
>>> next(g)
(89, 29)
>>> next(g)
(67, 56)
>>> next(g)
(63, 65)
>>> next(g)
(92, 66)
>>> next(g)
(11, 46)
>>> next(g)
(68, 21)
>>> next(g)
(85, 6)
>>> next(g)
(95, 97)
>>> next(g)
(20, 6)
>>> next(g)
(20, 86)

As you can see, an x coordinate (20) happened to be repeated in the last two pairs. That is allowed: only identical (x, y) pairs are excluded.
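Since the generator above never stops on its own, one way to draw a fixed number of pairs is itertools.islice. This is only a minimal sketch: the generator is restated in a slightly condensed form so the snippet runs by itself, and it assumes you ask for fewer pairs than the (n - m + 1)**2 that exist, because otherwise the loop spins forever.

```python
from itertools import islice
from random import randint


def gencoordinates(m, n):
    """Yield unique (x, y) pairs with m <= x, y <= n (never terminates)."""
    seen = set()
    while True:
        x, y = randint(m, n), randint(m, n)
        if (x, y) not in seen:
            seen.add((x, y))
            yield (x, y)


# Take exactly 10 distinct pairs from the endless generator.
pairs = list(islice(gencoordinates(1, 100), 10))
```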

Solution 2

Let's say that your x and y coordinates are integers in the ranges 0 to nx - 1 and 0 to ny - 1. For small ranges, a simple method is to generate the set of all possible (x, y) coordinates using np.mgrid, reshape it to an (nx * ny, 2) array, then sample random rows from it:

import numpy as np

nx, ny = 100, 200
# every (x, y) pair as one row of an (nx * ny, 2) array
xy = np.mgrid[:nx,:ny].reshape(2, -1).T
sample = xy.take(np.random.choice(xy.shape[0], 100, replace=False), axis=0)

Creating the array of all possible coordinates can become expensive if nx and/or ny is very large, in which case it might be better to use a generator object and keep track of previously used coordinates, as in James' answer.

Following @morningsun's suggestion, an alternative method is to sample from the set of nx*ny indices into the flattened array then convert these directly to x, y coordinates, which avoids constructing the whole nx*ny array of possible x, y permutations.
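To make the index conversion concrete, here is a small worked example for a 100 x 200 grid. The specific flat indices are arbitrary values chosen for illustration; np.unravel_index performs the same arithmetic as i // ny and i % ny for a C-ordered array.

```python
import numpy as np

dims = (100, 200)                  # (nx, ny)
flat = np.array([0, 205, 19999])   # indices into the flattened grid

# For a C-ordered grid, flat index i maps to (i // ny, i % ny).
x, y = np.unravel_index(flat, dims)
coords = np.column_stack((x, y))   # rows: (0, 0), (1, 5), (99, 199)
```

For instance, 205 = 1 * 200 + 5, so it maps to (1, 5).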

For comparison, here's a version of my original approach generalized for N-dimensional arrays, plus a version that uses the new approach:

def sample_comb1(dims, nsamp):
    perm = np.indices(dims).reshape(len(dims), -1).T
    idx = np.random.choice(perm.shape[0], nsamp, replace=False)
    return perm.take(idx, axis=0)

def sample_comb2(dims, nsamp):
    idx = np.random.choice(np.prod(dims), nsamp, replace=False)
    return np.vstack(np.unravel_index(idx, dims)).T

There's not a huge difference in practice, but the benefits of the second method become a bit more apparent for larger arrays:

In [1]: %timeit sample_comb1((100, 200), 100)
100 loops, best of 3: 2.59 ms per loop

In [2]: %timeit sample_comb2((100, 200), 100)
100 loops, best of 3: 2.4 ms per loop

In [3]: %timeit sample_comb1((1000, 2000), 100)
1 loops, best of 3: 341 ms per loop

In [4]: %timeit sample_comb2((1000, 2000), 100)
1 loops, best of 3: 319 ms per loop

If you have scikit-learn installed, sklearn.utils.random.sample_without_replacement offers a much faster method for generating random indices without replacement using Floyd's algorithm:

from sklearn.utils.random import sample_without_replacement

def sample_comb3(dims, nsamp):
    idx = sample_without_replacement(np.prod(dims), nsamp)
    return np.vstack(np.unravel_index(idx, dims)).T

In [5]: %timeit sample_comb3((1000, 2000), 100)
The slowest run took 4.49 times longer than the fastest. This could mean that an
intermediate result is being cached 
10000 loops, best of 3: 53.2 µs per loop
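If scikit-learn is not available, a similar variant can be sketched with NumPy's newer Generator API (np.random.default_rng, NumPy 1.17 or later). The name sample_comb4 and the seed parameter are my own additions for illustration, and I have not timed this version, so treat any speed benefit as an assumption to verify on your own data.

```python
import math

import numpy as np


def sample_comb4(dims, nsamp, seed=None):
    """Hypothetical variant of sample_comb2 using numpy's Generator API."""
    rng = np.random.default_rng(seed)
    # Draw nsamp distinct flat indices from range(prod(dims)).
    idx = rng.choice(math.prod(dims), size=nsamp, replace=False)
    # Convert the flat indices to one coordinate row per sample.
    return np.column_stack(np.unravel_index(idx, dims))


sample = sample_comb4((1000, 2000), 100, seed=0)
```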

Solution 3

@James Mills's answer is great, but to avoid an endless loop when accidentally asking for more coordinates than exist, I suggest the following (it also removes some repetition):

from random import randint

def gencoordinates(m, n):
    seen = set()
    x, y = randint(m, n), randint(m, n)
    # stop once all (n + 1 - m)**2 possible pairs have been yielded
    while len(seen) < (n + 1 - m)**2:
        while (x, y) in seen:
            x, y = randint(m, n), randint(m, n)
        seen.add((x, y))
        yield (x, y)

Note that an invalid range of values (for example m > n) is not checked here and will still propagate down to randint.
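When the number of pairs is known up front, the rejection loop can be avoided altogether by sampling flat indices with random.sample and converting them to pairs. This is a sketch under my own assumptions: the function name sample_coordinates and the parameter k are hypothetical, and the explicit range check is an addition that the generator above does not have.

```python
from random import sample


def sample_coordinates(m, n, k):
    """Return k distinct (x, y) pairs with m <= x, y <= n."""
    side = n - m + 1  # number of possible values per axis
    if side <= 0:
        raise ValueError("empty range: m must not exceed n")
    # random.sample raises ValueError if k exceeds side * side.
    flat = sample(range(side * side), k)
    # Map each flat index back to an (x, y) pair.
    return [(m + i // side, m + i % side) for i in flat]


pairs = sample_coordinates(1, 100, 10)
```

Asking for too many pairs fails immediately with a ValueError instead of looping.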


Dave

Updated on September 15, 2022

Comments

Dave over 1 year
I'm using Python and was using numpy for this. I want to generate pairs of random numbers. I want to exclude repeated pairs where both entries are the same as in an earlier pair, but I want to include pairs that share only one entry with an earlier pair. I tried to use
```
import numpy
numpy.random.choice(a,(m,n),replace=False) 
```
for it, but it completely excludes any tuples that share an entry, i.e.
```
import numpy
numpy.random.choice(2, (2, 1), replace=False)  # a=2, m=2, n=1
```
gives me only (1,0) and (0,1) and not (1,1), (0,0), (1,0) and (0,1).

I want to do this because I want to draw a sample of random tuples with a large a and a large n (as used above) without getting exactly the same tuple more than once. It should also be reasonably efficient. Is there a way to do this that's already implemented?
- Tim Pietzcker almost 9 years
  
  I've read this three times, and I still don't get it. If you want to exclude pairs where both entries are the same number, then you should exclude (1,1) and (0,0) - could you explain why you're unhappy with the result you have?
- James Mills almost 9 years
  
  Then don't you just want a generator to generate a random pair of unique (x, y) coordinates?
Dave almost 9 years

I'll check back on this later, got a group meeting now. Thank you a lot already for the quick answer
Dave almost 9 years

I implemented the idea of your code and, as an exercise, wrote my own version. Thank you again.
Admin over 8 years

I thought a random choice of the linear indices would be better, but in practice it's not faster. A limitation of the current numpy.