Python pairwise comparison of elements in a array or list

python arrays numpy comparison rank

19,458

Solution 1

Perhaps you want:

 [x >= y for i,x in enumerate(a) for j,y in enumerate(a) if i != j]

This will not compare any item against itself, but compare each of the others against each other.

Solution 2

You could use NumPy broadcasting -

# Get the mask of comparisons in a vectorized manner using broadcasting
mask = a[:,None] >= a

# Select the elements other than diagonal ones
out = mask[~np.eye(a.size,dtype=bool)]

If you rather prefer to set the diagonal elements as False in mask and then mask would be the output, like so -

mask[np.eye(a.size,dtype=bool)] = 0

Sample run -

In [56]: a
Out[56]: array([3, 7, 5, 8])

In [57]: mask = a[:,None] >= a

In [58]: mask
Out[58]: 
array([[ True, False, False, False],
       [ True,  True,  True, False],
       [ True, False,  True, False],
       [ True,  True,  True,  True]], dtype=bool)

In [59]: mask[~np.eye(a.size,dtype=bool)] # Selecting non-diag elems
Out[59]: 
array([False, False, False,  True,  True, False,  True, False, False,
        True,  True,  True], dtype=bool)

In [60]: mask[np.eye(a.size,dtype=bool)] = 0 # Setting diag elems as False

In [61]: mask
Out[61]: 
array([[False, False, False, False],
       [ True, False,  True, False],
       [ True, False, False, False],
       [ True,  True,  True, False]], dtype=bool)

Runtime test

Reasons to use NumPy broadcasting? Performance! Let's see how with a large dataset -

In [34]: def pairwise_comp(A): # Using NumPy broadcasting    
    ...:     a = np.asarray(A) # Convert to array if not already so
    ...:     mask = a[:,None] >= a
    ...:     out = mask[~np.eye(a.size,dtype=bool)]
    ...:     return out
    ...: 

In [35]: a = np.random.randint(0,9,(1000)).tolist() # Input list

In [36]: %timeit [x >= y for i,x in enumerate(a) for j,y in enumerate(a) if i != j]
1 loop, best of 3: 185 ms per loop # @Sixhobbits's loopy soln

In [37]: %timeit pairwise_comp(a)
100 loops, best of 3: 5.76 ms per loop

Solution 3

I'd like to apply @Divakar's solution to pandas objects. Here are two approaches for calculating pairwise absolute differences.

(IPython 6.1.0 on Python 3.6.2)

In [1]: import pandas as pd
   ...: import numpy as np
   ...: import itertools

In [2]: n = 256
   ...: labels = range(n)
   ...: ser = pd.Series(np.random.randn(n), index=labels)
   ...: ser.head()
Out[2]: 
0    1.592248
1   -1.168560
2   -1.243902
3   -0.133140
4   -0.714133
dtype: float64

Loops

In [3]: %%time
   ...: result = dict()
   ...: for pair in itertools.combinations(labels, 2):
   ...:     a, b = pair
   ...:     a = ser[a]  # retrieve values
   ...:     b = ser[b]
   ...:     result[pair] = a - b

   ...: result = pd.Series(result).abs().reset_index()
   ...: result.columns = list('ABC')
   ...: df1 = result.pivot('A', 'B, 'C').reindex(index=labels, columns=labels)
   ...: df1 = df1.fillna(df1.T).fillna(0.)
CPU times: user 18.2 s, sys: 468 ms, total: 18.7 s
Wall time: 18.7 s

NumPy broadcast

In [4]: %%time
   ...: arr = ser.values
   ...: arr = arr[:, None] - arr
   ...: df2 = pd.DataFrame(arr, labels, labels).abs()
CPU times: user 816 µs, sys: 432 µs, total: 1.25 ms
Wall time: 675 µs

Verify they're equal:

In [5]: df1.equals(df2)
Out[5]: True

Using loops is about 20000 times slower than the clever NumPy approach. NumPy has many optimizations, but sometimes they need a different way of thinking. :-)

Solution 4

You may achieve that by using:

[x >= y for i,x in enumerate(a) for j,y in enumerate(a) if i != j]

Issue with your code:

You are iterating in over list twice. If you convert your comprehension to loop, it will work like:

for x in a:
    for y in a:
        x>=y # which is your condition

Hence, it the the order of execution is as: (a1, a1), (a1, a2), ... , (a2, a1), (a2, a2), ... , (a4, a4)

Solution 5

Why are you worried about the a1>=a1 comparison. It may be pridictable, but skipping it might not be worth the extra work.

Make a list of 100 numbers

In [17]: a=list(range(100))

Compare them with the simple double loop; producing a 10000 values (100*100)

In [18]: len([x>=y for x in a for y in a])
Out[18]: 10000
In [19]: timeit [x>=y for x in a for y in a]
1000 loops, best of 3: 1.04 ms per loop

Now use @Moinuddin Quadri's enumerated loop to skip the 100 eye values:

In [20]: len([x>=y for i,x in enumerate(a) for j, y in enumerate(a) if i!=j])
Out[20]: 9900
In [21]: timeit [x>=y for i,x in enumerate(a) for j, y in enumerate(a) if i!=j]
100 loops, best of 3: 2.12 ms per loop

It takes 2x longer. Half the extra time is the enumerates, and half the if.

In this case working with numpy arrays is much faster, even when including the time to create the array.

xa = np.array(x); Z = xa[:,None]>=xa

But you can't get rid of the the diagonal values. They will the True; they can be flipped to False, but why. In a boolean array there are only 2 values.

The fastest solution is to write an indicator function that isn't bothered by these diagonal values.

View more solutions

19,458

Author by

Sue

Updated on June 15, 2022

Comments

Sue almost 2 years
Let me elaborate my question using a simple example.I have a=[a1,a2,a3,a4], with all ai being a numerical value.

What I want to get is pairwise comparisons within 'a', such as I(a1>=a2), I(a1>=a3), I(a1>=a4), ,,,,I(a4>=a1), I(a4>=a2), I(a4>=a3), where I is a indicator function. So I used the following code.
```
res=[x>=y for x in a for y in a]
```
But it also gives the comparison results like I(a1>=a1),..,I(a4>=a4), which is always one. To get rid of these nuisance, I convert res into a numpy array and find the off diagonal elements.
```
res1=numpy.array(res)
```
This gives the result what I want, but I think there should be more efficient or simpler way to do pairwise comparison and extract the off diagonal element. Do you have any idea about this? Thanks in advance.
Jean-François Fabre over 7 years

what about [x >= y for i,x in enumerate(a) for j,y in enumerate(a) if i > j]
TemporalWolf over 7 years

@Jean-FrançoisFabre OP wants the reverse results as well: I(a1>=a4) and I(a4>=a1)
Sue over 7 years

Yes. I need both.
Sue over 7 years

Looks great! appreciate it.