Difference between nonzero(a), where(a) and argwhere(a). When to use which?

python numpy

24,722

Solution 1

nonzero and argwhere both give you information about where in the array the elements are True. where works the same as nonzero in the form you have posted, but it has a second form:

np.where(mask,a,b)

which can be roughly thought of as a numpy "ufunc" version of the conditional expression:

a[i] if mask[i] else b[i]

(with appropriate broadcasting of a and b).
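As a rough illustration of the three-argument form described above, here is a minimal sketch. The array values are made up for the example, and the explicit loop is only there to show the per-element conditional that np.where vectorizes.

```python
import numpy as np

mask = np.array([True, False, True, False])
a = np.array([10, 20, 30, 40])
b = np.array([-1, -2, -3, -4])

# Vectorized three-argument form.
vectorized = np.where(mask, a, b)

# The same selection written as an explicit per-element conditional.
looped = np.array([a[i] if mask[i] else b[i] for i in range(len(mask))])

# Broadcasting: a scalar stands in for a full array of fallback values.
with_scalar = np.where(mask, a, 0)

print(vectorized)   # [10 -2 30 -4]
print(looped)       # [10 -2 30 -4]
print(with_scalar)  # [10  0 30  0]
```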

As far as having both nonzero and argwhere, they're conceptually different. nonzero is structured to return an object which can be used for indexing. This can be lighter-weight than creating an entire boolean mask if the 0's are sparse:

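mask = a == 0         # entire array of bools: one per element of a, True where a is 0
mask = np.nonzero(a)  # tuple of index arrays (one per dimension) locating the nonzero elements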

Now you can use that mask to index other arrays, etc. However, as it is, it's not very nice conceptually to figure out which indices correspond to 0 elements. That's where argwhere comes in.
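To make the two return shapes concrete, here is a small sketch with a made-up array. Note that both functions locate the nonzero elements: nonzero returns a tuple that can be used directly as an index, while argwhere returns one coordinate row per element.

```python
import numpy as np

a = np.array([[0, 4, 0],
              [7, 0, 9]])

idx = np.nonzero(a)      # tuple of index arrays, one per dimension
coords = np.argwhere(a)  # 2-D array, one (row, col) pair per nonzero element

print(idx)      # (array([0, 1, 1]), array([1, 0, 2]))
print(coords)   # [[0 1]
                #  [1 0]
                #  [1 2]]

# The tuple form plugs straight into indexing.
print(a[idx])   # [4 7 9]

# argwhere is the transpose of the nonzero result.
print(np.array_equal(coords, np.transpose(idx)))  # True
```

The row-per-element layout is convenient when looping over coordinates, for example `for row, col in np.argwhere(a): ...`.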

Solution 2

I can't comment on the usefulness of having a separate convenience function that transposes the result of another, but I can comment on where vs nonzero. In its simplest use case, where is indeed the same as nonzero.

>>> np.where(np.array([[0,4],[4,0]]))
(array([0, 1]), array([1, 0]))
>>> np.nonzero(np.array([[0,4],[4,0]]))
(array([0, 1]), array([1, 0]))

>>> a = np.array([[1, 2],[3, 4]])
>>> np.where(a == 3)
(array([1]), array([0]))
>>> np.nonzero(a == 3)
(array([1]), array([0]))

where differs from nonzero when you wish to pick elements from array a where some condition is True and from array b where that condition is False.

>>> a = np.array([[6, 4],[0, -3]])
>>> b = np.array([[100, 200], [300, 400]])
>>> np.where(a > 0, a, b)
array([[  6,   4],
       [300, 400]])

Again, I can't explain why they added the nonzero functionality to where, but this at least explains how the two are different.

EDIT: Fixed the first example... my logic was incorrect previously


Author by

Amelio Vazquez-Reina


Updated on July 09, 2022

Comments

Amelio Vazquez-Reina almost 2 years
In Numpy, nonzero(a), where(a) and argwhere(a), with a being a numpy array, all seem to return the non-zero indices of the array. What are the differences between these three calls?
- On argwhere the documentation says:
  
  np.argwhere(a) is the same as np.transpose(np.nonzero(a)).
  
  Why have a whole function that just transposes the output of nonzero? When would that be useful enough to deserve a separate function?
- What about the difference between where(a) and nonzero(a)? Wouldn't they return the exact same result?
Sam about 11 years

I don't understand your last few statements. np.nonzero(a) returns a tuple, so mask.T is not allowed. mask[:,0] similarly does not work.
mgilson about 11 years

@Sam -- You're right. Sorry about that. (I was wrong about what it actually returns). The point however is the same. np.argwhere is nice to get the indices which are not zero.
user2357112 over 7 years

"This can be lighter-weight than creating an entire boolean mask if the 0's are sparse" - but you already have to create that entire boolean mask to feed it to nonzero.
user2357112 over 7 years

(Also, I think you mixed up zero elements and nonzero elements.)
mgilson over 7 years

@user2357112 So ... It's hard to say exactly what I meant 3 years ago when I wrote this, but I'm guessing that indexing via an array with only a few elements will be faster than indexing using a mask that has n elements (where n is the size of the flattened array being indexed). Also don't buy into the example too much -- What if you already had a suitable mask array sitting around?
mgilson over 7 years

@user2357112 -- Also, I'm not 100% sure I'm following your next comment either. Having a (non-boolean) array that is useful for picking out the non-zero elements makes it hard to get the 0 elements. However, a boolean array that can pick out the non-zero elements can easily be inverted (~) to pick out the 0 elements -- Though, I could be missing something. I wrote this 3 years ago when I worked with numpy a lot more than I have been recently ...
user2357112 over 7 years

If you already have a suitable mask lying around, feeding it to nonzero is still going to be more expensive than not feeding it to nonzero. As for the next comment, the output of nonzero is small if the nonzero elements are sparse, not if the zeros are sparse. Similarly, the last paragraph sounds like it's talking about the difference between the use cases of argwhere and where, with the thing about finding 0 elements being a mistake instead of deliberate.
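
The trade-off discussed in the comments above can be sketched roughly as follows. The array is invented for illustration; the point is only that the output of nonzero scales with the number of nonzero elements, while a boolean mask always has one entry per element and can be inverted with ~.

```python
import numpy as np

a = np.array([0, 0, 5, 0, 0, 0, 8, 0])

mask = a != 0        # one bool per element of a
idx = np.nonzero(a)  # one index per nonzero element

print(mask.size)     # 8
print(idx[0].size)   # 2

# Both select the same elements.
print(a[mask])       # [5 8]
print(a[idx])        # [5 8]

# Only the boolean mask can be inverted to reach the zero elements.
print(np.flatnonzero(~mask))  # [0 1 3 4 5 7]
```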