Difference between nonzero(a), where(a) and argwhere(a). When to use which?
Solution 1
`nonzero` and `argwhere` both give you information about where in the array the elements are True. `where` works the same as `nonzero` in the form you have posted, but it has a second form:
np.where(mask,a,b)
which can be roughly thought of as a numpy "ufunc" version of the conditional expression:
a[i] if mask[i] else b[i]
(with appropriate broadcasting of `a` and `b`).
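To make the three-argument form concrete, here is a minimal sketch. The arrays below are made up for illustration and are not from the question. It checks that `np.where(mask, a, b)` matches the element-wise conditional expression, and shows a scalar being broadcast in place of `b`.

```python
import numpy as np

# Illustrative data (an assumption, not from the original question).
a = np.array([1, -2, 3, -4])
b = np.array([10, 20, 30, 40])
mask = a > 0

# Take a[i] where mask[i] is True, otherwise b[i].
picked = np.where(mask, a, b)

# The same selection written as the element-wise conditional expression.
expected = np.array([a[i] if mask[i] else b[i] for i in range(len(mask))])

# Broadcasting: a scalar can stand in for a whole array.
clipped = np.where(mask, a, 0)

print(picked.tolist())                  # [1, 20, 3, 40]
print(np.array_equal(picked, expected)) # True
print(clipped.tolist())                 # [1, 0, 3, 0]
```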
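As for having both `nonzero` and `argwhere`, they're conceptually different. `nonzero` is structured to return an object which can be used for indexing: a tuple of index arrays, one per dimension. Indexing with it can be lighter-weight than indexing with an entire boolean mask if the nonzero elements are sparse:
mask = a != 0        # entire array of bools, one per element of a
idx = np.nonzero(a)  # tuple of index arrays, one entry per nonzero element
Now you can use idx to index other arrays, etc. However, as it is, it's not very nice conceptually to figure out which indices correspond to nonzero elements, because the coordinates are split across one array per dimension. That's where `argwhere` comes in: it returns one row of coordinates per nonzero element.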
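The difference in output shape is easier to see side by side. The following is a small sketch on a made-up array: `np.nonzero` gives a tuple that can be used directly as an index, while `np.argwhere` gives the same coordinates grouped per element.

```python
import numpy as np

# Illustrative array (an assumption, not from the original question).
a = np.array([[0, 4, 5],
              [7, 0, 0]])

idx = np.nonzero(a)      # tuple: (row indices, column indices)
coords = np.argwhere(a)  # array of shape (n_nonzero, a.ndim)

print([ix.tolist() for ix in idx])  # [[0, 0, 1], [1, 2, 0]]
print(coords.tolist())              # [[0, 1], [0, 2], [1, 0]]

# The tuple from nonzero works directly as a fancy index.
values = a[idx]
print(values.tolist())              # [4, 5, 7]

# argwhere holds the same information, transposed.
same = np.array_equal(coords, np.transpose(idx))
print(same)                         # True

# One coordinate per row makes argwhere convenient to loop over.
for row, col in coords:
    assert a[row, col] != 0
```

As a rule of thumb, reach for `nonzero` when the result will be used as an index, and for `argwhere` when the coordinates themselves are what you want to inspect or iterate over.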
Solution 2
I can't comment on the usefulness of having a separate convenience function that transposes the result of another, but I can comment on `where` vs `nonzero`. In its simplest use case, `where` is indeed the same as `nonzero`.
>>> np.where(np.array([[0,4],[4,0]]))
(array([0, 1]), array([1, 0]))
>>> np.nonzero(np.array([[0,4],[4,0]]))
(array([0, 1]), array([1, 0]))
or
>>> a = np.array([[1, 2],[3, 4]])
>>> np.where(a == 3)
(array([1]), array([0]))
>>> np.nonzero(a == 3)
(array([1]), array([0]))
`where` is different from `nonzero` in the case when you wish to pick elements from array `a` if some condition is True and from array `b` when that condition is False.
>>> a = np.array([[6, 4],[0, -3]])
>>> b = np.array([[100, 200], [300, 400]])
>>> np.where(a > 0, a, b)
array([[  6,   4],
       [300, 400]])
Again, I can't explain why they added the `nonzero` functionality to `where`, but this at least explains how the two are different.
EDIT: Fixed the first example... my logic was incorrect previously
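As a quick check of the one-argument equivalence described above, here is a sketch on a made-up array. The NumPy documentation describes `np.where(condition)` as shorthand for `np.asarray(condition).nonzero()` and suggests calling `nonzero` directly in that case.

```python
import numpy as np

# Illustrative array (an assumption, not from the original answer).
a = np.array([[1, 2], [3, 4]])
cond = a >= 3

from_where = np.where(cond)
from_nonzero = np.nonzero(cond)

# Both results are tuples with one index array per dimension.
equal = all(np.array_equal(w, n) for w, n in zip(from_where, from_nonzero))

print(equal)                                # True
print([ix.tolist() for ix in from_where])   # [[1, 1], [0, 1]]
```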
Amelio Vazquez-Reina
Updated on July 09, 2022
Comments
-
Amelio Vazquez-Reina almost 2 years
In Numpy, `nonzero(a)`, `where(a)` and `argwhere(a)`, with `a` being a numpy array, all seem to return the non-zero indices of the array. What are the differences between these three calls?
- On `argwhere` the documentation says: `np.argwhere(a)` is the same as `np.transpose(np.nonzero(a))`. Why have a whole function that just transposes the output of `nonzero`? When would that be so useful that it deserves a separate function?
- What about the difference between `where(a)` and `nonzero(a)`? Wouldn't they return the exact same result?
-
Sam about 11 years
I don't understand your last few statements. `np.nonzero(a)` returns a tuple, so `mask.T` is not allowed. `mask[:,0]` similarly does not work.
-
mgilson about 11 years
@Sam -- You're right. Sorry about that. (I was wrong about what it actually returns). The point however is the same. `np.argnonzero` is nice to get the indices which are not zero.
-
user2357112 over 7 years
"This can be lighter-weight than creating an entire boolean mask if the 0's are sparse" - but you already have to create that entire boolean mask to feed it to `nonzero`.
-
user2357112 over 7 years
(Also, I think you mixed up zero elements and nonzero elements.)
-
mgilson over 7 years
@user2357112 So ... It's hard to say exactly what I meant 3 years ago when I wrote this, but I'm guessing that indexing via an array with only a few elements will be faster than indexing using a mask that has `n` elements (where `n` is the size of the flattened array being indexed). Also don't buy into the example too much -- What if you already had a suitable `mask` array sitting around?
-
mgilson over 7 years
@user2357112 -- Also, I'm not 100% sure I'm following your next comment either. Having a (non-boolean) array that is useful for picking out the non-zero elements makes it hard to get the `0` elements. However, a boolean array that can pick out the non-zero elements can easily be inverted (`~`) to pick out the 0 elements -- Though, I could be missing something. I wrote this 3 years ago when I worked with `numpy` a lot more than I have been recently ...
-
user2357112 over 7 years
If you already have a suitable `mask` lying around, feeding it to `nonzero` is still going to be more expensive than not feeding it to `nonzero`. As for the next comment, the output of `nonzero` is small if the nonzero elements are sparse, not if the zeros are sparse. Similarly, the last paragraph sounds like it's talking about the difference between the use cases of `argwhere` and `where`, with the thing about finding 0 elements being a mistake instead of deliberate.