Replace all elements of Python NumPy Array that are greater than some value

Solution 1

I think both the fastest and most concise way to do this is to use NumPy's built-in boolean mask indexing (often grouped under "fancy indexing"). If you have an ndarray named arr, you can replace all elements greater than 255 with a value x as follows:

arr[arr > 255] = x

I ran this on my machine with a 500 x 500 random matrix, replacing all values greater than 0.5 with 5, and it took about 7.59 ms per loop (best of 3).

In [1]: import numpy as np
In [2]: A = np.random.rand(500, 500)
In [3]: timeit A[A > 0.5] = 5
100 loops, best of 3: 7.59 ms per loop
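
As a minimal sketch, the same mask assignment can be applied to the case in the question, where the threshold and the replacement value differ. The names arr and T follow the question; the sample values are made up for illustration, and the copy is only there to leave the original array untouched.

```python
import numpy as np

# Hypothetical sample data; T is the threshold from the question.
arr = np.array([[10.0, 200.0], [300.0, 128.0]])
T = 128.0

mask = arr >= T          # boolean array, True where the condition holds
result = arr.copy()      # copy first so the original stays untouched
result[mask] = 255.0     # assign only at the True positions

print(result)
```

Dropping the copy and assigning to arr[mask] directly gives the in-place version shown above.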

Solution 2

Since you actually want a new array that equals arr where arr < 255 and 255 otherwise, this can be done simply:

result = np.minimum(arr, 255)

More generally, for a lower and/or upper bound:

result = np.clip(arr, 0, 255)

If you just want to access the values over 255, or something more complicated, @mtitan8's answer is more general, but np.clip and np.minimum (or np.maximum) are nicer and much faster for your case:

In [292]: timeit np.minimum(a, 255)
100000 loops, best of 3: 19.6 µs per loop

In [293]: %%timeit
   .....: c = np.copy(a)
   .....: c[a>255] = 255
   .....: 
10000 loops, best of 3: 86.6 µs per loop

If you want to do it in-place (i.e., modify arr instead of creating result) you can use the out parameter of np.minimum:

np.minimum(arr, 255, out=arr)

or

np.clip(arr, 0, 255, arr)

(The out= keyword is optional here because the arguments are passed in the same order as in the function's definition.)
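
A short sketch of the difference between the two forms, using a small made-up integer array: without out, a new array is returned and arr is left alone; with out=arr, the result is written back into arr.

```python
import numpy as np

# Hypothetical sample data with values below 0 and above 255.
arr = np.array([-5, 100, 255, 300])

capped = np.minimum(arr, 255)      # new array; arr is unchanged
clipped = np.clip(arr, 0, 255)     # new array with both bounds applied

np.clip(arr, 0, 255, out=arr)      # overwrite arr in place

print(capped, clipped, arr)
```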

For in-place modification, boolean indexing speeds up considerably (there is no need to make and then modify a copy separately), but it is still not as fast as minimum:

In [328]: %%timeit
   .....: a = np.random.randint(0, 300, (100,100))
   .....: np.minimum(a, 255, a)
   .....: 
100000 loops, best of 3: 303 µs per loop

In [329]: %%timeit
   .....: a = np.random.randint(0, 300, (100,100))
   .....: a[a>255] = 255
   .....: 
100000 loops, best of 3: 356 µs per loop

For comparison, if you wanted to restrict your values with a minimum as well as a maximum, without clip you would have to do this twice, with something like

np.minimum(a, 255, a)
np.maximum(a, 0, a)

or,

a[a>255] = 255
a[a<0] = 0
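
To check that these variants agree, here is a small sketch that applies all three to the same randomly generated integer array. The seed, value range, and shape are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(-50, 300, size=(100, 100))

# One call with both bounds.
by_clip = np.clip(a, 0, 255)

# Upper bound, then lower bound.
by_minmax = np.maximum(np.minimum(a, 255), 0)

# Two boolean-mask assignments on a copy.
by_mask = a.copy()
by_mask[by_mask > 255] = 255
by_mask[by_mask < 0] = 0

print(np.array_equal(by_clip, by_minmax), np.array_equal(by_clip, by_mask))
```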

Solution 3

I think you can achieve this the quickest by using the where function:

For example looking for items greater than 0.2 in a numpy array and replacing those with 0:

import numpy as np

nums = np.random.rand(4,3)

print(np.where(nums > 0.2, 0, nums))
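
Applied to the question itself, a sketch might look like the following. The names arr and T follow the question and the sample values are made up. Two variants are shown because the loop in the question leaves every other element at 0, whereas the question's text asks only to replace the values at or above T.

```python
import numpy as np

# Hypothetical sample data; T is the threshold from the question.
arr = np.array([[0.1, 0.7], [0.5, 0.3]])
T = 0.5

# Keep values below T, replace the rest with 255.0.
replaced = np.where(arr >= T, 255.0, arr)

# Match the question's loop: 255 at or above T, 0 everywhere else.
binary = np.where(arr >= T, 255.0, 0.0)

print(replaced)
print(binary)
```

In both cases np.where returns a new array and arr itself is not modified.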

Solution 4

Another way is to use np.place, which does in-place replacement and works with multidimensional arrays:

import numpy as np

# create 2x3 array with numbers 0..5
arr = np.arange(6).reshape(2, 3)

# replace 0 with -10
np.place(arr, arr == 0, -10)
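
The same call with a threshold condition, as a sketch closer to the question. The threshold T and the sample array are illustrative assumptions.

```python
import numpy as np

# 2x3 array with the numbers 0..5
arr = np.arange(6).reshape(2, 3)
T = 3  # hypothetical threshold

# Replace every element at or above T with 255, modifying arr itself.
np.place(arr, arr >= T, 255)

print(arr)
```

np.place returns None, so the result has to be read from arr rather than from the call.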

Solution 5

You can consider using numpy.putmask:

np.putmask(arr, arr>=T, 255.0)
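
A self-contained sketch of that call, with made-up sample values and a hypothetical threshold T:

```python
import numpy as np

# Hypothetical sample data; T is the threshold from the question.
arr = np.array([[12.0, 180.0], [90.0, 240.0]])
T = 100.0

# Overwrite arr in place wherever the mask is True.
np.putmask(arr, arr >= T, 255.0)

print(arr)
```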

Here is a performance comparison with NumPy's built-in indexing:

In [1]: import numpy as np
In [2]: A = np.random.rand(500, 500)

In [3]: timeit np.putmask(A, A>0.5, 5)
1000 loops, best of 3: 1.34 ms per loop

In [4]: timeit A[A > 0.5] = 5
1000 loops, best of 3: 1.82 ms per loop
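
Timings like these depend on the machine, the NumPy version, and the data, so it may be worth measuring locally. The following is a rough sketch using the standard timeit module. The helper function names are made up for this example, and each in-place variant works on a fresh copy so that every run sees the same input.

```python
import timeit
import numpy as np

base = np.random.rand(500, 500)

def with_mask():
    a = base.copy()
    a[a > 0.5] = 5                  # boolean mask assignment
    return a

def with_putmask():
    a = base.copy()
    np.putmask(a, a > 0.5, 5)       # in-place replacement via putmask
    return a

def with_where():
    return np.where(base > 0.5, 5, base)   # builds a new array

for fn in (with_mask, with_putmask, with_where):
    seconds = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e3:.3f} ms per call")
```

The copy adds the same overhead to both in-place variants, so their times are comparable with each other but not directly with the np.where variant.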
Author: NLi10Me

Four Gregs! https://www.youtube.com/watch?v=7ex9psLw5sM

Updated on July 08, 2022

Comments

  • NLi10Me
    NLi10Me almost 2 years

    I have a 2D NumPy array and would like to replace all values in it greater than or equal to a threshold T with 255.0. To my knowledge, the most fundamental way would be:

    shape = arr.shape
    result = np.zeros(shape)
    for x in range(0, shape[0]):
        for y in range(0, shape[1]):
            if arr[x, y] >= T:
                result[x, y] = 255
    
    1. What is the most concise and pythonic way to do this?

    2. Is there a faster (possibly less concise and/or less pythonic) way to do this?

    This will be part of a window/level adjustment subroutine for MRI scans of the human head. The 2D numpy array is the image pixel data.

  • askewchan
    askewchan over 10 years
    Note that this modifies the existing array arr, instead of creating a result array as in the OP.
  • NLi10Me
    NLi10Me over 10 years
    Thank you very much for your complete comment, however np.clip and np.minimum do not seem to be what I need in this case, in the OP you see that the threshold T and the replacement value (255) are not necessarily the same number. However I still gave you an up vote for thoroughness. Thanks again.
  • sodiumnitrate
    sodiumnitrate over 8 years
    Is there a way to do this by not modifying A but creating a new array?
  • lavee_singh
    lavee_singh over 8 years
    What would we do, if we wanted to change values at indexes which are multiple of given n, like a[2],a[4],a[6],a[8]..... for n=2?
  • askewchan
    askewchan over 8 years
    @lavee_singh, to do that, you can use the third part of the slice, which is usually neglected: a[start:stop:step] gives you the elements of the array from start to stop, but instead of every element, it takes only every step (if neglected, it is 1 by default). So to set all the evens to zero, you could do a[::2] = 0
  • lavee_singh
    lavee_singh over 8 years
    Thanks I needed something, like this, even though I knew it for simple lists, but I didn't know whether or how it works for numpy.array.
  • dreab
    dreab over 7 years
    100 loops, best of 3: 2.22 ms per loop
  • mjp
    mjp about 7 years
    NOTE: this doesn't work if the data is in a Python list; it HAS to be in a NumPy array, e.g. np.array([1,2,3])
  • jonathanking
    jonathanking about 6 years
    This is the solution I used because it was the first I came across. I wonder if there is a big difference between this and the selected answer above. What do you think?
  • Shital Shah
    Shital Shah almost 6 years
    In my very limited tests, my above code with np.place is running 2X slower than accepted answer's method of direct indexing. It's surprising because I would have thought np.place would be more optimized but I guess they have probably put more work on direct indexing.
  • Divyang Vashi
    Divyang Vashi almost 6 years
    @mdml The np.place method is faster than this. timeit A[A>0.5] = 5 :- 1.79 ms ± 6.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) and timeit np.place(A, A>0, 5) :- 732 µs ± 5.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
  • Darcy
    Darcy over 4 years
    Is there a way to modify this if arr includes NaN values?
  • AgentM
    AgentM over 4 years
    is it possible to use this indexing to update every value without condition? I want to do this: array[ ? ] = x, setting every value to x. Secondly, is it possible to do multiple conditions like: array[ ? ] = 255 if array[i] > 127 else 0 I want to optimize my code and am currently using list comprehension which was dramatically slower than this fancy indexing.
  • riyansh.legend
    riyansh.legend almost 4 years
    In my case np.place was also slower compared to the built-in method, although the opposite is claimed in this comment.
  • Debjit Bhowmick
    Debjit Bhowmick almost 4 years
    For not modifying the original array, do a deep copy of the original array. arr2 = arr.copy() and then arr2[arr2 > 255] = x
  • corvus
    corvus over 2 years
    For massive arrays, this solution will likely not be workable as it creates an intermediate array in-memory equal in size to the input array. If you do not have sufficient memory on your system it will fail.
  • Ali_Sh
    Ali_Sh over 2 years
    I have tested the code for when upper limit 0.5 used instead of 5, and indexing was better than np.putmask about two times.
  • Muhammad Yasirroni
    Muhammad Yasirroni over 2 years
    Surprisingly in my investigation, a = np.maximum(a,0) is faster than np.maximum(a,0,out=a).
  • Muhammad Yasirroni
    Muhammad Yasirroni over 2 years
    @askewchan answer of using result = np.minimum(arr, 255) is the best for performance in my test.
  • AndrewJaeyoung
    AndrewJaeyoung about 2 years
    np.where is a great solution, it doesn't mutate the arrays involved, and it's also directly compatible with pandas series objects. Really helped me.
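
Several of the comments above ask about variations: arrays containing NaN, a two-way condition, setting every element, and every n-th element. A rough sketch of each, on a made-up array, might look like this. The variable names are illustrative only.

```python
import numpy as np

a = np.array([1.0, np.nan, 200.0, 50.0, 130.0, 90.0])

# Comparisons with NaN evaluate to False, so NaN entries are left alone.
b = a.copy()
b[b > 127] = 255.0

# To replace NaN entries as well, combine the masks explicitly.
c = a.copy()
c[np.isnan(c) | (c > 127)] = 255.0

# Two-way choice: 255 where the value exceeds 127, otherwise 0.
d = np.where(a > 127, 255, 0)

# Every element, no condition.
e = a.copy()
e[:] = 7.0

# Every n-th element via a slice step (here n = 2).
f = a.copy()
f[::2] = 0.0
```

Note that in the two-way np.where case the NaN entry ends up as 0, because its comparison is False.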