What's the fastest way to threshold a numpy array?

Solution 1

Instead of looping, you can compare the entire array at once in several ways. Starting from

>>> import numpy as np
>>> arr = np.random.randint(0, 255, (3,3))
>>> brightest = arr.max()
>>> threshold = brightest // 2
>>> arr
array([[214, 151, 216],
       [206,  10, 162],
       [176,  99, 229]])
>>> brightest
229
>>> threshold
114

Method #1: use np.where:

>>> np.where(arr > threshold, 255, 0)
array([[255, 255, 255],
       [255,   0, 255],
       [255,   0, 255]])

Method #2: use boolean indexing to create a new array:

>>> up = arr > threshold
>>> new_arr = np.zeros_like(arr)
>>> new_arr[up] = 255
>>> new_arr
array([[255, 255, 255],
       [255,   0, 255],
       [255,   0, 255]])

Method #3: do the same, but use an arithmetic hack:

>>> (arr > threshold) * 255
array([[255, 255, 255],
       [255,   0, 255],
       [255,   0, 255]])

which works because False == 0 and True == 1.
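
If the result is going back into an 8-bit image, the dtype is worth a look: the comparison yields a boolean array, and multiplying that by 255 typically promotes it to NumPy's default integer type rather than uint8. Here is a minimal sketch of one way to keep the result in uint8, assuming a grayscale input; the helper name `to_binary_uint8` is made up for illustration and is not part of the answer above.

```python
import numpy as np

def to_binary_uint8(arr, threshold):
    """Return a uint8 array: 255 where arr > threshold, 0 elsewhere."""
    mask = arr > threshold              # boolean array, same shape as arr
    return mask.astype(np.uint8) * 255  # True -> 255, False -> 0, stays uint8

# Same example values as in the answer above.
arr = np.array([[214, 151, 216],
                [206,  10, 162],
                [176,  99, 229]])
result = to_binary_uint8(arr, arr.max() // 2)
```

Casting the mask first and multiplying afterwards avoids creating a wider intermediate array.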


For a 1000x1000 array, the arithmetic hack is the fastest of the three on my machine, but to be honest I'd still use np.where because I think it's the clearest:

>>> %timeit np.where(arr > threshold, 255, 0)
100 loops, best of 3: 12.3 ms per loop
>>> %timeit up = arr > threshold; new_arr = np.zeros_like(arr); new_arr[up] = 255;
100 loops, best of 3: 14.2 ms per loop
>>> %timeit (arr > threshold) * 255
100 loops, best of 3: 6.05 ms per loop
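
The numbers above depend on the machine and the NumPy version, so it may be worth re-running the comparison yourself. A rough sketch using the standard-library `timeit` module instead of IPython's `%timeit`; the function names, the seed, and the repeat counts are arbitrary choices for illustration.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)            # fixed seed so runs are repeatable
arr = rng.integers(0, 256, (1000, 1000))  # 1000x1000 test array
threshold = arr.max() // 2

def with_where():
    return np.where(arr > threshold, 255, 0)

def with_indexing():
    new_arr = np.zeros_like(arr)
    new_arr[arr > threshold] = 255
    return new_arr

def with_arithmetic():
    return (arr > threshold) * 255

for fn in (with_where, with_indexing, with_arithmetic):
    # Best of 3 runs, each run making 10 calls; report time per call.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```

All three functions return the same values, so the choice between them comes down to speed and readability.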

Solution 2

I'm not sure whether your thresholding operation is special, e.g. whether it needs to be customized for every pixel, but you can simply apply a comparison operator to a NumPy array. For example:

import numpy as np


a = np.round(np.random.rand(5,5)*255)

thresholded_array = a > 100  # threshold at a value of 100

print(a)
print(thresholded_array)

Gives:

[[ 238.  201.  165.  111.  127.]
 [ 188.   55.  157.  121.  129.]
 [ 220.  127.  231.   75.   23.]
 [  76.   67.   75.  141.   96.]
 [ 228.   94.  172.   26.  195.]]

[[ True  True  True  True  True]
 [ True False  True  True  True]
 [ True  True  True False False]
 [False False False  True False]
 [ True False  True False  True]]
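
If the threshold does need to vary per pixel, the same comparison should still work, because NumPy compares element-wise and broadcasts compatible shapes. A small sketch with made-up values, chosen purely for illustration:

```python
import numpy as np

a = np.array([[ 50, 120, 200],
              [ 90, 110, 130]])

# One threshold per pixel: an array with the same shape as `a`.
per_pixel = np.array([[100, 100, 250],
                      [ 80, 120, 120]])
mask_pixel = a > per_pixel

# One threshold per row: shape (2, 1) broadcasts across the columns.
per_row = np.array([[100], [115]])
mask_row = a > per_row

print(mask_pixel)
print(mask_row)
```

Either mask can then be turned into 0/255 values the same way as a mask made with a single scalar threshold.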
Author by

El Confuso

Updated on July 27, 2022

Comments

  • El Confuso over 1 year

    I want the resulting array as a binary yes/no.

    I came up with

        img = PIL.Image.open(filename)
    
        array = numpy.array(img)
        thresholded_array = numpy.copy(array)
    
        brightest = numpy.amax(array)
        threshold = brightest/2
    
        for b in range(array.shape[0]):
            for c in range(array.shape[1]):
                if array[b][c] > threshold:
                    thresholded_array[b][c] = 255
                else:
                    thresholded_array[b][c] = 0
    
        out=PIL.Image.fromarray(thresholded_array)
    

    but iterating over the array one value at a time is very slow, and I know there must be a faster way. What's the fastest?