Replace zeroes in a NumPy array with the median value

Solution 1

This solution takes advantage of numpy.median:

import numpy as np
foo_array = [38,26,14,55,31,0,15,8,0,0,0,18,40,27,3,19,0,49,29,21,5,38,29,17,16]
foo = np.array(foo_array)
# Compute the median of the non-zero elements
m = np.median(foo[foo > 0])
# Assign the median to the zero elements 
foo[foo == 0] = m

A note of caution: the median of the non-zero values in your array is 23.5, but because foo has an integer dtype, the assignment truncates it and stores 23.
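To make the truncation concrete, here is a minimal sketch that runs the same replacement on an integer array and on a float array. The names foo_int and foo_float are illustrative only, and converting with dtype=float is one possible way to keep the fractional median, not something the original answer prescribes.

```python
import numpy as np

foo_array = [38,26,14,55,31,0,15,8,0,0,0,18,40,27,3,19,0,49,29,21,5,38,29,17,16]

# Integer array: the float median is truncated when it is assigned
foo_int = np.array(foo_array)
m = np.median(foo_int[foo_int > 0])
foo_int[foo_int == 0] = m

# Float array (illustrative alternative): the median is stored exactly
foo_float = np.array(foo_array, dtype=float)
foo_float[foo_float == 0] = np.median(foo_float[foo_float > 0])

print(m)             # 23.5
print(foo_int[5])    # 23
print(foo_float[5])  # 23.5
```

If whole numbers are required, rounding the median explicitly before assigning makes the choice visible instead of relying on the implicit cast.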

Solution 2

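# foo[:] would only create a view of foo, so copy to leave foo untouched
foo2 = foo.copy()
# nz_values and middle come from the code in the question
foo2[foo2 == 0] = nz_values[middle]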
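One thing worth knowing about the snippet above: a basic slice such as foo[:] returns a view that shares memory with foo, while foo.copy() returns an independent array. The sketch below illustrates the difference on a small made-up array; the names view and copy and the placeholder value -1 are arbitrary choices for the demonstration.

```python
import numpy as np

foo = np.array([38, 0, 14, 0, 31])

view = foo[:]       # basic slice: shares memory with foo
copy = foo.copy()   # independent array with its own data

copy[copy == 0] = -1        # writes to the copy only
unchanged = foo.tolist()    # foo still contains its zeros

view[view == 0] = -1        # writes through the view into foo
changed = foo.tolist()      # foo's zeros have been replaced

print(np.shares_memory(foo, view), np.shares_memory(foo, copy))  # True False
print(unchanged)  # [38, 0, 14, 0, 31]
print(changed)    # [38, -1, 14, -1, 31]
```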

You could also update foo directly instead of creating foo2. NumPy's array indexing syntax lets you combine a few lines of the code you wrote. For example, instead of,

nonzero_values = foo[0::] > 0
nz_values = foo[nonzero_values]

You can just do

nz_values = foo[foo > 0]

You can find out more about this kind of indexing (boolean masks, often grouped under "fancy indexing") in the NumPy documentation.
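As a rough illustration of what the shortened form does, the sketch below builds the boolean mask separately and then indexes with it, using a small made-up array. It also checks that foo[0::] > 0 and foo > 0 produce the same mask, since foo[0::] is just a slice covering the whole array.

```python
import numpy as np

foo = np.array([38, 26, 0, 55, 0, 15])

mask = foo > 0          # boolean array, one entry per element of foo
nz_values = foo[mask]   # keeps only the elements where mask is True

# The slice foo[0::] spans the whole array, so both masks agree
same = np.array_equal(foo[0::] > 0, mask)

print(mask.tolist())       # [True, True, False, True, False, True]
print(nz_values.tolist())  # [38, 26, 55, 15]
print(same)                # True
```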

Author by

slashdottir

Updated on February 16, 2020

Comments

  • slashdottir about 4 years

    I have a numpy array like this:

    foo_array = [38,26,14,55,31,0,15,8,0,0,0,18,40,27,3,19,0,49,29,21,5,38,29,17,16]
    

    I want to replace all the zeros with the median value of the array, where the zeros themselves are excluded from the calculation of the median.

    So far I have this going on:

    import numpy as np

    foo_array = [38,26,14,55,31,0,15,8,0,0,0,18,40,27,3,19,0,49,29,21,5,38,29,17,16]
    foo = np.array(foo_array)
    foo = np.sort(foo)
    print("foo sorted:", foo)
    #foo sorted: [ 0  0  0  0  0  3  5  8 14 15 16 17 18 19 21 26 27 29 29 31 38 38 40 49 55]
    nonzero_values = foo[0::] > 0
    nz_values = foo[nonzero_values]
    print("nonzero_values?:", nz_values)
    #nonzero_values?: [ 3  5  8 14 15 16 17 18 19 21 26 27 29 29 31 38 38 40 49 55]
    size = np.size(nz_values)
    middle = size // 2  # integer division, so the result is a valid index
    print("median is:", nz_values[middle])
    #median is: 26
    

    Is there a clever way to achieve this with numpy syntax?

    Thank you