Removing nan values from an array


Solution 1

If you're using numpy for your arrays, you can use:

x = x[numpy.logical_not(numpy.isnan(x))]

Equivalently

x = x[~numpy.isnan(x)]

[Thanks to chbrown for the added shorthand]

Explanation

The inner function, numpy.isnan, returns a boolean array that is True wherever x is not-a-number. Since we want the opposite, we apply the ~ operator, which acts as an element-wise logical NOT on boolean arrays, to get an array that is True wherever x is a valid number.

Lastly, we use this boolean array to index into the original array x, which retrieves just the non-NaN values.
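The three steps described above can be sketched as follows. This is a minimal illustration, assuming x is a one-dimensional float array; the sample values are borrowed from the question, and the intermediate names nan_mask, keep_mask and cleaned are introduced here only for readability.

```python
import numpy as np

# Sample data modeled on the question; np.nan marks the missing entries.
x = np.array([1400, 1500, 1600, np.nan, np.nan, np.nan, 1700])

nan_mask = np.isnan(x)   # True wherever x is NaN
keep_mask = ~nan_mask    # True wherever x is a valid number
cleaned = x[keep_mask]   # boolean indexing builds a new array

print(cleaned)
```

Note that x must already be a numpy array; a plain Python list needs np.array(x) first.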

Solution 2

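list(filter(lambda v: v == v, x))

This works for both lists and one-dimensional numpy arrays, since v != v is true only for NaN. The list() call is needed in Python 3, where filter returns an iterator.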
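A short sketch of the self-comparison trick, using only the standard library. The sample lists are made up for illustration. Because no numeric function is ever called on the elements, the same filter also tolerates non-numeric entries such as strings.

```python
nan = float("nan")

# NaN compares unequal to itself, so v == v is False only for NaN here.
x = [1400, 1500, 1600, nan, nan, nan, 1700]
cleaned = list(filter(lambda v: v == v, x))

# Strings compare equal to themselves and therefore pass through.
mixed = ["a", nan, "b", 3.5]
cleaned_mixed = list(filter(lambda v: v == v, mixed))

print(cleaned)
print(cleaned_mixed)
```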

Solution 3

Try this:

import math
print([value for value in x if not math.isnan(value)])

For more, read up on list comprehensions.

Solution 4

For me, the answer by @jmetz didn't work; however, using pandas isnull() did.

import pandas as pd

x = x[~pd.isnull(x)]
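The answer does not say why numpy.isnan failed. One possible cause, assumed here purely for illustration, is an object-dtype array that mixes strings with NaN: numpy.isnan raises a TypeError on such input, while pd.isnull handles it.

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype array mixing strings and NaN.
x = np.array(["a", np.nan, "b", np.nan], dtype=object)

# np.isnan is not supported for object dtype and raises TypeError.
try:
    np.isnan(x)
    isnan_worked = True
except TypeError:
    isnan_worked = False

# pd.isnull accepts object arrays and returns a boolean mask.
cleaned = x[~pd.isnull(x)]

print(isnan_worked, cleaned)
```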

Solution 5

@jmetz's answer is probably the one most people need; however, it yields a one-dimensional array, which makes it unusable for removing entire rows or columns from matrices.

To do so, one should reduce the logical array to one dimension, then index the target array. For instance, the following will remove rows which have at least one NaN value:

x = x[~numpy.isnan(x).any(axis=1)]

See more detail here.
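A small sketch of the difference on a two-dimensional array. The 3x3 matrix is invented for illustration, and the column variant (axis=0) is an extension by analogy with the row example above rather than part of the original answer.

```python
import numpy as np

# Hypothetical matrix with a single NaN at row index 1, column index 1.
x = np.array([[1.0, 2.0, 3.0],
              [4.0, np.nan, 6.0],
              [7.0, 8.0, 9.0]])

# Element-wise mask: flattens the result to one dimension.
flat = x[~np.isnan(x)]

# Reduce the mask along axis 1 to get one flag per row, then drop flagged rows.
rows_kept = x[~np.isnan(x).any(axis=1)]

# Reduce along axis 0 to get one flag per column, then drop flagged columns.
cols_kept = x[:, ~np.isnan(x).any(axis=0)]

print(flat.shape, rows_kept.shape, cols_kept.shape)
```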


Author by

Dax Feliz

My name is Dax. I am an aspiring astrophysicist learning how to program for data purposes.

Updated on July 20, 2022

Comments

  • Dax Feliz
    Dax Feliz almost 2 years

    I want to figure out how to remove nan values from my array. My array looks something like this:

    x = [1400, 1500, 1600, nan, nan, nan ,1700] #Not in this exact configuration
    

    How can I remove the nan values from x?

    • smci
      smci almost 5 years
      To be clear, by "remove NaNs" you mean keep only the subset of non-null values, not "fill the NaNs with some value (zero, constant, mean, median, etc.)".
  • Miki Tebeka
    Miki Tebeka almost 12 years
    Or x = x[numpy.isfinite(x)]
  • jmetz
    jmetz almost 12 years
    If you're using numpy both my answer and that by @lazy1 are almost an order of magnitude faster than the list comprehension - lazy1's solution is slightly faster (though technically will also not return any infinity values).
  • chbrown
    chbrown over 10 years
    Or x = x[~numpy.isnan(x)], which is equivalent to mutzmatron's original answer, but shorter. In case you want to keep your infinities around, know that numpy.isfinite(numpy.inf) == False, of course, but ~numpy.isnan(numpy.inf) == True.
  • jmetz
    jmetz over 10 years
    @dax-felizv I agree with @chbrown, NaN and Infinite are not the same in numpy. @chbrown - thanks for pointing out the shorthand for logical_not, though beware that it is considerably slower - stackoverflow.com/questions/15998188/…, stackoverflow.com/questions/13600988/…
  • chbrown
    chbrown over 10 years
    Hmm, @mutzmatron -- I figured they did the same thing underneath the hood, and I'm getting very similar results with timeit (as did @unutbu at that first link): python -m timeit -s "import numpy; bools = numpy.random.uniform(size=10000) >= 0.5" "numpy.logical_not(bools)" vs. python -m timeit -s "import numpy; bools = numpy.random.uniform(size=10000) >= 0.5" "~bools" (numpy.__version__ == '1.8.0')
  • jmetz
    jmetz over 10 years
    @chbrown - you're right, any performance gain with numpy seems to have only occurred on the second poster's machine - I tested numpy.invert and numpy.logical_not and got the same result for both as for ~, on numpy v1.7.1. Not sure if architecture affects comparative performance - am testing on my chromebook (armv7l).
  • Austin Richardson
    Austin Richardson almost 9 years
    A hack, but an especially useful one in the case where you are filtering nans from an array of objects with mixed types, such as strings and nans.
  • jmetz
    jmetz about 7 years
    This is strange; according to the docs, boolean array indexing (which this is), is under advanced indexing which apparently "always returns a copy of the data", so you should be over-writing x with the new value (i.e. without the NaNs...). Can you provide any more info as to why this could be happening?
  • Pier Paolo
    Pier Paolo almost 7 years
    Welcome to SO! The solution you propose does not answer the problem: your solution substitutes NaNs with a large number, while the OP asked to entirely remove the elements.
  • BoltzmannBrain
    BoltzmannBrain over 6 years
    For people looking to solve this with an ndarray and maintain the dimensions, use numpy where: np.where(np.isfinite(x), x, 0)
  • Moondra
    Moondra over 6 years
    Very clean solution.
  • hypers
    hypers over 6 years
    Don't forget the brackets :) print ([value for value in x if not math.isnan(value)])
  • towry
    towry almost 6 years
    TypeError: only integer scalar arrays can be converted to a scalar index
  • jmetz
    jmetz almost 6 years
    @towry: this is happening because your input, x is not a numpy array. If you want to use logical indexing, it must be an array - e.g. x = np.array(x)
  • Chris_Rands
    Chris_Rands almost 6 years
    This might seem clever, but it obscures the logic, and theoretically other objects (such as custom classes) can also have this property.
  • yeliabsalohcin
    yeliabsalohcin over 5 years
    If you're using numpy like the top answer, then you can use this list comprehension answer with the np package. This returns your list without the nans: [value for value in x if not np.isnan(value)]
  • Dark
    Dark about 4 years
    Also, to completely remove the non-finite rows, use .any(axis=1). The full code will be x=x[~pd.isnull(x).any(axis=1)] for Pandas or x=x[~np.isnan(x).any(axis=1)] for Numpy. Note that these work on different types of variables.
  • jmetz
    jmetz about 4 years
    @Dark - thanks for the useful example for 2d data, though it's beyond the scope of the OP's question which relates only to a 1d input. Perhaps it would be useful for others posted as a separate Q and A?
  • Christian O'Reilly
    Christian O'Reilly almost 4 years
    Also useful because it only needs x to be specified once as opposed to solutions of the type x[~numpy.isnan(x)]. This is convenient when x is defined by a long expression and you don't want to clutter the code by creating a temporary variable to store the result of this long expression.
  • smm
    smm over 3 years
    It might be slow compared to x[~numpy.isnan(x)]
  • seralouk
    seralouk over 3 years
    x = x[~numpy.isnan(x)] is beautiful
  • Darren Weber
    Darren Weber about 2 years
    Similarly, as a list comprehension, e.g. [v for v in var if v == v]
  • Darren Weber
    Darren Weber about 2 years
    This can avoid TypeError: ufunc 'isnan' not supported for the input types when the var contains mixtures of nan and strings, as noted by @AustinRichardson
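Several comments above contrast numpy.isnan with numpy.isfinite and mention numpy.where for keeping the array's shape. The differences can be sketched as follows, using made-up sample values; the variable names are introduced here for illustration only.

```python
import numpy as np

x = np.array([1.0, np.nan, np.inf, -np.inf, 2.0])

# Drops NaN only; both infinities are kept.
no_nan = x[~np.isnan(x)]

# Drops NaN and both infinities.
finite = x[np.isfinite(x)]

# Keeps the original shape by replacing non-finite entries with 0
# instead of removing them.
same_shape = np.where(np.isfinite(x), x, 0)

print(no_nan, finite, same_shape)
```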