What is python's equivalent of R's NA?

51,742

Solution 1

Scikit-learn doesn't handle missing values currently. For most machine learning algorithms, it is unclear how to handle missing values, so we rely on the user to handle them before passing the data to the algorithm. NumPy doesn't have a "missing" value. Pandas uses NaN, but inside numeric algorithms that might lead to confusion. It is possible to use masked arrays, but we don't do that in scikit-learn (yet).
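As a rough illustration of the masked-array option mentioned above, here is a minimal sketch using numpy.ma. The sentinel value -1 and the variable names are assumptions made for the example, not anything scikit-learn prescribes.

```python
import numpy as np
import numpy.ma as ma

# Raw readings where -1 is a hypothetical sentinel meaning "missing".
readings = np.array([1, -1, 2, 3])

# Mask every entry equal to the sentinel; reductions skip masked entries.
masked = ma.masked_equal(readings, -1)

mean_value = masked.mean()    # average over the unmasked values 1, 2, 3
valid_count = masked.count()  # number of unmasked entries
filled = masked.filled(0)     # plain ndarray with masked entries replaced by 0
```

The mask travels with the data, so the integer dtype is kept, unlike the NaN approach, which requires a floating point array.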

Solution 2

NumPy's nan is handled well by many functions:

>>> import numpy as np
>>> a = [1, np.nan, 2, 3]
>>> np.nanmean(a)
2.0
>>> np.nansum(a)
6.0
>>> np.isnan(a)
array([False,  True, False, False], dtype=bool)

Solution 3

For pandas, take a look at this:

http://pandas.pydata.org/pandas-docs/dev/missing_data.html

pandas uses NaN. You can test for null values using isnull() or notnull(), drop them from a data frame using dropna(), etc. The equivalent for datetime objects is NaT.
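A short sketch of those calls on a small, made-up data frame (the column names and values are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing float and one missing timestamp.
df = pd.DataFrame({
    "value": [1.0, np.nan, 3.0],
    "when": pd.to_datetime(["2020-01-01", None, "2020-01-03"]),
})

missing_mask = df["value"].isnull()    # True where the value is NaN
present_mask = df["value"].notnull()   # the inverse test
cleaned = df.dropna()                  # drops rows containing any missing value
second_when = df["when"].iloc[1]       # the missing timestamp is stored as NaT
```

Here pd.isnull(second_when) is True, so the same tests work for both NaN and NaT.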

Author by

power

Updated on February 23, 2020

Comments

  • power
    power about 4 years

    What is python's equivalent of R's NA?

    To be more specific: R has NaN, NA, NULL, Inf and -Inf. NA is generally used when there is missing data. What is python's equivalent?

    How do libraries such as numpy and pandas handle missing values?

    How does scikit-learn handle missing values?

    Is it different for python 2.7 and python 3?

  • Paul
    Paul about 9 years
    It might be worth noting that integer pandas Series (or column) must have values. There is no way to represent a missing value in an integer series; the usual alternative being to upconvert to a floating point type that has NaN.
  • stidmatt
    stidmatt almost 3 years
    NaN in Pandas is numpy's nan value.
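The two comments above can be checked with a small sketch. It assumes the default integer dtype that pandas infers for a list of Python ints; the series contents are made up for the example.

```python
import numpy as np
import pandas as pd

# An integer series with no missing values.
ints = pd.Series([1, 2, 3])
int_dtype = ints.dtype

# Reindexing adds label 3, which has no value, so pandas upcasts to float
# in order to store NaN there.
with_gap = ints.reindex([0, 1, 2, 3])
gap_dtype = with_gap.dtype

# The placeholder is an ordinary floating point NaN, so numpy's test applies.
gap_is_nan = bool(np.isnan(with_gap.iloc[3]))
```

After the reindex, gap_dtype is float64 and gap_is_nan is True, while the original series keeps its integer dtype.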