How to count NaN values in a pandas DataFrame?


Solution 1

If you want to count only NaN values in column 'a' of a DataFrame df, use:

len(df) - df['a'].count()

Here count() tells us the number of non-NaN values, and this is subtracted from the total number of values (given by len(df)).
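Here is a minimal sketch of that subtraction on the question's data. The question names its frame dfd, so this example uses that name too:

```python
import numpy as np
import pandas as pd

# Same data as in the question: two NaNs in column 'a'
dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])

# count() returns the number of non-NaN entries in the column
non_nan = dfd['a'].count()
# len(dfd) is the total number of rows, missing or not
nan_count = len(dfd) - non_nan
print(nan_count)
```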

To count NaN values in every column of df, use:

len(df) - df.count()
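On a whole DataFrame, df.count() returns one non-NaN count per column, so the subtraction gives a Series of NaN counts indexed by column name. The two-column frame below is made up purely for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative frame (an assumption, not from the question)
df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

# Broadcasting len(df) against the per-column non-NaN counts
nan_per_col = len(df) - df.count()
print(nan_per_col)

# Cross-check against the isna()-based count, element by element
assert (nan_per_col == df.isna().sum()).all()
```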

If you want to use value_counts, tell it not to drop NaN values by setting dropna=False (added in pandas 0.14.1):

dfv = dfd['a'].value_counts(dropna=False)

This allows the missing values in the column to be counted too:

 3     3
NaN    2
 1     1
Name: a, dtype: int64

The rest of your code should then work as you expect (note that it's not necessary to call sum; just print("nan: %d" % dfv[np.nan]) suffices).
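As a sketch, here is the question's code with only the dropna=False change applied and the redundant .sum() calls dropped. The total is taken from dfv.sum(), which now includes the NaN count:

```python
import numpy as np
import pandas as pd

dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])

# dropna=False keeps NaN as its own entry in the counts
dfv = dfd['a'].value_counts(dropna=False).sort_index()

print("nan: %d" % dfv[np.nan])
print("1: %d" % dfv[1])
print("3: %d" % dfv[3])
# Because NaN is now counted, the total matches len(dfd)
print("total: %d" % dfv.sum())
```

This should print nan: 2, 1: 1, 3: 3 and total: 6, which is the output the question asks for.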

Solution 2

To count just null values, you can use isnull():

In [11]:
dfd.isnull().sum()

Out[11]:
a    2
dtype: int64

Here a is the column name, and there are 2 occurrences of the null value in the column.
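A short sketch of why this works: isnull() produces a boolean frame that is True wherever a value is missing, and summing booleans counts the True values in each column. In newer pandas versions, isna() is an alias of isnull() and gives the same result:

```python
import numpy as np
import pandas as pd

dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])

# Boolean mask of missing values, then per-column count of True
null_counts = dfd.isnull().sum()
print(null_counts)

# isna() is an alias of isnull(), so both give identical counts
assert null_counts.equals(dfd.isna().sum())
```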

Solution 3

A clean way to count all NaNs across every column of your DataFrame is:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})
print(df.isna().sum().sum())  # 3

A single sum() gives the count of NaNs in each column; the second sum() adds those per-column counts into one total.
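A sketch of the two-step sum next to an equivalent alternative: summing the underlying boolean NumPy array (via DataFrame.to_numpy(), available since pandas 0.24) counts every True in one pass. Both should give the same total:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

per_column = df.isna().sum()   # one NaN count per column
total = per_column.sum()       # sum of the per-column counts

# Alternative: sum the whole boolean array at once
total_np = df.isna().to_numpy().sum()
print(total, total_np)
```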

Solution 4

If you only want a per-column summary of null values, use df.isnull().sum(). If you want the total number of null values in the whole DataFrame, use df.isnull().sum().sum().
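If you need both numbers often, the two calls can be wrapped together. The helper name null_summary below is hypothetical, not part of pandas; it is just a sketch that returns the per-column Series and the overall total:

```python
from typing import Tuple

import numpy as np
import pandas as pd


def null_summary(df: pd.DataFrame) -> Tuple[pd.Series, int]:
    """Hypothetical helper: per-column null counts plus the overall total."""
    per_column = df.isnull().sum()   # one count per column
    total = int(per_column.sum())    # sum of the per-column counts
    return per_column, total


df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})
per_column, total = null_summary(df)
print(per_column)
print("total:", total)
```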

Solution 5

dfd['a'].isnull().value_counts()

returns:

True     695
False     60
Name: a, dtype: int64

where True is the count of null values and False is the count of non-null values.
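The figures above evidently come from a larger dataset than the question's six-row frame. As a sketch on the question's dfd, the same call gives 4 False and 2 True. Using .get() with a default of 0 avoids a KeyError when a column has no missing values (and therefore no True entry):

```python
import numpy as np
import pandas as pd

dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])

# isnull() maps each value to True (missing) or False (present);
# value_counts() then tallies how many of each there are
counts = dfd['a'].isnull().value_counts()
print(counts)

# .get() falls back to 0 if a key is absent
n_null = counts.get(True, 0)
n_not_null = counts.get(False, 0)
```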

Author: SpeedCoder5

Updated on July 09, 2022

Comments

  • SpeedCoder5 (almost 2 years ago)

    What is the best way to count NaN (not a number) values in a pandas DataFrame?

    The following code:

    import numpy as np
    import pandas as pd
    dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])
    dfv = dfd.a.value_counts().sort_index()
    print("nan: %d" % dfv[np.nan].sum())
    print("1: %d" % dfv[1].sum())
    print("3: %d" % dfv[3].sum())
    print("total: %d" % dfv[:].sum())
    

    Outputs:

    nan: 0
    1: 1
    3: 3
    total: 4
    

    While the desired output is:

    nan: 2
    1: 1
    3: 3
    total: 6
    

    I am using pandas 0.17 with Python 3.5.0 with Anaconda 2.4.0.

  • SpeedCoder5 (over 8 years ago)
    And after using the method above, dfv.values.sum() counts all the values, i.e. 6. Thanks. ;)
  • Alex Riley (over 8 years ago)
    No problem! Yep, that works. In fact, you could just write dfv.sum() to count all the values. Or even more efficiently, just check len(dfd).
  • Quastiat (over 4 years ago)
    this is the easier approach
  • help-info.de (over 3 years ago)
    Welcome to Stack Overflow. Before answering an old question that has an accepted answer (look for the green ✓) as well as other answers, make sure your answer adds something new or is otherwise helpful in relation to them. Here is a guide on How to Answer.