Count the number of non-NaN entries in every column of a DataFrame


Solution 1

The count() method returns the number of non-NaN values in each column:

>>> df1.count()
a    3
b    2
d    1
dtype: int64

Similarly, count(axis=1) returns the number of non-NaN values in each row.
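
Using the same df1 as above, this gives the per-row counts:

>>> df1.count(axis=1)
A    2
B    1
C    2
D    1
dtype: int64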

Solution 2

If you want the total count of values which are not NaN over the whole DataFrame, you can do:

np.sum(df.count())
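
For the df1 above, this sums the per-column counts (3 + 2 + 1):

>>> np.sum(df1.count())
6

This assumes numpy is imported as np; df1.count().sum() gives the same total without numpy.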

Solution 3

If you are dealing with empty strings, you may want to count them as NA as well:

df.replace('', np.nan).count()

or, if you also want to count blank (whitespace-only) strings as NA:

df.replace(r'^\s*$', np.nan, regex=True).count()
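
As a quick check, using a hypothetical df2 that mixes a normal string, an empty string, a whitespace-only string, and a real NaN:

>>> df2 = pd.DataFrame({'a': ['x', '', '   ', np.nan]})
>>> df2.count()
a    3
dtype: int64
>>> df2.replace('', np.nan).count()
a    2
dtype: int64
>>> df2.replace(r'^\s*$', np.nan, regex=True).count()
a    1
dtype: int64
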
Comments

  • cryp
    cryp about 3 years

    I have a really big DataFrame and I was wondering if there was a short (one- or two-liner) way to get a count of non-NaN entries in a DataFrame. I don't want to do this one column at a time as I have close to 1000 columns.

    df1 = pd.DataFrame([(1,2,None),(None,4,None),(5,None,7),(5,None,None)], 
                        columns=['a','b','d'], index = ['A', 'B','C','D'])
    
        a   b   d
    A   1   2 NaN
    B NaN   4 NaN
    C   5 NaN   7
    D   5 NaN NaN
    

    Output:

    a: 3
    b: 2
    d: 1
    
    • cryp
      cryp about 9 years
      df1[df1.notnull()].count() seems to have worked
    • Alex Riley
      Alex Riley about 9 years
      The extra indexing with df1.notnull() is not necessary since count ignores null values anyway.
    • smci
      smci over 7 years
      Unlike series.value_counts(..., dropna=False), there is no option on df.count() to directly get NA counts.
  • DISC-O
    DISC-O almost 3 years
    I don't believe that works if the column has strings
  • Alex Riley
    Alex Riley almost 3 years
    @DISC-O: just tried and it works for me (pandas version 1.2.1). E.g. df = pd.DataFrame({"a": ["x", np.nan, "z"]}) then df.count() produces the expected value 2. Do you have an example where this method does not work?
  • DISC-O
    DISC-O almost 3 years
    Yes, if you manually create a df and place the np.nan it could work, I guess. But that is not how you typically create your columns. One often-used way, at least by me, is: df['C'] = np.where(df.A > df.B, 'some text', np.nan). This turns the np.nan into the string 'nan', which is no longer recognized as NaN (a minimal sketch of this pitfall follows after this list).
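
A minimal sketch of the pitfall described in the last comment (the column names A/B, the values, and the condition are illustrative assumptions, not taken from the question):

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 2, 2]})

# np.where promotes the result to a string dtype, so np.nan becomes the literal string 'nan'
df['C'] = np.where(df.A > df.B, 'some text', np.nan)
print(df['C'].tolist())    # ['nan', 'nan', 'some text']
print(df['C'].count())     # 3 -- the 'nan' strings are counted as valid values

# one workaround: turn the literal 'nan' strings back into real NaN before counting
print(df['C'].replace('nan', np.nan).count())    # 1

# or avoid the promotion altogether by using None as the "else" value
df['C'] = np.where(df.A > df.B, 'some text', None)
print(df['C'].count())     # 1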