To extract non-nan values from multiple rows in a pandas dataframe

24,511

Solution 1

df.ix[1:6].dropna(axis=1)

As a heads-up, irow will be deprecated in the next release of pandas (0.11). New indexing methods with clearer usage, such as .iloc, replace it.

http://pandas.pydata.org/pandas-docs/dev/indexing.html#deprecations
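On current pandas releases, where .ix is no longer available either, the one-liner above can be written with .iloc. The following is only a sketch: the frame, its column names, and its values are made up for illustration, and it assumes that positional selection of rows 1 through 5 is what is wanted.

```python
import numpy as np
import pandas as pd

# Hypothetical data (not the asker's): 7 rows, 3 columns, a few NaNs.
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
        "b": [np.nan, 1.0, np.nan, 1.0, 1.0, 1.0, 1.0],
        "c": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, np.nan],
    }
)

# Select positional rows 1..5 (the end of an iloc slice is exclusive),
# then drop every column that contains a NaN anywhere in those rows.
result = df.iloc[1:6].dropna(axis=1)
```

Note that axis=1 removes whole columns: column "b" disappears because one of the selected rows holds a NaN there, while "c" survives because its only NaN lies outside the slice.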

Solution 2

In 0.11 (0.11rc1 is out now), this is very easy: use .iloc to select the first 6 rows, then dropna to drop any row that contains a NaN. You can also pass options to dropna to control exactly which columns are considered.

Note that you asked for 1:6, whereas I used 0:6 in my answer; adjust the slice as needed.

In [8]: df = DataFrame(randn(10,3),columns=list('ABC'),index=date_range('20130101',periods=10))

In [9]: df.ix[6,'A'] = np.nan

In [10]: df.ix[6,'B'] = np.nan

In [11]: df.ix[2,'A'] = np.nan

In [12]: df.ix[4,'B'] = np.nan

In [13]: df.iloc[0:6]
Out[13]: 
                   A         B         C
2013-01-01  0.442692 -0.109415 -0.038182
2013-01-02  1.217950  0.006681 -0.067752
2013-01-03       NaN -0.336814 -1.771431
2013-01-04 -0.655948  0.484234  1.313306
2013-01-05  0.096433       NaN  1.658917
2013-01-06  1.274731  1.909123 -0.289111

In [14]: df.iloc[0:6].dropna()
Out[14]: 
                   A         B         C
2013-01-01  0.442692 -0.109415 -0.038182
2013-01-02  1.217950  0.006681 -0.067752
2013-01-04 -0.655948  0.484234  1.313306
2013-01-06  1.274731  1.909123 -0.289111
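
For readers on a current pandas, the session above could be reproduced roughly as follows. This is a sketch under a few assumptions: .iloc stands in for .ix, a seeded NumPy generator replaces randn (so the numbers will not match the output shown), and the subset and how arguments illustrate the kind of dropna options mentioned earlier.

```python
import numpy as np
import pandas as pd

# Reproducible stand-in for DataFrame(randn(10,3), ...).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((10, 3)),
    columns=list("ABC"),
    index=pd.date_range("20130101", periods=10),
)

# Same NaN positions as in the session, set by position.
df.iloc[[2, 6], 0] = np.nan  # column A, rows 2 and 6
df.iloc[[4, 6], 1] = np.nan  # column B, rows 4 and 6

first6 = df.iloc[0:6]

# Default: drop a row if any of its columns is NaN.
rows_without_nan = first6.dropna()

# Only consider column A when deciding which rows to drop.
rows_with_a = first6.dropna(subset=["A"])

# Drop a row only if every column in it is NaN.
rows_not_empty = first6.dropna(how="all")
```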
Author by

user2179627

Updated on November 02, 2020

Comments

  • user2179627 over 3 years

    I am working with several taxi datasets. I have used pandas to concatenate all of them into a single dataframe.

    My dataframe looks something like this.

                         675                       1039                # and the remaining 125 taxis
                         longitude     latitude    longitude    latitude
    date
    2008-02-02 13:31:21  116.56359  40.06489       NaN          NaN
    2008-02-02 13:31:51  116.56486  40.06415       NaN          NaN
    2008-02-02 13:32:21  116.56855  40.06352       116.58243    39.6313
    2008-02-02 13:32:51  116.57127  40.06324       NaN          NaN
    2008-02-02 13:33:21  116.57120  40.06328       116.55134    39.6313
    2008-02-02 13:33:51  116.57121  40.06329       116.55126    39.6123
    2008-02-02 13:34:21  NaN        NaN            116.55134    39.5123
    

    Here 675 and 1039 are taxi IDs. In total there are 127 taxis, each with its own longitude and latitude columns.

    I know several ways to extract the non-null values of a single row:

    df.ix[0,df.columns[np.isnan(df.irow(0))!=1]]
                  (or)
    df.irow(0)[np.isnan(df.irow(0))!=1]
                  (or)
    df.irow(0)[np.where(df.irow(0)[df.columns].notnull())[0]]
    

    Any of the above commands returns:

    675   longitude    116.56359
          latitude     40.064890 
    4549  longitude    116.34642
          latitude      39.96662
    Name: 2008-02-02 13:31:21
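
The three commands above rely on irow and .ix, which newer pandas versions do not provide. A shorter single-row equivalent is sketched below, assuming a hypothetical two-taxi frame that only mimics the layout in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two taxis, each with longitude/latitude columns.
columns = pd.MultiIndex.from_product([[675, 1039], ["longitude", "latitude"]])
index = pd.to_datetime(["2008-02-02 13:31:21", "2008-02-02 13:32:21"])
df = pd.DataFrame(
    [
        [116.56359, 40.06489, np.nan, np.nan],
        [116.56855, 40.06352, 116.58243, 39.6313],
    ],
    index=index,
    columns=columns,
)

# First row by position, with its NaN entries removed.
first_row = df.iloc[0].dropna()
```

The result is a Series indexed by (taxi id, coordinate) pairs, which matches the shape of the output shown above.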
    

    Now I want to extract all the non-null values from the first few rows (say, from row 1 to row 6).

    How do I do that?

    I could probably do it with a loop, but I want a non-looped way of doing it.

    Any help or suggestions are welcome. Thanks in advance! :)
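
One loop-free option not shown in either solution is to stack the column levels into the index and then drop the NaN entries, which leaves one value per (date, taxi, coordinate) combination. The sketch below assumes a hypothetical three-row, two-taxi frame; the level names "date", "taxi", and "coord" are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mimicking the layout in the question.
columns = pd.MultiIndex.from_product(
    [[675, 1039], ["longitude", "latitude"]], names=["taxi", "coord"]
)
index = pd.DatetimeIndex(
    ["2008-02-02 13:31:21", "2008-02-02 13:32:21", "2008-02-02 13:34:21"],
    name="date",
)
df = pd.DataFrame(
    [
        [116.56359, 40.06489, np.nan, np.nan],
        [116.56855, 40.06352, 116.58243, 39.6313],
        [np.nan, np.nan, 116.55134, 39.5123],
    ],
    index=index,
    columns=columns,
)

# Take the rows of interest by position, move both column levels into
# the index, and drop whatever NaN entries remain. The explicit dropna()
# keeps the result the same whether or not stack() itself drops NaNs.
non_null = df.iloc[0:3].stack(["taxi", "coord"]).dropna()
```

Selecting one timestamp from the result, for example with non_null.xs(some_timestamp, level="date"), gives back the non-null values of that single row.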