Python: use a function in pandas lambda expression

18,351

Solution 1

You are trying to use find_hour before it has been defined. You just need to switch things around so that the def statement runs before the apply call:

def find_hour(date_str):
    # characters 11-12 of 'YYYY-MM-DD HH:MM:SS' hold the hour
    return float(date_str[11:13])

print(df['Dates'].head(3))
df['hour'] = df.apply(lambda x: find_hour(x['Dates']), axis=1)

Edit: Padraic has raised a very important point. In the question, find_hour() is defined as taking two arguments, self and input, but the lambda passes only one. It should be defined with a single parameter, as above. Naming that parameter input would shadow the built-in function, so a more descriptive name such as date_str is better. Also, x['Dates'][11:13] is a plain Python string, which has no astype() method, so the slice is converted with float() instead.
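A minimal self-contained sketch of the corrected approach, assuming 'Dates' holds strings in the 'YYYY-MM-DD HH:MM:SS' layout shown in the question (the sample rows are copied from there). It defines find_hour before use, with a single parameter, and also shows the simpler Series.apply form, which avoids building a row per call.

```python
import pandas as pd

# Sample data copied from the question; 'Dates' holds plain strings.
df = pd.DataFrame({"Dates": ["2015-05-13 23:53:00",
                             "2015-05-13 23:53:00",
                             "2015-05-13 23:33:00"]})

# Define the helper BEFORE it is used. It is a plain function,
# not a method, so it takes one argument and no `self`.
def find_hour(date_str):
    # Characters 11-12 of 'YYYY-MM-DD HH:MM:SS' hold the hour.
    return float(date_str[11:13])

# Row-wise apply, as in the question: x is one row (a Series).
df["hour"] = df.apply(lambda x: find_hour(x["Dates"]), axis=1)

# Simpler equivalent: apply the function to the single column directly.
df["hour_alt"] = df["Dates"].apply(find_hour)

print(df)
```

Because only one column is involved, df['Dates'].apply(find_hour) is usually the clearer choice; axis=1 is only needed when the function reads several columns of the same row.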

Solution 2

What is wrong with the good old .dt.hour?

In [202]: df
Out[202]:
                 Date
0 2015-05-13 23:53:00
1 2015-05-13 23:53:00
2 2015-05-13 23:33:00

In [217]: df['hour'] = df.Date.dt.hour

In [218]: df
Out[218]:
                 Date  hour
0 2015-05-13 23:53:00    23
1 2015-05-13 23:53:00    23
2 2015-05-13 23:33:00    23

And if your Date column is of string type, you may want to convert it to datetime first:

df.Date = pd.to_datetime(df.Date)

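or just slice the strings and convert the resulting Series (calling int() on a whole Series raises a TypeError):

df['hour'] = df.Date.str[11:13].astype(int)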
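A self-contained sketch comparing both approaches on the question's sample strings. The explicit format string passed to pd.to_datetime is an assumption about the input layout; the slicing approach assumes every value has exactly that fixed width and does no validation at all. It also checks that wrapping the Series in int(), as in int(df.Date.str[11:13]), fails.

```python
import pandas as pd

# Sample data in the question's format, stored as plain strings.
df = pd.DataFrame({"Date": ["2015-05-13 23:53:00",
                            "2015-05-13 23:53:00",
                            "2015-05-13 23:33:00"]})

# Approach 1: parse to datetime64, then use the vectorized .dt accessor.
# The explicit format is an assumption about the input layout.
parsed = pd.to_datetime(df["Date"], format="%Y-%m-%d %H:%M:%S")
df["hour_dt"] = parsed.dt.hour

# Approach 2: slice characters 11-12 of every string at once with .str,
# then convert the whole Series with .astype(int).
df["hour_str"] = df["Date"].str[11:13].astype(int)

# int() on a Series with more than one element raises TypeError.
try:
    int(df["Date"].str[11:13])
    int_on_series_failed = False
except TypeError:
    int_on_series_failed = True

print(df)
```

Parsing with to_datetime is the more robust option, since malformed dates raise an error instead of silently producing a wrong slice.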

Author: Edamame

Updated on July 09, 2022

Comments

  • Edamame

    I have the following code, trying to find the hour of the 'Dates' column in a data frame:

    print(df['Dates'].head(3))
    df['hour'] = df.apply(lambda x: find_hour(x['Dates']), axis=1)
    
    def find_hour(self, input):
        return input[11:13].astype(float)
    

    where the output of print(df['Dates'].head(3)) looks like:

    0    2015-05-13 23:53:00
    1    2015-05-13 23:53:00
    2    2015-05-13 23:33:00
    

    However, I got the following error:

        df['hour'] = df.apply(lambda x: find_hour(x['Dates']), axis=1)
    NameError: ("global name 'find_hour' is not defined", u'occurred at index 0')
    

    Does anyone know what I missed? Thanks!


    Note that if I put the function's logic directly in the lambda, as below, everything works fine:

    df['hour'] = df.apply(lambda x: x['Dates'][11:13], axis=1).astype(float)
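The difference between the two versions can be reproduced without pandas. A lambda looks up the names it uses only when it is called, and df.apply calls it immediately, before a def placed further down has run. The inline version works because it refers to no outside function. In this minimal sketch, map() stands in for df.apply, and helper is a hypothetical stand-in for find_hour. (Python 3 words the message as "name ... is not defined"; the "global name" wording in the question comes from Python 2.)

```python
dates = ["2015-05-13 23:53:00", "2015-05-13 23:33:00"]

def compute_hours():
    # Like df.apply, map() calls the lambda right away; `helper` is
    # looked up by name at that moment, not when the lambda is written.
    return list(map(lambda s: helper(s), dates))

try:
    compute_hours()          # `helper` has not been defined yet
    error = None
except NameError as exc:
    error = exc
    print("NameError:", exc)

def helper(s):
    # Hypothetical stand-in for find_hour, with a single parameter.
    return float(s[11:13])

print(compute_hours())       # works now that `helper` exists
```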