how to apply a function to multiple columns in a pandas dataframe at one time

14,753

Solution 1

You can do df[['Col1', 'Col2', 'Col3']].applymap(format_number). Note, though, that this returns new columns; it won't modify the existing DataFrame. To put the values back into the original, assign the result: df[['Col1', 'Col2', 'Col3']] = df[['Col1', 'Col2', 'Col3']].applymap(format_number).
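As a rough sketch of the assignment described above: the body of format_number and the sample data here are assumptions made up for illustration, since the question does not show them. DataFrame.applymap was deprecated in pandas 2.1 in favour of DataFrame.map, so the sketch uses whichever one the installed version provides.

```python
import pandas as pd


def format_number(value):
    # Hypothetical formatter (not from the question): drop thousands
    # separators and surrounding whitespace, then convert to float.
    return float(str(value).replace(",", "").strip())


# Made-up sample data with inconsistently formatted number fields.
df = pd.DataFrame({
    "Col1": ["1,000", " 2 ", "3.5"],
    "Col2": ["10", "20", "30"],
    "Col3": ["7", "8,500", "9"],
    "Name": ["a", "b", "c"],
})

cols = ["Col1", "Col2", "Col3"]

# Element-wise application: DataFrame.map on pandas >= 2.1,
# DataFrame.applymap on older versions.
if hasattr(df, "map"):
    converted = df[cols].map(format_number)
else:
    converted = df[cols].applymap(format_number)

# The call returns new columns, so assign them back explicitly.
df[cols] = converted
```

Columns not listed in cols (here the Name column) are left untouched.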

Solution 2

You could use apply like this:

df.apply(lambda row: format_number(row), axis=1)

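You would need to specify the columns in your format_number function, though, and return the modified row:

def format_number(row):
    # doSomething stands for your own per-value formatting function
    row['Col1'] = doSomething(row['Col1'])
    row['Col2'] = doSomething(row['Col2'])
    row['Col3'] = doSomething(row['Col3'])
    return row  # apply builds its result from the returned rows

This is not as elegant as @BrenBarn's answer. It may look as though the DataFrame is modified in place, but pandas does not guarantee that changes made to row inside apply reach the original DataFrame, so return the row and assign the result back: df = df.apply(format_number, axis=1).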
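A minimal sketch of the row-wise approach, under a few assumptions: do_something is a hypothetical stand-in for doSomething, and the sample data is invented. The function returns the row and the result of apply is assigned to a new name, because pandas does not support functions that mutate the object passed to apply.

```python
import pandas as pd


def do_something(value):
    # Hypothetical stand-in for doSomething: strip commas, convert to float.
    return float(str(value).replace(",", ""))


def format_number(row):
    # Work on a copy so the row handed over by apply is never mutated.
    row = row.copy()
    for col in ("Col1", "Col2", "Col3"):
        row[col] = do_something(row[col])
    return row


# Made-up sample data; the Id column is only there to show that
# columns not named in format_number pass through unchanged.
df = pd.DataFrame({
    "Id": [1, 2],
    "Col1": ["1,000", "2"],
    "Col2": ["3", "4"],
    "Col3": ["5", "6,000"],
})

# apply with axis=1 calls format_number once per row and assembles
# the returned rows into a new DataFrame.
result = df.apply(format_number, axis=1)
```

Row-wise apply calls a Python function once per row, so on large frames the column-based approach in Solution 1 is usually the faster choice.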


yoshiserry
Author by

yoshiserry

Updated on September 24, 2022

Comments

  • yoshiserry
    yoshiserry over 1 year

    I frequently deal with data which is poorly formatted (i.e. number fields are not consistent, etc.).

    There may be other ways which I am not aware of, but the way I format a single column in a dataframe is by using a function and mapping the column to that function.

    format = df.column_name.map(format_number)
    

    Question 1: what if I have a dataframe with 50 columns and want to apply that formatting to multiple columns, e.g. columns 1, 3, 5, 7, 9?

    Can you go:

    format = df.1,3,5,9.map(format_number)
    

    That way I could format all my number columns in one line?

  • yoshiserry
    yoshiserry about 10 years
    Can you refer to the columns by number instead of name as well when doing that? Is there also any way to programmatically create that string (which would change depending on the number of columns you had) and apply the format_number function? I.e. the above would work fine if I knew exactly how many columns were in the sheet every time, but if I didn't know the number of columns and wanted to apply the same function to every column, is there a better way of doing it?
  • BrenBarn
    BrenBarn about 10 years
    @yoshiserry: If you just want to apply it to all the columns, just do df.applymap(format_number).
  • yoshiserry
    yoshiserry about 10 years
    Could you explain what it means that it is modified in place? I'm not sure how that differs from the above solution, given I haven't grasped what lambda does.
  • EdChum
    EdChum about 10 years
    @yoshiserry ignoring my code example, if you call apply on a dataframe, the idea is that the dataframe itself is modified by any changes your function makes, so you would not need to assign to the columns, although you may still need to depending on what your function is doing. The point is that you just call df.apply and do not need something like df[['col1','col2','col3']] = df.apply(lambda row: format_number(row), axis=1). In my code the assignment is done by the format_number function, so I guess the assignment is implicit rather than explicit as in BrenBarn's answer.
  • yoshiserry
    yoshiserry about 10 years
    Great, and so what exactly is the function of lambda? I still haven't grasped what lambda does.
  • EdChum
    EdChum about 10 years
    @yoshiserry there are lots of articles online, such as pythonconquerstheuniverse.wordpress.com/2011/08/29/…, stackoverflow.com/questions/890128/python-lambda-why and secnetix.de/olli/Python/lambda_functions.hawk. Essentially it allows you to define an anonymous function so you don't have to define a separate named function. In this case it captures the row and passes it as an argument to format_number; it could just as easily have been df.apply(lambda x: x ** 2) to square the values.
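The two follow-up questions in the comments (referring to columns by number, and what lambda does) can be sketched roughly as follows. The format_number body, the column names and the data are all assumptions made up for illustration. Positions are zero-based, so [0, 2] picks the first and third columns.

```python
import pandas as pd


def format_number(value):
    # Hypothetical formatter (not from the question): strip commas
    # and whitespace, then convert to float.
    return float(str(value).replace(",", "").strip())


# Made-up sample data.
df = pd.DataFrame({
    "a": ["1,000", "2"],
    "b": ["x", "y"],
    "c": ["3", "4,500"],
    "d": [1, 2],
})

# Refer to columns by position: look the names up from df.columns,
# so the list can be built programmatically for any number of columns.
positions = [0, 2]
cols = list(df.columns[positions])

# Apply Series.map to each selected column and assign the result back.
df[cols] = df[cols].apply(lambda col: col.map(format_number))


# A lambda is just an unnamed function: these two calls are equivalent.
def square(x):
    return x ** 2


numbers = pd.DataFrame({"x": [1, 2, 3]})
squared_named = numbers.apply(square)
squared_lambda = numbers.apply(lambda x: x ** 2)
```

Passing square and passing the lambda give the same result; the lambda only saves defining and naming a one-line function.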