how to apply a function to multiple columns in a pandas dataframe at one time
Solution 1
You can do df[['Col1', 'Col2', 'Col3']].applymap(format_number)
. Note, though that this will return new columns; it won't modify the existing DataFrame. If you want to put the values back in the original, you'll have to do df[['Col1', 'Col2', 'Col3']] = df[['Col1', 'Col2', 'Col3']].applymap(format_number)
.
Solution 2
You could use apply
like this:
df.apply(lambda row: format_number(row), axis=1)
You would need to specify the columns though in your format_number
function:
def format_number(row):
row['Col1'] = doSomething(row['Col1']
row['Col2'] = doSomething(row['Col2'])
row['Col3'] = doSomething(row['Col3'])
This is not as elegant as @BrenBarn's answer but it has an advantage that the dataframe is modified in place so you don't need to assign the columns back again
Related videos on Youtube
yoshiserry
Updated on September 24, 2022Comments
-
yoshiserry over 1 year
I frequently deal with data which is poorly formatted (I.e. number fields are not consistent etc)
There may be other ways, which I am not aware of but the way I format a single column in a dataframe is by using a function and mapping the column to that function.
format = df.column_name.map(format_number)
Question: 1 - what if I have a dataframe with 50 columns, and want to apply that formatting to multiple columns, etc column 1, 3, 5, 7, 9,
Can you go:
format = df.1,3,5,9.map(format_number)
.. This way I could format all my number columns in one line?
-
yoshiserry about 10 yearscan you refer to the columns by number instead of name as well when doing that? Is there also any way to programatically create that string (which would change depending on the number of columns you had) and apply the format_number function? I.e. the above would work fine if I knew exactly how many columns were in the sheet every time, but If I didn't know the number of columns, and wanted to apply the same function to every column, is there a better way of doing it?
-
BrenBarn about 10 years@yoshiserry: If you just want to apply it to all the columns, just do
df.applymap(format_number)
. -
yoshiserry about 10 yearscould you explain what does it mean that it is modified in place, I'm not sure how that differs from the above solution, given I haven't grasped what lamda does?
-
EdChum about 10 years@yoshiserry ignoring my code example, if you perform
apply
to a dataframe then the dataframe itself is modified by any changes in your function so you would not need to assign to the column, you may still need to depending on what your function is doing. The point being that you just need to calldf.apply
and not need to say dodf[['col1','col2','col3']]=df.apply(lambda row: format_number(row), axis=1))
, in my code the assignment is done by theformat_number
function so I guess the assignment is implicit rather than explicit like BrenBarn's answer -
yoshiserry about 10 yearsgreat, and so what exactly is the function of Lamda? I still haven't grasped what lamda does?
-
EdChum about 10 years@yoshiserry there are lots of articles online such as pythonconquerstheuniverse.wordpress.com/2011/08/29/…, stackoverflow.com/questions/890128/python-lambda-why and secnetix.de/olli/Python/lambda_functions.hawk. Essentially it allows you to define an anonymous function so you don't have to define a special function or expression. In this case it captures the row and passes this as an arguement to format_number, it could easily have been
df.apply(lambda x: x **2)
to square the value