Replacing newlines with spaces for str columns through pandas dataframe
Solution 1
Use replace: first strip leading and trailing whitespace, then replace \n with a space:
df = df.replace({r'\s+$': '', r'^\s+': ''}, regex=True).replace(r'\n', ' ', regex=True)
print(df)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
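Since the question asks about specific columns, the same replace chain can be limited to a column subset. A minimal sketch, assuming the text lives in the columns labelled 2 and 3 as in the question's example (the name text_cols is only illustrative):

```python
import pandas as pd

# Example data from the question: columns 2 and 3 hold free text.
df = pd.DataFrame([[1, 2, 'abc', 'foo\nbar'],
                   [3, 1, 'def\nhaha', 'love it\n']])

text_cols = [2, 3]  # assumed labels of the columns to clean

# Strip leading/trailing whitespace first, then turn the remaining
# newlines into spaces. Only the selected columns are touched.
df[text_cols] = (df[text_cols]
                 .replace({r'^\s+': '', r'\s+$': ''}, regex=True)
                 .replace(r'\n', ' ', regex=True))

print(df)
```

The numeric columns 0 and 1 are left out of the selection, so they are never passed through the regex replacement.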
Solution 2
You can use select_dtypes to select columns of type object and use applymap on those columns.
Because there is no inplace argument for these functions, this would be a workaround to make the change to the dataframe:
# Here lol is the DataFrame, i.e. lol = pd.DataFrame(lol) from the question.
strs = lol.select_dtypes(include=['object']).applymap(lambda x: x.replace('\n', ' ').strip())
lol[strs.columns] = strs
lol
#    0  1         2        3
# 0  1  2       abc  foo bar
# 1  3  1  def haha  love it
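As far as I know, newer pandas releases (2.1 and later) deprecate applymap in favor of DataFrame.map, so a column-wise variant with the .str accessor may age better. This is only a sketch: it assumes every selected column holds nothing but strings, and it picks the columns with pd.api.types.is_string_dtype rather than select_dtypes (the name str_cols is illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 'abc', 'foo\nbar'],
                   [3, 1, 'def\nhaha', 'love it\n']])

# Pick the columns whose values are strings.
# Assumption: there are no mixed-type object columns.
str_cols = [c for c in df.columns if pd.api.types.is_string_dtype(df[c])]

# Run the vectorized string methods on each selected column,
# then write the cleaned columns back.
df[str_cols] = df[str_cols].apply(
    lambda s: s.str.replace('\n', ' ', regex=False).str.strip()
)

print(df)
```

If a column mixes strings with other objects, the .str methods return NaN for the non-string entries, so the element-wise lambda above is the safer choice in that case.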
Solution 3
Adding to the other nice answers, this is a vectorized version of your initial idea:
columns = [2,3]
# apply runs the .str methods on each selected column and keeps the row/column layout
df.iloc[:, columns] = df.iloc[:, columns].apply(lambda s: s.str.strip().str.replace('\n', ' '))
Details:
In [49]: df.iloc[:, columns] = df.iloc[:, columns].apply(lambda s: s.str.strip().str.replace('\n', ' '))
In [50]: df
Out[50]:
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
Comments
-
alvas almost 2 years
Given an example dataframe whose columns 2 and 3 hold free text, e.g.
>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]
>>> pd.DataFrame(lol)
   0  1          2          3
0  1  2        abc   foo\nbar
1  3  1  def\nhaha  love it\n
The goal is to replace each \n with a space and strip the strings in columns 2 and 3 to achieve:
>>> pd.DataFrame(lol)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
How to replace newlines with spaces for specific columns through pandas dataframe?
I have tried this:
>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]
>>> replace_and_strip = lambda x: x.replace('\n', ' ').strip()
>>> lol2 = [[replace_and_strip(col) if type(col) == str else col for col in list(row)] for idx, row in pd.DataFrame(lol).iterrows()]
>>> pd.DataFrame(lol2)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
But there must be a better/simpler way.