Replacing newlines with spaces for str columns through pandas dataframe


Solution 1

Use replace: first strip leading and trailing whitespace, then replace \n with a space:

df = df.replace({r'\s+$': '', r'^\s+': ''}, regex=True).replace(r'\n', ' ', regex=True)
print(df)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
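
If only some columns should be touched, the same regex replace can be limited to a column subset. A minimal sketch, assuming the sample data from the question; the name text_cols is made up for illustration:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame([[1, 2, 'abc', 'foo\nbar'], [3, 1, 'def\nhaha', 'love it\n']])

# Hypothetical list of the columns holding free text.
text_cols = [2, 3]

# Strip leading/trailing whitespace first, then turn remaining newlines into spaces.
df[text_cols] = (
    df[text_cols]
    .replace({r'\s+$': '', r'^\s+': ''}, regex=True)
    .replace(r'\n', ' ', regex=True)
)

print(df)
```

The numeric columns are never passed to replace here, so they are left exactly as they were.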

Solution 2

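You can use select_dtypes to select the columns of type object and then applymap on those columns. (Since pandas 2.1, applymap is deprecated in favor of DataFrame.map.)

Because these functions have no inplace argument, assign the result back to the original columns to modify the dataframe. Here lol is assumed to be the DataFrame built from the question's list, i.e. pd.DataFrame(lol):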

strs = lol.select_dtypes(include=['object']).applymap(lambda x: x.replace('\n', ' ').strip())
lol[strs.columns] = strs
lol
#   0  1         2        3
#0  1  2       abc  foo bar
#1  3  1  def haha  love it
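
One thing to watch with the lambda above: it raises AttributeError if an object column holds a non-string value such as None or NaN. A hedged sketch of a guarded variant follows. clean_cell is a hypothetical helper, the third row with a missing value is added purely for illustration, and Series.map is applied per column so the code does not depend on applymap. It also assumes text columns are picked up by include=['object'], the long-standing pandas default.

```python
import pandas as pd


def clean_cell(value):
    """Replace newlines with spaces and strip; leave non-strings untouched."""
    if isinstance(value, str):
        return value.replace('\n', ' ').strip()
    return value


# Question's sample data plus one row with a missing value (illustrative only).
df = pd.DataFrame([
    [1, 2, 'abc', 'foo\nbar'],
    [3, 1, 'def\nhaha', 'love it\n'],
    [5, 6, None, ' x\ny '],
])

# Select the text columns, clean them cell by cell, and assign the result back.
obj_cols = df.select_dtypes(include=['object']).columns
df[obj_cols] = df[obj_cols].apply(lambda col: col.map(clean_cell))

print(df)
```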

Solution 3

Adding to the other nice answers, this is a vectorized version of your initial idea:

columns = [2, 3]
# apply returns a DataFrame, so each cleaned column lands back in its own column
df.iloc[:, columns] = df.iloc[:, columns].apply(
    lambda col: col.str.strip().str.replace('\n', ' '))

Details:

In [49]: df.iloc[:, columns] = df.iloc[:, columns].apply(
    ...:     lambda col: col.str.strip().str.replace('\n', ' '))

In [50]: df
Out[50]: 
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
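
The same .str-based cleanup can be wrapped in a small function that works on a copy and selects columns by label. This is only a sketch: clean_text_columns is a hypothetical name, and regex=False is passed so '\n' is treated as a literal string regardless of the default in the installed pandas version.

```python
import pandas as pd


def clean_text_columns(frame, columns):
    """Return a copy of frame with the given columns stripped and newlines replaced by spaces."""
    out = frame.copy()
    out[columns] = out[columns].apply(
        lambda col: col.str.strip().str.replace('\n', ' ', regex=False)
    )
    return out


# Sample data from the question.
df = pd.DataFrame([[1, 2, 'abc', 'foo\nbar'], [3, 1, 'def\nhaha', 'love it\n']])
cleaned = clean_text_columns(df, [2, 3])

print(cleaned)
```

Returning a copy keeps the original dataframe available for comparison; assign the result back to df if in-place behavior is wanted.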
Author by

alvas

Updated on June 08, 2022

Comments

  • alvas almost 2 years

    Given an example dataframe whose columns 2 and 3 contain free text, e.g.

    >>> import pandas as pd
    >>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]
    >>> pd.DataFrame(lol)
       0  1          2          3
    0  1  2        abc   foo\nbar
    1  3  1  def\nhaha  love it\n
    

    The goal is to replace each \n with a space and strip the strings in columns 2 and 3 to achieve:

    >>> pd.DataFrame(lol)
       0  1         2        3
    0  1  2       abc  foo bar
    1  3  1  def haha  love it
    

    How to replace newlines with spaces for specific columns through pandas dataframe?

    I have tried this:

    >>> import pandas as pd
    >>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]
    
    >>> replace_and_strip = lambda x: x.replace('\n', ' ').strip()
    
    >>> lol2 = [[replace_and_strip(col) if isinstance(col, str) else col for col in list(row)] for idx, row in pd.DataFrame(lol).iterrows()]
    
    >>> pd.DataFrame(lol2)
       0  1         2        3
    0  1  2       abc  foo bar
    1  3  1  def haha  love it
    

    But there must be a better/simpler way.