Remove '\n' from text in pandas (Python)


Assuming the changes should be applied to the column 'text', select that column with

df['text']

Then, to do the replacement, one can use pandas.Series.replace (df['text'] is a Series; pandas.DataFrame.replace behaves the same way on a whole DataFrame).

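It accepts regular expressions: with regex=True, the pattern is interpreted as a regular expression and searched for inside each string. Without it, replace only swaps out cells whose entire value equals the search string, which is why df['text'].replace('\n', '') leaves newlines inside longer text untouched.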
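A minimal sketch of that difference on a hypothetical two-cell Series: one cell is exactly a newline, the other only contains one.

```python
import pandas as pd

# Hypothetical sample data: cell 0 is exactly "\n", cell 1 merely contains it.
s = pd.Series(["\n", "first line\nsecond line"])

# Default (regex=False): only cells whose whole value equals "\n" are replaced.
literal = s.replace("\n", " ")

# regex=True: the pattern is searched for inside every string.
regex = s.replace(r"\n", " ", regex=True)

print(literal.tolist())  # [' ', 'first line\nsecond line']
print(regex.tolist())    # [' ', 'first line second line']
```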

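Following @Wiktor Stribiżew's suggestion, the following does the job. The \s+ alternative collapses any run of whitespace (real newline characters included), while \\n matches the literal two-character sequence of a backslash followed by n, which the text apparently contains since \s+ alone did not remove it: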

df['text'] = df['text'].replace(r'\s+|\\n', ' ', regex=True) 
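A quick check on hypothetical data, with one cell holding a real newline and the other a literal backslash followed by n, shows the pattern handling both kinds:

```python
import pandas as pd

# Hypothetical data: row 0 has a real newline, row 1 a literal backslash + "n".
df = pd.DataFrame({"text": ["real\nnewline", "literal\\nnewline"]})

# \s+ catches the real newline; \\n catches the backslash-n pair.
df["text"] = df["text"].replace(r"\s+|\\n", " ", regex=True)

print(df["text"].tolist())  # ['real newline', 'literal newline']
```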

This regular expression syntax reference may be of help.
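Before choosing a pattern, it can help to check which kind of \n a column actually holds: a real newline character or the two characters backslash and n. A minimal sketch on hypothetical data, using Series.str.contains with regex=False so that both search strings are matched literally:

```python
import pandas as pd

# Hypothetical cells: cell 0 has a real line break, cell 1 a literal backslash + "n".
s = pd.Series(["line one\nline two", "line one\\nline two"])

# "\n" is the single newline character (chr(10)).
has_real_newline = s.str.contains("\n", regex=False)

# "\\n" is the two-character sequence backslash followed by "n".
has_literal_backslash_n = s.str.contains("\\n", regex=False)

print(has_real_newline.tolist())         # [True, False]
print(has_literal_backslash_n.tolist())  # [False, True]
```

If only the second check comes back True, a plain \s+ or \n pattern will not touch those characters, and the pattern needs the escaped \\n alternative.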

Author: Lily

Updated on June 15, 2022

Comments

  • Lily

    The following is the code I currently use to remove \n from the 'text' column:

    import pandas as pd

    df = pd.read_csv('file1.csv')

    df['text'] = df['text'].replace(r'\s+', ' ', regex=True)  # remove extra whitespace
    df['text'] = df['text'].replace('\n', ' ', regex=True)    # remove \n in text

    header = ["text", "word_length", "author"]

    # to_csv returns None when given a path, so its result is not assigned
    df.to_csv('sn_file1.csv', columns=header, sep=',', encoding='utf-8')
    

    I've also tried these suggestions:

    df['text'] = df['text'].replace('\n', '')
    df['text'] = df['text'].str.replace('\n', '').str.replace(r'\s+', ' ', regex=True).str.strip()
    

    Output: ' What a smartass! \nLike he knows anything about real estate deals too...'

    The code to remove extra whitespace works, but it does not remove the \n. Can anyone help me with this? Thanks.

    I've also tried the suggestions from "removing newlines from messy strings in pandas dataframe cells?", but it still doesn't work.

    Solved:

    df['text'] = df['text'].replace(r'\s+|\\n', ' ', regex=True)
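A possible refinement, sketched on the sample string from the question (written with a literal backslash, which is an assumption based on \s+ alone not removing it). With r'\s+|\\n', a space directly before the literal \n is matched separately, leaving two spaces. Wrapping both alternatives in one repeated group, r'(?:\s|\\n)+', collapses the whole run into a single space, and .str.strip() drops the leading space. This is a variant of the pattern above, not the one from the accepted answer.

```python
import pandas as pd

# Sample from the question, assuming the file stores a literal backslash + "n".
df = pd.DataFrame({"text": [" What a smartass! \\nLike he knows anything about real estate deals too..."]})

# Answer's pattern: the space and the literal \n are replaced as two separate matches.
original = df["text"].replace(r"\s+|\\n", " ", regex=True)

# Variant: any run of whitespace and/or literal \n becomes one space, then trim the ends.
variant = df["text"].str.replace(r"(?:\s|\\n)+", " ", regex=True).str.strip()

print(original.tolist())  # [' What a smartass!  Like he knows anything about real estate deals too...']
print(variant.tolist())   # ['What a smartass! Like he knows anything about real estate deals too...']
```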