Pandas read_csv quotes issue

Solution 1

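I think you need str.strip with apply. Read the file without setting quotechar, so the ' characters are kept as data, then strip them from the values and from the column names: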

import pandas as pd
import io

temp=u"""'colA'|'colB'
'word"A'|'A'
'word'B'|'B'"""

# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep='|')

# strip the wrapping single quotes from every value and from the column names
df = df.apply(lambda x: x.str.strip("'"))
df.columns = df.columns.str.strip("'")
print(df)
     colA colB
0  word"A    A
1  word'B    B
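Note that str.strip("'") removes every leading and trailing ', and that .str only works on string columns. A minimal sketch of a slightly more careful variant follows, assuming each value is wrapped in exactly one pair of single quotes; the helper name strip_outer_quotes and the pattern OUTER_QUOTES are made up for this illustration and are not part of pandas:

```python
import io

import pandas as pd

# Sample data copied from the question.
temp = """'colA'|'colB'
'word"A'|'A'
'word'B'|'B'"""

# quotechar is left at its default ("), so every ' is read as plain data.
df = pd.read_csv(io.StringIO(temp), sep="|")

# Matches a value wrapped in one pair of single quotes and captures the inside.
OUTER_QUOTES = r"^'(.*)'$"


def strip_outer_quotes(frame):
    """Return a copy with one pair of wrapping single quotes removed."""
    out = frame.copy()
    for col in out.columns:
        # Skip numeric columns; the .str accessor only works on string data.
        if pd.api.types.is_string_dtype(out[col]):
            out[col] = out[col].str.replace(OUTER_QUOTES, r"\1", regex=True)
    # Assumes the column labels are strings as well.
    out.columns = out.columns.str.replace(OUTER_QUOTES, r"\1", regex=True)
    return out


clean = strip_outer_quotes(df)
print(clean)
```

Because the pattern is anchored at both ends, a quote inside the value (as in word'B) is left alone.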

Solution 2

The source of the problem is that ' is used both as the quote character and as a regular character inside a field, so the parser cannot tell where the quoted field ends.

You can escape the inner quote in the file, e.g. with /:

'colA'|'colB'
'word"A'|'A'
'word/'B'|'B'

And then use escapechar:

>>> pd.read_csv('input.csv',sep='|',quotechar="'",escapechar="/")
     colA colB
0  word"A    A
1  word'B    B
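If editing the file by hand is not practical, the escaping step can be done in memory before parsing. The following is only a sketch under stated assumptions: a ' counts as "inner" when it has an ordinary character on both sides (neither | nor a newline), no field contains the sequences '| or |', and / does not already occur in the data. The name INNER_QUOTE is made up for this example.

```python
import io
import re

import pandas as pd

# Sample data copied from the question; for a real file, read its text instead.
raw = """'colA'|'colB'
'word"A'|'A'
'word'B'|'B'"""

# A ' with an ordinary character on both sides (not | and not a newline)
# is treated as part of the data rather than as a wrapping quote.
INNER_QUOTE = re.compile(r"(?<=[^|\n])'(?=[^|\n])")

# Prefix each inner quote with the escape character.
escaped = INNER_QUOTE.sub("/'", raw)

df = pd.read_csv(io.StringIO(escaped), sep="|", quotechar="'", escapechar="/")
print(df)
```

The wrapping quotes are untouched because each of them sits next to a | or at the start or end of a line.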

You can also use quoting=csv.QUOTE_ALL without setting quotechar, but then the single quotes stay in the column names and values:

>>> import pandas as pd
>>> import csv
>>> pd.read_csv('input.csv',sep='|',quoting=csv.QUOTE_ALL)
     'colA' 'colB'
0  'word"A'    'A'
1  'word'B'    'B'
Admin
Author by

Admin

Updated on June 26, 2022

Comments

  • Admin
    Admin almost 2 years

    I have a file that looks like:

    'colA'|'colB'
    'word"A'|'A'
    'word'B'|'B'
    

    I want to use pd.read_csv('input.csv',sep='|', quotechar="'") but I get the following output:

    colA    colB
    word"A   A
    wordB'   B
    

    The last row is not correct: it should be word'B and B. How do I get around this? I have tried various iterations, but none of them reads both rows correctly. I need some csv reading expertise!