Reading tab-delimited file with Pandas - works on Windows, but not on Mac

306,381

Solution 1

The biggest clue is that all the rows are being returned on one line, which indicates that line terminators are being ignored or are not present.

You can specify the line terminator with the lineterminator parameter of read_csv. Files written by some Mac applications follow the classic Mac OS convention of ending lines with \r, rather than the Unix standard \n (which modern macOS also uses) or the belt-and-suspenders Windows approach of \r\n.

import pandas

df = pandas.read_csv(filename, sep='\t', lineterminator='\r')  # single character; only valid with the C engine
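Before choosing a lineterminator, it can help to check which terminator the file actually uses. A minimal sketch, assuming the file has already been decoded to a str with the right encoding (counting raw bytes is misleading for UTF-16, where every character takes two bytes); detect_line_ending is a hypothetical helper, not part of pandas:

```python
def detect_line_ending(text: str) -> str:
    """Report the dominant line terminator in already-decoded text.

    Returns '\\r\\n' (Windows), '\\n' (Unix and modern macOS),
    '\\r' (classic Mac OS style), or '' if no terminator is found.
    """
    crlf = text.count("\r\n")
    # Lone CR and lone LF: subtract the ones that belong to CRLF pairs.
    cr = text.count("\r") - crlf
    lf = text.count("\n") - crlf
    counts = {"\r\n": crlf, "\n": lf, "\r": cr}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else ""
```

To get the text, open the file with newline='' (for example open(path, encoding='utf-16', newline='')) so that Python does not translate the terminators before they are counted. If the result is '\r', that is the case the lineterminator='\r' call above is aimed at.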

You could also open the file through the codecs module, which lets you state the encoding explicitly (UTF-16 in the example below). This may make loading more robust, at the cost of some loading speed.

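import codecs
import pandas

# Decode the file as UTF-16; plain 'r' mode is used because the 'U'
# (universal newlines) mode flag is no longer accepted in Python 3.11+
doc = codecs.open('document', 'r', 'UTF-16')

df = pandas.read_csv(doc, sep='\t')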
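The same two ideas (an explicit encoding plus tolerant line-ending handling) can be checked with only the standard library before pandas is involved. This is a swapped-in technique using the csv module, not what pandas does internally; sniff_bom_encoding and read_tab_delimited are hypothetical helpers, and BOM sniffing only works when the file really starts with a byte-order mark (files written with Python's 'utf-16' codec do):

```python
import codecs
import csv
import io


def sniff_bom_encoding(data: bytes, default: str = "utf-8") -> str:
    """Guess an encoding from a byte-order mark (BOM), if there is one."""
    # 'utf-8-sig' decodes UTF-8 and strips the BOM.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # The 'utf-16' codec reads the BOM to choose the byte order and drops it.
    # Note: UTF-32 BOMs are not distinguished here.
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    return default


def read_tab_delimited(data: bytes, skip: int = 0) -> list[list[str]]:
    """Decode raw bytes and split them into rows of tab-separated fields."""
    text = data.decode(sniff_bom_encoding(data))
    # newline='' enables universal newlines without translating them, so
    # lines ending in \r, \n or \r\n are all recognised by csv.reader.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    rows = list(reader)
    # Drop leading note lines, similar in spirit to read_csv's skiprows.
    return rows[skip:]
```

If the rows come out correctly here, the value returned by sniff_bom_encoding is a reasonable candidate for the encoding argument of pandas.read_csv.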

Solution 2

Another option is to switch to the slower but more feature-complete Python parsing engine by adding engine='python':

pandas.read_csv(filename, sep='\t', engine='python')

Author: user3062149

Updated on March 06, 2021

Comments

  • user3062149, about 3 years ago

    I've been reading a tab-delimited data file on Windows with Pandas/Python without any problems. The data file contains notes in the first three lines, followed by a header.

    df = pd.read_csv(myfile, sep='\t', skiprows=(0, 1, 2), header=0)
    

    I'm now trying to read this file on my Mac (my first time using Python on a Mac), and I get the following error:

    pandas.parser.CParserError: Error tokenizing data. C error: Expected 1 fields in line 8, saw 39
    

    If I set the error_bad_lines argument of read_csv to False, I get the following output, which continues to the last row:

    Skipping line 8: expected 1 fields, saw 39
    Skipping line 9: expected 1 fields, saw 125
    Skipping line 10: expected 1 fields, saw 125
    Skipping line 11: expected 1 fields, saw 125
    Skipping line 12: expected 1 fields, saw 125
    Skipping line 13: expected 1 fields, saw 125
    Skipping line 14: expected 1 fields, saw 125
    Skipping line 15: expected 1 fields, saw 125
    Skipping line 16: expected 1 fields, saw 125
    Skipping line 17: expected 1 fields, saw 125
    ...
    

    Do I need to specify a value for the encoding argument? It seems as though I shouldn't have to because reading the file works fine on Windows.

  • Mikhail Venkov, over 6 years ago
    Adding the codecs code helped me. Then I realized there is a parameter in read_csv that does the same thing: adding encoding='utf-16' fixed the issue for me.