How can I fix "Error tokenizing data" on pandas csv reader?

Solution 1

I struggled with this for almost half a day. I opened the CSV in Notepad, noticed that the separator is a TAB rather than a comma, and then tried the combination below.

df = pd.read_csv('C:\\myfile.csv',sep='\t', lineterminator='\r')
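If you'd rather not open the file by hand, the separator can often be guessed from a small sample of the text. The following is only a sketch using the standard library's csv.Sniffer; the function name guess_delimiter, the candidate list, and the sample string are made up for illustration, and sniffing can fail or guess wrong on unusual files.

```python
import csv


def guess_delimiter(sample, candidates=",\t;|"):
    """Return the most likely delimiter for a chunk of CSV text.

    `candidates` is an assumed shortlist; csv.Sniffer raises csv.Error
    if it cannot settle on any of them.
    """
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


# Hypothetical sample standing in for the first few lines of the real file.
sample = "id\tname\tcity\n1\tAnn\tOslo\n2\tBob\tLima\n"
sep = guess_delimiter(sample)
print(repr(sep))
```

The guessed value can then be passed as sep= to pd.read_csv. pandas can also attempt this detection itself when called with sep=None and engine='python'.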

Solution 2

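Try df = pd.read_csv(file, header=None, on_bad_lines='skip'). On pandas older than 1.3, the equivalent option is error_bad_lines=False, which was deprecated in 1.3 and removed in 2.0. Either way, the malformed rows are dropped rather than repaired.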
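To see what skipping bad lines actually does, here is a small sketch on made-up in-memory data (the raw string is hypothetical, not the asker's file). It assumes pandas 1.3 or newer, where the on_bad_lines parameter exists.

```python
import io

import pandas as pd

# Hypothetical sample: the third line has one field too many.
raw = "a,b\n1,2\n3,4,5\n6,7\n"

# With header=None the first line fixes the expected width at 2 fields,
# so the 3-field line is treated as bad and silently dropped.
df = pd.read_csv(io.StringIO(raw), header=None, on_bad_lines="skip")
print(df)
```

Note that the dropped row is gone from the result, so this is only appropriate when losing those rows is acceptable.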

Solution 3

The answer above drops the malformed lines from your dataframe. If you'd like to keep them and make your dataframe as wide as the widest row, you can use the following:

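delimiter = ','
# Widest line: the number of fields is the delimiter count plus one.
with open(path_name, 'r') as f:
    max_columns = max(line.count(delimiter) for line in f) + 1
# Rows shorter than max_columns are padded with NaN.
df = pd.read_csv(path_name, header=None, skiprows=1, names=list(range(max_columns)))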

Keep skiprows=1 only if the file actually has a header row; otherwise drop it. You can always retrieve the header column names separately later. You can also identify rows that have more fields than the original header has column names.
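One possible way to find those over-wide rows is to count fields per line with the standard csv module before handing the file to pandas. This is a sketch, not part of the original answer; find_wide_rows and the sample text are hypothetical names and data.

```python
import csv
import io


def find_wide_rows(lines, delimiter=","):
    """Return (line_number, field_count) for rows wider than the header.

    `lines` is any iterable of text lines, such as an open file.
    The first row is assumed to be the header.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, None)
    if header is None:  # empty input: nothing to report
        return []
    expected = len(header)
    return [(reader.line_num, len(row)) for row in reader if len(row) > expected]


# Hypothetical sample: the header has 2 fields, one data row has 3.
sample = io.StringIO("a,b\n1,2\n3,4,5\n6,7\n")
print(find_wide_rows(sample))
```

With a real file, pass the open file object instead of the StringIO, for example inside a with open(path_name, newline='') block.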

Author: user9191983

Updated on July 09, 2022

Comments

  • user9191983
    user9191983 almost 2 years

    I'm trying to read a csv file with pandas.

    This file actually has only one row but it causes an error whenever I try to read it.

    Something seems to be wrong on line 8, but I can't find an 8th line, since the file clearly has only one row.

    This is what I do:

    with codecs.open("path_to_file", "rU", "Shift-JIS", "ignore") as file:
        df = pd.read_csv(file, header=None, sep="\t")
    df
    

    Then I get:

    ParserError: Error tokenizing data. C error: Expected 1 fields in line 8, saw 3

    I don't understand what's really going on, so any advice would be appreciated.

  • user9191983
    user9191983 over 5 years
    Thanks so much for your comment, Po Xin. I've tried that and got another error: ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
  • M. Mariscal
    M. Mariscal about 4 years
    How can I stop these errors from being shown in the terminal?