How can I fix "Error tokenizing data" on pandas csv reader?

python pandas csv tokenize

41,117

Solution 1

I struggled with this for almost half a day. I opened the CSV in Notepad, noticed that the separator was a TAB rather than a comma, and then tried the combination below.

df = pd.read_csv('C:\\myfile.csv',sep='\t', lineterminator='\r')
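If you can't tell the separator by eye, the standard library's `csv.Sniffer` can guess it from a sample of the file. This is only a minimal sketch: `sniff_delimiter`, the candidate list, and the sample text are made up for illustration, and sniffing can fail on very short or irregular samples.

```python
import csv

def sniff_delimiter(sample: str, candidates: str = ",\t;|") -> str:
    """Guess the field delimiter from a text sample (hypothetical helper)."""
    # Sniffer looks for a candidate character that appears consistently per line.
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter

# Illustrative tab-separated sample; in practice, read the first few KB of the file.
sample = "id\tname\tcity\n1\tAda\tLondon\n2\tAlan\tWilmslow\n"
detected = sniff_delimiter(sample)
print(repr(detected))
```

The detected value can then be passed as `sep=detected` to `pd.read_csv`.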

Solution 2

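Try df = pd.read_csv(file, header=None, error_bad_lines=False). Note that error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on newer versions use on_bad_lines='skip' instead.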
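Skipping bad lines silently can hide real data problems, so it may be worth checking first which rows are malformed. The sketch below uses only the standard `csv` module. `find_odd_rows` and the sample text are hypothetical, and it assumes the most common field count is the "correct" one.

```python
import csv
import io
from collections import Counter

def find_odd_rows(text: str, delimiter: str = ","):
    """Return (expected_width, [(row_no, width), ...]) for rows whose
    field count differs from the most common one (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    widths = [len(r) for r in rows]
    # Assumption: the most frequent width is the intended number of columns.
    expected = Counter(widths).most_common(1)[0][0]
    odd = [(i, w) for i, w in enumerate(widths, start=1) if w != expected]
    return expected, odd

# Illustrative data: the third row has one field too many.
text = "a,b\n1,2\n3,4,5\n6,7\n"
expected, odd = find_odd_rows(text)
print(expected, odd)
```

Row numbers here count parsed records, which can differ from physical line numbers if quoted fields contain newlines.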

Solution 3

The existing answer drops the lines that have extra fields, so they never reach your dataframe. If you'd like your dataframe to be as wide as the widest row instead, you can use the following:

import pandas as pd
delimiter = ','
with open(path_name, 'r') as f:
    max_columns = max(line.count(delimiter) for line in f) + 1  # fields = delimiters + 1
df = pd.read_csv(path_name, header=None, skiprows=1, names=list(range(max_columns)))

Set skiprows = 1 only if the file actually has a header row; you can always retrieve the header column names later. This approach also lets you identify rows that have more columns populated than there are column names in the original header.
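Counting raw delimiters overcounts when a quoted field contains the delimiter itself. As an alternative sketch, the standard `csv` module can parse the rows first and then pad them to the widest one. `read_ragged` and the sample text are hypothetical names chosen for illustration.

```python
import csv
import io

def read_ragged(text: str, delimiter: str = ","):
    """Parse delimited text and pad every row with None up to the widest row
    (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    width = max((len(r) for r in rows), default=0)
    return [r + [None] * (width - len(r)) for r in rows]

# Illustrative data: the second row has a quoted field containing a comma.
text = 'a,b\n1,"x,y",3\n4\n'
padded = read_ragged(text)
for row in padded:
    print(row)
```

The padded list of rows can be handed to `pd.DataFrame(padded)` if a dataframe is needed.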


Author by

user9191983

Updated on July 09, 2022

Comments

user9191983 almost 2 years
I'm trying to read a csv file with pandas.

This file actually has only one row but it causes an error whenever I try to read it.

Something seems to go wrong on line 8, but I can't find an 8th line, since there's clearly only one row in the file.

This is what I do:
```
import codecs
import pandas as pd

with codecs.open("path_to_file", "rU", "Shift-JIS", "ignore") as file:
    df = pd.read_csv(file, header=None, sep="\t")
df
```
Then I get:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 8, saw 3

I don't understand what's really going on, so any advice would be appreciated.
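One way to look for the "hidden" lines in a case like this is to count the line terminators in the raw bytes, since a file that looks like one row in an editor may still contain bare carriage returns or newlines that the parser treats as line breaks. This is a rough sketch. `count_line_endings` and the sample bytes are made up for illustration; in practice the bytes would come from `open("path_to_file", "rb").read()`.

```python
def count_line_endings(raw: bytes) -> dict:
    """Count CRLF pairs, lone CR, and lone LF in raw file bytes
    (hypothetical helper)."""
    crlf = raw.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": raw.count(b"\r") - crlf,  # carriage returns not followed by LF
        "LF": raw.count(b"\n") - crlf,  # newlines not preceded by CR
    }

# Illustrative bytes mixing all three terminator styles.
raw = b"a\tb\r\nc\rd\ne\r\n"
counts = count_line_endings(raw)
print(counts)
```

If the counts show more terminators than expected, that would be consistent with the parser reporting a line number that the editor does not display.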
user9191983 over 5 years

Thanks so much for your comment, Po Xin. I tried that and got a different error: ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
Admin over 5 years

try this stackoverflow.com/questions/33998740/…
M. Mariscal about 4 years

Also, how can I keep these errors from showing in the terminal?