Pandas parsing csv error - expected 1 fields found 9

python python-3.x pandas csv data-analysis

10,630

Solution 1

The function pandas.read_csv() infers the number of columns and their names from the first line of the file. By default, it does not treat any leading lines as comments.

What is happening is that pandas reads the first line, splits it, and finds only one column, instead of splitting line 13, which is the first uncommented line. To solve this, use the comment argument.

planets = pd.read_csv("planets.csv", comment='#')

Compared to using skiprows, this allows the same code to load the planets.csv file even if the number of comment lines varies.
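To see the difference without the original file, here is a minimal sketch that uses a shortened, made-up stand-in for planets.csv held in memory (the SAMPLE text and its three columns are assumptions for illustration, not the real data). It shows the default call failing and the comment='#' call finding the header.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for planets.csv: comment lines, then the header.
SAMPLE = """\
# This file was produced by the test
# COLUMN pl_hostname:    Host Name
#
loc_rowid,pl_hostname,pl_pnum
1,11 Com,1
2,11 UMi,1
"""

# Default call: the first line has no commas, so pandas expects 1 field
# and raises on the first line that has more.
try:
    pd.read_csv(io.StringIO(SAMPLE))
    default_failed = False
except pd.errors.ParserError:
    default_failed = True

# With comment='#', lines starting with '#' are ignored, so the header is found.
planets = pd.read_csv(io.StringIO(SAMPLE), comment="#")

print(default_failed)
print(list(planets.columns))
```

One thing to keep in mind: comment='#' also cuts off the rest of any data line at a '#' character, so it is only safe if '#' never appears inside the values.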

Solution 2

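In addition to the answer above, if the problem only shows up at line 13, you can skip the lines before it. Note that with header=None, pandas reads line 13 (the column names) as the first data row and labels the columns with integers.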

pd.read_csv("planets.csv", skiprows=12, header=None)
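As a rough illustration of what header=None changes, here is a sketch on a made-up sample with two comment lines instead of twelve (the SAMPLE text is an assumption, not the real file):

```python
import io

import pandas as pd

# Hypothetical sample with two comment lines; the real file has 12.
SAMPLE = """\
# comment one
# comment two
loc_rowid,pl_hostname
1,11 Com
2,11 UMi
"""

# skiprows=2 drops the comments; header=None keeps the column-name line as data.
no_header = pd.read_csv(io.StringIO(SAMPLE), skiprows=2, header=None)

# Without header=None, the first remaining line supplies the column names.
with_header = pd.read_csv(io.StringIO(SAMPLE), skiprows=2)

print(no_header.shape)
print(with_header.shape)
```

With header=None the frame has one extra row, and that row holds the strings loc_rowid and pl_hostname, so every column ends up with object dtype. If you want the names as column labels, leave header at its default.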

Solution 3

Looks like you need skiprows. You can skip all the comments.

Ex:

planets = pd.read_csv("planets.csv", sep=',', skiprows=12)
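If you prefer skiprows but do not want to hard-code 12, one possible sketch is to count the leading comment lines first. The helper count_leading_comments is a hypothetical name, not part of pandas, and SAMPLE is a made-up stand-in for the file:

```python
import io

import pandas as pd


def count_leading_comments(lines, prefix="#"):
    """Return how many consecutive lines at the start begin with `prefix`."""
    count = 0
    for line in lines:
        if not line.startswith(prefix):
            break
        count += 1
    return count


# Hypothetical stand-in for planets.csv with three comment lines.
SAMPLE = "# a\n# b\n#\nloc_rowid,pl_hostname\n1,11 Com\n"

# Count the comment lines, then skip exactly that many.
n_skip = count_leading_comments(io.StringIO(SAMPLE))
planets = pd.read_csv(io.StringIO(SAMPLE), sep=",", skiprows=n_skip)

print(n_skip)
print(list(planets.columns))
```

For a real file you would pass an open file handle to the helper instead of io.StringIO, then call read_csv with the file name.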

Baalateja Kataru

Updated on June 04, 2022

Comments

Baalateja Kataru almost 2 years

I'm trying to parse from a .csv file:

planets = pd.read_csv("planets.csv", sep=',')

But I always end up with this error:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 9

This is what the first few lines of my CSV file look like:

# This file was produced by the test
# Tue Apr  3 06:03:27 2018
#
# COLUMN pl_hostname:    Host Name
# COLUMN pl_discmethod:  Discovery Method
# COLUMN pl_pnum:        Number of Planets in System
# COLUMN pl_orbper:      Orbital Period [days]
# COLUMN pl_orbsmax:     Orbit Semi-Major Axis [AU])
# COLUMN st_dist:        Distance [pc]
# COLUMN st_teff:        Effective Temperature [K]
# COLUMN st_mass:        Stellar Mass [Solar mass] 
#
loc_rowid,pl_hostname,pl_discmethod,pl_pnum,pl_orbper,pl_orbsmax,st_dist,st_teff,st_mass
1,11 Com,Radial Velocity,1,326.03000000,1.290000,110.62,4742.00,2.70
2,11 UMi,Radial Velocity,1,516.22000000,1.540000,119.47,4340.00,1.80
3,14 And,Radial Velocity,1,185.84000000,0.830000,76.39,4813.00,2.20
4,14 Her,Radial Velocity,1,1773.40000000,2.770000,18.15,5311.00,0.90
5,16 Cyg B,Radial Velocity,1,798.50000000,1.681000,21.41,5674.00,0.99
6,18 Del,Radial Velocity,1,993.30000000,2.600000,73.10,4979.00,2.30
7,1RXS J160929.1-210524,Imaging,1,,330.000000,145.00,4060.00,0.85

Edit: this is line 13:

loc_rowid,pl_hostname,pl_discmethod,pl_pnum,pl_orbper,pl_orbsmax,st_dist,st_teff,st_mass

Edit: Thanks to @Rakesh, skipping the first 12 lines solved the problem:

planets = pd.read_csv("planets.csv", sep=',', skiprows=12)

michaelg about 6 years

You probably need to check line 13.
in_user about 6 years

You should post the data in line 13; that would give more of a clue.
Baalateja Kataru about 6 years

@NEOmen This is line 13: loc_rowid,pl_hostname,pl_discmethod,pl_pnum,pl_orbper,pl_orbsmax,st_dist,st_teff,st_mass
OriolAbril about 6 years

What are lines 1-12 then? Do you want to read the info there?
Baalateja Kataru about 6 years

@xg.plt.py Those are just comments that contain info about the data in the file.
OriolAbril about 6 years

Do they start with a specific character like #?
in_user about 6 years

Can you post the first 20 lines of the CSV file? The line you say is line 13 looks like it has the same value as line 1.
Baalateja Kataru about 6 years

@xg.plt.py Yeah they start with #

Baalateja Kataru about 6 years

That just skips all the lines in the csv file.
Rakesh about 6 years

Can you post the first 20 lines in your question?
Baalateja Kataru about 6 years

Again, that just skips all the lines in the csv file.
Baalateja Kataru about 6 years

Yeah the updated snippet worked like a charm, thanks :)
OriolAbril about 6 years

You are welcome. I cannot remember all the options of read_csv and have to check the documentation frequently, so no problem :).