ipython pandas TypeError: read_csv() got an unexpected keyword argument 'delim-whitespace'
Solution 1
Oddly, the delim_whitespace parameter appears in the Pandas documentation in the method summary but not in the parameters list. Try replacing it with delimiter = r'\s+', which is equivalent to what I assume the authors meant.
CSV does refer to comma-separated values, but it's often used to refer to general delimited-text formats. TSV (tab-separated values) is another variant; in this case it's basically whitespace-separated values.
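To make "whitespace-separated values" concrete, here is a minimal standard-library sketch of what such a parser has to do with the sample row from the question: split on runs of whitespace while keeping the quoted name in one piece. Using shlex here is only an illustrative assumption, not how pandas does it internally.

```python
import shlex

# Sample row from the question: eight numeric fields, then a quoted name.
line = '14.0 8. 454.0 220.0 4354. 9.0 70. 1. "chevrolet impala"'

# Naive whitespace splitting ignores the quotes and cuts the name in two.
naive_fields = line.split()

# shlex.split applies shell-like quoting rules, so the quoted name stays whole.
quoted_fields = shlex.split(line)

print(len(naive_fields))   # one field too many
print(len(quoted_fields))  # matches the nine expected columns
print(quoted_fields[-1])
```

A parser that honors the quote character yields the nine expected fields, while a plain split yields one extra, which is the same kind of mismatch behind an "Expecting 9 columns" error.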
Solution 2
Your code uses delim_whitespace but the error message says delim-whitespace. The former exists, the latter does not.
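The difference can be reproduced without pandas. The sketch below uses a hypothetical stand-in function, read_stub, that accepts delim_whitespace the way the real reader does. It shows that a hyphenated name only reaches the function if it is passed as a dictionary key, and that writing it directly in the call is rejected by Python 3 before the function even runs.

```python
def read_stub(path, delim_whitespace=False):
    """Hypothetical stand-in for a reader that accepts delim_whitespace."""
    return (path, delim_whitespace)

# The underscore spelling is a valid keyword argument.
ok = read_stub("data", delim_whitespace=True)

# The hyphenated spelling can only be passed via ** unpacking, and is rejected.
try:
    read_stub("data", **{"delim-whitespace": True})
    type_error_message = None
except TypeError as exc:
    type_error_message = str(exc)

# Written directly in the call, the hyphen is parsed as a subtraction,
# so the source does not even compile.
try:
    compile('read_stub("data", delim-whitespace=True)', "<example>", "eval")
    hyphen_is_syntax_error = False
except SyntaxError:
    hyphen_is_syntax_error = True

print(type_error_message)
print(hyphen_is_syntax_error)
```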
If the data file contains
14.0 8. 454.0 220.0 4354. 9.0 70. 1. "chevrolet impala"
and you define data with
data = pd.read_csv('data', delim_whitespace = True, header=None, names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model', 'origin', 'car_name'])
then the DataFrame does get parsed successfully:
   mpg  cylinders  displacement  horsepower  weight  acceleration  model  \
0   14          8           454         220    4354             9     70

   origin          car_name
0       1  chevrolet impala
So you just have to change the hyphen to an underscore.
Note that when you specify delim_whitespace=True, the pure Python parser is used. In this case I don't think that is necessary. Using delimiter=r'\s+' as Steve Howard suggests would probably perform better. (The source code says, "The C engine is faster while the python engine is currently more feature-complete", but I think the only feature that the python engine has that the C engine does not is skipfooter.)
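As a rough sketch of the suggestion above, the snippet below reads the sample row with sep=r"\s+" and asks for the C engine explicitly. It assumes a reasonably recent pandas, where the C engine accepts this separator and keeps the quoted name together; the in-memory StringIO buffer stands in for the real data file. Very old releases may behave differently, and recent ones deprecate delim_whitespace in favor of this spelling.

```python
import io

import pandas as pd

# One row in the same layout as the auto-mpg file (in-memory stand-in for it).
sample = '14.0 8. 454.0 220.0 4354. 9.0 70. 1. "chevrolet impala"\n'

names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
         'acceleration', 'model', 'origin', 'car_name']

# sep=r"\s+" splits on runs of whitespace; engine="c" requests the C parser.
data = pd.read_csv(io.StringIO(sample), sep=r"\s+", engine="c",
                   header=None, names=names)

print(data.shape)
print(data.loc[0, 'car_name'])
```

Note that the pandas documentation warns that regex separators are prone to ignoring quoted data, so results with engine="python" may not match.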
importError
Updated on January 24, 2020
Comments
-
importError over 4 years
While trying the ipython.org notebook, "INTRODUCTION TO PYTHON FOR DATA MINING"
The following code:
data = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original", delim_whitespace = True, header=None, names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model', 'origin', 'car_name'])
yields the following error:
TypeError: read_csv() got an unexpected keyword argument 'delim-whitespace'
Unfortunately the dataset file itself is not really csv, and I don't know why they used read_csv() to get its data.
The data looks like this line:
14.0 8. 454.0 220.0 4354. 9.0 70. 1. "chevrolet impala"
The environment is python/2.7 on Debian stable w/ ipython 0.13. After searching here, I realize it's most likely a version problem, as the argument 'delim-whitespace' may be in a later version of the pandas library than the one available through the APT package manager.
I tried several workarounds, without success.
First, I tried to upgrade pandas by building from the latest source, but I found I would end up with a cascade of other builds of dependencies whose versions need upgrading, which could end up breaking the environment. E.g., I had to install Cython, then it reported that the version available through the APT package manager was again too old, so I would have to rebuild Cython, plus other libs/modules, and so on.
Then after looking at the API a bit, I tried using other arguments: using delimiter = ' ' in the call to read_csv() caused it to break up the strings inside quotes into several columns,
ValueError: Expecting 9 columns, got 13 in row 0
I tried using the read_csv() argument quotechar='"', as documented in the API, but again it was not recognized (unexpected keyword argument).
Finally I tried using a different way to load the file,
data = DataFrame()
data.from_csv(url)
I got,
Out[18]:
<class 'pandas.core.frame.DataFrame'>
Index: 405 entries, 15.0 8. 350.0 165.0 3693. 11.5 70. 1."buick skylark 320" to 31.0 4. 119.0 82.00 2720. 19.4 82. 1. "chevy s-10"
Empty DataFrame
In [19]: print(data.shape)
(0, 9)
alternatively, w/ sep argument to from_csv(),
In [20]: data.from_csv(url,sep=' ')
yields the error,
ValueError: Expecting 31 columns, got 35 in row 1
In [21]: print(data.shape)
(0, 9)
Also alternatively, with the same negative result:
In [32]: data = DataFrame( columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration','model', 'origin', 'car_name'])
In [33]: data.from_csv(url,sep=', \t')
Out[33]:
<class 'pandas.core.frame.DataFrame'>
Index: 405 entries, 15.0 8. 350.0 165.0 3693. 11.5 70. 1."buick skylark 320" to 31.0 4. 119.0 82.00 2720. 19.4 82. 1. "chevy s-10"
Empty DataFrame
In [34]: data.head()
Out[34]: Empty DataFrame
I tried using ipython3 instead, but it cannot find/load matplotlib, as there is no matplotlib for python3 on my system.
Any help with this problem would be greatly appreciated.
-
importError over 9 years
Thanks for the reply. Yeah, that too is odd enough, especially in documentation! Incidentally, the url of the notebook I tried is nbviewer.ipython.org/github/Syrios12/learningwithdata/blob/…
-
importError over 9 years
Thanks for replying and for pointing out the discrepancy between delim_whitespace and delim-whitespace! The url for the notebook I was trying is nbviewer.ipython.org/github/Syrios12/learningwithdata/blob/…
-
importError over 9 years
I just tried it again: the delim-whitespace is a typo on my part, and is treated as an expression if used in the function call. Should I edit the original question or leave it with the typo? Thanks. Now trying the delimiter value suggested by unutbu...
ValueError: Expecting 9 columns, got 12 in row 0
And using delim_whitespace=True gives the error in the original question, as the pandas being used (available through APT) is an older version.
-
importError over 9 years
Using delimiter=r'\s+' results in breaking up the quoted string into several columns again.
-
unutbu over 9 years
What version of pandas are you using?
-
unutbu over 9 years
I've attempted to write instructions for how to install the latest version of pandas here.
-
importError over 9 years
Pandas version 0.8.0 here, the one from the APT repository (python-pandas pkg). When I run
$ sudo pip install --install-option="--prefix=" -U pandas
I get
... ...
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 588, in resolve
raise VersionConflict(dist,req) # XXX put more info here
pkg_resources.VersionConflict: (numpy 1.6.2 (/usr/lib/pymodules/python2.7), Requirement.parse('numpy>=1.7.0'))
----------------------------------------
Command python setup.py egg_info failed with error code 1 in ~/build/pandas
Storing complete log in ~/.pip/pip.log
-
unutbu over 9 years
My goodness, that's old :) Are you willing to try the git/virtualenv instructions I posted here? It's a lot to download, but if successful, it will allow you to always have the latest version of the entire NumPy-->Pandas stack.
-
unutbu over 9 years
Or, perhaps try the anaconda solution. You won't have complete freedom to choose the latest versions, but it looks easier to install and would get you over this parsing problem.
-
importError over 9 years
Thanks a lot for the instructions you posted for getting the latest version within a virtualenv. I think that would be my option rather than the "easier" Anaconda.