ipython pandas TypeError: read_csv() got an unexpected keyword argument 'delim-whitespace'

39,040

Solution 1

Oddly, the delim_whitespace parameter appears in the Pandas documentation in the method summary but not the parameters list. Try replacing it with delimiter = r'\s+', which is equivalent to what I assume the authors meant.

CSV does refer to comma-separated values, but it's often used to refer to general delimited-text formats. TSV (tab-separated values) is another variant; in this case it's basically whitespace-separated values.
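A minimal sketch of that replacement, assuming a reasonably recent pandas in which the default C parser treats sep=r'\s+' as "split on runs of whitespace" and honors double quotes (sep is the documented alias of delimiter). The in-memory sample and the names sample and data are illustrative only; the column names are the ones from the question. As far as I know, newer pandas releases also deprecate delim_whitespace in favor of this spelling.

```python
import io

import pandas as pd

# One line in the same layout as the auto-mpg file, held in memory
# so the sketch does not depend on the network.
sample = ' 14.0   8.   454.0      220.0      4354.       9.0   70.  1.    "chevrolet impala"\n'

columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
           'acceleration', 'model', 'origin', 'car_name']

# sep=r'\s+' splits on any run of whitespace; the quoted car name
# is kept together as one field by the default parser.
data = pd.read_csv(io.StringIO(sample), sep=r'\s+', header=None, names=columns)

print(data.shape)
print(data.loc[0, 'car_name'])
```

On a very old pandas (such as the 0.8.0 mentioned in the comments) the behavior may differ, so treat this as a check to run against your own installation rather than a guarantee.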

Solution 2

Your code uses delim_whitespace but the error message says delim-whitespace. The former exists, the latter does not.
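To illustrate the difference, here is a small sketch that does not need pandas at all. The function read_table_sketch is hypothetical and only stands in for any function with a delim_whitespace parameter. A hyphenated name cannot be typed as a keyword argument (Python parses it as a subtraction), and if it is forced in through a dict, the call fails with the same kind of TypeError as in the question.

```python
def read_table_sketch(path, delim_whitespace=False):
    """Hypothetical stand-in for a reader that accepts delim_whitespace."""
    return delim_whitespace

# The underscore spelling is an ordinary keyword argument.
ok = read_table_sketch('data', delim_whitespace=True)

# The hyphen spelling is not valid syntax for a keyword argument:
# "delim-whitespace" is read as the expression "delim - whitespace".
try:
    compile("read_table_sketch('data', delim-whitespace=True)", "<sketch>", "eval")
    hyphen_is_valid_syntax = True
except SyntaxError:
    hyphen_is_valid_syntax = False

# Forcing the hyphenated name in via ** reproduces the TypeError.
try:
    read_table_sketch('data', **{'delim-whitespace': True})
    message = ''
except TypeError as exc:
    message = str(exc)

print(ok, hyphen_is_valid_syntax)
print(message)
```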

If the data file contains

 14.0   8.   454.0      220.0      4354.       9.0   70.  1.    "chevrolet impala"

and you define data with

data = pd.read_csv('data', delim_whitespace = True, header=None, names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model', 'origin', 'car_name'])

then the DataFrame does get parsed successfully:

   mpg  cylinders  displacement  horsepower  weight  acceleration  model  \
0   14          8           454         220    4354             9     70   

   origin          car_name  
0       1  chevrolet impala  

So you just have to change the hyphen to an underscore.


Note that when you specify delim_whitespace=True, the pure Python parser is used. In this case I don't think that is necessary. Using delimiter=r'\s+' as Steve Howard suggests would probably perform better. (The source code says, "The C engine is faster while the python engine is currently more feature-complete", but I think the only feature that the python engine has that the C engine does not is skipfooter.)

Author by

importError

Updated on January 24, 2020

Comments

  • importError
    importError over 4 years

    While trying the ipython.org notebook, "INTRODUCTION TO PYTHON FOR DATA MINING"

    The following code:

    data = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original",
                   delim_whitespace = True, header=None,
                   names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration',
                            'model', 'origin', 'car_name'])
    

    yields the following error:

     TypeError: read_csv() got an unexpected keyword argument 'delim-whitespace'
    

    Unfortunately the dataset file itself is not really csv, and I don't know why they used read_csv() to get its data.

    The data looks like this line:

     14.0   8.   454.0      220.0      4354.       9.0   70.  1.    "chevrolet impala"
    

    The environment is python/2.7 on Debian stable w/ ipython 0.13. After searching here, I realize it's most likely a version problem, as the argument 'delim-whitespace' may be in a later version of the pandas library than the one available through the APT package manager.
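One way to test the version theory is to ask the installed pandas which keywords read_csv lists. This is only a sketch: the helper read_csv_accepts is a hypothetical name, it assumes a pandas recent enough for inspect.signature to see the real parameter list, and it would be misleading for any function that simply takes **kwargs.

```python
import inspect

import pandas as pd

def read_csv_accepts(keyword):
    """Return True if the installed pandas.read_csv lists `keyword` as a parameter."""
    return keyword in inspect.signature(pd.read_csv).parameters

# The installed version, for comparison with the documentation being read.
print(pd.__version__)

# A keyword that exists, and the hyphenated spelling from the error message.
print(read_csv_accepts('sep'))
print(read_csv_accepts('delim-whitespace'))
```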

    I tried several workarounds, without success.

    • First, I tried to upgrade pandas by building from the latest source, but I found I would end up with a cascade of builds of other dependencies whose versions also need upgrading, which could end up breaking the environment. E.g., I had to install Cython, but the version available through the APT package manager was again too old, so I would have to rebuild Cython plus other libs/modules, and so on.

    • Then after looking at the API a bit, I tried using other arguments: using delimiter = ' ' in the call to read_csv() caused it to break up the strings inside quotes into several columns,

      ValueError: Expecting 9 columns, got 13 in row 0
      
    • I tried using the read_csv() argument quotechar='"' , as documented in the API but again it was not recognized (unexpected keyword argument)

    • Finally I tried using a different way to load the file,

      data = DataFrame()
      
      data.from_csv(url)
      

      I got,

      Out[18]: 
      <class 'pandas.core.frame.DataFrame'>
      Index: 405 entries, 15.0   8.   350.0      165.0      3693.      11.5   70.  1."buick skylark 320" to 31.0   4.   119.0      82.00      2720.      19.4   82.  1.   "chevy s-10"
      Empty DataFrame
      
      In [19]: print(data.shape)
      (0, 9)
      
    • alternatively, w/ sep argument to from_csv(),

      In [20]: data.from_csv(url,sep=' ')
      

      yields the error,

      ValueError: Expecting 31 columns, got 35 in row 1
      In [21]: print(data.shape)
      (0, 9)
      
    • Also alternatively, with the same negative result:

      In [32]: data = DataFrame( columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration','model', 'origin', 'car_name'])
      
      In [33]: data.from_csv(url,sep=', \t')
      Out[33]: 
      <class 'pandas.core.frame.DataFrame'>
      Index: 405 entries, 15.0   8.   350.0      165.0      3693.      11.5   70.  1."buick skylark 320" to 31.0   4.   119.0      82.00      2720.      19.4   82.  1.   "chevy s-10"
      Empty DataFrame
      
      In [34]: data.head()
      Out[34]: 
      Empty DataFrame
      
    • I tried using ipython3 instead, but it cannot find/load matplotlib, as there is no matplotlib for python3 on my system.

    Any help with this problem would be greatly appreciated.
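The column-count errors in the attempts above are consistent with how a single-space delimiter behaves on this layout. The sketch below uses only the standard library and the sample line from the question. It is not how pandas parses internally; shlex is swapped in purely as a convenient quote-aware splitter, and it would mishandle names containing apostrophes.

```python
import shlex

line = ' 14.0   8.   454.0      220.0      4354.       9.0   70.  1.    "chevrolet impala"'

# Splitting on one literal space: every extra space produces an empty
# field, and the quoted name is cut at its inner space.
single_space_fields = line.split(' ')

# Splitting on runs of whitespace while honoring double quotes.
quote_aware_fields = shlex.split(line)

print(len(single_space_fields), single_space_fields[-2:])
print(len(quote_aware_fields), quote_aware_fields[-1])
```

The second split yields the nine fields the column names expect, which is the behavior the whitespace-aware options of read_csv are meant to provide.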

  • importError
    importError over 9 years
    Thanks for the reply. Yeah, that too is odd enough, especially in documentation! Incidentally, the URL of the notebook I tried is nbviewer.ipython.org/github/Syrios12/learningwithdata/blob/…
  • importError
    importError over 9 years
    Thanks for replying and for pointing out the discrepancy between delim_whitespace and delim-whitespace! The URL for the notebook I was trying is nbviewer.ipython.org/github/Syrios12/learningwithdata/blob/…
  • importError
    importError over 9 years
    I just tried it again; the delim-whitespace is a typo on my part, and it is treated as an expression if used in the function call. Should I edit the original question or leave it with the typo? Thanks. Now trying the delimiter value suggested by unutbu gives: ValueError: Expecting 9 columns, got 12 in row 0. And using delim_whitespace=True gives the error in the original question, as the pandas being used (available through APT) is an older version.
  • importError
    importError over 9 years
    Using delimiter=r'\s+' results in breaking up the quoted string into several columns again.
  • unutbu
    unutbu over 9 years
    What version of pandas are you using?
  • unutbu
    unutbu over 9 years
    I've attempted to write instructions for how to install the latest version of pandas here.
  • importError
    importError over 9 years
    Pandas version 0.8.0 here , the one through the APT repository (python-pandas pkg).. when i run $ sudo pip install --install-option="--prefix=" -U pandas i get ... ... File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 588, in resolve raise VersionConflict(dist,req) # XXX put more info here pkg_resources.VersionConflict: (numpy 1.6.2 (/usr/lib/pymodules/python2.7), Requirement.parse('numpy>=1.7.0')) ---------------------------------------- Command python setup.py egg_info failed with error code 1 in ~/build/pandas Storing complete log in ~/.pip/pip.log
  • unutbu
    unutbu over 9 years
    My goodness that's old :) Are you willing to try the git/virtualenv instructions I posted here? It's a lot to download, but if successful, will allow you to always have the latest version of the entire NumPy-->Pandas stack.
  • unutbu
    unutbu over 9 years
    Or, perhaps try the anaconda solution. You won't have complete freedom to choose the latest versions, but it looks easier to install and would get you over this parsing problem.
  • importError
    importError over 9 years
    Thanks a lot for the instructions you posted for getting the latest version within a virtualenv. I think that would be my option rather than the "easier" Anaconda.