pandas.read_excel error when using usecols


Solution 1

First, read the columns by their Excel letter range:

df = pd.read_excel(file, usecols="A:D")

where A:D is the range of Excel columns you want to read. Then rename the columns like this:

df.columns = ['col1', 'col2', 'col3', 'col4']

Then access each column by its new name.

Solution 2

Two convenient ways to select Excel columns:

First case, using zero-based numbers (column "A" = 0, column "B" = 1, etc.):

df = pd.read_excel("filename.xlsx", usecols=range(0, 5))

Second case, using Excel column letters:

df = pd.read_excel("filename.xlsx", usecols="A, C, E:J")
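As a rough illustration of how the letter form relates to the zero-based numbers, here is a small standard-library sketch. The helpers `column_index` and `expand_spec` are hypothetical names introduced for this example, not part of pandas, and the code assumes that letter ranges such as E:J include both ends.

```python
def column_index(letters):
    """Convert an Excel column label such as "A" or "AB" to a zero-based index."""
    index = 0
    for ch in letters.strip().upper():
        if not "A" <= ch <= "Z":
            raise ValueError("invalid column label: %r" % letters)
        # Column labels are a base-26 number with digits A=1 ... Z=26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    if index == 0:
        raise ValueError("empty column label")
    return index - 1


def expand_spec(spec):
    """Expand a spec such as "A, C, E:J" into a list of zero-based indices."""
    indices = []
    for part in spec.split(","):
        if ":" in part:
            start, end = part.split(":")
            # Both ends of the range are included.
            indices.extend(range(column_index(start), column_index(end) + 1))
        else:
            indices.append(column_index(part))
    return indices


print(expand_spec("A, C, E:J"))  # [0, 2, 4, 5, 6, 7, 8, 9]
print(expand_spec("A:E") == list(range(0, 5)))  # True
```

Under these assumptions, "A:E" and range(0, 5) name the same five columns.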

Solution 3

If you want to read your Excel file by specific column names, pass the names as a list to "usecols":

df = pd.read_excel("filename.xlsx", usecols=["col_name1", "col_name2", "col_name3"])
print(df)
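Selecting by name only works when each requested name matches a header cell exactly. A minimal sketch of a pre-check, using only the standard library: `split_columns` is a hypothetical helper, and `header` stands in for the column names obtained from a first read without usecols.

```python
def split_columns(header, wanted):
    """Return (positions, missing) for the requested column names.

    positions: zero-based positions of the requested names found in header.
    missing:   requested names that do not appear in header.
    If header contains duplicate names, the last occurrence wins.
    """
    lookup = {name: pos for pos, name in enumerate(header)}
    positions = [lookup[name] for name in wanted if name in lookup]
    missing = [name for name in wanted if name not in lookup]
    return positions, missing


header = ["col_name1", "col_name2", "other", "col_name3"]
positions, missing = split_columns(header, ["col_name1", "col_name3", "typo"])
print(positions)  # [0, 3]
print(missing)    # ['typo']
```

The positions could then be passed as integers to usecols, as in Solution 2. Whether that sidesteps a particular name-matching problem is not tested here.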

Author: Giacomo Sachs

Updated on July 05, 2022

Comments

  • Giacomo Sachs
    Giacomo Sachs almost 2 years

    I am having a problem reading data from an Excel file. The Excel file contains column names with unicode characters.

    For automation reasons, I need to pass the usecols argument to the pandas.read_excel function.

    The thing is that when I don't use the usecols argument the data is loaded with no errors.

    Here's the code:

    import pandas as pd
    
    df = pd.read_excel(file)
    df.columns
    
    Index([u'col1', u'col2', u'col3', u'col with unicode à', u'col4'], dtype='object')
    

    If I use usecols:

    COLUMNS = ['col1', 'col2', 'col with unicode à']
    df = pd.read_excel(file, usecols = COLUMNS)
    

    I receive the following error:

    ValueError: Usecols do not match columns, columns expected but not found: ['col with unicode \xc3\xa0']
    

    Passing encoding = 'utf-8' as an argument to read_excel does not solve the problem, and neither does encoding the elements of COLUMNS.

    EDIT: Here is the complete traceback.

     ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-22-541ccb88da6a> in <module>()
          2 df = pd.read_excel(file)
          3 cols = df.columns
    ----> 4 df = pd.read_excel(file, usecols = ['col1', 'col2', 'col with unicode à'])
    
    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\util\_decorators.pyc in wrapper(*args, **kwargs)
        186                 else:
        187                     kwargs[new_arg_name] = new_arg_value
    --> 188             return func(*args, **kwargs)
        189         return wrapper
        190     return _deprecate_kwarg
    
    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\util\_decorators.pyc in wrapper(*args, **kwargs)
        186                 else:
        187                     kwargs[new_arg_name] = new_arg_value
    --> 188             return func(*args, **kwargs)
        189         return wrapper
        190     return _deprecate_kwarg
    
    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\excel.pyc in read_excel(io, sheet_name, header, names, index_col, parse_cols, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, verbose, parse_dates, date_parser, thousands, comment, skip_footer, skipfooter, convert_float, mangle_dupe_cols, **kwds)
        373         convert_float=convert_float,
        374         mangle_dupe_cols=mangle_dupe_cols,
    --> 375         **kwds)
        376 
        377 
    
    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\excel.pyc in parse(self, sheet_name, header, names, index_col, usecols, squeeze, converters, true_values, false_values, skiprows, nrows, na_values, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
        716                                   convert_float=convert_float,
        717                                   mangle_dupe_cols=mangle_dupe_cols,
    --> 718                                   **kwds)
        719 
        720     @property
    
    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\excel.pyc in parse(self, sheet_name, header, names, index_col, usecols, squeeze, dtype, true_values, false_values, skiprows, nrows, na_values, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
        599                                     usecols=usecols,
        600                                     mangle_dupe_cols=mangle_dupe_cols,
    --> 601                                     **kwds)
        602 
        603                 output[asheetname] = parser.read(nrows=nrows)
    
    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in TextParser(*args, **kwds)
       2154     """
       2155     kwds['engine'] = 'python'
    -> 2156     return TextFileReader(*args, **kwds)
       2157 
       2158 
    
    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in __init__(self, f, engine, **kwds)
        893             self.options['has_index_names'] = kwds['has_index_names']
        894 
    --> 895         self._make_engine(self.engine)
        896 
        897     def close(self):
    
    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _make_engine(self, engine)
       1130                                  ' "c", "python", or' ' "python-fwf")'.format(
       1131                                      engine=engine))
    -> 1132             self._engine = klass(self.f, **self.options)
       1133 
       1134     def _failover_to_python(self):
    
    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in __init__(self, f, **kwds)
       2236         self._col_indices = None
       2237         (self.columns, self.num_original_columns,
    -> 2238          self.unnamed_cols) = self._infer_columns()
       2239 
       2240         # Now self.columns has the set of columns that we will process.
    
    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _infer_columns(self)
       2609                 columns = [names]
       2610             else:
    -> 2611                 columns = self._handle_usecols(columns, columns[0])
       2612         else:
       2613             try:
    
    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _handle_usecols(self, columns, usecols_key)
       2669                             col_indices.append(usecols_key.index(col))
       2670                         except ValueError:
    -> 2671                             _validate_usecols_names(self.usecols, usecols_key)
       2672                     else:
       2673                         col_indices.append(col)
    
    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _validate_usecols_names(usecols, names)
       1235         raise ValueError(
       1236             "Usecols do not match columns, "
    -> 1237             "columns expected but not found: {missing}".format(missing=missing)
       1238         )
       1239 
    
    ValueError: Usecols do not match columns, columns expected but not found: ['col with unicode \xc3\xa0']
    
    • EdChum
      EdChum about 5 years
      does it work if you pass the columns with indexing: cols = df.columns and then df = pd.read_excel(file, usecols = cols[[0, 1, 3]])?
    • Giacomo Sachs
      Giacomo Sachs about 5 years
      @EdChum no, it gives the same error
    • EdChum
      EdChum about 5 years
      This may be a bug in the Excel module. If it becomes a persistent issue, you could just drop the columns after loading, or rename the columns in Excel before loading.
  • Giacomo Sachs
    Giacomo Sachs about 5 years
    I tried using this instead of what EdChum suggested, but it gives UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 17: ordinal not in range(128) (and I get the reason for this error). I tried also map(lambda x: str(x.encode('utf-8')), df.columns), but it still gives the first error. I think the problem is that I can't get where the usecols comes into play when using read_excel. I looked into the pandas code, but I lose the trail of usecols.
  • user69659
    user69659 about 5 years
    possibly this can help df.columns = map(lambda x: x.encode('utf-8').decode('utf-8'), df.columns)
  • Giacomo Sachs
    Giacomo Sachs about 5 years
    it doesn't work. It gives back the error UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 17: ordinal not in range(128)
  • user69659
    user69659 about 5 years
    @Giacomo Sachs I updated the solution; please try it, that should work.
  • Giacomo Sachs
    Giacomo Sachs over 4 years
    yes, that's exactly what I was trying to do, but I get the error you can find in the question, because of a column name with an accent character.
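
The `\xc3\xa0` in the error message is the UTF-8 byte encoding of `à`, while the header index shows the same name as text (`u'...'`, with `à` as `\xe0`). One plausible reading, not confirmed anywhere in the thread, is that the requested name reached pandas as a byte string and so could never equal the text header. The sketch below illustrates that kind of mismatch in Python 3 with the standard library only. In the Python 2 setup shown in the traceback, the analogous step would be to write the name as a `u'...'` literal; that is an assumption and is untested here.

```python
header_name = "col with unicode \u00e0"      # text, as in the header index
requested = header_name.encode("utf-8")      # bytes, as shown in the error

print(requested)                 # b'col with unicode \xc3\xa0'
print(requested == header_name)  # False: bytes never equal text in Python 3
print(requested.decode("utf-8") == header_name)  # True after decoding
```

The accented character sits at index 17 of the name, which agrees with the "position 17" reported in the UnicodeEncodeError quoted in the comments.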