Skip specific set of columns when reading excel frame - pandas


Solution 1

You can use the following technique. Suppose the columns we want to skip are at positions 2, 5 and 8 (zero-based) in a sheet with 10 columns. First build the list of remaining columns, the ones we DO want to keep, as cols:

In [7]: cols2skip = [2,5,8]  
In [8]: cols = [i for i in range(10) if i not in cols2skip]

In [9]: cols
Out[9]: [0, 1, 3, 4, 6, 7, 9]

Then pass those remaining columns (the ones we DO want to keep) to usecols:

df = pd.read_excel(filename, usecols=cols)
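
The hard-coded range(10) above assumes the sheet has exactly 10 columns. A minimal sketch of the same idea with the column count as a parameter could look like this; the helper name columns_to_keep is hypothetical and not part of pandas.

```python
def columns_to_keep(n_columns, cols_to_skip):
    """Return the zero-based column positions that are NOT in cols_to_skip."""
    skip = set(cols_to_skip)  # a set gives fast membership tests
    return [i for i in range(n_columns) if i not in skip]


cols = columns_to_keep(10, [2, 5, 8])
print(cols)

# Assumed usage (filename is a placeholder for your workbook path):
# df = pd.read_excel(filename, usecols=cols)
```

If the column count is not known in advance, one option is to read only the header row first, for example with pd.read_excel(filename, nrows=0).shape[1], and pass that value as n_columns.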

Solution 2

If your version of pandas allows it (check first whether you can pass a function to usecols), I would try something like:

import pandas as pd
df = pd.read_excel('large_excel_file.xlsx', usecols=lambda x: 'Unnamed' not in x)

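This should skip all columns without header names, since pandas labels those 'Unnamed: N'. To skip specific named columns instead, test membership in a list of the names you do not want, e.g. lambda x: x not in ['col_a', 'col_b'].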
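
As a sketch of the list-based variant, the filter can be built once from the names to skip and then passed to usecols. The helper make_skip_filter and the column names used here are hypothetical, for illustration only.

```python
def make_skip_filter(names_to_skip):
    """Build a callable for usecols that keeps every column not listed."""
    skip = set(names_to_skip)

    def keep(column_name):
        # str() guards against non-string headers such as numbers
        name = str(column_name)
        return name not in skip and "Unnamed" not in name

    return keep


keep = make_skip_filter(["col_a", "col_b"])
print([c for c in ["col_a", "col_c", "Unnamed: 3", "col_b"] if keep(c)])

# Assumed usage, given a pandas version whose usecols accepts a callable:
# df = pd.read_excel('large_excel_file.xlsx', usecols=keep)
```

A named function is used here instead of a lambda only so the skip list is easy to reuse and test; the behaviour passed to usecols is the same kind of per-column predicate.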


Author: Juan David

Updated on July 24, 2022

Comments

  • Juan David
    Juan David almost 2 years

    I know beforehand which columns I don't need from an Excel file, and I'd like to skip them when reading the file to improve performance. Something like this:

    import pandas as pd
    df = pd.read_excel('large_excel_file.xlsx', skip_cols=['col_a', 'col_b',...,'col_zz'])
    

    There is nothing related to this in the documentation. Is there any workaround for this?

    • Aran-Fey
      Aran-Fey about 6 years
      Can you use the usecols parameter instead?
  • Naypa
    Naypa almost 4 years
    Note that usecols also accepts column letters as a string: usecols = "A,C:AA"
  • Will Croxford
    Will Croxford about 3 years
    I think this is more 'Pythonic' than @MarMat's answer, as it uses a readable list comprehension in 2 lines, while the other uses a lambda. My understanding is that you should avoid lambda in Python whenever a list comprehension will do, and lambda is rarely much faster. If you want someone else to understand your code more quickly, this will be easier imho. If you are processing Excel and find that one of the columns is a binary image string (I get that surprisingly often), this is quite useful!
