Skip specific set of columns when reading excel frame - pandas
Solution 1
You can use the following technique. Suppose the columns we want to skip are at positions 2, 5 and 8 (0-based) in a sheet with 10 columns. First build the list of all remaining columns, the ones we DO WANT TO KEEP, as cols:
In [7]: cols2skip = [2,5,8]
In [8]: cols = [i for i in range(10) if i not in cols2skip]
In [9]: cols
Out[9]: [0, 1, 3, 4, 6, 7, 9]
Then pass those remaining columns (the ones we DO WANT TO KEEP) to usecols:
df = pd.read_excel(filename, usecols=cols)
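The hard-coded range(10) above only works when the sheet has exactly ten columns. A minimal sketch of the same idea wrapped in a helper follows; columns_to_keep and n_columns are hypothetical names chosen for illustration, not part of pandas, and the column count is assumed to be known in advance.

```python
def columns_to_keep(n_columns, cols_to_skip):
    """Return the 0-based column positions that are not in cols_to_skip."""
    skip = set(cols_to_skip)  # set gives fast membership tests
    return [i for i in range(n_columns) if i not in skip]


cols = columns_to_keep(10, [2, 5, 8])
print(cols)  # [0, 1, 3, 4, 6, 7, 9]

# Assumed usage, with pandas imported as pd and `filename` set to your workbook:
# df = pd.read_excel(filename, usecols=cols)
```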
Solution 2
If your version of pandas allows it (check first whether you can pass a function to usecols), I would try something like:
import pandas as pd
df = pd.read_excel('large_excel_file.xlsx', usecols=lambda x: 'Unnamed' not in x)
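This should skip all columns without header names, since pandas labels those 'Unnamed: 0', 'Unnamed: 1' and so on. To skip specific columns instead, test each name against a list of the column names you do not want, e.g. usecols=lambda x: x not in ['col_a', 'col_b'].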
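As a rough sketch of the "list of column names you do not want" variant: when usecols is a callable, pandas calls it with each column name and keeps the column if it returns True. The helper name skip_columns is hypothetical, and col_a / col_b are the placeholder names from the question.

```python
def skip_columns(names_to_skip):
    """Build a usecols predicate that rejects the given header names."""
    skip = set(names_to_skip)
    # pandas keeps a column when the predicate returns True
    return lambda name: name not in skip


keep = skip_columns(["col_a", "col_b"])
print(keep("col_a"))  # False, so this column would be skipped
print(keep("col_c"))  # True, so this column would be read

# Assumed usage, with pandas imported as pd:
# df = pd.read_excel("large_excel_file.xlsx", usecols=keep)
```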
Author by
Juan David
Updated on July 24, 2022
Comments
-
Juan David almost 2 years
I know beforehand what columns I don't need from an excel file and I'd like to avoid them when reading the file to improve the performance. Something like this:
import pandas as pd
df = pd.read_excel('large_excel_file.xlsx', skip_cols=['col_a', 'col_b',...,'col_zz'])
There is nothing related to this in the documentation. Is there any workaround for this?
-
Aran-Fey about 6 years
Can you use the usecols parameter instead?
-
Naypa almost 4 years
Note that usecols also accepts Excel column letters as a parameter: usecols = "A,C:AA"
-
Will Croxford about 3 years
I think this is more 'Pythonic' than @MarMat's answer, as this uses a readable list comprehension in 2 lines, and the other uses a lambda. My understanding is that you should avoid lambda in Python whenever a list comprehension will do, and lambda is rarely much faster. If you want someone else to understand your code more quickly, this will be easier, imho. If you are processing Excel and you find that one of the columns is a binary image string (I get that surprisingly often), this is quite useful!