How to fix "IndexError: tuple index out of range" in python?
Solution 1
The root cause of your issue is that, while you ask GridSearchCV to evaluate 6 models, you provide parameters only for the first 2 of them:
models = [SVR(), RandomForestRegressor(), LinearRegression(), Ridge(), Lasso(), XGBRegressor()]
params = [{'C': [0.01, 1]}, {'n_estimators': [10, 20]}]
The result of enumerate(zip(models, params)) in this setting, i.e.:
for i, (model, param) in enumerate(zip(models, params)):
    print((model, param))
is
(SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='auto',
kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False), {'C': [0.01, 1]})
(RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
oob_score=False, random_state=None, verbose=0, warm_start=False), {'n_estimators': [10, 20]})
i.e. the last 4 models are simply ignored, so you get empty entries for them in cv:
print(cv)
# result:
[[5950.6018771284835, 5987.293514740653, 6055.368320208183, 6099.316091619069, 6146.478702335218], [3625.3243553665975, 3301.3552182952058, 3404.3321983193728, 3521.5160621260898, 3561.254684271113], [], [], [], []]
which causes the downstream error when trying to compute np.mean(cv, 1).
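As hpaulj notes in the comments, the error can be reproduced in isolation; a minimal sketch (the exact exception type depends on the NumPy version -- older releases raised "IndexError: tuple index out of range", newer ones raise np.AxisError, which subclasses IndexError; a ragged list like the cv above may instead fail already at array creation on recent NumPy):

```python
import numpy as np

# A 1-D (here: empty) array has no axis 1, so asking for the mean along
# axis 1 fails, just like np.mean(cv, 1) does when cv contains [] entries.
try:
    np.mean([], 1)
except IndexError as exc:  # np.AxisError is a subclass of IndexError
    print(type(exc).__name__, "-", exc)
```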
The solution, as already correctly pointed out by Psi in their answer, is to use empty dictionaries for the models on which you don't actually perform any CV search; omitting XGBRegressor (which I have not installed), here are the results:
models = [SVR(), RandomForestRegressor(), LinearRegression(), Ridge(), Lasso()]
params2 = [{'C': [0.01, 1]}, {'n_estimators': [10, 20]}, {}, {}, {}]

cv = [[] for _ in range(len(models))]
fold = KFold(5, shuffle=False)
for tr, ts in fold.split(X):
    for i, (model, param) in enumerate(zip(models, params2)):
        best_m = GridSearchCV(model, param)
        best_m.fit(X[tr], y[tr])
        s = mean_squared_error(y[ts], best_m.predict(X[ts]))
        cv[i].append(s)
where print(cv) gives:
[[4048.660483326826, 3973.984055352062, 3847.7215568088545, 3907.0566348092684, 3820.0517432992765], [1037.9378737329769, 1025.237441119364, 1016.549294695313, 993.7083268195154, 963.8115632611381], [2.2948917095935095e-26, 1.971022007799432e-26, 4.1583774042712844e-26, 2.0229469068846665e-25, 1.9295075684919642e-26], [0.0003350178681602639, 0.0003297411022124562, 0.00030834076832371557, 0.0003355298330301431, 0.00032049282437794516], [10.372789356303688, 10.137748082073076, 10.136028304131141, 10.499159069700834, 9.80779910439471]]
and print(np.mean(cv, 1)) works OK, giving:
[3.91949489e+03 1.00744890e+03 6.11665355e-26 3.25824479e-04
1.01907048e+01]
So, in your case, you should indeed change params to:
params = [{'C': [0.01, 1]}, {'n_estimators': [10, 20]}, {}, {}, {}, {}]
as already suggested by Psi.
Solution 2
When you define
cv = [[] for _ in range(len(models))]
it contains an empty list for each model.
In the loop, however, you iterate over enumerate(zip(models, params)), which has only two elements, since your params list has two elements (because list(zip(x, y)) has length equal to min(len(x), len(y))).
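A quick illustration of this truncation with throwaway lists:

```python
x = [1, 2, 3, 4, 5, 6]                              # stands in for the 6 models
y = [{"C": [0.01, 1]}, {"n_estimators": [10, 20]}]  # only 2 param dicts

pairs = list(zip(x, y))
print(len(pairs))  # 2 -- zip stops at the shorter input
```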
Hence, you get an IndexError because some of the lists in cv (all but the first two) are still empty when you calculate the mean with np.mean.
Solution:
If you don't need to use GridSearchCV on the remaining models, you may just extend the params list with empty dictionaries:
params = [{'C': [0.01, 1]}, {'n_estimators': [10, 20]}, {}, {}, {}, {}]
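Alternatively, instead of padding params by hand, itertools.zip_longest could fill in the missing entries automatically; a sketch with placeholder strings standing in for the model objects:

```python
from itertools import zip_longest

models = ["SVR", "RandomForest", "Linear", "Ridge", "Lasso", "XGB"]  # placeholders
params = [{"C": [0.01, 1]}, {"n_estimators": [10, 20]}]

# Models without a param dict get an empty grid, so GridSearchCV would
# simply fit them once with their default settings.  Note that the same
# {} object is reused as fillvalue, which is fine as long as nothing
# mutates it.
for model, param in zip_longest(models, params, fillvalue={}):
    print(model, param)
```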
Comments
-
Jerry07 almost 2 years
I am using sklearn modules to find the best fitting models and model parameters. However, I have an unexpected IndexError:

> IndexError                     Traceback (most recent call last)
> <ipython-input-38-ea3f99e30226> in <module>
>      22         s = mean_squared_error(y[ts], best_m.predict(X[ts]))
>      23         cv[i].append(s)
> ---> 24 print(np.mean(cv, 1))
> IndexError: tuple index out of range

What I want to do is find the best-fitting regressor and its parameters, but I got the above error. I looked into SO and tried this solution, but the same error still comes up. Any idea how to fix this bug? Can anyone point out why this error is happening? Any thoughts?

my code:

from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from xgboost.sklearn import XGBRegressor
from sklearn.datasets import make_regression

models = [SVR(), RandomForestRegressor(), LinearRegression(), Ridge(), Lasso(), XGBRegressor()]
params = [{'C': [0.01, 1]}, {'n_estimators': [10, 20]}]

X, y = make_regression(n_samples=10000, n_features=20)

with warnings.catch_warnings():
    warnings.filterwarnings("ignore")
    cv = [[] for _ in range(len(models))]
    fold = KFold(5, shuffle=False)
    for tr, ts in fold.split(X):
        for i, (model, param) in enumerate(zip(models, params)):
            best_m = GridSearchCV(model, param)
            best_m.fit(X[tr], y[tr])
            s = mean_squared_error(y[ts], best_m.predict(X[ts]))
            cv[i].append(s)
    print(np.mean(cv, 1))

desired output:
if there is a way to fix the above error, I am expecting to pick the best-fitted models with parameters, then use them for estimation. Any idea to improve the above attempt? Thanks
-
Jerry07 almost 5 years @desertnaut How do you think I can optimize this code? Any better idea?
-
desertnaut almost 5 yearsThat's a very general question, but doing a grid search in each one of 5 folds sounds like overkill. I kindly suggest you open another question asking for advice in this (be sure to make your code fully reproducible, including all relevant imports).
-
hpaulj almost 5 years The error can be reproduced with np.mean([], 1), which supports the idea that cv is [], or contains [] lists.
-
Jerry07 almost 5 years I don't think this is the answer to this question. Please read the SO community rules.
-
Psi almost 5 years @Dan Since you haven't posted a MWE, I can't verify with certainty that this is the solution, but it works with your code after importing the appropriate modules, and it matches the output you gave in the comments for cv (see the last edit for the specific change you would have to make to params).
-
desertnaut almost 5 yearsThis is the correct answer indeed (upvoted) - can't understand the downvotes; I proceed to explain in more detail...