Random Forest hyperparameter tuning scikit-learn using GridSearchCV
The coarse-to-fine approach is commonly used to find good parameters: you start with a wide range of values and refine it as you close in on the best results.
I found an awesome library that does hyperparameter optimization for scikit-learn, hyperopt-sklearn. It can auto-tune your RandomForest or any other standard classifier. You can even auto-tune and benchmark different classifiers at the same time.
I suggest you start with that because it implements different schemes to get the best parameters:
Random Search
Tree of Parzen Estimators (TPE)
Annealing
Tree
Gaussian Process Tree
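If you want to stay within scikit-learn itself, the Random Search scheme from the list above is available directly as RandomizedSearchCV. A minimal sketch for a RandomForestRegressor (the synthetic dataset and the particular ranges are illustrative assumptions, not recommendations for your data):

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data standing in for your own regression problem
X, y = make_regression(n_samples=200, n_features=10, noise=0.5, random_state=0)

# Distributions instead of fixed grids: random search samples from them,
# so you don't have to guess exact grid points up front
param_distributions = {
    "n_estimators": randint(100, 400),
    "max_depth": randint(5, 25),
    "max_features": randint(2, 8),
    "min_samples_split": randint(2, 6),
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=30),
    param_distributions,
    n_iter=10,      # number of random parameter combinations to try
    cv=3,
    random_state=0,
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_)
```

Because the ranges are distributions rather than grids, widening a range does not multiply the runtime the way it does with GridSearchCV; only n_iter controls the budget.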
EDIT:
In the case of regression, you still need to check whether your predictions are good on a held-out test set.
Anyway, the coarse-to-fine approach still holds and is valid for any estimator.
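A two-stage sketch of that coarse-to-fine idea with GridSearchCV, including the held-out test set mentioned above (the synthetic data and the specific ranges are illustrative assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=300, n_features=10, noise=0.5, random_state=0)
# Hold out a test set so the final score is measured on unseen data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stage 1: coarse grid over a wide range
coarse = GridSearchCV(
    RandomForestRegressor(random_state=30),
    {"n_estimators": [50, 100, 200], "max_depth": [5, 15, 30]},
    cv=3, n_jobs=-1,
)
coarse.fit(X_train, y_train)
best_n = coarse.best_params_["n_estimators"]
best_d = coarse.best_params_["max_depth"]

# Stage 2: finer grid zoomed in around the coarse optimum
fine = GridSearchCV(
    RandomForestRegressor(random_state=30),
    {
        "n_estimators": [max(10, best_n - 25), best_n, best_n + 25],
        "max_depth": [max(2, best_d - 5), best_d, best_d + 5],
    },
    cv=3, n_jobs=-1,
)
fine.fit(X_train, y_train)

print(fine.best_params_)
print("held-out R^2:", fine.score(X_test, y_test))
```

The caveat from the question applies: zooming in around one coarse optimum can miss a good combination elsewhere in the space, which is one reason randomized or model-based search (TPE, annealing) is often preferred for many parameters.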
Muhammad
Updated on July 26, 2022

Comments
-
Muhammad over 1 year
I am trying to use Random Forest for my problem (below is sample code for the Boston dataset, not for my data). I am planning to use
GridSearchCV
for hyperparameter tuning, but what should the range of values be for the different parameters? How will I know that the range I am selecting is the correct one? I was reading about it on the internet, and someone suggested trying to "zoom in" on the optimum in a second grid search (e.g. if it was 10, then try [5, 20, 50]).
Is this the right approach? Shall I use this approach for ALL the parameters required for random forest? This approach may miss a "good" combination, right?
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor

dataset = load_boston()
X, y = dataset.data, dataset.target
model = RandomForestRegressor(random_state=30)
# Note: "gini"/"entropy" are classification criteria; a regressor needs
# regression criteria such as "squared_error"/"absolute_error"
# ("mse"/"mae" in older scikit-learn versions)
param_grid = {
    "n_estimators": [250, 300],
    "criterion": ["squared_error", "absolute_error"],
    "max_features": [3, 5],
    "max_depth": [10, 20],
    "min_samples_split": [2, 4],
    "bootstrap": [True, False],
}
grid_search = GridSearchCV(model, param_grid, n_jobs=-1, cv=2)
grid_search.fit(X, y)
print(grid_search.best_params_)
-
Muhammad about 8 years
This does not support regression and many algorithms, does it? Actually, my problem is regression, not classification. I have edited my question.
-
Muhammad about 8 years
I have edited my question, sorry for the confusion.