Predicting how long a scikit-learn classification will take to run

python machine-learning classification scikit-learn

21,021

Solution 1

Some specific classifiers and regressors directly report the remaining time or the progress of the algorithm (number of iterations, etc.). In most cases this can be turned on by passing verbose=2 (or any number > 1) to the constructor of the individual model. Note: this behavior is as of sklearn 0.14; earlier versions produce somewhat different verbose output (still useful, though).

The best examples are ensemble.RandomForestClassifier and ensemble.GradientBoostingClassifier, which print the number of trees built so far (GradientBoostingClassifier also shows an estimate of the remaining time).

from sklearn import ensemble
# X, y are your training features and labels
clf = ensemble.GradientBoostingClassifier(verbose=3)
clf.fit(X, y)
Out:
   Iter       Train Loss   Remaining Time
     1           0.0769            0.10s
     ...

clf = ensemble.RandomForestClassifier(verbose=3)
clf.fit(X, y)
Out:
  building tree 1 of 100
  ...

This progress information is quite useful for estimating the total run time.
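If you want a number before committing to a long run, you can apply the same idea by hand: time a forest with only a few trees and scale up to the tree count you actually want. This is a minimal sketch, not a scikit-learn feature. `estimate_forest_fit_time` is a hypothetical helper, the dataset is synthetic, and the estimate assumes each tree costs roughly the same on the same data (single-threaded), so fixed overhead and n_jobs parallelism will make it less accurate.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def estimate_forest_fit_time(X, y, n_estimators, n_probe=10, random_state=0):
    """Hypothetical helper: time a small forest, then extrapolate linearly.

    Assumes per-tree cost is roughly constant, so total time ~ n_estimators.
    """
    probe = RandomForestClassifier(n_estimators=n_probe, random_state=random_state)
    start = time.perf_counter()
    probe.fit(X, y)
    elapsed = time.perf_counter() - start
    # Scale the measured time from n_probe trees up to n_estimators trees
    return elapsed * n_estimators / n_probe


# Synthetic data, for illustration only
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
est = estimate_forest_fit_time(X, y, n_estimators=500)
print(f"Estimated fit time for 500 trees: {est:.2f} s")
```

The probe size is a trade-off: more probe trees give a steadier estimate but cost more time up front.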

Then there are other models like SVMs that print the number of optimization iterations completed, but do not directly report the remaining time.

from sklearn import svm
clf = svm.SVC(verbose=2)
clf.fit(X, y)
Out:
   *
    optimization finished, #iter = 1
    obj = -1.802585, rho = 0.000000
    nSV = 2, nBSV = 2
    ...

As far as I know, linear models don't provide such diagnostic information.
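For models that don't report remaining time (such as SVC, or the linear models above), one rough alternative is to time fits on growing subsamples and extrapolate. The sketch below assumes fit time follows a power law in the number of samples, t ≈ c·n^k, fits k on a log-log scale, and extrapolates to a larger n. scikit-learn's SVC documentation notes that its fit time scales at least quadratically with the number of samples, so the fitted exponent is a useful sanity check. `time_fit` and `extrapolate_fit_time` are hypothetical helpers, the data is synthetic, and small timings are noisy, so treat the result as an order-of-magnitude estimate.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC


def time_fit(model, X, y):
    """Wall-clock seconds for a single fit."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


def extrapolate_fit_time(make_model, X, y, sizes, n_target):
    """Hypothetical helper: fit log(t) = k*log(n) + b and extrapolate to n_target.

    Assumes run time follows a power law in the number of samples.
    """
    times = [time_fit(make_model(), X[:n], y[:n]) for n in sizes]
    k, b = np.polyfit(np.log(sizes), np.log(times), 1)
    return float(np.exp(b) * n_target ** k), float(k)


# Synthetic data (make_classification shuffles rows, so prefixes are mixed)
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
est, exponent = extrapolate_fit_time(SVC, X, y, sizes=[500, 1000, 2000, 4000], n_target=50000)
print(f"Fitted exponent ~ {exponent:.2f}; estimated time for 50,000 samples: {est:.1f} s")
```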

See the thread "scikit-learn fit remaining time" for more on what the different verbosity levels mean.

Solution 2

If you are using IPython, consider the built-in magic commands %time and %timeit:

%time - Time execution of a Python statement or expression. The CPU and wall clock times are printed, and the value of the expression (if any) is returned. Note that under Win32, system time is always reported as 0, since it cannot be measured.

%timeit - Time execution of a Python statement or expression using the timeit module.

Example:

In [4]: %timeit NMF(n_components=16, tol=1e-2).fit(X)
1 loops, best of 3: 1.7 s per loop
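Outside IPython (for example, in a plain script), the standard-library timeit module gives a similar best-of-N measurement. This is a minimal sketch; the random non-negative matrix X is made up for illustration, since NMF requires non-negative input.

```python
import timeit

import numpy as np
from sklearn.decomposition import NMF

# Illustrative data: NMF needs non-negative input
rng = np.random.RandomState(0)
X = np.abs(rng.randn(500, 100))

# Rough equivalent of `%timeit`: run one fit per trial, three trials, keep the best
times = timeit.repeat(
    lambda: NMF(n_components=16, tol=1e-2).fit(X),
    repeat=3,
    number=1,
)
print(f"best of {len(times)}: {min(times):.3f} s per fit")
```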

References:

https://ipython.readthedocs.io/en/stable/interactive/magics.html

http://scikit-learn.org/stable/developers/performance.html


Author: ntaggart

Updated on February 26, 2020

Comments

ntaggart

Is there a way to predict how long it will take to run a scikit-learn classifier based on the parameters and dataset? I know, pretty meta, right?

Some classifiers/parameter combinations are quite fast, and some take so long that I eventually just kill the process. I'd like a way to estimate in advance how long it will take.

Alternatively, I'd accept some pointers on how to set common parameters to reduce the run time.

ntaggart

Thank you, this is very helpful! I saw the verbose option but didn't realize it reported the remaining time.