Python scikit-learn n_jobs
Solution 1
- what is the point of using n_jobs (and joblib) if the library uses all cores anyway?
It does not. If you set n_jobs to -1, it will use all cores; if it is set to 1 or 2, it will use only one or two cores (tested with scikit-learn 0.20.3 under Linux).
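To make the effect of a worker-count limit concrete, here is a minimal sketch (plain standard-library Python, not scikit-learn itself) showing that a pool capped at two workers never runs more than two tasks concurrently, which is exactly what n_jobs=2 buys you:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

peak = 0    # highest number of tasks observed running at once
active = 0  # tasks currently running
lock = threading.Lock()

def task(_):
    global peak, active
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # pretend to do some work
    with lock:
        active -= 1

# Cap the pool at 2 workers, analogous to n_jobs=2.
with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(task, range(8)))

print(peak)  # never exceeds 2
```

With max_workers set to the CPU count (what n_jobs=-1 resolves to), the peak would instead rise toward one task per core.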
Solution 2
The documentation says:
This parameter is used to specify how many concurrent processes or threads should be used for routines that are parallelized with joblib.
n_jobs is an integer, specifying the maximum number of concurrently running workers. If 1 is given, no joblib parallelism is used at all, which is useful for debugging. If set to -1, all CPUs are used. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. For example with n_jobs=-2, all CPUs but one are used.
n_jobs is None by default, which means unset; it will generally be interpreted as n_jobs=1, unless the current joblib.Parallel backend context specifies otherwise.
For more details on the use of joblib and its interactions with scikit-learn, please refer to our parallelism notes.
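The documented rule above can be expressed as a small helper. Note that resolve_n_jobs is a hypothetical function written here for illustration, not part of scikit-learn or joblib:

```python
import os

def resolve_n_jobs(n_jobs, n_cpus=None):
    """Map an n_jobs setting to a worker count, per the documented rule."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs is None:
        return 1  # "unset" is generally interpreted as 1
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, etc.
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs

print(resolve_n_jobs(-1, n_cpus=8))    # → 8
print(resolve_n_jobs(-2, n_cpus=8))    # → 7
print(resolve_n_jobs(None, n_cpus=8))  # → 1
```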
Bruno Hanzen
ICT Infrastructure professional, exploring the analytics domain with Python and related tools
Updated on June 02, 2020
Comments
-
Bruno Hanzen almost 4 years
This is not a real issue, but I'd like to understand:
- running sklearn from the Anaconda distribution on a Win7 system with 4 cores and 8 GB of RAM
- fitting a KMeans model on a table of 200,000 samples × 200 values
- running with n_jobs=-1 (after adding the
if __name__ == '__main__':
line to my script): I see the script start 4 processes with 10 threads each. Each process uses about 25% of the CPU (total: 100%). Seems to work as expected.
- running with n_jobs=1: stays in a single process (no surprise), with 20 threads, and also uses 100% of the CPU.
My question: what is the point of using n_jobs (and joblib) if the library uses all cores anyway? Am I missing something? Is it a Windows-specific behaviour?
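The guard mentioned in the comment matters because on Windows the multiprocessing "spawn" start method re-imports the main module in every worker process, so any code that launches workers (as joblib's process backend does) must be protected from running again on import. A minimal sketch of the pattern, using only the standard library:

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == '__main__':
    # Without this guard, each spawned worker would re-execute the
    # pool-creation code on import, recursing into more workers.
    with mp.Pool(processes=2) as pool:
        results = pool.map(square, range(5))
    print(results)  # [0, 1, 4, 9, 16]
```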
-
Kai over 3 years
Can you please explain why?