Insert or delete a step in scikit-learn Pipeline

11,954

Solution 1

I see that everyone mentioned only the delete step. In case you want to also insert a step in the pipeline:

pipe.steps.append(('step name', transformer()))  # each step is a (name, estimator) tuple

pipe.steps is an ordinary Python list, so you can also insert an item at a specific position:

pipe.steps.insert(1, ('estimator', transformer()))  # insert as the second step
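As a minimal, runnable sketch of the two calls above: the step names and the choice of StandardScaler, PCA and SVC are assumptions made for illustration, not part of the original answer.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Start with a single transformer step.
pipe = Pipeline([("reduce_dim", PCA())])

# Append a final estimator; each step is a (name, estimator) tuple.
pipe.steps.append(("svm", SVC()))

# Insert a scaler at index 0 so that it runs before PCA.
pipe.steps.insert(0, ("scale", StandardScaler()))

step_names = [name for name, _ in pipe.steps]
print(step_names)  # ['scale', 'reduce_dim', 'svm']
```

The pipeline has to be fitted (again) after changes like these before it is used for prediction.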

Solution 2

Based on rudimentary testing you can safely remove a step from a scikit-learn pipeline just like you would any list item, with a simple

clf_pipeline.steps.pop(n)

where n is the position of the individual estimator you are trying to remove.
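A small sketch of that call, under the assumption that you know the step's name rather than its position; the pipeline contents and names here are illustrative only.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

clf_pipeline = Pipeline([("reduce_dim", PCA()), ("svm", SVC())])

# Look up the position of the step by its name.
names = [name for name, _ in clf_pipeline.steps]
n = names.index("reduce_dim")

# pop returns the removed (name, estimator) tuple, like any list.
removed_name, removed_estimator = clf_pipeline.steps.pop(n)

remaining = [name for name, _ in clf_pipeline.steps]
print(remaining)  # ['svm']
```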

Solution 3

Just chiming in because I feel like the other answers answered the question of adding steps to a pipeline really well, but didn't really cover how to delete a step from a pipeline.

Watch out with my approach though. Slicing lists in this instance is a bit weird.

from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.preprocessing import PolynomialFeatures

estimators = [('reduce_dim', PCA()), ('poly', PolynomialFeatures()), ('svm', SVC())]
clf = Pipeline(estimators)

If you want a pipeline with just the PCA and PolynomialFeatures steps, slice the list of steps by index and pass the result to Pipeline:

clf1 = Pipeline(clf.steps[0:2])

Want just steps 2 and 3? Keep in mind that list indices start at zero and the end of a slice is exclusive, so the slice you need is [1:3]:

clf2 = Pipeline(clf.steps[1:3])

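Want just steps 1 and 3? A slice cannot skip the middle step, and adding the two entries with + does not work either: each entry is a (name, estimator) tuple, so + concatenates them into a single four-element tuple. Put the two entries in a new list instead:

clf3 = Pipeline([clf.steps[0], clf.steps[2]])  # keeps 'reduce_dim' and 'svm'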
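As a side note, and assuming a reasonably recent scikit-learn (0.21 or later, as far as I know), a Pipeline can also be indexed and sliced directly, which avoids going through clf.steps for contiguous ranges. This sketch reuses the three-step pipeline from above; the sub-pipelines share the same estimator objects as the original rather than copying them.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVC

clf = Pipeline([("reduce_dim", PCA()), ("poly", PolynomialFeatures()), ("svm", SVC())])

first_two = clf[:2]   # new Pipeline with 'reduce_dim' and 'poly'
last_two = clf[1:]    # new Pipeline with 'poly' and 'svm'
svm_step = clf["svm"]  # a single estimator, looked up by name

# Non-adjacent steps still need an explicit list of (name, estimator) tuples.
first_and_last = Pipeline([clf.steps[0], clf.steps[2]])

first_two_names = [name for name, _ in first_two.steps]
first_and_last_names = [name for name, _ in first_and_last.steps]
```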

Solution 4

Yes, that's possible, but you must meet the same requirements that Pipeline imposes on its steps: a predictor can only be the last step, the last step must implement fit, and every earlier step must implement fit (or fit_transform) and transform. You should also call fit again after you update Pipeline.steps, because after such an update whatever the steps learned in previous fit calls can no longer be relied on.

So yes, it will work in the current codebase, but I don't think it's a good solution for your task, because it makes your code more dependent on the current implementation of Pipeline. I think it's more convenient to create a new Pipeline with the modified steps. A new Pipeline will at least validate all your steps (at initialization in older releases, when fit is called in more recent ones), and creating it is not significantly slower than modifying the steps of an existing pipeline. As I said, creating a new Pipeline after each modification of the steps is also safer if someone significantly changes the implementation of Pipeline.
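A rough sketch of the "build a new Pipeline" approach follows. The helper name without_step is hypothetical, not a scikit-learn function. For the grid-search use case from the question, the second half shows a different technique: keeping one pipeline and letting GridSearchCV switch the step between an estimator and the string 'passthrough', which recent scikit-learn versions accept as a placeholder for a skipped step. The toy data from make_classification is only there to make the example runnable.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def without_step(pipeline, name):
    """Return a new, unfitted Pipeline that omits the step called `name`.

    Hypothetical helper: estimators are cloned so the original pipeline
    is left untouched; 'passthrough'/None placeholders are kept as-is.
    """
    kept = [
        (n, est if est is None or isinstance(est, str) else clone(est))
        for n, est in pipeline.steps
        if n != name
    ]
    return Pipeline(kept)


clf = Pipeline([("reduce_dim", PCA(n_components=2)), ("svm", SVC())])
clf_no_pca = without_step(clf, "reduce_dim")

# Alternative for grid search: one pipeline, with the step toggled on and off.
X, y = make_classification(n_samples=60, n_features=5, random_state=0)
param_grid = {"reduce_dim": ["passthrough", PCA(n_components=2)]}
search = GridSearchCV(clf, param_grid, cv=3).fit(X, y)

print(search.best_params_)  # which variant scored better on this toy data
```

Both variants leave clf.steps itself unmodified, so nothing depends on how Pipeline stores its steps internally.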

Author by

Bin

Updated on June 02, 2022

Comments

  • Bin
    Bin almost 2 years

    Is it possible to delete or insert a step in a sklearn.pipeline.Pipeline object?

    I am trying to do a grid search with and without one step in the Pipeline object, and I am wondering whether I can insert or delete a step in the pipeline. I saw in the Pipeline source code that there is a self.steps object holding all the steps, and that we can get the steps through named_steps. Before modifying it, I want to make sure I do not cause unexpected effects.

    Here is some example code:

    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC
    from sklearn.decomposition import PCA
    estimators = [('reduce_dim', PCA()), ('svm', SVC())]
    clf = Pipeline(estimators)
    clf 
    

    Is it possible to do something like steps = clf.named_steps, then insert or delete steps in it? Would this cause undesired effects on the clf object?