How to execute both parallel and serial transformations with sklearn pipeline?

Question

I'd like to execute some preprocessing like this diagram using sklearn's pipelines.

I can do this without any problems if I leave off the standardization step. But I cannot understand how to indicate that the output from the imputation step should flow to the standardization step.

Here is the current code without the standardization step:

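from sklearn.compose import ColumnTransformer
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# NumericImputation and dq are my own objects (custom transformer and column lists).
preprocessor = ColumnTransformer(
    transformers=[
        ("numeric_imputation", NumericImputation(), dq.numeric_variables),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), dq.categorical_variables),
    ],
    remainder="passthrough",
)

bp2 = make_pipeline(preprocessor, ElasticNet())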

amiola · Accepted Answer · 2022-01-17T18:22:33.600

ColumnTransformer applies its transformers in parallel: each one receives its columns from the original dataset you pass to the ColumnTransformer, and the outputs are concatenated side by side. So if you add a standardizing transformer as a second entry in the transformers list, it will not be applied to the output of the imputation but to the original data.
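To make the parallel behaviour concrete, here is a minimal sketch. It assumes scikit-learn's built-in SimpleImputer and StandardScaler as stand-ins for the question's NumericImputation and standardizer, and the toy DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): one missing value per column.
X = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [10.0, 20.0, np.nan]})

# Both transformers are listed for the same columns, side by side.
parallel = ColumnTransformer([
    ("impute", SimpleImputer(strategy="mean"), ["a", "b"]),
    ("scale", StandardScaler(), ["a", "b"]),
])
out = parallel.fit_transform(X)

# Two imputed columns followed by two scaled columns: the scaler
# never saw the imputed values, so its output still contains the NaNs.
print(out.shape)                   # (3, 4)
print(np.isnan(out[:, :2]).any())  # False
print(np.isnan(out[:, 2:]).sum())  # 2
```

The result has four columns rather than two, which shows that the two entries produced independent outputs instead of a chain.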

One way to solve this is to wrap the transformations on the numeric columns in a Pipeline, so that they run in sequence on those columns:

from sklearn.pipeline import Pipeline

preprocessor = ColumnTransformer(
    transformers=[
        # Impute first, then standardize the imputed numeric columns.
        ("num_pipe", Pipeline([("numeric_imputation", NumericImputation()),
                               ("standardizer", YourStandardizer())]), dq.numeric_variables),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), dq.categorical_variables),
    ],
    remainder="passthrough",
)
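For a version you can run end to end, here is a sketch under some assumptions: SimpleImputer and StandardScaler stand in for NumericImputation and YourStandardizer, plain lists replace dq.numeric_variables and dq.categorical_variables, and the data is a made-up toy frame.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with missing numeric values.
X = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "income": [50.0, 60.0, np.nan, 80.0],
    "city": ["A", "B", "A", "C"],
})
y = np.array([1.0, 2.0, 3.0, 4.0])

numeric_variables = ["age", "income"]   # stand-in for dq.numeric_variables
categorical_variables = ["city"]        # stand-in for dq.categorical_variables

# Serial part: imputation feeds into standardization.
num_pipe = Pipeline([
    ("numeric_imputation", SimpleImputer(strategy="median")),
    ("standardizer", StandardScaler()),
])

# Parallel part: numeric pipeline and one-hot encoding side by side.
preprocessor = ColumnTransformer(
    transformers=[
        ("num_pipe", num_pipe, numeric_variables),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_variables),
    ],
    remainder="passthrough",
)

Xt = preprocessor.fit_transform(X)
print(Xt.shape)  # (4, 5): 2 numeric columns + 3 one-hot columns

model = make_pipeline(preprocessor, ElasticNet())
model.fit(X, y)
preds = model.predict(X)
```

Because the standardizer sits after the imputer inside num_pipe, it is fitted on the imputed values, so the numeric output columns contain no NaNs and have zero mean.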

I would suggest the following posts on a similar topic:

(you'll find some others linked within them).

Is there a way to use different sets of columns for the numeric imputation and YourStandardizer in the code above? I want to use a pipeline and a column transformer, but I need to apply two different transformers to a few of the same columns. Some of the columns are shared between the transformations, while others differ. https://stackoverflow.com/questions/73480526/apply-transformation-a-for-a-subset-of-numerical-columns-and-apply-transformatio - tjt, Aug 25 '22 at 00:02
