From the ColumnTransformer (CT) documentation:
remainder : {'drop', 'passthrough'} or estimator, default='drop'
By default, only the specified columns in transformers are transformed and combined in the output, and the non-specified columns are dropped. (default of 'drop'). By specifying remainder='passthrough', all remaining columns that were not specified in transformers will be automatically passed through. This subset of columns is concatenated with the output of the transformers. By setting remainder to be an estimator, the remaining non-specified columns will use the remainder estimator. The estimator must support fit and transform. Note that using this feature requires that the DataFrame columns input at fit and transform have identical order.
I believe this remainder= setting is not relevant to the field being one-hot encoded. I would like to know how the OHE field (e.g. 'CatX') is handled.
When I run a standalone CT transform, 'CatX' does not appear in the output.
ct = ColumnTransformer(transformers=[('OHE',ohe,ohe_col)],remainder='passthrough')
When I run a standalone CT with the OHE step repeated (i.e. OHE applied twice to the same column), it succeeds. This suggests to me that within CT the field is still available and is only removed on exiting CT.
ct = ColumnTransformer(transformers=[('OHE',ohe,ohe_col),('OHE1',ohe,ohe_col)],
remainder='passthrough')
Then I tried this in a Pipeline, applying CT twice. This is the confusing part: it ran without error. This suggests to me that the first CT (ct1) passed 'CatX' to the second (ct2).
ct = ColumnTransformer(transformers=[('OHE',ohe,ohe_col)],remainder='passthrough')
Pipeline([('ct1',ct),('ct2',ct),('model',v)])
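A sketch for inspecting what each step outputs in this two-CT setup. Several things here are assumptions rather than my exact code: the toy DataFrame is hypothetical, ohe_col is taken to be the integer position list [0], the final model step is left out because v is not shown, and two separate CT instances are used so that each step keeps its own fitted state.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data with 'CatX' at position 0.
df = pd.DataFrame({"CatX": ["a", "b", "a"], "Num": [1.0, 2.0, 3.0]})

ohe_col = [0]  # assumption: the column is selected by integer position


def make_ct():
    """Build a fresh CT so the two pipeline steps do not share fitted state."""
    return ColumnTransformer(
        transformers=[("OHE", OneHotEncoder(), ohe_col)],
        remainder="passthrough",
    )


ct1, ct2 = make_ct(), make_ct()

# Output of the first CT on its own: encoded 'CatX' columns plus 'Num'.
step1 = ct1.fit_transform(df)

# Output after both CTs: ct2 encodes whatever sits at position 0 of step1.
pipe = Pipeline([("ct1", ct1), ("ct2", ct2)])
step2 = pipe.fit_transform(df)

print(step1.shape)
print(step2.shape)
```

Comparing the two shapes shows how many columns ct2 added on top of what ct1 produced.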
Questions:
- When using a Pipeline, what controls whether CT passes 'CatX' on exit?
- When using a Pipeline, if 'CatX' is being passed on, will the model be able to process it?
I hope my questions are clear. Thanks in advance for any answers.