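    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

    # intended: fill missing values in column 0 with the column mean
    trf1=ColumnTransformer([("Infuse_val",SimpleImputer(strategy="mean"),[0])],remainder="passthrough")
    # intended: one-hot encode columns 1 and 4
    trf4=ColumnTransformer([("One_hot",OneHotEncoder(sparse=False,handle_unknown="ignore"),[1,4])],remainder="passthrough")
    # intended: ordinal-encode column 3
    trf2=ColumnTransformer([("Ord_encode",OrdinalEncoder(categories=["Strong","Mild"]),[3])],remainder="passthrough")
    # intended: standardize columns 0 and 2
    trf3=ColumnTransformer([("scale",StandardScaler(),[0,2])],remainder="passthrough")
    pipe = Pipeline([
        ('trf1',trf1),
        ('trf2',trf2),
        ('trf3',trf3),
        ('trf4',trf4),
    ])
    pipe.fit(x_train,y_train)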

Error

ValueError: Shape mismatch: if categories is an array, it has to be of shape (n_features,).

The table is:

[Image: data table]

I don't understand what the error in my code is.

1 Answer

The error isn't about the column transformers; it's about the OrdinalEncoder. categories needs to be a list of lists: one inner list per encoded column, giving the categories of that column. Since you have just one column, categories=[["Strong","Mild"]] should work.
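A minimal sketch of the difference, using a hypothetical single column of "Strong"/"Mild" values as a stand-in for column 3 of the asker's data (the array and variable names are illustrative only):

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for column 3 of the asker's data: one feature, four rows.
X_col = np.array([["Strong"], ["Mild"], ["Mild"], ["Strong"]], dtype=object)

# Correct: a list containing one inner list per encoded column.
good = OrdinalEncoder(categories=[["Strong", "Mild"]])
encoded = good.fit_transform(X_col)
print(encoded.ravel())  # [0. 1. 1. 0.]

# Flat list (as in the question): read as one entry per column, i.e. two columns,
# but the input has only one column, so fitting fails.
bad = OrdinalEncoder(categories=["Strong", "Mild"])
shape_error = None
try:
    bad.fit(X_col)
except ValueError as err:
    shape_error = str(err)
print(shape_error)  # the "Shape mismatch" error from the question
```

The length of the outer list must equal the number of columns passed to the encoder, which is why the flat two-element list fails on a single column.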

With just two categories, most downstream algorithms won't care which one is encoded as 0 and which as 1, so here you could simply use the default, categories='auto'.
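A short sketch of what the default does, on the same kind of hypothetical "Strong"/"Mild" column: with categories='auto' the encoder learns the categories from the data and sorts them, so "Mild" becomes 0 and "Strong" becomes 1, the reverse of the order in categories=[["Strong","Mild"]].

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical single ordinal column.
X_col = np.array([["Strong"], ["Mild"], ["Mild"], ["Strong"]], dtype=object)

# categories="auto" (the default): categories are learned from the data, sorted.
auto = OrdinalEncoder()
codes = auto.fit_transform(X_col)
print(auto.categories_[0].tolist())  # ['Mild', 'Strong']
print(codes.ravel())                 # [1. 0. 0. 1.]
```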

Finally, you'll have problems with your column transformers. They change the order (and names) of the columns, so by the end of the pipeline, the columns at positions 0 and 2 that trf3 scales might not be the two numeric columns. The column order is predictable (the transformers' outputs in order, followed by the passthrough columns), so you could keep track of it manually. But I would suggest a single column transformer with multiple pipelines instead.
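To see the reordering concretely, here is a sketch that chains the question's first three transformers (with the categories fix applied) on a small DataFrame. The column names and values are invented; only their positions mimic the layout implied by the question's indices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Hypothetical data: positions 0-4 mirror the indices used in the question.
X = pd.DataFrame({
    "num_a": [25.0, np.nan, 40.0, 33.0],            # 0: numeric, one missing value
    "cat_b": ["Male", "Female", "Female", "Male"],  # 1: nominal
    "num_c": [99.0, 101.5, 100.2, 98.6],            # 2: numeric
    "ord_d": ["Strong", "Mild", "Mild", "Strong"],  # 3: ordinal
    "cat_e": ["X", "Y", "X", "Z"],                  # 4: nominal
})

trf1 = ColumnTransformer([("impute", SimpleImputer(strategy="mean"), [0])],
                         remainder="passthrough")
trf2 = ColumnTransformer([("ord", OrdinalEncoder(categories=[["Strong", "Mild"]]), [3])],
                         remainder="passthrough")

step1 = trf1.fit_transform(X)      # order: [imputed 0, 1, 2, 3, 4]
step2 = trf2.fit_transform(step1)  # order: [encoded 3, imputed 0, 1, 2, 4]
print(step2[0])  # first row: the ordinal code comes first, "Male" is now at index 2

# trf3 still selects positions 0 and 2, which now hold the ordinal code and cat_b.
trf3 = ColumnTransformer([("scale", StandardScaler(), [0, 2])],
                         remainder="passthrough")
scale_failed = False
try:
    trf3.fit_transform(step2)
except ValueError as err:  # e.g. could not convert string to float: 'Male'
    scale_failed = True
    print(err)
```

This is the same failure reported in the first comment below: after trf2, position 2 holds the string column, so the scaler cannot convert it to floats.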

Ben Reiniger
  • ValueError: could not convert string to float: 'Male'. Even after setting categories='auto' as suggested above, it throws this error. – Akash Mukherjee Jul 24 '22 at 14:57
  • @AkashMukherjee That's possibly because of the third paragraph of my answer. – Ben Reiniger Jul 24 '22 at 16:04
  • Hey Ben, sorry to disturb you, but could you please elaborate on your third paragraph? I'm not able to understand it. I did this: trf=ColumnTransformer([("Infuse_val",SimpleImputer(strategy="mean"),[0]), ("One_hot",OneHotEncoder(sparse=False,handle_unknown="ignore"),[1,4]), ("scale",StandardScaler(),[0,1,2,3,4])]) What changes need to be made here? – Akash Mukherjee Jul 28 '22 at 14:49
  • Ah, see https://stackoverflow.com/q/65554163/10495893. The scaler sees the original columns 0-4, not the already-imputed/encoded columns 0, 1, 4. – Ben Reiniger Jul 28 '22 at 15:07
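A sketch of the "single column transformer with multiple pipelines" layout from the answer, applied to the commenter's attempt. Each original column index appears in exactly one transformer, and steps that must run in sequence on the same columns (impute, then scale) are chained in an inner Pipeline. The data and column names are hypothetical; sparse_threshold=0 is set only so the output is always a dense array regardless of the OneHotEncoder's sparse-output setting.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical data laid out like the question: 0 and 2 numeric, 3 ordinal, 1 and 4 nominal.
X = pd.DataFrame({
    "num_a": [25.0, np.nan, 40.0, 33.0],
    "cat_b": ["Male", "Female", "Female", "Male"],
    "num_c": [99.0, 101.5, 100.2, 98.6],
    "ord_d": ["Strong", "Mild", "Mild", "Strong"],
    "cat_e": ["X", "Y", "X", "Z"],
})

# Numeric columns: impute, then scale, as one sequential sub-pipeline.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])

# Every transformer below receives the ORIGINAL columns it names, in parallel.
preprocess = ColumnTransformer([
    ("num", numeric, [0, 2]),
    ("ord", OrdinalEncoder(categories=[["Strong", "Mild"]]), [3]),
    ("one_hot", OneHotEncoder(handle_unknown="ignore"), [1, 4]),
], sparse_threshold=0)

Xt = preprocess.fit_transform(X)
print(Xt.shape)  # (4, 8): 2 scaled + 1 ordinal + (2 + 3) one-hot columns
```

To train a model, preprocess can be used as the first step of a Pipeline placed ahead of an estimator, then fitted with x_train and y_train as in the question.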