I am following the machine learning book by Aurélien Géron and experimenting with the ColumnTransformer class. When I include SimpleImputer in it, an additional column is created. I understand that SimpleImputer is there to fill the missing values in the total_bedrooms column (column index 4 in the result), so I don't understand why it adds a new column (column index 10) to the result.
When I do not include the SimpleImputer in the ColumnTransformer, but instead create a separate SimpleImputer instance and call fit_transform on the output of the ColumnTransformer, I do not get the additional column. Please advise.
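from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# X: the housing features DataFrame
category_att = X.select_dtypes(include='object').columns
num_att = X.select_dtypes(include='number').columns

transformer = ColumnTransformer(
    [
        ('adder', AttributeAdder(), num_att),  # numeric columns + 2 new ratio columns
        ('imputer', SimpleImputer(strategy='median'), ['total_bedrooms']),  # median-impute total_bedrooms
        ('ohe', OneHotEncoder(), category_att),  # one-hot encode categorical columns
    ],
    remainder='passthrough'
)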
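A minimal sketch of the same pattern, assuming a toy DataFrame with hypothetical values and using the built-in 'passthrough' transformer in place of AttributeAdder. ColumnTransformer applies each transformer to its own column list independently and concatenates the outputs side by side, so a column listed under two transformers appears twice in the result.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Toy data (hypothetical): 'total_bedrooms' has one missing value.
df = pd.DataFrame({
    "total_rooms": [10.0, 20.0, 30.0],
    "total_bedrooms": [2.0, np.nan, 6.0],
})

ct = ColumnTransformer([
    # Stands in for AttributeAdder: outputs both numeric columns unchanged.
    ("raw", "passthrough", ["total_rooms", "total_bedrooms"]),
    # Works only on total_bedrooms; its output is appended as another column.
    ("imputer", SimpleImputer(strategy="median"), ["total_bedrooms"]),
])

out = ct.fit_transform(df)
print(out.shape)  # (3, 3): total_rooms, raw total_bedrooms, imputed total_bedrooms
print(out)
```

The second column still contains NaN (it comes from the first transformer), while the third is the imputed copy; in the question this would correspond to columns 4 and 10.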
Custom class for adding two new features (columns):
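import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class AttributeAdder(BaseEstimator, TransformerMixin):
    def __init__(self, add_bed_room=False):
        self.add_bed_room = add_bed_room  # not used in transform yet

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X, y=None):
        # t_room, t_household and t_population are column positions defined elsewhere
        room_per_household = X.iloc[:, t_room] / X.iloc[:, t_household]
        population_per_household = X.iloc[:, t_population] / X.iloc[:, t_household]
        # append the two ratios as new columns after all input columns
        return np.c_[X, room_per_household, population_per_household]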
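One common way to avoid the duplicate column, sketched here under assumptions (toy data, and a hypothetical RatioAdder that works on NumPy arrays, since SimpleImputer returns an array by default and the .iloc calls in AttributeAdder would fail after it): chain the imputer and the adder in a Pipeline and pass that Pipeline as a single branch of the ColumnTransformer. Each numeric column is then emitted only once, already imputed.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


class RatioAdder(BaseEstimator, TransformerMixin):
    """Hypothetical adder: appends X[:, num_ix] / X[:, den_ix] as one extra column."""

    def __init__(self, num_ix=0, den_ix=1):
        self.num_ix = num_ix
        self.den_ix = den_ix

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        X = np.asarray(X, dtype=float)  # imputer output is already an array
        ratio = X[:, self.num_ix] / X[:, self.den_ix]
        return np.c_[X, ratio]


# Toy data (hypothetical values)
df = pd.DataFrame({
    "total_rooms": [10.0, 20.0, 30.0],
    "total_bedrooms": [2.0, np.nan, 6.0],
    "households": [5.0, 4.0, 10.0],
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY"],
})
num_att = ["total_rooms", "total_bedrooms", "households"]
cat_att = ["ocean_proximity"]

# Impute first, then add the ratio, inside ONE branch of the ColumnTransformer.
num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("adder", RatioAdder(num_ix=0, den_ix=2)),  # rooms per household
])

transformer = ColumnTransformer(
    [
        ("num", num_pipeline, num_att),
        ("ohe", OneHotEncoder(), cat_att),
    ],
    sparse_threshold=0.0,  # force a dense array so the result is easy to inspect
)

out = transformer.fit_transform(df)
print(out.shape)  # (3, 6): 3 imputed numeric + 1 ratio + 2 one-hot columns
```

The design choice is ordering: because the imputer and the adder run sequentially within one Pipeline, the ratio is computed on imputed values and no column is produced twice.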