I am working on a binary classification problem using LSTM layers, where I classify each timestep as belonging to one of two classes (0 or 1). My sequences have variable lengths, so I padded the extra steps at the end of each sequence with the value -1 and masked them. My input has the shape (300, 2000, 8), so all 300 samples have 2000 timesteps and 8 features. If a sample originally had 1500 timesteps, I add 500 extra steps at the end, with the value -1 in each of the 8 features.
I padded both the inputs (train_x) and the labels (train_y), so the shape of train_y is actually (300, 2000, 1). Both the input and label arrays are therefore padded with -1 to signal the steps to ignore.
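For concreteness, the padding described above could look roughly like the following NumPy sketch. The helper name `pad_sample` and the constants are hypothetical and only mirror the numbers in the question (2000 timesteps, 8 features, -1 as the padding value); the random data stands in for a real sample.

```python
import numpy as np

PAD_VALUE = -1.0   # padding marker used for both features and labels
MAX_LEN = 2000     # padded sequence length
N_FEATURES = 8

def pad_sample(x, y, max_len=MAX_LEN, pad_value=PAD_VALUE):
    """Pad one (timesteps, features) sample and its (timesteps, 1) labels at the end."""
    n_steps = x.shape[0]
    # Start from arrays filled with the padding value, then copy the real steps in
    x_pad = np.full((max_len, x.shape[1]), pad_value, dtype=np.float32)
    y_pad = np.full((max_len, 1), pad_value, dtype=np.float32)
    x_pad[:n_steps] = x
    y_pad[:n_steps] = y
    return x_pad, y_pad

# Placeholder sample with 1500 real timesteps (random values, for illustration only)
x = np.random.rand(1500, N_FEATURES).astype(np.float32)
y = np.random.randint(0, 2, size=(1500, 1)).astype(np.float32)
x_pad, y_pad = pad_sample(x, y)
print(x_pad.shape, y_pad.shape)
```

Stacking 300 such padded samples gives train_x with shape (300, 2000, 8) and train_y with shape (300, 2000, 1).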
Now, I have some doubts that have been giving me headaches for days. From what I understood from here, whenever Keras 'sees' a timestep where all the feature values are -1, it will ignore it during processing. However, when I access the tensors in my custom loss function, y_predictions and y_labels, the timesteps that have a value of -1 in y_labels also have a prediction from the model (e.g. a value between 0 and 1), which is usually the same for all -1 timesteps. Did I do something wrong?
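To check which timesteps the Masking layer actually flags, one option is to call its `compute_mask` method on a small toy batch. The sketch below assumes `tf.keras`; the array sizes are made up for illustration and are not the real data.

```python
import numpy as np
import tensorflow as tf

# Toy batch: 1 sample, 4 timesteps, 3 features (sizes chosen only for illustration)
x = np.array([[[0.5, 0.1, 0.2],      # real timestep
               [-1.0, 0.3, 0.4],     # one feature happens to be -1: still kept
               [-1.0, -1.0, -1.0],   # padding: every feature is -1
               [-1.0, -1.0, -1.0]]], dtype=np.float32)

masking = tf.keras.layers.Masking(mask_value=-1.0)
# Boolean mask of shape (batch, timesteps); False marks a timestep to skip
mask = np.array(masking.compute_mask(tf.constant(x)))
print(mask)
```

The mask only tells downstream layers which timesteps to skip. The output tensor keeps its full length of 2000 timesteps, so the loss function still receives a value at every position, including the padded ones.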
Should I pad and mask only the features vector and keep the vector of labels with their original size when passing it to the model?
I think I end up 'ignoring' the -1 timesteps by doing the following at the start of the loss function and then using only the indices of the timesteps where y_true != -1 when doing the calculations and returning the loss value. Does that make sense?
pos_class = tf.where(y_true > 0)
neg_class = tf.where(y_true == 0)
... rest of calculations ...
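As a point of comparison, here is a minimal sketch of a loss that skips the padded labels. It is not the CustomLossFun from the question: it swaps in plain binary cross-entropy and uses a 0/1 weight mask instead of index lookups with tf.where. The function name `masked_binary_crossentropy` and the constant `PAD_VALUE` are hypothetical.

```python
import tensorflow as tf

PAD_VALUE = -1.0  # assumed padding marker, matching the -1 used in train_y

def masked_binary_crossentropy(y_true, y_pred):
    """Binary cross-entropy averaged over the non-padded timesteps only."""
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.cast(y_pred, tf.float32)
    # 1.0 where the label is real, 0.0 where it is padding
    valid = tf.cast(tf.not_equal(y_true, PAD_VALUE), tf.float32)
    # Replace padded labels with 0 so the formula stays well defined;
    # those positions are multiplied by zero below anyway
    safe_true = tf.where(valid > 0, y_true, tf.zeros_like(y_true))
    eps = 1e-7
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    per_step = -(safe_true * tf.math.log(y_pred)
                 + (1.0 - safe_true) * tf.math.log(1.0 - y_pred))
    # Sum over valid steps and divide by their count (at least 1 to avoid 0/0)
    return tf.reduce_sum(per_step * valid) / tf.maximum(tf.reduce_sum(valid), 1.0)

# Toy check: changing the prediction at the padded step leaves the loss unchanged
y_true = tf.constant([[[1.0], [0.0], [-1.0]]])
loss_a = masked_binary_crossentropy(y_true, tf.constant([[[0.9], [0.2], [0.5]]]))
loss_b = masked_binary_crossentropy(y_true, tf.constant([[[0.9], [0.2], [0.99]]]))
print(float(loss_a), float(loss_b))
```

Multiplying by a mask keeps every tensor at its original shape, which may be easier to reason about than gathering indices, but either approach should exclude the -1 timesteps from the loss value.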
The code for the model building part goes as follows:
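# imports assumed here (tf.keras); adjust if standalone keras is used
from tensorflow.keras.layers import Input, Masking, LSTM, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# train_x -> (300,2000,8)
# train_y -> (300,2000,1)
# both already padded and masked with -1 for the extra steps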
input_layer = Input(shape=(2000, 8))
mask_1 = Masking(mask_value=-1)(input_layer)
lstm_1 = LSTM(64, return_sequences=True)(mask_1)
dense_1 = Dense(1, activation="sigmoid")(lstm_1)
model = Model(inputs=input_layer, outputs=dense_1)
model.summary()
optimizer = Adam(learning_rate=0.001)  # older Keras versions spell this argument `lr`
model.compile(optimizer=optimizer, loss=CustomLossFun, metrics=[CustomMcc(), CustomPrecision()])
train_model = model.fit(x = train_x, y = train_y, ...)