
I am working on a binary classification problem using LSTM layers, where I classify each timestep as belonging to a class (0 or 1). My sequences have variable lengths, so I padded them and masked the extra steps at the end of each sequence with the value -1. My input has shape (300, 2000, 8), i.e. 300 samples, each with 2000 timesteps and 8 features. If a sample originally had 1500 timesteps, I therefore add 500 extra steps at the end, with the value -1 in each of the 8 features.

I added the padding to both my inputs (train_x) and my labels (train_y), so the shape of train_y is (300, 2000, 1). Both the input and label tensors are padded with -1 to signal the steps to ignore.

Now I have some doubts that have been causing me headaches for days. From what I understood from here, whenever Keras 'sees' a timestep where all the feature values are -1, it ignores it during processing. However, when I access the tensors in my custom loss function (y_predictions and y_labels), the timesteps that have a value of -1 in y_labels still have a prediction from the model (a value between 0 and 1), which is usually the same for all -1 timesteps. Am I doing something wrong?

  1. Should I pad and mask only the feature vectors, and keep the label vectors at their original size when passing them to the model?

  2. I think I end up 'ignoring' the -1 timesteps by doing the following at the start of the loss function, and then using only the indexes of the timesteps where y_true != -1 when computing and returning the loss value (see the sketch after the snippet). Does that make sense?

    pos_class = tf.where(y_true > 0)
    neg_class = tf.where(y_true == 0)
    ... rest of calculations ...
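To make the idea concrete, here is a made-up sketch of what I mean. example_loss is not my real CustomLossFun, just a plain binary cross-entropy built from those two index groups; note that splitting by `> 0` and `== 0` already drops the -1 padding, since -1 falls in neither group:

    import tensorflow as tf

    def example_loss(y_true, y_pred):
        # Indices of real positive and real negative timesteps.
        # Padding (-1) is neither > 0 nor == 0, so it is excluded here.
        pos_idx = tf.where(y_true > 0)
        neg_idx = tf.where(tf.equal(y_true, 0.0))
        pos_pred = tf.gather_nd(y_pred, pos_idx)
        neg_pred = tf.gather_nd(y_pred, neg_idx)
        # A plain binary cross-entropy assembled from the two groups.
        pos_loss = -tf.math.log(pos_pred + 1e-7)
        neg_loss = -tf.math.log(1.0 - neg_pred + 1e-7)
        return tf.reduce_mean(tf.concat([pos_loss, neg_loss], axis=0))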
    

The code for the model building part goes as follows:

from tensorflow.keras.layers import Input, Masking, LSTM, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# train_x -> (300, 2000, 8)
# train_y -> (300, 2000, 1)
# both already padded and masked with -1 for the extra steps

input_layer = Input(shape=(2000, 8))
mask_1 = Masking(mask_value=-1)(input_layer)
lstm_1 = LSTM(64, return_sequences=True)(mask_1)
dense_1 = Dense(1, activation="sigmoid")(lstm_1)
model = Model(inputs=input_layer, outputs=dense_1)
model.summary()
optimizer = Adam(lr=0.001)
# CustomLossFun, CustomMcc and CustomPrecision are my own loss/metric functions
model.compile(optimizer=optimizer, loss=CustomLossFun, metrics=[CustomMcc(), CustomPrecision()])
train_model = model.fit(x=train_x, y=train_y, ...)
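As an illustration (this snippet is not part of my original code), one can recompute the mask for a sample with a fresh Masking layer and see that the model still returns a prediction for every padded timestep:

    import numpy as np

    sample = train_x[:1]                                      # shape (1, 2000, 8)
    preds = model.predict(sample)                             # shape (1, 2000, 1)
    # compute_mask is False where all 8 features equal the mask value (-1)
    step_mask = Masking(mask_value=-1).compute_mask(sample)   # shape (1, 2000)
    padded_steps = ~step_mask.numpy()[0]
    print(padded_steps.sum())            # number of padded timesteps in this sample
    print(preds[0, padded_steps, 0][:5]) # filler predictions still exist for them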

1 Answer


I got to understand the answer from this question.

I will still obtain a tensor the size of the entire sequence in the loss function. However, the masked values are only there as fillers and were not actually computed (which is probably why they show the same repeated value). In my loss function I still need to ignore them, though.
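For example, one way to ignore them is to weight the padded timesteps out of the loss. This is a minimal sketch, with masked_loss as a made-up name and plain binary cross-entropy standing in for my actual loss:

    import tensorflow as tf

    def masked_loss(y_true, y_pred):
        # 1.0 for real timesteps, 0.0 for the -1 padding fillers
        weights = tf.cast(tf.not_equal(y_true, -1.0), tf.float32)
        # replace the -1 labels with 0 so the cross-entropy stays well defined,
        # then zero those terms out via the weights
        y_true_clean = tf.where(tf.equal(weights, 0.0), tf.zeros_like(y_true), y_true)
        bce = tf.keras.backend.binary_crossentropy(y_true_clean, y_pred)
        return tf.reduce_sum(bce * weights) / tf.maximum(tf.reduce_sum(weights), 1.0)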
