I've trained several models in Keras. I have 39,592 samples in my training set and 9,899 in my validation set. I used a batch size of 2.
As I was examining my code, it occurred to me that my generators may have been missing some batches of data.
This is the code for my generator:
    from keras.preprocessing.image import ImageDataGenerator

    # train_dir, val_dir and batch_size are defined earlier in my script
    train_datagen = ImageDataGenerator(
        rescale=1. / 255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)
    val_datagen = ImageDataGenerator(rescale=1. / 255)
    train_generator = train_datagen.flow_from_directory(
        train_dir,
        target_size=(224, 224),
        batch_size=batch_size,
        class_mode='categorical')
    validation_generator = val_datagen.flow_from_directory(
        val_dir,
        target_size=(224, 224),
        batch_size=batch_size,
        class_mode='categorical')
I searched around to see how my generators behave, and found this answer: "What if steps_per_epoch does not fit into numbers of samples?"
I calculated my steps_per_epoch and validation_steps this way:
    steps_per_epoch = int(number_of_train_samples / batch_size)
    val_steps = int(number_of_val_samples / batch_size)
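For concreteness, here is a small sketch of that arithmetic with my own numbers. It only counts samples and does not touch Keras, so it says nothing about how the generators order or wrap their batches. The helper name `steps_and_leftover` is just something I made up for this illustration.

```python
import math

batch_size = 2
number_of_train_samples = 39592
number_of_val_samples = 9899


def steps_and_leftover(n_samples, batch_size):
    """Return (floor steps, ceil steps, samples left over after the floor steps)."""
    floor_steps = n_samples // batch_size
    ceil_steps = math.ceil(n_samples / batch_size)
    leftover = n_samples - floor_steps * batch_size
    return floor_steps, ceil_steps, leftover


train_floor, train_ceil, train_leftover = steps_and_leftover(
    number_of_train_samples, batch_size)
val_floor, val_ceil, val_leftover = steps_and_leftover(
    number_of_val_samples, batch_size)

print("train:", train_floor, train_ceil, train_leftover)
print("val:  ", val_floor, val_ceil, val_leftover)
```

Rounding down and rounding up only differ when the sample count is not a multiple of the batch size, which is the case for my validation set.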
Using the code in this link with my own batch size and number of samples, I got these results: "missing the last batch" for train_generator and "weird behavior" for val_generator.
I'm afraid that I'll have to retrain my models. What values should I choose for steps_per_epoch and validation_steps? Is there a way to use exact values for these variables (other than setting batch_size to 1 or removing some of the samples)? I have several other models with different numbers of samples, and I think they've all been missing some batches. Any help would be much appreciated.
Two related questions:
1- Regarding the models I already trained, are they reliable and properly trained?
2- What would happen if I set these variables using the following values:
    # np.ceil returns a float, so cast to int to get a whole number of steps
    steps_per_epoch = int(np.ceil(number_of_train_samples / batch_size))
    val_steps = int(np.ceil(number_of_val_samples / batch_size))
Will my model see some of the images more than once in each epoch during training and validation, or is this the solution to my question?
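To make question 2 concrete, here is a toy simulation of what I am asking about. It assumes a generator that yields a shorter final batch at the end of each pass and then starts over from the first sample. I have not confirmed that flow_from_directory behaves this way, so `toy_batches` and `count_seen` are hypothetical stand-ins, not the Keras iterator.

```python
import math
from collections import Counter


def toy_batches(n_samples, batch_size):
    """Hypothetical generator: loops forever over sample indices,
    yielding a shorter final batch at the end of each pass (an assumption)."""
    while True:
        for start in range(0, n_samples, batch_size):
            yield list(range(start, min(start + batch_size, n_samples)))


def count_seen(n_samples, batch_size, steps):
    """Count how often each sample index is drawn in `steps` batches."""
    seen = Counter()
    gen = toy_batches(n_samples, batch_size)
    for _ in range(steps):
        seen.update(next(gen))
    return seen


n_val, batch_size = 9899, 2

# Rounding down: the final, shorter batch is never requested.
floor_seen = count_seen(n_val, batch_size, n_val // batch_size)
# Rounding up: one extra step picks up the final batch.
ceil_seen = count_seen(n_val, batch_size, math.ceil(n_val / batch_size))

missed_with_floor = n_val - len(floor_seen)
repeats_with_ceil = sum(1 for count in ceil_seen.values() if count > 1)

print("missed with floor:", missed_with_floor)
print("repeated with ceil:", repeats_with_ceil)
```

Under that assumption, rounding up covers every sample once per epoch without repeats, while rounding down skips the leftover sample. Whether the real generators pad, drop, or wrap the last batch is exactly what I would like confirmed.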