4

Using Keras fit_generator, steps_per_epoch should be equivalent to the total number of available samples divided by the batch_size.

But how would the generator or fit_generator react if I choose a batch_size that does not divide the number of samples evenly? Does it yield samples until it can no longer fill a whole batch, or does it just use a smaller batch size for the last yield?

Why I ask: I split my data into train/validation/test sets of different sizes (different percentages), but I would like to use the same batch size for the train and validation sets, and especially for the train and test sets. Since the sets differ in size, I cannot guarantee that the batch size divides evenly into the total number of samples.

Daniel Möller
Florida Man

3 Answers

7

If it's your generator with yield

It's you who create the generator, so the behavior is defined by you.

If steps_per_epoch is greater than the expected number of batches, fit will not notice anything wrong; it will simply keep requesting batches until it reaches that number of steps.

The only thing is: you must assure your generator is infinite.

Do this with while True: at the beginning, for instance.
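As a minimal sketch of such an infinite generator (the function name and toy data below are my own illustration, not from the answer), the while True: loop restarts the pass over the data forever, and the last batch of each pass is simply smaller:

```python
import numpy as np

def infinite_batch_generator(x, y, batch_size):
    """Yield (x, y) batches forever; the last batch of each
    pass over the data may be smaller than batch_size."""
    n = len(x)
    while True:  # keeps the generator infinite, as fit_generator requires
        for start in range(0, n, batch_size):
            end = min(start + batch_size, n)
            yield x[start:end], y[start:end]

# Toy data: 10 samples with batch_size 4 gives batches of 4, 4, 2, repeating
x = np.arange(10).reshape(10, 1)
y = np.arange(10)
gen = infinite_batch_generator(x, y, 4)
sizes = [next(gen)[0].shape[0] for _ in range(6)]
print(sizes)  # [4, 4, 2, 4, 4, 2]
```

With this generator, steps_per_epoch=3 would cover all 10 samples exactly once per epoch.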

If it's a generator from ImageDataGenerator

If the generator is from an ImageDataGenerator, it's actually a keras.utils.Sequence and it has the length property: len(generatorInstance).

Then you can check yourself what happens:

remainingSamples = total_samples % batch_size  # confirm that this is greater than 0
wholeBatches = total_samples // batch_size
totalBatches = wholeBatches + 1

if len(generator) == wholeBatches:
    print("missing the last batch")    
elif len(generator) == totalBatches:
    print("last batch included")
else:
    print('weird behavior')

And check the size of the last batch:

lastBatch = generator[len(generator) - 1]  # a Sequence item is usually an (x, y) tuple
lastX = lastBatch[0]

if lastX.shape[0] == remainingSamples:
    print('last batch contains the remaining samples')
else:
    print('last batch is different')
Daniel Möller
  • Hi Daniel. Great to see you again. I use a classic `while n < len(Data)`. I was just wondering what the fit_generator function does with no returned values. – Florida Man Jun 01 '18 at 13:09
  • See improved answer. – Daniel Möller Jun 01 '18 at 13:11
  • A more complete answer it is :) However, to cover cases when `total_samples` *is* a multiple of `batch_size`, I would write `totalBatches = wholeBatches + (remainingSamples != 0)` (or simply `totalBatches = np.ceil(total_samples / batch_size)`, and change a bit the conditions below accordingly...? – benjaminplanche Jun 01 '18 at 13:22
  • Indeed, I was only "checking" a `Sequence` generator to see what it does (since you don't create it). – Daniel Möller Jun 01 '18 at 13:38
2

If you assign N to the parameter steps_per_epoch of fit_generator(), Keras will basically call your generator N times before considering one epoch done. It's up to your generator to yield all your samples in N batches.

Note that since for most models it is fine to have different batch sizes each iteration, you could fix steps_per_epoch = ceil(dataset_size / batch_size) and let your generator output a smaller batch for the last samples.

benjaminplanche
0

I was facing the same logical error and solved it by defining steps_per_epoch:

BS = 32
steps_per_epoch = len(trainX) // BS
history = model.fit(train_batches,
                    epochs=initial_epochs,
                    steps_per_epoch=steps_per_epoch,
                    validation_data=validation_batches)
Muhammad Zakaria