4

Using Keras fit_generator, steps_per_epoch should be equivalent to the total number of available samples divided by the batch_size.

But how would the generator or fit_generator react if I choose a batch_size that does not divide the number of samples evenly? Does it yield samples until it can no longer fill a whole batch, or does it just use a smaller batch size for the last yield?

Why I ask: I split my data into train/validation/test sets of different sizes (different percentages), but I would like to use the same batch size for the train and validation sets, and especially for the train and test sets. Since the sets differ in size, I cannot guarantee that the batch size divides evenly into the total number of samples.

Daniel Möller
Florida Man

3 Answers

7

If it's your generator with yield

It's you who create the generator, so the behavior is defined by you.

If steps_per_epoch is greater than the expected number of batches, fit will not notice anything wrong; it will simply keep requesting batches until it reaches that number of steps.

The only thing is: you must assure your generator is infinite.

Do this with while True: at the beginning, for instance.
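As a minimal sketch of such an infinite generator (the function name and toy data below are my own illustration, not from the answer), the while True: loop restarts the pass over the data forever, and the last batch of each pass is simply smaller:

```python
import numpy as np

def infinite_batch_generator(x, y, batch_size):
    """Yield (x, y) batches forever; the last batch of each
    pass over the data may be smaller than batch_size."""
    n = len(x)
    while True:  # keeps the generator infinite, as fit_generator requires
        for start in range(0, n, batch_size):
            end = min(start + batch_size, n)
            yield x[start:end], y[start:end]

# Toy data: 10 samples with batch_size 4 gives batches of 4, 4, 2, repeating
x = np.arange(10).reshape(10, 1)
y = np.arange(10)
gen = infinite_batch_generator(x, y, 4)
sizes = [next(gen)[0].shape[0] for _ in range(6)]
print(sizes)  # [4, 4, 2, 4, 4, 2]
```

With this generator, steps_per_epoch=3 would cover all 10 samples exactly once per epoch.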

If it's a generator from ImageDataGenerator

If the generator is from an ImageDataGenerator, it's actually a keras.utils.Sequence and it has the length property: len(generatorInstance).

Then you can check yourself what happens:

remainingSamples = total_samples % batch_size  # confirm that this is greater than 0
wholeBatches = total_samples // batch_size
totalBatches = wholeBatches + 1

if len(generator) == wholeBatches:
    print("missing the last batch")    
elif len(generator) == totalBatches:
    print("last batch included")
else:
    print('weird behavior')

And check the size of the last batch:

lastBatch = generator[len(generator) - 1]  # a Sequence item is usually an (x, y) tuple
lastX = lastBatch[0]

if lastX.shape[0] == remainingSamples:
    print('last batch contains the remaining samples')
else:
    print('last batch is different')
Daniel Möller
  • Hi Daniel. Great to see you again. I use a classic `while n < len(Data)`. I was just wondering what the fit_generator function does with no returned values. – Florida Man Jun 01 '18 at 13:09
  • See improved answer. – Daniel Möller Jun 01 '18 at 13:11
  • A more complete answer it is :) However, to cover cases when `total_samples` *is* a multiple of `batch_size`, I would write `totalBatches = wholeBatches + (remainingSamples != 0)` (or simply `totalBatches = np.ceil(total_samples / batch_size)`, and change a bit the conditions below accordingly...? – benjaminplanche Jun 01 '18 at 13:22
  • Indeed, I was only "checking" a `Sequence` generator to see what it does (since you don't create it). – Daniel Möller Jun 01 '18 at 13:38
2

If you assign N to the parameter steps_per_epoch of fit_generator(), Keras will basically call your generator N times before considering one epoch done. It's up to your generator to yield all your samples in N batches.

Note that since for most models it is fine to have different batch sizes each iteration, you could fix steps_per_epoch = ceil(dataset_size / batch_size) and let your generator output a smaller batch for the last samples.

benjaminplanche
0

I was facing the same logical error and solved it by defining steps_per_epoch:

BS = 32
steps_per_epoch = len(trainX) // BS
history = model.fit(train_batches,
                    epochs=initial_epochs,
                    steps_per_epoch=steps_per_epoch,
                    validation_data=validation_batches)
Muhammad Zakaria