
I have been working with Keras for about three months, and now I wonder whether it could be useful to train on batches of different (random) sizes (16, 32, 64, 128) in order to combine the benefits of the different batch sizes.

I haven't found any documentation that answers this question. Am I totally wrong?

JuliB

3 Answers


I've seen two popular strategies for choosing the batch size:

  • Select it as large as possible while the model still fits in GPU memory. This is done mostly to speed up training through parallelism and vectorization.

  • Tune the batch size, just like any other hyper-parameter, via random search or Bayesian optimization. Surprisingly, a bigger batch size doesn't always mean better model performance (though in many cases it does). See this discussion on the matter: the main idea is that extra noise in training can be beneficial for generalization. Remember that L2 regularization is equivalent to adding Gaussian noise to x. Reducing the batch size also adds noise to the training process, especially if you're using batch norm. A minimal sketch of such a search follows below.
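As a minimal sketch of the second strategy (the toy data, architecture, and candidate sizes below are assumptions for illustration, not from the question), one can simply re-train the same architecture with each candidate batch size and compare validation scores:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Hypothetical data: 1000 training and 200 validation samples with 20 features.
x_train, y_train = np.random.rand(1000, 20), np.random.randint(0, 2, 1000)
x_val, y_val = np.random.rand(200, 20), np.random.randint(0, 2, 200)

def build_model():
    # Small assumed architecture; replace with your own network.
    model = Sequential([
        Dense(64, activation='relu', input_shape=(20,)),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

results = {}
for batch_size in [16, 32, 64, 128]:   # candidate batch sizes from the question
    model = build_model()               # fresh weights for a fair comparison
    model.fit(x_train, y_train, epochs=10, batch_size=batch_size, verbose=0)
    _, val_acc = model.evaluate(x_val, y_val, verbose=0)
    results[batch_size] = val_acc

print(results)  # pick the batch size with the best validation accuracy
```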

I don't know of any work on changing the batch size for the same model during training. But choosing a random batch size for different models can certainly have benefits.

Caveat: in some settings, e.g. in deep reinforcement learning, the extra noise actually hurts performance, in which case reducing the batch size can be a bad idea. So, as always, it greatly depends on your problem.

Maxim
  • Thank you for this. Is there an easy way to calculate how many images will fit on your GPU with the popular frameworks? – Moondra Jan 14 '18 at 01:58
  • That depends on the network, because it takes memory for variables, gradients, etc. In general, for CIFAR-10 images, i.e. 32x32x3, the max batch size is about 500-1000. – Maxim Jan 14 '18 at 07:01

I'm not sure whether my answer fits what you're looking for. In principle, what you could do to get random batch sizes during training is to call fit in a loop: at each step of the loop you save the model weights, load them back into the model, change the batch size, compile, and fit again (a rough sketch follows below). It might be possible to implement a callback class that does this internally instead of needing a loop.
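A rough sketch of that loop, with assumed toy data and an assumed model just to make it self-contained:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Assumed toy data.
x_train = np.random.rand(1000, 20)
y_train = np.random.randint(0, 2, 1000)

model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

for epoch in range(20):
    batch_size = int(np.random.choice([16, 32, 64, 128]))  # pick a random size each pass
    # Save and restore the weights as described above; with a single in-memory
    # model this step is optional, since fit() resumes from the current weights anyway.
    weights = model.get_weights()
    model.set_weights(weights)
    model.fit(x_train, y_train, epochs=1, batch_size=batch_size, verbose=0)
```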

Though I think this is not exactly what you want, since the batch size only changes once the network has gone over the full sample. But, as the previous answer notes, so far there is no built-in option to do that per epoch (in principle it could be implemented).
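As a sketch of what a hand-rolled implementation might look like (this is not a built-in Keras feature, and the generator name is made up for illustration), a Python generator that yields batches of varying size can be passed to fit_generator, so the batch size can change even from step to step:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

x_train = np.random.rand(1000, 20)        # assumed toy data
y_train = np.random.randint(0, 2, 1000)

model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

def random_size_batches(x, y, sizes=(16, 32, 64, 128)):
    # Endlessly yield batches whose size is drawn at random at every step.
    n = len(x)
    while True:
        idx = np.random.randint(0, n, size=np.random.choice(sizes))
        yield x[idx], y[idx]

# steps_per_epoch is required because the generator runs forever.
model.fit_generator(random_size_batches(x_train, y_train),
                    steps_per_epoch=50, epochs=5, verbose=0)
```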


I've trained a neural network with batches that are sequential, but the amount of data is not very high and the model accuracy is low. I was recommended to train it using randomly generated batches. Theoretically, this should improve the model's predictions.
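A minimal sketch of the difference, with assumed toy data: in Keras, `model.fit(..., shuffle=True)` (the default) already reshuffles the training set every epoch, which gives randomly composed batches, whereas `shuffle=False` keeps the original sample order.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

x_train = np.random.rand(1000, 20)        # assumed toy data
y_train = np.random.randint(0, 2, 1000)

model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# Sequential batches: shuffle=False keeps the original sample order every epoch.
model.fit(x_train, y_train, batch_size=32, epochs=5, shuffle=False, verbose=0)

# Randomly composed batches: shuffle=True (the default) reshuffles each epoch.
model.fit(x_train, y_train, batch_size=32, epochs=5, shuffle=True, verbose=0)
```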

Marcin Wu