To speed up model training, it seems to be good practice to generate batches on the CPU while the model trains on the GPU in parallel. For this purpose, a generator class can be written in Python that inherits from the Sequence class.
Here is the link to the documentation: https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence
The important thing the documentation states is:

Sequence are a safer way to do multiprocessing. This structure guarantees that the network will only train once on each sample per epoch which is not the case with generators.
And it gives a simple code example, as follows:
from skimage.io import imread
from skimage.transform import resize
from tensorflow.keras.utils import Sequence  # import missing from the docs snippet
import numpy as np
import math

# Here, `x_set` is a list of paths to the images
# and `y_set` are the associated classes.

class CIFAR10Sequence(Sequence):

    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        return np.array([
            resize(imread(file_name), (200, 200))
            for file_name in batch_x]), np.array(batch_y)
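One detail worth noting in the example above: because __len__ uses math.ceil, the last batch is simply shorter when the dataset size is not a multiple of the batch size, and the slicing in __getitem__ handles that automatically. A toy check of my own (plain integers standing in for image paths):

```python
import math

x_set = list(range(10))          # stand-ins for 10 image paths
batch_size = 3

# ceil-based batch count: 10 / 3 rounds up to 4 batches, not 3
num_batches = math.ceil(len(x_set) / batch_size)
print(num_batches)               # 4

# the same slicing as in __getitem__, applied to every index
batches = [x_set[i * batch_size:(i + 1) * batch_size]
           for i in range(num_batches)]
print(batches[-1])               # the last batch holds the 1 leftover sample: [9]
```

So no sample is silently dropped at the end of an epoch; the final slice just runs past the end of the list and Python truncates it.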
What - to my understanding - ideally needs to be done is to create an instance of this generator class and pass it to the fit_generator(...)
function.
gen = CIFAR10Sequence(x_set, y_set, batch_size)

# Train the model
model.fit_generator(generator=gen,
                    use_multiprocessing=True,
                    workers=6)
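The "single use of every input per epoch" guarantee comes from Keras driving the object by index: it calls __getitem__ once for each index from 0 to __len__() - 1 per epoch. That can be illustrated without TensorFlow, using a toy class of my own (ToySequence, a hypothetical name) that follows the same protocol but returns plain lists to stay dependency-free:

```python
import math

class ToySequence:
    """Same __len__/__getitem__ protocol as keras.utils.Sequence,
    minus the Keras import; returns plain lists instead of np.array."""

    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[sl], self.y[sl]

seq = ToySequence(list(range(8)), list(range(8)), batch_size=3)

# Iterate the indices once, as Keras does within one epoch
seen = [x for i in range(len(seq)) for x in seq[i][0]]
print(sorted(seen))  # every sample appears exactly once: [0, 1, ..., 7]
```

Contrast this with a plain Python generator: if several workers pull from the same generator, items can be duplicated or skipped, which is exactly the hazard the quoted documentation warns about.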
Here is a quote from Keras documentation:
The use of keras.utils.Sequence guarantees the ordering and guarantees the single use of every input per epoch when using use_multiprocessing=True.
As it stands, I assume this setup is thread-safe. Question 1) Is my assumption correct?
One confusing thing, though, is that the use_multiprocessing
parameter cannot be set to True on Windows 10. Keras does not allow it; seemingly it can only be set to True on Linux. (I don't know how it behaves on other platforms.) But the workers
parameter can still be set to a value greater than 0.
Let's have a look at the definitions of these two parameters:
workers:
Integer. Maximum number of processes to spin up when using process-based threading. If unspecified, workers will default to 1. If 0, will execute the generator on the main thread.
use_multiprocessing:
Boolean. If True, use process-based threading. If unspecified, use_multiprocessing will default to False. Note that because this implementation relies on multiprocessing, you should not pass non-picklable arguments to the generator as they can't be passed easily to children processes.
So, by using the workers
parameter, it seems possible to spin up multiple workers to speed up training, independently of whether use_multiprocessing
is True or not.
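My reading (an assumption on my part, not a quote from Keras internals) is that with use_multiprocessing=False and workers > 0, the batches are fetched by a pool of threads rather than processes. In that case, a Sequence whose __getitem__ only reads state set in __init__ has no shared mutable state to corrupt, so concurrent calls stay safe. A minimal sketch with a hypothetical StatelessSequence, pulling batches from 6 threads to mirror workers=6:

```python
import math
from concurrent.futures import ThreadPoolExecutor

class StatelessSequence:
    """__getitem__ only reads immutable config set in __init__,
    so calling it from several threads at once cannot corrupt state."""

    def __init__(self, n, batch_size):
        self.n, self.batch_size = n, batch_size

    def __len__(self):
        return math.ceil(self.n / self.batch_size)

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        return list(range(lo, min(lo + self.batch_size, self.n)))

seq = StatelessSequence(n=10, batch_size=4)

# Fetch every batch index concurrently, as a thread-based worker pool would
with ThreadPoolExecutor(max_workers=6) as pool:
    batches = list(pool.map(seq.__getitem__, range(len(seq))))

flat = sorted(x for b in batches for x in b)
print(flat)  # each sample still appears exactly once: [0, 1, ..., 9]
```

If __getitem__ mutated shared state instead (say, incremented a counter or advanced an internal iterator), that would be the part needing a lock under threaded workers.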
If one wants to use a generator class inheriting from Sequence
(on Windows 10), one has to set use_multiprocessing
to False, as follows:
gen = CIFAR10Sequence(x_set, y_set, batch_size)

# Train the model
model.fit_generator(generator=gen,
                    use_multiprocessing=False,  # CHANGED
                    workers=6)
And there are still multiple processes running here because workers = 6.
Question 2) Is this setup still thread-safe, or is the thread-safe characteristic lost after setting the use_multiprocessing
parameter to False? I cannot figure this out from the documentation.
Question 3) Still related to this topic... When training is done this way, with data generated on the CPU and training running on the GPU, and the model being trained is shallow, GPU utilization ends up very low while CPU utilization becomes significantly higher, because the GPU keeps waiting for the data coming from the CPU. In such cases, is there a way to utilize some GPU resources for data generation as well?
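For clarity, the bottleneck I mean in Question 3 can be reproduced without any GPU at all, as a plain producer/consumer pipeline (a toy sketch of my own; the sleep stands in for expensive batch generation, and the near-instant loop body stands in for a shallow model's training step):

```python
import queue
import threading
import time

def producer(q, n_batches):
    """CPU side: 'generates' batches slowly, then sends a sentinel."""
    for i in range(n_batches):
        time.sleep(0.01)   # pretend batch generation is expensive
        q.put(i)
    q.put(None)            # sentinel: no more batches

q = queue.Queue(maxsize=2)  # bounded buffer, like Keras' internal batch queue
threading.Thread(target=producer, args=(q, 5), daemon=True).start()

trained = []
while (batch := q.get()) is not None:  # consumer blocks here most of the time
    trained.append(batch)              # a shallow model's step is near-instant

print(trained)  # all 5 batches arrive, in order: [0, 1, 2, 3, 4]
```

The consumer spends almost all of its time blocked inside q.get(), which is exactly the GPU-waiting-on-CPU pattern I observe; adding more consumers would not help, since the producer is the limiting side.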