
Given code that uses multiple graphs, or multiple versions of the same graph, it is sometimes necessary to ensure that one particular graph uses only the CPU for computation while another graph uses only the GPU.

The basic question is:

How can I make sure that a particular graph uses only the CPU, or only the GPU (but not both), for its computations?

There is no exhaustive discussion of this topic on SO, hence this question.

I have tried a number of different approaches, and none of them seems to work, as outlined below.

Before going into further details of the question and the approaches that have been tried, here are the relevant details:

  • TensorFlow Version : 'v1.1.0-rc2-1003-g3792dd9' 1.1.0-rc2 (Compiled from source)
  • OS details : CentOS Linux release 7.2.1511 (Core)
  • Bazel version : 0.4.5
  • Basic code with which the various approaches have been tried is given below:

    import tensorflow as tf
    from tensorflow.python.client import timeline
    import matplotlib.pyplot as plt
    
    
    
    def coloraugment(image):
        # Randomly jitter brightness, saturation and contrast, clipping to [0, 1] after each step.
        output = tf.image.random_brightness(image, max_delta=10./255.)
        output = tf.clip_by_value(output, 0.0, 1.0)
        output = tf.image.random_saturation(output, lower=0.5, upper=1.5)
        output = tf.clip_by_value(output, 0.0, 1.0)
        output = tf.image.random_contrast(output, lower=0.5, upper=1.5)
        output = tf.clip_by_value(output, 0.0, 1.0)
        return output
    
    
    def augmentbody(image, sz):
        # Build a batch of 20 crops: 10 random crops (the first unaugmented, the rest
        # colour-augmented) together with their left-right flips, then shuffle the batch.
        for i in range(10):
            if i == 0:
                cropped = tf.random_crop(value=image, size=sz)
                croppedflipped = tf.image.flip_left_right(cropped)
                out = tf.stack([cropped, croppedflipped], axis=0)
            else:
                cropimg = tf.random_crop(value=image, size=sz)
                augcolor = coloraugment(cropimg)
                augflipped = tf.image.flip_left_right(augcolor)
                coll = tf.stack([augcolor, augflipped], axis=0)
                out = tf.concat([coll, out], axis=0)
    
        out = tf.random_shuffle(out)
        return out
    
    
    def aspect1(aspectratio):
        newheight = tf.constant(256, dtype=tf.float32)
        newwidth = tf.divide(newheight, aspectratio)
        newsize = tf.stack([newheight, newwidth], axis=0)
        newsize = tf.cast(newsize, dtype=tf.int32)
        return newsize
    
    
    def aspect2(aspectratio):
        newwidth = tf.constant(256, dtype=tf.float32)
        newheight = tf.multiply(newwidth, aspectratio)
        newsize = tf.stack([newheight, newwidth], axis=0)
        newsize = tf.cast(newsize, dtype=tf.int32)
        return newsize
    
    
    def resize_image(image):
        # Scale so that the shorter side becomes 256 pixels (preserving the aspect ratio)
        # and convert the image to float32 in [0, 1].
        imageshape = tf.shape(image)
        imageheight = tf.cast(tf.gather(imageshape, tf.constant(0, dtype=tf.int32)),
                              dtype=tf.float32)
        imagewidth = tf.cast(tf.gather(imageshape, tf.constant(1, dtype=tf.int32)),
                             dtype=tf.float32)
    
        aspectratio = tf.divide(imageheight, imagewidth)
        newsize = tf.cond(tf.less_equal(imageheight, imagewidth),
                          lambda: aspect1(aspectratio),
                          lambda: aspect2(aspectratio))
        image = tf.image.convert_image_dtype(image, dtype=tf.float32)
        image = tf.image.resize_images(image, newsize)
        return image
    
    
    def readimage(file_queue):
        # Read one file from the queue, decode it as JPEG and resize it.
        reader = tf.WholeFileReader()
        key, value = reader.read(file_queue)
        image = tf.image.decode_jpeg(value)
        image = resize_image(image)
    
        return image
    
    
    if __name__ == "__main__":
        queue = tf.train.string_input_producer(["holly2.jpg"])
        image = readimage(queue)
        augmented = augmentbody(image, [221,221,3])
        init_op = tf.global_variables_initializer()
        config_cpu = tf.ConfigProto()
        # The config and session below were created while experimenting with the
        # approaches described further down; they are not used by the session that follows.
        config = tf.ConfigProto(
            device_count={'GPU': 0}
        )
        sess_cpu = tf.Session(config=config)
        with tf.Session(config=config_cpu) as sess:
            run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
            run_metadata = tf.RunMetadata()
            coord = tf.train.Coordinator()
            threads = tf.train.start_queue_runners(sess=sess, coord=coord)
            sess.run(init_op)
            [tense] = sess.run([augmented],options=run_options, run_metadata=run_metadata)
            coord.request_stop()
            coord.join(threads)
            tl = timeline.Timeline(run_metadata.step_stats)
            ctf = tl.generate_chrome_trace_format()
            with open('timeline.json', 'w') as f:
                f.write(ctf)
    
        print("The tensor size is {}".format(tense.shape))
        numcols = tense.shape[0] // 2  # integer division so that plt.subplot receives an int
        for i in range(tense.shape[0]):
            plt.subplot(2,numcols,i+1)
            plt.imshow(tense[i, :, :, :])
    
        plt.show()
        plt.close()
    

Approaches that have been tried

Several related questions on SO have accepted answers, but those solutions do not seem to work well, as I outline below with examples and outputs.

Approach 1

The related question is "Run Tensorflow on CPU". The accepted answer is to create the tf.Session() with the following configuration:

config = tf.ConfigProto(
    device_count={'GPU': 0}
)
sess = tf.Session(config=config)

The corresponding output is:

2017-05-18 13:34:27.477189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7715
pciBusID 0000:04:00.0
Total memory: 7.92GiB
Free memory: 7.80GiB
2017-05-18 13:34:27.477232: I tensorflow/core/common_runtime/gpu/gpu_device.cc:927] DMA: 0 
2017-05-18 13:34:27.477240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:937] 0:   Y 
2017-05-18 13:34:27.477259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:996] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:04:00.0)
2017-05-18 13:34:27.482600: I tensorflow/core/common_runtime/gpu/gpu_device.cc:996] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:04:00.0)
2017-05-18 13:34:27.848864: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
2017-05-18 13:34:27.848902: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 40 visible devices
2017-05-18 13:34:27.851670: I tensorflow/compiler/xla/service/service.cc:184] XLA service 0x7f0fd81d5500 executing computations on platform Host. Devices:
2017-05-18 13:34:27.851688: I tensorflow/compiler/xla/service/service.cc:192]   StreamExecutor device (0): <undefined>, <undefined>
2017-05-18 13:34:27.851894: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
2017-05-18 13:34:27.851903: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 40 visible devices
2017-05-18 13:34:27.854698: I tensorflow/compiler/xla/service/service.cc:184] XLA service 0x7f0fd82b4c50 executing computations on platform CUDA. Devices:
2017-05-18 13:34:27.854713: I tensorflow/compiler/xla/service/service.cc:192]   StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
2017-05-18 13:34:28.918980: I tensorflow/stream_executor/dso_loader.cc:139] successfully opened CUDA library libcupti.so.8.0 locally

You can clearly see that the GPU is still being used and that the XLA service is running on the GPU.

Approach 2

The related question is again "Run Tensorflow on CPU". This answer states that the following environment variable can be set in order to use the CPU:

CUDA_VISIBLE_DEVICES=""

When GPU computation is required, the variable can be unset again.
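
For completeness, the variable has to be in the environment before TensorFlow initialises CUDA; the usual way to guarantee this from inside Python (rather than the shell) is to set it before the tensorflow import. A minimal sketch of that variant:

import os

# Hide all GPUs from this process. Set before importing tensorflow so the variable
# is in place before CUDA is initialised.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import tensorflow as tf

with tf.Session() as sess:
    # With no visible CUDA devices, every op in this session runs on the CPU.
    print(sess.run(tf.constant("running without visible GPUs")))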

The corresponding output is:

2017-05-18 13:42:24.871020: E tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_NO_DEVICE
2017-05-18 13:42:24.871071: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: nefgpu12
2017-05-18 13:42:24.871081: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: nefgpu12
2017-05-18 13:42:24.871114: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 367.48.0
2017-05-18 13:42:24.871147: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.48  Sat Sep  3 18:21:08 PDT 2016
GCC version:  gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) 
"""
2017-05-18 13:42:24.871170: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 367.48.0
2017-05-18 13:42:24.871178: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 367.48.0
2017-05-18 13:42:25.159632: W tensorflow/compiler/xla/service/platform_util.cc:61] platform CUDA present but no visible devices found
2017-05-18 13:42:25.159674: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 40 visible devices
2017-05-18 13:42:25.162626: I tensorflow/compiler/xla/service/service.cc:184] XLA service 0x7f5798002df0 executing computations on platform Host. Devices:
2017-05-18 13:42:25.162663: I tensorflow/compiler/xla/service/service.cc:192]   StreamExecutor device (0): <undefined>, <undefined>
2017-05-18 13:42:25.223309: I tensorflow/stream_executor/dso_loader.cc:139] successfully opened CUDA library libcupti.so.8.0 locally

You can see from this output that the GPU is not being used.

Approach 3

The related question is "Running multiple graphs in different device modes in TensorFlow". One answer gives the following solution:

# The config for CPU usage
config_cpu = tf.ConfigProto()
config_cpu.gpu_options.visible_device_list=''
sess_cpu = tf.Session(config=config_cpu)

# The config for GPU usage
config_gpu = tf.ConfigProto()
config_gpu.gpu_options.visible_device_list='0'
sess_gpu = tf.Session(config=config_gpu)

The output when using the CPU configuration outlined in this solution is as follows:

2017-05-18 13:50:32.999431: I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7715
pciBusID 0000:04:00.0
Total memory: 7.92GiB
Free memory: 7.80GiB
2017-05-18 13:50:32.999472: I tensorflow/core/common_runtime/gpu/gpu_device.cc:927] DMA: 0 
2017-05-18 13:50:32.999478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:937] 0:   Y 
2017-05-18 13:50:32.999490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:996] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:04:00.0)
2017-05-18 13:50:33.084737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:996] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:04:00.0)
2017-05-18 13:50:33.395798: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
2017-05-18 13:50:33.395837: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 40 visible devices
2017-05-18 13:50:33.398634: I tensorflow/compiler/xla/service/service.cc:184] XLA service 0x7f08181ecfa0 executing computations on platform Host. Devices:
2017-05-18 13:50:33.398695: I tensorflow/compiler/xla/service/service.cc:192]   StreamExecutor device (0): <undefined>, <undefined>
2017-05-18 13:50:33.398908: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
2017-05-18 13:50:33.398920: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 40 visible devices
2017-05-18 13:50:33.401731: I tensorflow/compiler/xla/service/service.cc:184] XLA service 0x7f081821e1f0 executing computations on platform CUDA. Devices:
2017-05-18 13:50:33.401745: I tensorflow/compiler/xla/service/service.cc:192]   StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
2017-05-18 13:50:34.484142: I tensorflow/stream_executor/dso_loader.cc:139] successfully opened CUDA library libcupti.so.8.0 locally

You can see that the GPU is still being used.
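
As a side note, one way to check which device each op is actually placed on (as opposed to which devices merely get created) is to enable device-placement logging in the session configuration. A minimal, self-contained sketch:

import tensorflow as tf

# log_device_placement makes TensorFlow print the device chosen for every op.
config = tf.ConfigProto(log_device_placement=True)

with tf.Session(config=config) as sess:
    a = tf.constant([1.0, 2.0], name="a")
    b = tf.constant([3.0, 4.0], name="b")
    # The placement of each op (e.g. ".../cpu:0" or ".../gpu:0") is logged to stderr.
    print(sess.run(a + b))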

Comments
  • constructing your graph with `with tf.device('/cpu:0')` will force it to run on CPU – Yaroslav Bulatov May 18 '17 at 14:10
  • Two follow-up questions on this : 1. When I am using multiprocessing and running multiple copies of the same graph on 32 processes, would `/cpu:0` be appropriate ? 2. Is there a more flexible approach, where I run some of its copies on CPU and some copies on GPU ? – Ujjwal May 18 '17 at 14:14
  • `cpu:0` refers to all cores. There's no built-in more flexible approach, but it should be easy to make some wrappers that make a copy of a graph and make it CPU-only. Note that using different tf.Graph objects will be slow since it will serialize/deserialize the Graph every time you "switch" graphs; it's faster to use a single tf.Graph, with subsets of nodes representing your "graphs" – Yaroslav Bulatov May 18 '17 at 14:18
  • BTW, here's a [helper utility](https://github.com/yaroslavvb/stuff/blob/master/graph_template.py) I use. It lets you clone part of a graph multiple times onto different devices – Yaroslav Bulatov May 18 '17 at 14:36
  • Thanks a lot :) – Ujjwal May 18 '17 at 14:46
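
Following up on the suggestions in the comments above, a minimal sketch of keeping a single tf.Graph and pinning two copies of the same subgraph to different devices with tf.device (build_subgraph is a hypothetical stand-in for the augmentation pipeline in the question):

import tensorflow as tf

def build_subgraph(x):
    # Hypothetical graph-construction function; in the question this would be the
    # crop/augment pipeline.
    return tf.square(x) + 1.0

x = tf.placeholder(tf.float32, shape=[None], name="x")

with tf.device('/cpu:0'):
    cpu_out = build_subgraph(x)   # this copy is pinned to the CPU

with tf.device('/gpu:0'):
    gpu_out = build_subgraph(x)   # this copy is pinned to the GPU

# allow_soft_placement lets the GPU copy fall back to the CPU on machines without a GPU.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(cpu_out, feed_dict={x: [1., 2., 3.]}))
    print(sess.run(gpu_out, feed_dict={x: [1., 2., 3.]}))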

1 Answer


See issues #9201 and #2175. The fact that the GPU devices are created does not mean that your graph is necessarily running on the GPU. You can enforce CPU execution with device_count = {'GPU': 0} or tf.device, but the GPU devices are still created with the session, just in case some op wants them. About CUDA_VISIBLE_DEVICES, making it empty did not work for me either, but doing export CUDA_VISIBLE_DEVICES="-1" (before starting Python, or inside Python through os.environ before importing TensorFlow) did the trick (TensorFlow will output a warning about the GPU not being found, but it will work). You can see the documentation for CUDA_VISIBLE_DEVICES here.
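
A minimal sketch of the os.environ variant described above (the key point being that the variable is set before tensorflow is imported):

import os

# "-1" is not a valid CUDA device index, so no GPU is made visible to the process.
# As with the empty string, this has to be set before TensorFlow is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import tensorflow as tf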

jdehesa