
I am looking at the TensorFlow "MNIST For ML Beginners" tutorial, and I want to print out the training loss after every training step.

My training loop currently looks like this:

for i in range(100):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

Now, train_step is defined as:

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

Where cross_entropy is the loss which I want to print out:

cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
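
For context, x, y and y_ come from the tutorial's softmax regression model; the surrounding setup looks roughly like this (details such as the data path may differ in your copy of the tutorial):

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the MNIST data as in the tutorial.
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Softmax regression: x is a batch of flattened 28x28 images,
# y the predicted class probabilities, y_ the one-hot labels.
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])

sess = tf.Session()
# cross_entropy and train_step are defined as shown in this question,
# then all variables are initialised before the training loop:
sess.run(tf.initialize_all_variables())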

One way to print this would be to explicitly compute cross_entropy in the training loop:

for i in range(100):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
    print 'loss = ' + str(cross_entropy)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

I now have two questions regarding this:

  1. Given that cross_entropy is already computed during sess.run(train_step, ...), it seems inefficient to compute it twice, requiring twice the number of forward passes of all the training data. Is there a way to access the value of cross_entropy when it was computed during sess.run(train_step, ...)?

  2. How do I even print a tf.Variable? Using str(cross_entropy) gives me an error...

Thank you!

Karnivaurus

2 Answers


You can fetch the value of cross_entropy by adding it to the list of fetches passed to sess.run(...). There is no need to rebuild the cross_entropy tensor inside the loop (doing so would add new nodes to the graph on every iteration); just pass the tensor you already defined. For example, your for-loop could be rewritten as follows:

for i in range(100):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    # Fetch the loss tensor alongside the training op in a single call.
    _, loss_val = sess.run([train_step, cross_entropy],
                           feed_dict={x: batch_xs, y_: batch_ys})
    print 'loss = %s' % loss_val

The same approach can be used to print the current value of a variable. Say that, in addition to the value of cross_entropy, you want to print the value of a tf.Variable called W; you could do the following:

for i in range(100):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    # Fetch both the loss and the current value of W in the same call.
    _, loss_val, W_val = sess.run([train_step, cross_entropy, W],
                                  feed_dict={x: batch_xs, y_: batch_ys})
    print 'loss = %s' % loss_val
    print 'W = %s' % W_val
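
As a side note on your second question: a tf.Variable like W does not depend on any placeholder, so (under the same setup) you can also fetch its current value outside the loop with a plain sess.run(W), no feed_dict required:

print 'W after training = %s' % sess.run(W)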
mrry
  • Thanks. So every time I call `sess.run([train_step, cross_entropy])`, it still only computes `cross_entropy` once, right? It doesn't do an additional forward pass for each of the variables I pass? – Karnivaurus Nov 20 '15 at 19:35
  • That's right - it executes the exact same subgraph (because `cross_entropy` is already calculated as part of the training step), and just adds an extra node to fetch the value of `cross_entropy` back to your Python program. – mrry Nov 20 '15 at 19:36
  • Thanks. As a side point, after updating my code as you suggested, the value of `cross_entropy` does, on average, decrease over the loop. However, sometimes it actually increases from one training iteration to the next. This happens for a range of step sizes in the gradient descent. Is this expected? Wouldn't the loss always decrease after each iteration, because you are moving the weights in a direction which should reduce this loss? The graph of loss vs iteration is here: http://i.stack.imgur.com/f8B80.png – Karnivaurus Nov 20 '15 at 19:41
  • That's to be expected - the loss will fluctuate as you pass in different training examples, but it should have an overall downward trend. If it starts to increase again, then you may be overfitting so you should investigate early stopping: https://en.wikipedia.org/wiki/Early_stopping – mrry Nov 21 '15 at 00:11
  • Is this loss for a batch of 100 records? Should we divide by 100? – Mohan Radhakrishnan May 30 '18 at 05:12

Instead of running just the train_step op, also run the cross_entropy node so that its value is returned to you. Remember that:

var_as_a_python_value = sess.run(tensorflow_variable)

will give you what you want, so you can do this:

[_, cross_entropy_py] = sess.run([train_step, cross_entropy],
                                 feed_dict={x: batch_xs, y_: batch_ys})

to both run the training step and pull out the value of the cross entropy as it was computed during that iteration. Note that both the fetches argument to sess.run and the returned value are lists, so that both operations happen in a single call.
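
Combined with the loop from the question (and assuming the same mnist, x and y_ objects), the full pattern would look like this:

for i in range(100):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    # One call both applies the gradient update and returns the loss value.
    [_, cross_entropy_py] = sess.run([train_step, cross_entropy],
                                     feed_dict={x: batch_xs, y_: batch_ys})
    print 'step %d: loss = %s' % (i, cross_entropy_py)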

dga