Debugging TensorFlow 2

TensorFlow 2 comes with eager execution enabled by default, which makes debugging much more straightforward. What does eager execution mean, and why is it not always so eager?

Eager execution?

What is eager execution? The TensorFlow documentation has a great guide explaining this feature. To see the difference, try:

>>> import tensorflow as tf
>>> t = tf.constant(3)
>>> print(t * t)  # Executed eagerly
tf.Tensor(9, shape=(), dtype=int32)

>>> tf.compat.v1.disable_eager_execution()
>>> t = tf.constant(3)
>>> print(t * t)  # Executed lazily, result 9 not available
Tensor("mul_1:0", shape=(), dtype=int32)

Eager execution is the opposite of lazy evaluation. With eager execution, operations such as t * t run immediately and return concrete values. With lazy evaluation, the operations are only recorded in a graph for later execution. This makes it possible, for example, to reuse the same operations on a new batch of data. Lazy evaluation is good for performance but not for debugging: in the example above, the intermediate result 9 is not available, and t * t yields only a symbolic tensor named mul_1:0.

With eager execution enabled, you can debug your code with your standard tools, be it breakpoint() or just a simple print(). You can inspect your tensors at runtime to see where the code goes wrong.
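As a minimal sketch of such an inspection, assuming a fresh Python session in which eager execution has not been disabled, the following converts an intermediate tensor to a NumPy array. The variable names are purely illustrative.

```python
import tensorflow as tf

# Eager execution is the default in TensorFlow 2, so this runs immediately.
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])
product = tf.matmul(matrix, matrix)

# The concrete values are available right away, e.g. for print() or a debugger.
print(product.numpy())

# A breakpoint() placed here would let you inspect `product` interactively.
```

Calling .numpy() is only possible on eager tensors, which makes it a handy check that you are really looking at concrete values.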

Eager execution by default, so everything’s fine?

Well, not exactly. If you have worked with Keras and written your own class-based model, there's a fair chance you had to debug the training step. However, this doesn't work well out of the box. During training, Keras calls methods such as train_step() from within a tf.function. This, in turn, means that the whole training step is evaluated lazily: the benefits of eager execution for debugging are traded for performance.
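The effect of tf.function can be seen in isolation, without Keras. The sketch below is only an illustration: the function square and the list trace_log are hypothetical names, not part of any API. The Python body of a tf.function runs while TensorFlow traces it into a graph, so plain Python side effects (and breakpoints) are not hit on every call.

```python
import tensorflow as tf

trace_log = []  # hypothetical helper: records each time the Python body runs


@tf.function
def square(x):
    # Plain Python code in here runs only while the function is being traced.
    # During tracing, TensorFlow is not executing eagerly.
    trace_log.append(tf.executing_eagerly())
    return x * x


a = square(tf.constant(3))
b = square(tf.constant(4))  # same dtype and shape, so the traced graph is reused

print(int(a), int(b))  # the results themselves are concrete again
print(trace_log)       # one entry, recorded during the single trace
```

The results returned from the function are ordinary eager tensors, but inside it x is symbolic, so there is no value to inspect at that point.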

There is a way out. Pass run_eagerly=True to your model’s compile() method.

This makes the training step run in eager execution mode and gives you back the ability to debug your code with standard tools. Once your code works properly with eager execution, make sure to switch back to lazy evaluation for performance and portability reasons.
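A minimal sketch of this setup could look as follows. The model class, the random data and the list eager_flags are made up for illustration. The overridden train_step() only records whether it runs eagerly and then delegates to the default Keras implementation, so in a real model this is where your own logic and a breakpoint() would go.

```python
import numpy as np
import tensorflow as tf

eager_flags = []  # hypothetical helper: one entry per executed train_step()


class DebuggableModel(tf.keras.Model):
    """Tiny illustrative model with an overridden training step."""

    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, inputs):
        return self.dense(inputs)

    def train_step(self, data):
        # With run_eagerly=True this is ordinary Python code, so print()
        # and breakpoint() work here on concrete tensors.
        eager_flags.append(tf.executing_eagerly())
        return super().train_step(data)


# Random toy data, only there to make the example self-contained.
features = np.random.rand(8, 4).astype("float32")
targets = np.random.rand(8, 1).astype("float32")

model = DebuggableModel()
model.compile(optimizer="sgd", loss="mse", run_eagerly=True)
model.fit(features, targets, batch_size=4, epochs=1, verbose=0)

print(eager_flags)  # expected to contain only True values
```

Removing run_eagerly=True (or setting it to False) should switch train_step() back to being traced inside a tf.function, at which point the recorded flags would be False.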