Python decorators and the tf.function
Python decorators
A decorator is a function that accepts another function as an argument and adds new functionality to it. Decorators are used for all sorts of things, like logging, timing the execution of functions, and caching values. Let’s see a couple of examples!
The most useless decorator
Arguably, the most useless decorator is the following one that does absolutely nothing! :)
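A minimal sketch of it follows; the toy greet() function is just my own placeholder to show that the decorated function behaves exactly like the original:

```python
def noop_decorator(func):
    # Hand back the function untouched: no wrapping, no extra behavior
    return func

@noop_decorator
def greet():
    return "hello"

print(greet())   # hello
```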
So, basically, the noop_decorator returns whatever function we hand it, without modifying it at all.
Timing the execution of a function
In the following example, we construct a decorator called mytimer that prints the time in seconds a function takes to execute. The decorator accepts as input the function func and returns another function called wrapper. So, every time we call the calc_stuff() function, in reality the wrapper() function is executed. The latter saves the current value of a performance counter. It then uses *args and **kwargs to collect the positional and keyword arguments, and runs func by forwarding them with the unpacking operators (asterisk and double asterisk). Next, it takes the difference between the new and the old value of the performance counter and prints the result; this is the time elapsed during the execution of func. Finally, it returns the result of func, just as if we had called the undecorated function.
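The snippet below is a sketch of how mytimer and wrapper could be written; the body of calc_stuff() is a placeholder workload of my own, since the original one is not shown here:

```python
import time
from timeit import timeit

def mytimer(func):
    def wrapper(*args, **kwargs):
        """Wrapper that times the execution of func."""
        # Save the current value of a performance counter
        start = time.perf_counter()
        # Forward positional and keyword arguments to the wrapped function
        result = func(*args, **kwargs)
        # Print the elapsed time, then hand back func's result
        print(f"{func.__name__} took {time.perf_counter() - start:.4f} seconds")
        return result
    return wrapper

@mytimer
def calc_stuff(n=1_000_000):
    """Do some number crunching."""
    # Placeholder workload of my own; any CPU-bound computation would do
    return sum(i * i for i in range(n))

print(timeit(calc_stuff, number=10))   # cumulative time over 10 calls
```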
Summing up the individual execution times, we get a total of 4.85 seconds, which is pretty close to the cumulative time timeit() reports. Next, we redefine the function with no decorator, and we notice that the per-call execution times are no longer printed.
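A sketch of the undecorated version (same placeholder workload as above):

```python
from timeit import timeit

def calc_stuff(n=1_000_000):
    """Do some number crunching."""
    return sum(i * i for i in range(n))

print(timeit(calc_stuff, number=10))   # only the cumulative time, no per-call prints
```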
Retrieving the lost metadata
When we decorate a function, we basically replace it with another function. This has the undesired side effect that some of the original function’s metadata are lost since they are replaced by the wrapper’s. See, for instance, the following code:
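Continuing from the timing sketch above (where mytimer does not yet use functools.wraps):

```python
# The decorated function now reports the wrapper's metadata, not its own
print(calc_stuff.__name__)   # wrapper
```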
The same applies to the docstrings:
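Again, assuming the wrapper's docstring from the sketch above:

```python
print(calc_stuff.__doc__)    # Wrapper that times the execution of func.
```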
To preserve the original function's metadata, we use functools.wraps(), which copies onto the wrapper function the metadata that would otherwise be lost!
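A sketch of the same decorator with functools.wraps() added:

```python
import time
import functools

def mytimer(func):
    @functools.wraps(func)          # copy func's metadata onto the wrapper
    def wrapper(*args, **kwargs):
        """Wrapper that times the execution of func."""
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.4f} seconds")
        return result
    return wrapper

@mytimer
def calc_stuff(n=1_000_000):
    """Do some number crunching."""
    return sum(i * i for i in range(n))

print(calc_stuff.__name__)   # calc_stuff
```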
Neat! The function names were copied over. The same applies to the docstrings:
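Continuing from the snippet above:

```python
print(calc_stuff.__doc__)    # Do some number crunching.
```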
Notice that although we had a docstring for the wrapper function, it was replaced by the original function's docstring. Last, to blow your mind, functools.wraps is itself a decorator! :P
Tensorflow's eager vs. lazy execution modes
Basic computation model
In Tensorflow, computations are modeled as a directed graph. Each node in the graph is a mathematical operation (say, the addition of two scalars or the multiplication of two matrices). Every node has inputs and outputs (possibly zero of either). Along the edges of the graph, tensors flow! :) Tensors are multidimensional arrays with a specific type (e.g., float or double) and should not be confused with the tensors of mathematical physics. For example, the mathematical operation \(\mathbf{\text{Relu}}\left(\mathbf{W} \mathbf{x} + \mathbf{b}\right)\) is represented as:
Image taken from here.
Tensorflow 1.0 and lazy execution
In Tensorflow 1.0, one had to construct the computation graph first and then execute it via session.run(), passing a feed_dict to populate the graph with actual data (a sketch of this workflow is shown at the end of this subsection). The advantage of working with a computation graph is that it allows Tensorflow to perform many optimizations (e.g., graph simplifications, inlining function bodies to enable interprocedural optimizations, and so on). As of the time of writing, Grappler is the default graph optimization engine in the Tensorflow runtime. Grappler rewrites graphs to improve performance and also provides a plugin interface for registering custom optimizers. A very basic example of such a simplification is the following algebraic one, which exploits commutativity, associativity, and distributivity:

\[2\mathbf{A} + 3 \mathbf{B} + 3 \mathbf{C} + \mathbf{\text{Identity}}(\mathbf{A}) \Rightarrow 3\mathbf{A} + 3 \mathbf{B} + 3 \mathbf{C} \Rightarrow 3 \, \text{tf.raw\_ops.AddN}(\mathbf{A},\mathbf{B},\mathbf{C})\]

Despite the speed benefits, though, Tensorflow 1.0's user experience left much to be desired, so eager execution mode was eventually introduced.
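Here is roughly what the graph-plus-session workflow looked like; this is a sketch of my own, written against the tf.compat.v1 API so that it still runs under Tensorflow 2:

```python
import tensorflow as tf

# TF1-style graph mode via the compat API
tf.compat.v1.disable_eager_execution()

# Building the graph: nothing is computed yet
x = tf.compat.v1.placeholder(tf.float32, shape=(None, 3), name="x")
w = tf.compat.v1.get_variable("w", shape=(3, 1))
b = tf.compat.v1.get_variable("b", shape=(1,))
y = tf.nn.relu(tf.matmul(x, w) + b)

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    # Only here is the graph populated with data and executed
    out = sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]})
    print(out)
```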
Tensorflow 2.0 and eager execution
In eager execution, we write some code, and we can run it immediately, line by line, examine the output, modify it, re-run it, etc. Everything is evaluated on the spot without constructing a computation graph that will be run later in a session. This is easier to debug and feels like writing regular Python code. Compare the following code with the graph-and-session sketch above:
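This is again a minimal sketch of mine, mirroring the earlier example but in eager mode:

```python
import tensorflow as tf

# Eager mode (the default in TF 2.x): each line executes immediately and
# `y` holds concrete values that we can inspect right away
x = tf.constant([[1.0, 2.0, 3.0]])
w = tf.ones((3, 1))
b = tf.ones((1,))
y = tf.nn.relu(tf.matmul(x, w) + b)
print(y.numpy())   # [[7.]]
```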
However, by running Tensorflow one step at a time, we give up the speed optimizations that the lazy execution mode made possible. In Tensorflow 2.0, the default execution mode is eager, presumably because people started favoring Pytorch over TF, since Pytorch was eager from the beginning. So, where does tf.function fit into this narrative? By applying the tf.function decorator, we can convert a function into a Tensorflow graph (tf.Graph) and execute it lazily, thereby recovering some of the speed we gave up. The following code uses the tf.function decorator to convert my_func() into a callable Tensorflow graph, which we then visualize with Tensorboard.
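A sketch of what this can look like; my_func's body, its inputs, and the log directory are placeholders of mine, while the tracing calls follow the standard tf.summary API:

```python
from datetime import datetime
import tensorflow as tf

@tf.function
def my_func(x, y):
    # Placeholder body; any tensor computation would do here
    return tf.nn.relu(tf.matmul(x, y))

# Log directory and summary writer for Tensorboard
logdir = "logs/func/" + datetime.now().strftime("%Y%m%d-%H%M%S")
writer = tf.summary.create_file_writer(logdir)

x = tf.random.uniform((3, 3))
y = tf.random.uniform((3, 3))

# Trace the first call of the graph-compiled function and export the graph
tf.summary.trace_on(graph=True)
my_func(x, y)
with writer.as_default():
    tf.summary.trace_export(name="my_func_trace", step=0)
```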
Fire up Tensorboard to inspect the computation graph. By the way, if you are using ssh tunneling, you will probably need to add local port forwarding for port 6006.
Show me the speedup!
We will borrow some real code from a previous post, where we used trainable probability distributions, and measure the speedup that tf.function brings. First, we load the necessary modules and generate some normally distributed training data.
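A sketch of this setup, assuming the trainable distribution is built with Tensorflow Probability as in that post; the data parameters and the initial values are placeholders of mine:

```python
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Normally distributed training data (mean, stddev and sample size are
# placeholder values, not necessarily those of the original post)
x_train = np.random.normal(loc=1.0, scale=2.0, size=(1000,)).astype(np.float32)

# A Normal distribution with trainable loc and scale parameters
# (no positivity constraint on scale here, to keep the sketch simple)
normal = tfd.Normal(loc=tf.Variable(0.0, name="loc"),
                    scale=tf.Variable(1.0, name="scale"))
```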
We then define a negative log-likelihood loss function and a function that calculates the loss and the gradients. Normally, we would call get_loss_and_grads() inside a custom training loop and pass the gradients to the optimizer with optimizer.apply_gradients() to update the model's parameters. Here, we will just call get_loss_and_grads() repeatedly.
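A sketch of these two functions:

```python
def nll(dist, x):
    # Negative log-likelihood of the data under the distribution
    return -tf.reduce_mean(dist.log_prob(x))

def get_loss_and_grads(dist, x):
    # Record the forward pass on a tape, then differentiate the loss
    # with respect to the distribution's trainable parameters (loc, scale)
    with tf.GradientTape() as tape:
        loss = nll(dist, x)
    grads = tape.gradient(loss, dist.trainable_variables)
    return loss, grads
```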
We run it 1000 times and measure the execution time:
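Something along these lines, using time.perf_counter() to time the loop (the figures quoted below are from the original run, so your numbers will differ):

```python
import time

start = time.perf_counter()
for _ in range(1000):
    loss, grads = get_loss_and_grads(normal, x_train)
print(f"Eager execution took {time.perf_counter() - start:.2f} seconds")
```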
We do the same as before, but this time we decorate the get_loss_and_grads() function with tf.function():
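Continuing from the sketch above, only the decorator changes:

```python
@tf.function
def get_loss_and_grads(dist, x):
    with tf.GradientTape() as tape:
        loss = nll(dist, x)
    grads = tape.gradient(loss, dist.trainable_variables)
    return loss, grads

start = time.perf_counter()
for _ in range(1000):
    loss, grads = get_loss_and_grads(normal, x_train)
print(f"Graph execution took {time.perf_counter() - start:.2f} seconds")
```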
So, by decorating get_loss_and_grads() with tf.function, we reduced the execution time from about 5.66 seconds to 0.70, roughly an 88% relative reduction. Not bad!
Caveats
Functions with side-effects
By now, I might have given you the false impression that adding tf.function to any existing function, whatsoever, automatically converts it into a computation graph. We will now discuss some of the caveats of the tf.function decorator. First, any Python side effects will only happen once, when func is traced. Such side effects include, for instance, printing with print() or appending to a list:
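A sketch illustrating this; the function f and its body are placeholders of mine:

```python
import tensorflow as tf

@tf.function
def f(x):
    print("Python print: tracing!")   # runs only while the function is traced
    tf.print("tf.print:", x)          # runs every time the graph executes
    return x * 2

f(tf.constant(1))   # Python print: tracing!   tf.print: 1
f(tf.constant(2))   # only tf.print runs:      tf.print: 2
f(tf.constant(3))   # only tf.print runs:      tf.print: 3
```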
Similarly, if we modify a Python list:
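Again a sketch of my own:

```python
import tensorflow as tf

external_list = []

@tf.function
def append_to_list(x):
    # The .append() is a Python side effect: it happens once, during tracing,
    # and what gets appended is a symbolic tensor, not the actual value
    external_list.append(x)
    return x

append_to_list(tf.constant(1))
append_to_list(tf.constant(2))
print(len(external_list))   # 1, not 2
print(external_list)        # contains a single symbolic tensor, not 1 or 2
```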
The correct way is to rewrite the list append as a Tensorflow operation, e.g., with tf.TensorArray():
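A sketch using tf.TensorArray (the doubling inside the loop is an arbitrary choice of mine):

```python
import tensorflow as tf

@tf.function
def collect_values(n):
    ta = tf.TensorArray(dtype=tf.int32, size=0, dynamic_size=True)
    for i in tf.range(n):
        # TensorArray.write returns a new handle, so we must reassign it
        ta = ta.write(i, i * 2)
    return ta.stack()

print(collect_values(tf.constant(5)))   # tf.Tensor([0 2 4 6 8], shape=(5,), dtype=int32)
```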
Passing Python scalars to tf.function
Probably the most subtle gotcha here is this: passing Python scalars or lists as arguments to a tf.function always builds a new graph! So, by repeatedly passing Python scalars, say in a loop, as arguments to a tf.function, we thrash the system by creating new computation graphs over and over again!
Here we measure the performance degradation:
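A sketch of such a measurement; the loop size and the function body are arbitrary choices of mine:

```python
import time
import tensorflow as tf

@tf.function
def f(x):
    return tf.square(x)

# Passing tf.Tensor arguments: the function is traced once and the graph is reused
start = time.perf_counter()
for i in range(100):
    f(tf.constant(i, dtype=tf.float32))
print(f"Tensor arguments:        {time.perf_counter() - start:.3f} s")

# Passing Python scalars: every distinct value triggers a retrace (a brand new graph)
start = time.perf_counter()
for i in range(100):
    f(float(i))
print(f"Python scalar arguments: {time.perf_counter() - start:.3f} s")
```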
Tensorflow will even warn if it detects such a usage:
```
WARNING:tensorflow:5 out of the last 10006 calls to <function f at 0x7f68e6f75a60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
```