# Using AOT compilation

## What is tfcompile?
`tfcompile` is a standalone tool that ahead-of-time (AOT) compiles TensorFlow graphs into executable code. It can reduce total binary size, and also avoid some runtime overheads. A typical use-case of `tfcompile` is to compile an inference graph into executable code for mobile devices.
The TensorFlow graph is normally executed by the TensorFlow runtime. This incurs some runtime overhead for execution of each node in the graph. This also leads to a larger total binary size, since the code for the TensorFlow runtime needs to be available, in addition to the graph itself. The executable code produced by `tfcompile` does not use the TensorFlow runtime, and only has dependencies on kernels that are actually used in the computation.

The compiler is built on top of the XLA framework. The code bridging TensorFlow to the XLA framework resides under `tensorflow/compiler`.
## What does tfcompile do?
`tfcompile` takes a subgraph, identified by the TensorFlow concepts of feeds and fetches, and generates a function that implements that subgraph. The `feeds` are the input arguments for the function, and the `fetches` are the output arguments for the function. All inputs must be fully specified by the feeds; the resulting pruned subgraph cannot contain Placeholder or Variable nodes. It is common to specify all Placeholders and Variables as feeds, which ensures the resulting subgraph no longer contains these nodes. The generated function is packaged as a `cc_library`, with a header file exporting the function signature, and an object file containing the implementation. The user writes code to invoke the generated function as appropriate.
## Using tfcompile
This section details high-level steps for generating an executable binary with `tfcompile` from a TensorFlow subgraph. The steps are:

*   Step 1: Configure the subgraph to compile
*   Step 2: Use the `tf_library` build macro to compile the subgraph
*   Step 3: Write code to invoke the subgraph
*   Step 4: Create the final binary
### Step 1: Configure the subgraph to compile
Identify the feeds and fetches that correspond to the input and output arguments for the generated function. Then configure the `feeds` and `fetches` in a `tensorflow.tf2xla.Config` proto.
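For example, a config for a simple matmul graph might look like the following sketch. The node names `x_hold`, `y_hold`, and `x_y_prod` follow the `test_graph_tfmatmul` example and stand in for the node names in your own graph:

```textproto
# Each feed is a positional input argument for the generated function. The
# order of each entry matches the order of each input argument. Here "x_hold"
# and "y_hold" refer to the names of placeholder nodes defined in the graph.
feed {
  id { node_name: "x_hold" }
  shape {
    dim { size: 2 }
    dim { size: 3 }
  }
}
feed {
  id { node_name: "y_hold" }
  shape {
    dim { size: 3 }
    dim { size: 2 }
  }
}

# Each fetch is a positional output argument for the generated function. The
# order of each entry matches the order of each output argument. Here
# "x_y_prod" refers to the name of a matmul node defined in the graph.
fetch {
  id { node_name: "x_y_prod" }
}
```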
### Step 2: Use the tf_library build macro to compile the subgraph
This step converts the graph into a `cc_library` using the `tf_library` build macro. The `cc_library` consists of an object file containing the code generated from the graph, along with a header file that gives access to the generated code. `tf_library` utilizes `tfcompile` to compile the TensorFlow graph into executable code.
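A `tf_library` target for the matmul example might look like the following sketch; the graph, config, and class names follow the `test_graph_tfmatmul` example and should be replaced with your own:

```python
load("//tensorflow/compiler/aot:tfcompile.bzl", "tf_library")

# Use the tf_library macro to compile your graph into executable code.
tf_library(
    # name is used to generate the following underlying build rules:
    # <name>           : cc_library packaging the generated header and object files
    # <name>_test      : cc_test containing a simple test and benchmark
    # <name>_benchmark : cc_binary containing a stand-alone benchmark with
    #                    minimal deps; can be run on a mobile device
    name = "test_graph_tfmatmul",
    # cpp_class specifies the name of the generated C++ class, with namespaces
    # allowed. The class will be generated in the given namespace(s), or in the
    # global namespace if none are given.
    cpp_class = "foo::bar::MatMulComp",
    # graph is the input GraphDef proto, by default expected in binary format.
    # To use the text format instead, use the '.pbtxt' suffix. A subgraph will
    # be created from this input graph, with feeds as inputs and fetches as
    # outputs. No Placeholder or Variable ops may exist in this subgraph.
    graph = "test_graph_tfmatmul.pb",
    # config is the input Config proto, by default expected in binary format.
    # To use the text format instead, use the '.pbtxt' suffix. This is where
    # the feeds and fetches were specified in the previous step.
    config = "test_graph_tfmatmul.config.pbtxt",
)
```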
To generate the GraphDef proto (`test_graph_tfmatmul.pb`) for this example, run `make_test_graphs.py` and specify the output location with the `--out_dir` flag.
Typical graphs contain `Variables` representing the weights that are learned via training, but `tfcompile` cannot compile a subgraph that contains `Variables`. The `freeze_graph.py` tool converts variables into constants, using values stored in a checkpoint file. As a convenience, the `tf_library` macro supports the `freeze_checkpoint` argument, which runs the tool. For more examples see `tensorflow/compiler/aot/tests/BUILD`.
Constants that show up in the compiled subgraph are compiled directly into the generated code. To pass the constants into the generated function, rather than having them compiled in, pass them in as feeds.
For details on the `tf_library` build macro, see `tfcompile.bzl`. For details on the underlying `tfcompile` tool, see `tfcompile_main.cc`.
### Step 3: Write code to invoke the subgraph
This step uses the header file (`test_graph_tfmatmul.h`) generated by the `tf_library` build macro in the previous step to invoke the generated code. The header file is located in the `bazel-bin` directory corresponding to the build package, and is named based on the `name` attribute set on the `tf_library` build macro. For example, the header generated for `test_graph_tfmatmul` would be `test_graph_tfmatmul.h`. Below is an abbreviated version of what is generated. The generated file, in `bazel-bin`, contains additional useful comments.
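The following is a reconstructed sketch of the generated header for the matmul example; the exact arg and result methods depend on the feeds and fetches in your config:

```cpp
#include <cstddef>

namespace foo {
namespace bar {

// MatMulComp represents a computation previously specified in a
// TensorFlow graph, now compiled into executable code.
class MatMulComp {
 public:
  // AllocMode controls the buffer allocation mode.
  enum class AllocMode {
    ARGS_RESULTS_AND_TEMPS,  // Allocate arg, result, and temp buffers.
    RESULTS_AND_TEMPS_ONLY,  // Only allocate result and temp buffers.
  };

  MatMulComp(AllocMode mode = AllocMode::ARGS_RESULTS_AND_TEMPS);
  ~MatMulComp();

  // Runs the computation, with inputs read from arg buffers and outputs
  // written to result buffers. Returns true on success.
  bool Run();

  // Arg methods for managing input buffers; one set of methods per
  // positional argument. Buffers are in row-major order.
  void set_arg0_data(float* data);
  float* arg0_data();
  float& arg0(std::size_t dim0, std::size_t dim1);

  void set_arg1_data(float* data);
  float* arg1_data();
  float& arg1(std::size_t dim0, std::size_t dim1);

  // Result methods for managing output buffers; one set of methods per
  // positional result. Must only be called after a successful Run call.
  float* result0_data();
  float& result0(std::size_t dim0, std::size_t dim1);
};

}  // namespace bar
}  // namespace foo
```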
The generated C++ class is called `MatMulComp` in the `foo::bar` namespace, because that was the `cpp_class` specified in the `tf_library` macro. All generated classes have a similar API, with the only difference being the methods to handle arg and result buffers. Those methods differ based on the number and types of the buffers, which were specified by the `feed` and `fetch` arguments to the `tf_library` macro.

There are three types of buffers managed within the generated class: `args` representing the inputs, `results` representing the outputs, and `temps` representing temporary buffers used internally to perform the computation. By default, each instance of the generated class allocates and manages all of these buffers for you. The `AllocMode` constructor argument may be used to change this behavior. All buffers are aligned to 64-byte boundaries.
The generated C++ class is just a wrapper around the low-level code generated by XLA.
### Step 4: Create the final binary
This step combines the library generated by `tf_library` in step 2 and the code written in step 3 to create a final binary. Below is an example `bazel` BUILD file.
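A sketch of such a BUILD file, assuming the `test_graph_tfmatmul` target from step 2 and a hypothetical `my_code.cc` containing the code from step 3:

```python
# Example of linking your binary.
# Also see //tensorflow/compiler/aot/tests/BUILD.
load("//tensorflow/compiler/aot:tfcompile.bzl", "tf_library")

# The same tf_library call from step 2 above.
tf_library(
    name = "test_graph_tfmatmul",
    ...
)

# The executable code generated by tf_library can then be linked into your code.
cc_binary(
    name = "my_binary",
    srcs = [
        "my_code.cc",  # include test_graph_tfmatmul.h to access the generated header
    ],
    deps = [
        ":test_graph_tfmatmul",
        "//third_party/eigen3",
    ],
    linkopts = [
        "-lpthread",
    ],
)
```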