# Using AOT compilation

## What is tfcompile?
`tfcompile` is a standalone tool that ahead-of-time (AOT) compiles TensorFlow graphs into executable code. It can reduce total binary size, and also avoid some runtime overheads. A typical use case of `tfcompile` is to compile an inference graph into executable code for mobile devices.

The TensorFlow graph is normally executed by the TensorFlow runtime. This incurs some runtime overhead for execution of each node in the graph. This also leads to a larger total binary size, since the code for the TensorFlow runtime needs to be available, in addition to the graph itself. The executable code produced by `tfcompile` does not use the TensorFlow runtime, and only has dependencies on kernels that are actually used in the computation.

The compiler is built on top of the XLA framework. The code bridging TensorFlow to the XLA framework resides under `tensorflow/compiler`.
## What does tfcompile do?
`tfcompile` takes a subgraph, identified by the TensorFlow concepts of feeds and fetches, and generates a function that implements that subgraph. The feeds are the input arguments for the function, and the fetches are the output arguments for the function. All inputs must be fully specified by the feeds; the resulting pruned subgraph cannot contain Placeholder or Variable nodes. It is common to specify all Placeholders and Variables as feeds, which ensures the resulting subgraph no longer contains these nodes. The generated function is packaged as a `cc_library`, with a header file exporting the function signature, and an object file containing the implementation. The user writes code to invoke the generated function as appropriate.
## Using tfcompile
This section details the high-level steps for generating an executable binary with `tfcompile` from a TensorFlow subgraph. The steps are:

*   Step 1: Configure the subgraph to compile
*   Step 2: Use the `tf_library` build macro to compile the subgraph
*   Step 3: Write code to invoke the subgraph
*   Step 4: Create the final binary
### Step 1: Configure the subgraph to compile
Identify the feeds and fetches that correspond to the input and output arguments for the generated function. Then configure the feeds and fetches in a `tensorflow.tf2xla.Config` proto.
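For the matmul example used in the rest of this document, a config along these lines specifies two placeholder feeds and one fetch (a sketch based on the `test_graph_tfmatmul` example under `tensorflow/compiler/aot/tests`; the node names come from that example graph):

```textproto
# Each feed is a positional input argument for the generated function. The
# order of each entry matches the order of each input argument. Here "x_hold"
# and "y_hold" refer to the names of placeholder nodes defined in the graph.
feed {
  id { node_name: "x_hold" }
  shape {
    dim { size: 2 }
    dim { size: 3 }
  }
}
feed {
  id { node_name: "y_hold" }
  shape {
    dim { size: 3 }
    dim { size: 2 }
  }
}

# Each fetch is a positional output argument for the generated function. The
# order of each entry matches the order of each output argument. Here
# "x_y_prod" refers to the name of a matmul node defined in the graph.
fetch {
  id { node_name: "x_y_prod" }
}
```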
### Step 2: Use the tf_library build macro to compile the subgraph
This step converts the graph into a `cc_library` using the `tf_library` build macro. The `cc_library` consists of an object file containing the code generated from the graph, along with a header file that gives access to the generated code. `tf_library` utilizes `tfcompile` to compile the TensorFlow graph into executable code.
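For the matmul example, a `tf_library` target looks roughly like this (a sketch following `tensorflow/compiler/aot/tests/BUILD`; the file and class names match the example above):

```build
load("//tensorflow/compiler/aot:tfcompile.bzl", "tf_library")

# Use the tf_library macro to compile your graph into executable code.
tf_library(
    # name is used to generate the following underlying build rules:
    # <name>           : cc_library packaging the generated header and
    #                    object files
    # <name>_test      : cc_test containing a simple test and benchmark
    # <name>_benchmark : cc_binary containing a stand-alone benchmark with
    #                    minimal deps; can be run on a mobile device
    name = "test_graph_tfmatmul",
    # cpp_class specifies the name of the generated C++ class, with
    # namespaces allowed. The class will be generated in the given
    # namespace(s), or in the global namespace if none are given.
    cpp_class = "foo::bar::MatMulComp",
    # graph is the input GraphDef proto, by default expected in binary
    # format. Use the '.pbtxt' suffix for text format. A subgraph is created
    # from this graph, with feeds as inputs and fetches as outputs; it may
    # not contain Placeholder or Variable ops.
    graph = "test_graph_tfmatmul.pb",
    # config is the input Config proto from step 1, by default expected in
    # binary format. Use the '.pbtxt' suffix for text format.
    config = "test_graph_tfmatmul.config.pbtxt",
)
```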
To generate the GraphDef proto (`test_graph_tfmatmul.pb`) for this example, run `make_test_graphs.py` and specify the output location with the `--out_dir` flag.
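For example (the script lives under `tensorflow/compiler/aot/tests`; the output directory is illustrative):

```shell
python tensorflow/compiler/aot/tests/make_test_graphs.py --out_dir=/tmp/tfcompile
```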
Typical graphs contain Variables representing the weights that are learned via training, but `tfcompile` cannot compile a subgraph that contains Variables. The `freeze_graph.py` tool converts variables into constants, using values stored in a checkpoint file. As a convenience, the `tf_library` macro supports the `freeze_checkpoint` argument, which runs the tool. For more examples see `tensorflow/compiler/aot/tests/BUILD`.
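A minimal sketch of using `freeze_checkpoint` (the checkpoint file name is hypothetical; the other attributes match the target above):

```build
tf_library(
    name = "test_graph_tfmatmul",
    cpp_class = "foo::bar::MatMulComp",
    graph = "test_graph_tfmatmul.pb",
    config = "test_graph_tfmatmul.config.pbtxt",
    # Hypothetical checkpoint path; freeze_checkpoint runs freeze_graph.py to
    # convert the graph's Variables to Constants before compilation.
    freeze_checkpoint = "test_graph_tfmatmul.ckpt",
)
```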
Constants that show up in the compiled subgraph are compiled directly into the generated code. To pass the constants into the generated function, rather than having them compiled in, simply pass them in as feeds.
For details on the `tf_library` build macro, see `tfcompile.bzl`.

For details on the underlying `tfcompile` tool, see `tfcompile_main.cc`.
### Step 3: Write code to invoke the subgraph
This step uses the header file (`test_graph_tfmatmul.h`) generated by the `tf_library` build macro in the previous step to invoke the generated code. The header file is located in the `bazel-bin` directory corresponding to the build package, and is named based on the `name` attribute set on the `tf_library` build macro. For example, the header generated for `test_graph_tfmatmul` would be `test_graph_tfmatmul.h`. Below is an abbreviated sketch of what is generated; the actual file, in `bazel-bin`, contains additional useful comments.
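```c++
namespace foo {
namespace bar {

// MatMulComp represents a computation previously specified in a
// TensorFlow graph, now compiled into executable code.
class MatMulComp {
 public:
  // AllocMode controls the buffer allocation mode.
  enum class AllocMode {
    ARGS_RESULTS_AND_TEMPS,  // Allocate arg, result and temp buffers.
    RESULTS_AND_TEMPS_ONLY,  // Only allocate result and temp buffers.
  };

  MatMulComp(AllocMode mode = AllocMode::ARGS_RESULTS_AND_TEMPS);
  ~MatMulComp();

  // Runs the computation, with inputs read from arg buffers, and outputs
  // written to result buffers. Returns true on success and false on failure.
  bool Run();

  // Arg methods for managing input buffers. Buffers are in row-major order.
  // There is a set of methods for each positional argument.
  void** args();

  void set_arg0_data(float* data);
  float* arg0_data();
  float& arg0(size_t dim0, size_t dim1);

  void set_arg1_data(float* data);
  float* arg1_data();
  float& arg1(size_t dim0, size_t dim1);

  // Result methods for managing output buffers. Buffers are in row-major
  // order. Must only be called after a successful Run call. There is a set
  // of methods for each positional result.
  void** results();

  float* result0_data();
  float& result0(size_t dim0, size_t dim1);
};

}  // end namespace bar
}  // end namespace foo
```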
The generated C++ class is called `MatMulComp` in the `foo::bar` namespace, because that was the `cpp_class` specified in the `tf_library` macro. All generated classes have a similar API, with the only difference being the methods to handle arg and result buffers. Those methods differ based on the number and types of the buffers, which were specified by the `feed` and `fetch` arguments to the `tf_library` macro.

There are three types of buffers managed within the generated class: `args` representing the inputs, `results` representing the outputs, and `temps` representing temporary buffers used internally to perform the computation. By default, each instance of the generated class allocates and manages all of these buffers for you. The `AllocMode` constructor argument may be used to change this behavior. All buffers are aligned to 64-byte boundaries.
The generated C++ class is just a wrapper around the low-level code generated by XLA.
Here is an example of invoking the generated function, based on `tfcompile_test.cc` (a sketch; the thread-pool sizing and input values are illustrative):
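```c++
#define EIGEN_USE_THREADS
#define EIGEN_USE_CUSTOM_THREAD_POOL

#include <algorithm>
#include <iostream>
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
#include "tensorflow/compiler/aot/tests/test_graph_tfmatmul.h"  // generated

int main(int argc, char** argv) {
  // The generated code runs on an Eigen thread pool.
  Eigen::ThreadPool tp(2);  // Size the thread pool as appropriate.
  Eigen::ThreadPoolDevice device(&tp, tp.NumThreads());

  foo::bar::MatMulComp matmul;
  matmul.set_thread_pool(&device);

  // Set up args and run the computation.
  const float args[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
  std::copy(args + 0, args + 6, matmul.arg0_data());
  std::copy(args + 6, args + 12, matmul.arg1_data());
  matmul.Run();

  // Check the result: row 0, column 0 of the 2x2 product should be
  // 1*7 + 2*9 + 3*11 = 58.
  if (matmul.result0(0, 0) == 58) {
    std::cout << "Success" << std::endl;
  } else {
    std::cout << "Failed. Expected value 58 at 0,0. Got:"
              << matmul.result0(0, 0) << std::endl;
  }

  return 0;
}
```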
### Step 4: Create the final binary
This step combines the library generated by `tf_library` in step 2 and the code written in step 3 to create a final binary. Below is an example Bazel `BUILD` file (a sketch; `my_binary` and `my_code.cc` are illustrative names):
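```build
# Example of linking the generated code into a final binary; also see
# tensorflow/compiler/aot/tests/BUILD for more examples.
load("//tensorflow/compiler/aot:tfcompile.bzl", "tf_library")

# The same tf_library call from step 2 above.
tf_library(
    name = "test_graph_tfmatmul",
    ...
)

# The executable code generated by tf_library can then be linked into your
# code.
cc_binary(
    name = "my_binary",
    srcs = [
        "my_code.cc",  # include test_graph_tfmatmul.h to access the header
    ],
    deps = [
        ":test_graph_tfmatmul",  # link in the generated object file
        "//third_party/eigen3",
    ],
    linkopts = [
        "-lpthread",
    ],
)
```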