Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.
Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.
Path: blob/main/examples/onnx-export.ipynb
Views: 2535
How to export 🤗 Transformers Models to ONNX ?
ONNX is open format for machine learning models. It allows you to save your neural network's computation graph in a framework agnostic way, which might be particularly helpful when deploying deep learning models.
Indeed, businesses might have other requirements (languages, hardware, ...) for which the training framework might not be the best suited in inference scenarios. In that context, having a representation of the actual computation graph that can be shared accross various business units and logics across an organization might be a desirable component.
Along with the serialization format, ONNX also provides a runtime library which allows efficient and hardware specific execution of the ONNX graph. This is done through the onnxruntime project and already includes collaborations with many hardware vendors to seamlessly deploy models on various platforms.
Through this notebook we'll walk you through the process to convert a PyTorch or TensorFlow transformers model to the ONNX and leverage onnxruntime to run inference tasks on models from 🤗 transformers
Exporting 🤗 transformers model to ONNX
Exporting models (either PyTorch or TensorFlow) is easily achieved through the conversion tool provided as part of 🤗 transformers repository.
Under the hood the process is sensibly the following:
Allocate the model from transformers (PyTorch or TensorFlow)
Forward dummy inputs through the model this way ONNX can record the set of operations executed
Optionally define dynamic axes on input and output tensors
Save the graph along with the network parameters
Collecting git+https://github.com/huggingface/transformers
Cloning https://github.com/huggingface/transformers to /tmp/pip-req-build-9rvbp9p8
Running command git clone -q https://github.com/huggingface/transformers /tmp/pip-req-build-9rvbp9p8
Requirement already satisfied, skipping upgrade: numpy in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from transformers==3.0.2) (1.18.1)
Requirement already satisfied, skipping upgrade: tokenizers==0.8.1.rc2 in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from transformers==3.0.2) (0.8.1rc2)
Requirement already satisfied, skipping upgrade: packaging in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from transformers==3.0.2) (20.4)
Requirement already satisfied, skipping upgrade: filelock in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from transformers==3.0.2) (3.0.12)
Requirement already satisfied, skipping upgrade: requests in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from transformers==3.0.2) (2.23.0)
Requirement already satisfied, skipping upgrade: tqdm>=4.27 in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from transformers==3.0.2) (4.46.1)
Requirement already satisfied, skipping upgrade: regex!=2019.12.17 in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from transformers==3.0.2) (2020.6.8)
Requirement already satisfied, skipping upgrade: sentencepiece!=0.1.92 in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from transformers==3.0.2) (0.1.91)
Requirement already satisfied, skipping upgrade: sacremoses in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from transformers==3.0.2) (0.0.43)
Requirement already satisfied, skipping upgrade: pyparsing>=2.0.2 in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from packaging->transformers==3.0.2) (2.4.7)
Requirement already satisfied, skipping upgrade: six in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from packaging->transformers==3.0.2) (1.15.0)
Requirement already satisfied, skipping upgrade: chardet<4,>=3.0.2 in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from requests->transformers==3.0.2) (3.0.4)
Requirement already satisfied, skipping upgrade: idna<3,>=2.5 in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from requests->transformers==3.0.2) (2.9)
Requirement already satisfied, skipping upgrade: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from requests->transformers==3.0.2) (1.25.9)
Requirement already satisfied, skipping upgrade: certifi>=2017.4.17 in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from requests->transformers==3.0.2) (2020.6.20)
Requirement already satisfied, skipping upgrade: click in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from sacremoses->transformers==3.0.2) (7.1.2)
Requirement already satisfied, skipping upgrade: joblib in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from sacremoses->transformers==3.0.2) (0.15.1)
Building wheels for collected packages: transformers
Building wheel for transformers (setup.py) ... done
Created wheel for transformers: filename=transformers-3.0.2-py3-none-any.whl size=883063 sha256=5f2caef76450921ae2e5b10abbbaab436e9c87c83486114fa08d305e4396d4cd
Stored in directory: /tmp/pip-ephem-wheel-cache-kftypcjz/wheels/42/68/45/c63edff61c292f2dfd4df4ef6522dcbecc603e7af82813c1d7
Successfully built transformers
Installing collected packages: transformers
Attempting uninstall: transformers
Found existing installation: transformers 3.0.2
Uninstalling transformers-3.0.2:
Successfully uninstalled transformers-3.0.2
Successfully installed transformers-3.0.2
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Requirement already up-to-date: torch==1.6.0+cpu in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (1.6.0+cpu)
Requirement already up-to-date: torchvision==0.7.0+cpu in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (0.7.0+cpu)
Requirement already satisfied, skipping upgrade: numpy in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from torch==1.6.0+cpu) (1.18.1)
Requirement already satisfied, skipping upgrade: future in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from torch==1.6.0+cpu) (0.18.2)
Requirement already satisfied, skipping upgrade: pillow>=4.1.1 in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from torchvision==0.7.0+cpu) (7.2.0)
Requirement already up-to-date: onnxruntime==1.4.0 in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (1.4.0)
Requirement already satisfied, skipping upgrade: protobuf in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from onnxruntime==1.4.0) (3.12.2)
Requirement already satisfied, skipping upgrade: numpy>=1.16.6 in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from onnxruntime==1.4.0) (1.18.1)
Requirement already satisfied, skipping upgrade: setuptools in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from protobuf->onnxruntime==1.4.0) (47.1.1.post20200604)
Requirement already satisfied, skipping upgrade: six>=1.9 in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from protobuf->onnxruntime==1.4.0) (1.15.0)
Looking in indexes: https://test.pypi.org/simple/
Requirement already satisfied: ort-nightly in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (1.4.0.dev202008262)
Requirement already satisfied: protobuf in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from ort-nightly) (3.12.2)
Requirement already satisfied: numpy>=1.16.6 in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from ort-nightly) (1.18.1)
Requirement already satisfied: setuptools in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from protobuf->ort-nightly) (47.1.1.post20200604)
Requirement already satisfied: six>=1.9 in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from protobuf->ort-nightly) (1.15.0)
Requirement already up-to-date: onnxruntime-tools in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (1.4.2)
Requirement already satisfied, skipping upgrade: numpy in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from onnxruntime-tools) (1.18.1)
Requirement already satisfied, skipping upgrade: coloredlogs in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from onnxruntime-tools) (14.0)
Requirement already satisfied, skipping upgrade: py3nvml in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from onnxruntime-tools) (0.2.6)
Requirement already satisfied, skipping upgrade: psutil in /home/mfuntowicz/.local/lib/python3.8/site-packages/psutil-5.7.0-py3.8-linux-x86_64.egg (from onnxruntime-tools) (5.7.0)
Requirement already satisfied, skipping upgrade: packaging in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from onnxruntime-tools) (20.4)
Requirement already satisfied, skipping upgrade: py-cpuinfo in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from onnxruntime-tools) (5.0.0)
Requirement already satisfied, skipping upgrade: onnx in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from onnxruntime-tools) (1.7.0)
Requirement already satisfied, skipping upgrade: humanfriendly>=7.1 in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from coloredlogs->onnxruntime-tools) (8.2)
Requirement already satisfied, skipping upgrade: xmltodict in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from py3nvml->onnxruntime-tools) (0.12.0)
Requirement already satisfied, skipping upgrade: six in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from packaging->onnxruntime-tools) (1.15.0)
Requirement already satisfied, skipping upgrade: pyparsing>=2.0.2 in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from packaging->onnxruntime-tools) (2.4.7)
Requirement already satisfied, skipping upgrade: typing-extensions>=3.6.2.1 in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from onnx->onnxruntime-tools) (3.7.4.2)
Requirement already satisfied, skipping upgrade: protobuf in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from onnx->onnxruntime-tools) (3.12.2)
Requirement already satisfied, skipping upgrade: setuptools in /home/mfuntowicz/miniconda3/envs/pytorch/lib/python3.8/site-packages (from protobuf->onnx->onnxruntime-tools) (47.1.1.post20200604)
We also quickly upload some telemetry - this tells us which examples and software versions are getting used so we know where to prioritize our maintenance efforts. We don't collect (or care about) any personally identifiable information, but if you'd prefer not to be counted, feel free to skip this step or delete this cell entirely.
How to leverage runtime for inference over an ONNX graph
As mentionned in the introduction, ONNX is a serialization format and many side projects can load the saved graph and run the actual computations from it. Here, we'll focus on the official onnxruntime. The runtime is implemented in C++ for performance reasons and provides API/Bindings for C++, C, C#, Java and Python.
In the case of this notebook, we will use the Python API to highlight how to load a serialized ONNX graph and run inference workload on various backends through onnxruntime.
onnxruntime is available on pypi:
onnxruntime: ONNX + MLAS (Microsoft Linear Algebra Subprograms)
onnxruntime-gpu: ONNX + MLAS + CUDA
Preparing for an Inference Session
Inference is done using a specific backend definition which turns on hardware specific optimizations of the graph.
Optimizations are basically of three kinds:
Constant Folding: Convert static variables to constants in the graph
Deadcode Elimination: Remove nodes never accessed in the graph
Operator Fusing: Merge multiple instruction into one (Linear -> ReLU can be fused to be LinearReLU)
ONNX Runtime automatically applies most optimizations by setting specific SessionOptions
.
Note:Some of the latest optimizations that are not yet integrated into ONNX Runtime are available in optimization script that tunes models for the best performance.
Forwarding through our optimized ONNX model running on CPU
When the model is loaded for inference over a specific provider, for instance CPUExecutionProvider as above, an optimized graph can be saved. This graph will might include various optimizations, and you might be able to see some higher-level operations in the graph (through Netron for instance) such as:
EmbedLayerNormalization
Attention
FastGeLU
These operations are an example of the kind of optimization onnxruntime is doing, for instance here gathering multiple operations into bigger one (Operator Fusing).
Benchmarking PyTorch model
Note: PyTorch model benchmark is run on CPU
Benchmarking PyTorch & ONNX on CPU
Disclamer: results may vary from the actual hardware used to run the model
Quantization support from transformers
Quantization enables the use of integers (instead of floatting point) arithmetic to run neural networks models faster. From a high-level point of view, quantization works as mapping the float32 ranges of values as int8 with the less loss in the performances of the model.
Hugging Face provides a conversion tool as part of the transformers repository to easily export quantized models to ONNX Runtime. For more information, please refer to the following:
With this method, the accuracy of the model remains at the same level than the full-precision model. If you want to see benchmarks on model performances, we recommand reading the ONNX Runtime notebook on the subject.