GitHub Repository: TensorSpeech/TensorFlowTTS
Path: blob/master/examples/cpptflite/README.md

C++ Inference using TFlite

TensorFlow Lite is an open-source deep learning framework for on-device inference. On Android and Linux (including Raspberry Pi), inference can be run through the TensorFlow Lite C++ API. Together, TensorFlowTTS and TensorFlow Lite let developers run popular text-to-speech (TTS) models on mobile, embedded, and IoT devices.

Converting models to TFLite

See the Colab notebook for the conversion method.

Notes:

  • Quantization degrades the vocoder's output and introduces noise, so the vocoder is converted without quantization (optimization).

  • TensorFlow Lite in C++ doesn't support the TensorFlow Dropout operation, so Dropout must be removed from the inference function before the model is converted to TFLite. Removing it doesn't affect the inference result. For example, in the fastspeech2 model:

    # tensorflow_tts/models/fastspeech2.py
    # ...
    def _inference():
        # ...
        # f0_embedding = self.f0_dropout(
        #     self.f0_embeddings(tf.expand_dims(f0_outputs, 2)), training=True
        # )
        # energy_embedding = self.energy_dropout(
        #     self.energy_embeddings(tf.expand_dims(energy_outputs, 2)), training=True
        # )
        f0_embedding = self.f0_embeddings(tf.expand_dims(f0_outputs, 2))
        energy_embedding = self.energy_embeddings(tf.expand_dims(energy_outputs, 2))
        # ...
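To illustrate the first note, here is a minimal sketch of an 8-bit affine quantization round trip. It is not TensorFlow Lite's actual quantization code; the names `QuantParams`, `chooseParams`, `quantize`, `dequantize`, and `maxRoundTripError` are hypothetical. It only shows that every value picks up a rounding error of up to about half a quantization step, which is one plausible source of the audible noise in a quantized vocoder.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Parameters of an 8-bit affine quantizer: real ~= scale * (q - zeroPoint).
// Illustrative only; TensorFlow Lite chooses its own parameters per tensor.
struct QuantParams {
    float scale;
    int zeroPoint;
};

// Derive parameters that map the range [lo, hi] onto [-128, 127] (assumes lo < hi).
QuantParams chooseParams(float lo, float hi) {
    const float scale = (hi - lo) / 255.0f;
    const int zeroPoint = static_cast<int>(std::lround(-128.0f - lo / scale));
    return {scale, zeroPoint};
}

// Round to the nearest step and clamp to the int8 range.
std::int8_t quantize(float x, const QuantParams& p) {
    const long q = std::lround(x / p.scale) + p.zeroPoint;
    return static_cast<std::int8_t>(std::max<long>(-128, std::min<long>(127, q)));
}

float dequantize(std::int8_t q, const QuantParams& p) {
    return p.scale * static_cast<float>(static_cast<int>(q) - p.zeroPoint);
}

// Largest absolute round-trip error over a buffer of samples.
float maxRoundTripError(const std::vector<float>& samples, const QuantParams& p) {
    float worst = 0.0f;
    for (float s : samples) {
        worst = std::max(worst, std::fabs(s - dequantize(quantize(s, p), p)));
    }
    return worst;
}
```

For a waveform in [-1, 1], `chooseParams(-1.0f, 1.0f)` gives a step of about 0.0078, so each sample can be off by roughly 0.004 after the round trip.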

About Code

  • TfliteBase.cpp: A base class that loads a TFLite model and creates a TFLite interpreter. Inherit from this class to implement specific behavior, such as a Mel-spectrogram generator or a vocoder.

  • TTSFrontend.cpp: A text preprocessor that converts a string to IDs based on your designed phoneme2ID dict. It needs a text-to-pronunciation module, such as g2p for English or pinyin for Chinese.

  • TTSBackend.cpp: Implements a two-step process: first it generates a Mel-spectrogram from the phoneme-ID sequence, then it generates the audio waveform with the vocoder.
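The class layout described above can be sketched roughly as follows. This is not the repository's code: `ModelBase`, `MelSpec`, `MelGenerator`, `Vocoder`, and `Backend` are hypothetical names, the TFLite interpreter calls are replaced by placeholders that return zeros, and the values 80 (Mel channels) and 256 (samples per frame) are assumptions chosen only to make the shapes concrete.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

constexpr std::size_t kMelChannels = 80;  // assumed number of Mel bins
constexpr std::size_t kHopSize = 256;     // assumed audio samples per Mel frame

// Hypothetical stand-in for the base class: a real one would load the
// .tflite file and build an interpreter; this one only remembers the path.
class ModelBase {
public:
    explicit ModelBase(std::string modelPath) : modelPath_(std::move(modelPath)) {}
    virtual ~ModelBase() = default;
    const std::string& modelPath() const { return modelPath_; }

private:
    std::string modelPath_;
};

// A Mel-spectrogram stored row-major as frames x channels.
struct MelSpec {
    std::size_t frames = 0;
    std::size_t channels = 0;
    std::vector<float> data;
};

// Step 1: phoneme IDs -> Mel-spectrogram (placeholder output, all zeros).
class MelGenerator : public ModelBase {
public:
    using ModelBase::ModelBase;

    MelSpec infer(const std::vector<std::int32_t>& phonemeIds) const {
        MelSpec mel;
        mel.frames = phonemeIds.size();  // placeholder: one frame per ID
        mel.channels = kMelChannels;
        mel.data.assign(mel.frames * mel.channels, 0.0f);
        return mel;
    }
};

// Step 2: Mel-spectrogram -> waveform (placeholder output, silence).
class Vocoder : public ModelBase {
public:
    using ModelBase::ModelBase;

    std::vector<float> infer(const MelSpec& mel) const {
        return std::vector<float>(mel.frames * kHopSize, 0.0f);
    }
};

// Chains the two steps, mirroring the two-step process described above.
class Backend {
public:
    Backend(std::string melModelPath, std::string vocoderModelPath)
        : melGen_(std::move(melModelPath)), vocoder_(std::move(vocoderModelPath)) {}

    std::vector<float> synthesize(const std::vector<std::int32_t>& phonemeIds) const {
        const MelSpec mel = melGen_.infer(phonemeIds);
        return vocoder_.infer(mel);
    }

private:
    MelGenerator melGen_;
    Vocoder vocoder_;
};
```

Keeping model loading in one base class means each derived class only has to describe its own input and output tensors.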

Using the demo

A demo for English or Mandarin TTS, together with the TFLite models, is available for Linux. The pretrained models to be converted are downloaded from the Colab notebooks (English or Mandarin). FastSpeech2 is used as the Mel generator and Multiband-MelGAN as the vocoder.

Note: The text2ids function in TTSFrontend.cpp is implemented by invoking a bash command from C++ rather than by a native pronunciation module (see /demo/text2ids.py). This approach is not recommended; you should develop an appropriate text2ids module of your own, like the code in examples/cppwin.
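As a rough idea of what the ID-mapping part of a native text2ids module might look like, here is a minimal sketch that maps a space-separated phoneme string to IDs without calling out to a shell. It is not the code in examples/cppwin; `PhonemeTable` and `phonemesToIds` are hypothetical names, and the grapheme-to-phoneme step (g2p or pinyin) is assumed to have run already.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical phoneme-to-ID table; a real table must match the mapper
// that was used when the model was trained.
using PhonemeTable = std::unordered_map<std::string, std::int32_t>;

// Convert a space-separated phoneme string into model input IDs.
// Unknown symbols are skipped here; a real frontend might instead map
// them to a dedicated ID or report an error.
std::vector<std::int32_t> phonemesToIds(const std::string& phonemes,
                                        const PhonemeTable& table) {
    std::vector<std::int32_t> ids;
    std::istringstream stream(phonemes);
    std::string symbol;
    while (stream >> symbol) {
        const auto it = table.find(symbol);
        if (it != table.end()) {
            ids.push_back(it->second);
        }
    }
    return ids;
}
```

With a table such as `{{"HH", 1}, {"AH0", 2}, {"L", 3}, {"OW1", 4}}` (made-up IDs), the string `"HH AH0 L OW1"` maps to the IDs 1, 2, 3, 4.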

First, compile a TensorFlow Lite static library, following the official TensorFlow guide.

Execute the following commands to compile a static library for Linux:

    ./tensorflow/lite/tools/make/download_dependencies.sh
    ./tensorflow/lite/tools/make/build_lib.sh  # for Linux

(The official repository also provides build scripts for other platforms, such as rpi, aarch64, and riscv; see /tensorflow/lite/tools/make/.)

Because this build takes a long time, a prebuilt static library for Linux (libtensorflow-lite.a) is also available.

The structure of the demo folder should be:

    |- [cpptflite]/
    |   |- demo/
    |   |- src/
    |   |- lib/
    |   |   |- flatbuffers/
    |   |   |- tensorflow/lite/
    |   |   |- libtensorflow-lite.a

The flatbuffers/ and tensorflow/lite/ folders provide the required header files.

Then create a build directory:

    cd examples/cpptflite
    mkdir build
    cd build

English Demo (using LJSPEECH dataset)

    cmake .. -DMAPPER=LJSPEECH
    make
    ./demo "Bill got in the habit of asking himself “Is that thought true?”" test.wav

or Mandarin Demo (using Baker dataset)

    cmake .. -DMAPPER=BAKER
    make
    ./demo "这是一个开源的端到端中文语音合成系统" test.wav

Results

  • Comparison before and after conversion (English TTS)

    "Bill got in the habit of asking himself “Is that thought true?” And if he wasn’t absolutely certain it was, he just let it go."

  • Before conversion (Python)

    [Figure: ori_mel]

  • After conversion (C++)

    [Figure: tflite_mel]

  • Adding #3 to Chinese text creates a prosodic pause in the audio

    "这是一个开源的端到端中文语音合成系统"

    [Figure: tflite_mel]

    "这是一个开源的#3端到端#3中文语音合成系统"

    [Figure: tflite_mel]
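How the demo's frontend handles the #3 marker internally is not shown in this README. As an illustration only, the following sketch shows one way a frontend could isolate the marker as its own token before mapping tokens to IDs; `splitAtMarker` is a hypothetical helper. Searching byte-wise is safe on UTF-8 input because the ASCII bytes of "#3" never occur inside a multi-byte character.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split UTF-8 text at every occurrence of `marker`, keeping the marker
// itself as a separate token so it can later be mapped to a pause symbol.
std::vector<std::string> splitAtMarker(const std::string& text,
                                       const std::string& marker = "#3") {
    std::vector<std::string> tokens;
    if (marker.empty()) {  // nothing to split on; avoid an endless loop
        tokens.push_back(text);
        return tokens;
    }
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = text.find(marker, start);
        if (pos == std::string::npos) {
            break;
        }
        if (pos > start) {
            tokens.push_back(text.substr(start, pos - start));
        }
        tokens.push_back(marker);
        start = pos + marker.size();
    }
    if (start < text.size()) {
        tokens.push_back(text.substr(start));
    }
    return tokens;
}
```

Applied to the second sentence above, this yields five tokens: three text segments with a "#3" token between each pair.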