# MFA based extraction for FastSpeech

## Prepare
Everything is run from the main repo folder, i.e. `TensorflowTTS/`.
Optional: modify the MFA scripts to work with your language (see https://montreal-forced-aligner.readthedocs.io/en/latest/pretrained_models.html).
Download the pretrained MFA model and lexicon, then run the TextGrid extraction:
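The original command listing was lost from this copy; as a sketch, assuming the helper scripts shipped with this example folder (script names and flags may differ in your checkout, check `--help`):

```bash
# Fetch the pretrained MFA acoustic model and lexicon
# (script name assumed from this example's scripts/ folder).
bash examples/mfa_extraction/scripts/prepare_mfa.sh

# Align the corpus and write TextGrid files to ./mfa/parsed
# (corpus path and flag names are illustrative).
python examples/mfa_extraction/run_mfa.py \
  --corpus_directory ./libritts \
  --output_directory ./mfa/parsed \
  --jobs 8
```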
After this step, the TextGrid files are located at `./mfa/parsed`.
Extract durations from the TextGrid files:
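Under the hood, duration extraction boils down to mapping each phone interval in a TextGrid to a frame count at the vocoder hop size. A minimal sketch of that conversion (function name, sample rate, and hop size here are assumptions, not the repo's actual API):

```python
def intervals_to_durations(intervals, sample_rate=22050, hop_size=256):
    """Convert (start_sec, end_sec) phone intervals into per-phone frame counts.

    Rounding the interval boundaries (rather than each interval length)
    keeps the total equal to the frame count of the whole utterance.
    """
    durations = []
    for start, end in intervals:
        start_frame = round(start * sample_rate / hop_size)
        end_frame = round(end * sample_rate / hop_size)
        durations.append(end_frame - start_frame)
    return durations


# Two phones: 0-0.1 s and 0.1-0.25 s at 22.05 kHz, hop 256
print(intervals_to_durations([(0.0, 0.1), (0.1, 0.25)]))  # → [9, 13]
```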
Dataset structure after finishing this step:
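The original tree listing was lost from this copy; the layout below is illustrative (folder and file names are assumptions), the key point being that a durations folder now sits alongside the audio and transcript files:

```
libritts/                # your corpus folder (name assumed)
├── durations/           # per-utterance duration .npy files from the parser
├── utt_id_1.wav
├── utt_id_1.txt
└── ...
```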
Optional: add your own dataset parser based on `tensorflow_tts/processor/experiment/example_dataset.py` (if the base processor does not match your dataset).
Run preprocessing and normalization (steps 4 and 5 in `examples/fastspeech2_libritts/README.md`).

Run the mismatch fix to resolve a few frames of difference between the audio and duration files:
## Problems with MFA extraction
MFA seems to have problems with tightly trimmed files; in my experiments it works better with ~100 ms of silence at the start and end of each clip.
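One workaround is to pad each clip with ~100 ms of digital silence before running the aligner. A minimal sketch, assuming mono waveforms as plain sample sequences (the helper name is mine, not from the repo):

```python
def pad_silence(samples, sample_rate, pad_ms=100):
    """Prepend and append pad_ms of silence to a mono waveform."""
    pad = [0.0] * int(sample_rate * pad_ms / 1000)
    return pad + list(samples) + pad


# A 1 s clip at 16 kHz gains 1600 zero samples on each side
print(len(pad_silence([0.0] * 16000, 16000)))  # → 19200
```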
Short files can produce a lot of false positives, such as extractions containing only silence (seen with LibriTTS), so I would keep only samples longer than 2 s.
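A simple length filter before alignment covers this; the function name and the clip representation (utterance id plus sample count) are assumptions for illustration:

```python
def keep_long_enough(clips, sample_rate, min_seconds=2.0):
    """Keep only (utt_id, n_samples) entries longer than min_seconds."""
    return [utt for utt, n in clips if n / sample_rate > min_seconds]


clips = [("a", 16000), ("b", 48000)]   # 1 s and 3 s at 16 kHz
print(keep_long_enough(clips, 16000))  # → ['b']
```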