Copyright 2020 The TensorFlow Hub Authors.
Licensed under the Apache License, Version 2.0 (the "License");
Sound classification with YAMNet
YAMNet is a deep net that predicts 521 audio event classes from the AudioSet-YouTube corpus it was trained on. It employs the Mobilenet_v1 depthwise-separable convolution architecture.
Load the Model from TensorFlow Hub.
Note: to read the documentation, just follow the model's URL.
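A minimal sketch of loading the model; the handle below is the public YAMNet model on tfhub.dev:

```python
import tensorflow_hub as hub

# Load YAMNet from TensorFlow Hub.
model = hub.load('https://tfhub.dev/google/yamnet/1')
```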
The labels file will be loaded from the model's assets and is present at model.class_map_path(). You will load it into the class_names variable.
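A sketch of reading that file, assuming (as in the published class map) a CSV with a display_name column:

```python
import csv
import tensorflow as tf

def class_names_from_csv(class_map_csv_path):
  """Returns the list of class display names from the class map CSV."""
  class_names = []
  with tf.io.gfile.GFile(class_map_csv_path) as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
      class_names.append(row['display_name'])
  return class_names

class_map_path = model.class_map_path().numpy()
class_names = class_names_from_csv(class_map_path)
```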
Add a method to verify that the loaded audio is at the proper sample rate (16 kHz) and to convert it if it isn't; otherwise, it would affect the model's results.
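One way to do this is to resample with scipy; a sketch follows, where the helper name ensure_sample_rate is just for illustration:

```python
from scipy import signal

def ensure_sample_rate(original_sample_rate, waveform,
                       desired_sample_rate=16000):
  """Resample the waveform to 16 kHz if it isn't already at that rate."""
  if original_sample_rate != desired_sample_rate:
    desired_length = int(round(float(len(waveform)) /
                               original_sample_rate * desired_sample_rate))
    waveform = signal.resample(waveform, desired_length)
  return desired_sample_rate, waveform
```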
Downloading and preparing the sound file
Here you will download a WAV file and listen to it. If you already have a file available, just upload it to Colab and use it instead.
Note: The expected audio file should be a mono wav file at 16kHz sample rate.
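For example, reading a local WAV file with scipy and playing it in the notebook; the filename is a placeholder for whatever file you downloaded or uploaded:

```python
from scipy.io import wavfile
from IPython.display import Audio

# 'your_audio.wav' is a placeholder; replace it with your own file.
sample_rate, wav_data = wavfile.read('your_audio.wav')
sample_rate, wav_data = ensure_sample_rate(sample_rate, wav_data)

# Listen to the audio.
Audio(wav_data, rate=sample_rate)
```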
The wav_data needs to be normalized to values in [-1.0, 1.0] (as stated in the model's documentation).
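Assuming 16-bit PCM samples, a sketch of the normalization:

```python
import numpy as np

# 16-bit PCM samples are int16; dividing by the int16 max maps them into
# [-1.0, 1.0]. Cast to float32, which is what the model expects.
waveform = (wav_data / np.iinfo(np.int16).max).astype(np.float32)
```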
Executing the Model
Now the easy part: using the data already prepared, you just call the model and get the scores, embeddings, and the spectrogram.
The scores are the main result you will use. You will use the spectrogram for some visualizations later.
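A sketch of running the model on the normalized waveform and reading off the top class (variable names follow the earlier snippets):

```python
# Run the model. YAMNet returns per-frame scores, embeddings,
# and the log-mel spectrogram of the input.
scores, embeddings, spectrogram = model(waveform)

scores_np = scores.numpy()
spectrogram_np = spectrogram.numpy()

# Average the per-frame scores and take the highest-scoring class.
inferred_class = class_names[scores_np.mean(axis=0).argmax()]
print(f'The main sound is: {inferred_class}')
```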
Visualization
YAMNet also returns some additional information that we can use for visualization. Let's take a look at the waveform, the spectrogram, and the top inferred classes.
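A minimal matplotlib sketch of these three plots, again reusing the variables from the earlier snippets:

```python
import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize=(10, 6))

# Plot the waveform.
plt.subplot(3, 1, 1)
plt.plot(waveform)
plt.xlim([0, len(waveform)])

# Plot the log-mel spectrogram returned by the model.
plt.subplot(3, 1, 2)
plt.imshow(spectrogram_np.T, aspect='auto', interpolation='nearest',
           origin='lower')

# Plot the scores of the top-scoring classes over time.
mean_scores = scores_np.mean(axis=0)
top_n = 10
top_class_indices = np.argsort(mean_scores)[::-1][:top_n]
plt.subplot(3, 1, 3)
plt.imshow(scores_np[:, top_class_indices].T, aspect='auto',
           interpolation='nearest', cmap='gray_r')
plt.yticks(range(top_n), [class_names[i] for i in top_class_indices])
plt.show()
```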