Path: blob/master/site/en-snapshot/hub/tutorials/bird_vocalization_classifier.ipynb
Copyright 2023 The TensorFlow Hub Authors.
Licensed under the Apache License, Version 2.0 (the "License");
Using the Google Bird Vocalization model
The Google Bird Vocalization model is a global bird embedding and classification model.
The model expects as input a 5-second audio segment sampled at 32 kHz.
It outputs both the logits and the embeddings for each input window of audio.
In this notebook you'll learn how to feed audio to the model properly and how to use the logits for inference.
Loading the Model from TFHub
Let's load the labels that the model was trained on.
The labels file is in the assets folder under label.csv. Each line is an eBird ID.
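As a minimal sketch of this step, the helper below parses the contents of a label.csv-style file into a list of eBird IDs. The `has_header` flag is an illustrative assumption; check whether the file shipped with the model actually starts with a header row.

```python
import csv
import io

def load_labels(csv_text, has_header=False):
    """Parse label.csv contents: one eBird ID per line.

    csv_text: the raw text of the labels file.
    has_header: set True if the file's first row is a header
    (an assumption; inspect the actual file to decide).
    """
    reader = csv.reader(io.StringIO(csv_text))
    rows = [row[0].strip() for row in reader if row]
    return rows[1:] if has_header else rows

labels = load_labels("eurbla\namecro\nhouspa\n")
```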
The frame_audio
function is based on the Chirp library version, but uses tf.signal instead of librosa.
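To illustrate the framing logic, here is a NumPy sketch of what such a function does (the notebook's version uses tf.signal.frame). It drops any trailing samples that do not fill a whole window; note the Chirp implementation may pad instead, so treat this as an illustration, not the library's code.

```python
import numpy as np

SAMPLE_RATE = 32000  # the model's expected sample rate

def frame_audio(audio, window_size_s=5.0, hop_size_s=5.0,
                sample_rate=SAMPLE_RATE):
    """Split a 1-D waveform into fixed-size windows.

    Returns an array of shape (num_frames, window_samples).
    Trailing samples that don't fill a full window are discarded
    (a simplification of the tf.signal-based helper).
    """
    frame_length = int(window_size_s * sample_rate)
    frame_step = int(hop_size_s * sample_rate)
    n_frames = max(0, 1 + (len(audio) - frame_length) // frame_step)
    frames = [audio[i * frame_step : i * frame_step + frame_length]
              for i in range(n_frames)]
    return np.stack(frames) if frames else np.zeros((0, frame_length))

# A 24-second clip at 32 kHz yields four complete 5-second frames.
frames = frame_audio(np.zeros(24 * SAMPLE_RATE))
```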
The ensure_sample_rate
function makes sure that any audio used with the model has the expected sample rate of 32 kHz.
Let's load a file from Wikipedia.
To be more precise, the audio of a Common Blackbird.
(Image: Common Blackbird, by Andreas Trepte, own work, CC BY-SA 2.5.)
The audio was contributed by Oona Räisänen (Mysid) under the public domain license.
The audio is 24 seconds long, but the model expects 5-second chunks.
The frame_audio
function can fix that by splitting the audio into proper frames.
Let's apply the model only on the first frame:
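The real call goes to the loaded TF Hub model, whose exact serving signature isn't shown in this extract. The sketch below uses a toy stand-in with the same call shape, taking a batch of frames and returning (logits, embeddings), so the slicing and softmax steps are runnable; the class count and embedding size are toy values, not the model's.

```python
import numpy as np

NUM_CLASSES = 3    # toy size; the real model covers thousands of species
EMBEDDING_DIM = 4  # toy size; the real embedding is much larger

def toy_model(frames):
    """Stand-in mimicking the model's interface:
    batch of 5-second frames in, (logits, embeddings) out."""
    rng = np.random.default_rng(0)
    batch = frames.shape[0]
    logits = rng.normal(size=(batch, NUM_CLASSES))
    embeddings = rng.normal(size=(batch, EMBEDDING_DIM))
    return logits, embeddings

fixed_tm = np.zeros((4, 160000))          # pretend framed 24-second audio
# Apply the model to the first frame only (a batch of one).
logits, embeddings = toy_model(fixed_tm[:1])
probabilities = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
```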
The label.csv file contains eBird IDs. The eBird ID for Turdus merula is eurbla.
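Mapping logits back to an eBird ID is just an argmax into the label list. A sketch with toy values (the real labels come from label.csv and the real logits from the model):

```python
import numpy as np

labels = ["amecro", "eurbla", "houspa"]  # toy label list
logits = np.array([[0.1, 2.3, -1.0]])    # toy logits for one frame

# Index of the highest-scoring class maps to its eBird ID.
top_class = int(np.argmax(logits, axis=-1)[0])
predicted_id = labels[top_class]
```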
Let's apply the model to all the frames now:
Note: this code is also based on the Chirp library.
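Running over every frame amounts to a single batched call followed by a per-frame argmax. The sketch below again uses a deterministic toy stand-in for the model so the loop structure is runnable; swap in the loaded TF Hub model and the real label list in practice.

```python
import numpy as np

labels = ["amecro", "eurbla", "houspa"]  # toy labels

def toy_model(frames):
    """Deterministic stand-in: every frame scores class 1 highest."""
    logits = np.tile([0.0, 1.0, -1.0], (frames.shape[0], 1))
    embeddings = np.zeros((frames.shape[0], 4))
    return logits, embeddings

frames = np.zeros((4, 160000))           # four 5-second frames at 32 kHz
logits, _ = toy_model(frames)            # one batched call for all frames
top_per_frame = [labels[i] for i in np.argmax(logits, axis=-1)]
# top_per_frame now holds one predicted eBird ID per 5-second frame
```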