GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/zh-cn/hub/tutorials/bird_vocalization_classifier.ipynb
Kernel: Python 3

Licensed under the Apache License, Version 2.0 (the "License");

#@title Copyright 2023 The TensorFlow Hub Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

Using the Google Bird Vocalization model

Google Bird Vocalization is a global bird embedding and classification model.

The model expects as input a 5-second audio segment sampled at 32kHz.

The model outputs both the logits and the embeddings for each input window of audio.

In this notebook you will learn how to feed the audio to the model in the right format and how to use the logits for inference.
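
As a quick arithmetic check (a hypothetical cell, not part of the original notebook), a 5-second window sampled at 32kHz corresponds to 160,000 samples, which is the number of values in each input window fed to the model:

# Hypothetical sanity check on the expected input size.
window_samples = 5 * 32000
print(window_samples)  # 160000 samples per 5-second input window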

!pip install -q "tensorflow_io==0.28.*"
!pip install -q librosa
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_io as tfio

import numpy as np
import librosa

import csv
import io

from IPython.display import Audio

Loading the model from TFHub

model_handle = "https://tfhub.dev/google/bird-vocalization-classifier/1"
model = hub.load(model_handle)

Let's load the labels that the model was trained on.

The labels file is located in the assets folder under label.csv. Each line is an ebird id.

# Find the name of the class with the top score when mean-aggregated across frames.
def class_names_from_csv(class_map_csv_text):
  """Returns list of class names corresponding to score vector."""
  with open(class_map_csv_text) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    class_names = [mid for mid, desc in csv_reader]
  return class_names[1:]

labels_path = hub.resolve(model_handle) + "/assets/label.csv"
classes = class_names_from_csv(labels_path)
print(classes)
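
As an optional check (a hypothetical cell, not part of the original notebook), you can print how many class labels were loaded:

print(len(classes))  # number of ebird classes the model can predict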

The frame_audio function is based on the Chirp library version, but uses tf.signal instead of librosa.

ensure_sample_rate is a function that makes sure any audio used with the model has the expected sample rate of 32kHz.

def frame_audio(
    audio_array: np.ndarray,
    window_size_s: float = 5.0,
    hop_size_s: float = 5.0,
    sample_rate=32000,
) -> np.ndarray:
  """Helper function for framing audio for inference."""
  if window_size_s is None or window_size_s < 0:
    return audio_array[np.newaxis, :]
  frame_length = int(window_size_s * sample_rate)
  hop_length = int(hop_size_s * sample_rate)
  framed_audio = tf.signal.frame(audio_array, frame_length, hop_length, pad_end=True)
  return framed_audio


def ensure_sample_rate(waveform, original_sample_rate, desired_sample_rate=32000):
  """Resample waveform if required."""
  if original_sample_rate != desired_sample_rate:
    waveform = tfio.audio.resample(waveform, original_sample_rate, desired_sample_rate)
  return desired_sample_rate, waveform
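
To illustrate the framing behavior, here is a hypothetical check (not part of the original notebook) that frames 12 seconds of synthetic audio; with pad_end=True the last, partially filled window is zero-padded:

# Hypothetical check: 12 seconds at 32 kHz framed into 5-second windows.
t = np.linspace(0, 12, 12 * 32000, endpoint=False)
sine = np.sin(2 * np.pi * 440 * t).astype(np.float32)  # a 440 Hz tone
framed = frame_audio(sine)
print(framed.shape)  # (3, 160000): two full windows plus one zero-padded window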

Let's load a file from Wikipedia.

More precisely, the song of a Common Blackbird.

[Image: Common Blackbird.jpg - by Andreas Trepte, own work, CC BY-SA 2.5, Link]

This audio was contributed by Oona Räisänen (Mysid) under the public domain license.

!curl -O "https://upload.wikimedia.org/wikipedia/commons/7/7c/Turdus_merula_2.ogg"
turdus_merula = "Turdus_merula_2.ogg"

audio, sample_rate = librosa.load(turdus_merula)
sample_rate, wav_data_turdus = ensure_sample_rate(audio, sample_rate)

Audio(wav_data_turdus, rate=sample_rate)

The audio is 24 seconds long, but the model expects 5-second chunks.

The frame_audio function takes care of that and splits the audio into proper frames.

fixed_tm = frame_audio(wav_data_turdus)
fixed_tm.shape

Let's apply the model to the first frame only:

logits, embeddings = model.infer_tf(fixed_tm[:1])

The label.csv file contains ebird ids. The ebird id of the Common Blackbird is eurbla.

probabilities = tf.nn.softmax(logits)
argmax = np.argmax(probabilities)
print(f"The audio is from the class {classes[argmax]} (element:{argmax} in the label.csv file), "
      f"with probability of {probabilities[0][argmax]}")
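
As an extra sanity check (a hypothetical cell, not part of the original notebook), you can look up the index of the eurbla label directly and compare it with the predicted index:

# Hypothetical cross-check: the predicted index should match the 'eurbla' label.
eurbla_index = classes.index("eurbla")
print(f"eurbla is at index {eurbla_index}; the model predicted index {argmax}")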

Now let's apply the model to all the frames:

Note: this code is also based on the Chirp library.

all_logits, all_embeddings = model.infer_tf(fixed_tm[:1])
for window in fixed_tm[1:]:
  logits, embeddings = model.infer_tf(window[np.newaxis, :])
  all_logits = np.concatenate([all_logits, logits], axis=0)

all_logits.shape
frame = 0
for frame_logits in all_logits:
  probabilities = tf.nn.softmax(frame_logits)
  argmax = np.argmax(probabilities)
  print(f"For frame {frame}, the audio is from the class {classes[argmax]} "
        f"(element:{argmax} in the label.csv file), "
        f"with probability of {probabilities[argmax]}")
  frame += 1
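
The comment in the label-loading cell mentions taking the top score when mean-aggregated across frames; as a hypothetical extension (not part of the original notebook), you can average the logits over all frames to get a single clip-level prediction:

# Hypothetical clip-level prediction: average the per-frame logits, then softmax.
mean_logits = np.mean(all_logits, axis=0)
mean_probabilities = tf.nn.softmax(mean_logits)
mean_argmax = np.argmax(mean_probabilities)
print(f"Clip-level prediction: {classes[mean_argmax]} "
      f"with probability {float(mean_probabilities[mean_argmax]):.3f}")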