GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/es-419/hub/tutorials/yamnet.ipynb

Kernel: Python 3

Copyright 2020 The TensorFlow Hub Authors.

Licensed under the Apache License, Version 2.0 (the "License");

In [ ]:

#@title Copyright 2020 The TensorFlow Hub Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================


Sound classification with YAMNet

YAMNet is a deep network that predicts 521 audio event classes from the AudioSet-YouTube corpus it was trained on. It uses the Mobilenet_v1 depthwise-separable convolution architecture.

In [ ]:

import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import csv

import matplotlib.pyplot as plt
from IPython.display import Audio
import scipy.signal  # needed by ensure_sample_rate() for scipy.signal.resample
from scipy.io import wavfile

Load the model from TensorFlow Hub.

Note: To read the documentation, just follow the model's URL.

In [ ]:

# Load the model.
model = hub.load('https://tfhub.dev/google/yamnet/1')

The labels file is loaded from the model's assets and is located at model.class_map_path(). You will load it into the class_names variable.

In [ ]:

# Read the class names from the model's class map CSV file.
# Note: despite its name, class_map_csv_text is the path to the CSV file, not its contents.
def class_names_from_csv(class_map_csv_text):
  """Returns list of class names corresponding to score vector."""
  class_names = []
  with tf.io.gfile.GFile(class_map_csv_text) as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
      class_names.append(row['display_name'])

  return class_names

class_map_path = model.class_map_path().numpy()
class_names = class_names_from_csv(class_map_path)
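
To see what the loader above does without downloading the model, here is a minimal sketch that parses a small in-memory CSV the same way. The rows and the `index` and `mid` column names are illustrative assumptions (only `display_name` is confirmed by the code above), and `class_names_from_text` is a hypothetical helper, not part of the notebook.

```python
import csv
import io

# Illustrative excerpt with placeholder mid values; the real file is read
# from model.class_map_path(). Only 'display_name' is used by the loader.
sample_csv = '''index,mid,display_name
0,mid_0,Speech
1,mid_1,"Child speech, kid speaking"
2,mid_2,Cat
'''

def class_names_from_text(csv_text):
    """Returns the display_name column of a class map given as CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row['display_name'] for row in reader]

names = class_names_from_text(sample_csv)
print(names)
```

Using csv.DictReader rather than splitting each line on commas matters here because some display names contain commas and are quoted in the file.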

Add a method to verify that the loaded audio has the proper sample_rate (16 kHz) and to convert it if it does not; a different sample rate would affect the model's results.

In [ ]:

def ensure_sample_rate(original_sample_rate, waveform,
                       desired_sample_rate=16000):
  """Resample waveform if required."""
  if original_sample_rate != desired_sample_rate:
    desired_length = int(round(float(len(waveform)) /
                               original_sample_rate * desired_sample_rate))
    waveform = scipy.signal.resample(waveform, desired_length)
  return desired_sample_rate, waveform
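
The length computation in ensure_sample_rate keeps the clip duration constant while changing the number of samples. The sketch below isolates that arithmetic so it can be checked without SciPy; `resampled_length` is a hypothetical helper name introduced only for illustration.

```python
def resampled_length(num_samples, original_sample_rate, desired_sample_rate=16000):
    """Number of samples after resampling, preserving the clip duration."""
    return int(round(float(num_samples) / original_sample_rate * desired_sample_rate))

# A 2-second clip should contain 32000 samples at 16 kHz,
# whatever sample rate it was recorded at.
for rate in (44100, 48000, 8000):
    n = resampled_length(rate * 2, rate)
    print(f'{rate} Hz, 2 s -> {n} samples at 16 kHz ({n / 16000:.2f} s)')
```

Note that scipy.signal.resample uses a Fourier method and returns floating-point samples, so the resampled waveform is no longer an integer array.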

Downloading and preparing the sound file

Here you will download a wav file and listen to it. If you already have a file available, just upload it to Colab and use it instead.

Note: The audio file must be a mono wav file with a 16 kHz sample rate.
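
If your own file is stereo, one possible way to meet the mono requirement is to average the channels before going further. This is a minimal sketch, not part of the original notebook: `to_mono` is a hypothetical helper, and it assumes the multi-channel layout returned by scipy.io.wavfile.read, an array of shape (num_samples, num_channels).

```python
import numpy as np

def to_mono(wav_data):
    """Averages the channels of a (num_samples, num_channels) array."""
    wav_data = np.asarray(wav_data)
    if wav_data.ndim == 1:
        return wav_data  # already mono
    # Average in float64 to avoid integer overflow, then cast back to the
    # original dtype (truncating any fractional part).
    return wav_data.astype(np.float64).mean(axis=1).astype(wav_data.dtype)

# Toy stereo clip with three samples per channel.
stereo = np.array([[100, 300], [-200, 200], [32767, 32767]], dtype=np.int16)
mono = to_mono(stereo)
print(mono.shape, mono.dtype)
```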

In [ ]:

!curl -O https://storage.googleapis.com/audioset/speech_whistling2.wav

In [ ]:

!curl -O https://storage.googleapis.com/audioset/miaow_16k.wav

In [ ]:

# wav_file_name = 'speech_whistling2.wav'
wav_file_name = 'miaow_16k.wav'
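# wavfile.read takes (filename, mmap=False); a file-mode string such as 'rb'
# would be interpreted as mmap=True, so only the file name is passed.
sample_rate, wav_data = wavfile.read(wav_file_name)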
sample_rate, wav_data = ensure_sample_rate(sample_rate, wav_data)

# Show some basic information about the audio.
duration = len(wav_data)/sample_rate
print(f'Sample rate: {sample_rate} Hz')
print(f'Total duration: {duration:.2f}s')
print(f'Size of the input: {len(wav_data)}')

# Listening to the wav file.
Audio(wav_data, rate=sample_rate)

wav_data needs to be normalized to values in [-1.0, 1.0] (as stated in the model's documentation).

In [ ]:

waveform = wav_data / tf.int16.max
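
Dividing by tf.int16.max is correct for 16-bit PCM files such as the ones downloaded above. If you bring your own file, it may be stored with a different sample type, so a slightly more defensive variant could pick the scale from the array's dtype. The sketch below is an assumption-laden illustration: `normalize_pcm` is a hypothetical helper, and it only handles signed-integer and floating-point arrays.

```python
import numpy as np

def normalize_pcm(wav_data):
    """Scales signed-integer PCM samples to float32 values in [-1.0, 1.0]."""
    wav_data = np.asarray(wav_data)
    if np.issubdtype(wav_data.dtype, np.signedinteger):
        # Same idea as wav_data / tf.int16.max, generalized to the array's dtype.
        return (wav_data / np.iinfo(wav_data.dtype).max).astype(np.float32)
    if np.issubdtype(wav_data.dtype, np.floating):
        # Floating-point wav data is assumed to be in [-1.0, 1.0] already.
        return wav_data.astype(np.float32)
    raise TypeError(f'Unsupported sample dtype: {wav_data.dtype}')

pcm = np.array([0, 16384, 32767, -32767], dtype=np.int16)
normalized = normalize_pcm(pcm)
print(normalized.dtype, normalized.min(), normalized.max())
```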

Executing the model

Now the easy part: with the data already prepared, you just call the model and get the scores, the embeddings, and the spectrogram.

The scores are the main result you will use. The spectrogram will be used for some visualizations later.

In [ ]:

# Run the model, check the output.
scores, embeddings, spectrogram = model(waveform)

In [ ]:

scores_np = scores.numpy()
spectrogram_np = spectrogram.numpy()
# Average the scores over all frames and pick the class with the highest mean.
inferred_class = class_names[scores_np.mean(axis=0).argmax()]
print(f'The main sound is: {inferred_class}')
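
The cell above reports only the single best class. The same mean-aggregation idea extends to the top few classes, which is often more informative. Here is a minimal sketch on toy data; `top_k_classes`, the toy class names, and the toy score values are all made up for illustration and stand in for scores_np and class_names.

```python
import numpy as np

def top_k_classes(scores, class_names, k=3):
    """Returns (name, mean score) pairs for the k best mean-aggregated classes."""
    mean_scores = np.asarray(scores).mean(axis=0)
    top = np.argsort(mean_scores)[::-1][:k]
    return [(class_names[i], float(mean_scores[i])) for i in top]

# Toy scores: 3 frames (rows) x 4 classes (columns).
toy_names = ['Speech', 'Cat', 'Meow', 'Silence']
toy_scores = np.array([[0.1, 0.7, 0.6, 0.0],
                       [0.2, 0.9, 0.4, 0.1],
                       [0.0, 0.8, 0.8, 0.2]])

top2 = top_k_classes(toy_scores, toy_names, k=2)
for name, score in top2:
    print(f'{name}: {score:.2f}')
```

With the real outputs, you would call it as top_k_classes(scores_np, class_names).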

Visualization

YAMNet also returns some additional information that we can use for visualization. Let's take a look at the waveform, the spectrogram, and the top classes inferred.

In [ ]:

plt.figure(figsize=(10, 6))

# Plot the waveform.
plt.subplot(3, 1, 1)
plt.plot(waveform)
plt.xlim([0, len(waveform)])

# Plot the log-mel spectrogram (returned by the model).
plt.subplot(3, 1, 2)
plt.imshow(spectrogram_np.T, aspect='auto', interpolation='nearest', origin='lower')

# Plot and label the model output scores for the top-scoring classes.
mean_scores = np.mean(scores, axis=0)
top_n = 10
top_class_indices = np.argsort(mean_scores)[::-1][:top_n]
plt.subplot(3, 1, 3)
plt.imshow(scores_np[:, top_class_indices].T, aspect='auto', interpolation='nearest', cmap='gray_r')

# patch_padding = (PATCH_WINDOW_SECONDS / 2) / PATCH_HOP_SECONDS
# values from the model documentation
patch_padding = (0.025 / 2) / 0.01
plt.xlim([-patch_padding-0.5, scores.shape[0] + patch_padding-0.5])
# Label the top_N classes.
yticks = range(0, top_n, 1)
plt.yticks(yticks, [class_names[top_class_indices[x]] for x in yticks])
_ = plt.ylim(-0.5 + np.array([top_n, 0]))
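
The horizontal axis of the scores plot is measured in frames. To relate frames to time, you can label each frame with its best class and an approximate start time. The sketch below assumes a hop of 0.48 seconds between consecutive score frames, a value taken from the model documentation rather than from this notebook; `frame_timeline` and the toy data are hypothetical.

```python
import numpy as np

def frame_timeline(scores, class_names, hop_seconds=0.48):
    """Returns (start time in seconds, best class name) for every score frame."""
    best = np.asarray(scores).argmax(axis=1)
    return [(i * hop_seconds, class_names[j]) for i, j in enumerate(best)]

# Toy scores: 3 frames (rows) x 3 classes (columns).
toy_names = ['Speech', 'Cat', 'Silence']
toy_scores = np.array([[0.9, 0.1, 0.0],
                       [0.2, 0.7, 0.1],
                       [0.1, 0.8, 0.1]])

timeline = frame_timeline(toy_scores, toy_names)
for start, name in timeline:
    print(f'{start:.2f}s: {name}')
```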
