GitHub Repository: snakers4/silero-vad
Path: blob/master/examples/pyaudio-streaming/pyaudio-streaming-examples.ipynb
Kernel: Python 3 (ipykernel)

Pyaudio Microphone Streaming Examples

A simple notebook that uses pyaudio to capture microphone audio and feed it to Silero VAD.

I created it as an example of how binary data from a stream can be fed into Silero VAD.

Tested on Ubuntu 21.04 (x86). Once the dependencies below are installed, no additional setup is required.

This notebook does not work in Google Colab! It is for local use only.

Dependencies

The cell below lists all dependencies and their versions. Uncomment the lines to install them from within the notebook.

# Version specifiers are quoted so the shell does not treat ">" as a redirect
#!pip install "numpy>=1.24.0"
#!pip install "torch>=1.12.0"
#!pip install "matplotlib>=3.6.0"
#!pip install "torchaudio>=0.12.0"
#!pip install soundfile==0.12.1
# Linux:
#!apt install python3-pyaudio
# Windows:
#!pip install pyaudio

Imports

import io
import numpy as np
import torch
torch.set_num_threads(1)
import torchaudio
import matplotlib
import matplotlib.pylab as plt
import pyaudio
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad', force_reload=True)
(get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks) = utils

Helper Methods

# Taken from utils_vad.py
def validate(model,
             inputs: torch.Tensor):
    with torch.no_grad():
        outs = model(inputs)
    return outs

# Provided by Alexander Veysov
def int2float(sound):
    abs_max = np.abs(sound).max()
    sound = sound.astype('float32')
    if abs_max > 0:
        sound *= 1/32768
    sound = sound.squeeze()  # depends on the use case
    return sound
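
To make the conversion concrete, the sketch below runs the same scaling on a hand-built byte buffer instead of microphone data. The sample values are made up for illustration, and `int2float` is repeated only so the block runs on its own. It assumes the native-endian signed 16-bit layout that `np.frombuffer(..., np.int16)` reads.

```python
import numpy as np

def int2float(sound):
    # Same scaling as the helper above: int16 range -> roughly [-1.0, 1.0)
    abs_max = np.abs(sound).max()
    sound = sound.astype('float32')
    if abs_max > 0:
        sound *= 1 / 32768
    return sound.squeeze()

# Hypothetical samples standing in for one microphone read
samples = np.array([0, 16384, -16384, 32767], dtype=np.int16)
raw_bytes = samples.tobytes()  # raw bytes, like those returned by stream.read()

# Reinterpret the bytes as int16 (no copy), then scale to float32
audio_int16 = np.frombuffer(raw_bytes, dtype=np.int16)
audio_float32 = int2float(audio_int16)

print(len(raw_bytes))        # 8 bytes: 4 samples * 2 bytes each
print(audio_float32.dtype)   # float32
print(audio_float32)
```

Because `astype` returns a copy, the in-place multiplication never touches the read-only array that `np.frombuffer` produces.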

Pyaudio Set-up

FORMAT = pyaudio.paInt16
CHANNELS = 1
SAMPLE_RATE = 16000
CHUNK = int(SAMPLE_RATE / 10)

audio = pyaudio.PyAudio()
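
As a quick sanity check on these settings, the sketch below converts frame counts to milliseconds. The helper name `duration_ms` is hypothetical; the 512-frame figure is the read size the examples in this notebook pass to `stream.read`, and the 2-byte sample width follows from the 16-bit `paInt16` format.

```python
SAMPLE_RATE = 16000              # Hz, as in the set-up above
CHANNELS = 1
CHUNK = int(SAMPLE_RATE / 10)    # frames_per_buffer passed to audio.open()
BYTES_PER_SAMPLE = 2             # paInt16 is 16-bit signed PCM

def duration_ms(num_frames: int, sample_rate: int) -> float:
    """Length of num_frames audio frames in milliseconds (hypothetical helper)."""
    return 1000 * num_frames / sample_rate

print(CHUNK)                                # 1600
print(duration_ms(CHUNK, SAMPLE_RATE))      # 100.0
print(duration_ms(512, SAMPLE_RATE))        # 32.0
print(512 * CHANNELS * BYTES_PER_SAMPLE)    # 1024 bytes per 512-frame read
```

So the stream buffer holds 100 ms of audio, while each 512-frame read covers 32 ms.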

Simple Example

The following example reads the audio from the microphone in chunks of 512 samples (32 ms at 16 kHz), converts each chunk to a PyTorch tensor, and gets the model's confidence that the chunk is voiced.

num_samples = 512
stream = audio.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=SAMPLE_RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
data = []
voiced_confidences = []

frames_to_record = 50

print("Started Recording")
for i in range(0, frames_to_record):
    audio_chunk = stream.read(num_samples)

    # in case you want to save the audio later
    data.append(audio_chunk)

    audio_int16 = np.frombuffer(audio_chunk, np.int16)
    audio_float32 = int2float(audio_int16)

    # get the confidences and add them to the list to plot them later
    new_confidence = model(torch.from_numpy(audio_float32), 16000).item()
    voiced_confidences.append(new_confidence)

print("Stopped the recording")

# plot the confidences for the speech
plt.figure(figsize=(20,6))
plt.plot(voiced_confidences)
plt.show()
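
The loop keeps every raw chunk in `data` "in case you want to save the audio later". One possible way to do that, sketched here with Python's standard `wave` module rather than anything from the notebook itself, is shown below. The function name `save_chunks_to_wav`, the output file name, and the silent stand-in chunks are all assumptions for illustration; with a real recording you would pass `data` instead of `fake_data`.

```python
import os
import tempfile
import wave

def save_chunks_to_wav(path, chunks, sample_rate=16000, channels=1, sample_width=2):
    """Write a list of raw PCM byte chunks to a WAV file (hypothetical helper)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)   # 2 bytes per sample for 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"".join(chunks))

# Stand-in for the `data` list: 50 chunks of 512 silent 16-bit samples
fake_data = [b"\x00\x00" * 512 for _ in range(50)]

out_path = os.path.join(tempfile.gettempdir(), "silero_vad_demo.wav")
save_chunks_to_wav(out_path, fake_data)

# Read the header back to confirm what was written
with wave.open(out_path, "rb") as wav_file:
    n_frames = wav_file.getnframes()
    frame_rate = wav_file.getframerate()

print(n_frames)               # 25600 frames = 50 chunks * 512 samples
print(n_frames / frame_rate)  # 1.6 seconds
```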

Real Time Visualization

As an enhancement, the implementation below plots the speech probabilities in real time. In contrast to the simple example, it keeps recording until you stop it by pressing Enter. While looking for good ways to update matplotlib plots in real time, I found a simple library that does the job: https://github.com/lvwerra/jupyterplot. It has some limitations, but it works really well for this use case.

#!pip install jupyterplot==0.0.3
from jupyterplot import ProgressPlot
import threading

continue_recording = True

def stop():
    input("Press Enter to stop the recording:")
    global continue_recording
    continue_recording = False

def start_recording():

    stream = audio.open(format=FORMAT,
                        channels=CHANNELS,
                        rate=SAMPLE_RATE,
                        input=True,
                        frames_per_buffer=CHUNK)

    data = []
    voiced_confidences = []

    global continue_recording
    continue_recording = True

    pp = ProgressPlot(plot_names=["Silero VAD"], line_names=["speech probabilities"], x_label="audio chunks")

    stop_listener = threading.Thread(target=stop)
    stop_listener.start()

    while continue_recording:

        audio_chunk = stream.read(num_samples)

        # in case you want to save the audio later
        data.append(audio_chunk)

        audio_int16 = np.frombuffer(audio_chunk, np.int16)
        audio_float32 = int2float(audio_int16)

        # get the confidences and add them to the list to plot them later
        new_confidence = model(torch.from_numpy(audio_float32), 16000).item()
        voiced_confidences.append(new_confidence)

        pp.update(new_confidence)

    pp.finalize()
start_recording()
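
The recording loop above is stopped through a global boolean that a listener thread flips. As an alternative pattern (not what this notebook does), the same idea can be sketched with `threading.Event`, which avoids the `global` statement. Everything here is a stand-in: `fake_read` replaces `stream.read(num_samples)`, and a timer replaces the thread waiting for Enter, so the block runs without a microphone.

```python
import threading
import time

def record_until_stopped(stop_event, read_chunk):
    """Call read_chunk() repeatedly until stop_event is set."""
    chunks = []
    while not stop_event.is_set():
        chunks.append(read_chunk())
    return chunks

def fake_read():
    # Hypothetical stand-in for stream.read(num_samples): 512 silent samples
    time.sleep(0.01)
    return b"\x00\x00" * 512

stop_event = threading.Event()

# Stand-in for the Enter-key listener: request a stop after 0.1 seconds
timer = threading.Timer(0.1, stop_event.set)
timer.start()

chunks = record_until_stopped(stop_event, fake_read)
timer.join()

print(stop_event.is_set())   # True
print(len(chunks) >= 1)      # True
```

In the notebook's setting, the `stop` function would call `stop_event.set()` after `input(...)` returns, and the `while` loop would check `stop_event.is_set()` instead of `continue_recording`.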