CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutSign UpSign In
huggingface

Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.

GitHub Repository: huggingface/notebooks
Path: blob/main/diffusers/speech_to_image.ipynb
Views: 2535
Kernel: Unknown Kernel

Speech to Image

The following code can generate an image from an audio sample using pre-trained OpenAI whisper-small and Stable Diffusion. The script was contributed by Mikhail Duzenli and the notebook by Parag Ekbote.

pip install datasets matplotlib diffusers transformers torch librosa soundfile
Requirement already satisfied: datasets in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (3.1.0) Requirement already satisfied: matplotlib in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (3.8.2) Requirement already satisfied: diffusers in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (0.31.0) Requirement already satisfied: transformers in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (4.46.2) Requirement already satisfied: torch in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (2.2.1+cu121) Collecting librosa Downloading librosa-0.10.2.post1-py3-none-any.whl.metadata (8.6 kB) Collecting soundfile Downloading soundfile-0.12.1-py2.py3-none-manylinux_2_31_x86_64.whl.metadata (14 kB) Requirement already satisfied: filelock in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from datasets) (3.16.1) Requirement already satisfied: numpy>=1.17 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from datasets) (1.26.4) Requirement already satisfied: pyarrow>=15.0.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from datasets) (18.0.0) Requirement already satisfied: dill<0.3.9,>=0.3.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from datasets) (0.3.8) Requirement already satisfied: pandas in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from datasets) (2.1.4) Requirement already satisfied: requests>=2.32.2 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from datasets) (2.32.3) Requirement already satisfied: tqdm>=4.66.3 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from datasets) (4.66.6) Requirement already satisfied: xxhash in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from datasets) (3.5.0) Requirement already satisfied: multiprocess<0.70.17 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from datasets) (0.70.16) Requirement already satisfied: fsspec<=2024.9.0,>=2023.1.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from fsspec[http]<=2024.9.0,>=2023.1.0->datasets) (2024.9.0) Requirement already satisfied: aiohttp in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from datasets) (3.10.10) Requirement already satisfied: huggingface-hub>=0.23.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from datasets) (0.26.2) Requirement already satisfied: packaging in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from datasets) (24.1) Requirement already satisfied: pyyaml>=5.1 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from datasets) (6.0.2) Requirement already satisfied: contourpy>=1.0.1 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from matplotlib) (1.3.0) Requirement already satisfied: cycler>=0.10 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from matplotlib) (0.12.1) Requirement already satisfied: fonttools>=4.22.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from matplotlib) (4.54.1) Requirement already satisfied: kiwisolver>=1.3.1 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from matplotlib) (1.4.7) Requirement already satisfied: pillow>=8 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from matplotlib) (11.0.0) Requirement already satisfied: pyparsing>=2.3.1 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from matplotlib) (3.2.0) Requirement already satisfied: python-dateutil>=2.7 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from matplotlib) (2.9.0.post0) Requirement already satisfied: importlib-metadata in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from diffusers) (8.5.0) Requirement already satisfied: regex!=2019.12.17 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from diffusers) (2024.11.6) Requirement already satisfied: safetensors>=0.3.1 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from diffusers) (0.4.5) Requirement already satisfied: tokenizers<0.21,>=0.20 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from transformers) (0.20.3) Requirement already satisfied: typing-extensions>=4.8.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (4.12.2) Requirement already satisfied: sympy in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (1.13.3) Requirement already satisfied: networkx in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (3.4.2) Requirement already satisfied: jinja2 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (3.1.4) Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (12.1.105) Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (12.1.105) Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (12.1.105) Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (8.9.2.26) Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (12.1.3.1) Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (11.0.2.54) Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (10.3.2.106) Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (11.4.5.107) Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (12.1.0.106) Requirement already satisfied: nvidia-nccl-cu12==2.19.3 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (2.19.3) Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (12.1.105) Requirement already satisfied: triton==2.2.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (2.2.0) Requirement already satisfied: nvidia-nvjitlink-cu12 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from nvidia-cusolver-cu12==11.4.5.107->torch) (12.6.77) Collecting audioread>=2.1.9 (from librosa) Downloading audioread-3.0.1-py3-none-any.whl.metadata (8.4 kB) Requirement already satisfied: scipy>=1.2.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from librosa) (1.11.4) Requirement already satisfied: scikit-learn>=0.20.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from librosa) (1.3.2) Requirement already satisfied: joblib>=0.14 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from librosa) (1.4.2) Requirement already satisfied: decorator>=4.3.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from librosa) (5.1.1) Collecting numba>=0.51.0 (from librosa) Downloading numba-0.60.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.7 kB) Collecting pooch>=1.1 (from librosa) Downloading pooch-1.8.2-py3-none-any.whl.metadata (10 kB) Collecting soxr>=0.3.2 (from librosa) Downloading soxr-0.5.0.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.6 kB) Collecting lazy-loader>=0.1 (from librosa) Downloading lazy_loader-0.4-py3-none-any.whl.metadata (7.6 kB) Collecting msgpack>=1.0 (from librosa) Downloading msgpack-1.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (8.4 kB) Requirement already satisfied: cffi>=1.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from soundfile) (1.17.1) Requirement already satisfied: pycparser in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from cffi>=1.0->soundfile) (2.22) Requirement already satisfied: aiohappyeyeballs>=2.3.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from aiohttp->datasets) (2.4.3) Requirement already satisfied: aiosignal>=1.1.2 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from aiohttp->datasets) (1.3.1) Requirement already satisfied: attrs>=17.3.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from aiohttp->datasets) (24.2.0) Requirement already satisfied: frozenlist>=1.1.1 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from aiohttp->datasets) (1.5.0) Requirement already satisfied: multidict<7.0,>=4.5 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from aiohttp->datasets) (6.1.0) Requirement already satisfied: yarl<2.0,>=1.12.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from aiohttp->datasets) (1.17.1) Requirement already satisfied: async-timeout<5.0,>=4.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from aiohttp->datasets) (4.0.3) Collecting llvmlite<0.44,>=0.43.0dev0 (from numba>=0.51.0->librosa) Downloading llvmlite-0.43.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.8 kB) Requirement already satisfied: platformdirs>=2.5.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from pooch>=1.1->librosa) (4.3.6) Requirement already satisfied: six>=1.5 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0) Requirement already satisfied: charset-normalizer<4,>=2 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from requests>=2.32.2->datasets) (3.4.0) Requirement already satisfied: idna<4,>=2.5 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from requests>=2.32.2->datasets) (3.10) Requirement already satisfied: urllib3<3,>=1.21.1 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from requests>=2.32.2->datasets) (2.2.3) Requirement already satisfied: certifi>=2017.4.17 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from requests>=2.32.2->datasets) (2024.8.30) Requirement already satisfied: threadpoolctl>=2.0.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from scikit-learn>=0.20.0->librosa) (3.5.0) Requirement already satisfied: zipp>=3.20 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from importlib-metadata->diffusers) (3.21.0) Requirement already satisfied: MarkupSafe>=2.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from jinja2->torch) (3.0.2) Requirement already satisfied: pytz>=2020.1 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from pandas->datasets) (2024.2) Requirement already satisfied: tzdata>=2022.1 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from pandas->datasets) (2024.2) Requirement already satisfied: mpmath<1.4,>=1.1.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from sympy->torch) (1.3.0) Requirement already satisfied: propcache>=0.2.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from yarl<2.0,>=1.12.0->aiohttp->datasets) (0.2.0) Downloading librosa-0.10.2.post1-py3-none-any.whl (260 kB) Downloading soundfile-0.12.1-py2.py3-none-manylinux_2_31_x86_64.whl (1.2 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 105.9 MB/s eta 0:00:00 Downloading audioread-3.0.1-py3-none-any.whl (23 kB) Downloading lazy_loader-0.4-py3-none-any.whl (12 kB) Downloading msgpack-1.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (378 kB) Downloading numba-0.60.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (3.7 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.7/3.7 MB 175.9 MB/s eta 0:00:00 Downloading pooch-1.8.2-py3-none-any.whl (64 kB) Downloading soxr-0.5.0.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (252 kB) Downloading llvmlite-0.43.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (43.9 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.9/43.9 MB 194.7 MB/s eta 0:00:0000:01 Installing collected packages: soxr, msgpack, llvmlite, lazy-loader, audioread, soundfile, pooch, numba, librosa Successfully installed audioread-3.0.1 lazy-loader-0.4 librosa-0.10.2.post1 llvmlite-0.43.0 msgpack-1.1.0 numba-0.60.0 pooch-1.8.2 soundfile-0.12.1 soxr-0.5.0.post1 Note: you may need to restart the kernel to use updated packages.
#!/usr/bin/env python3 import torch import matplotlib.pyplot as plt from datasets import load_dataset from diffusers import DiffusionPipeline from transformers import ( WhisperForConditionalGeneration, WhisperProcessor, ) # Load the model and move it to the correct device model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small") processor = WhisperProcessor.from_pretrained("openai/whisper-small") # Load the dataset ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation") # Get a sample from the dataset audio_sample = ds[3] # Prepare the data text = audio_sample["text"].lower() speech_data = audio_sample["audio"]["array"] # Load the Diffusion Pipeline diffuser_pipeline = DiffusionPipeline.from_pretrained( "CompVis/stable-diffusion-v1-4", custom_pipeline="speech_to_image_diffusion", speech_model=model, speech_processor=processor, torch_dtype=torch.float32, ) diffuser_pipeline.enable_attention_slicing() # Generate the image output = diffuser_pipeline(speech_data) plt.imshow(output.images[0]) plt.show()
Image in a Jupyter notebook