🤗 Training with Diffusers

In recent months, it has become clear that diffusion models have taken the throne as the state-of-the-art generative models. Here, we will use Hugging Face's brand new Diffusers library to train a simple diffusion model.

Installing the dependencies

This notebook leverages the 🤗 Datasets library to load and preprocess image datasets and the 🤗 Accelerate library to simplify training on any number of GPUs, with features like automatic gradient accumulation and tensorboard logging. Let's install them here:

%%capture
!pip install diffusers[training]==0.11.1

To be able to share your model with the community, there are a few more steps to follow.

First you have to store your authentication token from the Hugging Face website (sign up here if you haven't already!) then execute the following cell and input your write token:

from huggingface_hub import notebook_login

notebook_login()
Login successful
Your token has been saved to /root/.huggingface/token
Authenticated through git-credential store but this isn't the helper defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal in case you want to set this credential helper as the default

git config --global credential.helper store

Then you need to install Git-LFS to upload your model checkpoints:

%%capture
!sudo apt -qq install git-lfs
!git config --global credential.helper store

Config

For convenience, we define a configuration grouping all the training hyperparameters. This would be similar to the arguments used for a training script. Here we choose reasonable defaults for hyperparameters like num_epochs, learning_rate, lr_warmup_steps, but feel free to adjust them if you train on your own dataset. For example, num_epochs can be increased to 100 for better visual quality.

from dataclasses import dataclass

@dataclass
class TrainingConfig:
    image_size = 128  # the generated image resolution
    train_batch_size = 16
    eval_batch_size = 16  # how many images to sample during evaluation
    num_epochs = 50
    gradient_accumulation_steps = 1
    learning_rate = 1e-4
    lr_warmup_steps = 500
    save_image_epochs = 10
    save_model_epochs = 30
    mixed_precision = 'fp16'  # `no` for float32, `fp16` for automatic mixed precision
    output_dir = 'ddpm-butterflies-128'  # the model name locally and on the HF Hub

    push_to_hub = True  # whether to upload the saved model to the HF Hub
    hub_private_repo = False
    overwrite_output_dir = True  # overwrite the old model when re-running the notebook
    seed = 0

config = TrainingConfig()

Loading the dataset

We will use the 🤗 Datasets library to download our image dataset.

In this case, the Butterflies dataset is hosted remotely, but you can load a local ImageFolder as shown in the comments below.

from datasets import load_dataset

config.dataset_name = "huggan/smithsonian_butterflies_subset"
dataset = load_dataset(config.dataset_name, split="train")

# Feel free to try other datasets from https://hf.co/huggan/ too!
# Here's a dataset of flower photos:
# config.dataset_name = "huggan/flowers-102-categories"
# dataset = load_dataset(config.dataset_name, split="train")

# Or just load images from a local folder!
# config.dataset_name = "imagefolder"
# dataset = load_dataset(config.dataset_name, data_dir="path/to/folder")
Using custom data configuration huggan--smithsonian_butterflies_subset-dca27cc6049931aa
Downloading and preparing dataset None/None (download: 226.45 MiB, generated: 226.74 MiB, post-processed: Unknown size, total: 453.19 MiB) to /root/.cache/huggingface/datasets/huggan___parquet/huggan--smithsonian_butterflies_subset-dca27cc6049931aa/0.0.0/7328ef7ee03eaf3f86ae40594d46a1cec86161704e02dd19f232d81eee72ade8...
Dataset parquet downloaded and prepared to /root/.cache/huggingface/datasets/huggan___parquet/huggan--smithsonian_butterflies_subset-dca27cc6049931aa/0.0.0/7328ef7ee03eaf3f86ae40594d46a1cec86161704e02dd19f232d81eee72ade8. Subsequent calls will reuse this data.

The dataset contains several extra features (columns), but the one that we're interested in is image:

dataset
Dataset({ features: ['image_url', 'image_alt', 'id', 'name', 'scientific_name', 'gender', 'taxonomy', 'region', 'locality', 'date', 'usnm_no', 'guid', 'edan_url', 'source', 'stage', 'image', 'image_hash', 'sim_score'], num_rows: 1000 })

Since the Image feature loads the images with PIL, we can easily look at a few examples:

import matplotlib.pyplot as plt

fig, axs = plt.subplots(1, 4, figsize=(16, 4))
for i, image in enumerate(dataset[:4]["image"]):
    axs[i].imshow(image)
    axs[i].set_axis_off()
fig.show()
Image in a Jupyter notebook

The images in the dataset are all different, so we need to preprocess them first:

  • Resize makes the images conform to a square resolution of config.image_size.

  • RandomHorizontalFlip augments the dataset by randomly mirroring the images.

  • Normalize is important to rescale the pixel values into a [-1, 1] range (which our model will expect).

from torchvision import transforms

preprocess = transforms.Compose(
    [
        transforms.Resize((config.image_size, config.image_size)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.5], [0.5]),
    ]
)

🤗 Datasets offers a handy set_transform() method to apply the image transformations on the fly during training:

def transform(examples):
    images = [preprocess(image.convert("RGB")) for image in examples["image"]]
    return {"images": images}

dataset.set_transform(transform)
Parameter 'transform'=<function transform at 0x7f7750910170> of the transform datasets.arrow_dataset.Dataset.set_format couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.

Let's see what they look like now:

fig, axs = plt.subplots(1, 4, figsize=(16, 4))
for i, image in enumerate(dataset[:4]["images"]):
    axs[i].imshow(image.permute(1, 2, 0).numpy() / 2 + 0.5)
    axs[i].set_axis_off()
fig.show()
Image in a Jupyter notebook

Now that all our images have the same size and are converted to tensors, we can create the dataloader we will use for training.

import torch

train_dataloader = torch.utils.data.DataLoader(dataset, batch_size=config.train_batch_size, shuffle=True)

Defining the diffusion model

Here we set up our diffusion model. Diffusion models are neural networks that are trained to predict slightly less noisy images from a noisy input. At inference time, they can be used to iteratively transform random noise into an image:


Figure from DDPM paper (https://arxiv.org/abs/2006.11239).

Don't worry too much about the math if you're not familiar with it; the important part to remember is that our model corresponds to the arrow $p_{\theta}(x_{t-1}|x_{t})$ (which is a fancy way of saying: predict a slightly less noisy image).

The interesting part is that it's really easy to add some noise to an image, so the training can happen in a self-supervised fashion as follows:

  1. Take an image from the training set.

  2. Apply some random noise to it $t$ times (this will give the $x_{t-1}$ and the $x_{t}$ in the figure above).

  3. Give this noisy image to the model along with the value of $t$.

  4. Compute a loss from the output of the model and the noised image $x_{t-1}$.

Then we can apply gradient descent and repeat this process multiple times.
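
To make this recipe concrete, here is a self-contained toy sketch of a single training step (an illustration only: the stand-in convolutional toy_model ignores the timestep and the noise schedule is made up; the real model, scheduler, and training loop are built step by step in the rest of this notebook).

import torch
import torch.nn.functional as F

# Toy stand-in pieces, just to show the shape of one training step
toy_model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)  # hypothetical stand-in for the U-Net
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=1e-3)

clean_images = torch.rand(4, 3, 32, 32) * 2 - 1              # 1. a batch of "images" in [-1, 1]
t = torch.randint(0, 1000, (4,))                             #    a random timestep per image
alpha_bar = torch.linspace(0.999, 0.01, 1000)[t].view(-1, 1, 1, 1)  # made-up noise schedule
noise = torch.randn_like(clean_images)
noisy_images = alpha_bar.sqrt() * clean_images + (1 - alpha_bar).sqrt() * noise  # 2. add noise

noise_pred = toy_model(noisy_images)                         # 3. predict the noise (ignoring t here)
loss = F.mse_loss(noise_pred, noise)                         # 4. compare prediction with the true noise
loss.backward(); toy_optimizer.step(); toy_optimizer.zero_grad()  # gradient descent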

Most diffusion models use architectures that are some variant of a U-net and that's what we'll use here.

In a nutshell:

  • the model has the input image go through several blocks of ResNet layers, each of which halves the image size,

  • then through the same number of blocks that upsample it again,

  • and there are skip connections linking the features on the downsample path to the corresponding layers in the upsample path.

A key feature of this model is that it predicts images of the same size as the input, which is exactly what we need here.

Diffusers provides a handy UNet2DModel class which creates the desired architecture in PyTorch.

Let's create a U-net for our desired image size. Note that down_block_types correspond to the downsampling blocks (green on the diagram above), and up_block_types are the upsampling blocks (red on the diagram):

from diffusers import UNet2DModel

model = UNet2DModel(
    sample_size=config.image_size,  # the target image resolution
    in_channels=3,  # the number of input channels, 3 for RGB images
    out_channels=3,  # the number of output channels
    layers_per_block=2,  # how many ResNet layers to use per UNet block
    block_out_channels=(128, 128, 256, 256, 512, 512),  # the number of output channels for each UNet block
    down_block_types=(
        "DownBlock2D",  # a regular ResNet downsampling block
        "DownBlock2D",
        "DownBlock2D",
        "DownBlock2D",
        "AttnDownBlock2D",  # a ResNet downsampling block with spatial self-attention
        "DownBlock2D",
    ),
    up_block_types=(
        "UpBlock2D",  # a regular ResNet upsampling block
        "AttnUpBlock2D",  # a ResNet upsampling block with spatial self-attention
        "UpBlock2D",
        "UpBlock2D",
        "UpBlock2D",
        "UpBlock2D",
    ),
)

Let's get a sample image from our dataset and pass it into our model. We just need to add a batch dimension:

sample_image = dataset[0]['images'].unsqueeze(0)
print('Input shape:', sample_image.shape)
Input shape: torch.Size([1, 3, 128, 128])

And let's check the output is a tensor of the same exact shape:

print('Output shape:', model(sample_image, timestep=0).sample.shape)
Output shape: torch.Size([1, 3, 128, 128])

Great!

Note that our model takes in the (noisy) image and also the current time-step (as we saw before in the training overview). That time-step information is converted for the model using a sinusoidal positional embedding, similar to what Transformer models often do.
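
As a rough illustration of what such a sinusoidal embedding looks like (a minimal sketch only; the exact embedding used inside UNet2DModel may differ in scaling and layout):

import math
import torch

def sinusoidal_embedding(timesteps: torch.Tensor, dim: int = 32) -> torch.Tensor:
    # Map integer timesteps to a vector of sines and cosines at geometrically spaced
    # frequencies, in the spirit of Transformer positional embeddings.
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / (half - 1))
    angles = timesteps[:, None].float() * freqs[None, :]
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

print(sinusoidal_embedding(torch.tensor([0, 50, 999])).shape)  # torch.Size([3, 32])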

Now that we have our model, we just need an object to add noise to an image. This is done by the schedulers in the diffusers library.

Defining the noise scheduler

Depending on the diffusion algorithm you want to use, the way images are noised is slightly different. That's why 🤗 Diffusers contains different scheduler classes which each define the algorithm-specific diffusion steps. Here we are going to use the DDPMScheduler, which corresponds to the denoising and training algorithm proposed in Denoising Diffusion Probabilistic Models.

from diffusers import DDPMScheduler

noise_scheduler = DDPMScheduler(num_train_timesteps=1000)

Let's see how this noise scheduler works: it takes a batch of images from the training set (here we will reuse the batch of one image sample_image from before), a batch of random noise of the same shape, and the timesteps for each image (which correspond to the number of times we want to apply noise to each image):

import torch
from PIL import Image

noise = torch.randn(sample_image.shape)
timesteps = torch.LongTensor([50])

noisy_image = noise_scheduler.add_noise(sample_image, noise, timesteps)
Image.fromarray(((noisy_image.permute(0, 2, 3, 1) + 1.0) * 127.5).type(torch.uint8).numpy()[0])
Image in a Jupyter notebook
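
Under the hood, for the DDPM formulation add_noise amounts to the closed-form forward process $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$. As a sanity check, here is a small sketch (relying on the scheduler's alphas_cumprod attribute) that reproduces the noisy image by hand:

# Recompute the noisy image from the closed-form DDPM forward process
# (shapes broadcast here because we only have a single timestep in the batch)
alpha_bar_t = noise_scheduler.alphas_cumprod[timesteps]  # \bar{alpha}_t for our timestep
manual_noisy = alpha_bar_t.sqrt() * sample_image + (1 - alpha_bar_t).sqrt() * noise
print(torch.allclose(manual_noisy, noisy_image))  # expected: True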

In the DDPM algorithm, the training objective of the model is then to be able to predict the noise we used in noise_scheduler.add_noise, so the loss at this step would be:

import torch.nn.functional as F

noise_pred = model(noisy_image, timesteps).sample
loss = F.mse_loss(noise_pred, noise)

Setting up training

We have all we need to be able to train our model! Let's use a standard AdamW optimizer:

optimizer = torch.optim.AdamW(model.parameters(), lr=config.learning_rate)

And a cosine learning rate schedule:

from diffusers.optimization import get_cosine_schedule_with_warmup

lr_scheduler = get_cosine_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=config.lr_warmup_steps,
    num_training_steps=(len(train_dataloader) * config.num_epochs),
)

To evaluate our model, we use the DDPMPipeline, which is an easy way to perform end-to-end inference (see this notebook [TODO link] for more detail). We will use this pipeline to generate a batch of sample images and save them as a grid to disk.

from diffusers import DDPMPipeline
import math
import os

def make_grid(images, rows, cols):
    w, h = images[0].size
    grid = Image.new('RGB', size=(cols*w, rows*h))
    for i, image in enumerate(images):
        grid.paste(image, box=(i%cols*w, i//cols*h))
    return grid

def evaluate(config, epoch, pipeline):
    # Sample some images from random noise (this is the backward diffusion process).
    # The default pipeline output type is `List[PIL.Image]`
    images = pipeline(
        batch_size = config.eval_batch_size,
        generator=torch.manual_seed(config.seed),
    ).images

    # Make a grid out of the images
    image_grid = make_grid(images, rows=4, cols=4)

    # Save the images
    test_dir = os.path.join(config.output_dir, "samples")
    os.makedirs(test_dir, exist_ok=True)
    image_grid.save(f"{test_dir}/{epoch:04d}.png")

With this in hand, we can group everything together and write our training function. This just wraps the training step we saw in the previous section in a loop, using Accelerate for easy TensorBoard logging, gradient accumulation, mixed precision training, and multi-GPU or TPU training.

from accelerate import Accelerator
from huggingface_hub import HfFolder, Repository, whoami
from tqdm.auto import tqdm
from pathlib import Path
import os

def get_full_repo_name(model_id: str, organization: str = None, token: str = None):
    if token is None:
        token = HfFolder.get_token()
    if organization is None:
        username = whoami(token)["name"]
        return f"{username}/{model_id}"
    else:
        return f"{organization}/{model_id}"

def train_loop(config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler):
    # Initialize accelerator and tensorboard logging
    accelerator = Accelerator(
        mixed_precision=config.mixed_precision,
        gradient_accumulation_steps=config.gradient_accumulation_steps,
        log_with="tensorboard",
        logging_dir=os.path.join(config.output_dir, "logs")
    )
    if accelerator.is_main_process:
        if config.push_to_hub:
            repo_name = get_full_repo_name(Path(config.output_dir).name)
            repo = Repository(config.output_dir, clone_from=repo_name)
        elif config.output_dir is not None:
            os.makedirs(config.output_dir, exist_ok=True)
        accelerator.init_trackers("train_example")

    # Prepare everything
    # There is no specific order to remember, you just need to unpack the
    # objects in the same order you gave them to the prepare method.
    model, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
        model, optimizer, train_dataloader, lr_scheduler
    )

    global_step = 0

    # Now you train the model
    for epoch in range(config.num_epochs):
        progress_bar = tqdm(total=len(train_dataloader), disable=not accelerator.is_local_main_process)
        progress_bar.set_description(f"Epoch {epoch}")

        for step, batch in enumerate(train_dataloader):
            clean_images = batch['images']
            # Sample noise to add to the images
            noise = torch.randn(clean_images.shape).to(clean_images.device)
            bs = clean_images.shape[0]

            # Sample a random timestep for each image
            timesteps = torch.randint(0, noise_scheduler.num_train_timesteps, (bs,), device=clean_images.device).long()

            # Add noise to the clean images according to the noise magnitude at each timestep
            # (this is the forward diffusion process)
            noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)

            with accelerator.accumulate(model):
                # Predict the noise residual
                noise_pred = model(noisy_images, timesteps, return_dict=False)[0]
                loss = F.mse_loss(noise_pred, noise)
                accelerator.backward(loss)

                accelerator.clip_grad_norm_(model.parameters(), 1.0)
                optimizer.step()
                lr_scheduler.step()
                optimizer.zero_grad()

            progress_bar.update(1)
            logs = {"loss": loss.detach().item(), "lr": lr_scheduler.get_last_lr()[0], "step": global_step}
            progress_bar.set_postfix(**logs)
            accelerator.log(logs, step=global_step)
            global_step += 1

        # After each epoch you optionally sample some demo images with evaluate() and save the model
        if accelerator.is_main_process:
            pipeline = DDPMPipeline(unet=accelerator.unwrap_model(model), scheduler=noise_scheduler)

            if (epoch + 1) % config.save_image_epochs == 0 or epoch == config.num_epochs - 1:
                evaluate(config, epoch, pipeline)

            if (epoch + 1) % config.save_model_epochs == 0 or epoch == config.num_epochs - 1:
                if config.push_to_hub:
                    repo.push_to_hub(commit_message=f"Epoch {epoch}", blocking=True)
                else:
                    pipeline.save_pretrained(config.output_dir)

Let's train!

Let's launch the training (including multi-GPU training) from the notebook using Accelerate's notebook_launcher function:

from accelerate import notebook_launcher

args = (config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler)

notebook_launcher(train_loop, args, num_processes=1)
Launching training on one GPU.
Cloning https://huggingface.co/anton-l/ddpm-butterflies-128 into local empty directory.
Adding files tracked by Git LFS: ['samples/0009.png', 'samples/0019.png', 'samples/0029.png']. This may take a bit of time if the files are large.
To https://huggingface.co/anton-l/ddpm-butterflies-128
   fe22ac0..1b22dd2  main -> main
To https://huggingface.co/anton-l/ddpm-butterflies-128
   1b22dd2..f5625e7  main -> main
Adding files tracked by Git LFS: ['samples/0039.png', 'samples/0049.png']. This may take a bit of time if the files are large.
To https://huggingface.co/anton-l/ddpm-butterflies-128 f5625e7..c180b35 main -> main
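
If your machine has several GPUs, the same call can spread the training across them. For example (a hedged sketch, assuming two GPUs are available; left commented out so you don't accidentally relaunch training):

# Hypothetical multi-GPU launch: run the same train_loop on 2 processes/GPUs
# notebook_launcher(train_loop, args, num_processes=2)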

Let's have a look at the final image grid produced by the trained diffusion model:

import glob

sample_images = sorted(glob.glob(f"{config.output_dir}/samples/*.png"))
Image.open(sample_images[-1])
Image in a Jupyter notebook

Not bad! There's room for improvement of course, so feel free to play with the hyperparameters, model definition and image augmentations 🤗
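
For example, one easy experiment is a slightly richer augmentation pipeline. Here is an illustrative sketch you could swap in for the preprocess defined earlier before re-running the training cells (the crop scale is an arbitrary choice, not a tuned recommendation):

from torchvision import transforms

# Alternative preprocessing with a random-resized-crop augmentation (illustrative values)
preprocess = transforms.Compose(
    [
        transforms.RandomResizedCrop(config.image_size, scale=(0.8, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.5], [0.5]),
    ]
)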

If you've chosen to upload the model to the Hugging Face Hub, its repository should now look like so: https://huggingface.co/anton-l/ddpm-butterflies-128/tree/main

If you want to dive deeper into the code, we also have more advanced training scripts with features like Exponential Moving Average of model weights here:

https://github.com/huggingface/diffusers/tree/main/examples
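
As a taste of what those scripts add, here is a minimal sketch of an exponential moving average (EMA) of model weights (an illustration only, not the implementation used in the diffusers examples; the decay value is a typical but arbitrary choice):

import copy
import torch

ema_model = copy.deepcopy(model)  # shadow copy whose weights track a smoothed average

@torch.no_grad()
def update_ema(ema_model, model, decay=0.999):
    # ema <- decay * ema + (1 - decay) * current, typically called after each optimizer step
    for ema_param, param in zip(ema_model.parameters(), model.parameters()):
        ema_param.mul_(decay).add_(param, alpha=1 - decay)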