Generating images and text with UniDiffuser
UniDiffuser was introduced in One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale.
In this notebook, we will show how the UniDiffuser pipeline in 🧨 diffusers can be used for:
Unconditional image generation
Unconditional text generation
Text-to-image generation
Image-to-text generation
Image variation
Text variation
One pipeline to rule six use cases 🤯
Let's start!
Setup
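A minimal setup cell, assuming diffusers is installed from source together with transformers and accelerate:

```python
# Install 🧨 diffusers from source, plus transformers and accelerate.
!pip install -q git+https://github.com/huggingface/diffusers
!pip install -q transformers accelerate
```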
Unconditional image and text generation
Throughout this notebook, we'll be using the "thu-ml/unidiffuser-v1" checkpoint. UniDiffuser comes with two checkpoints: "thu-ml/unidiffuser-v0" and "thu-ml/unidiffuser-v1".
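A minimal sketch of loading the pipeline and sampling an image-text pair jointly; the generation arguments are illustrative, not necessarily the notebook's original values:

```python
import torch
from diffusers import UniDiffuserPipeline

# Load the pipeline in half precision; assumes a CUDA-capable GPU.
pipe = UniDiffuserPipeline.from_pretrained(
    "thu-ml/unidiffuser-v1", torch_dtype=torch.float16
).to("cuda")

# Joint unconditional generation: sample an (image, text) pair together.
pipe.set_joint_mode()
sample = pipe(num_inference_steps=20, guidance_scale=8.0)
image, text = sample.images[0], sample.text[0]
print(text)
image.save("joint_sample.png")
```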
You can also generate only an image or only text (which the UniDiffuser paper calls “marginal” generation since we sample from the marginal distribution of images and text, respectively):
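Sketches of the two marginal modes (the step count is an assumption):

```python
# Image-only ("marginal") generation.
pipe.set_image_mode()
marginal_image = pipe(num_inference_steps=20).images[0]

# Text-only ("marginal") generation.
pipe.set_text_mode()
marginal_text = pipe(num_inference_steps=20).text[0]
print(marginal_text)
```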
To reset a mode, call pipe.reset_mode().
Text-to-image generation
The UniDiffuserPipeline can infer the right mode of execution from the inputs provided to the pipeline call. Since we started with the joint unconditional mode (set_joint_mode()), subsequent calls would otherwise be executed in that mode. Now we want to generate images from text, so we set the mode accordingly.
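A text-to-image sketch; the prompt is a hypothetical example:

```python
# Text-to-image generation: condition the sampler on a prompt.
pipe.set_text_to_image_mode()
prompt = "an astronaut riding a horse on the moon, digital art"  # hypothetical prompt
t2i_image = pipe(prompt=prompt, num_inference_steps=20, guidance_scale=8.0).images[0]
t2i_image
```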
Image-to-text generation
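To caption an image, we condition on an input image instead. A sketch, assuming an example image (the URL is illustrative):

```python
from diffusers.utils import load_image

# Image-to-text generation: condition on an image to produce a caption.
pipe.set_image_to_text_mode()
init_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images"
    "/resolve/main/unidiffuser/unidiffuser_example_image.jpg"  # illustrative URL
).resize((512, 512))
caption = pipe(image=init_image, num_inference_steps=20).text[0]
print(caption)
```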
Image variation
For image variation, we follow the "round-trip" method suggested in the paper: we first generate a caption from a given image, and then use that caption to generate an image from it.
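A sketch of the round trip, reusing init_image from above:

```python
# Round trip: image -> caption -> new image.
pipe.set_image_to_text_mode()
caption = pipe(image=init_image, num_inference_steps=20).text[0]

pipe.set_text_to_image_mode()
image_variant = pipe(prompt=caption, num_inference_steps=20, guidance_scale=8.0).images[0]
image_variant
```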
Text variation
The same round-trip methodology can be applied here.
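Here the round trip runs in the other direction, text to image to new text (the prompt is a hypothetical example):

```python
# Round trip: text -> image -> new text.
pipe.set_text_to_image_mode()
prompt = "a serene mountain lake at sunrise"  # hypothetical prompt
intermediate_image = pipe(prompt=prompt, num_inference_steps=20, guidance_scale=8.0).images[0]

pipe.set_image_to_text_mode()
text_variant = pipe(image=intermediate_image, num_inference_steps=20).text[0]
print(text_variant)
```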