Exploring simple optimizations for Stable Diffusion XL
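The setup cells are not preserved in this copy; below is a minimal sketch of what they might contain, assuming recent versions of diffusers and PyTorch on a CUDA GPU. The prompt string and the flush()/benchmark() helpers are illustrative, not taken from the original notebook.

In [ ]:
import gc
import time

import torch

# Illustrative prompt reused throughout the notebook.
PROMPT = "a photo of an astronaut riding a horse on mars"

def flush():
    """Release cached GPU memory between experiments."""
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

def benchmark(pipe, num_runs=3, steps=30):
    """Rough helper: average latency over a few runs plus peak VRAM."""
    _ = pipe(PROMPT, num_inference_steps=steps)  # warmup
    start = time.time()
    for _ in range(num_runs):
        _ = pipe(PROMPT, num_inference_steps=steps)
    latency = (time.time() - start) / num_runs
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"avg latency: {latency:.2f} s | peak memory: {peak_gb:.2f} GB")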
Unoptimized setup
FP32 computation
Default attention processor
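A sketch of what this configuration might look like: SDXL loaded in full FP32 precision with the vanilla attention processor forced on the UNet and VAE. benchmark() and flush() are the hypothetical helpers from the setup sketch above.

In [ ]:
from diffusers import DiffusionPipeline

# Full-precision load: no torch_dtype argument, so weights stay in FP32.
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")

# Force the vanilla attention processor so no fused SDPA kernels are used.
pipe.unet.set_default_attn_processor()
pipe.vae.set_default_attn_processor()
pipe = pipe.to("cuda")

benchmark(pipe)
flush()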
Just FP16
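A sketch of the FP16-only variant: the same pipeline loaded with half-precision weights, still keeping the vanilla attention processor so the effect of FP16 is isolated.

In [ ]:
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.unet.set_default_attn_processor()
pipe.vae.set_default_attn_processor()
pipe = pipe.to("cuda")

benchmark(pipe)
flush()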
FP16 + SDPA
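A sketch of the FP16 + SDPA variant: with PyTorch 2.x, diffusers uses the scaled-dot-product-attention processor by default, so it is enough to simply not override the attention processor.

In [ ]:
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
# No set_default_attn_processor() call here: the default processor already uses
# torch.nn.functional.scaled_dot_product_attention on PyTorch 2.x.

benchmark(pipe)
flush()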
From here on, we refer to "FP16 + SDPA" as the default setting.
Default + torch.compile()
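A sketch, assuming PyTorch 2.x: the UNet (the heaviest component) is compiled with torch.compile() on top of the default FP16 + SDPA setting. The first call pays the compilation cost; later calls are faster.

In [ ]:
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Compile the UNet for faster repeated inference.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

benchmark(pipe)
flush()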
Default + Model CPU Offloading
Here we focus more on memory optimization than on inference speed.
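A sketch of model-level CPU offloading: whole sub-models (text encoders, UNet, VAE) are moved to the GPU only while they are needed. Note that the pipeline is not moved to CUDA manually; enable_model_cpu_offload() handles device placement.

In [ ]:
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)
# Do not call pipe.to("cuda") here; offloading manages device placement itself.
pipe.enable_model_cpu_offload()

benchmark(pipe)
flush()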
Default + Sequential CPU Offloading
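A sketch of sequential CPU offloading, which offloads at the submodule level and therefore saves more memory than model offloading, at a substantial cost in speed.

In [ ]:
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_sequential_cpu_offload()

benchmark(pipe)
flush()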
Default + VAE Slicing
This is specifically suited to reducing the memory needed to decode latents into higher-resolution images without compromising too much on inference speed.
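A sketch of VAE slicing on top of the default setting: the VAE decodes the latents one slice at a time, which mainly matters when several images are generated per call.

In [ ]:
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
pipe.enable_vae_slicing()

benchmark(pipe)
flush()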
Default + VAE Slicing + Sequential CPU Offloading
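A sketch combining the two previous memory optimizations.

In [ ]:
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_vae_slicing()
pipe.enable_sequential_cpu_offload()

benchmark(pipe)
flush()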
Default + Precomputing text embeddings
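A sketch of precomputing the prompt embeddings once with encode_prompt() and reusing them, so the two SDXL text encoders do not need to stay on the GPU during generation. The memory-freeing step shown here is an assumption, not necessarily how the original notebook handled it.

In [ ]:
import gc

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Encode the prompt once with both text encoders.
with torch.no_grad():
    (
        prompt_embeds,
        negative_prompt_embeds,
        pooled_prompt_embeds,
        negative_pooled_prompt_embeds,
    ) = pipe.encode_prompt(prompt=PROMPT, device="cuda", do_classifier_free_guidance=True)

# Assumption: drop the text encoders to free memory, since the embeddings are reused.
pipe.text_encoder = None
pipe.text_encoder_2 = None
gc.collect()
torch.cuda.empty_cache()

# Generate from the precomputed embeddings instead of a raw prompt string.
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
    num_inference_steps=30,
).images[0]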
Default + Tiny Autoencoder
This is better suited for generating (almost) instant previews. The "instant" part is, of course, GPU-dependent; on an A10G, for example, it is achievable.
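A sketch using the Tiny Autoencoder for SDXL (taesdxl) as a drop-in replacement for the regular VAE; the checkpoint name is the community-published one on the Hugging Face Hub.

In [ ]:
import torch
from diffusers import AutoencoderTiny, DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)
# Swap the regular VAE for the much smaller, faster Tiny Autoencoder.
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesdxl", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

benchmark(pipe)
flush()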