GitHub Repository: huggingface/notebooks
Path: blob/main/diffusers_doc/ko/kandinsky.ipynb

Kandinsky

Kandinsky ๋ชจ๋ธ์€ ์ผ๋ จ์˜ ๋‹ค๊ตญ์–ด text-to-image ์ƒ์„ฑ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. Kandinsky 2.0 ๋ชจ๋ธ์€ ๋‘ ๊ฐœ์˜ ๋‹ค๊ตญ์–ด ํ…์ŠคํŠธ ์ธ์ฝ”๋”๋ฅผ ์‚ฌ์šฉํ•˜๊ณ  ๊ทธ ๊ฒฐ๊ณผ๋ฅผ ์—ฐ๊ฒฐํ•ด UNet์— ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.

Kandinsky 2.1 changes the architecture to include an image prior model (CLIP) that generates a mapping between text and image embeddings. The mapping provides better text-image alignment, and it is used with the text embeddings during training, leading to higher-quality results. Finally, Kandinsky 2.1 uses a Modulating Quantized Vectors (MoVQ) decoder - which adds a spatial conditional normalization layer to increase photorealism - to decode the latents into images.

Kandinsky 2.2 improves on the previous model by replacing the image encoder of the image prior model with a larger CLIP-ViT-G model to improve quality. The image prior model was also retrained on images with different resolutions and aspect ratios to generate higher-resolution images and handle different image sizes.

Kandinsky 3 simplifies the architecture and shifts away from the two-stage generation process involving the prior model and diffusion model. Instead, Kandinsky 3 uses Flan-UL2 to encode the text, a UNet with BigGan-deep blocks, and Sber-MoVQGAN to decode the latents into an image. Text understanding and generated image quality are primarily achieved by using a larger text encoder and UNet.

์ด ๊ฐ€์ด๋“œ์—์„œ๋Š” text-to-image, image-to-image, ์ธํŽ˜์ธํŒ…, ๋ณด๊ฐ„ ๋“ฑ์„ ์œ„ํ•ด Kandinsky ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

Before you begin, make sure you have the following libraries installed:

# uncomment to install the necessary libraries in Colab
#!pip install -q diffusers transformers accelerate

Kandinsky 2.1 and 2.2 usage is very similar! The only difference is that Kandinsky 2.2 doesn't accept a prompt as an input when decoding the latents. Instead, Kandinsky 2.2 only accepts image_embeds during decoding.


Kandinsky 3 has a more concise architecture and doesn't require a prior model. This means its usage is identical to other diffusion models like Stable Diffusion XL.

Text-to-image

To use the Kandinsky models for any task, you always start by setting up the prior pipeline to encode the prompt and generate the image embeddings. The prior pipeline also generates negative_image_embeds that correspond to the negative prompt "". For better results, you can pass an actual negative_prompt to the prior pipeline, but this'll double the effective batch size of the prior pipeline.

# Kandinsky 2.1
from diffusers import KandinskyPriorPipeline, KandinskyPipeline
import torch

prior_pipeline = KandinskyPriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16).to("cuda")
pipeline = KandinskyPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16).to("cuda")

prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
negative_prompt = "low quality, bad quality" # including a negative prompt is optional, but it usually improves the results
image_embeds, negative_image_embeds = prior_pipeline(prompt, negative_prompt, guidance_scale=1.0).to_tuple()

Now pass all the prompts and embeddings to the KandinskyPipeline to generate an image:

image = pipeline(prompt, image_embeds=image_embeds, negative_prompt=negative_prompt, negative_image_embeds=negative_image_embeds, height=768, width=768).images[0]
image
# Kandinsky 2.2
from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline
import torch

prior_pipeline = KandinskyV22PriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16).to("cuda")
pipeline = KandinskyV22Pipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16).to("cuda")

prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
negative_prompt = "low quality, bad quality" # including a negative prompt is optional, but it usually improves the results
image_embeds, negative_image_embeds = prior_pipeline(prompt, guidance_scale=1.0).to_tuple()

Pass the image_embeds and negative_image_embeds to the KandinskyV22Pipeline to generate an image:

image = pipeline(image_embeds=image_embeds, negative_image_embeds=negative_image_embeds, height=768, width=768).images[0]
image

Kandinsky 3 doesn't require a prior model, so you can load the Kandinsky3Pipeline directly and pass a prompt to generate an image:

from diffusers import Kandinsky3Pipeline
import torch

pipeline = Kandinsky3Pipeline.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload()

prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
image = pipeline(prompt).images[0]
image

🤗 Diffusers also provides an end-to-end API with the KandinskyCombinedPipeline and KandinskyV22CombinedPipeline, meaning you don't have to separately load the prior and text-to-image pipelines. The combined pipeline automatically loads both the prior model and the decoder. You can still set different values for the prior pipeline with the prior_guidance_scale and prior_num_inference_steps parameters if you want.

๋‚ด๋ถ€์—์„œ ๊ฒฐํ•ฉ๋œ ํŒŒ์ดํ”„๋ผ์ธ์„ ์ž๋™์œผ๋กœ ํ˜ธ์ถœํ•˜๋ ค๋ฉด AutoPipelineForText2Image๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค:

# Kandinsky 2.1
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload()

prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
negative_prompt = "low quality, bad quality"

image = pipeline(prompt=prompt, negative_prompt=negative_prompt, prior_guidance_scale=1.0, guidance_scale=4.0, height=768, width=768).images[0]
image

# Kandinsky 2.2
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload()

prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
negative_prompt = "low quality, bad quality"

image = pipeline(prompt=prompt, negative_prompt=negative_prompt, prior_guidance_scale=1.0, guidance_scale=4.0, height=768, width=768).images[0]
image

Image-to-image

For image-to-image, pass the initial image and a text prompt to condition the pipeline on the image. Start by loading the prior pipeline:

# Kandinsky 2.1
import torch
from diffusers import KandinskyImg2ImgPipeline, KandinskyPriorPipeline

prior_pipeline = KandinskyPriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
pipeline = KandinskyImg2ImgPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16, use_safetensors=True).to("cuda")

# Kandinsky 2.2
import torch
from diffusers import KandinskyV22Img2ImgPipeline, KandinskyPriorPipeline

prior_pipeline = KandinskyPriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
pipeline = KandinskyV22Img2ImgPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16, use_safetensors=True).to("cuda")

Kandinsky 3 doesn't require a prior model, so you can load the image-to-image pipeline directly:

from diffusers import Kandinsky3Img2ImgPipeline
from diffusers.utils import load_image
import torch

pipeline = Kandinsky3Img2ImgPipeline.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload()

Download an image to condition on:

from diffusers.utils import load_image

# download image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
original_image = load_image(url)
original_image = original_image.resize((768, 512))

Generate the image_embeds and negative_image_embeds with the prior pipeline:

prompt = "A fantasy landscape, Cinematic lighting"
negative_prompt = "low quality, bad quality"

image_embeds, negative_image_embeds = prior_pipeline(prompt, negative_prompt).to_tuple()

Now pass the original image and all the prompts and embeddings to the pipeline to generate an image:

# Kandinsky 2.1
from diffusers.utils import make_image_grid

image = pipeline(prompt, negative_prompt=negative_prompt, image=original_image, image_embeds=image_embeds, negative_image_embeds=negative_image_embeds, height=768, width=768, strength=0.3).images[0]
make_image_grid([original_image.resize((512, 512)), image.resize((512, 512))], rows=1, cols=2)

# Kandinsky 2.2
from diffusers.utils import make_image_grid

image = pipeline(image=original_image, image_embeds=image_embeds, negative_image_embeds=negative_image_embeds, height=768, width=768, strength=0.3).images[0]
make_image_grid([original_image.resize((512, 512)), image.resize((512, 512))], rows=1, cols=2)

# Kandinsky 3
image = pipeline(prompt, negative_prompt=negative_prompt, image=image, strength=0.75, num_inference_steps=25).images[0]
image

🤗 Diffusers also provides an end-to-end API with the KandinskyImg2ImgCombinedPipeline and KandinskyV22Img2ImgCombinedPipeline, meaning you don't have to separately load the prior and image-to-image pipelines. The combined pipeline automatically loads both the prior model and the decoder. You can still set different values for the prior pipeline with the prior_guidance_scale and prior_num_inference_steps parameters if you want.

๋‚ด๋ถ€์—์„œ ๊ฒฐํ•ฉ๋œ ํŒŒ์ดํ”„๋ผ์ธ์„ ์ž๋™์œผ๋กœ ํ˜ธ์ถœํ•˜๋ ค๋ฉด AutoPipelineForImage2Image๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค:

# Kandinsky 2.1
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
import torch

pipeline = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16, use_safetensors=True)
pipeline.enable_model_cpu_offload()

prompt = "A fantasy landscape, Cinematic lighting"
negative_prompt = "low quality, bad quality"

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
original_image = load_image(url)
original_image.thumbnail((768, 768))

image = pipeline(prompt=prompt, negative_prompt=negative_prompt, image=original_image, strength=0.3).images[0]
make_image_grid([original_image.resize((512, 512)), image.resize((512, 512))], rows=1, cols=2)

# Kandinsky 2.2
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image
import torch

pipeline = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload()

prompt = "A fantasy landscape, Cinematic lighting"
negative_prompt = "low quality, bad quality"

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
original_image = load_image(url)
original_image.thumbnail((768, 768))

image = pipeline(prompt=prompt, negative_prompt=negative_prompt, image=original_image, strength=0.3).images[0]
make_image_grid([original_image.resize((512, 512)), image.resize((512, 512))], rows=1, cols=2)

Inpainting

โš ๏ธ Kandinsky ๋ชจ๋ธ์€ ์ด์ œ ๊ฒ€์€์ƒ‰ ํ”ฝ์…€ ๋Œ€์‹  โฌœ๏ธ ํฐ์ƒ‰ ํ”ฝ์…€์„ ์‚ฌ์šฉํ•˜์—ฌ ๋งˆ์Šคํฌ ์˜์—ญ์„ ํ‘œํ˜„ํ•ฉ๋‹ˆ๋‹ค. ํ”„๋กœ๋•์…˜์—์„œ KandinskyInpaintPipeline์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ ํฐ์ƒ‰ ํ”ฝ์…€์„ ์‚ฌ์šฉํ•˜๋„๋ก ๋งˆ์Šคํฌ๋ฅผ ๋ณ€๊ฒฝํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค:

# for PIL inputs
import PIL.ImageOps
mask = PIL.ImageOps.invert(mask)

# for PyTorch and NumPy inputs
mask = 1 - mask
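As a minimal, self-contained demonstration of the NumPy flip (using a toy 1-D mask in place of a real image):

```python
import numpy as np

# Toy binary mask made for the old black-pixel convention
mask = np.array([0.0, 1.0, 1.0, 0.0], dtype=np.float32)

# Flip 0 <-> 1 so the mask matches the new white-pixel convention
inverted = 1 - mask
print(inverted)  # [1. 0. 0. 1.]
```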

์ธํŽ˜์ธํŒ…์—์„œ๋Š” ์›๋ณธ ์ด๋ฏธ์ง€, ์›๋ณธ ์ด๋ฏธ์ง€์—์„œ ๋Œ€์ฒดํ•  ์˜์—ญ์˜ ๋งˆ์Šคํฌ, ์ธํŽ˜์ธํŒ…ํ•  ๋‚ด์šฉ์— ๋Œ€ํ•œ ํ…์ŠคํŠธ ํ”„๋กฌํ”„ํŠธ๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. Prior ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค:

# Kandinsky 2.1
from diffusers import KandinskyInpaintPipeline, KandinskyPriorPipeline
from diffusers.utils import load_image, make_image_grid
import torch
import numpy as np
from PIL import Image

prior_pipeline = KandinskyPriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
pipeline = KandinskyInpaintPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-inpaint", torch_dtype=torch.float16, use_safetensors=True).to("cuda")

# Kandinsky 2.2
from diffusers import KandinskyV22InpaintPipeline, KandinskyV22PriorPipeline
from diffusers.utils import load_image, make_image_grid
import torch
import numpy as np
from PIL import Image

prior_pipeline = KandinskyV22PriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
pipeline = KandinskyV22InpaintPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16, use_safetensors=True).to("cuda")

์ดˆ๊ธฐ ์ด๋ฏธ์ง€๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๊ณ  ๋งˆ์Šคํฌ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค:

init_image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png")
mask = np.zeros((768, 768), dtype=np.float32)
# mask area above cat's head
mask[:250, 250:-250] = 1

Generate the embeddings with the prior pipeline:

prompt = "a hat"
prior_output = prior_pipeline(prompt)

์ด์ œ ์ด๋ฏธ์ง€ ์ƒ์„ฑ์„ ์œ„ํ•ด ์ดˆ๊ธฐ ์ด๋ฏธ์ง€, ๋งˆ์Šคํฌ, ํ”„๋กฌํ”„ํŠธ์™€ ์ž„๋ฒ ๋”ฉ์„ ํŒŒ์ดํ”„๋ผ์ธ์— ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค:

# Kandinsky 2.1
output_image = pipeline(prompt, image=init_image, mask_image=mask, **prior_output, height=768, width=768, num_inference_steps=150).images[0]
mask = Image.fromarray((mask*255).astype('uint8'), 'L')
make_image_grid([init_image, mask, output_image], rows=1, cols=3)

# Kandinsky 2.2
output_image = pipeline(image=init_image, mask_image=mask, **prior_output, height=768, width=768, num_inference_steps=150).images[0]
mask = Image.fromarray((mask*255).astype('uint8'), 'L')
make_image_grid([init_image, mask, output_image], rows=1, cols=3)

You can also use the end-to-end KandinskyInpaintCombinedPipeline and KandinskyV22InpaintCombinedPipeline to call the prior and decoder pipelines together under the hood. Use the AutoPipelineForInpainting for this:

# Kandinsky 2.1
import torch
import numpy as np
from PIL import Image
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipe = AutoPipelineForInpainting.from_pretrained("kandinsky-community/kandinsky-2-1-inpaint", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

init_image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png")
mask = np.zeros((768, 768), dtype=np.float32)
# mask area above cat's head
mask[:250, 250:-250] = 1
prompt = "a hat"

output_image = pipe(prompt=prompt, image=init_image, mask_image=mask).images[0]
mask = Image.fromarray((mask*255).astype('uint8'), 'L')
make_image_grid([init_image, mask, output_image], rows=1, cols=3)

# Kandinsky 2.2
import torch
import numpy as np
from PIL import Image
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid

pipe = AutoPipelineForInpainting.from_pretrained("kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

init_image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png")
mask = np.zeros((768, 768), dtype=np.float32)
# mask area above cat's head
mask[:250, 250:-250] = 1
prompt = "a hat"

output_image = pipe(prompt=prompt, image=init_image, mask_image=mask).images[0]
mask = Image.fromarray((mask*255).astype('uint8'), 'L')
make_image_grid([init_image, mask, output_image], rows=1, cols=3)

Interpolation

Interpolation allows you to explore the latent space between the image and text embeddings, which is a cool way to see some of the prior model's intermediate outputs. Load the prior pipeline and two images you'd like to interpolate:

# Kandinsky 2.1
from diffusers import KandinskyPriorPipeline, KandinskyPipeline
from diffusers.utils import load_image, make_image_grid
import torch

prior_pipeline = KandinskyPriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16, use_safetensors=True).to("cuda")

img_1 = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png")
img_2 = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/starry_night.jpeg")
make_image_grid([img_1.resize((512,512)), img_2.resize((512,512))], rows=1, cols=2)

# Kandinsky 2.2
from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline
from diffusers.utils import load_image, make_image_grid
import torch

prior_pipeline = KandinskyV22PriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16, use_safetensors=True).to("cuda")

img_1 = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png")
img_2 = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/starry_night.jpeg")
make_image_grid([img_1.resize((512,512)), img_2.resize((512,512))], rows=1, cols=2)
a cat
Van Gogh's Starry Night painting

Specify the text or images to interpolate, and set the weights for each text or image. Experiment with the weights to see how they affect the interpolation!

images_texts = ["a cat", img_1, img_2]
weights = [0.3, 0.3, 0.4]
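Conceptually, the interpolation combines the embedding of each input as a weighted sum. A rough NumPy sketch of that idea, using random stand-ins for the CLIP embeddings (the real implementation may differ in details such as normalization):

```python
import numpy as np

# Stand-in embeddings for "a cat", img_1, and img_2 (shapes illustrative only)
embeds = [np.random.randn(768) for _ in range(3)]
weights = [0.3, 0.3, 0.4]

# Weighted combination of the three embeddings
interpolated = sum(w * e for w, e in zip(weights, embeds))
print(interpolated.shape)  # (768,)
```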

Call the interpolate function to generate the embeddings, and then pass them to the pipeline to generate the image:

# Kandinsky 2.1
# the prompt can be left empty
prompt = ""
prior_out = prior_pipeline.interpolate(images_texts, weights)

pipeline = KandinskyPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16, use_safetensors=True).to("cuda")

image = pipeline(prompt, **prior_out, height=768, width=768).images[0]
image

# Kandinsky 2.2
# the prompt can be left empty
prompt = ""
prior_out = prior_pipeline.interpolate(images_texts, weights)

pipeline = KandinskyV22Pipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16, use_safetensors=True).to("cuda")

image = pipeline(prompt, **prior_out, height=768, width=768).images[0]
image

ControlNet

โš ๏ธ ControlNet์€ Kandinsky 2.2์—์„œ๋งŒ ์ง€์›๋ฉ๋‹ˆ๋‹ค!

ControlNet์„ ์‚ฌ์šฉํ•˜๋ฉด depth map์ด๋‚˜ edge detection์™€ ๊ฐ™์€ ์ถ”๊ฐ€ ์ž…๋ ฅ์„ ํ†ตํ•ด ์‚ฌ์ „ํ•™์Šต๋œ large diffusion ๋ชจ๋ธ์„ conditioningํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ๋ชจ๋ธ์ด depth map์˜ ๊ตฌ์กฐ๋ฅผ ์ดํ•ดํ•˜๊ณ  ๋ณด์กดํ•  ์ˆ˜ ์žˆ๋„๋ก ๊นŠ์ด ๋งต์œผ๋กœ Kandinsky 2.2๋ฅผ conditioningํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Let's load an image and extract its depth map:

from diffusers.utils import load_image

img = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinskyv22/cat.png"
).resize((768, 768))
img

Then you can use the depth-estimation Pipeline from 🤗 Transformers to process the image and retrieve the depth map:

import torch
import numpy as np
from transformers import pipeline

def make_hint(image, depth_estimator):
    image = depth_estimator(image)["depth"]
    image = np.array(image)
    image = image[:, :, None]
    image = np.concatenate([image, image, image], axis=2)
    detected_map = torch.from_numpy(image).float() / 255.0
    hint = detected_map.permute(2, 0, 1)
    return hint

depth_estimator = pipeline("depth-estimation")
hint = make_hint(img, depth_estimator).unsqueeze(0).half().to("cuda")
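The shape bookkeeping in make_hint — a single-channel depth map becoming a 3-channel, channels-first tensor — can be traced with a small NumPy stand-in (no depth model involved; the random array substitutes for the estimator's output):

```python
import numpy as np

# Fake single-channel depth map in place of the depth estimator's output
depth = np.random.rand(768, 768).astype(np.float32)  # (H, W)

# Add a channel axis and repeat it three times, as make_hint does
hint = np.concatenate([depth[:, :, None]] * 3, axis=2)  # (H, W, 3)

# Move channels first, matching detected_map.permute(2, 0, 1)
hint = hint.transpose(2, 0, 1)  # (3, H, W)
print(hint.shape)  # (3, 768, 768)
```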

Text-to-image [[controlnet-text-to-image]]

Load the prior pipeline and the KandinskyV22ControlnetPipeline:

from diffusers import KandinskyV22PriorPipeline, KandinskyV22ControlnetPipeline

prior_pipeline = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")
pipeline = KandinskyV22ControlnetPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-controlnet-depth", torch_dtype=torch.float16
).to("cuda")

Generate the image embeddings from a prompt and a negative prompt:

prompt = "A robot, 4k photo"
negative_prior_prompt = "lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, username, watermark, signature"

generator = torch.Generator(device="cuda").manual_seed(43)
image_emb, zero_image_emb = prior_pipeline(
    prompt=prompt, negative_prompt=negative_prior_prompt, generator=generator
).to_tuple()

Finally, pass the image embeddings and the depth image to the KandinskyV22ControlnetPipeline to generate an image:

image = pipeline(image_embeds=image_emb, negative_image_embeds=zero_image_emb, hint=hint, num_inference_steps=50, generator=generator, height=768, width=768).images[0]
image

Image-to-image [[controlnet-image-to-image]]

ControlNet์„ ์‚ฌ์šฉํ•œ image-to-image์˜ ๊ฒฝ์šฐ, ๋‹ค์Œ์„ ์‚ฌ์šฉํ•  ํ•„์š”๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค:

  • KandinskyV22PriorEmb2EmbPipeline๋กœ ํ…์ŠคํŠธ ํ”„๋กฌํ”„ํŠธ์™€ ์ด๋ฏธ์ง€์—์„œ ์ด๋ฏธ์ง€ ์ž„๋ฒ ๋”ฉ์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

  • KandinskyV22ControlnetImg2ImgPipeline๋กœ ์ดˆ๊ธฐ ์ด๋ฏธ์ง€์™€ ์ด๋ฏธ์ง€ ์ž„๋ฒ ๋”ฉ์—์„œ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

Process and extract a depth map of an initial image of a cat with the depth-estimation Pipeline from 🤗 Transformers:

import torch
import numpy as np
from diffusers import KandinskyV22PriorEmb2EmbPipeline, KandinskyV22ControlnetImg2ImgPipeline
from diffusers.utils import load_image
from transformers import pipeline

img = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinskyv22/cat.png"
).resize((768, 768))

def make_hint(image, depth_estimator):
    image = depth_estimator(image)["depth"]
    image = np.array(image)
    image = image[:, :, None]
    image = np.concatenate([image, image, image], axis=2)
    detected_map = torch.from_numpy(image).float() / 255.0
    hint = detected_map.permute(2, 0, 1)
    return hint

depth_estimator = pipeline("depth-estimation")
hint = make_hint(img, depth_estimator).unsqueeze(0).half().to("cuda")

Load the prior pipeline and the KandinskyV22ControlnetImg2ImgPipeline:

prior_pipeline = KandinskyV22PriorEmb2EmbPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")
pipeline = KandinskyV22ControlnetImg2ImgPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-controlnet-depth", torch_dtype=torch.float16
).to("cuda")

Pass a text prompt and the initial image to the prior pipeline to generate the image embeddings:

prompt = "A robot, 4k photo"
negative_prior_prompt = "lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, username, watermark, signature"

generator = torch.Generator(device="cuda").manual_seed(43)
img_emb = prior_pipeline(prompt=prompt, image=img, strength=0.85, generator=generator)
negative_emb = prior_pipeline(prompt=negative_prior_prompt, image=img, strength=1, generator=generator)

Now you can run the KandinskyV22ControlnetImg2ImgPipeline to generate an image from the initial image and the image embeddings:

from diffusers.utils import make_image_grid

image = pipeline(image=img, strength=0.5, image_embeds=img_emb.image_embeds, negative_image_embeds=negative_emb.image_embeds, hint=hint, num_inference_steps=50, generator=generator, height=768, width=768).images[0]
make_image_grid([img.resize((512, 512)), image.resize((512, 512))], rows=1, cols=2)

Optimizations

Kandinsky is unique because it requires a prior pipeline to generate the mappings, and a second pipeline to decode the latents into an image. Optimization efforts should focus on the second pipeline because that is where the bulk of the computation happens. Here are some tips to improve Kandinsky during inference.

  1. Enable xFormers if you're using PyTorch < 2.0:

  from diffusers import DiffusionPipeline
  import torch

  pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
+ pipe.enable_xformers_memory_efficient_attention()
  2. Enable torch.compile if you're using PyTorch >= 2.0 to automatically use scaled dot-product attention (SDPA):

  pipe.unet.to(memory_format=torch.channels_last)
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

This is the same as explicitly setting the attention processor to use AttnAddedKVProcessor2_0:

from diffusers.models.attention_processor import AttnAddedKVProcessor2_0

pipe.unet.set_attn_processor(AttnAddedKVProcessor2_0())
  3. Offload the model to the CPU with enable_model_cpu_offload() to avoid out-of-memory errors:

  from diffusers import DiffusionPipeline
  import torch

  pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
+ pipe.enable_model_cpu_offload()
  4. By default, the text-to-image pipeline uses the DDIMScheduler, but you can replace it with another scheduler like DDPMScheduler to see how that affects the tradeoff between inference speed and image quality:

from diffusers import DDPMScheduler
from diffusers import DiffusionPipeline

scheduler = DDPMScheduler.from_pretrained("kandinsky-community/kandinsky-2-1", subfolder="ddpm_scheduler")
pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", scheduler=scheduler, torch_dtype=torch.float16, use_safetensors=True).to("cuda")