[Docs] Advertise fp16 instead of autocast #740

Merged · 1 commit · Oct 5, 2022
README.md: 29 changes (25 additions, 4 deletions)
@@ -74,11 +74,14 @@ You need to accept the model license before downloading or using the Stable Diff

### Text-to-Image generation with Stable Diffusion

We recommend using the model in [half-precision (`fp16`)](https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/) as it almost always gives the same results as full
precision while being roughly twice as fast and requiring half the amount of GPU RAM.

```python
# make sure you're logged in with `huggingface-cli login`
from diffusers import StableDiffusionPipeline
import torch  # needed for `torch.float16` below (not shown in the original snippet)

-pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
+pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16, revision="fp16")
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
@@ -105,8 +108,8 @@ prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```

-If you are limited by GPU memory, you might want to consider using the model in `fp16` as
-well as chunking the attention computation.
+If you are limited by GPU memory, you might want to consider chunking the attention computation in addition
+to using `fp16`.
The following snippet should result in less than 4GB VRAM.

```python
@@ -122,7 +125,7 @@ pipe.enable_attention_slicing()
image = pipe(prompt).images[0]
```
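
Since the opening lines of this snippet are collapsed in the diff, here is a rough, self-contained sketch of the full low-memory setup; the model id and loading arguments are assumed from the `fp16` example earlier in this README rather than taken from the collapsed lines.

```python
# Sketch of the complete low-memory setup (assumed, not copied from the collapsed diff):
# fp16 weights plus attention slicing should keep peak VRAM under roughly 4GB on most GPUs.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Compute attention in slices instead of one large batch to lower peak memory.
pipe.enable_attention_slicing()

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
```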

-Finally, if you wish to use a different scheduler, you can simply instantiate
+If you wish to use a different scheduler, you can simply instantiate
it before the pipeline and pass it to `from_pretrained`.

```python
@@ -148,6 +151,24 @@ image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
```
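
The scheduler example itself is collapsed in this diff, so the following is only a sketch of what swapping in `LMSDiscreteScheduler` might look like; the `beta_*` values are illustrative assumptions, not values taken from the PR.

```python
from diffusers import LMSDiscreteScheduler, StableDiffusionPipeline

# Instantiate the scheduler first (argument values are assumed for illustration).
lms = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear")

# Then hand it to `from_pretrained`, which swaps it in for the default scheduler.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    scheduler=lms,
)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
```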

If you want to run Stable Diffusion on CPU or you want to have maximum precision on GPU,
please run the model in the default *full-precision* setting:

```python
# make sure you're logged in with `huggingface-cli login`
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

# disable the following line if you run on CPU
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

image.save("astronaut_rides_horse.png")
```

### Image-to-Image text-guided generation with Stable Diffusion

The `StableDiffusionImg2ImgPipeline` lets you pass a text prompt and an initial image to condition the generation of new images.
docs/source/api/pipelines/overview.mdx: 12 changes (3 additions, 9 deletions)
@@ -98,15 +98,13 @@ logic including pre-processing, an unrolled diffusion loop, and post-processing

```python
# make sure you're logged in with `huggingface-cli login`
-from torch import autocast
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
-with autocast("cuda"):
-    image = pipe(prompt).images[0]
+image = pipe(prompt).images[0]

image.save("astronaut_rides_horse.png")
```
@@ -116,7 +114,6 @@ image.save("astronaut_rides_horse.png")
The `StableDiffusionImg2ImgPipeline` lets you pass a text prompt and an initial image to condition the generation of new images.

```python
-from torch import autocast
import requests
from PIL import Image
from io import BytesIO
@@ -138,8 +135,7 @@ init_image = init_image.resize((768, 512))

prompt = "A fantasy landscape, trending on artstation"

-with autocast("cuda"):
-    images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5).images
+images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5).images

images[0].save("fantasy_landscape.png")
```
@@ -157,7 +153,6 @@ The `StableDiffusionInpaintPipeline` lets you edit specific parts of an image by
```python
from io import BytesIO

-from torch import autocast
import requests
import PIL

@@ -181,8 +176,7 @@ pipe = StableDiffusionInpaintPipeline.from_pretrained(
).to(device)

prompt = "a cat sitting on a bench"
-with autocast("cuda"):
-    images = pipe(prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75).images
+images = pipe(prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75).images

images[0].save("cat_on_bench.png")
```
docs/source/optimization/fp16.mdx: 9 changes (6 additions, 3 deletions)
@@ -68,14 +68,18 @@ Despite the precision loss, in our experience the final image results look the s

## Half precision weights

-To save more GPU memory, you can load the model weights directly in half precision. This involves loading the float16 version of the weights, which was saved to a branch named `fp16`, and telling PyTorch to use the `float16` type when loading them:
+To save more GPU memory and get even more speed, you can load and run the model weights directly in half precision. This involves loading the float16 version of the weights, which was saved to a branch named `fp16`, and telling PyTorch to use the `float16` type when loading them:

```Python
pipe = StableDiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
revision="fp16",
torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```

## Sliced attention for additional memory savings
@@ -101,8 +105,7 @@ pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
pipe.enable_attention_slicing()
-with torch.autocast("cuda"):
-    image = pipe(prompt).images[0]
+image = pipe(prompt).images[0]
```

There's a small performance penalty of about 10% slower inference times, but this method allows you to use Stable Diffusion in as little as 3.2 GB of VRAM!
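
If you want to check figures like this on your own hardware, one option (not part of this PR) is to query PyTorch's CUDA allocator statistics around a generation:

```python
# Rough way to measure peak VRAM for a single generation; assumes `pipe` and
# `prompt` from the snippet above and a CUDA device. Numbers vary by GPU,
# driver, and image resolution.
import torch

torch.cuda.reset_peak_memory_stats()
image = pipe(prompt).images[0]
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM during inference: {peak_gb:.2f} GB")
```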
docs/source/training/text_inversion.mdx: 4 changes (1 addition, 3 deletions)
@@ -109,16 +109,14 @@ A full training run takes ~1 hour on one V100 GPU.
Once you have trained a model using the above command, inference can be done simply with the `StableDiffusionPipeline`. Make sure to include the `placeholder_token` in your prompt.

```python
-from torch import autocast
from diffusers import StableDiffusionPipeline
import torch  # needed for `torch.float16` below (not shown in the original snippet)

model_id = "path-to-your-trained-model"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A <cat-toy> backpack"

-with autocast("cuda"):
-    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
+image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

image.save("cat-backpack.png")
```
docs/source/using-diffusers/img2img.mdx: 4 changes (1 addition, 3 deletions)
@@ -15,7 +15,6 @@ specific language governing permissions and limitations under the License.
The [`StableDiffusionImg2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images.

```python
-from torch import autocast
import requests
from PIL import Image
from io import BytesIO
@@ -37,8 +36,7 @@ init_image = init_image.resize((768, 512))

prompt = "A fantasy landscape, trending on artstation"

-with autocast("cuda"):
-    images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5).images
+images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5).images

images[0].save("fantasy_landscape.png")
```
docs/source/using-diffusers/inpaint.mdx: 4 changes (1 addition, 3 deletions)
@@ -17,7 +17,6 @@ The [`StableDiffusionInpaintPipeline`] lets you edit specific parts of an image
```python
from io import BytesIO

-from torch import autocast
import requests
import PIL

@@ -41,8 +40,7 @@ pipe = StableDiffusionInpaintPipeline.from_pretrained(
).to(device)

prompt = "a cat sitting on a bench"
-with autocast("cuda"):
-    images = pipe(prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75).images
+images = pipe(prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75).images

images[0].save("cat_on_bench.png")
```
examples/dreambooth/README.md: 6 changes (1 addition, 5 deletions)
@@ -125,18 +125,14 @@ accelerate launch train_dreambooth.py \
Once you have trained a model using the above command, inference can be done simply with the `StableDiffusionPipeline`. Make sure to include the `identifier` (e.g. `sks` in the above example) in your prompt.

```python

-from torch import autocast
from diffusers import StableDiffusionPipeline
import torch

model_id = "path-to-your-trained-model"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A photo of sks dog in a bucket"

-with autocast("cuda"):
-    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
+image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

image.save("dog-bucket.png")
```
examples/dreambooth/train_dreambooth.py: 5 changes (1 addition, 4 deletions)
@@ -1,7 +1,6 @@
import argparse
import math
import os
-from contextlib import nullcontext
from pathlib import Path
from typing import Optional

@@ -346,12 +345,10 @@ def main():
sample_dataloader = accelerator.prepare(sample_dataloader)
pipeline.to(accelerator.device)

-context = torch.autocast("cuda") if accelerator.device.type == "cuda" else nullcontext
for example in tqdm(
sample_dataloader, desc="Generating class images", disable=not accelerator.is_local_main_process
):
-with context:
-    images = pipeline(example["prompt"]).images
+images = pipeline(example["prompt"]).images

for i, image in enumerate(images):
image.save(class_images_dir / f"{example['index'][i] + cur_class_images}.jpg")
Expand Down
examples/textual_inversion/README.md: 5 changes (1 addition, 4 deletions)
@@ -74,17 +74,14 @@ A full training run takes ~1 hour on one V100 GPU.
Once you have trained a model using the above command, inference can be done simply with the `StableDiffusionPipeline`. Make sure to include the `placeholder_token` in your prompt.

```python

-from torch import autocast
from diffusers import StableDiffusionPipeline
import torch  # needed for `torch.float16` below (not shown in the original snippet)

model_id = "path-to-your-trained-model"
pipe = StableDiffusionPipeline.from_pretrained(model_id,torch_dtype=torch.float16).to("cuda")

prompt = "A <cat-toy> backpack"

-with autocast("cuda"):
-    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
+image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

image.save("cat-backpack.png")
```