Commit 56613f9

patrickvonplaten authored and Prathik Rao committed
[Docs] Advertise fp16 instead of autocast (huggingface#740)
up
1 parent 6a28ab3 commit 56613f9

9 files changed, +40 −38 lines changed

README.md

Lines changed: 25 additions & 4 deletions

@@ -74,11 +74,14 @@ You need to accept the model license before downloading or using the Stable Diff
 
 ### Text-to-Image generation with Stable Diffusion
 
+We recommend using the model in [half-precision (`fp16`)](https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/) as it almost always gives the same results as full
+precision while being roughly twice as fast and requiring half the amount of GPU RAM.
+
 ```python
 # make sure you're logged in with `huggingface-cli login`
 from diffusers import StableDiffusionPipeline
 
-pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
+pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16, revision="fp16")
 pipe = pipe.to("cuda")
 
 prompt = "a photo of an astronaut riding a horse on mars"

@@ -105,8 +108,8 @@ prompt = "a photo of an astronaut riding a horse on mars"
 image = pipe(prompt).images[0]
 ```
 
-If you are limited by GPU memory, you might want to consider using the model in `fp16` as
-well as chunking the attention computation.
+If you are limited by GPU memory, you might want to consider chunking the attention computation in addition
+to using `fp16`.
 The following snippet should result in less than 4GB VRAM.
 
 ```python

@@ -122,7 +125,7 @@ pipe.enable_attention_slicing()
 image = pipe(prompt).images[0]
 ```
 
-Finally, if you wish to use a different scheduler, you can simply instantiate
+If you wish to use a different scheduler, you can simply instantiate
 it before the pipeline and pass it to `from_pretrained`.
 
 ```python

@@ -148,6 +151,24 @@ image = pipe(prompt).images[0]
 image.save("astronaut_rides_horse.png")
 ```
 
+If you want to run Stable Diffusion on CPU or you want to have maximum precision on GPU,
+please run the model in the default *full-precision* setting:
+
+```python
+# make sure you're logged in with `huggingface-cli login`
+from diffusers import StableDiffusionPipeline
+
+pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
+
+# disable the following line if you run on CPU
+pipe = pipe.to("cuda")
+
+prompt = "a photo of an astronaut riding a horse on mars"
+image = pipe(prompt).images[0]
+
+image.save("astronaut_rides_horse.png")
+```
+
 ### Image-to-Image text-guided generation with Stable Diffusion
 
 The `StableDiffusionImg2ImgPipeline` lets you pass a text prompt and an initial image to condition the generation of new images.
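
Note that the new fp16 line above references `torch.float16` while the visible hunk never imports `torch`. A self-contained sketch of the recommended fp16 usage (model ID and prompt taken from the diff above; the explicit import is the only addition):

```python
import torch  # needed for `torch.float16`, not shown in the hunk
from diffusers import StableDiffusionPipeline

# make sure you're logged in with `huggingface-cli login`
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",            # branch holding the half-precision weights
    torch_dtype=torch.float16,  # load the weights directly in fp16
)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
```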

docs/source/api/pipelines/overview.mdx

Lines changed: 3 additions & 9 deletions

@@ -98,15 +98,13 @@ logic including pre-processing, an unrolled diffusion loop, and post-processing
 
 ```python
 # make sure you're logged in with `huggingface-cli login`
-from torch import autocast
 from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler
 
 pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
 pipe = pipe.to("cuda")
 
 prompt = "a photo of an astronaut riding a horse on mars"
-with autocast("cuda"):
-    image = pipe(prompt).images[0]
+image = pipe(prompt).images[0]
 
 image.save("astronaut_rides_horse.png")
 ```

@@ -116,7 +114,6 @@ image.save("astronaut_rides_horse.png")
 The `StableDiffusionImg2ImgPipeline` lets you pass a text prompt and an initial image to condition the generation of new images.
 
 ```python
-from torch import autocast
 import requests
 from PIL import Image
 from io import BytesIO

@@ -138,8 +135,7 @@ init_image = init_image.resize((768, 512))
 
 prompt = "A fantasy landscape, trending on artstation"
 
-with autocast("cuda"):
-    images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5).images
+images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5).images
 
 images[0].save("fantasy_landscape.png")
 ```

@@ -157,7 +153,6 @@ The `StableDiffusionInpaintPipeline` lets you edit specific parts of an image by
 ```python
 from io import BytesIO
 
-from torch import autocast
 import requests
 import PIL
 

@@ -181,8 +176,7 @@ pipe = StableDiffusionInpaintPipeline.from_pretrained(
 ).to(device)
 
 prompt = "a cat sitting on a bench"
-with autocast("cuda"):
-    images = pipe(prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75).images
+images = pipe(prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75).images
 
 images[0].save("cat_on_bench.png")
 ```
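
The first hunk keeps the `LMSDiscreteScheduler` import even though the lines that use it are elided. Following the README's advice to instantiate a scheduler before the pipeline and pass it to `from_pretrained`, usage might look like this sketch (the scheduler parameter values are illustrative, not taken from the diff):

```python
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

# instantiate the scheduler first, then hand it to `from_pretrained`;
# these beta values are illustrative assumptions, not from the diff
lms = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
)

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    scheduler=lms,
)
pipe = pipe.to("cuda")

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```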

docs/source/optimization/fp16.mdx

Lines changed: 6 additions & 3 deletions

@@ -68,14 +68,18 @@ Despite the precision loss, in our experience the final image results look the s
 
 ## Half precision weights
 
-To save more GPU memory, you can load the model weights directly in half precision. This involves loading the float16 version of the weights, which was saved to a branch named `fp16`, and telling PyTorch to use the `float16` type when loading them:
+To save more GPU memory and get even more speed, you can load and run the model weights directly in half precision. This involves loading the float16 version of the weights, which was saved to a branch named `fp16`, and telling PyTorch to use the `float16` type when loading them:
 
 ```Python
 pipe = StableDiffusionPipeline.from_pretrained(
     "CompVis/stable-diffusion-v1-4",
     revision="fp16",
     torch_dtype=torch.float16,
 )
+pipe = pipe.to("cuda")
+
+prompt = "a photo of an astronaut riding a horse on mars"
+image = pipe(prompt).images[0]
 ```

@@ -101,8 +105,7 @@ pipe = pipe.to("cuda")
 
 prompt = "a photo of an astronaut riding a horse on mars"
 pipe.enable_attention_slicing()
-with torch.autocast("cuda"):
-    image = pipe(prompt).images[0]
+image = pipe(prompt).images[0]
 ```
 
 There's a small performance penalty of about 10% slower inference times, but this method allows you to use Stable Diffusion in as little as 3.2 GB of VRAM!
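
The 3.2 GB figure comes from combining the two techniques this file documents. A minimal sketch putting the half-precision weights and attention slicing together, with the `torch` import added explicitly:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# slice the attention computation: roughly 10% slower inference,
# but a much smaller peak memory footprint
pipe.enable_attention_slicing()

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```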

docs/source/training/text_inversion.mdx

Lines changed: 1 addition & 3 deletions

@@ -109,16 +109,14 @@ A full training run takes ~1 hour on one V100 GPU.
 Once you have trained a model using the above command, inference can be done simply using the `StableDiffusionPipeline`. Make sure to include the `placeholder_token` in your prompt.
 
 ```python
-from torch import autocast
 from diffusers import StableDiffusionPipeline
 
 model_id = "path-to-your-trained-model"
 pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
 
 prompt = "A <cat-toy> backpack"
 
-with autocast("cuda"):
-    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
+image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
 
 image.save("cat-backpack.png")
 ```
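
After this change the snippet still uses `torch.float16`, yet the removed `from torch import autocast` line was the only `torch` import visible in the hunk. A corrected sketch would add the import explicitly (the same caveat applies to `examples/textual_inversion/README.md` below):

```python
import torch  # needed for `torch.float16` now that the autocast import is gone
from diffusers import StableDiffusionPipeline

model_id = "path-to-your-trained-model"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A <cat-toy> backpack"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("cat-backpack.png")
```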

docs/source/using-diffusers/img2img.mdx

Lines changed: 1 addition & 3 deletions

@@ -15,7 +15,6 @@ specific language governing permissions and limitations under the License.
 The [`StableDiffusionImg2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images.
 
 ```python
-from torch import autocast
 import requests
 from PIL import Image
 from io import BytesIO

@@ -37,8 +36,7 @@ init_image = init_image.resize((768, 512))
 
 prompt = "A fantasy landscape, trending on artstation"
 
-with autocast("cuda"):
-    images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5).images
+images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5).images
 
 images[0].save("fantasy_landscape.png")
 ```
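
The pipeline construction is elided between the two hunks. An end-to-end sketch of the img2img flow without autocast, using the fp16 loading pattern this commit advertises (the image URL is a placeholder, not from the diff):

```python
from io import BytesIO

import requests
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda")

# placeholder URL -- substitute any RGB image to condition on
url = "https://example.com/sketch-mountains-input.jpg"
init_image = Image.open(BytesIO(requests.get(url).content)).convert("RGB")
init_image = init_image.resize((768, 512))

prompt = "A fantasy landscape, trending on artstation"
images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5).images
images[0].save("fantasy_landscape.png")
```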

docs/source/using-diffusers/inpaint.mdx

Lines changed: 1 addition & 3 deletions

@@ -17,7 +17,6 @@ The [`StableDiffusionInpaintPipeline`] lets you edit specific parts of an image
 ```python
 from io import BytesIO
 
-from torch import autocast
 import requests
 import PIL
 

@@ -41,8 +40,7 @@ pipe = StableDiffusionInpaintPipeline.from_pretrained(
 ).to(device)
 
 prompt = "a cat sitting on a bench"
-with autocast("cuda"):
-    images = pipe(prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75).images
+images = pipe(prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75).images
 
 images[0].save("cat_on_bench.png")
 ```
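
As above, the loading code sits between the hunks. A sketch of the inpainting flow without autocast (the URLs are placeholders, not from the diff; the mask is expected to be white where the image should be repainted):

```python
from io import BytesIO

import requests
import PIL
import torch
from diffusers import StableDiffusionInpaintPipeline


def download_image(url):
    # helper for illustration; returns an RGB PIL image
    return PIL.Image.open(BytesIO(requests.get(url).content)).convert("RGB")


# placeholder URLs -- substitute your own image and mask
init_image = download_image("https://example.com/overture-creations.png").resize((512, 512))
mask_image = download_image("https://example.com/overture-creations-mask.png").resize((512, 512))

device = "cuda"
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
).to(device)

prompt = "a cat sitting on a bench"
images = pipe(prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75).images
images[0].save("cat_on_bench.png")
```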

examples/dreambooth/README.md

Lines changed: 1 addition & 5 deletions

@@ -125,18 +125,14 @@ accelerate launch train_dreambooth.py \
 Once you have trained a model using the above command, inference can be done simply using the `StableDiffusionPipeline`. Make sure to include the `identifier` (e.g. sks in the above example) in your prompt.
 
 ```python
-
-from torch import autocast
 from diffusers import StableDiffusionPipeline
 import torch
 
 model_id = "path-to-your-trained-model"
 pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
 
 prompt = "A photo of sks dog in a bucket"
-
-with autocast("cuda"):
-    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
+image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
 
 image.save("dog-bucket.png")
 ```

examples/dreambooth/train_dreambooth.py

Lines changed: 1 addition & 4 deletions

@@ -1,7 +1,6 @@
 import argparse
 import math
 import os
-from contextlib import nullcontext
 from pathlib import Path
 from typing import Optional
 

@@ -346,12 +345,10 @@ def main():
         sample_dataloader = accelerator.prepare(sample_dataloader)
         pipeline.to(accelerator.device)
 
-        context = torch.autocast("cuda") if accelerator.device.type == "cuda" else nullcontext
         for example in tqdm(
             sample_dataloader, desc="Generating class images", disable=not accelerator.is_local_main_process
         ):
-            with context:
-                images = pipeline(example["prompt"]).images
+            images = pipeline(example["prompt"]).images
 
             for i, image in enumerate(images):
                 image.save(class_images_dir / f"{example['index'][i] + cur_class_images}.jpg")
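
With the `autocast` context removed, class-image generation now runs in whatever precision the pipeline object itself carries. A hedged sketch of keeping the fp16 benefit by choosing precision at construction time (the model ID below is an illustrative stand-in; the script takes its model via `--pretrained_model_name_or_path`):

```python
import torch
from diffusers import StableDiffusionPipeline

# precision is decided when the pipeline is built, not by a runtime context
pipeline = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # illustrative stand-in for the script's CLI argument
    torch_dtype=torch.float16,
)
pipeline.to("cuda")
```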

examples/textual_inversion/README.md

Lines changed: 1 addition & 4 deletions

@@ -74,17 +74,14 @@ A full training run takes ~1 hour on one V100 GPU.
 Once you have trained a model using the above command, inference can be done simply using the `StableDiffusionPipeline`. Make sure to include the `placeholder_token` in your prompt.
 
 ```python
-
-from torch import autocast
 from diffusers import StableDiffusionPipeline
 
 model_id = "path-to-your-trained-model"
 pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
 
 prompt = "A <cat-toy> backpack"
 
-with autocast("cuda"):
-    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
+image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
 
 image.save("cat-backpack.png")
 ```
