### Describe the bug
There are a few Stable Diffusion 1.5 models that use a prediction type of `v_prediction` rather than `epsilon`. In version 0.27.0, `StableDiffusionPipeline.from_single_file()` correctly detected and rendered images from such models. However, in version 0.30.0, these models are always treated as `epsilon`, even when the correct `prediction_type` and `original_config` arguments are set.
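For context, here is a minimal sketch of why the two prediction types are not interchangeable. It assumes the usual variance-preserving parameterization, `x_t = alpha_t * x0 + sigma_t * eps` with `alpha_t**2 + sigma_t**2 == 1`, and the standard definition `v = alpha_t * eps - sigma_t * x0`. The function name and the scalar values are illustrative only and are not taken from diffusers.

```python
import math


def v_to_epsilon_and_x0(x_t, v, alpha_t, sigma_t):
    """Recover (eps, x0) from a v-prediction, assuming alpha_t**2 + sigma_t**2 == 1."""
    x0 = alpha_t * x_t - sigma_t * v
    eps = sigma_t * x_t + alpha_t * v
    return eps, x0


# Illustrative scalar example (hypothetical values, one "pixel" at one timestep).
alpha_t, sigma_t = math.cos(0.3), math.sin(0.3)
x0_true, eps_true = 0.8, -0.5

x_t = alpha_t * x0_true + sigma_t * eps_true   # noised sample
v = alpha_t * eps_true - sigma_t * x0_true     # what a v-prediction model outputs

eps, x0 = v_to_epsilon_and_x0(x_t, v, alpha_t, sigma_t)

# Converting v correctly recovers the noise; reading v as if it were eps does not.
print("recovered eps:", eps, "true eps:", eps_true)
print("raw v misread as eps:", v)
```

If the scheduler treats the model output `v` as `eps` directly, every denoising step uses the wrong noise estimate, which is consistent with the incorrect rendering reported here.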
### Reproduction
You will need to download the original config file, `EasyFluffV11.yaml`, into the current directory for this to work. After running, the file `sushi.png` will show incorrect rendering.
```python
from diffusers import StableDiffusionPipeline
import torch

model_id = 'https://huggingface.co/zatochu/EasyFluff/blob/main/EasyFluffV11.safetensors'
yaml_path = './EasyFluffV11.yaml'

pipe = StableDiffusionPipeline.from_single_file(
    model_id,
    original_config=yaml_path,
    prediction_type='v_prediction',
    torch_dtype=torch.float16,
).to("cuda")

prompt = "banana sushi"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("sushi.png")
```
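A possible way to check and work around this is to inspect the scheduler config after loading and, if needed, rebuild the scheduler with the prediction type set explicitly. The sketch below relies on `from_config()` accepting keyword overrides for config values. The helper name `force_prediction_type` is hypothetical, and I have not confirmed that this sidesteps the 0.30.0 behavior, so treat it as an assumption to test rather than a fix.

```python
def force_prediction_type(pipe, prediction_type="v_prediction"):
    """Rebuild pipe.scheduler from its own config with an explicit prediction type.

    Hypothetical helper: it only assumes that `pipe.scheduler` has a `config`
    and that its class provides `from_config(config, **overrides)`.
    """
    scheduler_cls = type(pipe.scheduler)
    pipe.scheduler = scheduler_cls.from_config(
        pipe.scheduler.config, prediction_type=prediction_type
    )
    return pipe


# Intended usage with the reproduction above (not run here):
#   print(pipe.scheduler.config.prediction_type)  # what the loader actually set
#   pipe = force_prediction_type(pipe)
#   image = pipe(prompt, num_inference_steps=25).images[0]
```

Printing `pipe.scheduler.config.prediction_type` before calling the helper should also show directly whether the `prediction_type` argument passed to `from_single_file()` reached the scheduler.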
### Logs
Fetching 11 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 7330.37it/s]
Loading pipeline components...: 0%| | 0/6 [00:00<?, ?it/s]Some weights of the model checkpoint were not used when initializing CLIPTextModel:
['text_model.embeddings.position_ids']
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 26.26it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:01<00:00, 16.72it/s]
### System Info
- 🤗 Diffusers version: 0.30.0
- Platform: Linux-5.15.0-113-generic-x86_64-with-glibc2.35
- Running on Google Colab?: No
- Python version: 3.10.12
- PyTorch version (GPU?): 2.2.2+cu121 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.23.5
- Transformers version: 4.41.1
- Accelerate version: 0.31.0
- PEFT version: 0.11.1
- Bitsandbytes version: not installed
- Safetensors version: 0.4.3
- xFormers version: 0.0.25.post1
- Accelerator: NVIDIA GeForce RTX 4070, 12282 MiB
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
### Who can help?
@yiyixuxu @asomoza