sync head #5


Merged (22 commits, Apr 3, 2025)

Commits
2c59af7
Raise warning and round down if Wan num_frames is not 4k + 1 (#11167)
a-r-r-o-w Mar 31, 2025
eb50def
[Docs] Fix environment variables in `installation.md` (#11179)
remarkablemark Mar 31, 2025
d6f4774
Add `latents_mean` and `latents_std` to `SDXLLongPromptWeightingPipel…
hlky Mar 31, 2025
e8fc8b1
Bug fix in LTXImageToVideoPipeline.prepare_latents() when latents is …
kakukakujirori Mar 31, 2025
5a6edac
[tests] no hard-coded cuda (#11186)
faaany Apr 1, 2025
df1d7b0
[WIP] Add Wan Video2Video (#11053)
DN6 Apr 1, 2025
a7f07c1
map BACKEND_RESET_MAX_MEMORY_ALLOCATED to reset_peak_memory_stats on …
yao-matrix Apr 2, 2025
4d5a96e
fix autocast (#11190)
jiqing-feng Apr 2, 2025
be0b7f5
fix: for checking mandatory and optional pipeline components (#11189)
elismasilva Apr 2, 2025
fe2b397
remove unnecessary call to `F.pad` (#10620)
bm-synth Apr 2, 2025
d8c617c
allow models to run with a user-provided dtype map instead of a singl…
hlky Apr 2, 2025
52b460f
[tests] HunyuanDiTControlNetPipeline inference precision issue on XPU…
faaany Apr 2, 2025
da857be
Revert `save_model` in ModelMixin save_pretrained and use safe_serial…
hlky Apr 2, 2025
e5c6027
[docs] `torch_dtype` map (#11194)
hlky Apr 2, 2025
54dac3a
Fix enable_sequential_cpu_offload in CogView4Pipeline (#11195)
hlky Apr 2, 2025
78c2fdc
SchedulerMixin from_pretrained and ConfigMixin Self type annotation (…
hlky Apr 2, 2025
b0ff822
Update import_utils.py (#10329)
Lakshaysharma048 Apr 2, 2025
c97b709
Add CacheMixin to Wan and LTX Transformers (#11187)
DN6 Apr 2, 2025
c4646a3
feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline (…
elismasilva Apr 2, 2025
d9023a6
[Model Card] standardize advanced diffusion training sdxl lora (#7615)
chiral-carbon Apr 3, 2025
480510a
Change KolorsPipeline LoRA Loader to StableDiffusion (#11198)
BasileLewan Apr 3, 2025
6edb774
Update Style Bot workflow (#11202)
hanouticelina Apr 3, 2025
34 changes: 0 additions & 34 deletions .github/workflows/pr_style_bot.yml
@@ -13,39 +13,5 @@ jobs:
uses: huggingface/huggingface_hub/.github/workflows/style-bot-action.yml@main
with:
python_quality_dependencies: "[quality]"
pre_commit_script_name: "Download and Compare files from the main branch"
pre_commit_script: |
echo "Downloading the files from the main branch"

curl -o main_Makefile https://rg.gosu.cc/huggingface/diffusers/main/Makefile
curl -o main_setup.py https://rg.gosu.cc/huggingface/diffusers/refs/heads/main/setup.py
curl -o main_check_doc_toc.py https://rg.gosu.cc/huggingface/diffusers/refs/heads/main/utils/check_doc_toc.py

echo "Compare the files and raise error if needed"

diff_failed=0
if ! diff -q main_Makefile Makefile; then
echo "Error: The Makefile has changed. Please ensure it matches the main branch."
diff_failed=1
fi

if ! diff -q main_setup.py setup.py; then
echo "Error: The setup.py has changed. Please ensure it matches the main branch."
diff_failed=1
fi

if ! diff -q main_check_doc_toc.py utils/check_doc_toc.py; then
echo "Error: The utils/check_doc_toc.py has changed. Please ensure it matches the main branch."
diff_failed=1
fi

if [ $diff_failed -eq 1 ]; then
echo "❌ Error happened as we detected changes in the files that should not be changed ❌"
exit 1
fi

echo "No changes in the files. Proceeding..."
rm -rf main_Makefile main_setup.py main_check_doc_toc.py
style_command: "make style && make quality"
secrets:
bot_token: ${{ secrets.GITHUB_TOKEN }}
48 changes: 44 additions & 4 deletions docs/source/en/api/pipelines/wan.md
@@ -133,6 +133,46 @@ output = pipe(
export_to_video(output, "wan-i2v.mp4", fps=16)
```

### Video to Video Generation

```python
import torch
from diffusers.utils import load_video, export_to_video
from diffusers import AutoencoderKLWan, WanVideoToVideoPipeline, UniPCMultistepScheduler

# Available models: Wan-AI/Wan2.1-T2V-14B-Diffusers, Wan-AI/Wan2.1-T2V-1.3B-Diffusers
model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(
model_id, subfolder="vae", torch_dtype=torch.float32
)
pipe = WanVideoToVideoPipeline.from_pretrained(
model_id, vae=vae, torch_dtype=torch.bfloat16
)
flow_shift = 3.0 # 5.0 for 720P, 3.0 for 480P
pipe.scheduler = UniPCMultistepScheduler.from_config(
pipe.scheduler.config, flow_shift=flow_shift
)
# change to pipe.to("cuda") if you have sufficient VRAM
pipe.enable_model_cpu_offload()

prompt = "A robot standing on a mountain top. The sun is setting in the background"
negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"
video = load_video(
"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/hiker.mp4"
)
output = pipe(
video=video,
prompt=prompt,
negative_prompt=negative_prompt,
height=480,
width=512,
guidance_scale=7.0,
strength=0.7,
).frames[0]

export_to_video(output, "wan-v2v.mp4", fps=16)
```

## Memory Optimizations for Wan 2.1

Base inference with the large 14B Wan 2.1 models can take up to 35GB of VRAM when generating videos at 720p resolution. We'll outline a few memory optimizations we can apply to reduce the VRAM required to run the model.
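For example, a minimal sketch, assuming the 1.3B text-to-video checkpoint from the examples above, that combines model CPU offloading with VAE tiling to lower peak VRAM:

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline

# Minimal sketch: the 1.3B checkpoint is used for illustration; the savings
# matter most for the 14B models.
model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)

# Keep only the active submodule on the GPU; everything else stays on the CPU.
pipe.enable_model_cpu_offload()

# Decode latents tile by tile to avoid a large memory peak in the VAE.
pipe.vae.enable_tiling()
```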
@@ -323,7 +363,7 @@ import numpy as np
from diffusers import AutoencoderKLWan, WanTransformer3DModel, WanImageToVideoPipeline
from diffusers.hooks.group_offloading import apply_group_offloading
from diffusers.utils import export_to_video, load_image
from transformers import UMT5EncoderModel, CLIPVisionMode
from transformers import UMT5EncoderModel, CLIPVisionModel

model_id = "Wan-AI/Wan2.1-I2V-14B-720P-Diffusers"
image_encoder = CLIPVisionModel.from_pretrained(
Expand Down Expand Up @@ -356,7 +396,7 @@ prompt = (
"An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in "
"the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
)
negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards
negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"
num_frames = 33

output = pipe(
@@ -372,7 +412,7 @@ output = pipe(
export_to_video(output, "wan-i2v.mp4", fps=16)
```

### Using a Custom Scheduler
## Using a Custom Scheduler

Wan can be used with many different schedulers, each with their own benefits regarding speed and generation quality. By default, Wan uses the `UniPCMultistepScheduler(prediction_type="flow_prediction", use_flow_sigmas=True, flow_shift=3.0)` scheduler. You can use a different scheduler as follows:
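A minimal sketch of the swap, assuming the 1.3B text-to-video checkpoint; both alternatives shown (a different `flow_shift` for UniPC, or a flow-match Euler scheduler) are illustrative choices rather than recommendations:

```python
import torch
from diffusers import FlowMatchEulerDiscreteScheduler, UniPCMultistepScheduler, WanPipeline

pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16)

# Keep UniPC but change the flow shift (e.g. a higher value for 720p generation).
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=5.0)

# Or switch to a flow-match Euler scheduler instead.
pipe.scheduler = FlowMatchEulerDiscreteScheduler.from_config(pipe.scheduler.config, shift=5.0)
```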

Expand Down Expand Up @@ -403,7 +443,7 @@ transformer = WanTransformer3DModel.from_single_file(ckpt_path, torch_dtype=torc
pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", transformer=transformer)
```

## Recommendations for Inference:
## Recommendations for Inference
- Keep `AutoencoderKLWan` in `torch.float32` for better decoding quality.
- `num_frames` should satisfy the following constraint: `(num_frames - 1) % 4 == 0`; the pipeline now warns and rounds down if it does not (see the sketch after this list).
- For smaller resolution videos, try lower values of `shift` (between `2.0` to `5.0`) in the [Scheduler](https://huggingface.co/docs/diffusers/main/en/api/schedulers/flow_match_euler_discrete#diffusers.FlowMatchEulerDiscreteScheduler.shift). For larger resolution videos, try higher values (between `7.0` and `12.0`). The default value is `3.0` for Wan.
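A minimal sketch of the `num_frames` rule on the caller's side (the helper name is hypothetical; the pipeline itself now warns and rounds down, per the first commit in this PR):

```python
def normalize_num_frames(num_frames: int) -> int:
    """Round num_frames down to the nearest value satisfying (num_frames - 1) % 4 == 0."""
    if (num_frames - 1) % 4 != 0:
        num_frames = (num_frames - 1) // 4 * 4 + 1
    return num_frames

assert normalize_num_frames(81) == 81  # already of the form 4k + 1
assert normalize_num_frames(84) == 81  # rounded down (the pipeline would emit a warning here)
```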
12 changes: 7 additions & 5 deletions docs/source/en/installation.md
@@ -161,10 +161,10 @@ Your Python environment will find the `main` version of 🤗 Diffusers on the ne

Model weights and files are downloaded from the Hub to a cache which is usually your home directory. You can change the cache location by specifying the `HF_HOME` or `HUGGINGFACE_HUB_CACHE` environment variables or configuring the `cache_dir` parameter in methods like [`~DiffusionPipeline.from_pretrained`].

Cached files allow you to run 🤗 Diffusers offline. To prevent 🤗 Diffusers from connecting to the internet, set the `HF_HUB_OFFLINE` environment variable to `True` and 🤗 Diffusers will only load previously downloaded files in the cache.
Cached files allow you to run 🤗 Diffusers offline. To prevent 🤗 Diffusers from connecting to the internet, set the `HF_HUB_OFFLINE` environment variable to `1` and 🤗 Diffusers will only load previously downloaded files in the cache.

```shell
export HF_HUB_OFFLINE=True
export HF_HUB_OFFLINE=1
```

For more details about managing and cleaning the cache, take a look at the [caching](https://huggingface.co/docs/huggingface_hub/guides/manage-cache) guide.
@@ -179,14 +179,16 @@ Telemetry is only sent when loading models and pipelines from the Hub,
and it is not collected if you're loading local files.

We understand that not everyone wants to share additional information, and we respect your privacy.
You can disable telemetry collection by setting the `DISABLE_TELEMETRY` environment variable from your terminal:
You can disable telemetry collection by setting the `HF_HUB_DISABLE_TELEMETRY` environment variable from your terminal:

On Linux/MacOS:

```bash
export DISABLE_TELEMETRY=YES
export HF_HUB_DISABLE_TELEMETRY=1
```

On Windows:

```bash
set DISABLE_TELEMETRY=YES
set HF_HUB_DISABLE_TELEMETRY=1
```
17 changes: 17 additions & 0 deletions docs/source/en/using-diffusers/loading.md
@@ -95,6 +95,23 @@ Use the Space below to gauge a pipeline's memory requirements before you downloa
></iframe>
</div>

### Specifying Component-Specific Data Types

You can customize the data types of individual sub-models by passing a dictionary to the `torch_dtype` parameter. This lets you load different components of a pipeline in different floating-point precisions. For instance, to load the transformer in `torch.bfloat16` and all other components in `torch.float16`, pass a dictionary mapping component names to data types:

```python
from diffusers import HunyuanVideoPipeline
import torch

pipe = HunyuanVideoPipeline.from_pretrained(
"hunyuanvideo-community/HunyuanVideo",
torch_dtype={'transformer': torch.bfloat16, 'default': torch.float16},
)
print(pipe.transformer.dtype, pipe.vae.dtype)  # torch.bfloat16 torch.float16
```

If a component is not explicitly specified in the dictionary and no `default` is provided, it will be loaded with `torch.float32`.
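A minimal sketch of that fallback behavior, reusing the checkpoint from the snippet above:

```python
from diffusers import HunyuanVideoPipeline
import torch

# No 'default' entry: only the transformer dtype is pinned, so the remaining
# components (VAE, text encoders, ...) fall back to torch.float32.
pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    torch_dtype={"transformer": torch.bfloat16},
)
print(pipe.transformer.dtype, pipe.vae.dtype)  # torch.bfloat16 torch.float32
```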

### Local pipeline

To load a pipeline locally, use [git-lfs](https://git-lfs.github.com/) to manually download a checkpoint to your local disk.
examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py
Expand Up @@ -71,6 +71,7 @@
convert_unet_state_dict_to_peft,
is_wandb_available,
)
from diffusers.utils.hub_utils import load_or_create_model_card, populate_model_card
from diffusers.utils.import_utils import is_xformers_available
from diffusers.utils.torch_utils import is_compiled_module

@@ -101,7 +102,7 @@ def determine_scheduler_type(pretrained_model_name_or_path, revision):
def save_model_card(
repo_id: str,
use_dora: bool,
images=None,
images: list = None,
base_model: str = None,
train_text_encoder=False,
train_text_encoder_ti=False,
@@ -111,20 +112,17 @@ def save_model_card(
repo_folder=None,
vae_path=None,
):
img_str = "widget:\n"
lora = "lora" if not use_dora else "dora"
for i, image in enumerate(images):
image.save(os.path.join(repo_folder, f"image_{i}.png"))
img_str += f"""
- text: '{validation_prompt if validation_prompt else ' ' }'
output:
url:
"image_{i}.png"
"""
if not images:
img_str += f"""
- text: '{instance_prompt}'
"""

widget_dict = []
if images is not None:
for i, image in enumerate(images):
image.save(os.path.join(repo_folder, f"image_{i}.png"))
widget_dict.append(
{"text": validation_prompt if validation_prompt else " ", "output": {"url": f"image_{i}.png"}}
)
else:
widget_dict.append({"text": instance_prompt})
embeddings_filename = f"{repo_folder}_emb"
instance_prompt_webui = re.sub(r"<s\d+>", "", re.sub(r"<s\d+>", embeddings_filename, instance_prompt, count=1))
ti_keys = ", ".join(f'"{match}"' for match in re.findall(r"<s\d+>", instance_prompt))
@@ -169,23 +167,7 @@ def save_model_card(
to trigger concept `{key}` → use `{tokens}` in your prompt \n
"""

yaml = f"""---
tags:
- stable-diffusion-xl
- stable-diffusion-xl-diffusers
- diffusers-training
- text-to-image
- diffusers
- {lora}
- template:sd-lora
{img_str}
base_model: {base_model}
instance_prompt: {instance_prompt}
license: openrail++
---
"""

model_card = f"""
model_description = f"""
# SDXL LoRA DreamBooth - {repo_id}

<Gallery />
@@ -234,8 +216,25 @@ def save_model_card(

{license}
"""
with open(os.path.join(repo_folder, "README.md"), "w") as f:
f.write(yaml + model_card)
model_card = load_or_create_model_card(
repo_id_or_path=repo_id,
from_training=True,
license="openrail++",
base_model=base_model,
prompt=instance_prompt,
model_description=model_description,
widget=widget_dict,
)
tags = [
"text-to-image",
"stable-diffusion-xl",
"stable-diffusion-xl-diffusers",
"text-to-image",
"diffusers",
lora,
"template:sd-lora",
]
model_card = populate_model_card(model_card, tags=tags)


def log_validation(
102 changes: 101 additions & 1 deletion examples/community/README.md
@@ -85,7 +85,7 @@ PIXART-α Controlnet pipeline | Implementation of the controlnet model for pixar
| Stable Diffusion XL Attentive Eraser Pipeline |[[AAAI2025 Oral] Attentive Eraser](https://github.com/Anonym0u3/AttentiveEraser) is a novel tuning-free method that enhances object removal capabilities in pre-trained diffusion models.|[Stable Diffusion XL Attentive Eraser Pipeline](#stable-diffusion-xl-attentive-eraser-pipeline)|-|[Wenhao Sun](https://github.com/Anonym0u3) and [Benlei Cui](https://github.com/Benny079)|
| Perturbed-Attention Guidance |StableDiffusionPAGPipeline is a modification of StableDiffusionPipeline to support Perturbed-Attention Guidance (PAG).|[Perturbed-Attention Guidance](#perturbed-attention-guidance)|[Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/perturbed_attention_guidance.ipynb)|[Hyoungwon Cho](https://github.com/HyoungwonCho)|
| CogVideoX DDIM Inversion Pipeline | Implementation of DDIM inversion and guided attention-based editing denoising process on CogVideoX. | [CogVideoX DDIM Inversion Pipeline](#cogvideox-ddim-inversion-pipeline) | - | [LittleNyima](https://github.com/LittleNyima) |

| FaithDiff Stable Diffusion XL Pipeline | Implementation of [(CVPR 2025) FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution](https://arxiv.org/abs/2411.18824) - FaithDiff is a faithful image super-resolution method that leverages latent diffusion models by actively adapting the diffusion prior and jointly fine-tuning its components (encoder and diffusion model) with an alignment module to ensure high fidelity and structural consistency. | [FaithDiff Stable Diffusion XL Pipeline](#faithdiff-stable-diffusion-xl-pipeline) | [![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue)](https://huggingface.co/jychen9811/FaithDiff) | [Junyang Chen, Jinshan Pan, Jiangxin Dong, IMAG Lab, (Adapted by Eliseu Silva)](https://github.com/JyChen9811/FaithDiff) |
To load a custom pipeline you just need to pass the `custom_pipeline` argument to `DiffusionPipeline`, as one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines; we will merge them quickly.

```py
@@ -5334,3 +5334,103 @@ output = pipeline_for_inversion(
pipeline.export_latents_to_video(output.inverse_latents[-1], "path/to/inverse_video.mp4", fps=8)
pipeline.export_latents_to_video(output.recon_latents[-1], "path/to/recon_video.mp4", fps=8)
```
# FaithDiff Stable Diffusion XL Pipeline

[Project](https://jychen9811.github.io/FaithDiff_page/) / [GitHub](https://github.com/JyChen9811/FaithDiff/)

This is the implementation of the FaithDiff pipeline for SDXL, adapted to use HuggingFace Diffusers.

For more details see the project links above.

## Example Usage

This example upscales and restores a low-quality image. The input image has a resolution of 512x512 and is upscaled 2x, to a final resolution of 1024x1024. Larger scale factors are possible, but in those cases the input image should ideally be at least 1024x1024. To upscale this image by 4x, for example, it is recommended to feed the 2x result back through another 2x pass, performing progressive scaling (see the sketch after the example below).

````py
import random
import numpy as np
import torch
from diffusers import DiffusionPipeline, AutoencoderKL, UniPCMultistepScheduler
from huggingface_hub import hf_hub_download
from diffusers.utils import load_image
from PIL import Image

device = "cuda"
dtype = torch.float16
MAX_SEED = np.iinfo(np.int32).max

# Download weights for additional unet layers
model_file = hf_hub_download(
"jychen9811/FaithDiff",
filename="FaithDiff.bin", local_dir="./proc_data/faithdiff", local_dir_use_symlinks=False
)

# Initialize the models and pipeline
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=dtype)

model_id = "SG161222/RealVisXL_V4.0"
pipe = DiffusionPipeline.from_pretrained(
model_id,
torch_dtype=dtype,
vae=vae,
unet=None, # <- do not load the original UNet here; it is loaded below via the pipeline's internal UNet class
custom_pipeline="pipeline_faithdiff_stable_diffusion_xl",
use_safetensors=True,
variant="fp16",
).to(device)

# Here we need to use the pipeline's internal UNet model
pipe.unet = pipe.unet_model.from_pretrained(model_id, subfolder="unet", variant="fp16", use_safetensors=True)

# Load additional layers into the model
pipe.unet.load_additional_layers(weight_path="proc_data/faithdiff/FaithDiff.bin", dtype=dtype)

# Enable vae tiling
pipe.set_encoder_tile_settings()
pipe.enable_vae_tiling()

# Optimization
pipe.enable_model_cpu_offload()

# Set selected scheduler
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# Input parameters
prompt = "The image features a woman in her 55s with blonde hair and a white shirt, smiling at the camera. She appears to be in a good mood and is wearing a white scarf around her neck. "
upscale = 2 # scale here
start_point = "lr" # or "noise"
latent_tiled_overlap = 0.5
latent_tiled_size = 1024

# Load image
lq_image = load_image("https://huggingface.co/datasets/DEVAIEXP/assets/resolve/main/woman.png")
original_height = lq_image.height
original_width = lq_image.width
print(f"Current resolution: H:{original_height} x W:{original_width}")

width = original_width * int(upscale)
height = original_height * int(upscale)
print(f"Final resolution: H:{height} x W:{width}")

# Restoration
image = lq_image.resize((width, height), Image.LANCZOS)
input_image, width_init, height_init, width_now, height_now = pipe.check_image_size(image)

generator = torch.Generator(device=device).manual_seed(random.randint(0, MAX_SEED))
gen_image = pipe(lr_img=input_image,
prompt = prompt,
num_inference_steps=20,
guidance_scale=5,
generator=generator,
start_point=start_point,
height = height_now,
width=width_now,
overlap=latent_tiled_overlap,
target_size=(latent_tiled_size, latent_tiled_size)
).images[0]

cropped_image = gen_image.crop((0, 0, width_init, height_init))
cropped_image.save("data/result.png")
````
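As noted above, a 4x result can be produced progressively by running a second 2x pass on the output of the first one. A minimal sketch continuing from the variables in the example above (the output path is illustrative):

```py
# Progressive 4x: feed the 1024x1024 result of the first pass through a second 2x pass.
image_2x = cropped_image
image = image_2x.resize((image_2x.width * 2, image_2x.height * 2), Image.LANCZOS)
input_image, width_init, height_init, width_now, height_now = pipe.check_image_size(image)

gen_image = pipe(
    lr_img=input_image,
    prompt=prompt,
    num_inference_steps=20,
    guidance_scale=5,
    generator=generator,
    start_point=start_point,
    height=height_now,
    width=width_now,
    overlap=latent_tiled_overlap,
    target_size=(latent_tiled_size, latent_tiled_size),
).images[0]

gen_image.crop((0, 0, width_init, height_init)).save("data/result_4x.png")
```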
### Result
[<img src="https://huggingface.co/datasets/DEVAIEXP/assets/resolve/main/faithdiff_restored.PNG" width="512px" height="512px"/>](https://imgsli.com/MzY1NzE2)