
[DC-AE] Add the official Deep Compression Autoencoder code (32x, 64x, 128x compression ratio) #9708


Merged: 101 commits into huggingface:main on Dec 6, 2024

Conversation

@lawrence-cj (Contributor) commented Oct 18, 2024

What does this PR do?

This PR adds the official DC-AE (Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models) to the diffusers library. DC-AE is the first autoencoder able to compress images into 32x, 64x, and 128x latent spaces without performance degradation. It is also the autoencoder used by the powerful T2I base model SANA.

Paper: https://arxiv.org/abs/2410.10733v1
Original code repo: https://github.com/mit-han-lab/efficientvit/tree/master/applications/dc_ae
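
For context, here is a minimal sketch of what end-user usage could look like once this lands in diffusers; the repo id, dtype, and tensor shapes below are assumptions for illustration:

```python
import torch
from diffusers import AutoencoderDC

# Load a DC-AE checkpoint (f32 = 32x spatial compression, c32 = 32 latent
# channels). The repo id is an assumption based on the checkpoints published
# alongside this PR.
ae = AutoencoderDC.from_pretrained(
    "mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers", torch_dtype=torch.float32
).to("cuda")

# A dummy 512x512 RGB image scaled to [-1, 1].
x = torch.rand(1, 3, 512, 512, device="cuda") * 2 - 1

with torch.no_grad():
    latent = ae.encode(x).latent      # (1, 32, 16, 16): 32x smaller spatially
    recon = ae.decode(latent).sample  # (1, 3, 512, 512)
```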

Core contributor of DC-AE:
Working with @chenjy2003

Core library:

We want to collaborate on this PR with friends from HF. Feel free to contact me here. Cc: @sayakpaul

@sayakpaul sayakpaul marked this pull request as draft October 18, 2024 10:27
@Abhinay1997 (Contributor)

Looking forward to this, @lawrence-cj!

@sayakpaul (Member) left a comment

Some minor comments. But to make progress on this PR:

@sayakpaul (Member) left a comment

Left two comments.

@sayakpaul (Member) left a comment

Thanks for your work.

I have left some comments, but let's wait for @yiyixuxu's comments as well before making any changes.

Yiyi, this autoencoder is going to be crucial to support efficient models like SANA: https://arxiv.org/abs/2410.10629 (which will land after this PR).

@yiyixuxu (Collaborator) left a comment

Thanks for the PR!
I left some comments to start with; let me know if you have any questions :)

@chenjy2003 (Contributor)

@lawrence-cj @a-r-r-o-w @yiyixuxu @sayakpaul I have double-checked this PR and made some minor modifications, and I will upload the converted weights soon. Could you please check whether the modifications still meet the requirements of diffusers? If so, I think this PR is ready to merge. Thank you all for your efforts!

@a-r-r-o-w (Member)

@chenjy2003 Thanks, the changes look great and the outputs are still the same! I simplified those branches since none of the current checkpoints seemed to use them, but they are still good to have. Will merge this PR once you give us the go-ahead regarding the diffusers-format checkpoints.

@chenjy2003 (Contributor)

@@ -92,6 +97,7 @@
"double_blocks.0.img_attn.norm.key_norm.scale",
"model.diffusion_model.double_blocks.0.img_attn.norm.key_norm.scale",
],
"autoencoder_dc": "decoder.stages.0.op_list.0.main.conv.conv.weight",
Collaborator:

We would need to infer the model repo type using this key, right? That still has to be added.

Member:

Oh sorry, missed it. Adding now, but not sure how this worked before then 🤔
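
For readers following along: single-file loading infers the model type by probing the raw state dict for a key unique to each architecture, which is what the autoencoder_dc entry above is for. A minimal sketch of that pattern, with hypothetical helper names rather than the exact ones in diffusers:

```python
# Illustrative sketch of key-based model-type detection; the real logic in
# diffusers' single-file utilities maps many more architectures.
CHECKPOINT_KEY_NAMES = {
    "autoencoder_dc": "decoder.stages.0.op_list.0.main.conv.conv.weight",
}

def infer_model_type(checkpoint: dict) -> str:
    """Return the model type whose signature key appears in the state dict."""
    for model_type, key in CHECKPOINT_KEY_NAMES.items():
        # A key that exists only in one architecture identifies the checkpoint.
        if key in checkpoint:
            return model_type
    raise ValueError("Could not infer model type from checkpoint keys.")
```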

@@ -2198,3 +2204,250 @@ def swap_scale_shift(weight):
)

return converted_state_dict


def create_autoencoder_dc_config_from_original(original_config, checkpoint, **kwargs):
Collaborator:

I think for new single-file models, let's not rely on the original configs anymore. That was legacy support for the SD1.5/XL models with YAML configs. It's better to infer the diffusers config from the checkpoint and use that for loading.
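
To illustrate the suggestion, a hypothetical sketch of inferring a diffusers config directly from checkpoint tensor shapes; the key paths and config attributes below are invented for illustration, not the actual DC-AE conversion logic:

```python
# Hypothetical shape-based config inference; a real mapping would need to
# cover every published DC-AE variant.
def infer_autoencoder_dc_config(checkpoint: dict) -> dict:
    config = {}
    # A conv weight has shape (out_channels, in_channels, kH, kW), so channel
    # counts can be read directly off the tensors.
    conv_in = checkpoint["encoder.conv_in.conv.weight"]
    config["in_channels"] = conv_in.shape[1]
    # The decoder's first conv consumes the latent, revealing latent_channels.
    conv_latent = checkpoint["decoder.conv_in.conv.weight"]
    config["latent_channels"] = conv_latent.shape[1]
    return config
```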

@a-r-r-o-w (Member) commented Dec 6, 2024

This might be a little difficult here, so please lmk if you have any suggestions on what to do.

Some DC-AE checkpoints have the exact same structure and configuration except for scaling_factor. For example, dc-ae-f128c512-in-1.0-diffusers and dc-ae-f128c512-mix-1.0-diffusers only differ in their scaling factor.

I'm unsure how we would determine this just by the model structure. Do we rely on the user passing it as a config correctly, and document this info somewhere?

Collaborator:

I think that's fine, since in the snippet in the docs we're doing the same thing, just with original_config instead of config, right?

Member:

Updated usage to config now and verified that it works. Thank you for the fixes and suggestions!
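
To make the resolution concrete: since scaling_factor cannot be read off tensor shapes, the user points single-file loading at a diffusers config repo that carries it. A sketch assuming the standard from_single_file config kwarg (note that single-file support was later deferred, see the end of this thread):

```python
from diffusers import AutoencoderDC

# Sketch of the config-based single-file loading discussed above. The local
# path is illustrative, and since the PR ultimately deferred single-file
# support to a follow-up, treat this as the intended shape, not the merged API.
ae = AutoencoderDC.from_single_file(
    "dc_ae_f128c512_mix_1.0.safetensors",
    config="mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers",  # supplies scaling_factor
)
```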

@stevhliu (Member) left a comment

Thanks for adding docs!

@a-r-r-o-w (Member)

@lawrence-cj @chenjy2003 We have removed support for loading the original-format autoencoder because of some complications in this PR. @DN6 will take it up soon and add support correctly. Sorry for the delay! Just doing some final cleanup; will merge after.

@a-r-r-o-w a-r-r-o-w merged commit cd89204 into huggingface:main Dec 6, 2024
15 checks passed
sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
…8x compression ratio); (#9708)

* first add a script for DC-AE;

* DC-AE init

* replace triton with custom implementation

* 1. rename file and remove unused code;

* no longer rely on omegaconf and dataclass

* replace custom activation with diffusers activation

* remove dc_ae attention in attention_processor.py

* inherit from ModelMixin

* inherit from ConfigMixin

* dc-ae reduce to one file

* update downsample and upsample

* clean code

* support DecoderOutput

* remove get_same_padding and val2tuple

* remove autocast and some assert

* update ResBlock

* remove contents within super().__init__

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove opsequential

* update other blocks to support the removal of build_norm

* remove build encoder/decoder project in/out

* remove inheritance of RMSNorm2d from LayerNorm

* remove reset_parameters for RMSNorm2d

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove device and dtype in RMSNorm2d __init__

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/dc_ae.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove op_list & build_block

* remove build_stage_main

* change file name to autoencoder_dc

* move LiteMLA to attention.py

* align with other vae decode output;

* add DC-AE into init files;

* update

* make quality && make style;

* quick push before dgx disappears again

* update

* make style

* update

* update

* fix

* refactor

* refactor

* refactor

* update

* possibly change to nn.Linear

* refactor

* make fix-copies

* replace vae with ae

* replace get_block_from_block_type to get_block

* replace downsample_block_type from Conv to conv for consistency

* add scaling factors

* incorporate changes for all checkpoints

* make style

* move mla to attention processor file; split qkv conv to linears

* refactor

* add tests

* from original file loader

* add docs

* add standard autoencoder methods

* combine attention processor

* fix tests

* update

* minor fix

* minor fix

* minor fix & in/out shortcut rename

* minor fix

* make style

* fix paper link

* update docs

* update single file loading

* make style

* remove single file loading support; todo for DN6

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add abstract

---------

Co-authored-by: Junyu Chen <chenjydl2003@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: chenjy2003 <70215701+chenjy2003@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>