Description
Currently, we have two codepaths:
- For non-sharded checkpoints we do:
- For sharded checkpoints we do:
And then, for (bnb) quantized checkpoints, we merge the sharded checkpoint:
Essentially, we shouldn't have to merge sharded checkpoints even when they're quantized.
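For context, here is a rough sketch of the three paths as I understand them. This is a simplification: the helper names, import paths, and exact keyword arguments are approximations of the diffusers internals, not the actual code.

```python
# Rough sketch of today's three loading paths (approximate; names/signatures
# of the diffusers internals are assumptions, not the real code).
import os

import torch
from accelerate import load_checkpoint_and_dispatch

# Import path may differ across diffusers versions.
from diffusers.models.model_loading_utils import load_model_dict_into_meta


def load_non_sharded(model, checkpoint_file, keep_module_in_fp32=None):
    # Single-file path: we read the state dict ourselves and assign it onto the
    # meta-initialized model, so we control casting per parameter.
    state_dict = torch.load(checkpoint_file, map_location="cpu")
    # Exact kwarg name is an assumption.
    load_model_dict_into_meta(model, state_dict, keep_in_fp32_modules=keep_module_in_fp32)


def load_sharded(model, checkpoint_dir):
    # Sharded path: everything is delegated to accelerate, which reads the index
    # file and loads shard by shard -- we never see the individual state dicts.
    load_checkpoint_and_dispatch(model, checkpoint_dir, device_map="auto")


def load_sharded_quantized(model, checkpoint_dir):
    # bnb-quantized + sharded path: merge all shards into one state dict first,
    # then reuse the single-file path (this is the merge step in question).
    merged_state_dict = {}
    for shard in sorted(os.listdir(checkpoint_dir)):
        if shard.endswith(".bin"):  # .safetensors shards would go through safetensors.torch.load_file
            merged_state_dict.update(torch.load(os.path.join(checkpoint_dir, shard), map_location="cpu"))
    load_model_dict_into_meta(model, merged_state_dict)
```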
This will also allow us to more generally use `keep_module_in_fp32` for sharded checkpoints. Currently, we have this logic for casting a model (which is tested thoroughly):
When using `load_model_dict_into_meta()`, we do consider `keep_module_in_fp32`:
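Roughly, the per-parameter part of that casting behaves like the following paraphrase (a minimal sketch, not the actual implementation, which also handles quantizer hooks, device placement, etc.):

```python
# Paraphrased sketch of the per-parameter casting considered alongside
# load_model_dict_into_meta(); cast_param is a hypothetical helper name.
import torch


def cast_param(param_name, param, target_dtype, keep_module_in_fp32):
    # Keep a parameter in float32 if it belongs to any module listed in
    # keep_module_in_fp32; otherwise cast it to the requested dtype.
    keep_module_in_fp32 = keep_module_in_fp32 or []
    if any(module_name in param_name for module_name in keep_module_in_fp32):
        return param.to(torch.float32)
    return param.to(target_dtype)


# Example: norm weights stay in fp32 while everything else is cast to fp16.
weight = torch.randn(4, dtype=torch.float32)
print(cast_param("encoder.norm.weight", weight, torch.float16, ["norm"]).dtype)  # torch.float32
print(cast_param("encoder.proj.weight", weight, torch.float16, ["norm"]).dtype)  # torch.float16
```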
But since for sharded checkpoints we use `load_checkpoint_and_dispatch()`, there is no way to pass `keep_module_in_fp32`:
https://huggingface.co/docs/accelerate/main/en/package_reference/big_modeling#accelerate.load_checkpoint_and_dispatch
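To illustrate the gap: the call only exposes a single global `dtype`, which applies to every parameter uniformly. A sketch (the checkpoint path is a placeholder and the tiny `nn.Linear` stands in for a real meta-initialized diffusers model):

```python
# load_checkpoint_and_dispatch() has one global `dtype` knob and nothing like
# keep_module_in_fp32, so selected modules cannot be kept in fp32.
import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch

with init_empty_weights():
    model = torch.nn.Linear(8, 8)  # stand-in for a real meta-initialized diffusers model

model = load_checkpoint_and_dispatch(
    model,
    checkpoint="path/to/sharded/checkpoint",  # placeholder path
    device_map="auto",
    dtype=torch.float16,  # cast applied uniformly; no per-module fp32 override
)
```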
As discussed with @SunMarc, it's better to uniformize this so that we don't have to maintain two different codepaths and can instead rely completely on `load_model_dict_into_meta()`. Marc has kindly agreed to open a PR to attempt this (it could be done as a series of PRs if needed), but I will join in if any help is needed.
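For the record, the direction could look roughly like the sketch below: iterate over the shards listed in the index file and push each one through `load_model_dict_into_meta()`, so the same `keep_module_in_fp32` (and quantization) handling applies to both paths. The helper import, the index filename, and the keyword arguments are assumptions; the actual PR may structure this differently.

```python
# Hypothetical sketch of the unified sharded path (names and kwargs are assumptions).
import json
import os

from safetensors.torch import load_file

# Import path may differ across diffusers versions.
from diffusers.models.model_loading_utils import load_model_dict_into_meta


def load_sharded_via_meta(model, checkpoint_dir, dtype=None, keep_module_in_fp32=None):
    # The index file maps parameter names to shard files; the filename here is just an example.
    index_path = os.path.join(checkpoint_dir, "diffusion_pytorch_model.safetensors.index.json")
    with open(index_path) as f:
        index = json.load(f)

    shard_files = sorted(set(index["weight_map"].values()))
    for shard_file in shard_files:
        # One shard at a time: the merged checkpoint is never materialized in memory.
        state_dict = load_file(os.path.join(checkpoint_dir, shard_file))
        load_model_dict_into_meta(
            model,
            state_dict,
            dtype=dtype,
            keep_in_fp32_modules=keep_module_in_fp32,  # exact kwarg name is an assumption
        )
        del state_dict
```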