
[Feature request] Compatibility between zero3 and pretrain_mm_mlp_adapter #1878

Open
@ZarkPanda

Description


feature

When using --deepspeed zero3.json and --pretrain_mm_mlp_adapter at the same time, the code currently does not work. Because the weights have already been sharded under ZeRO-3, the load_state_dict call in initialize_vision_modules no longer loads the adapter correctly.

Command:

--pretrain_mm_mlp_adapter

Log:

the weight size from the checkpoint is torch.Size([4096, 4096]), which mismatches the current parameter size torch.Size([0])
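
For anyone reproducing this, here is a minimal sketch (not LLaVA code; model is just a placeholder for the wrapped model object) of why the shape shows up as torch.Size([0]) under ZeRO-3:

# Under ZeRO-3, parameters are partitioned across ranks: the locally visible
# tensor is empty, and the original shape is kept in the DeepSpeed attribute
# ds_shape (assuming the model was initialized under a stage-3 config).
for name, p in model.mm_projector.named_parameters():
    print(name, tuple(p.shape), getattr(p, "ds_shape", None))
# load_state_dict then compares the checkpoint's [4096, 4096] tensor against
# the empty placeholder and reports the size mismatch above.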

Perhaps you can wrap the existing load_state_dict call in initialize_vision_modules like this:

# assumes: import deepspeed; import torch.distributed as dist
with deepspeed.zero.GatheredParameters(
        list(self.mm_projector.parameters()), modifier_rank=0):
    if dist.get_rank() == 0:
        self.mm_projector.load_state_dict(...)  # existing load_state_dict call

This works for me; please verify it further.
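
For reference, a self-contained sketch of the GatheredParameters pattern, assuming a ZeRO-3 run; load_pretrained_projector and the ckpt_path argument are illustrative names, not part of the LLaVA codebase:

import torch
import torch.distributed as dist
import deepspeed

def load_pretrained_projector(projector: torch.nn.Module, ckpt_path: str):
    # Load the checkpoint on CPU first; only rank 0 writes it into the module.
    state_dict = torch.load(ckpt_path, map_location="cpu")
    # GatheredParameters temporarily materializes the full (unsharded) tensors
    # on modifier_rank; any update made there is re-partitioned and broadcast
    # to all ranks when the context exits.
    with deepspeed.zero.GatheredParameters(list(projector.parameters()),
                                           modifier_rank=0):
        if dist.get_rank() == 0:
            projector.load_state_dict(state_dict)

The key design point is that modifier_rank=0 tells DeepSpeed which rank's modifications to propagate, which is why the rank-0 guard around load_state_dict is safe.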
