
[Feature request] Compatibility between zero3 and pretrain_mm_mlp_adapter #1878

Open
@ZarkPanda

Description


feature

When using --deepspeed zero3.json and --pretrain_mm_mlp_adapter at the same time, the code currently does not work. Because the weights have already been sharded under ZeRO-3, the load_state_dict call in initialize_vision_modules no longer loads the adapter correctly.

Command:

--pretrain_mm_mlp_adapter

Log:

the weight size from the checkpoint is torch.Size([4096, 4096]), which mismatches the current parameter size torch.Size([0])
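
For anyone reproducing this, here is a minimal sketch (not LLaVA code; model is just a placeholder for the wrapped model object) of why the shape shows up as torch.Size([0]) under ZeRO-3:

# Under ZeRO-3, parameters are partitioned across ranks: the locally visible
# tensor is empty, and the original shape is kept in the DeepSpeed attribute
# ds_shape (assuming the model was initialized under a stage-3 config).
for name, p in model.mm_projector.named_parameters():
    print(name, tuple(p.shape), getattr(p, "ds_shape", None))
# load_state_dict then compares the checkpoint's [4096, 4096] tensor against
# the empty placeholder and reports the size mismatch above.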

Perhaps you can wrap the existing load_state_dict call in initialize_vision_modules like this:

# assumes: import deepspeed; import torch.distributed as dist
with deepspeed.zero.GatheredParameters(
        list(self.mm_projector.parameters()), modifier_rank=0):
    if dist.get_rank() == 0:
        self.mm_projector.load_state_dict(...)  # existing load_state_dict call

This works for me; please verify it further.
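
For reference, a self-contained sketch of the GatheredParameters pattern, assuming a ZeRO-3 run; load_pretrained_projector and the ckpt_path argument are illustrative names, not part of the LLaVA codebase:

import torch
import torch.distributed as dist
import deepspeed

def load_pretrained_projector(projector: torch.nn.Module, ckpt_path: str):
    # Load the checkpoint on CPU first; only rank 0 writes it into the module.
    state_dict = torch.load(ckpt_path, map_location="cpu")
    # GatheredParameters temporarily materializes the full (unsharded) tensors
    # on modifier_rank; any update made there is re-partitioned and broadcast
    # to all ranks when the context exits.
    with deepspeed.zero.GatheredParameters(list(projector.parameters()),
                                           modifier_rank=0):
        if dist.get_rank() == 0:
            projector.load_state_dict(state_dict)

The key design point is that modifier_rank=0 tells DeepSpeed which rank's modifications to propagate, which is why the rank-0 guard around load_state_dict is safe.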
