Fix EMA and make it compatible with deepspeed. #813
Conversation
```python
if device is not None:
    self.averaged_model = self.averaged_model.to(device=device)
parameters = list(parameters)
self.shadow_params = [p.clone().detach() for p in parameters]
```
Never heard them being called shadow, but pretty creative :D
Pretty much copied it from here https://github.com/fadel/pytorch_ema/blob/master/torch_ema/ema.py#L14
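For context, the core of that shadow-parameter trick is just an in-place decayed update of stored copies of the parameters; roughly (a paraphrased sketch in the spirit of the linked pytorch_ema code, not the exact diffusers implementation):

```python
import torch

@torch.no_grad()
def ema_step(shadow_params, parameters, decay=0.9999):
    # Nudge each stored "shadow" copy towards the corresponding live parameter.
    for s_param, param in zip(shadow_params, parameters):
        s_param.sub_((1.0 - decay) * (s_param - param))
```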
The logic looks good to me!
@patil-suraj make sure it works with fp16 (if it's an option) for the whole training run. Subtracting the decayed parameters might become numerically unstable when the decay gets close to 0.9999.
Will do a run in fp16, but our fp16 training is mixed precision, so the params are always kept in fp32.
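For what it's worth, here is a quick toy check (not from this thread) of why a decay of 0.9999 is risky in pure fp16:

```python
import torch

# 0.9999 is not even representable in fp16: it rounds up to 1.0, so a naive
# `decay * shadow + (1 - decay) * param` done entirely in half precision
# would never move the average at all.
print(torch.tensor(0.9999, dtype=torch.float16))  # tensor(1., dtype=torch.float16)

# The subtraction form has the same problem once the per-step correction is
# far below the fp16 spacing around the shadow value.
shadow = torch.tensor(1.0, dtype=torch.float16)
param = torch.tensor(1.001, dtype=torch.float16)
step = (1 - 0.9999) * (shadow - param)  # ~ -1e-7
print(shadow - step)  # still tensor(1., dtype=torch.float16)
```

Keeping the shadow params in fp32, as mixed precision does, sidesteps both problems.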
Maybe this would be helpful. I've got DeepSpeed working on my 12 GB 3060 by changing this line (like in https://github.com/huggingface/diffusers/pull/735/files#diff-8702f762e46a3b5363085930b0b045de554909d32560864031ca7b12ddd349d5R555):

```diff
diff --git a/examples/text_to_image/train_text_to_image.py b/examples/text_to_image/train_text_to_image.py
index e4a91ff..4481951 100644
--- a/examples/text_to_image/train_text_to_image.py
+++ b/examples/text_to_image/train_text_to_image.py
@@ -566,7 +566,7 @@ def main():
                 # Predict the noise residual and compute loss
                 noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
-                loss = F.mse_loss(noise_pred, noise, reduction="mean")
+                loss = F.mse_loss(noise_pred.float(), noise.float(), reduction="mean")
                 # Gather the losses across all processes for logging (if we use distributed training).
                 avg_loss = accelerator.gather(loss.repeat(args.train_batch_size)).mean()
```

By "working" I mean that it doesn't throw the exception "Found dtype Float but expected Half" at this line:
I should say that I'm still training the model on the pokemon dataset, so I don't know what the actual result will be yet. The command I've used for training is almost identical to the one in the README; I've only added the DeepSpeed options to `accelerate launch`:

```bash
accelerate launch --use_deepspeed --zero_stage=2 --gradient_accumulation_steps=1 --offload_param_device=cpu --offload_optimizer_device=cpu train_text_to_image.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --dataset_name="lambdalabs/pokemon-blip-captions" \
  --use_ema \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --mixed_precision="fp16" \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="sd-pokemon-model"
```
Thanks a lot @pink-red, would you like to open a PR for that once this is merged? Indeed, casting it to `float` makes sense.
@patil-suraj No problem! 👌
Deferring my review here to @anton-l as he knows EMA much better :-)
Also @patil-suraj, let's maybe fix the code quality with `make style` and `make quality`.
Fix EMA and make it compatible with deepspeed. (huggingface#813)

* fix ema
* style
* add comment about copy
* style
* quality
There's an issue with the current EMA in multi-GPU and DeepSpeed training. This PR updates the `EMAModel` to only keep the parameters instead of copying the whole model, which doesn't seem to work with `deepspeed`.
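For illustration, a minimal sketch of the parameter-only EMA idea described here (names and details are illustrative, not the exact diffusers API):

```python
import torch


class ParameterEMA:
    """Keeps an exponential moving average of a list of parameters.

    Stores detached copies ("shadow" params) instead of a full model copy,
    which avoids re-wrapping the model and plays nicer with DeepSpeed/DDP.
    """

    def __init__(self, parameters, decay=0.9999):
        parameters = list(parameters)
        self.decay = decay
        self.shadow_params = [p.clone().detach() for p in parameters]

    @torch.no_grad()
    def step(self, parameters):
        # Update the shadow copies after each optimizer step.
        for s_param, param in zip(self.shadow_params, parameters):
            if param.requires_grad:
                s_param.sub_((1.0 - self.decay) * (s_param - param))
            else:
                s_param.copy_(param)

    def copy_to(self, parameters):
        # Copy the averaged values into the live parameters, e.g. for eval or saving.
        for s_param, param in zip(self.shadow_params, parameters):
            param.data.copy_(s_param.data)
```

Typical usage would be to call `step(unet.parameters())` after each optimizer step and `copy_to(unet.parameters())` before evaluation or saving.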