Open
Description
Outline & Motivation
The trainer uses the function optimizer_to_device here:
pytorch-lightning/src/lightning/pytorch/strategies/strategy.py
Lines 160 to 161 in 631911c
In #19955, an issue was raised that this function moved the "step" entry of the optimizer state to the CUDA device, causing device-to-host syncs during optimizer.step() because the "step" tensor is expected to remain on the CPU. #20019 fixed this by special-casing that key. However, good arguments were made in #19955 that optimizer_to_device shouldn't be necessary in the first place (#19955 (comment)).
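To illustrate the kind of special-casing described above, here is a minimal sketch. It is not the code from #20019. The names FakeTensor and move_state_to_device and the state layout are hypothetical, and a stand-in class replaces torch.Tensor so the snippet runs without PyTorch or a GPU.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; it only tracks a device tag."""

    device: str = "cpu"

    def to(self, device):
        # Mirrors the shape of Tensor.to(device): returns a new object on `device`.
        return FakeTensor(device=device)


def move_state_to_device(state, device, skip_keys=("step",)):
    """Move each per-parameter state entry to `device`, except the skipped keys."""
    for param_state in state.values():
        for key, value in param_state.items():
            if key in skip_keys:
                continue  # leave "step" where it is (on the CPU)
            param_state[key] = value.to(device)
    return state


# Adam-like state for one parameter (assumed layout, for illustration only).
state = {
    "param0": {
        "step": FakeTensor(),
        "exp_avg": FakeTensor(),
        "exp_avg_sq": FakeTensor(),
    }
}
move_state_to_device(state, "cuda:0")
print({key: value.device for key, value in state["param0"].items()})
```

Only the key-based exception is the point here; if the move itself is redundant, as argued in #19955, the exception goes away along with it.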
Pitch
Remove optimizer_to_device and show that it is redundant by running the tests. We will still need an optimizer_to_cpu for teardown.
Additional context
No response