Remove the `optimizer_to_device` logic if possible #20165

### Outline & Motivation

The trainer uses a function `optimizer_to_device` here:
https://github.com/Lightning-AI/pytorch-lightning/blob/631911c00413ad028e2887d83eb264cb4822097e/src/lightning/pytorch/strategies/strategy.py#L160-L161

In #19955, an issue was raised that this function moved the "step" entry of the optimizer state to the CUDA device. This caused device-to-host syncs during `optimizer.step()`, because the "step" tensor is expected to remain on the CPU. #20019 fixed this by special-casing that key. However, good arguments were made in #19955 that `optimizer_to_device` shouldn't be necessary in the first place (https://github.com/Lightning-AI/pytorch-lightning/issues/19955#issuecomment-2197353178).
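To make the special-casing concrete, here is a minimal, framework-free sketch of the idea: walk the per-parameter optimizer state and move every entry to the target device except "step". This is not Lightning's actual implementation. `FakeTensor` and `move_optimizer_state` are hypothetical names invented for illustration, and `FakeTensor` only imitates the `.to(device)` call shape of a tensor so that the example runs without PyTorch or a GPU.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor: it only tracks a device string."""

    value: float
    device: str = "cpu"

    def to(self, device: str) -> "FakeTensor":
        # Imitates the tensor `.to(device)` call shape; returns a copy on `device`.
        return FakeTensor(self.value, device)


def move_optimizer_state(state: Dict[Any, Dict[str, Any]], device: str) -> None:
    """Move per-parameter optimizer state to `device` in place, skipping "step"."""
    for param_state in state.values():
        for key, value in param_state.items():
            if key == "step":
                # Leave the step counter where it was created (CPU here), which is
                # what the issue says optimizer.step() expects.
                continue
            if hasattr(value, "to"):
                # Reassigning an existing key while iterating is safe: the dict
                # size does not change.
                param_state[key] = value.to(device)


# Adam-like state for one parameter, keyed by a placeholder name (assumption).
state = {
    "weight": {
        "step": FakeTensor(3.0),
        "exp_avg": FakeTensor(0.1),
        "exp_avg_sq": FakeTensor(0.01),
    }
}
move_optimizer_state(state, "cuda:0")
```

After the call, the two moment buffers report `cuda:0` while "step" still reports `cpu`. Calling the same helper with `"cpu"` gives the teardown direction.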

### Pitch

Remove `optimizer_to_device` and show that it is redundant by running the tests. We will still need an `optimizer_to_cpu` for teardown.

### Additional context

_No response_

cc @justusschock @awaelchli @borda

	if trainer.state.fn == TrainerFn.FITTING:
		_optimizers_to_device(self.optimizers, self.root_device)
