Skip to content

Remove the optimizer_to_device logic if possible #20165

Open
@awaelchli

Description

@awaelchli

Outline & Motivation

The trainer uses a function optimizer_to_device here:

if trainer.state.fn == TrainerFn.FITTING:
_optimizers_to_device(self.optimizers, self.root_device)

In #19955 an issue was raised that the function moved the "step" parameter in the optimizer state to the CUDA device, causing device-to-host syncs during optimizer.step() because the "step" tensor was expected to remain on CPU. #20019 fixed this with special treatment of that key. However, good arguments were made in #19955 that this optimizer_to_device shouldn't even be necessary in the first place (#19955 (comment)).

Pitch

Remove optimizer_to_device and show that it is redundant by running the tests. We will still need a optimizer_to_cpu for teardown.

Additional context

No response

cc @justusschock @awaelchli @Borda

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions