Fix nondeterministic tests for GPU runs #314

Merged
anton-l merged 2 commits into main from fix-nondeterm-tests on Sep 1, 2022

Conversation

@anton-l (Member) commented Sep 1, 2022

This ensures that:

  • Stable Diffusion fast tests only run on CPU, because the expected outputs depend on the Generator's device (see the first sketch below)
  • Training tests only run on CPU, to avoid setting extra determinism environment flags for cuBLAS on CUDA > 10.2, which might affect other tests (see the second sketch below)
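
A minimal sketch of the first point, assuming a CUDA device is present: the same seed yields different random streams on CPU and GPU, because the two backends use different RNG algorithms, so expected values recorded on one device will not match the other.

import torch

# Same seed on two devices: the CPU and CUDA generators use different
# algorithms, so their streams do not match.
cpu_gen = torch.Generator(device="cpu").manual_seed(0)
x_cpu = torch.randn(4, generator=cpu_gen, device="cpu")

if torch.cuda.is_available():
    cuda_gen = torch.Generator(device="cuda").manual_seed(0)
    x_cuda = torch.randn(4, generator=cuda_gen, device="cuda")
    print(torch.allclose(x_cpu, x_cuda.cpu()))  # generally False

And a sketch of the second point: on CUDA >= 10.2, deterministic cuBLAS kernels require an environment variable that applies process-wide, which is why setting it could leak into other tests.

import os

# Must be set before any CUDA context is created (CUDA >= 10.2):
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # or ":16:8"

import torch

# Error out on (or switch away from) nondeterministic kernels:
torch.use_deterministic_algorithms(True)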

@HuggingFaceDocBuilderDev commented Sep 1, 2022

The documentation is not available anymore as the PR was closed or merged.

@patrickvonplaten (Contributor) commented

@anton-l what do you think about forcing the device to CPU for those tests instead of skipping them when a GPU is available, as they are now? That should be a bit safer, no? Also, I think we can safely assume that every machine with a GPU also has a CPU.
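
A rough sketch of the two options (test names here are illustrative, not the actual diffusers test utilities):

import unittest

import torch

class PipelineFastTests(unittest.TestCase):
    # Option 1 (what the PR currently does): skip whenever a GPU is available.
    @unittest.skipIf(torch.cuda.is_available(), "RNG is device-dependent; CPU-only test")
    def test_outputs_skip_on_gpu(self):
        ...

    # Option 2 (the suggestion above): always run, but pin the device to CPU.
    def test_outputs_force_cpu(self):
        device = "cpu"  # ignore any available GPU
        generator = torch.Generator(device=device).manual_seed(0)
        ...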

@anton-l (Member, Author) commented Sep 1, 2022

@patrickvonplaten if removing autocast is ok with you, then yes, that was my first thought too :D

@anton-l anton-l requested review from patrickvonplaten and removed request for patrickvonplaten September 1, 2022 12:55
@patrickvonplaten (Contributor) commented

> @patrickvonplaten if removing autocast is ok with you, then yes, that was my first thought too :D

Sure (sorry I copy-pasted that yesterday)

@patrickvonplaten (Contributor) commented

Thanks for making the change!

@patrickvonplaten (Contributor) left a comment

Thanks for fixing!

@anton-l anton-l merged commit 4724250 into main Sep 1, 2022
@antoche (Contributor) commented Sep 2, 2022

Just wading into this after the fact as I've hit similar issues in the past and this is affecting me too...

Unfortunately, the device arg of the Generator is ignored by PyTorch (see issue 62451), so just using a pre-seeded, CPU-bound generator is not enough to ensure determinism.

The following code gives you deterministic tests by forcing random number generation to actually happen on the CPU, while still allowing model evaluation to happen on the GPU:

import torch

def randn(size, generator=None, device=None, **kwargs):
    """
    Wrapper around torch.randn providing proper reproducibility.

    Generation is done on the given generator's device, then the result is
    moved to the given ``device``.

    Args:
        size: tensor size
        generator (torch.Generator): RNG generator
        device (torch.device): target device for the resulting tensor
    """
    # FIXME: the generator's RNG device is ignored by torch.randn, so it has
    # to be passed explicitly as the generation device (pytorch issue 62451)
    rng_device = generator.device if generator is not None else device
    image = torch.randn(size, generator=generator, device=rng_device, **kwargs)
    image = image.to(device=device)
    return image


def randn_like(tensor, generator=None, **kwargs):
    # Mirror torch.randn_like: preserve shape, dtype, and layout of the input.
    return randn(
        tensor.shape,
        dtype=tensor.dtype,
        layout=tensor.layout,
        generator=generator,
        device=tensor.device,
        **kwargs,
    )

Replacing all uses of torch.randn and torch.randn_like with this code gives you deterministic results when using a generator such as generator = torch.Generator(device='cpu').manual_seed(0), and you can keep device set to "cuda" everywhere else.
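
A hypothetical usage sketch of the wrapper above, assuming a CUDA device is available (the shape and variable names are illustrative):

import torch

# Seed on the CPU so results are reproducible across machines...
generator = torch.Generator(device="cpu").manual_seed(0)

# ...but keep the resulting tensors (and the model consuming them) on the GPU.
noise = randn((1, 4, 64, 64), generator=generator, device="cuda")
more_noise = randn_like(noise, generator=generator)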

@anton-l anton-l deleted the fix-nondeterm-tests branch September 2, 2022 14:10
@patrickvonplaten (Contributor) commented

> Unfortunately, the device arg of the Generator is ignored by PyTorch (see issue 62451), so just using a pre-seeded, CPU-bound generator is not enough to ensure determinism. [...] Replacing all uses of torch.randn and torch.randn_like with this code gives you deterministic results [...]

Thanks, that looks very interesting! @anton-l, let's try it out, no?

natolambert pushed a commit that referenced this pull request Sep 7, 2022
* Fix nondeterministic tests for GPU runs

* force SD fast tests to the CPU
PhaneeshB pushed a commit to nod-ai/diffusers that referenced this pull request Mar 1, 2023

This commit removes unused parameters in the v-diffusion model. It also updates the server parameters so that multiple requests are handled sequentially.

Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>