Skip to content

Initialize CI for code quality and testing #256

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Aug 26, 2022
Merged

Initialize CI for code quality and testing #256

merged 11 commits into from
Aug 26, 2022

Conversation

anton-l
Copy link
Member

@anton-l anton-l commented Aug 26, 2022

This will enable the following workflows:

  • pr_quality.yml for PRs and pushes to main. Just runs make quality commands for now, without repo consistency checks.
  • pr_tests.yml for PRs. Runs all non-slow tests on CPU. Ideally these should all be green.
  • push_tests.yml for pushes to main. Runs all tests on GPU. These won't be green for a while, as we're fixing tests at the moment. Known bug: doesn't work for StableDiffusion tests yet, ideally needs a service account on the Hub to get the key for checkpoints.

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Aug 26, 2022

The documentation is not available anymore as the PR was closed or merged.

Copy link
Contributor

@patil-suraj patil-suraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yey!! Thanks a lot for adding this :)
LGTM

Copy link
Contributor

@patrickvonplaten patrickvonplaten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great let's merge it!

name: Diffusers tests
runs-on: [ self-hosted, docker-gpu ]
container:
image: nvcr.io/nvidia/pytorch:22.07-py3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonder whether it might be easier to here just use pip install torch? Like in https://github.com/huggingface/transformers/blob/62ceb4d661ce644ee9377ac8053cbb9afa737125/.circleci/config.yml#L126

Copy link
Contributor

@patrickvonplaten patrickvonplaten Aug 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe even easier we define a pip install . [dev] where we could easily add 'jax' in the future?
https://github.com/huggingface/transformers/blob/62ceb4d661ce644ee9377ac8053cbb9afa737125/.github/workflows/add-model-like.yml#L37

It's also arguably a bit easier to debug, e.g. I could just ssh into it, do pip installs instead of having to pull a docker and then having to work in the docker shell

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But no expert here, just wonder if nvidia/pytorch docker is the best environment to test pytorch (+ potentially jax in the future) for CPU

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also maybe I'd set up a new CPU machine (they're cheap) so that we can easier disentangle CPU and GPU?
Overall I'm no testing expert here, so just a couple of thoughts - feel free to proceed however you like best (and how it's most efficient for you to debug things going forward)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, let's start with a bare minimum "python:3.7" image and set up the frameworks on top for better environment control.
Ideally we'll need a custom image with torch and flax preinstalled (to avoid waiting for them to install on each run), but for now installing them from scratch is good enough.

Will test it now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I take that back, our cloud network speeds are crazy fast, will use python:3.7 for CPU tests from now on 🎉

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess main here means the opened PR want to be merge into main?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, correct

machine_type: [ single-gpu ]
runs-on: [ self-hosted, docker-gpu, '${{ matrix.machine_type }}' ]
container:
image: nvcr.io/nvidia/pytorch:22.07-py3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know which PyTorch version this is? Couldn't really find it here: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyTorch Version 1.13.0a0+08820cb
Looks like all nvidia images are built from sources, so there are no stable versions. Although I can downgrade to pytorch:22.05-py3 with PyTorch Version 1.12.0a0+8a1a93a

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Yeah PyTorch 1.13 is nightly-build or from source so I think PyTorch 1.12 would be the better choice here. In Transformers we automatically update the docker, but I think this is a bit too much here for now. Maybe let's better just do it manually in the beginning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In transformers, it's always the latest stable release PyTorch, unlesee it breaks something (and we pin the previous one for a while).

anton-l and others added 2 commits August 26, 2022 16:09
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Copy link
Contributor

@ydshieh ydshieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few tiny comment/question.

name: Diffusers tests
runs-on: [ self-hosted, docker-gpu ]
container:
image: nvcr.io/nvidia/pytorch:22.07-py3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess main here means the opened PR want to be merge into main?

@ydshieh
Copy link
Contributor

ydshieh commented Aug 26, 2022

One thing that could save your time is to print some package versions. We have something like below in transformers

Python version: 3.8.10 (default, Jun 22 2022, 20:[18](https://github.com/huggingface/transformers/runs/8008380417?check_suite_focus=true#step:6:19):18) 
[GCC 9.4.0]
transformers version: 4.[22](https://github.com/huggingface/transformers/runs/8008380417?check_suite_focus=true#step:6:23).0.dev0
Torch version: 1.12.1+cu113
Cuda available: True
Cuda version: 11.3
CuDNN version: 8302
Number of GPUs available: 1
NCCL version: (2, 10, 3)
DeepSpeed version: None
TensorFlow version: 2.9.1
TF GPUs available: True
Number of TF GPUs available: 1

@anton-l
Copy link
Member Author

anton-l commented Aug 26, 2022

Addressed the review suggestions, merging this version to test further

@anton-l anton-l merged commit 11133dc into main Aug 26, 2022
@patrickvonplaten patrickvonplaten deleted the init-ci branch August 26, 2022 18:40
natolambert pushed a commit that referenced this pull request Sep 7, 2022
* Init CI

* clarify cpu

* style

* Check scripts quality too

* Drop smi for cpu tests

* Run PR tests on cpu docker envs

* Update .github/workflows/push_tests.yml

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Try minimal python container

* Print env, install stable GPU torch

* Manual torch install

* remove deprecated platform.dist()

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
* Init CI

* clarify cpu

* style

* Check scripts quality too

* Drop smi for cpu tests

* Run PR tests on cpu docker envs

* Update .github/workflows/push_tests.yml

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Try minimal python container

* Print env, install stable GPU torch

* Manual torch install

* remove deprecated platform.dist()

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants