Initialize CI for code quality and testing #256
Conversation
The documentation is not available anymore as the PR was closed or merged.
Yey!! Thanks a lot for adding this :)
LGTM
Great, let's merge it!
.github/workflows/pr_tests.yml
Outdated
name: Diffusers tests
runs-on: [ self-hosted, docker-gpu ]
container:
  image: nvcr.io/nvidia/pytorch:22.07-py3
I wonder whether it might be easier to just use pip install torch here? Like in https://github.com/huggingface/transformers/blob/62ceb4d661ce644ee9377ac8053cbb9afa737125/.circleci/config.yml#L126
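For illustration, such a step could look roughly like the following sketch (a hypothetical pr_tests.yml fragment, not the exact change in this PR; the CPU wheel index is an assumption):

# Hypothetical workflow step: install PyTorch with pip instead of relying on the NVIDIA base image
- name: Install PyTorch (CPU wheels)
  run: |
    python -m pip install --upgrade pip
    # the CPU-only index keeps the download small for CPU runners
    pip install torch --extra-index-url https://download.pytorch.org/whl/cpu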
Or, maybe even easier, we could define a pip install .[dev], where we could easily add 'jax' in the future? https://github.com/huggingface/transformers/blob/62ceb4d661ce644ee9377ac8053cbb9afa737125/.github/workflows/add-model-like.yml#L37
It's also arguably a bit easier to debug: e.g. I could just SSH into the machine and run pip installs, instead of having to pull a Docker image and then work inside the Docker shell.
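A rough sketch of that idea (assuming the repo defines a dev extra in setup.py, which is not shown here):

# Hypothetical workflow step: install the package together with its dev extras
- name: Install diffusers with dev dependencies
  run: |
    python -m pip install --upgrade pip
    # ".[dev]" would pull in test/lint tooling; a 'jax' extra could be appended later, e.g. ".[dev,jax]"
    pip install -e ".[dev]"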
But I'm no expert here, just wondering whether the nvidia/pytorch Docker image is the best environment to test PyTorch (and potentially JAX in the future) on CPU.
Also, maybe I'd set up a new CPU machine (they're cheap) so that we can more easily disentangle CPU and GPU?
Overall I'm no testing expert here, so these are just a couple of thoughts - feel free to proceed however you like best (and in whatever way is most efficient for you to debug things going forward).
Agreed, let's start with a bare-minimum "python:3.7" image and set up the frameworks on top for better environment control.
Ideally we'll need a custom image with torch and flax preinstalled (to avoid waiting for them to install on each run), but for now installing them from scratch is good enough.
Will test it now.
Ok, I take that back, our cloud network speeds are crazy fast, will use python:3.7 for CPU tests from now on 🎉
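A CPU test job built on that idea might look roughly like the sketch below (hypothetical: the runner label, extras, and test command are assumptions, not necessarily the exact workflow that was merged):

# Hypothetical CPU test job on a bare python:3.7 image, installing the frameworks on top
pr_tests_cpu:
  runs-on: [ self-hosted, docker-cpu ]   # assumed CPU runner label
  container:
    image: python:3.7
  steps:
    - uses: actions/checkout@v2
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install torch --extra-index-url https://download.pytorch.org/whl/cpu
        pip install -e ".[test]"   # assumed test extra
    - name: Run non-slow tests
      run: python -m pytest tests/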
I guess main here means the opened PR wants to be merged into main?
Yes, correct
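In other words, the trigger block being discussed is roughly the following (the exact triggers in this PR may differ slightly):

# Run the workflow for PRs whose base branch is main
on:
  pull_request:
    branches:
      - main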
machine_type: [ single-gpu ]
runs-on: [ self-hosted, docker-gpu, '${{ matrix.machine_type }}' ]
container:
  image: nvcr.io/nvidia/pytorch:22.07-py3
Do you know which PyTorch version this is? Couldn't really find it here: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch
PyTorch version 1.13.0a0+08820cb.
Looks like all NVIDIA images are built from source, so there are no stable versions. Although I can downgrade to pytorch:22.05-py3, which has PyTorch version 1.12.0a0+8a1a93a.
Yeah, PyTorch 1.13 is a nightly build or built from source, so I think PyTorch 1.12 would be the better choice here. In Transformers we automatically update the Docker image, but I think that's a bit too much here for now. Maybe it's better to just do it manually in the beginning.
In transformers, it's always the latest stable PyTorch release, unless it breaks something (and then we pin the previous one for a while).
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Just a few tiny comments/questions.
.github/workflows/pr_tests.yml
Outdated
name: Diffusers tests
runs-on: [ self-hosted, docker-gpu ]
container:
  image: nvcr.io/nvidia/pytorch:22.07-py3
I guess main here means the opened PR wants to be merged into main?
One thing that could save you time is to print some package versions. We have something like the snippet below in ...
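The referenced snippet isn't preserved above; as a hedged illustration only, a version-printing step could look like this (the step name and exact commands are assumptions):

# Hypothetical step: print environment and package versions early to make failures easier to debug
- name: Print environment
  run: |
    python --version
    pip --version
    python -c "import torch; print('torch:', torch.__version__, 'cuda available:', torch.cuda.is_available())"
    pip freeze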
Addressed the review suggestions, merging this version to test further.
* Init CI
* clarify cpu
* style
* Check scripts quality too
* Drop smi for cpu tests
* Run PR tests on cpu docker envs
* Update .github/workflows/push_tests.yml (Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>)
* Try minimal python container
* Print env, install stable GPU torch
* Manual torch install
* remove deprecated platform.dist()

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
This will enable the following workflows:
- pr_quality.yml for PRs and pushes to main. Just runs make quality commands for now, without repo consistency checks (a minimal sketch of what such a workflow can look like is shown after this list).
- pr_tests.yml for PRs. Runs all non-slow tests on CPU. Ideally these should all be green.
- push_tests.yml for pushes to main. Runs all tests on GPU. These won't be green for a while, as we're fixing tests at the moment. Known bug: doesn't work for StableDiffusion tests yet; ideally it needs a service account on the Hub to get the key for checkpoints.
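For reference, a minimal quality workflow along the lines described above could look like the sketch below (hypothetical: the "quality" extra and runner choice are assumptions, not necessarily what was merged):

# Hypothetical minimal .github/workflows/pr_quality.yml
name: Run code quality checks
on:
  pull_request:
    branches:
      - main
  push:
    branches:
      - main
jobs:
  check_code_quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: "3.7"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[quality]"   # assumed extra with the linting/formatting tools
      - name: Check quality
        run: make quality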