Initialize CI for code quality and testing #256
Conversation
The documentation is not available anymore as the PR was closed or merged.
Yey!! Thanks a lot for adding this :)
LGTM
Great, let's merge it!
.github/workflows/pr_tests.yml
Outdated
name: Diffusers tests
runs-on: [ self-hosted, docker-gpu ]
container:
  image: nvcr.io/nvidia/pytorch:22.07-py3
I wonder whether it might be easier to just use pip install torch here? Like in https://github.com/huggingface/transformers/blob/62ceb4d661ce644ee9377ac8053cbb9afa737125/.circleci/config.yml#L126
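For illustration, such a step could look roughly like the following sketch (a hypothetical pr_tests.yml fragment, not the exact change in this PR; the CPU wheel index is an assumption):

# Hypothetical workflow step: install PyTorch with pip instead of relying on the NVIDIA base image
- name: Install PyTorch (CPU wheels)
  run: |
    python -m pip install --upgrade pip
    # the CPU-only index keeps the download small for CPU runners
    pip install torch --extra-index-url https://download.pytorch.org/whl/cpu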
Or, maybe even easier, we could define a pip install .[dev], where we could easily add 'jax' in the future? https://github.com/huggingface/transformers/blob/62ceb4d661ce644ee9377ac8053cbb9afa737125/.github/workflows/add-model-like.yml#L37
It's also arguably a bit easier to debug: e.g. I could just SSH into the machine and run pip installs, instead of having to pull a Docker image and then work inside the Docker shell.
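A rough sketch of that idea (assuming the repo defines a dev extra in setup.py, which is not shown here):

# Hypothetical workflow step: install the package together with its dev extras
- name: Install diffusers with dev dependencies
  run: |
    python -m pip install --upgrade pip
    # ".[dev]" would pull in test/lint tooling; a 'jax' extra could be appended later, e.g. ".[dev,jax]"
    pip install -e ".[dev]"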
But I'm no expert here, just wondering whether the nvidia/pytorch Docker image is the best environment to test PyTorch (and potentially JAX in the future) on CPU.
Also, maybe I'd set up a new CPU machine (they're cheap) so that we can more easily disentangle CPU and GPU?
Overall I'm no testing expert here, so these are just a couple of thoughts - feel free to proceed however you like best (and in whatever way is most efficient for you to debug things going forward).
Agreed, let's start with a bare-minimum "python:3.7" image and set up the frameworks on top for better environment control.
Ideally we'll need a custom image with torch and flax preinstalled (to avoid waiting for them to install on each run), but for now installing them from scratch is good enough.
Will test it now.
Ok, I take that back, our cloud network speeds are crazy fast, will use python:3.7 for CPU tests from now on 🎉
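A CPU test job built on that idea might look roughly like the sketch below (hypothetical: the runner label, extras, and test command are assumptions, not necessarily the exact workflow that was merged):

# Hypothetical CPU test job on a bare python:3.7 image, installing the frameworks on top
pr_tests_cpu:
  runs-on: [ self-hosted, docker-cpu ]   # assumed CPU runner label
  container:
    image: python:3.7
  steps:
    - uses: actions/checkout@v2
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install torch --extra-index-url https://download.pytorch.org/whl/cpu
        pip install -e ".[test]"   # assumed test extra
    - name: Run non-slow tests
      run: python -m pytest tests/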
I guess main here means the opened PR wants to be merged into main?
Yes, correct
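In other words, the trigger block being discussed is roughly the following (the exact triggers in this PR may differ slightly):

# Run the workflow for PRs whose base branch is main
on:
  pull_request:
    branches:
      - main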
machine_type: [ single-gpu ]
runs-on: [ self-hosted, docker-gpu, '${{ matrix.machine_type }}' ]
container:
  image: nvcr.io/nvidia/pytorch:22.07-py3
Do you know which PyTorch version this is? Couldn't really find it here: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch
PyTorch version 1.13.0a0+08820cb.
Looks like all NVIDIA images are built from source, so there are no stable versions. Although I can downgrade to pytorch:22.05-py3, which has PyTorch version 1.12.0a0+8a1a93a.
Yeah, PyTorch 1.13 is a nightly build or built from source, so I think PyTorch 1.12 would be the better choice here. In Transformers we automatically update the Docker image, but I think that's a bit too much here for now. Maybe it's better to just do it manually in the beginning.
In transformers, it's always the latest stable PyTorch release, unless it breaks something (and then we pin the previous one for a while).
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Just a few tiny comments/questions.
.github/workflows/pr_tests.yml
Outdated
name: Diffusers tests
runs-on: [ self-hosted, docker-gpu ]
container:
  image: nvcr.io/nvidia/pytorch:22.07-py3
I guess main here means the opened PR wants to be merged into main?
One thing that could save you time is to print some package versions. We have something like the snippet below in ...
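The referenced snippet isn't preserved above; as a hedged illustration only, a version-printing step could look like this (the step name and exact commands are assumptions):

# Hypothetical step: print environment and package versions early to make failures easier to debug
- name: Print environment
  run: |
    python --version
    pip --version
    python -c "import torch; print('torch:', torch.__version__, 'cuda available:', torch.cuda.is_available())"
    pip freeze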
Addressed the review suggestions, merging this version to test further.
* Init CI
* clarify cpu
* style
* Check scripts quality too
* Drop smi for cpu tests
* Run PR tests on cpu docker envs
* Update .github/workflows/push_tests.yml (Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>)
* Try minimal python container
* Print env, install stable GPU torch
* Manual torch install
* remove deprecated platform.dist()

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
This will enable the following workflows:
- pr_quality.yml for PRs and pushes to main. Just runs make quality commands for now, without repo consistency checks (a minimal sketch of what such a workflow can look like is shown after this list).
- pr_tests.yml for PRs. Runs all non-slow tests on CPU. Ideally these should all be green.
- push_tests.yml for pushes to main. Runs all tests on GPU. These won't be green for a while, as we're fixing tests at the moment. Known bug: doesn't work for StableDiffusion tests yet; ideally it needs a service account on the Hub to get the key for checkpoints.
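For reference, a minimal quality workflow along the lines described above could look like the sketch below (hypothetical: the "quality" extra and runner choice are assumptions, not necessarily what was merged):

# Hypothetical minimal .github/workflows/pr_quality.yml
name: Run code quality checks
on:
  pull_request:
    branches:
      - main
  push:
    branches:
      - main
jobs:
  check_code_quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: "3.7"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[quality]"   # assumed extra with the linting/formatting tools
      - name: Check quality
        run: make quality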