# DALL-E in Pytorch

<p align='center'>
  <a href="https://colab.research.google.com/gist/afiaka87/b29213684a1dd633df20cab49d05209d/train_dalle_pytorch.ipynb">
    <img alt="Train DALL-E w/ DeepSpeed" src="https://colab.research.google.com/assets/colab-badge.svg">
  </a>
  <a href="https://discord.gg/dall-e"><img alt="Join us on Discord" src="https://img.shields.io/discord/823813159592001537?color=5865F2&logo=discord&logoColor=white"></a><br/>
  <a href="https://github.com/robvanvolt/DALLE-models">Released DALLE Models</a><br/>
  <a href="https://github.com/rom1504/dalle-service">Web-Hostable DALLE Checkpoints</a><br/>
  <a href="https://www.youtube.com/watch?v=j4xgkjWlfL4">Yannic Kilcher's video</a>
</p>

Implementation / replication of <a href="https://openai.com/blog/dall-e/">DALL-E</a> (<a href="https://arxiv.org/abs/2102.12092">paper</a>), OpenAI's Text to Image Transformer, in Pytorch. It will also contain <a href="https://openai.com/blog/clip/">CLIP</a> for ranking the generations.

---

[Quick Start](https://github.com/lucidrains/DALLE-pytorch/wiki)

<a href="https://github.com/lucidrains/deep-daze">Deep Daze</a> or <a href="https://github.com/lucidrains/big-sleep">Big Sleep</a> are great alternatives!

## Status

- <a href="https://github.com/htoyryla">Hannu</a> has managed to train a small, 6-layer DALL-E on a dataset of just 2000 landscape images! (2048 visual tokens)

<img src="./images/landscape.png"></img>

- <a href="https://github.com/kobiso">Kobiso</a>, a research engineer from Naver, has trained on the CUB200 dataset <a href="https://github.com/lucidrains/DALLE-pytorch/discussions/131">here</a>, using full and DeepSpeed sparse attention (a configuration sketch follows the image below)

<img src="./images/birds.png" width="256"></img>

- (3/15/21) <a href="https://github.com/afiaka87">afiaka87</a> has managed one epoch using a reversible DALL-E and the dVAE <a href="https://github.com/lucidrains/DALLE-pytorch/issues/86#issue-832121328">here</a>

- <a href="https://github.com/TheodoreGalanos">TheodoreGalanos</a> has trained on 150k layouts with the following results

<p>
  <img src="./images/layouts-1.jpg" width="256"></img>
  <img src="./images/layouts-2.jpg" width="256"></img>
</p>
33 | 41 | - <a href="https://github.com/rom1504">Rom1504</a> has trained on 50k fashion images with captions with a really small DALL-E (2 layers) for just 24 hours with the following results
|
34 |
| - |
35 |
| -<img src="./images/clothing.png" width="500px"></img> |
36 |
| - |
| 42 | +<p/> |
| 43 | +<img src="./images/clothing.png" width="420"></img> |
| 44 | + |
| 45 | +- <a href="https://github.com/afiaka87">afiaka87</a> trained for 6 epochs on the same dataset as before thanks to the efficient 16k VQGAN with the following <a href="https://github.com/lucidrains/DALLE-pytorch/discussions/322>discussion">results</a> |
| 46 | + |
| 47 | +<p align='centered'> |
| 48 | + <img src="https://user-images.githubusercontent.com/3994972/123564891-b6f18780-d780-11eb-9019-8a1b6178f861.png" width="420" alt-text='a photo of westwood park, san francisco, from the water in the afternoon'></img> |
| 49 | + <img src="https://user-images.githubusercontent.com/3994972/123564776-4c404c00-d780-11eb-9c8e-3356df358df3.png" width="420" alt-text='a female mannequin dressed in an olive button-down shirt and gold palazzo pants'> </img> |
| 50 | +</p> |
| 51 | + |
| 52 | +Thanks to the amazing "mega b#6696" you can generate from this checkpoint in colab - |
| 53 | +<a href="https://colab.research.google.com/drive/11V2xw1eLPfZvzW8UQyTUhqCEU71w6Pr4?usp=sharing"> |
| 54 | + <img alt="Run inference on the Afiaka checkpoint in Colab" src="https://colab.research.google.com/assets/colab-badge.svg"> |
| 55 | +</a> |

## Install

```bash
$ pip install dalle-pytorch
```
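
Before diving into the full usage docs, here is a condensed sketch of the core training API: train a discrete VAE on images first, then train DALL-E on text/image pairs, then sample. Hyperparameters are illustrative and exact signatures may differ across versions; the Quick Start wiki linked above is the authoritative walkthrough.

```python
import torch
from dalle_pytorch import DiscreteVAE, DALLE

# step 1: train a discrete VAE to tokenize images (reconstruction loss)
vae = DiscreteVAE(
    image_size = 256,
    num_layers = 3,
    num_tokens = 8192,
    codebook_dim = 512,
    hidden_dim = 64
)

images = torch.randn(4, 3, 256, 256)
loss = vae(images, return_loss = True)
loss.backward()

# step 2: train DALL-E on paired text and images, reusing the trained VAE
dalle = DALLE(
    dim = 1024,
    vae = vae,
    num_text_tokens = 10000,
    text_seq_len = 256,
    depth = 12,
    heads = 16
)

text = torch.randint(0, 10000, (4, 256))
mask = torch.ones_like(text).bool()

loss = dalle(text, images, mask = mask, return_loss = True)
loss.backward()

# step 3: generate images from text
generated = dalle.generate_images(text, mask = mask)
```

The package also exports a `CLIP` implementation, which, per the intro above, is intended for ranking batches of generated images by caption similarity.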