[ViTMAE](https://huggingface.co/papers/2111.06377) is a self-supervised vision model that is pretrained by masking large portions of an image (~75%). An encoder processes the visible image patches and a decoder reconstructs the missing pixels from the encoded patches and mask tokens. After pretraining, the encoder can be reused for downstream tasks like image classification or object detection — often outperforming models trained with supervised learning.
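The masking ratio is exposed as a configuration option. A minimal sketch, using the paper's default ratio of 0.75:

```python
from transformers import ViTMAEConfig, ViTMAEForPreTraining

# mask_ratio sets the fraction of patches hidden from the encoder
# during pretraining (0.75 is the default, matching the paper)
config = ViTMAEConfig(mask_ratio=0.75)
model = ViTMAEForPreTraining(config)  # randomly initialized, for pretraining from scratch
```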
The example below reconstructs the masked pixels with [`ViTMAEForPreTraining`].

<hfoptions id="usage">
<hfoption id="AutoModel">

```python
import torch
import requests
from PIL import Image
from transformers import AutoImageProcessor, ViTMAEForPreTraining

# load an example image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# preprocess the image and move the tensors to the GPU
processor = AutoImageProcessor.from_pretrained("facebook/vit-mae-base")
inputs = processor(images=image, return_tensors="pt")
inputs = {k: v.to("cuda") for k, v in inputs.items()}

model = ViTMAEForPreTraining.from_pretrained("facebook/vit-mae-base", attn_implementation="sdpa").to("cuda")

with torch.no_grad():
    outputs = model(**inputs)
reconstruction = outputs.logits
```
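The `logits` hold per-patch pixel predictions rather than a ready-made image. A minimal sketch of folding them back into pixel space, assuming the `unpatchify` helper on [`ViTMAEForPreTraining`]:

```python
# fold the per-patch predictions back into an image tensor of shape
# (batch_size, num_channels, height, width)
pixels = model.unpatchify(reconstruction)
```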
</hfoption>
<hfoption id="transformers-cli">

<!-- This model is not currently supported via transformers-cli. -->

</hfoption>
</hfoptions>
## Notes
- ViTMAE is typically used in two stages: self-supervised pretraining with [`ViTMAEForPreTraining`], then discarding the decoder and fine-tuning the encoder. After fine-tuning, the weights can be plugged into a model like [`ViTForImageClassification`] (see the sketch after this list).
- Use [`ViTImageProcessor`] for input preparation.
- [`ViTMAEModel`] supports SDPA attention and half-precision weights for faster, lighter inference.

    ```python
    import torch
    from transformers import ViTMAEModel

    # SDPA attention with fp16 weights reduces memory use and speeds up inference
    model = ViTMAEModel.from_pretrained("facebook/vit-mae-base", attn_implementation="sdpa", torch_dtype=torch.float16)
    ...
    ```
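As a rough illustration of the second stage, the MAE-pretrained encoder can be loaded straight into a classifier. A minimal sketch: `num_labels` is a placeholder, the head is newly initialized, and you should expect a warning about unused decoder weights.

```python
from transformers import ViTForImageClassification

# reuse the MAE-pretrained encoder for classification; the classification
# head is randomly initialized, so the model must be fine-tuned before use
model = ViTForImageClassification.from_pretrained(
    "facebook/vit-mae-base",
    num_labels=10,  # placeholder: set to the number of classes in your dataset
)
```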
## Resources
- Refer to this [notebook](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/ViTMAE/ViT_MAE_visualization_demo.ipynb) to learn how to visualize the reconstructed pixels from [`ViTMAEForPreTraining`].