- Classification checkpoint names follow the pattern `mobilenet_v2_{depth_multiplier}_{resolution}`, like `mobilenet_v2_1.4_224`. `1.4` is the depth multiplier and `224` is the image resolution. Segmentation checkpoint names follow the pattern `deeplabv3_mobilenet_v2_{depth_multiplier}_{resolution}`.
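
For example, the naming pattern maps directly to checkpoint IDs on the Hub. A minimal sketch, assuming the `google/` namespace used by the released checkpoints.

```python
from transformers import AutoModelForImageClassification, AutoModelForSemanticSegmentation

# Classification: depth multiplier 1.4, image resolution 224
classifier = AutoModelForImageClassification.from_pretrained("google/mobilenet_v2_1.4_224")

# Segmentation: depth multiplier 1.0, image resolution 513
segmenter = AutoModelForSemanticSegmentation.from_pretrained("google/deeplabv3_mobilenet_v2_1.0_513")
```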
- While trained on images of a specific size, the model architecture works with images of different sizes (minimum 32x32). The [`MobileNetV2ImageProcessor`] handles the necessary preprocessing.
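
A minimal sketch of the preprocessing, assuming the `google/mobilenet_v2_1.0_224` checkpoint and a random array standing in for a real image.

```python
import numpy as np
from transformers import MobileNetV2ImageProcessor

processor = MobileNetV2ImageProcessor.from_pretrained("google/mobilenet_v2_1.0_224")

# A random array stands in for a real image of arbitrary size
image = np.random.randint(0, 256, (300, 500, 3), dtype=np.uint8)
inputs = processor(images=image, return_tensors="pt")

# The processor resizes and crops to the resolution the checkpoint expects
print(inputs["pixel_values"].shape)  # torch.Size([1, 3, 224, 224])
```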
- MobileNet is pretrained on [ImageNet-1k](https://huggingface.co/datasets/imagenet-1k), a dataset with 1000 classes. However, the model actually predicts 1001 classes. The additional class is an extra "background" class (index 0).
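
A minimal sketch of the resulting output shape, assuming the `google/mobilenet_v2_1.0_224` checkpoint and a random tensor in place of a preprocessed image.

```python
import torch
from transformers import MobileNetV2ForImageClassification

model = MobileNetV2ForImageClassification.from_pretrained("google/mobilenet_v2_1.0_224")

# A random tensor stands in for a preprocessed image
pixel_values = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(pixel_values).logits

print(logits.shape)  # torch.Size([1, 1001]): 1000 ImageNet classes plus the background class at index 0
```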
- The segmentation models use a [DeepLabV3+](https://huggingface.co/papers/1802.02611) head, which is often pretrained on datasets like [PASCAL VOC](https://huggingface.co/datasets/merve/pascal-voc).
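
A minimal sketch of segmentation inference, assuming the `google/deeplabv3_mobilenet_v2_1.0_513` checkpoint and a random stand-in input.

```python
import torch
from transformers import MobileNetV2ForSemanticSegmentation

model = MobileNetV2ForSemanticSegmentation.from_pretrained("google/deeplabv3_mobilenet_v2_1.0_513")

# A random tensor stands in for a preprocessed image
pixel_values = torch.randn(1, 3, 513, 513)
with torch.no_grad():
    # logits have shape (batch, num_labels, height, width), downsampled relative to the input
    logits = model(pixel_values).logits

pred_mask = logits.argmax(dim=1)  # per-pixel class indices
```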
- Similar to V1, the original TensorFlow checkpoints use dynamic padding, where the padding amount is determined at inference time because it depends on the input image size. Set `tf_padding=True` in [`MobileNetV2Config`] to match this TensorFlow behavior, or set `tf_padding=False` to use the native PyTorch padding behavior.
```python
from transformers import MobileNetV2Config

# Example: load a config with dynamic (TensorFlow-style) padding enabled
config = MobileNetV2Config(tf_padding=True)
```
- The Transformers implementation does not support the following features.
    - Uses global average pooling instead of the optional 7x7 average pooling with stride 2. For larger inputs, this gives a pooled output that is larger than a 1x1 pixel.
    - `output_hidden_states=True` returns *all* intermediate hidden states. It is not possible to extract the output from specific layers (for example, expansion layers 10 and 13) for other downstream purposes; see the sketch after this list for indexing into the returned states.
    - Does not include the quantized models from the original checkpoints because they include "FakeQuantization" operations to unquantize the weights.
    - For segmentation models, the final convolution layer of the backbone is computed even though the DeepLabV3+ head doesn't use it.
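
As referenced in the list above, here is a minimal sketch of the `output_hidden_states=True` workaround. The checkpoint name and the layer indices are illustrative.

```python
import torch
from transformers import MobileNetV2Model

model = MobileNetV2Model.from_pretrained("google/mobilenet_v2_1.0_224")

# A random tensor stands in for a preprocessed image
pixel_values = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    outputs = model(pixel_values, output_hidden_states=True)

# Every intermediate feature map is returned; index into the tuple to
# recover specific layers (indices here are illustrative)
hidden_states = outputs.hidden_states
features_10, features_13 = hidden_states[10], hidden_states[13]
print(len(hidden_states), features_10.shape, features_13.shape)
```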