Commit 9fd034d

yuanjua and stevhliu authored
Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
1 parent e812935 commit 9fd034d

File tree

1 file changed: +8 -10 lines changed

docs/source/en/model_doc/mobilenet_v2.md

Lines changed: 8 additions & 10 deletions
@@ -81,25 +81,23 @@ print(f"The predicted class label is: {predicted_class_label}")
 </hfoption>
 </hfoptions>
 
-<!-- Quantization - Not applicable -->
-<!-- Attention Visualization - Not applicable for this model type -->
 
 ## Notes
 
-- **Checkpoint Naming:** Classification checkpoints often follow `mobilenet_v2_{depth_multiplier}_{resolution}`, like `mobilenet_v2_1.4_224`. Segmentation checkpoints (using DeepLabV3+ head) might have names like `deeplabv3_mobilenet_v2_{depth_multiplier}_{resolution}`.
-- **Variable Input Size:** Like V1, the model works with images of different sizes (minimum 32x32), handled by [`MobileNetV2ImageProcessor`].
-- **1001 Classes (Classification):** ImageNet-1k pretrained classification models output 1001 classes (index 0 is background).
+- Classification checkpoint names follow the pattern `mobilenet_v2_{depth_multiplier}_{resolution}`, like `mobilenet_v2_1.4_224`. `1.4` is the depth multiplier and `224` is the image resolution. Segmentation checkpoint names follow the pattern `deeplabv3_mobilenet_v2_{depth_multiplier}_{resolution}`.
+- While trained on images of a specific size, the model architecture works with images of different sizes (minimum 32x32). The [`MobileNetV2ImageProcessor`] handles the necessary preprocessing.
+- MobileNet is pretrained on [ImageNet-1k](https://huggingface.co/datasets/imagenet-1k), a dataset with 1000 classes. However, the model actually predicts 1001 classes. The additional class is an extra "background" class (index 0).
 - The segmentation models use a [DeepLabV3+](https://huggingface.co/papers/1802.02611) head which is often pretrained on datasets like [PASCAL VOC](https://huggingface.co/datasets/merve/pascal-voc).
-- **Padding Differences:** Similar to V1, original TensorFlow checkpoints had dynamic padding. The HF PyTorch implementation uses static padding by default. Enable dynamic padding (TF behavior) via `tf_padding=True` in [`MobileNetV2Config`].
+- The original TensorFlow checkpoints determine the padding amount at inference because it depends on the input image size. To use the native PyTorch padding behavior, set `tf_padding=False` in [`MobileNetV2Config`].
 ```python
 from transformers import MobileNetV2Config
 
-# Example: Load config with dynamic padding enabled
 config = MobileNetV2Config.from_pretrained("google/mobilenet_v2_1.4_224", tf_padding=True)
 ```
-- **Unsupported Features:**
-    - The HF implementation uses global average pooling, not the optional fixed 7x7 average pooling from the original paper.
-    - Extracting specific intermediate hidden states (e.g., from expansion layers 10/13) requires `output_hidden_states=True` (returning all states).
+- The Transformers implementation does not support the following features.
+    - Uses global average pooling instead of the optional 7x7 average pooling with stride 2. For larger inputs, this gives a pooled output that is larger than a 1x1 pixel.
+    - `output_hidden_states=True` returns *all* intermediate hidden states. It is not possible to extract the output from specific layers for other downstream purposes.
+    - Does not include the quantized models from the original checkpoints because they include "FakeQuantization" operations to unquantize the weights.
 - For segmentation models, the final convolution layer of the backbone is computed even though the DeepLabV3+ head doesn't use it.
 
 ## MobileNetV2Config
