
added mllama doc #37647


Closed · wants to merge 17 commits

Conversation


@Nikil-D-Gr8 Nikil-D-Gr8 commented Apr 21, 2025

What does this PR do?
As suggested in this issue #issue-2947704577, this PR updates the documentation of the Mllama model so that it is aligned with the standardized format for all the docs.
I worked on mllama and used AI for parts of it, so please let me know even if you need a complete rewrite.

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).

Please let me know if there are any changes to be made, and do share references for those changes if you have any.
Documentation: @stevhliu


@github-actions github-actions bot marked this pull request as draft April 21, 2025 04:59
Contributor

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.


@stevhliu stevhliu left a comment


Awesome, thanks for your contribution!

Let's also add an example with the AttentionMaskVisualizer
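For reference, a rough sketch of what that could look like, assuming the `AttentionMaskVisualizer` utility at `transformers.utils.attention_visualizer` that the other standardized model docs use:

```python
from transformers.utils.attention_visualizer import AttentionMaskVisualizer

# Show how the attention mask treats the <|image|> placeholder in a prompt.
visualizer = AttentionMaskVisualizer("meta-llama/Llama-3.2-11B-Vision-Instruct")
visualizer("<|image|> What is shown in this image?")
```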


Mllama has an extra token used as a placeholder for image positions in the text. This means the input ids and the input embedding layer have one extra token. However, since the weights for the input and output embeddings are not tied, the `lm_head` layer has one fewer token and will fail if you want to calculate loss on image tokens or apply some logit processors. If you are training, make sure to mask out the special `"<|image|>"` tokens in the `labels`, as the model should not be trained on predicting them.
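A minimal sketch of that masking step (not from this PR), assuming the checkpoint used elsewhere in this thread and that the tokenizer maps `"<|image|>"` to a single special token id:

```python
from transformers import AutoProcessor

# Replace every <|image|> placeholder id in the labels with -100, the ignore
# index of PyTorch's cross-entropy loss, so no loss is computed on image tokens.
processor = AutoProcessor.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")
image_token_id = processor.tokenizer.convert_tokens_to_ids("<|image|>")

input_ids = processor.tokenizer("<|image|>Describe the image.", return_tensors="pt").input_ids
labels = input_ids.clone()
labels[labels == image_token_id] = -100
```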
```python
from transformers import pipeline
```
Member

Let's use a real image for the example here:

```python
import torch
from transformers import pipeline

pipeline = pipeline(
    task="image-text-to-text",
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    device=0,
    torch_dtype=torch.bfloat16
)
messages = [
    [
        {
            "role": "user", 
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"},
                {"type": "text", "text": "What does the image show?"}
            ]
        }
    ],
]
pipeline(text=messages, return_full_text=False)
```

```python
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor
from transformers import AutoModelForCausalLM
```
Member

Use the BitsAndBytesConfig

```python
import torch
from transformers import BitsAndBytesConfig, MllamaForConditionalGeneration, AutoProcessor

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct",
    device_map="auto", 
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
    quantization_config=bnb_config
)
processor = AutoProcessor.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")

messages = [
    [
        {
            "role": "user", 
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"},
                {"type": "text", "text": "What does the image show?"}
            ]
        }
    ],
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to("cuda")
output = model.generate(**inputs, max_new_tokens=25)
print(processor.decode(output[0]))
```

- When training, mask out the `<|image|>` tokens in labels
- For CUDA index errors during generation, expand the `lm_head`:

```python
Member

Indent this code block so it falls under the last list item
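For context, a rough sketch of one way to expand the `lm_head` by one token; this is not necessarily the exact snippet in the PR and assumes `PreTrainedModel`'s `get_output_embeddings`/`set_output_embeddings` methods and its private `_get_resized_lm_head` helper:

```python
import torch
from transformers import MllamaForConditionalGeneration

model = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct", torch_dtype=torch.bfloat16
)

# Resize the output head to one extra row so the <|image|> token id has a logit.
old_lm_head = model.get_output_embeddings()
new_num_tokens = old_lm_head.weight.shape[0] + 1
resized_lm_head = model._get_resized_lm_head(old_lm_head, new_num_tokens=new_num_tokens)
resized_lm_head.requires_grad_(old_lm_head.weight.requires_grad)
model.set_output_embeddings(resized_lm_head)
```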

Member

Remember to indent here!

Member

Still not indented

Comment on lines 133 to 141
## MllamaForCausalLM

[[autodoc]] MllamaForCausalLM
- forward

## MllamaVisionModel

[[autodoc]] MllamaVisionModel
- forward
Member

Add these docstrings back!

Nikil-D-Gr8 and others added 8 commits April 22, 2025 11:00
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
@Nikil-D-Gr8
Author

@stevhliu Thanks a lot for the help, mate. I used AI a bit here and there thinking it could do a better job, but it looks like it just gave you more work. I'm new to all this and will keep it organic from now on. Thanks, and let me know if I can make any more edits.

@Nikil-D-Gr8 Nikil-D-Gr8 marked this pull request as ready for review April 22, 2025 06:23

## Usage Example
For quantized inference, use `BitsAndBytesConfig`:
Member

Suggested change
For quantized inference, use `BitsAndBytesConfig`:
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.
The example below uses [bitsandbytes](../quantization/bitsandbytes) to quantize the weights to 4-bits.

Nikil-D-Gr8 and others added 5 commits April 23, 2025 11:12
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
```python
model_id = "meta-llama/Llama-3.2-11B-Vision"
model = MllamaForConditionalGeneration.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_id)
```
<div class="flex justify-center">
Member

We can remove this image and replace it with the quantization example in this comment.

```python
import torch
from transformers import BitsAndBytesConfig, MllamaForConditionalGeneration, AutoProcessor

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct",
    device_map="auto", 
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
    quantization_config=bnb_config
)
processor = AutoProcessor.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")

messages = [
    [
        {
            "role": "user", 
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"},
                {"type": "text", "text": "What does the image show?"}
            ]
        }
    ],
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to("cuda")
output = model.generate(**inputs, max_new_tokens=25)
print(processor.decode(output[0]))
```

Member

Separate from the AutoModel example and outside of the <hfoption> block, you should have a separate code example for quantization as shown in the code snippet above.

The image hasn't been removed yet

```python
import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor
from transformers import BitsAndBytesConfig, MllamaForConditionalGeneration, AutoProcessor
```
Member

Suggested change
from transformers import BitsAndBytesConfig, MllamaForConditionalGeneration, AutoProcessor
from transformers import MllamaForConditionalGeneration, AutoProcessor

Member

The AutoModel example shouldn't show quantization usage, so it was fine the way it was before. I was just removing BitsAndBytesConfig from the import

- When training, mask out the `<|image|>` tokens in labels
- For CUDA index errors during generation, expand the `lm_head`:

```python
Member

Remember to indent here!

Nikil-D-Gr8 and others added 2 commits April 23, 2025 23:24
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
@Nikil-D-Gr8
Author

Nikil-D-Gr8 commented Apr 23, 2025

is it fine now?
@stevhliu


Comment on lines +164 to +168





Member

No need to add all these extra lines at the end either
