added mllama doc #37647
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the "Ready for review" button.
Awesome, thanks for your contribution!
Let's also add an example with the `AttentionMaskVisualizer`.
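A hedged sketch of what that example might look like, assuming the `AttentionMaskVisualizer` utility shipped with recent transformers releases (the exact import path and the prompt below are illustrative):

```python
# Sketch only: visualize how the special <|image|> placeholder token fits into
# Mllama's attention mask for a given prompt.
from transformers.utils.attention_visualizer import AttentionMaskVisualizer

visualizer = AttentionMaskVisualizer("meta-llama/Llama-3.2-11B-Vision-Instruct")
visualizer("<|image|> What is shown in this image?")
```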
Mllama has an extra token used as a placeholder for image positions in the text, which means the input ids and the input embedding layer carry one extra token. Because the input and output embedding weights are not tied, the `lm_head` layer has one token fewer and will fail if you try to compute loss on image tokens or apply certain logit processors. If you are training, make sure to mask out the special `"<|image|>"` tokens in `labels`, since the model should not be trained to predict them.
```python
from transformers import pipeline
```
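The tip quoted above says to mask out the `<|image|>` token in `labels`. As a minimal hedged sketch (not code from this PR), assuming `processor` and `inputs` come from an `apply_chat_template` call like the ones further below:

```python
# Sketch: build labels that ignore the <|image|> placeholder (and padding) so
# no loss is computed on positions the lm_head cannot predict.
image_token_id = processor.tokenizer.convert_tokens_to_ids("<|image|>")

labels = inputs["input_ids"].clone()
labels[labels == image_token_id] = -100          # -100 is ignored by the loss
if processor.tokenizer.pad_token_id is not None:
    labels[labels == processor.tokenizer.pad_token_id] = -100
inputs["labels"] = labels
```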
Let's use a real image for the example here:
```python
import torch
from transformers import pipeline

pipeline = pipeline(
    task="image-text-to-text",
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    device=0,
    torch_dtype=torch.bfloat16
)

messages = [
    [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"},
                {"type": "text", "text": "What does the image show?"}
            ]
        }
    ],
]

pipeline(text=messages, return_full_text=False)
```
docs/source/en/model_doc/mllama.md (Outdated)
```python
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor
from transformers import AutoModelForCausalLM
```
Use the `BitsAndBytesConfig`:
```python
import torch
from transformers import BitsAndBytesConfig, MllamaForConditionalGeneration, AutoProcessor

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
    quantization_config=bnb_config
)
processor = AutoProcessor.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")

messages = [
    [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"},
                {"type": "text", "text": "What does the image show?"}
            ]
        }
    ],
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to("cuda")

output = model.generate(**inputs, max_new_tokens=25)
print(processor.decode(output[0]))
```
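As a quick, optional sanity check (not part of the suggestion above), you can confirm that 4-bit loading actually shrank the model; `get_memory_footprint` is a standard `PreTrainedModel` method:

```python
# Rough check that quantization reduced the model's memory usage.
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```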
- When training, mask out the `<|image|>` tokens in labels
- For CUDA index errors during generation, expand the `lm_head`:

```python
```
Indent this code block so it falls under the last list item
Remember to indent here!
Still not indented
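The code block in the hunk above is truncated in this view. A hedged sketch of one way to expand the `lm_head` by a row for the extra `<|image|>` token id, using only the public `get_output_embeddings`/`set_output_embeddings` accessors (details may vary across transformers versions):

```python
# Sketch only: grow lm_head so its output dimension also covers the <|image|> id.
import torch

old_lm_head = model.get_output_embeddings()        # nn.Linear(hidden_size, vocab_size)
new_lm_head = torch.nn.Linear(
    old_lm_head.in_features,
    old_lm_head.out_features + 1,
    bias=old_lm_head.bias is not None,
    device=old_lm_head.weight.device,
    dtype=old_lm_head.weight.dtype,
)
with torch.no_grad():
    new_lm_head.weight[:-1] = old_lm_head.weight
    new_lm_head.weight[-1] = old_lm_head.weight.mean(dim=0)   # init the new row
    if old_lm_head.bias is not None:
        new_lm_head.bias[:-1] = old_lm_head.bias
        new_lm_head.bias[-1] = 0.0
model.set_output_embeddings(new_lm_head)
```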
docs/source/en/model_doc/mllama.md (Outdated)
## MllamaForCausalLM

[[autodoc]] MllamaForCausalLM
- forward

## MllamaVisionModel

[[autodoc]] MllamaVisionModel
- forward
Add these docstrings back!
@stevhliu Thanks a lot for the help, mate. I used AI a bit here and there thinking it could do a better job, but it looks like it just gave you more work. I'm new to all this and will keep it organic from now on. Thanks, and let me know if I can make any more edits.
docs/source/en/model_doc/mllama.md (Outdated)
## Usage Example
For quantized inference, use `BitsAndBytesConfig`:
Suggested change:
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.
The example below uses [bitsandbytes](../quantization/bitsandbytes) to quantize the weights to 4-bits.
```
model_id = "meta-llama/Llama-3.2-11B-Vision"
model = MllamaForConditionalGeneration.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_id)
<div class="flex justify-center">
```
We can remove this image and replace it with the quantization example shown above (the `BitsAndBytesConfig` snippet).
Separate from the `AutoModel` example and outside of the `<hfoption>` block, you should have a separate code example for quantization, as shown in the code snippet above.
The image hasn't been removed yet.
```python
import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor
from transformers import BitsAndBytesConfig, MllamaForConditionalGeneration, AutoProcessor
```
Suggested change: replace `from transformers import BitsAndBytesConfig, MllamaForConditionalGeneration, AutoProcessor` with `from transformers import MllamaForConditionalGeneration, AutoProcessor`.
The `AutoModel` example shouldn't show quantization usage, so it was fine the way it was before. I was just removing `BitsAndBytesConfig` from the import.
Is it fine now?
No need to add all these extra lines at the end either
What does this PR do?
As suggested in issue #issue-2947704577, this PR updates the documentation of the Mllama model so that it is aligned with the standardized format for all the docs.
I worked on mllama and used AI, so please let me know even if you need a complete rewrite.
Please let me know if there are any changes to be made, and share references for those changes if you have any.
Documentation: @stevhliu