Description
Issue: Running inference with a fine-tuned LLaVA model fails with AttributeError: 'NoneType' object has no attribute 'shape', raised from llava_arch.py (line 150).
Command:
!pip install --upgrade git+https://github.com/huggingface/transformers.git
!pip install accelerate bitsandbytes scipy gradio sentencepiece einops
!pip install --upgrade git+https://github.com/haotian-liu/LLaVA.git@v1.6.0
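For reference, a quick import check after the installs (a minimal sketch; the printed version is whatever the commands above resolve to):
import transformers
import llava  # the package installed from the LLaVA repo
print("transformers:", transformers.__version__)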
Inference script:
import torch
from PIL import Image
from transformers import AutoTokenizer, CLIPImageProcessor
from llava.model.language_model.llava_llama import LlavaLlamaForCausalLM
from llava.conversation import conv_templates
from llava.mm_utils import process_images, tokenizer_image_token
from google.colab import drive
import os
from huggingface_hub import login
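# Mount Drive so the image path below resolves (assumes the standard Colab mount point)
drive.mount("/content/drive")
# login()  # only needed if the Hub repo is private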
# =======================
# ✅ STEP 3: Set model and image path
# =======================
# For an HF Hub model:
model_path = "samundiswary/AgrifinanceLORA"
# For a local path (e.g., on RunPod) instead of the HF Hub:
# model_path = "/workspace/Llava_finetune/lora_output"
image_path = "/content/drive/MyDrive/Multimodal_Dataset/images/tables/2021/table_2021_121.png"
question = "What is shown in table?"
# =======================
# ✅ STEP 4: Load tokenizer, processor, model
# =======================
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
model = LlavaLlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# =======================
# ✅ STEP 5: Load and move vision tower to CUDA
# =======================
vision_tower = model.get_vision_tower()
if hasattr(vision_tower, "load_model"):
    vision_tower.load_model()
vision_tower.to(device=device, dtype=torch.float16)  # keep the tower's dtype in line with the fp16 weights
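# Optional debug print to confirm where the tower actually landed
print("vision tower:", next(vision_tower.parameters()).device, next(vision_tower.parameters()).dtype)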
# =======================
# ✅ STEP 6: Preprocess image
# =======================
raw_image = Image.open(image_path).convert("RGB")
# Process image to get the tensor ready for the vision tower
image_tensor = process_images([raw_image], image_processor, model.config)[0]
image_tensor = image_tensor.to(device=device, dtype=torch.float16)
if image_tensor.dim() == 3:
    image_tensor = image_tensor.unsqueeze(0)  # add a batch dimension
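# After the unsqueeze the tensor should be batched: (1, 3, H, W)
assert image_tensor.dim() == 4, f"unexpected image_tensor shape: {image_tensor.shape}"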
# =======================
# ✅ STEP 7: Prepare prompt
# =======================
conv = conv_templates["llava_v1"].copy()
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()
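# Note: the question is passed as-is here; LLaVA's own example scripts usually
# prepend DEFAULT_IMAGE_TOKEN ("<image>") from llava.constants to the first user
# turn so that tokenizer_image_token has a placeholder to expand.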
input_ids = tokenizer_image_token(prompt, tokenizer, return_tensors="pt").unsqueeze(0).to(device)
print("Prompt:", prompt)
print("Type of input_ids:", input_ids.dtype)
print("Shape of input_ids before generate:", input_ids.shape)
print("Shape of image_tensor:", image_tensor.shape)
print("Type of image_tensor:", image_tensor.dtype)
# =======================
# ✅ STEP 8: Inference
# =======================
with torch.no_grad():
    output_ids = model.generate(
        input_ids=input_ids,
        images=image_tensor,
        image_sizes=[image_tensor.shape[-2:]],  # Important fix
        max_new_tokens=256,
        do_sample=False,  # greedy decoding; temperature is ignored when do_sample=False
        temperature=0.7,
    )
decoded_output = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
print("\n🧾 Answer:", decoded_output[0])
Error: AttributeError: 'NoneType' object has no attribute 'shape', raised at llava_arch.py line 150 (inside prepare_inputs_labels_for_multimodal) during model.generate().
Log:
Prompt: A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER:
This graph shows the wholesale price of Rice. What is wholesale price ffor the month Jan in year 2022? ASSISTANT:
Type of input_ids: torch.int64
Shape of input_ids before generate: torch.Size([1, 74])
Shape of image_tensor: torch.Size([3, 224, 224])
Type of image_tensor: torch.float16
--- prepare_inputs_labels_for_multimodal called ---
Vision tower is None: False
Images is None: False
Images type: <class 'torch.Tensor'>
Images shape: torch.Size([3, 224, 224])