
"AttributeError: 'NoneType' object has no attribute 'shape' in llava_arch.py prepare_inputs_labels_for_multimodal" after fine-tuning #1889

Open
@samundiswary-cloud


Describe the issue

Issue:

Command:

!pip install --upgrade git+https://github.com/huggingface/transformers.git
!pip install accelerate bitsandbytes scipy gradio sentencepiece einops
!pip install --upgrade git+https://github.com/haotian-liu/LLaVA.git@v1.6.0
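
The first command installs transformers from its main branch. The LLaVA install that follows may then replace it with whatever version LLaVA's own package metadata requires, so the versions actually active in the runtime are worth recording in the report. Below is a minimal stdlib-only sketch that prints them. The distribution name `llava` for the LLaVA package is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # "llava" is assumed to be the distribution name the LLaVA repo installs under.
    names = ["torch", "transformers", "accelerate", "bitsandbytes", "llava"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this in the same Colab session, after the installs, shows which transformers build LLaVA is actually using.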
Inference script:
import torch
from PIL import Image
from transformers import AutoTokenizer, CLIPImageProcessor
from llava.model.language_model.llava_llama import LlavaLlamaForCausalLM
from llava.conversation import conv_templates
from llava.mm_utils import process_images, tokenizer_image_token
from google.colab import drive
import os
from huggingface_hub import login


# =======================
# ✅ STEP 3: Set model and image path
# =======================
# For HF hub model:
model_path = "samundiswary/AgrifinanceLORA"  # 👈 Replace this with your actual repo ID
# If you're using RunPod or local path:
# model_path = "/workspace/Llava_finetune/lora_output"  ← only if not using HF Hub
image_path = "/content/drive/MyDrive/Multimodal_Dataset/images/tables/2021/table_2021_121.png"
question = "What is shown in table?"

# =======================
# ✅ STEP 4: Load tokenizer, processor, model
# =======================
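tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
# NOTE: LLaVA-1.5-based models use the 336 px tower (openai/clip-vit-large-patch14-336);
# once the vision tower is loaded, vision_tower.image_processor is the matching processor.
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")

# If this repo holds only LoRA adapter weights, LLaVA's own loader for LoRA checkpoints is
# llava.model.builder.load_pretrained_model(model_path, model_base, model_name).
model = LlavaLlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto"
)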
model.eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# =======================
# ✅ STEP 5: Load and move vision tower to CUDA
# =======================
vision_tower = model.get_vision_tower()
if hasattr(vision_tower, "load_model"):
    vision_tower.load_model()
# Match the language model's fp16 weights, as LLaVA's own loader does
vision_tower.to(device=device, dtype=torch.float16)

# =======================
# ✅ STEP 6: Preprocess image
# =======================
raw_image = Image.open(image_path).convert("RGB")
# Process image to get the tensor ready for the vision tower
image_tensor = process_images([raw_image], image_processor, model.config)[0]
image_tensor = image_tensor.to(device=device, dtype=torch.float16)
if image_tensor.dim() == 3:
    image_tensor = image_tensor.unsqueeze(0)


# =======================
# ✅ STEP 7: Prepare prompt
# =======================
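conv = conv_templates["llava_v1"].copy()
# LLaVA only injects image features where the prompt contains "<image>"
# (llava.constants.DEFAULT_IMAGE_TOKEN); its run_llava.py prepends it to the question.
conv.append_message(conv.roles[0], "<image>\n" + question)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

input_ids = tokenizer_image_token(prompt, tokenizer, return_tensors="pt").unsqueeze(0).to(device)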
print("Prompt:", prompt)
print("Type of input_ids:", input_ids.dtype)
print("Shape of input_ids before generate:", input_ids.shape)
print("Shape of image_tensor:", image_tensor.shape)
print("Type of image_tensor:", image_tensor.dtype)
# =======================
# ✅ STEP 8: Inference
# =======================
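with torch.no_grad():
    output_ids = model.generate(
        input_ids=input_ids,  # NOTE: LLaVA's generate() takes the prompt ids as its first parameter, `inputs`
        images=image_tensor,
        image_sizes=[raw_image.size],  # original (width, height), as LLaVA's run_llava.py passes
        max_new_tokens=256,
        do_sample=False,  # greedy decoding; temperature would be ignored, so it is not set
    )
decoded_output = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
print("\n🧾 Answer:", decoded_output[0].strip())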
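
LLaVA only splices image features into positions where the prompt contained its `<image>` placeholder (`DEFAULT_IMAGE_TOKEN` in `llava.constants`). `tokenizer_image_token` turns each occurrence into the special id `IMAGE_TOKEN_INDEX`, which is -200, and LLaVA's own run_llava.py prepends the placeholder to the question before building the conversation. The sketch below mimics that split-and-join step with a hypothetical whitespace tokenizer (`toy_tokenize`, a stand-in for the real SentencePiece tokenizer, not LLaVA code) and counts the placeholder ids.

```python
IMAGE_TOKEN_INDEX = -200  # value of llava.constants.IMAGE_TOKEN_INDEX
IMAGE_PLACEHOLDER = "<image>"  # value of llava.constants.DEFAULT_IMAGE_TOKEN


def toy_tokenize(text, vocab):
    """Hypothetical stand-in tokenizer: one positive id per whitespace-separated word."""
    return [vocab.setdefault(word, len(vocab) + 1) for word in text.split()]


def toy_tokenizer_image_token(prompt, vocab):
    """Simplified mimic of LLaVA's tokenizer_image_token: tokenize the text between
    placeholders and put IMAGE_TOKEN_INDEX where each placeholder was."""
    ids = []
    for i, chunk in enumerate(prompt.split(IMAGE_PLACEHOLDER)):
        if i > 0:
            ids.append(IMAGE_TOKEN_INDEX)
        ids.extend(toy_tokenize(chunk, vocab))
    return ids


def count_image_tokens(ids):
    """Number of image slots LLaVA would fill; 0 means the image is never used."""
    return sum(1 for t in ids if t == IMAGE_TOKEN_INDEX)


if __name__ == "__main__":
    vocab = {}
    with_image = toy_tokenizer_image_token("USER: <image>\nWhat is shown in table? ASSISTANT:", vocab)
    without_image = toy_tokenizer_image_token("USER: What is shown in table? ASSISTANT:", vocab)
    print(count_image_tokens(with_image), count_image_tokens(without_image))  # 1 0
```

The same count can be run on the real prompt with `count_image_tokens(input_ids[0].tolist())`; a result of 0 means the model answers from text alone.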


Error: AttributeError: 'NoneType' object has no attribute 'shape', raised at line 150 of llava_arch.py inside prepare_inputs_labels_for_multimodal.
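
One way to reach this exact error, assuming the installed LLaVA matches the upstream llava_llama.py: `LlavaLlamaForCausalLM.generate` names its first parameter `inputs` (not `input_ids`), and it is that argument which is forwarded to `prepare_inputs_labels_for_multimodal`, whose early check reads `input_ids.shape[1]`. Calling `model.generate(input_ids=input_ids, ...)` leaves `inputs` as None, so that check fails with the reported message; LLaVA's own run_llava.py passes the ids positionally. The stub functions below are hypothetical and only mirror that argument flow, without loading a model.

```python
from types import SimpleNamespace


def prepare_inputs_stub(input_ids, images):
    """Mirrors the early check in LLaVA's prepare_inputs_labels_for_multimodal."""
    vision_tower = object()  # stands in for a loaded vision tower
    if vision_tower is None or images is None or input_ids.shape[1] == 1:
        return "text-only path"
    return "multimodal path"


def generate_stub(inputs=None, images=None, image_sizes=None, **kwargs):
    """Mirrors the signature of LLaVA's generate(): the prompt ids arrive as `inputs`;
    anything else, including a stray `input_ids=` keyword, lands in **kwargs."""
    if images is not None:
        return prepare_inputs_stub(inputs, images)
    return "no images"


ids = SimpleNamespace(shape=(1, 74))  # stands in for a (1, 74) LongTensor
img = SimpleNamespace(shape=(1, 3, 224, 224))  # stands in for the image batch

print(generate_stub(ids, images=img))  # positional call -> multimodal path
try:
    generate_stub(input_ids=ids, images=img)  # keyword call, as in the script
except AttributeError as err:
    print("AttributeError:", err)  # 'NoneType' object has no attribute 'shape'
```

Under the same assumption, the corresponding real call would be `model.generate(input_ids, images=image_tensor, image_sizes=[raw_image.size], max_new_tokens=256, do_sample=False)`.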
Log:


Prompt: A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER:
This graph shows the wholesale price of Rice. What is wholesale price ffor the month Jan in year 2022? ASSISTANT:
Type of input_ids: torch.int64
Shape of input_ids before generate: torch.Size([1, 74])
Shape of image_tensor: torch.Size([3, 224, 224])
Type of image_tensor: torch.float16
--- prepare_inputs_labels_for_multimodal called ---
Vision tower is None: False
Images is None: False
Images type: <class 'torch.Tensor'>
Images shape: torch.Size([3, 224, 224])
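
The logged `Images shape: torch.Size([3, 224, 224])` is 3-D, whereas LLaVA's single-image path hands `images` straight to the CLIP vision tower, which works on a (batch, channels, height, width) batch. Also, 224 px is what `openai/clip-vit-large-patch14` produces, while LLaVA-1.5-based checkpoints are usually paired with the 336 px `clip-vit-large-patch14-336` tower. A small preflight check on the shapes before calling generate can catch both. The helper `preflight` and its `expected_hw` parameter are hypothetical, the batch-size check assumes one image per prompt, and the right side length depends on which vision tower the fine-tuned model was trained with.

```python
def preflight(input_ids_shape, image_shape, expected_hw=None):
    """Return a list of problems with the shapes about to go into generate().

    Shapes are plain tuples; torch.Size is a tuple subclass, so
    preflight(input_ids.shape, image_tensor.shape, (336, 336)) works directly.
    """
    problems = []
    if len(input_ids_shape) != 2:
        problems.append(f"input_ids should be (batch, seq_len), got {tuple(input_ids_shape)}")
    if len(image_shape) != 4:
        problems.append(f"images should be (batch, 3, H, W), got {tuple(image_shape)}")
    else:
        if image_shape[1] != 3:
            problems.append(f"expected 3 color channels, got {image_shape[1]}")
        if expected_hw is not None and tuple(image_shape[-2:]) != tuple(expected_hw):
            problems.append(f"image is {tuple(image_shape[-2:])}, vision tower expects {tuple(expected_hw)}")
    # Assumes one image per prompt, so both batch dimensions should agree
    if len(input_ids_shape) == 2 and len(image_shape) == 4 and input_ids_shape[0] != image_shape[0]:
        problems.append(f"batch sizes differ: {input_ids_shape[0]} vs {image_shape[0]}")
    return problems


if __name__ == "__main__":
    # Shapes taken from the log above
    print(preflight((1, 74), (3, 224, 224), expected_hw=(336, 336)))
    # ['images should be (batch, 3, H, W), got (3, 224, 224)']
    print(preflight((1, 74), (1, 3, 224, 224), expected_hw=(336, 336)))
    # ['image is (224, 224), vision tower expects (336, 336)']
```

An empty list means the shapes are at least consistent with a single-image LLaVA call; it does not check dtypes or devices.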
Screenshots:

[Screenshot attached]
