Description
Issue: Running inference with a fine-tuned LLaVA model fails with AttributeError: 'NoneType' object has no attribute 'shape', raised from llava_arch.py (line 150).
Command:
!pip install --upgrade git+https://github.com/huggingface/transformers.git
!pip install accelerate bitsandbytes scipy gradio sentencepiece einops
!pip install --upgrade git+https://github.com/haotian-liu/LLaVA.git@v1.6.0
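For reference, a quick import check after the installs (a minimal sketch; the printed version is whatever the commands above resolve to):
import transformers
import llava  # the package installed from the LLaVA repo
print("transformers:", transformers.__version__)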
Inference script:
import torch
from PIL import Image
from transformers import AutoTokenizer, CLIPImageProcessor
from llava.model.language_model.llava_llama import LlavaLlamaForCausalLM
from llava.conversation import conv_templates
from llava.mm_utils import process_images, tokenizer_image_token
from google.colab import drive
import os
from huggingface_hub import login
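# Mount Drive so the image path below resolves (assumes the standard Colab mount point)
drive.mount("/content/drive")
# login()  # only needed if the Hub repo is private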
# =======================
# ✅ STEP 3: Set model and image path
# =======================
# For an HF Hub model:
model_path = "samundiswary/AgrifinanceLORA"
# For a local path (e.g., on RunPod) instead of the HF Hub:
# model_path = "/workspace/Llava_finetune/lora_output"
image_path = "/content/drive/MyDrive/Multimodal_Dataset/images/tables/2021/table_2021_121.png"
question = "What is shown in table?"
# =======================
# ✅ STEP 4: Load tokenizer, processor, model
# =======================
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
model = LlavaLlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# =======================
# ✅ STEP 5: Load and move vision tower to CUDA
# =======================
vision_tower = model.get_vision_tower()
if hasattr(vision_tower, "load_model"):
    vision_tower.load_model()
vision_tower.to(device=device, dtype=torch.float16)  # keep the tower's dtype in line with the fp16 weights
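# Optional debug print to confirm where the tower actually landed
print("vision tower:", next(vision_tower.parameters()).device, next(vision_tower.parameters()).dtype)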
# =======================
# ✅ STEP 6: Preprocess image
# =======================
raw_image = Image.open(image_path).convert("RGB")
# Process image to get the tensor ready for the vision tower
image_tensor = process_images([raw_image], image_processor, model.config)[0]
image_tensor = image_tensor.to(device=device, dtype=torch.float16)
if image_tensor.dim() == 3:
    image_tensor = image_tensor.unsqueeze(0)  # add a batch dimension
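# After the unsqueeze the tensor should be batched: (1, 3, H, W)
assert image_tensor.dim() == 4, f"unexpected image_tensor shape: {image_tensor.shape}"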
# =======================
# ✅ STEP 7: Prepare prompt
# =======================
conv = conv_templates["llava_v1"].copy()
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()
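# Note: the question is passed as-is here; LLaVA's own example scripts usually
# prepend DEFAULT_IMAGE_TOKEN ("<image>") from llava.constants to the first user
# turn so that tokenizer_image_token has a placeholder to expand.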
input_ids = tokenizer_image_token(prompt, tokenizer, return_tensors="pt").unsqueeze(0).to(device)
print("Prompt:", prompt)
print("Type of input_ids:", input_ids.dtype)
print("Shape of input_ids before generate:", input_ids.shape)
print("Shape of image_tensor:", image_tensor.shape)
print("Type of image_tensor:", image_tensor.dtype)
# =======================
# ✅ STEP 8: Inference
# =======================
with torch.no_grad():
    output_ids = model.generate(
        input_ids=input_ids,
        images=image_tensor,
        image_sizes=[image_tensor.shape[-2:]],  # Important fix
        max_new_tokens=256,
        do_sample=False,  # greedy decoding; temperature is ignored when do_sample=False
        temperature=0.7,
    )
decoded_output = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
print("\n🧾 Answer:", decoded_output[0])
Error: AttributeError: 'NoneType' object has no attribute 'shape', raised at llava_arch.py line 150 (inside prepare_inputs_labels_for_multimodal) during model.generate().
Log:
Prompt: A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER:
This graph shows the wholesale price of Rice. What is wholesale price ffor the month Jan in year 2022? ASSISTANT:
Type of input_ids: torch.int64
Shape of input_ids before generate: torch.Size([1, 74])
Shape of image_tensor: torch.Size([3, 224, 224])
Type of image_tensor: torch.float16
--- prepare_inputs_labels_for_multimodal called ---
Vision tower is None: False
Images is None: False
Images type: <class 'torch.Tensor'>
Images shape: torch.Size([3, 224, 224])