
Processors do not pass on return_tensors to tokenizers properly. #38341

Closed
@shuheng-liu

Description


System Info

  • transformers version: 4.52.3
  • Platform: macOS-15.4.1-arm64-arm-64bit-Mach-O
  • Python version: 3.13.0
  • Huggingface_hub version: 0.32.0
  • Safetensors version: 0.5.3
  • Accelerate version: not installed
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (GPU?): 2.7.0 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:

Who can help?

@ArthurZucker @zucchini-nlp

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Install PyTorch, Pillow, and transformers==4.52.3 using pip.
  2. Execute the following script:
import torch
from transformers import AutoProcessor
processor = AutoProcessor.from_pretrained("google/paligemma-3b-pt-224")
batch_features = processor(
    text="<image> What's in this image?",
    images=torch.zeros(3, 224, 224),
    suffix="Nothing",
    return_tensors="pt"
)

This raises an AttributeError with transformers==4.52.3:

  File "/private/tmp/venv/lib/python3.13/site-packages/transformers/models/paligemma/processing_paligemma.py", line 313, in __call__
    labels = inputs["input_ids"].masked_fill(inputs["token_type_ids"] == 0, -100)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'masked_fill'

Expected behavior

The batch_features should be created without error.

There seems to be a recent bug in the __call__ method of several processors, including, e.g., PaliGemmaProcessor.
It is likely caused by

return_tensors = output_kwargs["text_kwargs"].pop("return_tensors", None)
inputs = self.tokenizer(
    input_strings,
    text_pair=suffix,
    return_token_type_ids=return_token_type_ids,
    **output_kwargs["text_kwargs"],
)

which was changed in commit 32eca71

I believe the intention was to call .get() instead of .pop() on text_kwargs on line 301. Calling .pop() modifies text_kwargs in place, so the subsequent self.tokenizer(...) call no longer receives return_tensors and therefore returns inputs["input_ids"] as a plain Python list instead of a PyTorch tensor. The masked_fill call below then fails, since lists have no such method.

if return_token_type_ids:
    labels = inputs["input_ids"].masked_fill(inputs["token_type_ids"] == 0, -100)
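The in-place mutation can be shown with a minimal, self-contained sketch. The dict contents below are hypothetical stand-ins for output_kwargs["text_kwargs"]; only the difference between .pop() and .get() matters here:

```python
# Hypothetical stand-in for output_kwargs["text_kwargs"] inside the processor.
text_kwargs = {"padding": True, "return_tensors": "pt"}

# What the processor currently does: .pop() returns the value AND removes the key.
return_tensors = text_kwargs.pop("return_tensors", None)
print(return_tensors)              # "pt"
print("return_tensors" in text_kwargs)  # False

# Any later call of the form self.tokenizer(..., **text_kwargs) therefore
# never sees return_tensors, and the tokenizer falls back to returning
# plain Python lists instead of torch tensors.

# The suggested fix: .get() reads the value without mutating the dict.
text_kwargs = {"padding": True, "return_tensors": "pt"}
return_tensors = text_kwargs.get("return_tensors", None)
print(return_tensors)              # "pt"
print("return_tensors" in text_kwargs)  # True
```

With .get(), the tokenizer still receives return_tensors="pt" and produces tensors, so the masked_fill call succeeds.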
