
Hacky tensor parallel for Vision Models #791


Draft · Ph0rk0z wants to merge 3 commits into base: dev

Conversation

@Ph0rk0z commented May 25, 2025

I was frustrated with TP not working on Pixtral-Large (it's slower than Qwen 235B) and messed around a little bit. The model still sees images and generates text. No obvious side effects were observed, but I will test more, along with other architectures. Torch asserts because tensors are in inference mode, and this bypasses the check with a regular copy. Somehow it's also faster, by about 0.10 t/s.
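For anyone curious what that assert looks like, here's a minimal standalone PyTorch sketch of the problem and the regular-copy workaround (tensor names are illustrative, not the actual lines changed in this PR):

```python
import torch

# Tensors allocated under torch.inference_mode() become "inference tensors".
with torch.inference_mode():
    dst = torch.zeros(4, 8)

src = torch.randn(4, 8)

# Outside inference mode, an in-place write into an inference tensor asserts:
# "Inplace update to inference tensor outside InferenceMode is not allowed".
try:
    dst.copy_(src)
except RuntimeError as err:
    print(f"torch asserts: {err}")

# The "regular copy" workaround: materialize a fresh, non-inference tensor
# instead of writing in place. clone() is out-of-place, so it sidesteps the check.
dst = src.clone()
```

The clone allocates a new buffer each call instead of reusing one, which is presumably why it's "hacky", though per the numbers above it doesn't seem to hurt speed.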

I will also try it on Qwen-VL at some point. Mainly this is here for anyone else who likes to chat with memes but wants higher speeds. I still have to test long context too; I only did a handful of images. Maybe at 32k ctx it blows up or goes OOM. Everyone feel free to tell me why this is a horrible idea :P

update: I have used up to 20k context on Pixtral and have tested Qwen2-VL 72B; it's working as well. 1 MB images eat your context, who would have thought...

@Ph0rk0z changed the title from "Hacky tensor parallel for Pixtral Large" to "Hacky tensor parallel for Vision Models" on May 25, 2025