Hacky tensor parallel for Vision Models #791
Draft
I was frustrated with TP not working on pixtral-large (it's slower than qwen235b) and messed around a little bit. The model still sees images and generates text, and no obvious side effects were observed, but I will test more, along with other architectures. Torch asserts because stuff is in inference mode, and this bypasses the check with a regular copy. Somehow it's also faster, by about 0.10 t/s.
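For anyone curious what the inference-mode assert looks like: below is a minimal sketch, not the actual patch, of the failure and the "regular copy" workaround described above. The tensor names are made up for illustration; the assumption is that the assert comes from an in-place write into a tensor that was allocated under `torch.inference_mode()`.

```python
import torch

# Anything allocated under torch.inference_mode() is an "inference tensor"
# (hypothetical stand-in for a weight shard touched by the TP code path).
with torch.inference_mode():
    weight_shard = torch.randn(4, 4)

src = torch.randn(4, 4)

# An in-place copy into an inference tensor outside inference mode trips
# torch's check with a RuntimeError.
try:
    weight_shard.copy_(src)
except RuntimeError as e:
    print("torch asserts:", e)

# Workaround: make a regular (non-inference) copy outside inference mode
# and mutate that instead. clone() outside inference mode produces a
# normal tensor with a version counter, so the in-place op is allowed.
regular_copy = weight_shard.clone()
regular_copy.copy_(src)                # no assert
print(regular_copy.is_inference())     # False
```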
I will also try it on qwen-VL at some point. Mainly this is here for anyone else who likes to chat with memes but wants higher speeds. I still have to test long context too; I only did a handful of images, so maybe at 32k ctx it blows up or goes OOM. Everyone feel free to tell me why this is a horrible idea :P
Update: I have used up to 20k context on pixtral and have also tested qwen2 VL 72b; it's working as well. 1MB images eat your context, who would have thought...