
qwen2-vl

Here are 21 public repositories matching this topic...

Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion-model toolbox, built for high performance and flexibility.

  • Updated Jun 19, 2025
  • Python

A higher-performance OpenAI LLM service than vLLM serve: a pure C++, high-performance OpenAI-compatible LLM service implemented with GPRS + TensorRT-LLM + Tokenizers.cpp, supporting chat and function calls, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.

  • Updated May 14, 2025
  • Python
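Since the service above exposes an OpenAI-compatible API with function calling, a client request can be sketched as a standard chat-completion body following the OpenAI "tools" schema. This is a minimal illustration, not code from the repository: the model name, tool name, and parameters below are placeholders.

```python
import json

def build_chat_request(model, user_text, tools=None):
    """Assemble the JSON body for an OpenAI-style chat completion
    (sent as POST /v1/chat/completions on a compatible server)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    if tools:
        # Function calling follows the OpenAI "tools" schema; "auto"
        # lets the model decide whether to call a function.
        body["tools"] = tools
        body["tool_choice"] = "auto"
    return body

# Hypothetical function definition for illustration only.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request_body = build_chat_request("qwen2-vl-7b", "Weather in Paris?", [weather_tool])
print(json.dumps(request_body, indent=2))
```

The same body works against any server that implements the OpenAI chat-completions endpoint; only the base URL and model name change.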

RevealVLLMSafetyEval is a comprehensive pipeline for evaluating Vision-Language Models (VLMs) on their compliance with harm-related policies. It automates the creation of adversarial multi-turn datasets and the evaluation of model responses, supporting responsible AI development and red-teaming efforts.

  • Updated May 12, 2025
  • Python

Qwen2.5-VL-7B-Instruct is a multimodal model developed by Alibaba Cloud that understands both text and images. It is a Vision-Language Model (VLM) designed for a range of visual understanding tasks, including image understanding, video analysis, and multilingual support.

  • Updated Jun 10, 2025
  • Python
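When a VLM such as Qwen2.5-VL-7B-Instruct is served behind an OpenAI-compatible endpoint, a user turn mixes image and text parts in a single message. A minimal sketch of that message layout (the image URL and question are placeholders):

```python
def vision_message(image_url, question):
    """Build one user turn combining an image part and a text part,
    in the OpenAI-style multimodal content format."""
    return {
        "role": "user",
        "content": [
            # The image is referenced by URL; servers typically also
            # accept base64 data URLs in the same field.
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": question},
        ],
    }

msg = vision_message("https://example.com/cat.png", "What animal is shown?")
print(msg["content"][0]["type"], msg["content"][1]["text"])
```

A video-analysis request follows the same pattern with multiple frames passed as several image parts in the content list.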
