BoxMOT: pluggable SOTA tracking modules for segmentation, object detection and pose estimation models
Effortless data labeling with AI support from Segment Anything and other awesome models.
A Chinese version of CLIP that enables Chinese cross-modal retrieval and representation generation.
Unified embedding generation and search engine. Also available in the cloud at cloud.marqo.ai
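The core operation behind an embedding search engine like the one above is nearest-neighbor lookup by cosine similarity over vector representations. A minimal, dependency-free sketch (the toy index, vectors, and function names are illustrative assumptions, not Marqo's actual API):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product normalized by vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=2):
    # Rank every indexed item by similarity to the query and keep the top_k.
    ranked = sorted(index.items(),
                    key=lambda kv: cosine_similarity(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Toy 3-dimensional "embeddings"; real engines use model outputs
# (e.g. CLIP vectors) with hundreds of dimensions.
index = {
    "cat photo": [0.9, 0.1, 0.0],
    "dog photo": [0.8, 0.2, 0.1],
    "car photo": [0.0, 0.1, 0.9],
}

print(search([1.0, 0.0, 0.0], index, top_k=1))  # → ['cat photo']
```

Production systems replace the linear scan with an approximate nearest-neighbor index (e.g. HNSW) so lookups stay fast at millions of vectors.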
OpenMMLab Pre-training Toolbox and Benchmark
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
🥂 Gracefully face hCaptcha challenges with a multimodal large language model.
Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Official PyTorch implementation of "Text2LIVE: Text-Driven Layered Image and Video Editing" (ECCV 2022 Oral)
CLIP + FFT/DWT/RGB = text to image/video
[CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies
Paddle Multimodal Integration and eXploration: supports mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrained models and a diffusion model toolbox, with high performance and flexibility.
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Keras beit, caformer, CMT, CoAtNet, convnext, davit, dino, efficientdet, edgenext, efficientformer, efficientnet, eva, fasternet, fastervit, fastvit, flexivit, gcvit, ghostnet, gpvit, hornet, hiera, iformer, inceptionnext, lcnet, levit, maxvit, mobilevit, moganet, nat, nfnets, pvt, swin, tinynet, tinyvit, uniformer, volo, vanillanet, yolor, yolov7, yolov8, yolox, gpt2, llama2; alias kecam