BoxMOT: pluggable SOTA tracking modules for segmentation, object detection and pose estimation models
Effortless data labeling with AI support from Segment Anything and other awesome models.
A Chinese version of CLIP that enables Chinese cross-modal retrieval and representation generation.
Unified embedding generation and search engine. Also available in the cloud at cloud.marqo.ai
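The core operation behind an embedding search engine like the one above is nearest-neighbor lookup by cosine similarity over vector representations. A minimal, dependency-free sketch (the toy index, vectors, and function names are illustrative assumptions, not Marqo's actual API):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product normalized by vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=2):
    # Rank every indexed item by similarity to the query and keep the top_k.
    ranked = sorted(index.items(),
                    key=lambda kv: cosine_similarity(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Toy 3-dimensional "embeddings"; real engines use model outputs
# (e.g. CLIP vectors) with hundreds of dimensions.
index = {
    "cat photo": [0.9, 0.1, 0.0],
    "dog photo": [0.8, 0.2, 0.1],
    "car photo": [0.0, 0.1, 0.9],
}

print(search([1.0, 0.0, 0.0], index, top_k=1))  # → ['cat photo']
```

Production systems replace the linear scan with an approximate nearest-neighbor index (e.g. HNSW) so lookups stay fast at millions of vectors.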
OpenMMLab Pre-training Toolbox and Benchmark
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
🥂 Gracefully face hCaptcha challenges with a multimodal large language model.
Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Official PyTorch implementation of "Text2LIVE: Text-Driven Layered Image and Video Editing" (ECCV 2022 Oral)
CLIP + FFT/DWT/RGB = text to image/video
[CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies
Paddle Multimodal Integration and eXploration: supports mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrained models and a diffusion model toolbox, with high performance and flexibility.
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Keras beit, caformer, CMT, CoAtNet, convnext, davit, dino, efficientdet, edgenext, efficientformer, efficientnet, eva, fasternet, fastervit, fastvit, flexivit, gcvit, ghostnet, gpvit, hornet, hiera, iformer, inceptionnext, lcnet, levit, maxvit, mobilevit, moganet, nat, nfnets, pvt, swin, tinynet, tinyvit, uniformer, volo, vanillanet, yolor, yolov7, yolov8, yolox, gpt2, llama2; alias kecam