multimodal

Here are 35 public repositories matching this topic...

mediar-ai / screenpipe

AI app store powered by 24/7 desktop history. Open source | 100% local | dev-friendly | 24/7 screen and mic recording

machine-learning ai computer-vision ml agi vision agents multimodal llm

Updated Jun 6, 2025
TypeScript

enricoros / big-AGI

AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.

ui beam agi openai gpt mistral multimodal groq openai-api gpt-4 large-language-models stable-diffusion generative-ai chatgpt chatgpt-ui gpt-5 anthropic

Updated Jun 20, 2025
TypeScript

PySpur-Dev / pyspur

A visual playground for agentic workflows: Iterate over your agents 10x faster

Updated May 12, 2025
TypeScript

firebase / genkit

An open source framework for building AI-powered apps with familiar code-centric patterns. Genkit makes it easy to develop, integrate, and test AI features with observability and evaluations. Genkit works with various models and platforms.

machine-learning ai agents multimodal rag vector-database embedders llm genkit

Updated Jun 20, 2025
TypeScript

alan-ai / alan-sdk-ionic

The Self-Coding System for Your App — Alan AI SDK for Ionic

machine-learning text-to-speech sdk ionic chatbot voice voice-commands speech-recognition voice-control voice-assistant conversational-ai vui multimodal voice-interface voice-ai alan-studio alan-ionic-sdk

Updated Apr 22, 2025
TypeScript

llm-jp / awesome-japanese-llm

Japanese LLM roundup: an overview of Japanese LLMs

japanese generative-model japanese-language language-models language-model generative-models multimodal vision-and-language vision-language foundation-models large-language-models llm llms generative-ai large-language-model vision-language-model japanese-llm japanese-language-model llm-japanese

Updated Jun 21, 2025
TypeScript

xtreme1-io / xtreme1

Xtreme1 is an all-in-one data labeling and annotation platform for training on multimodal data, supporting 3D LiDAR point clouds, images, and LLM data.

computer-vision image-annotation annotation point-cloud image-classification annotation-tool 3d-annotation labeling-tool multimodal lidar-object-tracking image-labelling-tool lidar-object-detection lidar-camera-fusion lidar-annotation rlhf

Updated Feb 14, 2025
TypeScript

vignshwarar / AI-Employe

Create browser automations with GPT-4 Vision, as if you were teaching a human.

productivity automation rpa automation-testing multimodal gpt-4

Updated Feb 19, 2024
TypeScript

xlang-ai / OSWorld-G

Scaling Computer-Use Grounding via UI Decomposition and Synthesis

agent benchmark natural-language-processing gui models dataset vlm rpa multimodal large-action-model

Updated Jun 18, 2025
TypeScript

CohumanSpace / digimon-engine

Digimon Engine — Multi-Agent, Multi-Player Framework for AI-Native Games and Agentic Metaverse

open-source machine-learning multiplayer mcp artificial-intelligence agents swarm-intelligence multimodal metaverse-infrastructure agentic-framework ai-native-game mcp-server digimon-engine

Updated Apr 20, 2025
TypeScript

outspeed-ai / voice-devtools

Developer tools to debug and build realtime voice agents. Supports multiple models.

nodejs javascript open-source webrtc multimodal voice-ai

Updated May 26, 2025
TypeScript

tychenjiajun / art

AI-PP3 is a command-line tool that uses artificial intelligence to analyze RAW photos and generate optimized processing profiles (PP3 files) for RawTherapee.

cli ai computer-vision photography dng image-processing photo-editing cr2 nef raw batch-processing pp3 arw rawtherapee multimodal

Updated Jun 21, 2025
TypeScript

rohanprichard / fastrtc-demo

A simple proof of concept (POC) of FastRTC, a framework for using voice mode in Python.

voice-assistant conversational-ai voice-activity-detection multimodal fastapi huggingface generative-ai fastrtc

Updated Apr 7, 2025
TypeScript

ronantakizawa / osintimage.ai

AI-Powered OSINT Image Analysis

osint gemini multimodal

Updated Jan 18, 2024
TypeScript

sinhaGuild / storyboard-ai

An opinionated hybrid boilerplate with a Python backend and a React (TypeScript) frontend, dockerized for deployment. It uses language-model chaining to sequentially generate multimodal (image and text) content from micro prompts.

azure openai multimodal dalle

Updated May 2, 2023
TypeScript

jacobmarks / audio-retrieval-plugin

FiftyOne Plugin for searching images by audio clip using ImageBind and Qdrant

react javascript python machine-learning plugins mui replicate multimodal vector-search fiftyone qdrant imagebind

Updated Nov 1, 2023
TypeScript

weaviate-tutorials / next-multimodal-search-demo

A Weaviate multimodal search demo

search nextjs multimodal weaviate vector-database generative-ai imagebind weaviate-starter

Updated Apr 9, 2025
TypeScript

C-W-D-Harshit / lume-ai

AI-powered multimodal chat app with real-time responses, file support, token tracking, and dark mode. Built with Next.js. Open source under MIT.

typescript ai nextjs chatbot openai multimodal anthropic openrouter vercelaisdk

Updated Jan 23, 2025
TypeScript

iamsrikanthnani / gemini

Gemini is an open-source application powered by the Google Gemini Vision API. It lets users identify and learn about objects captured by their camera through a simple, interactive experience: just say "Hey Gemini" and show an object to the camera.

ai machine speech-synthesis gemini video-processing vision speech-recognition google-api multimodal generative-ai google-gemini gemini-ai

Updated Jan 3, 2024
TypeScript

jacobmarks / concept-interpolation

Interpolate between two text concepts using a CLIP model and FiftyOne Plugins!

react python plugins multimodal fiftyone

Updated Apr 4, 2024
TypeScript
