AI app store powered by 24/7 desktop history. Open source | 100% local | dev friendly | 24/7 screen and mic recording
Updated Jun 6, 2025 - TypeScript
AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.
An open source framework for building AI-powered apps with familiar code-centric patterns. Genkit makes it easy to develop, integrate, and test AI features with observability and evaluations. Genkit works with various models and platforms.
The Self-Coding System for Your App — Alan AI SDK for Ionic
日本語LLMまとめ - Overview of Japanese LLMs
Xtreme1 is an all-in-one data labeling and annotation platform for multimodal data training. It supports 3D LiDAR point cloud, image, and LLM data.
Create browser automation as if you were teaching a human using GPT-4 Vision.
Scaling Computer-Use Grounding via UI Decomposition and Synthesis
Digimon Engine — Multi-Agent, Multi-Player Framework for AI-Native Games and Agentic Metaverse
Developer tools to debug and build realtime voice agents. Supports multiple models.
AI-PP3 is a command-line tool that uses artificial intelligence to analyze RAW photos and generate optimized processing profiles (PP3 files) for RawTherapee.
A simple POC of FastRTC, a framework for using voice mode in Python!
An opinionated hybrid boilerplate with a Python backend and a React-TS frontend, Dockerized for deployment. Uses language model chaining to sequentially generate multimodal (image and text) content from micro prompts.
FiftyOne Plugin for searching images by audio clip using ImageBind and Qdrant
A Weaviate multimodal search demo
AI-powered multimodal chat app with real-time responses, file support, token tracking, and dark mode. Built with Next.js. Open source under MIT.
Gemini is an open-source application powered by the Google Gemini Vision API. It lets users identify and learn about objects captured by their camera through a simple, interactive experience. Just say 'Hey Gemini' and show an object to the camera.
Interpolate between two text concepts using a CLIP model and FiftyOne Plugins!