GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
High-speed Large Language Model Serving for Local Deployment
Connect home devices into a powerful cluster to accelerate LLM inference. More devices mean faster inference.
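The usual recipe behind such home clusters is model sharding: each device hosts a contiguous block of transformer layers sized to its capability, so every added device shrinks each node's share of the work. Below is a minimal sketch of proportional layer partitioning; the Device struct, the capability scores, and partition_layers are hypothetical illustrations of the idea, not the API of any project listed here.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Assign contiguous blocks of transformer layers to devices in proportion
// to a relative capability score. Illustrative only; real systems also
// weigh memory limits and interconnect bandwidth.
struct Device { std::string name; double capability; };

std::vector<int> partition_layers(const std::vector<Device>& devs, int n_layers) {
    double total = 0;
    for (const auto& d : devs) total += d.capability;
    std::vector<int> counts(devs.size());
    int assigned = 0;
    for (size_t i = 0; i < devs.size(); ++i) {
        counts[i] = (int)(n_layers * devs[i].capability / total);
        assigned += counts[i];
    }
    counts.back() += n_layers - assigned;  // give the rounding remainder to the last device
    return counts;
}

int main() {
    std::vector<Device> devs = {{"laptop", 1.0}, {"desktop-gpu", 3.0}, {"phone", 0.5}};
    auto counts = partition_layers(devs, 80);  // e.g. 80 layers of a 70B-scale model
    int first = 0;
    for (size_t i = 0; i < devs.size(); ++i) {
        std::cout << devs[i].name << ": layers " << first
                  << ".." << first + counts[i] - 1 << "\n";
        first += counts[i];
    }
}
```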
LLMs as Copilots for Theorem Proving in Lean
Framework for running AI locally on mobile devices and wearables. Hardware-aware C/C++ backend with wrappers for Flutter & React Native. Kotlin & Swift coming soon.
prima.cpp: Speeding up 70B-scale LLM inference on low-resource everyday home clusters
A highly optimized LLM inference acceleration engine for Llama and its variants.
Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)
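Chat programs like these are built around the same autoregressive loop: run the token sequence through the model, pick the next token from the logits, append it, and repeat until an end-of-sequence token appears. A minimal greedy-decoding sketch under that assumption; forward_logits, kVocabSize, and kEosToken are placeholders standing in for a real transformer forward pass and tokenizer, not any repository's API.

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

constexpr int kVocabSize = 8;
constexpr int kEosToken  = 0;

// Placeholder for a real transformer forward pass: returns logits over the
// vocabulary for the next token. Here it simply favors (last token + 1).
std::vector<float> forward_logits(const std::vector<int>& tokens) {
    std::vector<float> logits(kVocabSize, 0.0f);
    logits[(tokens.back() + 1) % kVocabSize] = 1.0f;
    return logits;
}

int main() {
    std::vector<int> tokens = {1};  // prompt tokens, as produced by a tokenizer
    for (int step = 0; step < 32; ++step) {
        std::vector<float> logits = forward_logits(tokens);
        // Greedy decoding: take the argmax of the logits.
        int next = (int)(std::max_element(logits.begin(), logits.end()) - logits.begin());
        if (next == kEosToken) break;
        tokens.push_back(next);
    }
    for (int t : tokens) std::cout << t << ' ';
    std::cout << '\n';
}
```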
DEEPPOWERS is a Fully Homomorphic Encryption (FHE) framework built for MCP (Model Context Protocol), aiming to provide end-to-end privacy protection and high-efficiency computation for the upstream and downstream MCP ecosystem.
A high-performance inference system for large language models, designed for production environments.
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
A hands-on project well suited to campus hiring (autumn and spring recruiting) and internships: build, from scratch, an LLM inference framework that supports LLama2/3 and Qwen2.5.
CPU inference for the DeepSeek family of large language models in C++
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
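Low-bit quantization schemes of this kind generally map floating-point weights to narrow integers plus a scale, trading a little accuracy for much smaller memory traffic. A minimal sketch of symmetric b-bit quantization under that assumption; quantize, dequantize, and the Quantized struct are illustrative names, not this library's interface.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <vector>

// Symmetric b-bit quantization: map floats to integers in [-(2^(b-1)-1), 2^(b-1)-1]
// with one per-tensor scale. A generic illustration, not this repository's API.
struct Quantized {
    std::vector<int32_t> q;  // quantized values
    float scale;             // dequantization scale
};

Quantized quantize(const std::vector<float>& w, int bits) {
    float max_abs = 0.0f;
    for (float x : w) max_abs = std::max(max_abs, std::fabs(x));
    int32_t qmax = (1 << (bits - 1)) - 1;           // e.g. 7 for 4-bit
    float scale = max_abs > 0 ? max_abs / qmax : 1.0f;
    Quantized out{{}, scale};
    out.q.reserve(w.size());
    for (float x : w)
        out.q.push_back(std::clamp((int32_t)std::lround(x / scale), -qmax, qmax));
    return out;
}

float dequantize(int32_t q, float scale) { return q * scale; }

int main() {
    std::vector<float> w = {0.12f, -0.9f, 0.33f, 0.71f};
    Quantized qw = quantize(w, 4);                  // 4-bit symmetric
    for (size_t i = 0; i < w.size(); ++i)
        std::cout << w[i] << " -> " << dequantize(qw.q[i], qw.scale) << "\n";
}
```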
Run generative AI models on Sophgo BM1684X/BM1688 processors
LLM in Godot
High-speed and easy-to-use LLM serving framework for local deployment