A powerful code indexing and semantic search tool with multi-platform support. Index your entire codebase and perform intelligent semantic searches powered by vector databases and AI embeddings.
In the AI-first development era, traditional keyword-based search is no longer sufficient for modern software development:
- AI-Powered IDEs like Cursor and GitHub Copilot are transforming development workflows
- Growing demand for intelligent code assistance and semantic understanding
- Modern codebases contain millions of lines across hundreds of files, making manual navigation inefficient
- Regex and keyword-based search miss contextual relationships
- Developers waste time navigating large codebases manually
- Knowledge transfer between team members is inefficient
- Traditional search tools can't bridge the gap between human intent and code implementation
CodeIndexer bridges the gap between human understanding and code discovery through:
- Semantic search with natural language queries like "find authentication functions"
- AI-powered understanding of code meaning and relationships
- Universal integration across multiple platforms and development environments
Find code by describing functionality, not just keywords. Discover existing solutions before writing duplicate code.
- Semantic Code Search: Ask questions like "find functions that handle user authentication" instead of guessing keywords
- Intelligent Indexing: Automatically index entire codebases and build semantic vector databases with contextual understanding
- Context-Aware Discovery: Find related code snippets based on meaning, not just text matching
- Developer Productivity: Significantly reduce time spent searching for relevant code and discovering existing solutions
- Embedding Providers: Support for OpenAI and VoyageAI as embedding providers
- Vector Storage: Integrated with Milvus/Zilliz Cloud for efficient storage and retrieval
- VSCode Integration: Built-in VSCode extension for seamless development workflow
- MCP Support: Model Context Protocol integration for AI agent interactions
- Progress Tracking: Real-time progress feedback during indexing operations
- Customizable: Configurable file extensions, ignore patterns, and chunk sizes
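To make "based on meaning, not just text matching" concrete, here is a minimal sketch of ranking snippets by cosine similarity between embedding vectors. The three-dimensional vectors and the `Snippet`, `cosineSimilarity`, and `rankBySimilarity` names are hypothetical stand-ins for illustration. In practice the vectors would come from an embedding provider, and the nearest-neighbor lookup would be handled by the vector database rather than in application code.

```typescript
// Toy illustration only: the vectors and names below are hypothetical.
interface Snippet {
  path: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the topK snippets whose vectors are most similar to the query vector.
function rankBySimilarity(query: number[], snippets: Snippet[], topK: number) {
  return snippets
    .map((s) => ({ path: s.path, score: cosineSimilarity(query, s.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const snippets: Snippet[] = [
  { path: "auth/login.ts", vector: [0.9, 0.1, 0.0] },
  { path: "db/pool.ts", vector: [0.0, 0.2, 0.9] },
  { path: "auth/token.ts", vector: [0.8, 0.3, 0.1] },
];

// Pretend [1, 0, 0] is the embedding of "find authentication functions".
const ranked = rankBySimilarity([1, 0, 0], snippets, 2);
```

Here `ranked` lists `auth/login.ts` first and `auth/token.ts` second, because their vectors point closest to the query direction.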
CodeIndexer is a monorepo containing three main packages:
- @code-indexer/core: Core indexing engine with embedding and vector database integration
- VSCode Extension: Semantic Code Search extension for Visual Studio Code
- @code-indexer/mcp: Model Context Protocol server for AI agent integration
- Embedding Providers: OpenAI, VoyageAI
- Vector Databases: Milvus (gRPC & RESTful API) or Zilliz Cloud (fully managed vector database as a service)
- Languages: TypeScript, JavaScript, Python, Java, C++, C#, Go, Rust, PHP, Ruby, Swift, Kotlin, Scala, Markdown
- Development Tools: VSCode, Model Context Protocol
- Node.js >= 20.0.0
- pnpm >= 10.0.0
- Milvus database
- OpenAI or VoyageAI API key
# Install dependencies
pnpm install
# Build all packages
pnpm build
import { CodeIndexer, MilvusVectorDatabase } from '@code-indexer/core';
// Initialize vector database
const vectorDatabase = new MilvusVectorDatabase({
    address: 'localhost:19530'
});
// Create indexer instance
const indexer = new CodeIndexer({
    vectorDatabase,
    chunkSize: 1000,
    chunkOverlap: 200
});
// Index your codebase
const stats = await indexer.indexCodebase('./your-project');
console.log(`Indexed ${stats.indexedFiles} files, ${stats.totalChunks} chunks`);
// Perform semantic search
const results = await indexer.semanticSearch('./your-project', 'vector database operations', 5);
results.forEach(result => {
    console.log(`File: ${result.relativePath}`);
    console.log(`Score: ${(result.score * 100).toFixed(2)}%`);
    console.log(`Content: ${result.content.substring(0, 100)}...`);
});
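The `chunkSize` and `chunkOverlap` options in the example above suggest a sliding-window split of each file. The sketch below shows one way such a split could work. It assumes both values are measured in characters (the options do not state the unit), and `splitWithOverlap` is a hypothetical helper, not part of `@code-indexer/core`.

```typescript
// Hypothetical helper; CodeIndexer's actual splitter may work differently.
function splitWithOverlap(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in [0, chunkSize)");
  }
  const step = chunkSize - chunkOverlap; // how far the window advances each time
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

// Small values keep the overlap visible: each chunk repeats the previous chunk's last character.
const chunks = splitWithOverlap("abcdefghij", 4, 1);
// chunks is ["abcd", "defg", "ghij"]
```

Under the same assumption, `chunkSize: 1000` with `chunkOverlap: 200` would advance the window 800 characters at a time, so neighboring chunks share 200 characters of context.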
Each package has its own detailed documentation and usage examples:
- @code-indexer/core: Core indexing engine that provides the fundamental functionality for code indexing and semantic search. Handles embedding generation, vector storage, and search operations.
- @code-indexer/mcp: Model Context Protocol (MCP) server that enables AI assistants and agents to interact with CodeIndexer through a standardized protocol. Exposes indexing and search capabilities via MCP tools.
- VSCode Extension: Visual Studio Code extension that integrates CodeIndexer directly into your IDE. Provides an intuitive interface for semantic code search and navigation.
# Clone repository
git clone https://github.com/zilliztech/CodeIndexer.git
cd CodeIndexer
# Install dependencies
pnpm install
# Start development mode
pnpm dev
# Build all packages
pnpm build
# Build specific package
pnpm build:core
pnpm build:vscode
pnpm build:mcp
# Basic usage example
pnpm example:basic
# Development with file watching
cd examples/basic-usage
pnpm dev
# Required: Embedding provider API key
OPENAI_API_KEY=your-openai-api-key
# or
VOYAGEAI_API_KEY=your-voyageai-api-key
# Optional: Milvus configuration
MILVUS_ADDRESS=localhost:19530
MILVUS_TOKEN=your-milvus-token
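As an illustration of how these variables might be consumed, the sketch below resolves an embedding provider and Milvus settings from an environment-like object. The `resolveConfig` function and `ResolvedConfig` type are hypothetical. Preferring OpenAI when both keys are set, and falling back to `localhost:19530` when `MILVUS_ADDRESS` is unset, are assumptions rather than documented behavior.

```typescript
// Hypothetical helper: one way to read the variables listed above.
type Env = { [key: string]: string | undefined };

interface ResolvedConfig {
  provider: "openai" | "voyageai";
  apiKey: string;
  milvusAddress: string;
  milvusToken: string | undefined;
}

function resolveConfig(env: Env): ResolvedConfig {
  const openaiKey = env["OPENAI_API_KEY"];
  const voyageKey = env["VOYAGEAI_API_KEY"];
  // Assumption: default to a local Milvus instance when no address is given.
  const milvusAddress = env["MILVUS_ADDRESS"] || "localhost:19530";
  const milvusToken = env["MILVUS_TOKEN"];

  // Assumption: prefer OpenAI when both keys are present.
  if (openaiKey) {
    return { provider: "openai", apiKey: openaiKey, milvusAddress, milvusToken };
  }
  if (voyageKey) {
    return { provider: "voyageai", apiKey: voyageKey, milvusAddress, milvusToken };
  }
  throw new Error("Set OPENAI_API_KEY or VOYAGEAI_API_KEY");
}

const config = resolveConfig({ VOYAGEAI_API_KEY: "your-voyageai-api-key" });
```

In a Node.js process the same function could be called as `resolveConfig(process.env)`, which fails fast with a clear message when neither key is configured.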
By default, CodeIndexer supports:
- Programming languages: .ts, .tsx, .js, .jsx, .py, .java, .cpp, .c, .h, .hpp, .cs, .go, .rs, .php, .rb, .swift, .kt, .scala, .m, .mm
- Documentation: .md, .markdown
Common directories and files are automatically ignored:
- node_modules/**, dist/**, build/**
- .git/**, .vscode/**, .idea/**
- *.log, *.min.js, *.map
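A rough sketch of how these defaults could be applied when walking a project follows. `shouldIndex`, `isIgnored`, and `hasSuffix` are hypothetical helpers, and the pattern matching is deliberately simplified: it handles only the `dir/**` and `*.ext` shapes listed above and treats `dir/**` as matching that directory at any depth. A real implementation would more likely rely on a full glob library.

```typescript
// Hypothetical filter built from the default extensions and ignore patterns listed above.
const SUPPORTED_EXTENSIONS = [
  ".ts", ".tsx", ".js", ".jsx", ".py", ".java", ".cpp", ".c", ".h", ".hpp",
  ".cs", ".go", ".rs", ".php", ".rb", ".swift", ".kt", ".scala", ".m", ".mm",
  ".md", ".markdown",
];

const IGNORE_PATTERNS = [
  "node_modules/**", "dist/**", "build/**",
  ".git/**", ".vscode/**", ".idea/**",
  "*.log", "*.min.js", "*.map",
];

function hasSuffix(value: string, suffix: string): boolean {
  return value.length >= suffix.length && value.slice(-suffix.length) === suffix;
}

function isIgnored(relativePath: string): boolean {
  const segments = relativePath.split("/");
  const fileName = segments[segments.length - 1];
  const directories = segments.slice(0, -1);
  for (const pattern of IGNORE_PATTERNS) {
    if (hasSuffix(pattern, "/**")) {
      // "dir/**": ignore anything under a directory with that name (any depth, by assumption)
      const dir = pattern.slice(0, -3);
      if (directories.indexOf(dir) !== -1) return true;
    } else if (pattern.charAt(0) === "*") {
      // "*.ext": ignore files whose name ends with the given suffix
      if (hasSuffix(fileName, pattern.slice(1))) return true;
    }
  }
  return false;
}

function shouldIndex(relativePath: string): boolean {
  if (isIgnored(relativePath)) return false;
  const fileName = relativePath.split("/").pop() || "";
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return false; // no extension, or a dotfile such as ".gitignore"
  return SUPPORTED_EXTENSIONS.indexOf(fileName.slice(dot)) !== -1;
}

const candidates = [
  "src/index.ts",
  "node_modules/lib/index.js",
  "dist/app.min.js",
  "README.md",
  "logo.png",
];
const indexable = candidates.filter(shouldIndex);
// indexable is ["src/index.ts", "README.md"]
```

Checking ignore patterns before extensions means a file such as `dist/app.min.js` is skipped even though `.js` is a supported extension.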
Check the /examples directory for complete usage examples:
- Basic Usage: Simple indexing and search example
We welcome contributions! Please see our Contributing Guide for details on how to get started.
Package-specific contributing guides:
Planned improvements:
- AST-based code analysis for improved understanding
- Support for additional embedding providers
- Agent-based interactive search mode
- Enhanced code chunking strategies
- Search result ranking optimization
This project is licensed under the MIT License - see the LICENSE file for details.