Convert any source code repository into a searchable knowledge base with automatic chunking, embedding generation, and intelligent search capabilities. Now with MCP (Model Context Protocol) support for Claude Code and Cursor integration!
📦 Now available on npm! Install with: npm install -g @vezlo/src-to-kb
After installation, you'll have access to these commands:
- src-to-kb: Generate knowledge base from source code
- src-to-kb-search: Search the knowledge base
- src-to-kb-api: Start REST API server with Swagger docs
- src-to-kb-mcp: Start MCP server for IDE integration
- src-to-kb-mcp-install: Auto-configure Claude Code/Cursor
- 📁 Multi-language Support: JavaScript, TypeScript, Python, Java, C++, Go, Rust, and more
- 🎯 Answer Modes: Three modes for different users - End User (simple), Developer (technical), Copilot (code-focused)
- 🌐 REST API: Full-featured API with Swagger documentation for integration with external services
- 🔍 Smart Chunking: Intelligent code splitting with configurable overlap
- 🧹 Code Cleaning: Optional comment removal and whitespace normalization
- 🔢 Embeddings: Optional OpenAI embeddings for semantic search
- 📊 Statistics: Comprehensive analysis of your codebase
- 🚀 Fast Processing: Efficient file scanning and processing
- 💾 Structured Storage: Organized JSON output for easy integration
- 🤖 MCP Server: Direct integration with Claude Code, Cursor, and other MCP-compatible tools
- 💡 AI-Powered Search: Uses OpenAI GPT-5 (latest reasoning model) for intelligent query understanding and helpful answers
- 🔐 API Authentication: Optional API key authentication for secure access
- 🌐 External Server Integration: Send code to external servers for processing and search via REST API
# Install globally
npm install -g @vezlo/src-to-kb
# Generate KB from your project
src-to-kb ./my-nextjs-app --output ./my-kb
# Start API server
src-to-kb-api
# Search your codebase
src-to-kb-search search "How does routing work?" --mode developer

That's it! Your codebase is now searchable with AI assistance.
Experience external server integration immediately with our production-ready assistant-server:
# Generate knowledge base using assistant-server
USE_EXTERNAL_KB=true EXTERNAL_KB_URL=https://your-assistant-server.com/api/knowledge/items src-to-kb ./your-repo
# Search using assistant-server
USE_EXTERNAL_KB=true EXTERNAL_KB_URL=https://your-assistant-server.com/api/search src-to-kb-search search "how does authentication work?"

Assistant Server: vezlo/assistant-server - Production-ready Node.js/TypeScript API server with vector search and Docker deployment
For production deployments or custom servers:
📖 Complete Guide: External Server Setup Guide
Process your repository with default settings:
# If installed globally via npm
src-to-kb /path/to/your/repo
# Or using the script directly
node kb-generator.js /path/to/your/repo

src-to-kb /path/to/your/repo --output ./my-knowledge-base

# Set your OpenAI API key
export OPENAI_API_KEY=your-api-key-here
# Generate with embeddings
src-to-kb /path/to/your/repo --embeddings

Choose the right answer mode for your needs:
# First generate a knowledge base
src-to-kb ./your-project --output ./project-kb
# Search with different modes:
# End User Mode - Simple, non-technical answers
src-to-kb-search search "how do I reset password?" --kb ./project-kb --mode enduser
# Developer Mode - Technical details and architecture (default)
src-to-kb-search search "authentication flow" --kb ./project-kb --mode developer
# Copilot Mode - Code examples and implementation patterns
src-to-kb-search search "implement user login" --kb ./project-kb --mode copilot
# View available modes
src-to-kb-search modes

The search tool adapts its responses based on who's asking:
| Mode | For | Description | Example Use Case |
|---|---|---|---|
| enduser | Non-technical users | Simple explanations without technical jargon, focuses on features and capabilities | Product managers, business stakeholders asking about features |
| developer | Software developers | Full technical details including architecture, dependencies, and implementation details | Engineers understanding codebase structure and design patterns |
| copilot | Coding assistance | Code examples, snippets, and implementation patterns ready to use | Developers looking for code to copy/adapt for their implementation |
# CEO asks: "What payment methods do we support?"
src-to-kb-search search "payment methods" --mode enduser
# Returns: Simple list of supported payment options
# Developer asks: "How is payment processing implemented?"
src-to-kb-search search "payment processing" --mode developer
# Returns: Technical details about payment gateway integration, API endpoints, error handling
# Developer needs: "Show me payment integration code"
src-to-kb-search search "payment integration" --mode copilot
# Returns: Actual code snippets for payment implementation

- Filtering: Each mode filters results differently (e.g., end users don't see test files)
- AI Prompts: Custom prompts guide AI to give appropriate responses
- Formatting: Answers are formatted based on the audience (code blocks for developers, plain text for end users)
- Context: Technical depth is adjusted (high for developers, low for end users)
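
The four behaviours above can be pictured as one profile per mode. The following is a minimal sketch, not the tool's actual implementation: `MODE_PROFILES`, `isTestFile`, and `filterForMode` are hypothetical names, the test-file pattern is an assumption, and the copilot settings are guessed because the list above only describes end users and developers.

```javascript
// Hypothetical per-mode profiles; the real tool's settings may differ.
const MODE_PROFILES = {
  enduser:   { includeTests: false, technicalDepth: 'low',  format: 'plain text' },
  developer: { includeTests: true,  technicalDepth: 'high', format: 'code blocks' },
  // Assumed values: the source does not spell out the copilot profile.
  copilot:   { includeTests: true,  technicalDepth: 'high', format: 'code blocks' },
};

// Assumed heuristic for recognising test files from their path.
function isTestFile(relativePath) {
  return /(^|\/)(tests?|__tests__)\//.test(relativePath) ||
    /\.(test|spec)\.[jt]sx?$/.test(relativePath);
}

// Remove results that the chosen mode should not surface.
function filterForMode(results, mode) {
  const profile = MODE_PROFILES[mode];
  if (!profile) throw new Error(`Unknown mode: ${mode}`);
  return profile.includeTests
    ? results
    : results.filter((r) => !isTestFile(r.relativePath));
}
```

Keeping the differences in a data table rather than in branching code would make it easy to add a fourth mode later without touching the filter.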
# Install globally from npm registry
npm install -g @vezlo/src-to-kb
# Now use the commands anywhere on your system
src-to-kb /path/to/repo # Generate knowledge base
src-to-kb-search search "your query" # Search knowledge base
src-to-kb-mcp # Start MCP server for Claude/Cursor

# Run directly without installing
npx @vezlo/src-to-kb /path/to/repo
npx @vezlo/src-to-kb-search search "your query"
npx @vezlo/src-to-kb-mcp

# Add as a project dependency
npm install @vezlo/src-to-kb
# Use with npx in your project
npx src-to-kb /path/to/repo

# Clone the repository
git clone https://github.com/vezlo/src-to-kb.git
cd src-to-kb
# Install dependencies
npm install
# Run directly
node kb-generator.js /path/to/repo

Usage: node kb-generator.js <repository-path> [options]
Options:
  --output, -o      Output directory (default: ./knowledge-base)
  --chunk-size      Chunk size in characters (default: 1000)
  --chunk-overlap   Overlap between chunks (default: 200)
  --max-file-size   Maximum file size in MB (default: 10)
  --embeddings      Generate OpenAI embeddings (requires OPENAI_API_KEY)
  --no-comments     Exclude comments from code
  --exclude         Additional paths to exclude (comma-separated)
  --extensions      File extensions to include (comma-separated)
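
To show how `--chunk-size` and `--chunk-overlap` interact, here is a minimal character-based sketch. It is an illustration under stated assumptions, not the generator's real chunker: `chunkText` is a hypothetical helper, and the actual tool may well split on line boundaries instead of raw character offsets.

```javascript
// Hypothetical helper: fixed-size chunks in which each chunk repeats the
// last `overlap` characters of the previous one.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (overlap >= chunkSize) {
    throw new RangeError('overlap must be smaller than chunkSize');
  }
  const step = chunkSize - overlap; // how far each new chunk advances
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}
```

With the defaults, each chunk advances 800 characters, so a larger overlap means more chunks and more stored text for the same input.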
# 1. Generate knowledge base from your frontend code
src-to-kb ./frontend/ --output ./frontend-kb
# 2. Different users asking different questions:
# Product Manager asks about features
src-to-kb-search search "password reset feature" --kb ./frontend-kb --mode enduser
# Developer investigates technical implementation
src-to-kb-search search "authentication flow" --kb ./frontend-kb --mode developer
# Developer needs code examples
src-to-kb-search search "login component implementation" --kb ./frontend-kb --mode copilot
# 3. Get statistics about the codebase
src-to-kb-search stats --kb ./frontend-kb
# 4. List all TypeScript files
src-to-kb-search type TypeScript --kb ./frontend-kb
# 5. View available answer modes
src-to-kb-search modes

# Using npm package
src-to-kb /path/to/repo --output ./repo-kb --embeddings
# Or with npx
npx @vezlo/src-to-kb /path/to/repo --output ./repo-kb --embeddings

src-to-kb /path/to/repo --extensions .js,.ts,.jsx,.tsx

src-to-kb /path/to/repo --exclude tests,build,dist,coverage

src-to-kb /path/to/large-repo \
  --chunk-size 2000 \
  --chunk-overlap 400 \
  --max-file-size 20

Run the included test suite to verify functionality:
# Run comprehensive tests
node test.js
# This will:
# 1. Create a test repository with sample files
# 2. Process it into a knowledge base
# 3. Verify the output structure
# 4. Test chunking on large files
# 5. Verify language detection

The Source-to-KB REST API provides programmatic access to all functionality with comprehensive Swagger documentation.
# Start with defaults (port 3000, no authentication)
src-to-kb-api
# With custom port and API key
PORT=8080 API_KEY=your-secret-key src-to-kb-api
# With all options
PORT=8080 API_KEY=secret OPENAI_API_KEY=sk-... src-to-kb-api

Once started, visit http://localhost:3000/api/v1/docs for the interactive Swagger UI.
- POST /api/v1/knowledge-bases: Create new knowledge base
- POST /api/v1/search: Search with mode selection
- GET /api/v1/modes: List available answer modes
- GET /api/v1/statistics/{id}: Get KB statistics
- POST /api/v1/process-file: Process single file
// Create knowledge base
const response = await fetch('http://localhost:3000/api/v1/knowledge-bases', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'your-api-key'
  },
  body: JSON.stringify({
    name: 'My Project',
    sourcePath: '/path/to/project',
    options: { chunkSize: 1500 }
  })
});

// Search with mode
const searchResponse = await fetch('http://localhost:3000/api/v1/search', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'your-api-key'
  },
  body: JSON.stringify({
    query: 'authentication',
    knowledgeBaseId: 'abc123',
    mode: 'developer'
  })
});

For complete API documentation, see API_DOCUMENTATION.md
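
Because API key authentication is optional, a small wrapper can attach the `X-API-Key` header only when a key is configured. This is a sketch under assumptions: `buildSearchRequest` and `search` are hypothetical helpers, the base URL is the default port shown above, and the global `fetch` requires Node.js 18 or later.

```javascript
// Hypothetical helper: assemble the URL and options for a search call.
function buildSearchRequest({
  baseUrl = 'http://localhost:3000', // default port from the examples above
  query,
  knowledgeBaseId,
  mode = 'developer',
  apiKey,
}) {
  const headers = { 'Content-Type': 'application/json' };
  if (apiKey) headers['X-API-Key'] = apiKey; // only sent when a key is set
  return {
    url: `${baseUrl}/api/v1/search`,
    options: {
      method: 'POST',
      headers,
      body: JSON.stringify({ query, knowledgeBaseId, mode }),
    },
  };
}

// Hypothetical wrapper: run the request and fail loudly on HTTP errors.
async function search(params) {
  const { url, options } = buildSearchRequest(params);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Search failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Separating request construction from the network call keeps the header logic easy to unit test without a running server.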
# 1. Install the package globally
npm install -g @vezlo/src-to-kb
# 2. Find your global npm installation path
npm list -g @vezlo/src-to-kb --depth=0
# 3. Add to Claude Code (replace the path with your npm global path)
# For macOS/Linux with nvm:
claude mcp add src-to-kb -- node ~/.nvm/versions/node/v22.6.0/lib/node_modules/@vezlo/src-to-kb/mcp-server.mjs
# For macOS/Linux without nvm:
claude mcp add src-to-kb -- node /usr/local/lib/node_modules/@vezlo/src-to-kb/mcp-server.mjs
# For Windows:
claude mcp add src-to-kb -- node %APPDATA%\npm\node_modules\@vezlo\src-to-kb\mcp-server.mjs
# With OpenAI API key for embeddings:
claude mcp add src-to-kb --env OPENAI_API_KEY=your-key -- node [your-path]/mcp-server.mjs

# Try with npx (may not work on all systems)
claude mcp add src-to-kb -- npx -y @vezlo/src-to-kb src-to-kb-mcp

# Check if installed
claude mcp list
# Remove if needed
claude mcp remove src-to-kb
# Get server details
claude mcp get src-to-kb

- Restart Claude Code completely
- Test by asking Claude:
  - "Generate a knowledge base for this project"
  - "Search for authentication implementations"
  - "What languages does this codebase use?"
  - "Find files similar to config.js"
See MCP_SETUP.md for manual setup and MCP_TOOLS_GUIDE.md for detailed tool documentation.
The search tool supports three different answer modes to tailor responses based on your needs:
- enduser: Simplified answers for non-technical users, focusing on features and capabilities
- developer: Detailed technical answers including architecture and implementation details (default)
- copilot: Code-focused answers with examples and patterns for implementation
# Examples with different modes
src-to-kb-search search "how to use API?" --mode enduser # Simple explanation
src-to-kb-search search "authentication flow" --mode developer # Technical details
src-to-kb-search search "login implementation" --mode copilot # Code examples
# List available modes
src-to-kb-search modes

When OPENAI_API_KEY is set, searches use GPT-5 (OpenAI's latest reasoning model) for intelligent answers:
# Set your OpenAI API key
export OPENAI_API_KEY=your-api-key-here
# Get intelligent, context-aware answers with mode selection
src-to-kb-search search "how does authentication work?" --kb ./project-kb --mode developer
src-to-kb-search search "where is password reset?" --kb ./project-kb --mode enduser

Without an API key, the tool provides basic keyword search:
# Basic search with pattern matching
src-to-kb-search search "authentication" --kb ./project-kb
# Find all JavaScript files
src-to-kb-search type JavaScript --kb ./project-kb
# Show statistics
src-to-kb-search stats --kb ./project-kb
# Find similar files
src-to-kb-search similar src/index.js --kb ./project-kb

# Specify knowledge base path
src-to-kb-search search "query" --kb ./my-knowledge-base
# Select answer mode
src-to-kb-search search "query" --mode developer # one of: enduser, developer, copilot
# Show detailed evidence
src-to-kb-search search "query" --verbose
# Get raw search results (old format)
src-to-kb-search search "query" --raw

The generator creates the following directory structure:
knowledge-base/
├── documents/      # Document metadata (without content)
│   ├── doc_xxx.json
│   └── ...
├── chunks/         # Document chunks for searching
│   ├── doc_xxx.json
│   └── ...
├── embeddings/     # OpenAI embeddings (if enabled)
│   ├── doc_xxx.json
│   └── ...
└── metadata/       # Summary and statistics
    └── summary.json
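
The layout above can be read back with plain Node.js file APIs. The sketch below is an assumption-laden illustration: `readKbFolder` is a hypothetical helper, and it assumes every `.json` file in a folder holds one self-contained JSON value, which the generator's real files may not.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: parse every JSON file in one sub-directory of a
// knowledge base (for example 'documents' or 'chunks').
function readKbFolder(kbPath, folder) {
  const dir = path.join(kbPath, folder);
  // embeddings/ is only populated when embeddings are enabled, so a
  // missing folder is treated as empty rather than as an error.
  if (!fs.existsSync(dir)) return [];
  return fs.readdirSync(dir)
    .filter((name) => name.endsWith('.json'))
    .sort() // stable order regardless of the file system
    .map((name) => JSON.parse(fs.readFileSync(path.join(dir, name), 'utf8')));
}
```

For example, `readKbFolder('./knowledge-base', 'documents')` would return an array of the parsed document metadata files.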
Each document contains:
{
  "id": "doc_1234567890_abc123",
  "path": "/full/path/to/file.js",
  "relativePath": "src/file.js",
  "fileName": "file.js",
  "extension": ".js",
  "size": 2048,
  "checksum": "sha256-hash",
  "metadata": {
    "createdAt": "2024-01-01T00:00:00.000Z",
    "modifiedAt": "2024-01-01T00:00:00.000Z",
    "lines": 100,
    "language": "JavaScript",
    "type": "code"
  }
}

Each chunk contains:
{
  "id": "doc_xxx_chunk_0",
  "index": 0,
  "content": "chunk content here...",
  "startLine": 1,
  "endLine": 25,
  "size": 1000
}

Transform your frontend codebase into a searchable knowledge base with AI-powered assistance:
# 1. Generate knowledge base from your project
src-to-kb /path/to/nextjs-app --output ./nextjs-kb
# 2. Start the API server
src-to-kb-api
# 3. Query your codebase
curl -X POST http://localhost:3000/api/v1/search \
  -H "Content-Type: application/json" \
  -d '{"query": "How is authentication implemented?", "knowledgeBaseId": "your-kb-id", "mode": "developer"}'

// components/CodeSearch.jsx
import { useState } from 'react';

export default function CodeSearch() {
  const [query, setQuery] = useState('');
  const [result, setResult] = useState(null);

  const search = async () => {
    const response = await fetch('http://localhost:3000/api/v1/search', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        query,
        knowledgeBaseId: 'your-kb-id',
        mode: 'developer'
      })
    });
    const data = await response.json();
    setResult(data);
  };

  return (
    <div>
      <input
        value={query}
        onChange={(e) => setQuery(e.target.value)}
        placeholder="Ask about your codebase..."
      />
      <button onClick={search}>Search</button>
      {result && <div>{result.answer}</div>}
    </div>
  );
}

- 🎓 Onboarding Assistant: Help new developers understand your codebase
- 📖 In-App Documentation: Provide context-aware help within your application
- 🔍 Code Review Helper: Find similar patterns and best practices
- 🤖 Development Copilot: Get AI suggestions based on your existing code
- 📊 Code Analytics Dashboard: Visualize codebase statistics and complexity
# GitHub Actions example
name: Update Knowledge Base
on: [push]
jobs:
  update-kb:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm install -g @vezlo/src-to-kb
      - run: src-to-kb . --output ./kb
      # Upload KB as artifact or deploy to server

For production environments:
# Start with authentication and custom port
API_KEY=secure-key PORT=8080 src-to-kb-api
# Use with Docker
docker run -p 3000:3000 -e API_KEY=secret vezlo/src-to-kb-api

- Code Documentation: Generate searchable documentation from your codebase
- AI Training: Prepare code for fine-tuning or RAG systems
- Code Analysis: Analyze patterns and structure across large repositories
- Knowledge Extraction: Extract domain knowledge from source code
- Code Search: Build intelligent code search systems
- IDE Integration: Use directly in Claude Code or Cursor for code understanding
- Team Knowledge Sharing: Create searchable knowledge bases for team onboarding
- Processes ~1000 files/minute on average hardware
- Memory efficient - streams large files
- Parallel chunk processing
- Configurable file size limits
- JavaScript (.js, .jsx)
- TypeScript (.ts, .tsx)
- Python (.py)
- Java (.java)
- C/C++ (.c, .cpp, .h, .hpp)
- C# (.cs)
- Go (.go)
- Rust (.rs)
- Ruby (.rb)
- PHP (.php)
- Swift (.swift)
- Kotlin (.kt)
- Scala (.scala)
- And many more...
Also processes:
- JSON (.json)
- YAML (.yaml, .yml)
- XML (.xml)
- Markdown (.md)
- HTML/CSS (.html, .css, .scss)
- SQL (.sql)
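
Language detection by file extension can be sketched as a simple lookup. This is only an illustration: `LANGUAGE_BY_EXTENSION` and `detectLanguage` are hypothetical names, the table covers just part of the lists above, and the `'Unknown'` fallback is an assumption about how unrecognised files might be labelled.

```javascript
const path = require('node:path');

// Partial, illustrative mapping based on the extensions listed above.
const LANGUAGE_BY_EXTENSION = new Map([
  ['.js', 'JavaScript'], ['.jsx', 'JavaScript'],
  ['.ts', 'TypeScript'], ['.tsx', 'TypeScript'],
  ['.py', 'Python'], ['.java', 'Java'],
  ['.go', 'Go'], ['.rs', 'Rust'],
  ['.json', 'JSON'], ['.yaml', 'YAML'], ['.yml', 'YAML'],
  ['.md', 'Markdown'], ['.sql', 'SQL'],
]);

// Map a file path to a language name using its (case-insensitive) extension.
function detectLanguage(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return LANGUAGE_BY_EXTENSION.get(ext) ?? 'Unknown';
}
```

A name such as `TypeScript` returned here corresponds to the kind of value shown in the `language` metadata field and accepted by `src-to-kb-search type`.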
- Chunking Strategy:
  - Use smaller chunks (500-1000 characters) for precise search
  - Use larger chunks (2000-3000 characters) for more context
- Overlap:
  - 10-20% overlap helps maintain context between chunks
  - Increase overlap for code with many dependencies
- Exclusions:
  - Always exclude node_modules, vendor, dist directories
  - Consider excluding auto-generated files
- File Size:
  - Default 10 MB limit prevents processing of large binaries
  - Increase for legitimate large source files
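
The exclusion and file-size advice above amounts to a per-file check before processing. Here is a minimal sketch, assuming exclusions match whole path segments and the limit is given in megabytes; `shouldProcess` and `DEFAULT_EXCLUDES` are hypothetical names rather than part of the tool.

```javascript
// Directories the best practices above say to always exclude.
const DEFAULT_EXCLUDES = ['node_modules', 'vendor', 'dist'];
const DEFAULT_MAX_FILE_SIZE_MB = 10; // mirrors the documented default

// Hypothetical check: skip files inside excluded directories or over the limit.
function shouldProcess(
  relativePath,
  sizeBytes,
  excludes = DEFAULT_EXCLUDES,
  maxFileSizeMb = DEFAULT_MAX_FILE_SIZE_MB,
) {
  // Compare whole segments so that 'distance.js' is not mistaken for 'dist'.
  const segments = relativePath.split(/[\\/]/);
  if (segments.some((segment) => excludes.includes(segment))) return false;
  return sizeBytes <= maxFileSizeMb * 1024 * 1024;
}
```

Matching on path segments rather than substrings avoids accidentally dropping source files whose names merely contain an excluded word.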
const { KnowledgeBaseGenerator } = require('./kb-generator');

async function generateKB() {
  const generator = new KnowledgeBaseGenerator({
    outputPath: './my-kb',
    chunkSize: 1500,
    generateEmbeddings: true,
    openaiApiKey: 'your-api-key'
  });

  // Log each file as it is processed
  generator.on('fileProcessed', (data) => {
    console.log(`Processed: ${data.file}`);
  });

  const result = await generator.processRepository('/path/to/repo');
  console.log(`Generated KB with ${result.documents.length} documents`);
}

// Report failures instead of leaving an unhandled promise rejection
generateKB().catch(console.error);

This software is dual-licensed:
- Non-Commercial Use: Free under AGPL-3.0 license
- Commercial Use: Requires a commercial license - contact us for details
See LICENSE file for full details.
Feel free to submit issues and enhancement requests!