A FastAPI-based backend service that enables users to upload text-based documents and interact with their content through natural language queries using Retrieval-Augmented Generation (RAG).
- Document Upload: Support for PDF, DOCX, and TXT files
- Semantic Search: Uses Hugging Face embeddings for intelligent document retrieval
- AI-Powered Q&A: Answers questions based on your uploaded documents
- Local Vector Storage: ChromaDB for efficient semantic search
- Fast & Modern: Built with FastAPI for high performance
- Architecture Guide - Detailed system architecture and design decisions
- Contributing Guide - Guidelines for contributing to the project
- Security Guide - API key management and security best practices
- Testing Guide - How to run and write tests
- Examples - Usage examples and sample documents
- Setup Scripts - Automated setup instructions
- Changelog - Version history and release notes
Current Version: 1.0.0
This project follows Semantic Versioning, with a single source of truth in the `VERSION` file.
```bash
# Check current version
python scripts/version.py

# Bump versions
python scripts/version.py patch   # 1.0.0 → 1.0.1 (bug fixes)
python scripts/version.py minor   # 1.0.0 → 1.1.0 (new features)
python scripts/version.py major   # 1.0.0 → 2.0.0 (breaking changes)

# Set a specific version
python scripts/version.py set 2.0.0
```

Single source of truth:
- Version defined in: the `VERSION` file
- Auto-read by: `src/__init__.py`, `pyproject.toml`, FastAPI
- Exposed via: the `GET /health` endpoint
- See `CHANGELOG.md` for release history
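The patch/minor/major bump rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the actual contents of `scripts/version.py`:

```python
# Minimal semantic-version bump logic (illustrative; the real logic
# lives in scripts/version.py and may differ).

def bump(version: str, part: str) -> str:
    """Bump a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":   # breaking changes
        return f"{major + 1}.0.0"
    if part == "minor":   # new features
        return f"{major}.{minor + 1}.0"
    if part == "patch":   # bug fixes
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")

print(bump("1.0.0", "patch"))  # 1.0.1
print(bump("1.0.0", "major"))  # 2.0.0
```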
```
                ┌─────────────┐
                │   Client    │
                └──────┬──────┘
                       │
                       ▼
                ┌─────────────┐
                │   FastAPI   │
                │  Endpoints  │
                └──────┬──────┘
                       │
        ┌──────────────┼──────────────┐
        ▼              ▼              ▼
 ┌────────────┐ ┌────────────┐ ┌────────────┐
 │   Upload   │ │   Query    │ │   Health   │
 │   Router   │ │   Router   │ │  Endpoint  │
 └──────┬─────┘ └──────┬─────┘ └────────────┘
        │              │
        ▼              ▼
 ┌────────────────────────────┐
 │         RAG Engine         │
 │  - Document Chunking       │
 │  - Embedding Generation    │
 │  - Semantic Retrieval      │
 │  - LLM Answer Generation   │
 └─────────────┬──────────────┘
               │
        ┌──────┴───────┬──────────────┐
        ▼              ▼              ▼
 ┌────────────┐ ┌────────────┐ ┌─────────────┐
 │  ChromaDB  │ │  Hugging   │ │  Sentence   │
 │   Vector   │ │   Face     │ │Transformers │
 │   Store    │ │    LLM     │ │ Embeddings  │
 └────────────┘ └────────────┘ └─────────────┘
```
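In code, the RAG engine's core loop boils down to: chunk the document, embed each chunk, embed the query, and return the most similar chunks. Here is a toy, dependency-free sketch of that flow; `embed` is a bag-of-words stand-in for the real Sentence Transformers embeddings, and none of these helpers are the project's actual functions:

```python
import math
from collections import Counter

def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word-window chunks."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words vector (the project uses Sentence Transformers)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query and keep the best top_k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]
```

The real engine stores the chunk embeddings in ChromaDB rather than recomputing them per query, and passes the retrieved chunks to the LLM as context for answer generation.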
- Python 3.9+
- 4GB+ RAM recommended
- (Optional) Hugging Face API key for larger models
Linux/Mac:

```bash
git clone <repo-url>
cd ai-knowledge-api
chmod +x scripts/setup.sh
./scripts/setup.sh
```

Windows:

```bash
git clone <repo-url>
cd ai-knowledge-api
scripts\setup.bat
```

Manual setup:

```bash
git clone <repo-url>
cd ai-knowledge-api
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
```

Edit `.env` with your settings:

```env
HF_API_KEY=huggingface_key_here
MODEL_EMBEDDING=sentence-transformers/all-MiniLM-L6-v2
MODEL_LLM=microsoft/phi-2
```

Note: See `docs/SECURITY.md` for best practices on managing API keys securely. The only required variable is `HF_API_KEY`, and only if you plan to use Hugging Face models that need authentication.
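Settings like these are typically read from the environment with fallbacks to the defaults listed in the Configuration section. A hedged sketch of such a loader (the variable names match `.env.example`; the helper itself is illustrative, not the project's actual code):

```python
import os
from typing import Optional

def load_settings(env: Optional[dict] = None) -> dict:
    """Read configuration from the environment, falling back to documented defaults."""
    env = env if env is not None else os.environ
    return {
        "hf_api_key": env.get("HF_API_KEY"),  # only needed for gated HF models
        "db_dir": env.get("DB_DIR", "./data/vectors"),
        "model_embedding": env.get("MODEL_EMBEDDING",
                                   "sentence-transformers/all-MiniLM-L6-v2"),
        "model_llm": env.get("MODEL_LLM", "microsoft/phi-2"),
    }
```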
```bash
uvicorn src.main:app --reload --host 0.0.0.0 --port 7860
```

The API will be available at:
- Interactive Docs: http://localhost:7860
- ReDoc: http://localhost:7860/redoc
- Health Check: http://localhost:7860/health
GET /health
Check if the API is running.
```bash
curl http://localhost:7860/health
```

Response:

```json
{
  "status": "ok",
  "message": "AI Knowledge API is running"
}
```

POST /upload/text

Upload text content directly.

```bash
curl -X POST http://localhost:7860/upload/text \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Artificial Intelligence is transforming the world. Machine learning enables computers to learn from data."
  }'
```

Response:

```json
{
  "message": "Document indexed successfully",
  "chunks_stored": 1
}
```

POST /upload
Upload a document file.
```bash
# Upload a text file
curl -X POST http://localhost:7860/upload \
  -F "file=@document.txt"

# Upload a PDF file
curl -X POST http://localhost:7860/upload \
  -F "file=@document.pdf"

# Upload with text form data
curl -X POST http://localhost:7860/upload \
  -F "text=Your document content here"
```

Supported file types:
- `.txt` - Plain text
- `.pdf` - PDF documents
- `.docx` - Microsoft Word documents
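The file-type check behind this whitelist amounts to a case-insensitive suffix test. A minimal sketch (the actual validation lives in `src/routers/upload.py` and may be implemented differently):

```python
from pathlib import Path

# Extensions from the supported-file-types list above.
ALLOWED_EXTENSIONS = {".txt", ".pdf", ".docx"}

def is_supported(filename: str) -> bool:
    """Accept only whitelisted file extensions, case-insensitively."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS

print(is_supported("notes.TXT"))   # True
print(is_supported("image.png"))   # False
```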
POST /query
Ask a question about uploaded documents.
```bash
curl -X POST http://localhost:7860/query \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is artificial intelligence?"
  }'
```

Response:

```json
{
  "answer": "Artificial Intelligence is a field that focuses on creating systems capable of learning and making decisions...",
  "context": [
    "Artificial Intelligence is transforming the world.",
    "Machine learning enables computers to learn from data."
  ]
}
```

GET /query/stats

Get information about indexed documents.

```bash
curl http://localhost:7860/query/stats
```

Response:

```json
{
  "total_chunks": 15,
  "status": "ready"
}
```

DELETE /query/clear

Remove all indexed documents.

```bash
curl -X DELETE http://localhost:7860/query/clear
```

Response:

```json
{
  "message": "Database cleared successfully",
  "total_chunks": 0
}
```

```bash
# Make sure the server is running first
uvicorn src.main:app --reload

# In another terminal, run the tests
python tests/test_api.py
```

See `tests/README.md` for more details on testing.
```python
import requests

# Base URL
BASE_URL = "http://localhost:7860"

# 1. Check health
response = requests.get(f"{BASE_URL}/health")
print(response.json())

# 2. Upload text
response = requests.post(
    f"{BASE_URL}/upload/text",
    json={
        "text": "Python is a high-level programming language. It is widely used for web development, data science, and AI."
    }
)
print(response.json())

# 3. Query
response = requests.post(
    f"{BASE_URL}/query",
    json={
        "question": "What is Python used for?"
    }
)
print(response.json())
```

Try uploading the sample document:

```bash
curl -X POST http://localhost:7860/upload \
  -F "file=@examples/sample_document.txt"
```

See `examples/README.md` for more usage examples.
Build the image:

```bash
docker build -t ai-knowledge-api .
```

Run the container:

```bash
docker run -p 7860:7860 \
  -e HF_API_KEY=your_key \
  -v $(pwd)/data:/app/data \
  ai-knowledge-api
```

To deploy on Hugging Face Spaces:

- Create a new Space on Hugging Face
- Select Docker as the SDK
- Push your code to the Space repository:

  ```bash
  git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
  git push hf main
  ```

- Add secrets in the Space settings:
  - `HF_API_KEY`: Your Hugging Face API key
```
ai-knowledge-api/
├── Configuration Files (root)
│   ├── requirements.txt          # Python dependencies
│   ├── Dockerfile                # Docker configuration
│   ├── docker-compose.yml        # Docker Compose setup
│   ├── .env.example              # Environment variables template
│   ├── .gitignore
│   ├── LICENSE
│   └── README.md
│
├── src/                          # Application source code
│   ├── __init__.py
│   ├── README.md                 # Source code documentation
│   ├── main.py                   # FastAPI application entry point
│   │
│   ├── routers/                  # API route handlers
│   │   ├── __init__.py
│   │   ├── upload.py             # Document upload endpoints
│   │   └── query.py              # Query endpoints
│   │
│   ├── services/                 # Business logic services
│   │   ├── __init__.py
│   │   ├── embeddings.py         # Embedding generation service
│   │   ├── rag_engine.py         # RAG logic and vector database
│   │   └── document_processor.py # Document text extraction
│   │
│   └── models/                   # Data models
│       ├── __init__.py
│       └── schemas.py            # Pydantic models
│
├── data/                         # Application data
│   └── vectors/                  # ChromaDB storage (persisted)
│
├── docs/                         # Documentation files
│   ├── README.md                 # Documentation overview
│   ├── ARCHITECTURE.md           # System architecture details
│   ├── CONTRIBUTING.md           # Contribution guidelines
│   └── SECURITY.md               # Security best practices
│
├── scripts/                      # Setup and utility scripts
│   ├── README.md                 # Scripts documentation
│   ├── setup.sh                  # Linux/Mac setup script
│   └── setup.bat                 # Windows setup script
│
├── tests/                        # Test files
│   ├── README.md                 # Testing documentation
│   └── test_api.py               # API endpoint tests
│
├── examples/                     # Sample files and usage examples
│   ├── README.md                 # Examples documentation
│   └── sample_document.txt       # Sample document for testing
│
└── logs/                         # Application logs (git-ignored)
    └── api.log                   # Server logs
```
| Variable | Description | Default |
|---|---|---|
| `HF_API_KEY` | Hugging Face API key | - |
| `DB_DIR` | Vector database directory | `./data/vectors` |
| `MODEL_EMBEDDING` | Embedding model | `sentence-transformers/all-MiniLM-L6-v2` |
| `MODEL_LLM` | Language model for Q&A | `microsoft/phi-2` |
You can change the models in `.env`.

Embedding models (smaller = faster, larger = more accurate):
- `sentence-transformers/all-MiniLM-L6-v2` (default, 80MB)
- `sentence-transformers/all-mpnet-base-v2` (better quality, 420MB)

LLM models:
- `microsoft/phi-2` (default, works without GPU)
- `mistralai/Mistral-7B-Instruct-v0.1` (better quality, requires GPU)
- File Upload Limits: Implement file size limits in production
- Rate Limiting: Add rate limiting for API endpoints
- Authentication: Add API key authentication for production use
- Input Sanitization: Already implemented for file types
- CORS: Configure specific origins in production
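For the rate-limiting point above, a token bucket is one common approach; in production you would more likely reach for middleware such as `slowapi`. A self-contained sketch of the idea (not part of this project's code):

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` tokens/second."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

One bucket per client (keyed by API key or IP) would then gate each request before it reaches the routers.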
Out of memory: use smaller models or increase system RAM.

```env
MODEL_LLM=microsoft/phi-2
```

Slow responses:
- Models are loaded lazily on first use, so an initial delay is expected
- Consider using GPU acceleration
- Reduce `top_k` in retrieval

Import errors: reinstall the dependencies.

```bash
pip install --upgrade -r requirements.txt
```

- Use GPU: Install PyTorch with CUDA for faster inference
- Batch Processing: Upload multiple documents at once
- Caching: Models are cached after first load
- Quantization: Use quantized models for lower memory usage
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is licensed under the MIT License.
- FastAPI - Modern web framework
- LangChain - Document processing utilities
- Hugging Face - Pre-trained models
- ChromaDB - Vector database
- Sentence Transformers - Embedding models
For issues and questions:
- Open an issue on GitHub
- Check the FastAPI documentation
- Visit Hugging Face documentation
Built with ❤️ using FastAPI and Hugging Face