🧠 AI Knowledge API

A FastAPI-based backend service that enables users to upload text-based documents and interact with their content through natural language queries using Retrieval-Augmented Generation (RAG).

✨ Features

  • πŸ“„ Document Upload: Support for PDF, DOCX, and TXT files
  • πŸ” Semantic Search: Uses Hugging Face embeddings for intelligent document retrieval
  • πŸ€– AI-Powered Q&A: Answers questions based on your uploaded documents
  • πŸ’Ύ Local Vector Storage: ChromaDB for efficient semantic search
  • πŸš€ Fast & Modern: Built with FastAPI for high performance

πŸ“š Documentation

πŸ“Œ Version

Current Version: 1.0.0

This project follows Semantic Versioning, with a single source of truth in the VERSION file.

Version Management

# Check current version
python scripts/version.py

# Bump versions
python scripts/version.py patch   # 1.0.0 β†’ 1.0.1 (bug fixes)
python scripts/version.py minor   # 1.0.0 β†’ 1.1.0 (new features)
python scripts/version.py major   # 1.0.0 β†’ 2.0.0 (breaking changes)

# Set specific version
python scripts/version.py set 2.0.0
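
The bump commands above follow standard semver arithmetic. A minimal sketch of that logic (hypothetical; the real scripts/version.py may differ):

```python
# Hypothetical sketch of the semver bump performed by scripts/version.py.
def bump(version: str, part: str) -> str:
    """Bump a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = map(int, version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"        # breaking changes
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new features
    return f"{major}.{minor}.{patch + 1}"  # bug fixes
```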

Single Source of Truth:

  • Version defined in: VERSION file
  • Auto-read by: src/__init__.py, pyproject.toml, FastAPI
  • Exposed via: GET /health endpoint
  • See: CHANGELOG.md for release history
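
Reading the version from one file keeps all consumers consistent. A sketch of how src/__init__.py might do this (hypothetical helper name; the actual implementation may differ):

```python
from pathlib import Path

def read_version(version_file: Path) -> str:
    """Return the semantic version stored in the VERSION file, e.g. '1.0.0'."""
    return version_file.read_text(encoding="utf-8").strip()
```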

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Client    β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  FastAPI    β”‚
β”‚  Endpoints  β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
       β–Ό              β–Ό              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Upload   β”‚ β”‚   Query    β”‚ β”‚   Health   β”‚
β”‚   Router   β”‚ β”‚   Router   β”‚ β”‚  Endpoint  β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚              β”‚
       β–Ό              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚      RAG Engine            β”‚
β”‚  - Document Chunking       β”‚
β”‚  - Embedding Generation    β”‚
β”‚  - Semantic Retrieval      β”‚
β”‚  - LLM Answer Generation   β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
       β–Ό              β–Ό              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  ChromaDB  β”‚ β”‚ Hugging    β”‚ β”‚ Sentence   β”‚
β”‚   Vector   β”‚ β”‚   Face     β”‚ β”‚Transformersβ”‚
β”‚    Store   β”‚ β”‚    LLM     β”‚ β”‚ Embeddings β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ“‹ Prerequisites

  • Python 3.9+
  • 4GB+ RAM recommended
  • (Optional) Hugging Face API key for larger models

πŸš€ Quick Start

Option 1: Automated Setup (Recommended)

Linux/Mac:

git clone <repo-url>
cd ai-knowledge-api
chmod +x scripts/setup.sh
./scripts/setup.sh

Windows:

git clone <repo-url>
cd ai-knowledge-api
scripts\setup.bat

Option 2: Manual Setup

1. Clone the Repository

git clone <repo-url>
cd ai-knowledge-api

2. Create Virtual Environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

4. Configure Environment

cp .env.example .env

Edit .env with your settings:

HF_API_KEY=huggingface_key_here
MODEL_EMBEDDING=sentence-transformers/all-MiniLM-L6-v2
MODEL_LLM=microsoft/phi-2

Note: See docs/SECURITY.md for best practices on managing API keys securely.

HF_API_KEY is required only if you plan to use Hugging Face models that need authentication.
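
Because the other variables have documented defaults, configuration can be read with fallbacks. A standard-library sketch (the project may load .env via python-dotenv instead):

```python
import os

# Optional: only needed for gated Hugging Face models.
HF_API_KEY = os.getenv("HF_API_KEY")

# Model choices fall back to the documented defaults when unset.
MODEL_EMBEDDING = os.getenv("MODEL_EMBEDDING", "sentence-transformers/all-MiniLM-L6-v2")
MODEL_LLM = os.getenv("MODEL_LLM", "microsoft/phi-2")
```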

5. Run the API

uvicorn src.main:app --reload --host 0.0.0.0 --port 7860

The API will be available at http://localhost:7860, with interactive docs at http://localhost:7860/docs.

πŸ“š API Endpoints

Health Check

GET /health

Check if the API is running.

curl http://localhost:7860/health

Response:

{
  "status": "ok",
  "message": "AI Knowledge API is running"
}

Upload Document (JSON)

POST /upload/text

Upload text content directly.

curl -X POST http://localhost:7860/upload/text \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Artificial Intelligence is transforming the world. Machine learning enables computers to learn from data."
  }'

Response:

{
  "message": "Document indexed successfully",
  "chunks_stored": 1
}

Upload Document (File)

POST /upload

Upload a document file.

# Upload a text file
curl -X POST http://localhost:7860/upload \
  -F "file=@document.txt"

# Upload a PDF file
curl -X POST http://localhost:7860/upload \
  -F "file=@document.pdf"

# Upload with text form data
curl -X POST http://localhost:7860/upload \
  -F "text=Your document content here"

Supported File Types:

  • .txt - Plain text
  • .pdf - PDF documents
  • .docx - Microsoft Word documents
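
Support for these types is typically enforced by checking the file extension before text extraction. A sketch (hypothetical helper; the project's document_processor.py may differ):

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".docx"}

def validate_extension(filename: str) -> str:
    """Return the lowercased extension, or raise for unsupported file types."""
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {ext}")
    return ext
```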

Query Documents

POST /query

Ask a question about uploaded documents.

curl -X POST http://localhost:7860/query \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is artificial intelligence?"
  }'

Response:

{
  "answer": "Artificial Intelligence is a field that focuses on creating systems capable of learning and making decisions...",
  "context": [
    "Artificial Intelligence is transforming the world.",
    "Machine learning enables computers to learn from data."
  ]
}

Get Statistics

GET /query/stats

Get information about indexed documents.

curl http://localhost:7860/query/stats

Response:

{
  "total_chunks": 15,
  "status": "ready"
}

Clear Database

DELETE /query/clear

Remove all indexed documents.

curl -X DELETE http://localhost:7860/query/clear

Response:

{
  "message": "Database cleared successfully",
  "total_chunks": 0
}

πŸ§ͺ Testing

Run Automated Tests

# Make sure the server is running first
uvicorn src.main:app --reload

# In another terminal, run tests
python tests/test_api.py

See tests/README.md for more details on testing.

Manual Testing with Python

import requests

# Base URL
BASE_URL = "http://localhost:7860"

# 1. Check health
response = requests.get(f"{BASE_URL}/health")
print(response.json())

# 2. Upload text
response = requests.post(
    f"{BASE_URL}/upload/text",
    json={
        "text": "Python is a high-level programming language. It is widely used for web development, data science, and AI."
    }
)
print(response.json())

# 3. Query
response = requests.post(
    f"{BASE_URL}/query",
    json={
        "question": "What is Python used for?"
    }
)
print(response.json())

Using Example Files

Try uploading the sample document:

curl -X POST http://localhost:7860/upload \
  -F "file=@examples/sample_document.txt"

See examples/README.md for more usage examples.

🐳 Docker Deployment

Build Docker Image

docker build -t ai-knowledge-api .

Run Container

docker run -p 7860:7860 \
  -e HF_API_KEY=your_key \
  -v $(pwd)/data:/app/data \
  ai-knowledge-api

☁️ Hugging Face Spaces Deployment

  1. Create a new Space on Hugging Face
  2. Select Docker as SDK
  3. Push your code to the Space repository:
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
git push hf main
  4. Add secrets in Space settings:
    • HF_API_KEY: Your Hugging Face API key

πŸ“ Project Structure

ai-knowledge-api/
β”œβ”€β”€ πŸ“„ Configuration Files (root)
β”‚   β”œβ”€β”€ requirements.txt       # Python dependencies
β”‚   β”œβ”€β”€ Dockerfile             # Docker configuration
β”‚   β”œβ”€β”€ docker-compose.yml     # Docker Compose setup
β”‚   β”œβ”€β”€ .env.example           # Environment variables template
β”‚   β”œβ”€β”€ .gitignore
β”‚   β”œβ”€β”€ LICENSE
β”‚   └── README.md
β”‚
β”œβ”€β”€ πŸ’» src/                    # Application source code
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ README.md              # Source code documentation
β”‚   β”œβ”€β”€ main.py                # FastAPI application entry point
β”‚   β”‚
β”‚   β”œβ”€β”€ routers/               # API route handlers
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ upload.py          # Document upload endpoints
β”‚   β”‚   └── query.py           # Query endpoints
β”‚   β”‚
β”‚   β”œβ”€β”€ services/              # Business logic services
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ embeddings.py      # Embedding generation service
β”‚   β”‚   β”œβ”€β”€ rag_engine.py      # RAG logic and vector database
β”‚   β”‚   └── document_processor.py  # Document text extraction
β”‚   β”‚
β”‚   └── models/                # Data models
β”‚       β”œβ”€β”€ __init__.py
β”‚       └── schemas.py         # Pydantic models
β”‚
β”œβ”€β”€ πŸ’Ύ data/                   # Application data
β”‚   └── vectors/               # ChromaDB storage (persisted)
β”‚
β”œβ”€β”€ πŸ“š docs/                   # Documentation files
β”‚   β”œβ”€β”€ README.md              # Documentation overview
β”‚   β”œβ”€β”€ ARCHITECTURE.md        # System architecture details
β”‚   β”œβ”€β”€ CONTRIBUTING.md        # Contribution guidelines
β”‚   └── SECURITY.md            # Security best practices
β”‚
β”œβ”€β”€ πŸ”§ scripts/                # Setup and utility scripts
β”‚   β”œβ”€β”€ README.md              # Scripts documentation
β”‚   β”œβ”€β”€ setup.sh               # Linux/Mac setup script
β”‚   └── setup.bat              # Windows setup script
β”‚
β”œβ”€β”€ πŸ§ͺ tests/                  # Test files
β”‚   β”œβ”€β”€ README.md              # Testing documentation
β”‚   └── test_api.py            # API endpoint tests
β”‚
β”œβ”€β”€ πŸ“‹ examples/               # Sample files and usage examples
β”‚   β”œβ”€β”€ README.md              # Examples documentation
β”‚   └── sample_document.txt    # Sample document for testing
β”‚
└── πŸ“Š logs/                   # Application logs (git-ignored)
    └── api.log                # Server logs
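
The Pydantic models in src/models/schemas.py define the request and response shapes shown in the API examples above. A minimal sketch (field names inferred from the JSON examples; the actual schemas may differ):

```python
from pydantic import BaseModel

class UploadTextRequest(BaseModel):
    """Body for POST /upload/text."""
    text: str

class QueryRequest(BaseModel):
    """Body for POST /query."""
    question: str

class QueryResponse(BaseModel):
    """Answer plus the retrieved context chunks."""
    answer: str
    context: list[str]
```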

πŸ”§ Configuration

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| HF_API_KEY | Hugging Face API key | - |
| DB_DIR | Vector database directory | ./data/vectors |
| MODEL_EMBEDDING | Embedding model | sentence-transformers/all-MiniLM-L6-v2 |
| MODEL_LLM | Language model for Q&A | microsoft/phi-2 |

Customizing Models

You can change the models in .env:

Embedding Models (smaller = faster, larger = more accurate):

  • sentence-transformers/all-MiniLM-L6-v2 (default, 80MB)
  • sentence-transformers/all-mpnet-base-v2 (better quality, 420MB)

LLM Models:

  • microsoft/phi-2 (default, works without GPU)
  • mistralai/Mistral-7B-Instruct-v0.1 (better quality, requires GPU)

πŸ”’ Security Considerations

  • File Upload Limits: Implement file size limits in production
  • Rate Limiting: Add rate limiting for API endpoints
  • Authentication: Add API key authentication for production use
  • Input Sanitization: Already implemented for file types
  • CORS: Configure specific origins in production
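
For example, a file-size limit can be enforced before any chunking or embedding work begins. A sketch (hypothetical limit and helper, not present in the codebase):

```python
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # hypothetical 10 MB cap

def check_upload_size(content: bytes) -> None:
    """Reject oversized uploads before expensive processing starts."""
    if len(content) > MAX_UPLOAD_BYTES:
        raise ValueError(f"File exceeds {MAX_UPLOAD_BYTES} byte limit")
```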

πŸ› Troubleshooting

Issue: Out of Memory

Solution: Use smaller models or increase system RAM.

MODEL_LLM=microsoft/phi-2

Issue: Slow Response Times

Solution:

  1. Models are loaded lazily on first use (expected initial delay)
  2. Consider using GPU acceleration
  3. Reduce top_k in retrieval

Issue: Import Errors

Solution: Reinstall dependencies

pip install --upgrade -r requirements.txt

πŸ“ˆ Performance Optimization

  1. Use GPU: Install PyTorch with CUDA for faster inference
  2. Batch Processing: Upload multiple documents at once
  3. Caching: Models are cached after first load
  4. Quantization: Use quantized models for lower memory usage

🀝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

πŸ“„ License

This project is licensed under the MIT License.

πŸ™ Acknowledgments

  • FastAPI - Modern web framework
  • LangChain - Document processing utilities
  • Hugging Face - Pre-trained models
  • ChromaDB - Vector database
  • Sentence Transformers - Embedding models

πŸ“ž Support

For issues and questions, please open an issue on the repository.


Built with ❀️ using FastAPI and Hugging Face
