A full-stack AI-powered research assistant that accepts voice queries, searches academic papers, generates summaries, and persists conversations to Notion. Built with Whisper (ASR), Llama 3.2 (LLM), HuggingFace Transformers (Summarization), and CosyVoice (TTS).
- Multi-Modal Input: Voice (audio) or text queries
- Academic Search: Semantic search over ArXiv papers
- Auto-Summarization: Condense research papers with HuggingFace transformers
- Conversational Context: Follow-up questions with session memory
- Notion Integration: Persist conversations and summaries to a Notion database
- Text-to-Speech: Multiple TTS backends (system, pyttsx3, CosyVoice)
- Hardware Agnostic: Auto-detects CUDA/MPS/CPU and optimizes accordingly
- Function Calling: LLM intelligently routes to tools (search, summarize, etc.)
- RESTful API: FastAPI backend with OpenAPI documentation
- Interactive UI: Streamlit frontend with audio I/O
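The hardware auto-detection above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `utils/hardware.py`; it assumes PyTorch is the framework being probed and falls back to CPU when it is unavailable.

```python
import platform


def detect_device() -> str:
    """Pick the best available accelerator, falling back to CPU.

    Hypothetical sketch of the CUDA/MPS/CPU auto-detection described
    above; the real utils/hardware.py may differ.
    """
    try:
        import torch  # optional dependency in this sketch

        if torch.cuda.is_available():
            return "cuda"  # NVIDIA GPU (Linux server)
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "mps"  # Apple Silicon (macOS M3/M4)
    except ImportError:
        pass
    return "cpu"


print(detect_device())
```

Model-loading code can then pass the returned string as the `device` argument when placing Whisper or the summarizer.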
- Demo
- Architecture
- Installation
- Quick Start
- Usage
- API Documentation
- Configuration
- Testing
- Project Structure
- Contributing
- License
User (voice): "What is quantum entanglement?"
  → [Whisper transcribes]
Assistant: Searching ArXiv...
  → [Finds 3 relevant papers]
  → [Summarizes findings]
Assistant (voice): "Quantum entanglement is a phenomenon where particles..."
  → [Saves to Notion]
✅ Conversation synced to Notion

User: "Tell me more about the second paper"
  → [Uses session context to understand "second paper"]
Assistant: "The second paper, 'Quantum Teleportation...', discusses..."
See ARCHITECTURE.md for detailed system design and workflow diagrams.
User Interface (Streamlit)
           │
    FastAPI Backend
           │
       ┌───┴───┐
       │       │
      LLM   Session Manager
       │
 Function Router
       │
  ┌────┴────────┬────────┐
  │             │        │
Search      Summarize  Notion
(ArXiv)       (HF)     (API)
- ASR: Whisper (base/large-v3) for speech-to-text
- LLM: Llama 3.2 via Ollama for query understanding and function calling
- Search: ArXiv API for academic paper retrieval
- Summarization: HuggingFace BART for text condensation
- TTS: System (macOS 'say'), pyttsx3, or CosyVoice for speech synthesis
- Persistence: Notion API for conversation storage
- Frontend: Streamlit with audio input/output
- Backend: FastAPI with async endpoints
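The function-calling flow in the components above can be illustrated with a small dispatcher: the LLM emits a tool name plus arguments, and the router invokes the matching handler. The tool names (`search_arxiv`, `summarize`) mirror the tools listed in this README, but the handlers and the JSON shape here are simplified placeholders; the real `function_router.py` may differ.

```python
import json


# Placeholder tool implementations standing in for the real
# agent_tools.py handlers.
def search_arxiv(query: str, limit: int = 3) -> str:
    return f"Found {limit} papers on {query!r}"


def summarize(text: str) -> str:
    return text[:60] + "..."


TOOLS = {"search_arxiv": search_arxiv, "summarize": summarize}


def route(llm_output: str) -> str:
    """Parse an LLM function call such as
    {"name": "search_arxiv", "args": {"query": "...", "limit": 3}}
    and invoke the matching tool with its arguments."""
    call = json.loads(llm_output)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return fn(**call.get("args", {}))


print(route('{"name": "search_arxiv", "args": {"query": "quantum entanglement"}}'))
# → Found 3 papers on 'quantum entanglement'
```

Llama 3.2 via Ollama supports structured tool-call output, which is what makes this kind of dispatch reliable in practice.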
- macOS on Apple Silicon (M3/M4) for development, or Linux with an NVIDIA GPU for production
- Python 3.10
- Conda/Miniconda
- Ollama (for LLM)
# 1. Clone repository
git clone https://github.com/christinezhaogmail/ai-research-assistant.git
cd ai-research-assistant
# 2. Create conda environment
conda env create -f requirements/env_mac.yml
conda activate ai-research-assistant-mac
# 3. Install Ollama
brew install ollama
ollama serve &
ollama pull llama3.2
# 4. Configure environment
cp .env.example .env
# Edit .env with your settings
# 5. Run tests
python test/test_all.py
# 6. Start application
python backend.py &
streamlit run frontend.py

For detailed installation instructions, see INSTALLATION.md.
python backend.py

The API will be available at:
- API: http://localhost:8000
- Docs: http://localhost:8000/docs (interactive Swagger UI)
- Health: http://localhost:8000/health
streamlit run frontend.py

The UI will be available at: http://localhost:8501
- Open http://localhost:8501
- Click "Record your question" to use voice input
- Or type your question in the text box
- View response with audio playback
- Click "Sync to Notion" to save conversation
# Text query
curl -X POST http://localhost:8000/ask \
-F "text=What is quantum entanglement?"
# With session ID
curl -X POST http://localhost:8000/ask \
-F "text=Tell me more" \
-F "session_id=abc-123-def"
# Sync to Notion
curl -X POST http://localhost:8000/notion-sync \
-F "session_id=abc-123-def"
# Check status
curl http://localhost:8000/status

- Enable voice mode in the sidebar
- Select TTS backend: system (fastest), pyttsx3, or cosyvoice
- Click "Record your question"
- Speak your query
- Wait for transcription (Whisper)
- View response with audio playback
- Type your question in the chat input
- Press Enter
- View response with details (function calls, timing)
The assistant maintains conversation context:
Query 1: "What is quantum entanglement?"
  → Returns 3 papers
Query 2: "Tell me more about the second paper"
  → Uses context to understand "second paper"
Query 3: "What about applications?"
  → Continues the conversation thread
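The session memory that makes follow-ups like "the second paper" resolvable can be illustrated with a toy in-memory store: each session keeps its message history, which is replayed to the LLM on every turn. This is a hypothetical sketch; the actual `utils/session_manager.py` is more elaborate.

```python
from collections import defaultdict


class SessionManager:
    """Toy per-session conversation memory."""

    def __init__(self) -> None:
        self._sessions: dict[str, list[dict]] = defaultdict(list)

    def add(self, session_id: str, role: str, content: str) -> None:
        # Append one chat message to the session's history.
        self._sessions[session_id].append({"role": role, "content": content})

    def history(self, session_id: str) -> list[dict]:
        # Return the full message list to prepend to the next LLM prompt.
        return list(self._sessions[session_id])


sm = SessionManager()
sm.add("abc-123", "user", "What is quantum entanglement?")
sm.add("abc-123", "assistant", "Found 3 papers...")
sm.add("abc-123", "user", "Tell me more about the second paper")
print(len(sm.history("abc-123")))  # → 3
```

Because the assistant's earlier answer (listing the papers) is part of the replayed history, the LLM can resolve the referent of "the second paper" without any extra lookup.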
- Set up Notion integration (see Configuration)
- Have a conversation
- Click "Sync to Notion" or call the /notion-sync endpoint
- View in Notion with summary and full transcript
Returns system health and available services.
{
"status": "healthy",
"services": {
"llm": "ollama/llama3.2",
"stt": "whisper",
"tts": "system",
"tools": ["search_arxiv", "summarize"],
"notion_sync": true
}
}

Returns session status and information.
{
"status": "healthy",
"session": {
"session_id": "abc-123-def",
"query_count": 3,
"message_count": 6,
"created_at": "2024-01-17T10:00:00"
}
}

Main research assistant endpoint.
Request:
curl -X POST http://localhost:8000/ask \
-F "text=What is quantum entanglement?" \
-F "session_id=abc-123" \
  -F "include_summary=true"

Response:
{
"success": true,
"session_id": "abc-123-def",
"query_text": "What is quantum entanglement?",
"response_text": "Found 3 papers on quantum entanglement...",
"summary": "Quantum entanglement is a phenomenon...",
"is_function_call": true,
"function_name": "search_arxiv",
"function_args": {"query": "quantum entanglement", "limit": 3},
"processing_time": 2.5,
"query_count": 1
}

Sync conversation to Notion.
Request:
curl -X POST http://localhost:8000/notion-sync \
-F "session_id=abc-123-def" \
  -F "include_summary=true"

Response:
{
"success": true,
"session_id": "abc-123-def",
"notion_url": "https://notion.so/page-xyz",
"message": "Session synced successfully"
}

For complete API documentation, visit http://localhost:8000/docs when the backend is running.
Create a .env file in the project root:
# LLM Configuration
OLLAMA_BASE_URL=http://localhost:11434
LLM_MODEL=llama3.2
LLM_TEMPERATURE=0.7
# ASR Configuration
WHISPER_MODEL=base # or large-v3 on GPU
# TTS Configuration
TTS_BACKEND=system # or pyttsx3, cosyvoice
COSYVOICE_PATH=/home/jovyan/CosyVoice
COSYVOICE_MODEL_DIR=/home/jovyan/CosyVoice/pretrained_models/CosyVoice-300M-SFT
# Notion Integration (Optional)
NOTION_TOKEN=ntn_xyz123...
NOTION_DATABASE_ID=abc123def456...
# ArXiv Search
ARXIV_MAX_RESULTS=3
# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
STREAMLIT_PORT=8501
# Logging
LOG_LEVEL=INFO

1. Create Integration:
   - Go to https://www.notion.so/my-integrations
   - Click "New integration"
   - Copy the "Internal Integration Token"
2. Create Database:
   - Create a new database in Notion
   - Add properties: Session ID (Text), Date (Date), Query Count (Number)
   - Share the database with your integration
3. Get Database ID:
   - Open the database in a browser
   - Copy the ID from the URL: notion.so/workspace/DATABASE_ID?v=...
4. Set Environment Variables:
   export NOTION_TOKEN="ntn_xyz..."
   export NOTION_DATABASE_ID="abc123..."
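Since Notion integration is optional, it helps to check the credentials before attempting a sync. The helper below reads the same variables the .env file defines; `notion_configured` is a hypothetical name for illustration, not part of the codebase.

```python
import os


def notion_configured() -> bool:
    """True when both Notion credentials are present and non-empty.

    Reads the NOTION_TOKEN and NOTION_DATABASE_ID variables defined in
    the .env file above.
    """
    return bool(os.environ.get("NOTION_TOKEN")) and bool(
        os.environ.get("NOTION_DATABASE_ID")
    )
```

A /notion-sync handler can call this up front and return a clear error instead of failing mid-request when credentials are missing.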
python test/test_all.py

# Hardware detection
python test/test_hardware.py
# Session management
python test/test_session_manager.py
# Academic search
python test/test_search.py
# Summarization (downloads model first run)
python test/test_summarize.py
# Notion sync (requires credentials)
python test/test_notion.py
# API models
python test/test_api_models.py

See test/README.md for detailed testing documentation.
ai-research-assistant/
├── backend.py              # FastAPI backend entry point
├── frontend.py             # Streamlit frontend
├── api.py                  # Pydantic data models
├── config.py               # Configuration management
├── llm_service.py          # LLM integration (Ollama)
├── function_router.py      # Function call routing
├── agent_tools.py          # LangChain tools (search_arxiv, summarize)
├── audio_service.py        # Legacy audio services
│
├── models/                 # AI model layer
│   ├── __init__.py
│   ├── asr.py              # VoiceTranscriber (Whisper)
│   └── tts.py              # VoiceSynthesizer (CosyVoice/system)
│
├── tools/                  # Tool layer
│   ├── __init__.py
│   ├── search.py           # AcademicSearch (ArXiv)
│   ├── summarize.py        # ContentSummarizer (HuggingFace)
│   └── notion.py           # NotionSync (Notion API)
│
├── utils/                  # Utility layer
│   ├── __init__.py
│   ├── hardware.py         # Hardware detection (CUDA/MPS/CPU)
│   ├── logger.py           # Logging and tool call wrapping
│   └── session_manager.py  # Session and conversation management
│
├── test/                   # Test suite
│   ├── README.md
│   ├── test_all.py
│   ├── test_hardware.py
│   ├── test_session_manager.py
│   ├── test_search.py
│   ├── test_summarize.py
│   ├── test_notion.py
│   └── test_api_models.py
│
├── logs/                   # Application logs
├── .env                    # Environment variables (not in git)
├── .env.example            # Example environment file
├── requirements.txt        # Python dependencies
├── requirements/env_mac.yml    # Conda environment (macOS)
├── requirements/env_server.yml # Conda environment (GPU server)
├── ARCHITECTURE.md         # System architecture documentation
├── INSTALLATION.md         # Detailed installation guide
└── README.md               # This file
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
# Install development dependencies
pip install -r requirements-dev.txt
# Run tests before committing
python test/test_all.py
# Format code
black .
isort .

This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper for speech recognition
- Meta Llama for language understanding
- HuggingFace for summarization models
- ArXiv for academic paper access
- Notion for knowledge management
- FastAPI for API framework
- Streamlit for web interface
For issues, questions, or suggestions:
- π§ Email: christine.hiaiperf@gmail.com
- π Issues: GitHub Issues
- π Documentation: Full docs
- Multi-language support
- PubMed and Semantic Scholar integration
- Vector database for semantic caching
- Voice cloning with reference audio
- Mobile app (React Native)
- Docker containerization
- Cloud deployment templates (AWS, GCP, Azure)
- Citation management integration (Zotero, Mendeley)
Built with ❤️ for researchers and AI enthusiasts