A powerful search comparison tool that aggregates and analyzes results from multiple academic search engines, with advanced query understanding and result comparison capabilities.
Complete documentation suite for all stakeholders:
- ONBOARDING.md - START HERE - Complete setup and onboarding guide
- DOCUMENTATION.md - Detailed user guides and workflows
- API_REFERENCE.md - Complete API documentation with examples
- MAINTENANCE_GUIDE.md - Operations, troubleshooting, and monitoring
- QUICK_REFERENCE.md - Quick reference for common tasks
- Research Users: Start with ONBOARDING.md → User Guide in DOCUMENTATION.md
- SciX Engineers: ONBOARDING.md → Developer Guide in DOCUMENTATION.md → MAINTENANCE_GUIDE.md
- Research Scientists: ONBOARDING.md → Scientist Guide in DOCUMENTATION.md
- Operations Teams: MAINTENANCE_GUIDE.md for deployment and monitoring
- Architecture Overview: System design and component interactions
- API Integration: Complete REST API documentation with examples
- LLM Configuration: Query intent and natural language processing
- Performance Tuning: Caching, monitoring, and optimization
```bash
# Start both frontend and backend (with logging)
./startup_with_logs.sh

# Stop servers when done
./stop_servers.sh

# Access the application
# Frontend: http://localhost:3001
# Backend API: http://localhost:8001
# API Docs: http://localhost:8001/api/docs
```

This tool provides a unified interface for searching across multiple academic search engines, including:
- NASA Astrophysics Data System (ADS)
- Google Scholar
- Semantic Scholar
- Web of Science
It features intelligent query understanding, result comparison, and caching mechanisms to provide efficient and relevant search results.
- Coordinates search operations across different search engines
- Handles fallback mechanisms when primary search methods fail
- Computes similarity metrics between results from different sources
- Provides paper detail retrieval across multiple sources
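The coordinator's fallback behavior can be sketched as follows. This is an illustrative sketch, not the actual service API: `search_with_fallback` and the engine callables are hypothetical names standing in for the real search-engine services.

```python
def search_with_fallback(query, engines):
    """Try each (name, engine) pair in priority order.

    Returns the first non-empty result set, falling through to the
    next engine when one fails or returns nothing.
    """
    for name, engine in engines:
        try:
            results = engine(query)
            if results:
                return name, results
        except Exception:
            continue  # engine unavailable; try the next one
    return None, []  # all engines failed or returned nothing
```

In the real coordinator each engine would be one of the service clients described below (ADS, Google Scholar, Semantic Scholar, Web of Science).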
- Interprets and transforms user queries using LLM-based intent detection
- Components:
  - `service.py`: Main query intent service implementation
  - `llm_service.py`: Handles LLM interactions for query understanding
  - `cache_service.py`: Caches query transformations and results
  - `documentation_service.py`: Provides documentation for query transformations
- Manages interactions with lightweight open-source LLMs
- Supports multiple providers (Ollama, HuggingFace, OpenAI)
- Handles prompt formatting and response processing
- Implements query transformation logic
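A minimal sketch of how prompt formatting and query transformation might fit together; the function names and prompt wording here are illustrative, not the service's real implementation:

```python
def build_prompt(query: str) -> str:
    """Wrap the raw user query in an instruction for the model."""
    return (
        "Rewrite this academic search query into a structured form, "
        "expanding astronomy abbreviations where possible:\n\n" + query
    )

def transform_query(query: str, llm_call) -> str:
    """Format the prompt, call the configured provider, clean the reply.

    `llm_call` is whatever callable the configured provider (Ollama,
    HuggingFace, or OpenAI) exposes for a single completion.
    """
    raw = llm_call(build_prompt(query))
    return " ".join(raw.split())  # normalize whitespace in the response
```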
- Handles interactions with NASA's Astrophysics Data System
- Manages API authentication and request formatting
- Processes ADS-specific search results
- Interfaces with Google Scholar
- Implements fallback mechanisms using Scholarly or direct HTML scraping
- Handles proxy management for rate limiting
- Manages interactions with Semantic Scholar API
- Processes academic paper metadata and citations
- Interfaces with Web of Science API
- Handles authentication and result processing
- Implements LRU (Least Recently Used) caching with TTL support
- Caches query transformations and search results
- Improves performance and reduces redundant processing
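A minimal sketch of an LRU cache with TTL, assuming eviction of the least recently used entry once `max_size` is exceeded and lazy expiry on read (class and parameter names are illustrative):

```python
import time
from collections import OrderedDict

class LRUCacheTTL:
    """Least-recently-used cache with a per-entry time-to-live."""

    def __init__(self, max_size=128, ttl=300.0):
        self.max_size = max_size
        self.ttl = ttl
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._data[key]  # expired; drop lazily on read
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = (time.monotonic() + self.ttl, value)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used
```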
- Applies various boost factors to search results
- Considers citation count, publication recency, and document type
- Enhances result relevance and ranking
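One way such boosting can combine, as a hedged sketch; the weights, the log-scaled citation boost, and the document-type table below are illustrative placeholders, not the project's configured values:

```python
import math
from datetime import date

def boosted_score(base_score, citation_count, year, doc_type,
                  current_year=None):
    """Multiply a base relevance score by citation, recency, and type boosts."""
    current_year = current_year or date.today().year
    # Log-scale citations so heavily cited papers don't dominate entirely.
    citation_boost = 1.0 + 0.1 * math.log1p(citation_count)
    # Linearly decay older papers, floored so they are never zeroed out.
    recency_boost = max(0.5, 1.0 - 0.02 * (current_year - year))
    # Prefer some document types over others (values are examples).
    type_boost = {"article": 1.0, "review": 1.1}.get(doc_type, 0.9)
    return base_score * citation_boost * recency_boost * type_boost
```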
- Integrates with Quepid API for search evaluation
- Manages cases, judgments, and result evaluation
- Provides search quality metrics
backend/
├── app/
│   ├── api/               # API models and endpoints
│   ├── core/              # Core configuration and settings
│   ├── routes/            # API route definitions
│   ├── services/          # Service implementations
│   │   ├── llm/           # LLM-related services
│   │   └── query_intent/  # Query intent services
│   └── utils/             # Utility functions
├── tests/                 # Test suite
└── scripts/               # Utility scripts
- Unified Search Interface
  - Single interface for multiple academic search engines
  - Consistent result formatting across sources
  - Fallback mechanisms for reliability
- Intelligent Query Understanding
  - LLM-based query intent detection
  - Query transformation for improved results
  - Support for astronomy-specific terminology
- Result Comparison
  - Similarity metrics between results
  - Citation count analysis
  - Publication date comparison
  - Document type analysis
  - Frontend title overlap calculation: client-side title matching with normalization for accurate overlap counts
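The production overlap count runs client-side, but the normalization idea can be sketched in a few lines of Python; the function names and the exact normalization rules (lowercase, strip punctuation, collapse whitespace) are assumptions for illustration:

```python
import re

def normalize_title(title):
    """Lowercase, drop punctuation, and collapse whitespace."""
    title = re.sub(r"[^\w\s]", "", title.lower())
    return re.sub(r"\s+", " ", title).strip()

def title_overlap(results_a, results_b):
    """Count titles appearing in both result lists after normalization."""
    titles_a = {normalize_title(r["title"]) for r in results_a}
    titles_b = {normalize_title(r["title"]) for r in results_b}
    return len(titles_a & titles_b)
```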
- Performance Optimization
  - LRU caching with TTL
  - Query transformation caching
  - Result caching
  - Proxy management for rate limiting
- Search Quality Evaluation
  - Integration with Quepid for search evaluation
  - Result ranking analysis
  - Quality metrics computation
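As one example of the kind of metric computed from Quepid-style relevance judgments, precision@k is straightforward to implement (this sketch is illustrative; the project may compute different metrics):

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k returned documents judged relevant."""
    top_k = ranked_ids[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant_ids) / len(top_k)
```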
The application uses environment variables for configuration. Key settings include:
- `ADS_API_KEY`: NASA ADS API key
- `LLM_PROVIDER`: LLM service provider (ollama, huggingface, openai); defaults to `ollama`
- `LLM_MODEL_NAME`: Model name for the LLM service; defaults to `phi:2.7b`
- `LLM_TEMPERATURE`: Temperature setting for LLM generation
- `LLM_MAX_TOKENS`: Maximum tokens for LLM generation
Note: Ollama must be installed and running. The ./startup_with_logs.sh script automatically configures all services.
- `CACHE_TTL`: Cache time-to-live in seconds
- `CACHE_MAX_SIZE`: Maximum cache size
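Settings like these are typically read once at startup. A minimal sketch follows: the provider and model defaults match the README, while the numeric fallback values are illustrative placeholders, not the project's actual defaults.

```python
import os

def load_settings(env=None):
    """Read configuration from environment variables with defaults."""
    env = os.environ if env is None else env
    return {
        "ads_api_key": env.get("ADS_API_KEY", ""),
        "llm_provider": env.get("LLM_PROVIDER", "ollama"),
        "llm_model_name": env.get("LLM_MODEL_NAME", "phi:2.7b"),
        "llm_temperature": float(env.get("LLM_TEMPERATURE", "0.2")),
        "llm_max_tokens": int(env.get("LLM_MAX_TOKENS", "512")),
        "cache_ttl": int(env.get("CACHE_TTL", "300")),
        "cache_max_size": int(env.get("CACHE_MAX_SIZE", "1000")),
    }
```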
- Python 3.8+
- Node.js and npm/pnpm
- API keys for required services
- Ollama installed and running (`ollama serve`)
- Clone the repository
- Install dependencies:
```bash
pip install -r requirements.txt
cd frontend && npm install
```
- Set up environment variables
- Run the application:
./startup_with_logs.sh
Run tests using pytest:

```bash
pytest backend/tests/
```

- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.