An advanced academic literature analysis system that integrates with the Semantic Scholar API to fetch, process, analyze, and visualize research papers and their relationships. The system provides paper recommendations, calculates author metrics, and offers interactive visualizations through a Streamlit dashboard.
- Paper Analysis: Fetch and analyze academic papers with their metadata
- Author Metrics: Calculate and track author h-indices and citation counts
- Paper Recommendations: Get intelligent paper recommendations based on topic similarity
- Interactive Dashboard: Visualize paper relationships and metrics through Streamlit
- Robust Data Management: MySQL-based storage with proper indexing and relationships
- Docker Integration: Containerized database setup for easy deployment
An enhanced version of the automated literature survey tool, building upon VirtualPatientEngine's original project. This version adds advanced features including interactive visualization, comprehensive author metrics, and enhanced paper recommendations.
All original features from VirtualPatientEngine's project:
- Automated literature surveys using Semantic Scholar's Recommendation API
- Zotero integration for reference management
- Weekly automatic updates
- Category-based and single-paper recommendations
Plus new advanced features:
- Interactive Dashboard: Visualize paper relationships and metrics through Streamlit
- Enhanced Author Metrics: Advanced h-index calculations and citation analysis
- Comprehensive Database: MySQL-based storage with proper indexing
- Advanced Paper Processing: Improved recommendation algorithms
- Docker Integration: Containerized setup for easy deployment
Literature Survey System
├── Data Collection Layer (Semantic Scholar API Integration)
├── Data Processing Layer (Paper & Author Analysis)
├── Storage Layer (MySQL Database)
└── Visualization Layer (Streamlit Dashboard)
- Article & Author classes: Advanced data modeling
- DataFetcher: Enhanced API interactions
- DatabaseManager: Robust MySQL operations
- StreamlitDashboard: Interactive visualizations
- Topic: Sophisticated paper organization
- Python 3.9+
- Docker and Docker Compose
- MySQL 8.0+
- Semantic Scholar API access
- Git
- Zotero account (optional)
- Click "Use this template" to create your repository
- Prepare query.csv with your topics and papers
- Upload it to the data folder
- Wait for automated deployment
- Configure GitHub Pages for website access
- Clone the Repository
  git clone https://github.com/ansh-info/literatureSurvey.git
  cd literatureSurvey
- Set Up Environment
  python -m venv env
  source env/bin/activate  # On Windows: .\env\Scripts\activate
  pip install -r requirements.txt
- Configure Database
  cp .env.example .env
  ./manage.sh start
- Run the Application
  python code/literature_fetch_recommendation_api.py
  # Run the Application
  streamlit run app/app.py
- Initialize Database
  - The schema will be automatically initialized using 01_schema.sql
  - Verify database creation:
    docker exec -it scholar_db mysql -u scholar_user -p scholar_db
- CSV input processing (paper URLs and topics)
- Semantic Scholar API data fetching
- Author metrics calculation
- Paper relationships analysis
- Recommendation generation
- Data storage and indexing
- Visualization generation
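The first step of this pipeline, CSV input processing, could look roughly like the sketch below. The column names `topic` and `paper_url` are assumptions made for illustration (the actual layout of query.csv is not specified here), and the URLs are placeholders.

```python
import csv
import io
from collections import defaultdict


def load_queries(csv_file):
    """Group paper URLs by topic from an open CSV file object.

    Assumes hypothetical columns named 'topic' and 'paper_url'.
    Rows missing either value are skipped.
    """
    topics = defaultdict(list)
    for row in csv.DictReader(csv_file):
        topic = (row.get("topic") or "").strip()
        url = (row.get("paper_url") or "").strip()
        if topic and url:
            topics[topic].append(url)
    return dict(topics)


# Placeholder data standing in for a real query.csv
sample = io.StringIO(
    "topic,paper_url\n"
    "topic A,https://example.org/paper/1\n"
    "topic A,https://example.org/paper/2\n"
    "topic B,https://example.org/paper/3\n"
    "topic B,\n"
)
queries = load_queries(sample)
```

Grouping by topic up front keeps the later steps (fetching, recommendation) free to iterate one topic at a time.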
- H-index: Weighted average of author h-indices
- Citation Impact: Analysis of paper citation counts
- Topic Relevance: Based on paper relationships
- Author Influence: Combination of citations and h-index
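As a rough illustration of the first metric, the sketch below computes a standard h-index from citation counts and then averages several author h-indices. The weighting scheme is not documented here, so the `weights` parameter and its equal-weight default are assumptions, and both function names are hypothetical.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def weighted_h_index(h_indices, weights=None):
    """Weighted average of author h-indices.

    Equal weights are used when none are given (an assumption; the
    project's actual weights may differ).
    """
    if not h_indices:
        return 0.0
    if weights is None:
        weights = [1.0] * len(h_indices)
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(h * w for h, w in zip(h_indices, weights)) / total
```

For example, `h_index([10, 8, 5, 4, 3])` is 4, because four papers have at least four citations each but there are not five papers with at least five.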
- Paper relationship networks
- Citation trends over time
- Author collaboration networks
- Topic-based paper clustering
- Interactive metric dashboards
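An author collaboration network can be derived from paper author lists by counting how often each pair of authors appears on the same paper. The sketch below shows one way to build such weighted edges; the function name and input shape (a dict of paper ID to author names) are assumptions, not the dashboard's actual code.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(papers):
    """Count shared papers for every pair of co-authors.

    `papers` maps a paper ID to its list of author names. Each edge key
    is an alphabetically ordered (author, author) tuple.
    """
    edges = Counter()
    for authors in papers.values():
        # set() guards against an author listed twice on one paper
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


# Placeholder data
example = {"p1": ["Ada", "Bo", "Cy"], "p2": ["Bo", "Ada"]}
edges = coauthor_edges(example)
```

The resulting counts can serve as edge weights in whatever graph or plotting library renders the network.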
Configure Semantic Scholar API settings in utils.py:
FIELDS = "paperId,url,authors,journal,title,..."
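To show how a field list like this is typically used, the sketch below builds a request URL for the Semantic Scholar Graph API paper endpoint, which accepts a comma-separated `fields` query parameter. Only the fields visible above are included, since the original list is truncated; `build_paper_url` is a hypothetical helper, not a function from utils.py.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.semanticscholar.org/graph/v1/paper"
# Only the fields visible in the README; the full list is elided there.
FIELDS = "paperId,url,authors,journal,title"


def build_paper_url(paper_id, fields=FIELDS):
    """Return the paper-details URL for `paper_id` requesting `fields`."""
    # safe="," keeps the field list readable instead of encoding commas
    query = urlencode({"fields": fields}, safe=",")
    return f"{BASE_URL}/{quote(paper_id, safe=':/')}?{query}"


url = build_paper_url("abc123")  # "abc123" is a placeholder paper ID
```

Requesting only the fields the pipeline needs keeps responses small, which matters when the API is rate limited.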
The system implements robust error handling:
- API rate limiting management
- Database connection retry logic
- Data integrity checks
- Transaction management
- Comprehensive error logging
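Rate-limit handling and connection retries are commonly implemented with exponential backoff. The sketch below is one minimal version of that idea, not the project's actual implementation; `RateLimitError` and `call_with_retry` are hypothetical names, and the `sleep` parameter exists only so the waiting can be replaced in tests.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error raised when the API signals too many requests."""


def call_with_retry(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `func`, retrying on RateLimitError with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts
    and re-raises the error once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

The same wrapper shape could guard a database connection attempt by catching the driver's connection error instead.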
- Data Input
  - Prepare CSV file with paper URLs and topics
  - Configure processing parameters
  - Run data collection script
- Analysis
  - Access dashboard at http://localhost:8501
  - Select topics to analyze
  - View paper relationships and metrics
  - Export analysis results
- Maintenance
  # Update paper metrics
  python update_h_indices.py
  # Manage database
- Advanced caching mechanism
- Enhanced recommendation algorithms
- Citation network visualization
- Batch processing optimization
- Extended API integration
- Advanced search functionality
- Export capabilities
- Dark/light theme toggle
- AI-based recommendations for individual papers
- Recommendations based on paper categories and relationships
Tables:
- topics (id, name)
- papers (id, title, abstract, url, h_index, etc.)
- authors (id, name, h_index, citation_count)
- paper_authors (paper_id, author_id, order)
- topic_papers (topic_id, paper_id, type, use_for_recommendation)
- paper_recommendations (source_id, recommended_id, order)
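To make the relationships between these tables concrete, the sketch below recreates three of them in an in-memory SQLite database and joins them. SQLite is used here only as a self-contained stand-in for MySQL; the column types, the `type` values, and the `seed_papers` helper are assumptions, and only a subset of the listed columns is included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE topics (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE papers (id TEXT PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE topic_papers (
        topic_id INTEGER REFERENCES topics(id),
        paper_id TEXT REFERENCES papers(id),
        type TEXT,
        use_for_recommendation INTEGER DEFAULT 0,
        PRIMARY KEY (topic_id, paper_id)
    );
    """
)

# Placeholder rows
conn.execute("INSERT INTO topics (id, name) VALUES (1, 'example topic')")
conn.executemany(
    "INSERT INTO papers (id, title) VALUES (?, ?)",
    [("p1", "First"), ("p2", "Second")],
)
conn.executemany(
    "INSERT INTO topic_papers VALUES (?, ?, ?, ?)",
    [(1, "p1", "seed", 1), (1, "p2", "recommended", 0)],
)


def seed_papers(connection, topic_name):
    """Titles of papers in a topic that are flagged for recommendations."""
    rows = connection.execute(
        """
        SELECT p.title
        FROM papers p
        JOIN topic_papers tp ON tp.paper_id = p.id
        JOIN topics t ON t.id = tp.topic_id
        WHERE t.name = ? AND tp.use_for_recommendation = 1
        ORDER BY p.title
        """,
        (topic_name,),
    ).fetchall()
    return [title for (title,) in rows]
```

One detail to keep in mind when writing real queries against this schema: `order` is a reserved word in SQL, so a column with that name has to be quoted (with backticks in MySQL).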
MYSQL_DATABASE=scholar_db
MYSQL_USER=scholar_user
MYSQL_PASSWORD=scholar_pass
MYSQL_ROOT_PASSWORD=rootpass
MYSQL_PORT=3306
ZOTERO_API_KEY=your_api_key
LIBRARY_ID=your_library_id
TEST_COLLECTION_KEY=your_collection_key
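Application code can read these variables from the environment once the .env file is loaded. The sketch below shows one hedged way to collect the MySQL settings; `load_db_settings` is a hypothetical helper, and the optional `env` argument exists so a plain dict can be passed in place of `os.environ`.

```python
import os


def load_db_settings(env=None):
    """Collect MySQL connection settings from environment variables.

    Raises KeyError if a required variable is missing, so that a
    misconfigured .env fails early rather than at query time.
    """
    env = os.environ if env is None else env
    return {
        "database": env["MYSQL_DATABASE"],
        "user": env["MYSQL_USER"],
        "password": env["MYSQL_PASSWORD"],
        # Fall back to the port shown in the README if unset
        "port": int(env.get("MYSQL_PORT", "3306")),
    }


settings = load_db_settings(
    {
        "MYSQL_DATABASE": "scholar_db",
        "MYSQL_USER": "scholar_user",
        "MYSQL_PASSWORD": "scholar_pass",
    }
)
```

The sample credentials above are the README's example values and should be replaced in any real deployment.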
- Interactive paper relationship networks
- Citation trend analysis
- Author collaboration visualization
- Topic-based clustering
- Advanced metric calculations
- Automated weekly updates
- Fork the repository
- Create feature branch (git checkout -b feature/AmazingFeature)
- Commit changes (git commit -m 'Add AmazingFeature')
- Push to branch (git push origin feature/AmazingFeature)
- Open Pull Request
Please report bugs and request features via GitHub Issues.
This project is licensed under the MIT License - see the LICENSE file for details.
- VirtualPatientEngine for the original literature survey tool
- Semantic Scholar API
- Streamlit
- MySQL
- Zotero
- MkDocs