🔍 Literature Survey Engine

An advanced academic literature analysis system that integrates with the Semantic Scholar API to fetch, process, analyze, and visualize research papers and their relationships. The system provides paper recommendations, calculates author metrics, and offers interactive visualizations through a Streamlit dashboard.

🌟 Key Features

Paper Analysis: Fetch and analyze academic papers with their metadata
Author Metrics: Calculate and track author h-indices and citation counts
Paper Recommendations: Get intelligent paper recommendations based on topic similarity
Interactive Dashboard: Visualize paper relationships and metrics through Streamlit
Robust Data Management: MySQL-based storage with proper indexing and relationships
Docker Integration: Containerized database setup for easy deployment

This project is an enhanced version of the automated literature survey tool originally developed by VirtualPatientEngine. It adds interactive visualization, comprehensive author metrics, and enhanced paper recommendations.

🌟 New Features

All original features from VirtualPatientEngine's project:

Automated literature surveys using Semantic Scholar's Recommendation API
Zotero integration for reference management
Weekly automatic updates
Category-based and single-paper recommendations

Plus new advanced features:

Interactive Dashboard: Visualize paper relationships and metrics through Streamlit
Enhanced Author Metrics: Advanced h-index calculations and citation analysis
Comprehensive Database: MySQL-based storage with proper indexing
Advanced Paper Processing: Improved recommendation algorithms
Docker Integration: Containerized setup for easy deployment

🏗️ System Architecture

Component Overview

Literature Survey System
├── Data Collection Layer (Semantic Scholar API Integration)
├── Data Processing Layer (Paper & Author Analysis)
├── Storage Layer (MySQL Database)
└── Visualization Layer (Streamlit Dashboard)

Core Components

Article & Author classes: Advanced data modeling
DataFetcher: Enhanced API interactions
DatabaseManager: Robust MySQL operations
StreamlitDashboard: Interactive visualizations
Topic: Sophisticated paper organization

📋 Prerequisites

Python 3.9+
Docker and Docker Compose
MySQL 8.0+
Semantic Scholar API access
Git
Zotero account (optional)

🚀 Getting Started

For Non-Developers

Click "Use this template" to create your own repository
Prepare query.csv with your topics and papers
Upload it to the data folder
Wait for the automated deployment to finish
Configure GitHub Pages for website access

For Developers

Clone the Repository

git clone https://github.com/ansh-info/literatureSurvey.git
cd literatureSurvey

Set Up Environment

python -m venv env
source env/bin/activate  # On Windows: .\env\Scripts\activate
pip install -r requirements.txt

Configure Database
```
cp .env.example .env
./manage.sh start
```

Run the Data Collection Script

python code/literature_fetch_recommendation_api.py

Launch the Dashboard

streamlit run app/app.py

Initialize Database
- The schema is initialized automatically from 01_schema.sql
- To verify that the database was created, open a MySQL shell in the container (you will be prompted for the password):
```
docker exec -it scholar_db mysql -u scholar_user -p scholar_db
```

📊 Features in Detail

Paper Processing Pipeline

CSV input processing (paper URLs and topics)
Semantic Scholar API data fetching
Author metrics calculation
Paper relationships analysis
Recommendation generation
Data storage and indexing
Visualization generation

Metric Calculations

Paper h-index: Weighted average of the h-indices of the paper's authors
Citation Impact: Analysis of paper citation counts
Topic Relevance: Based on paper relationships
Author Influence: Combination of citations and h-index
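
The h-index behind these metrics has a standard definition: the largest h such that h papers have at least h citations each. The sketch below implements that definition and a generic weighted mean over author h-indices. The function names are hypothetical, and the weights are left to the caller because this README does not state how the project weights authors.

```python
from typing import Sequence


def h_index(citation_counts: Sequence[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(ranked, start=1):
        if citations >= rank:
            h = rank
        else:
            break  # Counts are sorted, so no later paper can qualify.
    return h


def weighted_mean_h_index(h_indices: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of author h-indices; the weights are an assumption."""
    if len(h_indices) != len(weights):
        raise ValueError("h_indices and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(h * w for h, w in zip(h_indices, weights)) / total_weight
```

For example, an author whose papers have 10, 8, 5, 4, and 3 citations has an h-index of 4, since four papers have at least four citations each but not five papers with at least five.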

Data Visualization

Paper relationship networks
Citation trends over time
Author collaboration networks
Topic-based paper clustering
Interactive metric dashboards

API Configuration

Configure Semantic Scholar API settings in utils.py:

FIELDS = "paperId,url,authors,journal,title,..."
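
As a rough illustration of how a field list like this is used, the sketch below builds a request URL for the paper endpoint of the Semantic Scholar Graph API. The helper name `build_paper_url` is hypothetical, and the field list contains only the fields named above; the elided remainder of `FIELDS` is not reproduced.

```python
from urllib.parse import quote, urlencode

# Paper endpoint of the Semantic Scholar Graph API.
GRAPH_API_BASE = "https://api.semanticscholar.org/graph/v1/paper"

# Only the fields spelled out in the README; the rest of FIELDS is elided there.
EXAMPLE_FIELDS = "paperId,url,authors,journal,title"


def build_paper_url(paper_id: str, fields: str = EXAMPLE_FIELDS) -> str:
    """Build the metadata request URL for one paper (hypothetical helper)."""
    # Keep commas unescaped so the field list stays readable in the URL.
    query = urlencode({"fields": fields}, safe=",")
    # Keep ':' and '/' so prefixed IDs such as "arXiv:..." or "DOI:..." survive.
    return f"{GRAPH_API_BASE}/{quote(paper_id, safe=':/')}?{query}"


if __name__ == "__main__":
    print(build_paper_url("arXiv:1706.03762"))
```

The resulting URL can be passed to any HTTP client. Building it separately from the request keeps the field configuration easy to inspect and test without network access.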

🛡️ Error Handling

The system implements robust error handling:

API rate limiting management
Database connection retry logic
Data integrity checks
Transaction management
Comprehensive error logging
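
Rate limiting and connection retries are commonly handled with exponential backoff. The following is a minimal sketch of that pattern, not the project's actual implementation: `retry_with_backoff`, its parameters, and the default exception types are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on the listed exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # The delay doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the retry schedule can be checked in tests without actually waiting. The same wrapper could be applied to an API call or to opening a database connection.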

📝 Usage

Data Input
- Prepare CSV file with paper URLs and topics
- Configure processing parameters
- Run the data collection script

Analysis
- Access dashboard at http://localhost:8501
- Select topics to analyze
- View paper relationships and metrics
- Export analysis results

Maintenance

# Update paper metrics
python update_h_indices.py

# Manage database

🚧 Future Improvements

Advanced caching mechanism
Enhanced recommendation algorithms
Citation network visualization
Batch processing optimization
Extended API integration
Advanced search functionality
Export capabilities
Dark/light theme toggle
AI-based recommendations for individual papers
Recommendations based on paper categories and relationships

💾 Database Schema

Tables:
- topics (id, name)
- papers (id, title, abstract, url, h_index, etc.)
- authors (id, name, h_index, citation_count)
- paper_authors (paper_id, author_id, order)
- topic_papers (topic_id, paper_id, type, use_for_recommendation)
- paper_recommendations (source_id, recommended_id, order)
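
To make the relationships between these tables concrete, here is a small sketch that recreates them in an in-memory SQLite database and lists a paper's authors in order. SQLite is swapped in for MySQL purely so the example is self-contained. Table and column names come from the list above, while every column type, key, and helper name is an assumption; the real definitions live in 01_schema.sql.

```python
import sqlite3
from typing import List

# Column names follow the README; types and keys are illustrative assumptions.
SCHEMA = """
CREATE TABLE topics (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE papers (id TEXT PRIMARY KEY, title TEXT, abstract TEXT, url TEXT, h_index REAL);
CREATE TABLE authors (id TEXT PRIMARY KEY, name TEXT, h_index INTEGER, citation_count INTEGER);
CREATE TABLE paper_authors (
    paper_id TEXT REFERENCES papers(id),
    author_id TEXT REFERENCES authors(id),
    "order" INTEGER,
    PRIMARY KEY (paper_id, author_id)
);
CREATE TABLE topic_papers (
    topic_id INTEGER REFERENCES topics(id),
    paper_id TEXT REFERENCES papers(id),
    type TEXT,
    use_for_recommendation INTEGER,
    PRIMARY KEY (topic_id, paper_id)
);
CREATE TABLE paper_recommendations (
    source_id TEXT REFERENCES papers(id),
    recommended_id TEXT REFERENCES papers(id),
    "order" INTEGER,
    PRIMARY KEY (source_id, recommended_id)
);
"""


def create_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketched schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def authors_of(conn: sqlite3.Connection, paper_id: str) -> List[str]:
    """Return a paper's author names sorted by their position in the author list."""
    rows = conn.execute(
        'SELECT a.name FROM paper_authors AS pa '
        'JOIN authors AS a ON a.id = pa.author_id '
        'WHERE pa.paper_id = ? ORDER BY pa."order"',
        (paper_id,),
    )
    return [name for (name,) in rows]
```

Note that `order` is an SQL keyword, so it has to be quoted wherever it is used as a column name (double quotes in SQLite, backticks in MySQL).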

🔧 Configuration

Environment Variables

MYSQL_DATABASE=scholar_db
MYSQL_USER=scholar_user
MYSQL_PASSWORD=scholar_pass
MYSQL_ROOT_PASSWORD=rootpass
MYSQL_PORT=3306
ZOTERO_API_KEY=your_api_key
LIBRARY_ID=your_library_id
TEST_COLLECTION_KEY=your_collection_key
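
One way application code might consume these variables is sketched below. The helper `load_db_config` is hypothetical, and `MYSQL_HOST` is an assumed extra variable that does not appear in the list above; it defaults to localhost here.

```python
import os
from typing import Dict, Mapping, Union


def load_db_config(env: Mapping[str, str] = os.environ) -> Dict[str, Union[str, int]]:
    """Collect MySQL connection settings from environment variables."""
    required = ("MYSQL_DATABASE", "MYSQL_USER", "MYSQL_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "host": env.get("MYSQL_HOST", "localhost"),  # Assumed variable, not in the README.
        "port": int(env.get("MYSQL_PORT", "3306")),
        "database": env["MYSQL_DATABASE"],
        "user": env["MYSQL_USER"],
        "password": env["MYSQL_PASSWORD"],
    }
```

The returned keys match the keyword names accepted by common Python MySQL drivers such as mysql-connector-python, so the dictionary could be unpacked directly into a connect call. Failing early on missing variables gives a clearer error than a refused connection later.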

📊 Features in Detail

Interactive paper relationship networks
Citation trend analysis
Author collaboration visualization
Topic-based clustering
Advanced metric calculations
Automated weekly updates

🤝 Contributing

Fork the repository
Create feature branch (git checkout -b feature/AmazingFeature)
Commit changes (git commit -m 'Add AmazingFeature')
Push to branch (git push origin feature/AmazingFeature)
Open Pull Request

🐞 Bugs and Feature Requests

Please report bugs and request features via GitHub Issues.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

VirtualPatientEngine for the original literature survey tool
Semantic Scholar API
Streamlit
MySQL
Zotero
MkDocs
