A "Set-and-Forget" Autonomous Research Assistant.
I built this agent because I didn't want to miss critical papers in niche fields (e.g., Flow Chemistry & Nanomaterials). Traditional keyword alerts are too noisy, and manual searching is too slow.
This tool runs automatically (for me, every 15 days). It scans global scientific output, reads the abstracts using a Large Language Model (Google Gemini), and emails a curated digest of only the papers that actually matter.
The system follows a "Funnel Architecture" to balance cost, speed, and accuracy. Unlike fragile web scrapers that break when Google Scholar changes its layout, this system uses a robust, open data pipeline.
-
The Wide Net (OpenAlex API):
- Queries OpenAlex, a massive open catalog of over 250 million scholarly works.
- Uses broad boolean logic (e.g.,
"Flow Chemistry" OR "Microfluidics") to ensure zero relevant papers are missed. - Fetches papers from the last days to capture the latest output.
-
The Memory (SQLite):
- Checks a local database (
seen_papers.db) to ensure you never see the same paper twice, even if search windows overlap.
- Checks a local database (
-
The Brain (Gemini 2.5):
- Relevant abstracts are sent to Google's Gemini model.
- The AI acts as a "Reviewer," scoring the paper from 0-10 based on strict scientific criteria defined in the config (e.g., "Must be inorganic nanoparticles," "Reject biological reviews").
-
The Delivery (Gmail):
- If (and only if) high-scoring papers are found, it composes a rich HTML briefing and sends it to your inbox via the Gmail API.
Follow these steps to run the agent on your own machine for testing.
git clone [https://github.com/lorencig/Science-Assistant-Agent.git](https://github.com/lorencig/Science-Assistant-Agent.git)
cd Science-Assistant-Agentpip install -r requirements.txtOption A: For Local Testing Create a .env file in the root directory:
GEMINI_API_KEY="your_google_ai_studio_key"
EMAIL_RECIPIENT="your_email@gmail.com"
# Gmail OAuth Credentials
GMAIL_CLIENT_ID="your_client_id"
GMAIL_CLIENT_SECRET="your_client_secret"
GMAIL_REFRESH_TOKEN="your_refresh_token"Option B: For GitHub Actions (Production) To make this run automatically in the cloud, you must store these keys in GitHub's secure vault:
- Go to your repository on GitHub.
- Navigate to Settings > Secrets and variables > Actions.
- Click New repository secret.
- Add each of the 5 variables listed above (GEMINI_API_KEY, GMAIL_REFRESH_TOKEN, etc.) individually.
python main.pyThe logic for what papers get selected is entirely contained in config.py. The system is modular, allowing you to track multiple different research "Sessions" simultaneously.
To change what the agent looks for, edit the SESSIONS list:
{
"id": "my_new_topic",
"title": "🧬 Genomic Sequencing",
"api_filters": [
# Broad keywords for the OpenAlex API
'default.search:("genomics" OR "CRISPR")'
],
"system_prompt": """
ROLE: Senior Geneticist.
TASK: Score papers on novel CRISPR off-target detection methods.
SCORING: 10 for wet-lab validation, 0 for pure reviews.
"""
}This project is designed to run serverless using GitHub Actions. It runs entirely free of charge on the standard tier.
The workflow is defined in .github/workflows/daily_digest.yml and is scheduled on the 1st and 15th of every month at 07:00 UTC.
To deploy:
- Push your code to GitHub.
- Go to your repository Settings > Secrets and variables > Actions.
- Add the 5 secrets from your
.envfile (GEMINI_API_KEY,GMAIL_REFRESH_TOKEN, etc.). - The bot will automatically wake up on the next scheduled run.
To allow the script to send emails without storing your password (insecure) or requiring a browser login every time (breaks on servers), use OAuth 2.0 with a Refresh Token.
1. Google Cloud Console Setup:
- Create a new project in the Google Cloud Console.
- Enable the Gmail API.
- Create OAuth 2.0 Credentials (Type: Desktop App) and download the
credentials.jsonfile. - Crucial: Set the App Status to "Production" (Publish App) in the OAuth consent screen settings to prevent the token from expiring every 7 days.
2. Token Generation Script:
- Place
credentials.jsonin thetools/folder. - Run the helper script included in this repo:
python tools/get_gmail_token.py- This opens a browser window to authorize the app.
- The script will print a
GMAIL_REFRESH_TOKENto the terminal. Save this token, along with your Client ID and Secret, into your.envfile (locally) and GitHub Secrets (for production).
- OpenAlex: Used for bibliographic data. OpenAlex is a fully open catalog of the global research system.
- Google Gemini: Used for the reasoning and filtering engine.
***
### Next Step
Would you like me to generate the **Python code for the `tools/get_gmail_token.py` script** mentioned in the Appendix, so you can include it in the repository immediately?