TideDra/zotero-arxiv-daily


Zotero-arXiv-Daily


Recommends new arxiv papers that match your interests daily, based on your Zotero library.

Important

Please keep an eye on this repo and merge upstream updates into your fork promptly, so that you get new features and bug fixes.

๐Ÿง About

Track new scientific researches of your interest by just forking (and staring) this repo!๐Ÿ˜Š

Zotero-arXiv-Daily finds arxiv papers that may attract you based on the context of your Zotero libarary, and then sends the result to your mailbox๐Ÿ“ฎ. It can be deployed as Github Action Workflow with zero cost, no installation, and few configuration of Github Action environment variables for daily automatic delivery.

✨ Features

  • Totally free! All computation runs on the GitHub Action runner within its quota (for public repos).
  • AI-generated TL;DR so you can quickly pick out the papers you want.
  • Links to the PDF and code implementation (if any) included in the e-mail.
  • List of papers sorted by relevance to your recent research interests.
  • Fast deployment by forking this repo and setting environment variables on the GitHub Action page.
  • Support for an LLM API for generating the TL;DR of papers.
  • Ignore unwanted Zotero papers using gitignore-style patterns.

📷 Screenshot

screenshot

🚀 Usage

Quick Start

  1. Fork (and star 😘) this repo.

  2. Set the GitHub Action environment variables.

Below are all the secrets you need to set. For security, once they are set they are invisible to everyone, including you.

| Key | Required | Type | Description | Example |
| --- | --- | --- | --- | --- |
| ZOTERO_ID | ✅ | str | User ID of your Zotero account. Get your ID from here. | 12345678 |
| ZOTERO_KEY | ✅ | str | A Zotero API key with read access. Get a key from here. | AB5tZ877P2j7Sm2Mragq041H |
| ARXIV_QUERY | ✅ | str | The search query for retrieving arxiv papers. Refer to the official document for details. The example queries papers about AI, CV, NLP, ML. Find the abbreviation of your research area from here. | cat:cs.AI OR cat:cs.CV OR cat:cs.LG OR cat:cs.CL |
| SMTP_SERVER | ✅ | str | The SMTP server that sends the email. I recommend using a seldom-used email account for this. Ask your email provider (Gmail, QQ, Outlook, ...) for its SMTP server. | smtp.qq.com |
| SMTP_PORT | ✅ | int | The port of the SMTP server. | 465 |
| SENDER | ✅ | str | The email account on the SMTP server that sends you the email. | abc@qq.com |
| SENDER_PASSWORD | ✅ | str | The password of the sender account. Note that it is not necessarily the password for logging in to the e-mail client, but the authentication code for the SMTP service. Ask your email provider for this. | abcdefghijklmn |
| RECEIVER | ✅ | str | The e-mail address that receives the paper list. | abc@outlook.com |
| MAX_PAPER_NUM | | int | The maximum number of papers presented in the email. This value directly affects the execution time of the workflow, because it takes about 70s to generate the TL;DR for one paper. -1 means all retrieved papers are presented. | 50 |
| SEND_EMPTY | | bool | Whether to send an empty email when there are no new papers today. | False |
| USE_LLM_API | | bool | Whether to use an LLM API in the cloud or a local LLM. If set to 1, the API is used. If set to 0, the workflow downloads and deploys an open-source LLM. Defaults to 0. | 0 |
| OPENAI_API_KEY | | str | API key when using the API to access LLMs. You can get a FREE API for using advanced open-source LLMs in SiliconFlow. | sk-xxx |
| OPENAI_API_BASE | | str | API URL when using the API to access LLMs. If not filled in, the default is the OpenAI URL. | https://api.siliconflow.cn/v1 |
| MODEL_NAME | | str | Model name when using the API to access LLMs. If not filled in, the default is gpt-4o. Qwen/Qwen2.5-7B-Instruct is recommended when using SiliconFlow. | Qwen/Qwen2.5-7B-Instruct |
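As a rough illustration of how the keys, types, and defaults in the table fit together, here is a minimal sketch of reading them from environment variables. It is not the project's actual code: `load_settings`, `parse_bool`, and `REQUIRED_KEYS` are hypothetical names, and treating an unset SEND_EMPTY as False is an assumption (the table only gives False as an example).

```python
import os
from typing import Mapping, Optional

# Keys marked as required in the table above.
REQUIRED_KEYS = (
    "ZOTERO_ID", "ZOTERO_KEY", "ARXIV_QUERY", "SMTP_SERVER",
    "SMTP_PORT", "SENDER", "SENDER_PASSWORD", "RECEIVER",
)


def parse_bool(value: str) -> bool:
    """Accept both styles used in the table's examples ("False", "0", "1")."""
    return value.strip().lower() in {"1", "true", "yes"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: validate and type-convert the documented keys."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))

    settings = {key: env[key] for key in REQUIRED_KEYS}
    settings["SMTP_PORT"] = int(settings["SMTP_PORT"])

    # Optional keys. None means "not set"; no default is invented here.
    max_papers = env.get("MAX_PAPER_NUM")
    settings["MAX_PAPER_NUM"] = int(max_papers) if max_papers else None
    # Assumption: an unset SEND_EMPTY behaves like False.
    settings["SEND_EMPTY"] = parse_bool(env.get("SEND_EMPTY", "False"))
    # Documented default: 0 (use the local LLM).
    settings["USE_LLM_API"] = parse_bool(env.get("USE_LLM_API", "0"))
    settings["OPENAI_API_KEY"] = env.get("OPENAI_API_KEY") or None
    settings["OPENAI_API_BASE"] = env.get("OPENAI_API_BASE") or None
    # Documented default model name.
    settings["MODEL_NAME"] = env.get("MODEL_NAME") or "gpt-4o"
    return settings
```

Calling `load_settings()` with no argument reads `os.environ`; passing a plain dict makes the helper easy to try out without touching real secrets.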

There are also some public variables (Repository Variables) you can set. Unlike secrets, they remain visible and are easy to edit.

| Key | Required | Type | Description | Example |
| --- | --- | --- | --- | --- |
| ZOTERO_IGNORE | | str | Gitignore-style patterns marking the Zotero collections that should be ignored. One rule per line. Learn more about gitignore. | AI Agent/<br>**/survey<br>!LLM/survey |
| REPOSITORY | | str | The repository that provides the workflow. If set, the value can only be TideDra/zotero-arxiv-daily, in which case the workflow always pulls the latest code from this upstream repo, so you don't need to sync your forked repo on each update unless the workflow file is changed. | TideDra/zotero-arxiv-daily |
| REF | | str | The specified ref of the workflow to run. Only valid when REPOSITORY is set to TideDra/zotero-arxiv-daily. Currently supported values include main for the stable version and dev for the development version, which has new features and potential bugs. | main |

That's all! Now you can test the workflow by manually triggering it.

Note

The Test-Workflow Action is the debug version of the main workflow (Send-emails-daily); it always retrieves 5 arxiv papers regardless of the date. The main workflow is triggered automatically every day and retrieves the new papers released the previous day. No new arxiv papers are released on weekends and holidays, in which case you may see "No new papers found" in the log of the main workflow.

Then check the log and the receiver email after it finishes.

By default, the main workflow runs at 22:00 UTC every day. You can change this time by editing the workflow config .github/workflows/main.yml.
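GitHub Actions schedules are written as cron expressions evaluated in UTC, so choosing a different delivery time means converting your local time first. The helper below is a small sketch of that conversion; `daily_cron_utc` is a hypothetical name, and the exact layout of the schedule entry in main.yml is not shown here.

```python
def daily_cron_utc(local_hour: int, local_minute: int, utc_offset_hours: float) -> str:
    """Return a daily cron expression (in UTC) for a given local wall-clock time.

    utc_offset_hours is your zone's offset from UTC, e.g. 8 for UTC+8
    or -5 for UTC-5. Daylight saving time is not handled.
    """
    local_minutes = local_hour * 60 + local_minute
    offset_minutes = round(utc_offset_hours * 60)
    # Wrap around midnight so the result stays within one day.
    utc_minutes = (local_minutes - offset_minutes) % (24 * 60)
    return f"{utc_minutes % 60} {utc_minutes // 60} * * *"


# 06:00 in UTC+8 corresponds to the default 22:00 UTC run.
print(daily_cron_utc(6, 0, 8))
```

Because the cron fields are fixed in UTC, a zone that observes daylight saving time will see the delivery shift by an hour twice a year.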

Local Running

This workflow is managed with uv, so it can easily be run on your local device if uv is installed:

# set all the environment variables
# export ZOTERO_ID=xxxx
# ...
cd zotero-arxiv-daily
uv run main.py

Important

The workflow will download and run an LLM (Qwen2.5-3B, about 3 GB in size). Make sure your network and hardware can handle it.

Warning

Other package managers such as pip or conda are not tested. You can still use them to install this workflow because there is a pyproject.toml, but you may run into problems.

🚀 Sync with the latest version

This project is in active development. You can subscribe to this repo via Watch so that you are notified when we publish a new release.

📖 How it works

Zotero-arXiv-Daily first retrieves all the papers in your Zotero library and all the papers released on the previous day, via the corresponding APIs. Then it calculates the embedding of each paper's abstract with an embedding model. The score of a paper is its weighted average similarity over all your Zotero papers (papers added to the library more recently have higher weight).

The TL;DR of each paper is generated by a lightweight LLM (Qwen2.5-3b-instruct-q4_k_m), given its title, abstract, introduction, and conclusion (if any). The introduction and conclusion are extracted from the LaTeX source file of the paper.
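The scoring step can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual implementation: the embeddings are plain lists of floats, similarity is taken to be cosine similarity, and the recency weighting (a logarithmic decay over the library sorted newest first) is an assumed form, since the text only says that newer papers weigh more. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recency_weights(n: int) -> List[float]:
    """Assumed weighting: rank 0 is the newest paper; weights sum to 1."""
    raw = [1.0 / (1.0 + math.log(rank + 1)) for rank in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def score_candidates(
    candidates: Sequence[Sequence[float]],
    library_newest_first: Sequence[Sequence[float]],
) -> List[float]:
    """Weighted average similarity of each candidate over the whole library."""
    weights = recency_weights(len(library_newest_first))
    return [
        sum(w * cosine(c, z) for w, z in zip(weights, library_newest_first))
        for c in candidates
    ]


# Toy 2-D embeddings: the first library entry is the most recently added.
library = [[1.0, 0.0], [0.0, 1.0]]
new_papers = [[1.0, 0.0], [0.0, 1.0]]
scores = score_candidates(new_papers, library)
ranking = sorted(range(len(new_papers)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

In the toy example the candidate that matches the most recently added library paper ranks first, which is the behavior the weighting is meant to produce.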

📌 Limitations

  • The recommendation algorithm is very simple and may not accurately reflect your interests. Better ideas for improving the algorithm are welcome!
  • This workflow deploys an LLM on the CPU of the GitHub Action runner, and it takes about 70s to generate a TL;DR for one paper. A high MAX_PAPER_NUM can push the execution time beyond the limits of the GitHub Action runner (6h per execution for public repos, and 2000 mins per month for private repos). In most cases, the quota given to public repos is more than enough for individual use. If you have special requirements, you can deploy the workflow on your own server, use a self-hosted GitHub Action runner, or pay for the extra execution time.
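To get a feel for these limits, the arithmetic can be worked through with the figures quoted above (about 70s per TL;DR, 6h per execution, 2000 minutes per month). The sketch below counts TL;DR generation only and ignores model download, embedding, and any billing rounding, so treat the numbers as rough lower bounds; the function names are hypothetical.

```python
SECONDS_PER_TLDR = 70            # approximate figure quoted above
JOB_LIMIT_SECONDS = 6 * 60 * 60  # 6h per execution (public repo)
PRIVATE_MONTHLY_MINUTES = 2000   # monthly quota quoted for private repos


def tldr_minutes(max_paper_num: int) -> float:
    """Estimated minutes spent generating TL;DRs in one run."""
    return max_paper_num * SECONDS_PER_TLDR / 60


def max_papers_per_run() -> int:
    """Largest MAX_PAPER_NUM whose TL;DR time alone fits in one execution."""
    return JOB_LIMIT_SECONDS // SECONDS_PER_TLDR


def monthly_minutes(max_paper_num: int, runs_per_month: int = 30) -> float:
    """Estimated TL;DR minutes per month if every run hits the maximum."""
    return tldr_minutes(max_paper_num) * runs_per_month


print(round(tldr_minutes(50), 1))    # one run with MAX_PAPER_NUM=50
print(max_papers_per_run())          # ceiling imposed by the 6h limit
print(monthly_minutes(50) <= PRIVATE_MONTHLY_MINUTES)
```

With MAX_PAPER_NUM set to 50, a run spends just under an hour on TL;DRs, and 30 such runs come to roughly 1750 minutes, which is close to the private-repo quota once other overhead is added.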

📃 License

Distributed under the AGPLv3 License. See LICENSE for details.

โค๏ธ Acknowledgement

โ˜• Buy Me A Coffee

If you find this project helpful, welcome to sponsor me via WeChat or via ko-fi. wechat_qr
