This repository contains a web scraper for extracting the number of publications from Google Scholar and arXiv separately. It focuses on the following topics: 'Deep Learning', 'Reinforcement Learning', 'Transfer Learning', and 'Causality'. The scraper retrieves the publication count for each topic per year and provides functionality to plot the data as graphs.
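As a rough illustration of the per-topic, per-year counting idea (a sketch only, not the repository's actual `arxiv.py` implementation), the public arXiv API reports a total hit count for a query; the query syntax and date format below are assumptions based on that API:

```python
# Minimal sketch: count arXiv submissions matching a topic in a given year via the public arXiv API.
# Illustrative only; arxiv.py may build its queries differently.
import re
import urllib.parse
import urllib.request

def arxiv_count(topic: str, year: int) -> int:
    # Restrict the full-text query to submissions within the given calendar year.
    query = f'all:"{topic}" AND submittedDate:[{year}01010000 TO {year}12312359]'
    url = "http://export.arxiv.org/api/query?" + urllib.parse.urlencode(
        {"search_query": query, "start": 0, "max_results": 0}
    )
    with urllib.request.urlopen(url) as resp:
        feed = resp.read().decode("utf-8")
    # The Atom feed carries the total hit count in <opensearch:totalResults>.
    match = re.search(r"<opensearch:totalResults[^>]*>(\d+)</opensearch:totalResults>", feed)
    return int(match.group(1)) if match else 0

if __name__ == "__main__":
    print(arxiv_count("deep learning", 2020))
```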
To use this scraper, follow the steps below:

- Activate the virtual environment: `source <venv>/bin/activate` (replace `<venv>` with the path to your virtual environment directory).
- Install the required dependencies: `pip install -r requirements.txt`
- For more detailed information and instructions, please refer to the provided documentation in the `doc.pdf` file.
To run the scripts, use the following commands:

- To scrape arXiv and plot the data: `python arxiv.py`
- To scrape Google Scholar and plot the data: `python scholar.py`
- To launch the CSV plotter GUI: `python plot.py`
This repository includes the following program files:

- `arxiv.py`: Scrapes arXiv for publications, collects the data, and plots graphs based on the scraped data.
- `scholar.py`: Scrapes Google Scholar for publications, collects the data, and plots graphs based on the scraped data.
- `plot.py`: A PyQt5 application that lets the user select a CSV file and plots the data it contains.
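For orientation, a minimal sketch of the kind of CSV line-plotter `plot.py` provides (an assumption-laden sketch, not the repository's code; the file-dialog flow and the "year column first, one count column per topic" layout are assumptions):

```python
# Minimal sketch of a PyQt5 CSV line-plotter in the spirit of plot.py (not the repository's exact code).
import sys

import pandas as pd
import matplotlib.pyplot as plt
from PyQt5.QtWidgets import QApplication, QFileDialog, QPushButton, QVBoxLayout, QWidget


class CsvPlotter(QWidget):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("CSV plotter (sketch)")
        layout = QVBoxLayout(self)
        button = QPushButton("Select CSV file", self)
        button.clicked.connect(self.plot_csv)
        layout.addWidget(button)

    def plot_csv(self):
        path, _ = QFileDialog.getOpenFileName(self, "Select CSV", "", "CSV files (*.csv)")
        if not path:
            return
        df = pd.read_csv(path)
        # Treat the first column as the x-axis (e.g. year) and plot every other column as a line.
        df.plot(x=df.columns[0], marker="o")
        plt.ylabel("Number of publications")
        plt.show()


if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = CsvPlotter()
    window.show()
    sys.exit(app.exec_())
```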
The repository already includes pre-scraped data in the `./extracted_by_gautam` directory for both arXiv and Google Scholar:

- arXiv: `./extracted_by_gautam/publication_counts_arxiv.csv` contains the publication counts for the specified topics from arXiv.
- Google Scholar: `./extracted_by_gautam/publication_counts.csv` contains the publication counts for the specified topics from Google Scholar.

You can use the `plot.py` script directly on these CSV files to visualize the data as line graphs.
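The pre-scraped files can also be inspected without the GUI; a quick sketch, assuming the first column holds the year and the remaining columns hold per-topic counts:

```python
# Quick look at the pre-scraped arXiv counts without the GUI.
# Assumes a year column first, followed by one count column per topic.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("./extracted_by_gautam/publication_counts_arxiv.csv")
df.plot(x=df.columns[0], marker="o")
plt.xlabel("Year")
plt.ylabel("Number of publications")
plt.title("arXiv publication counts per topic")
plt.show()
```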
This scraper is designed to retrieve the number of publications for specific topics from arXiv and Google Scholar. It uses web scraping techniques to extract the required data. Please note, however, that web scraping may be subject to the terms and conditions of the websites being scraped; make sure you comply with the policies and guidelines of arXiv, Google Scholar, and any other platform you scrape.
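For a rough sense of what a single Scholar query for a topic and year might look like, here is an illustrative sketch only (not `scholar.py`'s actual logic): the `as_ylo`/`as_yhi` year parameters and the "results" text pattern are assumptions, Scholar's markup and rate limiting change often, and any use must respect the policies mentioned above.

```python
# Illustrative sketch: read the "About N results" count from a Google Scholar search page
# restricted to a single year. scholar.py may do this differently; respect robots.txt,
# rate limits, and the site's terms of service before running anything like this.
import re

import requests

def scholar_count(topic: str, year: int):
    """Return the reported result count for a topic in one year, or None if it cannot be parsed."""
    params = {"q": f'"{topic}"', "as_ylo": year, "as_yhi": year}
    # Google Scholar tends to reject the default requests user agent; a browser-like one is assumed.
    headers = {"User-Agent": "Mozilla/5.0"}
    resp = requests.get("https://scholar.google.com/scholar",
                        params=params, headers=headers, timeout=30)
    resp.raise_for_status()
    # The results page typically shows text like "About 1,230,000 results".
    match = re.search(r"([\d][\d,.]*)\s+results", resp.text)
    return int(re.sub(r"\D", "", match.group(1))) if match else None

if __name__ == "__main__":
    print(scholar_count("causality", 2021))
```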
Please refer to the provided documentation (`./doc.pdf`) or contact the repository owner for further information or assistance.
If you would like to contribute to this project, feel free to fork the repository and submit a pull request with your improvements or additional features. Your contributions are greatly appreciated!
This project is licensed under the MIT License. Feel free to modify and use the code according to the terms of this license.