
scholar-crawler

This script performs a refined search on Google Scholar for a query provided by the user.

This script depends on the following packages (an install sketch follows the list):

  • selenium
  • scholarly
  • tqdm
  • re (Python standard library)
  • requests
  • bs4 (BeautifulSoup)
  • argparse (Python standard library)
  • the Google Chrome web browser must also be installed
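
The third-party packages can be installed with pip; a minimal install command, assuming the usual PyPI package names (bs4 is distributed as beautifulsoup4), might look like this:

pip3 install selenium scholarly tqdm requests beautifulsoup4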

This script takes the following command-line arguments (a minimal parsing sketch follows the list):

  • -q "The phrase the user wants to search for; it must be delimited by quote marks"
  • -kw "The keywords used to filter the search; they must be separated by spaces and delimited by quote marks"
  • -n "The maximum number of publications in the Google Scholar results to search through"
  • -p "The time range in years to refine the search; it must be written as year-year and delimited by quote marks" (optional)
  • -d "The website domains to take results from" (optional, but very useful)
  • -c "The minimum number of citations" (optional)
  • -o "The output file name in which to store the filtered URLs"
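
As a rough illustration of how these flags fit together, a minimal argparse setup for this kind of interface could look like the sketch below. The flag names follow the list above; the defaults and help texts are assumptions, not necessarily what crawler.py uses.

import argparse

parser = argparse.ArgumentParser(description="Filter Google Scholar results by keywords, period, domains and citations")
parser.add_argument("-q", required=True, help="the phrase to search for, delimited by quote marks")
parser.add_argument("-kw", required=True, help="space-separated keywords, delimited by quote marks")
parser.add_argument("-n", type=int, required=True, help="maximum number of publications to search through")
parser.add_argument("-p", default=None, help="optional year range, e.g. \"2010-2022\"")
parser.add_argument("-d", default=None, help="optional space-separated website domains")
parser.add_argument("-c", type=int, default=0, help="optional minimum number of citations")
parser.add_argument("-o", required=True, help="output file for the filtered URLs")
args = parser.parse_args()  # e.g. args.q, args.kw and args.n are then available to the crawler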

As an example, the script can be run with the following command:

python3 crawler.py -q "query" -kw "keywords" -n 1 -p "2010-2022" -c 2 -d "acs nature science direct rcs mdpi" -o test.csv

Please note:

Google restricts automated scripts to 50000 requests per day, which is not much, so the user may want to limit the maximum number of publications to search through, since each publication corresponds to multiple requests.
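
One simple way to stay well under that quota is to pause between publications. The sketch below only shows the general idea; the URL list, the 2-second delay and the filtering step are placeholders, not code taken from crawler.py.

import time
import requests

# Placeholder list of publication URLs gathered from the search results
publication_urls = ["https://example.org/paper1", "https://example.org/paper2"]

for url in publication_urls:
    response = requests.get(url, timeout=30)  # each publication triggers one or more requests like this
    # ... parse response.text and apply the keyword / citation filters here ...
    time.sleep(2)  # short pause between publications to stay well under the daily quota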

Have fun!
