# scrapy-googlescholar

## Steps for Installation (on Linux)

  1. We first need to install the Python modules (Scrapy, Scrapyrt, scrapy-splash, scraperapi-sdk), Docker, and XAMPP.
  2. In the terminal, run the following commands:
  3. `pip install scrapy --user`
  4. `pip install scrapyrt --user`
  5. `pip install scrapy-splash --user`
  6. `pip install scraperapi-sdk --user`
  7. Docker installation differs depending on the OS in use; refer to this guide for installing Docker on Linux: https://docs.docker.com/engine/install/ubuntu/
  8. Pull the Splash image (see the scrapy-splash configuration sketch after this list):
  9. `sudo docker pull scrapinghub/splash`
  10. Copy `topl_project` from this repository to a directory of your choosing.
  11. To install XAMPP, refer to this guide: https://vitux.com/how-to-install-xampp-on-your-ubuntu-18-04-lts-system/
  12. Copy the `Website` folder from this repository and paste it inside the `htdocs` folder in the XAMPP installation directory (it should be `/opt/lampp/htdocs/`).
  13. Create a ScraperAPI account if you don't want to get banned by Google.
  14. Copy your API key and paste it into `topl_project/topl_project/spiders/1.py` at lines 22 and 52 (see the sketch after this list).
  15. Alternatively, if you want to scrape directly without ScraperAPI, remove the scraper_api module and replace `client.scrapyGet(url)` with just `url` at lines 23 and 53.
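
Steps 5, 8, and 9 set up Splash for rendering JavaScript-heavy pages. For reference, the standard scrapy-splash settings, as documented by the scrapy-splash package, look like the sketch below; the actual configuration in `topl_project/topl_project/settings.py` may differ.

```python
# Standard scrapy-splash configuration (a reference sketch, not necessarily
# the exact contents of topl_project/topl_project/settings.py).
SPLASH_URL = "http://localhost:8050"  # the Splash container started in the execution steps

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```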
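
The ScraperAPI wiring referenced in steps 14 and 15 typically looks like the following minimal sketch. The spider name, start URL, and parse logic here are placeholders, not the actual contents of `spiders/1.py`.

```python
# Sketch of the ScraperAPI integration from steps 14 and 15; names and URLs
# are placeholders, not the real contents of spiders/1.py.
import scrapy
from scraper_api import ScraperAPIClient

client = ScraperAPIClient("YOUR_API_KEY")  # the key pasted in step 14


class ScholarSpider(scrapy.Spider):
    name = "scholar"  # placeholder; the real spider lives in spiders/1.py

    def start_requests(self):
        url = "https://scholar.google.com/citations?user=SOME_SCHOLAR_ID"
        # With ScraperAPI, requests are proxied through the API to avoid bans:
        yield scrapy.Request(client.scrapyGet(url=url), callback=self.parse)
        # Without ScraperAPI (step 15), request the URL directly instead:
        # yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        pass  # the real parsing logic is in spiders/1.py
```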

## Steps for executing the project

In the terminal, run the following commands, each in a new terminal instance, since they are all processes that need to keep running in the background.

  1. `sudo dockerd` (start the Docker daemon)
  2. `sudo docker run -it -p 8050:8050 --rm scrapinghub/splash` (run Splash in a Docker container)
  3. `sudo xampp start` (start the XAMPP server)
  4. Go to the spider's directory (`topl_project`) and run `scrapyrt -p 3000` (see the query sketch after this list for testing it directly).
  5. Open a browser and go to the following URL: `localhost/Website/main.html`
  6. After the page loads, enter the desired Google Scholar ID into the input box and press Enter.
  7. The page loads with the papers written by that user, along with the user's name and profile image.
  8. Clicking on any one of the papers shows the citations for that paper (currently only the first page of citation results is scraped).
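
The `Website` front end talks to the scrapyrt service started in step 4. To test the scraper without the front end, you can call scrapyrt's standard `/crawl.json` endpoint directly, as in the sketch below. The spider name `"1"` is an assumption based on the `spiders/1.py` filename; use the spider's actual `name` attribute.

```python
# Minimal sketch for querying the scrapyrt service directly, bypassing the
# Website front end. The spider name "1" is an assumption based on the
# spiders/1.py filename, and the Scholar ID is a placeholder.
import requests

resp = requests.get(
    "http://localhost:3000/crawl.json",
    params={
        "spider_name": "1",
        "url": "https://scholar.google.com/citations?user=SOME_SCHOLAR_ID",
    },
)
data = resp.json()
print(data["status"])               # "ok" on success
print(len(data.get("items", [])))   # items scraped by the spider
```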