Each folder in this repo contains a web scraper that crawls a website and extracts structured data from its pages. More details on each scraper can be found in its respective README:
- Jumia - flash sales
- Worldometer - country populations, country COVID data
- MyAnimeList - anime
- Chambers and Partners - lawyer information
- Booking - hotels
- Quotes
The projects are built mainly with Scrapy. Scrapy is best installed in a virtual environment to prevent conflicts with other packages; Conda, venv, or pipenv can be used. Any extra setup instructions for a scraper are provided in the project's README, along with a requirements.txt for installing dependencies.
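If Conda is not available, the standard-library venv module mentioned above works similarly. A minimal sketch, assuming Python 3 and a bash/zsh shell (the environment name mirrors the Conda example below):

```shell
# Hypothetical venv-based setup (environment name is illustrative)
python3 -m venv scraypa_env      # create the environment
. scraypa_env/bin/activate       # activate it (bash/zsh)
pip install scrapy               # install Scrapy inside the environment
```

Deactivate the environment at any time with `deactivate`.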
Using Conda

Creating the environment:

```shell
conda create -n scraypa_env
```

Activating the environment:

```shell
conda activate scraypa_env
```

Installing Scrapy:

```shell
pip install scrapy
```
Contributions are welcome: open a pull request describing what it is for. Issues with any scraper can also be filed and will be reviewed at my earliest convenience.