
WineTz Crawler


A simple web crawling tool to retrieve wine reviews from the vivino.com website.

WineTz lets you specify parameters to filter the search on vivino.com, retrieves the matching reviews, and exports them as a .csv dataset.


🥣 Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Check requirements.txt for the prerequisites and the libraries to install.

pip3 install <module>

Installing

Clone the repository from GitHub and enter the src folder:

git clone https://github.com/Piltxi/Vivino-Crawler
cd Vivino-Crawler/src

🥂 Usage

📖 Options

WineTz shows all the available options.

python3 crawler.py -h

WineTz deletes the output information and the retrieved reviews by removing the /out directory.

python3 crawler.py -r

WineTz allows you to specify filtering options for your search.

python3 crawler.py -s

WineTz loads the search parameters from a file in the input/ directory. If no file name is entered, WineTz loads the parameters from /input/parameters.json.
Important:
The file must have the same format as the one automatically exported during scraping: /out/parameters.json
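The fallback described above (an empty answer means input/parameters.json) can be sketched as follows. This is a minimal illustration, not WineTz's actual code: the helper names resolve_parameters_file and load_parameters are hypothetical, and the sketch only assumes the file is JSON, as its extension suggests.

```python
import json
from pathlib import Path

# Hypothetical helpers mirroring the documented -f behaviour;
# they are not part of WineTz itself.
DEFAULT_FILE = "parameters.json"


def resolve_parameters_file(user_input: str, input_dir: str = "input") -> Path:
    """Return the parameters file to load; an empty answer selects the default."""
    name = user_input.strip() or DEFAULT_FILE
    return Path(input_dir) / name


def load_parameters(user_input: str, input_dir: str = "input") -> dict:
    """Read the search parameters from the resolved JSON file."""
    path = resolve_parameters_file(user_input, input_dir)
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

For example, load_parameters("") would read input/parameters.json, while load_parameters("my_search.json") would read input/my_search.json.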

python3 crawler.py -f

WineTz prints additional information while running. This option is recommended for debugging.

python3 crawler.py -v

WineTz acquires the filtering parameters in production mode: instead of asking for them, it uses the search filters defined in the production() function of the src/command.py module. Edit that function to tailor the filters to your target of interest.

python3 crawler.py -p

⚖️ Start crawling

When WineTz starts, it prints the number of matches returned by its requests to the vivino.com API. A progress bar then shows how the review retrieval is proceeding.
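The sketch below shows how a match count can determine how many paged requests a crawler has to make. It is a generic illustration under stated assumptions: it assumes results arrive in fixed-size, 1-based pages, PER_PAGE is an illustrative value rather than one taken from WineTz or the vivino.com API, and fetch_page stands in for whatever function performs the real request.

```python
import math
from typing import Callable, Iterator, List

# Assumption: fixed-size pages. 50 is illustrative only.
PER_PAGE = 50


def pages_needed(matches: int, per_page: int = PER_PAGE) -> int:
    """Number of paged requests needed to cover `matches` results."""
    return math.ceil(matches / per_page) if matches > 0 else 0


def fetch_all(matches: int, fetch_page: Callable[[int], List[dict]],
              per_page: int = PER_PAGE) -> Iterator[dict]:
    """Yield every record, printing a simple progress counter per page."""
    total = pages_needed(matches, per_page)
    for page in range(1, total + 1):  # pages assumed to be 1-based
        yield from fetch_page(page)
        print(f"\rpage {page}/{total}", end="", flush=True)
    print()  # end the progress line once all pages are consumed
```

With 120 matches and 50 results per page, for instance, three requests are needed. A real progress bar library could replace the print call without changing the paging logic.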

WineTz creates an output folder, /out, and inside it a directory for each exported dataset.
Inside each dataset directory, WineTz exports three .csv files (wines, styles and reviews):

  • wine.csv contains information about the wines
  • style.csv provides information on the wine styles
  • reviews.csv contains the reviews of each wine
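To inspect an exported dataset, the three files can be read with Python's standard csv module. This is a minimal sketch: it uses the file names listed in this README, assumes UTF-8 encoding, and returns rows as plain dicts because the column names are not documented here. The helper names load_dataset and list_datasets are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, List

# File names as listed in this README; columns are left undocumented,
# so each row is returned as a dict keyed by the CSV header.
DATASET_FILES = ("wine.csv", "style.csv", "reviews.csv")


def load_dataset(dataset_dir: str) -> Dict[str, List[dict]]:
    """Read the three exported CSV files of one dataset directory."""
    tables = {}
    for name in DATASET_FILES:
        with (Path(dataset_dir) / name).open(newline="", encoding="utf-8") as fh:
            tables[name] = list(csv.DictReader(fh))
    return tables


def list_datasets(out_dir: str = "out") -> List[Path]:
    """Return the dataset directories created inside the output folder."""
    return sorted(p for p in Path(out_dir).iterdir() if p.is_dir())
```

For example, load_dataset(str(list_datasets()[0]))["reviews.csv"] would give the reviews of the first dataset as a list of rows.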

A fourth file, parameters.json, is also created automatically.
It contains the parameters used for scraping: copy it to crawler/input/ to scrape again with the same search parameters.
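Copying the exported parameters.json back into the input folder can also be scripted. The sketch below is a hypothetical helper, not part of WineTz: input_dir defaults to input/ (the folder the -f option reads from) and should be adjusted if your checkout uses crawler/input/ instead.

```python
import shutil
from pathlib import Path


def reuse_parameters(dataset_dir: str, input_dir: str = "input") -> Path:
    """Copy a dataset's parameters.json into the input folder for reuse."""
    src = Path(dataset_dir) / "parameters.json"
    dst_dir = Path(input_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)  # create input/ if missing
    # copy2 also preserves file metadata such as modification time
    return Path(shutil.copy2(src, dst_dir / "parameters.json"))
```

Because the copy is named parameters.json, running python3 crawler.py -f and entering no file name should pick it up as the default.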

Happy scraping!

⛏️ Built Using

👨🏻‍🔬 Authors

  • @piltxi excellent singer after some wine and amateur developer