The challenges are here.
All data come from the Titanic disaster (does it remind you of Kaggle?)
This workshop was written for Scrapy on Python 2.7.
Please install Python 2.7, and not Python 3.x!
On Ubuntu 16:
sudo apt-get install python-dev python-pip libssl-dev libxml2-dev libxslt1-dev libffi-dev
On Windows:
Download and install Anaconda Distribution for Python 2.7.
On Mac OS X:
brew install python
sudo pip install scrapy scrapoxy shub
git clone https://github.com/fabienvauchelles/scraping-challenge-workshop.git
cd scraping-challenge-workshop
The scraper code is in the file myscraper/spiders/myscraper.py.
The items are defined in the file myscraper/items.py.
cd scraping-challenge-workshop
scrapy crawl myscraper -t jsonlines -o persons.json
The exported items are written to the file persons.json.
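Because the export uses the jsonlines format, persons.json holds one JSON object per line rather than a single JSON array. As a minimal sketch of how the output could be inspected, the snippet below parses such a stream with the standard library only. The helper name read_jsonlines and the "name" field are assumptions made for illustration; the real field names depend on the item defined in myscraper/items.py.

```python
import io
import json


def read_jsonlines(stream):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline
            yield json.loads(line)


# Hypothetical sample standing in for persons.json; the "name" field is
# an assumption, not the workshop's actual item definition.
sample = io.StringIO(u'{"name": "Alice"}\n{"name": "Bob"}\n')
persons = list(read_jsonlines(sample))
print(len(persons))
```

To read the real export, pass an open file instead of the sample, for example the object returned by open("persons.json").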
See the Licence.