Kenya Law gazette scraper built on Scrapy
- Clone the repo and cd into it
- Make a virtual environment
- `pip install -r requirements.txt`
- Set ENV variables (a full setup sketch follows this list):
  - `SCRAPY_AWS_ACCESS_KEY_ID` - Get this from AWS
  - `SCRAPY_AWS_SECRET_ACCESS_KEY` - Get this from AWS
  - `SCRAPY_FEED_URI=s3://name-of-bucket-here/gazettes/data.jsonlines` - Where you want the jsonlines output for crawls to be saved. This can also be a local location
  - `SCRAPY_FILES_STORE=s3://name-of-bucket-here/gazettes` - Where you want scraped gazettes to be stored. This can also be a local location
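Putting the steps together, here is a minimal sketch of a local run. The spider name `gazettes` is an assumption; run `scrapy list` inside the project to see the spiders it actually defines.

```sh
# Minimal local setup and first crawl (the spider name "gazettes" is assumed).
python3 -m venv env && source env/bin/activate
pip install -r requirements.txt

# Credentials and output locations, read by Scrapy as SCRAPY_-prefixed settings:
export SCRAPY_AWS_ACCESS_KEY_ID=...        # from the AWS console
export SCRAPY_AWS_SECRET_ACCESS_KEY=...    # from the AWS console
export SCRAPY_FEED_URI=s3://name-of-bucket-here/gazettes/data.jsonlines
export SCRAPY_FILES_STORE=s3://name-of-bucket-here/gazettes

scrapy crawl gazettes                      # assumed spider name
```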
Deploying to Scraping Hub
It is recommended that you deploy your crawler to Scraping Hub for easy management. Follow these steps:
- Sign up for a free Scraping Hub account here
- Install shub locally using `pip install shub`. Further instructions here
- `shub login`
- `shub deploy`
Note that on Scraping Hub, environment variables don't need the `SCRAPY_` prefix.
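For reference, the whole deployment round trip is sketched below; `shub login` prompts for the API key shown on your Scraping Hub account page, and settings such as `FEED_URI` (note: no prefix) are then configured in the Scraping Hub UI.

```sh
pip install shub
shub login     # prompts once for the API key from your Scraping Hub account
shub deploy    # packages and uploads the Scrapy project in the current directory
```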
To install scrapy-deltafetch (which skips pages that already yielded items in previous crawls), first build its bsddb3 dependency against Berkeley DB from Homebrew:

brew install berkeley-db
export YES_I_HAVE_THE_RIGHT_TO_USE_THIS_BERKELEY_DB_VERSION=1
BERKELEYDB_DIR=$(brew --cellar)/berkeley-db/6.2.23 pip install bsddb3

Replace 6.2.23 with the version of berkeley-db that you installed, then:

pip install scrapy-deltafetch
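As a quick sanity check (assuming the steps above succeeded), both modules should import cleanly:

```sh
python -c "import bsddb3"             # bsddb3 built against Berkeley DB
python -c "import scrapy_deltafetch"  # plugin is importable
```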