scrapy-poet is the web-poet Page Object pattern implementation for Scrapy. It lets you write spiders whose extraction logic is kept separate from their crawling logic. With scrapy-poet, a single spider can support many sites with different layouts.
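The plain-Python sketch below illustrates the idea behind the pattern. It does not use scrapy-poet or web-poet, and every class, function, domain, and HTML snippet in it is hypothetical. Each page object owns the extraction logic for one site layout, while the crawling-side function only picks the right page object for a URL. In scrapy-poet itself, page objects are defined with web-poet and injected into spider callbacks; see the documentation for the actual API.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse


# Hypothetical page objects: each one knows how to extract a product
# from one site's HTML layout. None of these names come from scrapy-poet.
@dataclass
class ProductPage:
    url: str
    html: str

    def to_item(self) -> dict:
        raise NotImplementedError


class ShopAProductPage(ProductPage):
    # Site A puts the product name in an <h1 class="title"> element.
    def to_item(self) -> dict:
        match = re.search(r'<h1 class="title">(.*?)</h1>', self.html)
        return {"url": self.url, "name": match.group(1) if match else None}


class ShopBProductPage(ProductPage):
    # Site B uses a <span id="product-name"> element instead.
    def to_item(self) -> dict:
        match = re.search(r'<span id="product-name">(.*?)</span>', self.html)
        return {"url": self.url, "name": match.group(1) if match else None}


# Maps each (hypothetical) domain to the page object for its layout.
PAGE_OBJECTS = {
    "shop-a.example": ShopAProductPage,
    "shop-b.example": ShopBProductPage,
}


def parse_product(url: str, html: str) -> dict:
    """Crawling-side code: choose a page object by domain and delegate
    all extraction to it, without knowing any site's HTML structure."""
    page_cls = PAGE_OBJECTS[urlparse(url).netloc]
    return page_cls(url=url, html=html).to_item()
```

Supporting a new site then means adding one page object and one registry entry; the crawling code stays unchanged.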
Requires Python 3.9+ and Scrapy >= 2.6.0.
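To check an environment against these requirements, a small helper along these lines could be used. It is a hypothetical sketch, not part of scrapy-poet: it reads the installed Scrapy version through the standard library's importlib.metadata and relies on a simplified version parser that ignores pre-release suffixes.

```python
import sys
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Convert a version string such as "2.11.2" into (2, 11, 2).

    Simplified parser (an assumption, not full PEP 440): each dot-separated
    part is cut at its first non-digit, so "2.11.0rc1" becomes (2, 11, 0),
    and missing parts are padded with zeros, so "2.6" becomes (2, 6, 0).
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_requirements(python_version: tuple, scrapy_version: str) -> bool:
    """True if Python is 3.9+ and Scrapy is >= 2.6.0."""
    python_ok = tuple(python_version[:2]) >= (3, 9)
    return python_ok and version_tuple(scrapy_version) >= (2, 6, 0)


def check_environment() -> bool:
    """Check the running interpreter and the installed Scrapy, if any."""
    try:
        scrapy_version = metadata.version("scrapy")
    except metadata.PackageNotFoundError:
        return False  # Scrapy is not installed at all
    return meets_requirements(sys.version_info, scrapy_version)
```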
Read the documentation for more information.
License is BSD 3-clause.
- Documentation: https://scrapy-poet.readthedocs.io
- Source code: https://github.com/scrapinghub/scrapy-poet
- Issue tracker: https://github.com/scrapinghub/scrapy-poet/issues
Install scrapy-poet with pip:
    pip install scrapy-poet
Then add the following to your Scrapy project's settings.py file, depending on your Scrapy version:
Scrapy ≥ 2.10:
    ADDONS = {
        "scrapy_poet.Addon": 300,
    }
Scrapy < 2.10:
    DOWNLOADER_MIDDLEWARES = {
        "scrapy_poet.InjectionMiddleware": 543,
        # Disable Scrapy's built-in DownloaderStats in favor of scrapy-poet's version.
        "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
        "scrapy_poet.DownloaderStatsMiddleware": 850,
    }
    REQUEST_FINGERPRINTER_CLASS = "scrapy_poet.ScrapyPoetRequestFingerprinter"
    SPIDER_MIDDLEWARES = {
        "scrapy_poet.RetryMiddleware": 275,
    }
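If one settings.py has to work with both Scrapy branches, the choice between the two configurations could be wrapped in a helper like the hypothetical scrapy_poet_settings below (it is not part of scrapy-poet). It simply returns the settings listed in this README for a given Scrapy version tuple.

```python
def scrapy_poet_settings(scrapy_version: tuple) -> dict:
    """Return the scrapy-poet settings for a Scrapy version tuple such as (2, 11, 2).

    Hypothetical helper: it only encodes the two configurations in this README.
    """
    if tuple(scrapy_version[:2]) >= (2, 10):
        # Scrapy 2.10+ supports add-ons; scrapy_poet.Addon configures scrapy-poet.
        return {"ADDONS": {"scrapy_poet.Addon": 300}}
    # Older Scrapy: enable each scrapy-poet component explicitly.
    return {
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_poet.InjectionMiddleware": 543,
            "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
            "scrapy_poet.DownloaderStatsMiddleware": 850,
        },
        "REQUEST_FINGERPRINTER_CLASS": "scrapy_poet.ScrapyPoetRequestFingerprinter",
        "SPIDER_MIDDLEWARES": {"scrapy_poet.RetryMiddleware": 275},
    }
```

In settings.py it could then be applied at module level with globals().update(scrapy_poet_settings(scrapy.version_info)), assuming scrapy.version_info exposes the installed version as a tuple such as (2, 11, 2).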
Set up your local Python environment with:
- pip install -r requirements-dev.txt
- pre-commit install
From now on, every time you make a git commit, the following tools will run against the staged files:
- black
- isort
- flake8
You can also run them without committing by invoking pre-commit run --all-files or tox -e linters directly.