
scrapy-poet


scrapy-poet is the web-poet Page Object pattern implementation for Scrapy. scrapy-poet lets you write spiders where the extraction logic is separated from the crawling logic. With scrapy-poet it is possible to make a single spider that supports many sites with different layouts.
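
For example, the extraction logic can live in a web-poet page object that knows nothing about crawling (a minimal sketch; the class name, selector and item fields are only illustrative):

    import web_poet


    class BookPage(web_poet.WebPage):
        # Illustrative page object: extraction logic only, no crawling here.
        def to_item(self):
            return {
                "url": self.url,
                "title": self.css("title::text").get(),
            }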

Requires Python 3.9+ and Scrapy >= 2.6.0.

Read the documentation for more information.

License is BSD 3-clause.

Quick Start

Installation

pip install scrapy-poet

Usage in a Scrapy Project

Add the following inside Scrapy's settings.py file:

  • Scrapy ≥ 2.10:

    ADDONS = {
        "scrapy_poet.Addon": 300,
    }
  • Scrapy < 2.10:

    DOWNLOADER_MIDDLEWARES = {
        "scrapy_poet.InjectionMiddleware": 543,
        "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
        "scrapy_poet.DownloaderStatsMiddleware": 850,
    }
    REQUEST_FINGERPRINTER_CLASS = "scrapy_poet.ScrapyPoetRequestFingerprinter"
    SPIDER_MIDDLEWARES = {
        "scrapy_poet.RetryMiddleware": 275,
    }
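
With the settings above in place, scrapy-poet injects page objects into callbacks that declare them as type-annotated arguments (a minimal sketch; BookPage is the illustrative page object from the snippet above, and the module path and start URL are examples only):

    import scrapy

    # The page object from the sketch above; the module path is illustrative
    # and depends on where you define BookPage in your project.
    from myproject.pages import BookPage


    class BooksSpider(scrapy.Spider):
        name = "books"
        start_urls = ["http://books.toscrape.com/"]  # example URL only

        def parse(self, response, page: BookPage):
            # scrapy-poet builds BookPage from the response and injects it
            # into the callback, so the spider only handles crawling.
            yield page.to_item()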

Developing

Set up your local Python environment via:

  1. pip install -r requirements-dev.txt
  2. pre-commit install

Now, every time you perform a git commit, these tools will run against the staged files:

  • black
  • isort
  • flake8

You can also directly invoke pre-commit run --all-files or tox -e linters to run them without performing a commit.