Commit 2f75c54

Add-on (#216)
1 parent c62ee50 commit 2f75c54

19 files changed, with 286 additions and 136 deletions

README.rst

Lines changed: 23 additions & 13 deletions
@@ -27,6 +27,8 @@ scrapy-poet
 With ``scrapy-poet`` is possible to make a single spider that supports many sites with
 different layouts.
 
+Requires **Python 3.9+** and **Scrapy >= 2.6.0**.
+
 Read the `documentation <https://scrapy-poet.readthedocs.io>`_ for more information.
 
 License is BSD 3-clause.
@@ -48,24 +50,32 @@ Installation
 
     pip install scrapy-poet
 
-Requires **Python 3.9+** and **Scrapy >= 2.6.0**.
-
 Usage in a Scrapy Project
 =========================
 
 Add the following inside Scrapy's ``settings.py`` file:
 
-.. code-block:: python
-
-    DOWNLOADER_MIDDLEWARES = {
-        "scrapy_poet.InjectionMiddleware": 543,
-        "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
-        "scrapy_poet.DownloaderStatsMiddleware": 850,
-    }
-    SPIDER_MIDDLEWARES = {
-        "scrapy_poet.RetryMiddleware": 275,
-    }
-    REQUEST_FINGERPRINTER_CLASS = "scrapy_poet.ScrapyPoetRequestFingerprinter"
+- Scrapy ≥ 2.10:
+
+  .. code-block:: python
+
+      ADDONS = {
+          "scrapy_poet.Addon": 300,
+      }
+
+- Scrapy < 2.10:
+
+  .. code-block:: python
+
+      DOWNLOADER_MIDDLEWARES = {
+          "scrapy_poet.InjectionMiddleware": 543,
+          "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
+          "scrapy_poet.DownloaderStatsMiddleware": 850,
+      }
+      REQUEST_FINGERPRINTER_CLASS = "scrapy_poet.ScrapyPoetRequestFingerprinter"
+      SPIDER_MIDDLEWARES = {
+          "scrapy_poet.RetryMiddleware": 275,
+      }
 
 Developing
 ==========

docs/api_reference.rst

Lines changed: 13 additions & 4 deletions
@@ -11,13 +11,22 @@ API
     :members:
     :no-special-members:
 
-Injection Middleware
-====================
+Scrapy components
+=================
+
+.. autoclass:: scrapy_poet.DownloaderStatsMiddleware
+    :members:
+
+.. autoclass:: scrapy_poet.InjectionMiddleware
+    :members:
+
+.. autoclass:: scrapy_poet.RetryMiddleware
+    :members:
 
-.. automodule:: scrapy_poet.downloadermiddlewares
+.. autoclass:: scrapy_poet.ScrapyPoetRequestFingerprinter
     :members:
 
-Page Input Providers
+Page input providers
 ====================
 
 .. automodule:: scrapy_poet.page_input_providers

docs/index.rst

Lines changed: 2 additions & 2 deletions
@@ -21,7 +21,7 @@ testability and reusability.
 Concrete integrations are not provided by ``web-poet``, but
 ``scrapy-poet`` makes them possbile.
 
-To get started, see :ref:`intro-install` and :ref:`intro-tutorial`.
+To get started, see :ref:`setup` and :ref:`intro-tutorial`.
 
 :ref:`license` is BSD 3-clause.
 
@@ -34,7 +34,7 @@ To get started, see :ref:`intro-install` and :ref:`intro-tutorial`.
    :caption: Getting started
    :maxdepth: 1
 
-   intro/install
+   intro/setup
    intro/basic-tutorial
    intro/advanced-tutorial
    intro/pitfalls

docs/intro/advanced-tutorial.rst

Lines changed: 0 additions & 16 deletions
@@ -77,14 +77,6 @@ It can be directly used inside the spider as:
 
     class ProductSpider(scrapy.Spider):
 
-        custom_settings = {
-            "DOWNLOADER_MIDDLEWARES": {
-                "scrapy_poet.InjectionMiddleware": 543,
-                "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
-                "scrapy_poet.DownloaderStatsMiddleware": 850,
-            }
-        }
-
         def start_requests(self):
             for url in [
                 "https://example.com/category/product/item?id=123",
@@ -152,14 +144,6 @@ Let's see it in action:
 
     class ProductSpider(scrapy.Spider):
 
-        custom_settings = {
-            "DOWNLOADER_MIDDLEWARES": {
-                "scrapy_poet.InjectionMiddleware": 543,
-                "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
-                "scrapy_poet.DownloaderStatsMiddleware": 850,
-            }
-        }
-
         start_urls = [
             "https://example.com/category/product/item?id=123",
             "https://example.com/category/product/item?id=989",

docs/intro/install.rst

Lines changed: 0 additions & 51 deletions
This file was deleted.

docs/intro/setup.rst

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+.. _setup:
+
+=====
+Setup
+=====
+
+.. _intro-install:
+
+Install from PyPI::
+
+    pip install scrapy-poet
+
+Then configure:
+
+- For Scrapy ≥ 2.10, install the add-on:
+
+  .. code-block:: python
+      :caption: settings.py
+
+      ADDONS = {
+          "scrapy_poet.Addon": 300,
+      }
+
+  .. _addon-changes:
+
+  This is what the add-on changes:
+
+  - In :setting:`DOWNLOADER_MIDDLEWARES`:
+
+    - Sets :class:`~scrapy_poet.InjectionMiddleware` with value ``543``.
+
+    - Replaces
+      :class:`scrapy.downloadermiddlewares.stats.DownloaderStats`
+      with :class:`scrapy_poet.DownloaderStatsMiddleware`.
+
+  - Sets :setting:`REQUEST_FINGERPRINTER_CLASS` to
+    :class:`~scrapy_poet.ScrapyPoetRequestFingerprinter`.
+
+  - In :setting:`SPIDER_MIDDLEWARES`, sets
+    :class:`~scrapy_poet.RetryMiddleware` with value ``275``.
+
+- For Scrapy < 2.10, manually apply :ref:`the add-on changes
+  <addon-changes>`. For example:
+
+  .. code-block:: python
+      :caption: settings.py
+
+      DOWNLOADER_MIDDLEWARES = {
+          "scrapy_poet.InjectionMiddleware": 543,
+          "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
+          "scrapy_poet.DownloaderStatsMiddleware": 850,
+      }
+      REQUEST_FINGERPRINTER_CLASS = "scrapy_poet.ScrapyPoetRequestFingerprinter"
+      SPIDER_MIDDLEWARES = {
+          "scrapy_poet.RetryMiddleware": 275,
+      }
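Note on the setup page: the two configuration paths it documents could be folded into a single version-aware helper. The following is a minimal sketch and not part of this commit. `poet_settings` is a hypothetical name, and the Scrapy version is passed in as a plain tuple so the example stays self-contained; how that tuple is obtained from the installed Scrapy is left as an assumption.

```python
def poet_settings(scrapy_version):
    """Return scrapy-poet settings for a Scrapy version tuple, e.g. (2, 11, 0).

    Hypothetical helper: it only mirrors the two branches documented in
    docs/intro/setup.rst and is not provided by scrapy-poet.
    """
    if scrapy_version >= (2, 10):
        # Add-on support: a single ADDONS entry is enough.
        return {"ADDONS": {"scrapy_poet.Addon": 300}}
    # Older Scrapy: apply the add-on changes by hand.
    return {
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_poet.InjectionMiddleware": 543,
            "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
            "scrapy_poet.DownloaderStatsMiddleware": 850,
        },
        "REQUEST_FINGERPRINTER_CLASS": "scrapy_poet.ScrapyPoetRequestFingerprinter",
        "SPIDER_MIDDLEWARES": {"scrapy_poet.RetryMiddleware": 275},
    }


# Show which top-level settings each branch would define.
print(sorted(poet_settings((2, 11, 0))))
print(sorted(poet_settings((2, 6, 0))))
```

In a real `settings.py` the returned mapping would still have to be assigned to module-level names, so this is mainly useful for projects that must run on both sides of the 2.10 boundary.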

example/example/settings.py

Lines changed: 3 additions & 13 deletions
@@ -9,26 +9,16 @@
 
 from example.autoextract import AutoextractProductProvider
 
-from scrapy_poet import ScrapyPoetRequestFingerprinter
-
 BOT_NAME = "example"
 
 SPIDER_MODULES = ["example.spiders"]
 NEWSPIDER_MODULE = "example.spiders"
 
-SCRAPY_POET_PROVIDERS = {AutoextractProductProvider: 500}
-
 # Obey robots.txt rules
 ROBOTSTXT_OBEY = True
 
-DOWNLOADER_MIDDLEWARES = {
-    "scrapy_poet.InjectionMiddleware": 543,
-    "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
-    "scrapy_poet.DownloaderStatsMiddleware": 850,
+ADDONS = {
+    "scrapy_poet.Addon": 300,
 }
 
-REQUEST_FINGERPRINTER_CLASS = ScrapyPoetRequestFingerprinter
-
-SPIDER_MIDDLEWARES = {
-    "scrapy_poet.RetryMiddleware": 275,
-}
+SCRAPY_POET_PROVIDERS = {AutoextractProductProvider: 500}

scrapy_poet/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -4,3 +4,4 @@
 from .page_input_providers import HttpResponseProvider, PageObjectInputProvider
 from .spidermiddlewares import RetryMiddleware
 from ._request_fingerprinter import ScrapyPoetRequestFingerprinter
+from ._addon import Addon

scrapy_poet/_addon.py

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
+from logging import getLogger
+
+from scrapy.downloadermiddlewares.stats import DownloaderStats
+from scrapy.settings import BaseSettings
+from scrapy.utils.misc import load_object
+
+from ._request_fingerprinter import ScrapyPoetRequestFingerprinter
+from .downloadermiddlewares import DownloaderStatsMiddleware, InjectionMiddleware
+from .spidermiddlewares import RetryMiddleware
+
+logger = getLogger(__name__)
+
+
+# https://github.com/zytedata/zyte-spider-templates/blob/1b72aa8912f6009d43bf87a5bd1920537d458744/zyte_spider_templates/_addon.py#L33C1-L88C37
+def _replace_builtin(
+    settings: BaseSettings, setting: str, builtin_cls: type, new_cls: type
+) -> None:
+    setting_value = settings[setting]
+    if not setting_value:
+        logger.warning(
+            f"Setting {setting!r} is empty. Could not replace the built-in "
+            f"{builtin_cls} entry with {new_cls}. Add {new_cls} manually to "
+            f"silence this warning."
+        )
+        return None
+
+    if new_cls in setting_value:
+        return None
+    for cls_or_path in setting_value:
+        if isinstance(cls_or_path, str):
+            _cls = load_object(cls_or_path)
+            if _cls == new_cls:
+                return None
+
+    builtin_entry: object = None
+    for _setting_value in (setting_value, settings[f"{setting}_BASE"]):
+        if builtin_cls in _setting_value:
+            builtin_entry = builtin_cls
+            pos = _setting_value[builtin_entry]
+            break
+        for cls_or_path in _setting_value:
+            if isinstance(cls_or_path, str):
+                _cls = load_object(cls_or_path)
+                if _cls == builtin_cls:
+                    builtin_entry = cls_or_path
+                    pos = _setting_value[builtin_entry]
+                    break
+        if builtin_entry:
+            break
+
+    if not builtin_entry:
+        logger.warning(
+            f"Settings {setting!r} and {setting + '_BASE'!r} are both "
+            f"missing built-in entry {builtin_cls}. Cannot replace it with {new_cls}. "
+            f"Add {new_cls} manually to silence this warning."
+        )
+        return None
+
+    if pos is None:
+        logger.warning(
+            f"Built-in entry {builtin_cls} of setting {setting!r} is disabled "
+            f"(None). Cannot replace it with {new_cls}. Add {new_cls} "
+            f"manually to silence this warning. If you had replaced "
+            f"{builtin_cls} with some other entry, you might also need to "
+            f"disable that other entry for things to work as expected."
+        )
+        return
+
+    settings[setting][builtin_entry] = None
+    settings[setting][new_cls] = pos
+
+
+# https://github.com/scrapy-plugins/scrapy-zyte-api/blob/a1d81d11854b420248f38e7db49c685a8d46d943/scrapy_zyte_api/addon.py#L12
+def _setdefault(settings, setting, cls, pos):
+    setting_value = settings[setting]
+    if not setting_value:
+        settings[setting] = {cls: pos}
+        return
+    if cls in setting_value:
+        return
+    for cls_or_path in setting_value:
+        if isinstance(cls_or_path, str):
+            _cls = load_object(cls_or_path)
+            if _cls == cls:
+                return
+    settings[setting][cls] = pos
+
+
+class Addon:
+    def update_settings(self, settings: BaseSettings) -> None:
+        settings.set(
+            "REQUEST_FINGERPRINTER_CLASS",
+            ScrapyPoetRequestFingerprinter,
+            priority="addon",
+        )
+        _setdefault(settings, "DOWNLOADER_MIDDLEWARES", InjectionMiddleware, 543)
+        _setdefault(settings, "SPIDER_MIDDLEWARES", RetryMiddleware, 275)
+        _replace_builtin(
+            settings,
+            "DOWNLOADER_MIDDLEWARES",
+            DownloaderStats,
+            DownloaderStatsMiddleware,
+        )
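Note on `_replace_builtin`: its control flow is easier to follow on plain dictionaries. The sketch below is a simplified stand-in rather than the code from this commit. It assumes every key is an import-path string (so the `load_object` resolution is skipped), it returns a status string where the original logs a warning, and the name `replace_builtin` and the sample mappings are illustrative only.

```python
BUILTIN = "scrapy.downloadermiddlewares.stats.DownloaderStats"
NEW = "scrapy_poet.DownloaderStatsMiddleware"


def replace_builtin(user, base, builtin, new):
    """Swap `builtin` for `new` in `user`, reusing the built-in's position.

    `user` plays the role of the setting (e.g. DOWNLOADER_MIDDLEWARES) and
    `base` the role of its *_BASE counterpart.
    """
    if not user:
        return "setting empty"  # the original warns and gives up here
    if new in user:
        return "already configured"  # the user registered the replacement
    # Look in the user-level mapping first, then in the base defaults.
    for mapping in (user, base):
        if builtin in mapping:
            pos = mapping[builtin]
            break
    else:
        return "built-in missing"
    if pos is None:
        return "built-in disabled"  # do not re-enable what the user turned off
    user[builtin] = None  # disable the built-in component
    user[new] = pos  # register the replacement at the same position
    return "replaced"


# 850 mirrors the position used in the manual configuration of this commit.
base = {BUILTIN: 850}
user = {"scrapy_poet.InjectionMiddleware": 543}
print(replace_builtin(user, base, BUILTIN, NEW))
print(user)
```

Because `Addon.update_settings` calls `_setdefault` for `InjectionMiddleware` before `_replace_builtin`, the "setting empty" branch should not normally trigger for `DOWNLOADER_MIDDLEWARES` when the add-on itself does the wiring.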
