Initial DynamicDeps support. #201

Merged
merged 10 commits into from
Jul 17, 2024
Changes from 5 commits
54 changes: 54 additions & 0 deletions docs/dynamic-deps.rst
@@ -0,0 +1,54 @@
.. _dynamic-deps:

====================
Dynamic dependencies
====================

Normally the dependencies for a callback are specified statically, as type
hints for its arguments:

.. code-block:: python

    import scrapy


    class BooksSpider(scrapy.Spider):
        ...

        def start_requests(self):
            yield scrapy.Request("http://books.toscrape.com/", self.parse_book)

        def parse_book(self, response, book_page: BookPage, other_dep: OtherDep):
            ...

In some cases, some or all of the dependencies need to be specified dynamically
instead, e.g. because they differ between pages handled by the same callback.
You can use :class:`scrapy_poet.DynamicDeps
<scrapy_poet.injection.DynamicDeps>` for this. If you add a callback argument
with this type, you can pass a list of additional dependency types in the
request meta dictionary using the ``"inject"`` key:

.. code-block:: python

    import scrapy
    from scrapy_poet import DynamicDeps


    class BooksSpider(scrapy.Spider):
        ...

        def start_requests(self):
            yield scrapy.Request(
                "http://books.toscrape.com/",
                self.parse_book,
                meta={"inject": [OtherDep]},
            )

        def parse_book(self, response, book_page: BookPage, dynamic: DynamicDeps):
            other_dep = dynamic[OtherDep]
Member

Is there a way to get a list of the created dependencies in the DynamicDeps (i.e. the available keys)? It seems we should document it. Currently only getting by type is documented, if I'm not mistaken.

Member Author

I assumed the user already knows which deps are there, since they are the one who set the meta key, but in other use cases (which?) they can look at dynamic.keys(). Do you want to document this, or are you thinking about something else?

Member

There could be a single callback that receives requests from different places, with different "inject" meta; this "inject" meta can be dynamic as well. For example, it can be a list of item classes to extract from a particular page, configured at spider start. Isn't the point of DynamicDeps that the callback doesn't know what the dependencies are? Otherwise they could be specified in the signature.

As for the documentation, I think documenting DynamicDeps as a dict subclass could be enough.

...

The types passed this way are used in dependency resolution as usual, and the
created instances are available, keyed by type, in the
:class:`scrapy_poet.DynamicDeps <scrapy_poet.injection.DynamicDeps>` instance.
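
The review discussion above raises the case of a callback that does not know in advance which types were requested. Since this PR defines ``DynamicDeps`` as a ``UserDict`` subclass keyed by dependency type, the usual mapping methods should cover that case. The following is a minimal sketch, not part of the PR: it uses a local stand-in for ``DynamicDeps`` so that it runs without scrapy-poet, and ``Book``, ``Author`` and ``describe`` are hypothetical names introduced for illustration.

```python
from collections import UserDict


class DynamicDeps(UserDict):
    """Local stand-in mirroring the class added in this PR."""


class Book:  # hypothetical dependency type
    pass


class Author:  # hypothetical dependency type
    pass


def describe(dynamic: DynamicDeps) -> list:
    """Return the sorted names of the dependency types that were injected."""
    return sorted(dep_type.__name__ for dep_type in dynamic.keys())


# In a real callback the injector would build this from meta["inject"];
# here it is constructed by hand.
dynamic = DynamicDeps({Book: Book(), Author: Author()})

names = describe(dynamic)   # works without knowing the types in advance
book = dynamic[Book]        # lookup by type, as in the documented example
missing = dynamic.get(int)  # None for a type that was not requested
```

Using ``.get()`` or ``in`` instead of indexing avoids a ``KeyError`` when the same callback serves requests with different ``"inject"`` lists.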
1 change: 1 addition & 0 deletions docs/index.rst
@@ -44,6 +44,7 @@ To get started, see :ref:`intro-install` and :ref:`intro-tutorial`.
:maxdepth: 1

rules-from-web-poet
dynamic-deps
stats
providers
testing
1 change: 1 addition & 0 deletions scrapy_poet/__init__.py
@@ -1,5 +1,6 @@
from .api import DummyResponse, callback_for
from .downloadermiddlewares import DownloaderStatsMiddleware, InjectionMiddleware
from .injection import DynamicDeps
from .page_input_providers import HttpResponseProvider, PageObjectInputProvider
from .spidermiddlewares import RetryMiddleware
from ._request_fingerprinter import ScrapyPoetRequestFingerprinter
62 changes: 57 additions & 5 deletions scrapy_poet/injection.py
@@ -4,6 +4,7 @@
import os
import pprint
import warnings
from collections import UserDict
from typing import (
Any,
Callable,
@@ -54,6 +55,16 @@ class _UNDEFINED:
pass


class DynamicDeps(UserDict):
    """A container for dynamic dependencies provided via the ``"inject"`` request meta key.

    The dynamic dependency instances are available at run time as dict
    values, with dependency types as keys.
    """

    pass
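
One consequence of basing ``DynamicDeps`` on ``collections.UserDict`` rather than on ``dict`` is that instances compare equal to plain dicts with the same content but are not ``dict`` instances themselves. The sketch below illustrates this with a local stand-in class that has the same base class, so it does not need scrapy-poet installed.

```python
from collections import UserDict
from collections.abc import MutableMapping


class DynamicDeps(UserDict):
    """Local stand-in with the same base class as in the PR."""


dd = DynamicDeps({int: 42})

same_as_dict = dd == {int: 42}               # mapping equality compares contents
is_dict = isinstance(dd, dict)               # UserDict wraps a dict, it is not one
is_mapping = isinstance(dd, MutableMapping)  # but it is a full mutable mapping
underlying = dd.data                         # the wrapped plain dict
```

Code that needs a real ``dict`` (for example, for an ``isinstance`` check) can use the ``data`` attribute or ``dict(dd)``.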


class Injector:
"""
Keep all the logic required to do dependency injection in Scrapy callbacks.
@@ -170,20 +181,28 @@ def build_plan(self, request: Request) -> andi.Plan:
# Callable[[Callable], Optional[Callable]] but the registry
# returns the typing for ``dict.get()`` method.
overrides=self.registry.overrides_for(request.url).get, # type: ignore[arg-type]
-custom_builder_fn=self._get_item_builder(request),
+custom_builder_fn=self._get_custom_builder(request),
)

-def _get_item_builder(
+def _get_custom_builder(
self, request: Request
) -> Callable[[Callable], Optional[Callable]]:
"""Return a function suitable for passing as ``custom_builder_fn`` to ``andi.plan``.

The returned function can map an item to a factory for that item based
-on the registry.
+on the registry and also supports filling :class:`.DynamicDeps`.
"""

        @functools.lru_cache(maxsize=None)  # to minimize the registry queries
        def mapping_fn(item_cls: Callable) -> Optional[Callable]:
            # building DynamicDeps
            if item_cls is DynamicDeps:
                dynamic_types = request.meta.get("inject", [])
                if not dynamic_types:
                    return lambda: {}
                return self._get_dynamic_deps_factory(dynamic_types)

            # building items from pages
            page_object_cls: Optional[Type[ItemPage]] = self.registry.page_cls_for_item(
                request.url, cast(type, item_cls)
            )
@@ -197,6 +216,37 @@ async def item_factory(page: page_object_cls) -> item_cls: # type: ignore[valid

return mapping_fn

    @staticmethod
    def _get_dynamic_deps_factory(
        dynamic_types: List[type],
    ) -> Callable[..., DynamicDeps]:
        """Return a function that creates a :class:`.DynamicDeps` instance from its args.

        It takes instances of types from ``dynamic_types`` as args and returns
        a :class:`.DynamicDeps` instance where keys are types and values are
        corresponding args. It has correct type hints so that it can be used as
        an ``andi`` custom builder.
        """

        # inspired by dataclasses._create_fn()
        args = [f"{type_.__name__}_arg: {type_.__name__}" for type_ in dynamic_types]
        args_str = ", ".join(args)
        result_args = [
            f"{type_.__name__}: {type_.__name__}_arg" for type_ in dynamic_types
        ]
        result_args_str = ", ".join(result_args)
        ns = {type_.__name__: type_ for type_ in dynamic_types}
        create_args = ns.keys()
        create_args_str = ", ".join(create_args)
        txt = (
            f"def __create_fn__({create_args_str}):\n"
            f"    def dynamic_deps_factory({args_str}) -> DynamicDeps:\n"
            f"        return DynamicDeps({{{result_args_str}}})\n"
            f"    return dynamic_deps_factory"
        )
        exec(txt, globals(), ns)
        return ns["__create_fn__"](*dynamic_types)
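
To make the string-building above easier to follow, here is a standalone sketch of the same technique: generating a function whose parameters carry real type annotations, one per requested type. It is an illustration under stated assumptions, not the PR's code: ``make_factory`` and ``Cls1`` are hypothetical names, ``DynamicDeps`` is a local stand-in, and the annotations are checked with ``inspect.signature`` from the standard library rather than with ``andi.inspect``.

```python
import inspect
from collections import UserDict
from typing import Callable, List


class DynamicDeps(UserDict):
    """Local stand-in for scrapy_poet.DynamicDeps."""


class Cls1:  # hypothetical dependency type
    pass


def make_factory(dynamic_types: List[type]) -> Callable[..., DynamicDeps]:
    """Build a factory whose parameters are annotated with ``dynamic_types``."""
    names = [t.__name__ for t in dynamic_types]
    params = ", ".join(f"{n}_arg: {n}" for n in names)
    items = ", ".join(f"{n}: {n}_arg" for n in names)
    outer_params = ", ".join(names)
    # The outer function receives the types as arguments so that the
    # annotations of the inner function resolve to the real classes.
    source = (
        f"def __create_fn__({outer_params}):\n"
        f"    def dynamic_deps_factory({params}) -> DynamicDeps:\n"
        f"        return DynamicDeps({{{items}}})\n"
        f"    return dynamic_deps_factory"
    )
    namespace: dict = {}
    exec(source, {"DynamicDeps": DynamicDeps}, namespace)
    return namespace["__create_fn__"](*dynamic_types)


factory = make_factory([int, Cls1])
signature = inspect.signature(factory)  # parameters: int_arg, Cls1_arg
c = Cls1()
result = factory(int_arg=42, Cls1_arg=c)
```

Because parameter names are derived from ``__name__``, two distinct types that share a name would produce duplicate parameters in this sketch and fail to compile.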

@inlineCallbacks
def build_instances(
self,
@@ -480,7 +530,9 @@ class MySpider(Spider):
return Injector(crawler, registry=registry)


-def get_response_for_testing(callback: Callable) -> Response:
+def get_response_for_testing(
+    callback: Callable, meta: Optional[Dict[str, Any]] = None
+) -> Response:
"""
Return a :class:`scrapy.http.Response` with fake content with the configured
callback. It is useful for testing providers.
@@ -501,6 +553,6 @@ def get_response_for_testing(callback: Callable) -> Response:
""".encode(
"utf-8"
)
-request = Request(url, callback=callback)
+request = Request(url, callback=callback, meta=meta)
response = Response(url, 200, None, html, request=request)
return response
95 changes: 92 additions & 3 deletions tests/test_injection.py
@@ -1,7 +1,8 @@
import shutil
import sys
-from typing import Any, Callable, Dict, Generator
+from typing import Any, Callable, Dict, Generator, Optional

import andi
import attr
import parsel
import pytest
@@ -16,7 +17,12 @@
from web_poet.mixins import ResponseShortcutsMixin
from web_poet.rules import ApplyRule

-from scrapy_poet import DummyResponse, HttpResponseProvider, PageObjectInputProvider
+from scrapy_poet import (
+    DummyResponse,
+    DynamicDeps,
+    HttpResponseProvider,
+    PageObjectInputProvider,
+)
from scrapy_poet.injection import (
Injector,
check_all_providers_are_callable,
@@ -293,8 +299,9 @@ def _assert_instances(
callback: Callable,
expected_instances: Dict[type, Any],
expected_kwargs: Dict[str, Any],
reqmeta: Optional[Dict[str, Any]] = None,
) -> Generator[Any, Any, None]:
-response = get_response_for_testing(callback)
+response = get_response_for_testing(callback, meta=reqmeta)
request = response.request

plan = injector.build_plan(response.request)
@@ -535,6 +542,76 @@ def callback(
# not injected at all.
assert set(kwargs.keys()) == {"expensive", "item"}

    @inlineCallbacks
    def test_dynamic_deps(self):
        def callback(dd: DynamicDeps):
            pass

        provider = get_provider({Cls1, Cls2})
        injector = get_injector_for_testing({provider: 1})

        expected_instances = {
            DynamicDeps: DynamicDeps({Cls1: Cls1(), Cls2: Cls2()}),
            Cls1: Cls1(),
            Cls2: Cls2(),
        }
        expected_kwargs = {
            "dd": DynamicDeps({Cls1: Cls1(), Cls2: Cls2()}),
        }
        yield self._assert_instances(
            injector,
            callback,
            expected_instances,
            expected_kwargs,
            reqmeta={"inject": [Cls1, Cls2]},
        )

    @inlineCallbacks
    def test_dynamic_deps_mix(self):
        def callback(c1: Cls1, dd: DynamicDeps):
            pass

        provider = get_provider({Cls1, Cls2})
        injector = get_injector_for_testing({provider: 1})

        expected_instances = {
            DynamicDeps: DynamicDeps({Cls1: Cls1(), Cls2: Cls2()}),
            Cls1: Cls1(),
            Cls2: Cls2(),
        }
        expected_kwargs = {
            "c1": Cls1(),
            "dd": DynamicDeps({Cls1: Cls1(), Cls2: Cls2()}),
        }
        yield self._assert_instances(
            injector,
            callback,
            expected_instances,
            expected_kwargs,
            reqmeta={"inject": [Cls1, Cls2]},
        )

    @inlineCallbacks
    def test_dynamic_deps_no_meta(self):
        def callback(dd: DynamicDeps):
            pass

        provider = get_provider({Cls1, Cls2})
        injector = get_injector_for_testing({provider: 1})

        expected_instances = {
            DynamicDeps: DynamicDeps(),
        }
        expected_kwargs = {
            "dd": DynamicDeps(),
        }
        yield self._assert_instances(
            injector,
            callback,
            expected_instances,
            expected_kwargs,
        )


class Html(Injectable):
url = "http://example.com"
@@ -833,3 +910,15 @@ def callback(response: DummyResponse, arg_price: Price, arg_name: Name):
response.request, response, plan
)
assert injector.weak_cache.get(response.request) is None


def test_dynamic_deps_factory():
    fn = Injector._get_dynamic_deps_factory([int, Cls1])
    args = andi.inspect(fn)
    assert args == {
        "Cls1_arg": [Cls1],
        "int_arg": [int],
    }
    c = Cls1()
    dd = fn(int_arg=42, Cls1_arg=c)
    assert dd == {int: 42, Cls1: c}