Release notes for 0.22.0 #212

Merged
merged 2 commits into from
Jul 26, 2024
14 changes: 9 additions & 5 deletions CHANGES.rst
@@ -1,13 +1,18 @@
Changes
=======

N.N.N (YYYY-MM-DD)
------------------
0.22.0 (2024-07-DD)
-------------------

* ``scrapy-zyte-api[provider]`` now requires zyte-common-items >= 0.20.0.
* ``scrapy-zyte-api[provider]`` now requires :doc:`zyte-common-items
<zyte-common-items:index>` 0.20.0+.

* Added the :setting:`ZYTE_API_AUTO_FIELD_STATS` setting.

* Added the :func:`~scrapy_zyte_api.is_session_init_request` function.

* Added the :data:`~scrapy_zyte_api.session_config_registry` variable.
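As a rough illustration of the new setting, and assuming a standard Scrapy project with a ``settings.py`` module (the file name is an assumption, not something this changelog states), opting in might look like this:

```python
# settings.py (hypothetical project settings module)
# Opt in to the field stats added in 0.22.0.
ZYTE_API_AUTO_FIELD_STATS = True
```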

0.21.0 (2024-07-02)
-------------------

@@ -120,8 +125,7 @@ N.N.N (YYYY-MM-DD)
* The ``Accept``, ``Accept-Encoding``, ``Accept-Language``, and ``User-Agent``
headers are now dropped automatically during :ref:`header mapping
<header-mapping>` unless they have user-defined values. This fix can improve
success rates on some websites when using :ref:`HTTP requests
<zyte-api-http>`.
success rates on some websites when using :ref:`HTTP requests <zapi-http>`.

0.18.1 (2024-04-19)
-------------------
2 changes: 1 addition & 1 deletion docs/reference/fingerprint-params.rst
@@ -26,7 +26,7 @@ fingerprints for Zyte API requests based on the following Zyte API parameters:
- Output parameters (:http:`request:browserHtml`,
:http:`request:httpResponseBody`, :http:`request:httpResponseHeaders`,
:http:`request:responseCookies`, :http:`request:screenshot`, and
:ref:`automatic extraction outputs <zyte-api-extract-fields>` like
:ref:`automatic extraction outputs <zapi-extract-fields>` like
:http:`request:product`)

- Rendering option parameters (:http:`request:actions`,
4 changes: 2 additions & 2 deletions docs/reference/request.rst
@@ -222,10 +222,10 @@ combinations that Zyte API does not currently support, and may never support:
:http:`request:requestHeaders`.

- You can set :http:`request:httpResponseBody` to ``True`` or use
:ref:`automatic extraction from httpResponseBody <zyte-api-extract-from>`,
:ref:`automatic extraction from httpResponseBody <zapi-extract-from>`,
and also set :http:`request:browserHtml` or :http:`request:screenshot` to
``True`` or use :ref:`automatic extraction from browserHtml
<zyte-api-extract-from>`. In this case, :attr:`Request.headers
<zapi-extract-from>`. In this case, :attr:`Request.headers
<scrapy.http.Request.headers>` is mapped both as
:http:`request:customHttpRequestHeaders` and as
:http:`request:requestHeaders`, and :http:`request:browserHtml` is used as
8 changes: 4 additions & 4 deletions docs/reference/settings.rst
@@ -15,10 +15,10 @@ Default: ``False``

Enables stats that indicate which requested fields :ref:`obtained through
scrapy-poet integration <scrapy-poet>` come directly from
:ref:`zyte-api-extract`.
:ref:`zapi-extract`.

If for any request no page object class is used to override
:ref:`zyte-api-extract` fields for a given item type, the following stat is
:ref:`zapi-extract` fields for a given item type, the following stat is
set:

.. code-block:: python
@@ -29,7 +29,7 @@ set:
all fields.

If for any request a custom page object class is used to override some
:ref:`zyte-api-extract` fields, the following stat is set:
:ref:`zapi-extract` fields, the following stat is set:

.. code-block:: python

@@ -434,7 +434,7 @@ ZYTE_API_SESSION_MAX_ERRORS
Default: ``1``

Maximum number of :ref:`unsuccessful responses
<zyte-api-unsuccessful-responses>` allowed for any given session before
<zapi-unsuccessful-responses>` allowed for any given session before
discarding the session.

You might want to increase this number if you find that a session may continue
2 changes: 1 addition & 1 deletion docs/setup.rst
@@ -16,7 +16,7 @@ Requirements
You need at least:

- A :ref:`Zyte API <zyte-api>` subscription (there’s a :ref:`free trial
<zyte-api-trial>`).
<zapi-trial>`).

- Python 3.8+

2 changes: 1 addition & 1 deletion docs/usage/manual.rst
@@ -64,4 +64,4 @@ remember to also request :http:`request:httpResponseHeaders`:
# "<html>…</html>"

To learn more about Zyte API parameters, see the upstream :ref:`usage
<zyte-api-usage>` and :ref:`API reference <zyte-api-reference>` pages.
<zapi-usage>` and :ref:`API reference <zapi-reference>` pages.
8 changes: 4 additions & 4 deletions docs/usage/retry.rst
@@ -4,7 +4,7 @@
Retries
=======

To make :ref:`error handling <zyte-api-errors>` easier, scrapy-zyte-api lets
To make :ref:`error handling <zapi-errors>` easier, scrapy-zyte-api lets
you :ref:`handle successful Zyte API responses as usual <retry-successful>`,
but :ref:`implements a more advanced retry mechanism for rate-limiting and
unsuccessful responses <retry-non-successful>`.
@@ -14,7 +14,7 @@ unsuccessful responses <retry-non-successful>`.
Retrying successful Zyte API responses
======================================

When a :ref:`successful Zyte API response <zyte-api-successful-responses>` is
When a :ref:`successful Zyte API response <zapi-successful-responses>` is
received, a Scrapy response object is built based on the upstream website
response (see :ref:`response`), and passed to your :ref:`downloader middlewares
<topics-downloader-middleware>` and :ref:`spider callback <topics-spiders>`.
@@ -30,8 +30,8 @@ them using Scrapy’s built-in retry middleware
Retrying non-successful Zyte API responses
==========================================

When a :ref:`rate-limiting <zyte-api-rate-limit>` or an :ref:`unsuccessful
<zyte-api-unsuccessful-responses>` Zyte API response is received, no Scrapy
When a :ref:`rate-limiting <zapi-rate-limit>` or an :ref:`unsuccessful
<zapi-unsuccessful-responses>` Zyte API response is received, no Scrapy
response object is built. Instead, a :ref:`retry policy <retry-policy>` is
followed, and if the policy retries are exhausted, a
:class:`zyte_api.RequestError` exception is raised.
40 changes: 20 additions & 20 deletions docs/usage/session.rst
@@ -6,22 +6,22 @@ Session management

Zyte API provides powerful session APIs:

- :ref:`Client-managed sessions <zyte-api-session-id>` give you full control
- :ref:`Client-managed sessions <zapi-session-id>` give you full control
over session management.

- :ref:`Server-managed sessions <zyte-api-session-contexts>` let Zyte API
- :ref:`Server-managed sessions <zapi-session-contexts>` let Zyte API
handle session management for you.

When using scrapy-zyte-api, you can use these session APIs through the
corresponding Zyte API fields (:http:`request:session`,
:http:`request:sessionContext`).

However, scrapy-zyte-api also provides its own session management API, similar
to that of :ref:`server-managed sessions <zyte-api-session-contexts>`, but
built on top of :ref:`client-managed sessions <zyte-api-session-id>`.
to that of :ref:`server-managed sessions <zapi-session-contexts>`, but
built on top of :ref:`client-managed sessions <zapi-session-id>`.

scrapy-zyte-api session management offers some advantages over
:ref:`server-managed sessions <zyte-api-session-contexts>`:
:ref:`server-managed sessions <zapi-session-contexts>`:

- You can perform :ref:`session validity checks <session-check>`, so that the
sessions of responses that do not pass those checks are refreshed, and the
@@ -35,11 +35,11 @@ scrapy-zyte-api session management offers some advantages over
:ref:`optimize-sessions` and :ref:`session-configs`.

However, scrapy-zyte-api session management is not a replacement for
:ref:`server-managed sessions <zyte-api-session-contexts>` or
:ref:`client-managed sessions <zyte-api-session-id>`:
:ref:`server-managed sessions <zapi-session-contexts>` or
:ref:`client-managed sessions <zapi-session-id>`:

- :ref:`Server-managed sessions <zyte-api-session-contexts>` offer a longer
life time than the :ref:`client-managed sessions <zyte-api-session-id>`
- :ref:`Server-managed sessions <zapi-session-contexts>` offer a longer
life time than the :ref:`client-managed sessions <zapi-session-id>`
that scrapy-zyte-api session management uses, so as long as you do not need
one of the scrapy-zyte-api session management features, server-managed
sessions can be significantly more efficient (fewer total sessions needed
@@ -49,7 +49,7 @@ However, scrapy-zyte-api session management is not a replacement for
website. With scrapy-zyte-api session management, you need to :ref:`handle
optimization yourself <optimize-sessions>`.

- :ref:`Client-managed sessions <zyte-api-session-id>` offer full control
- :ref:`Client-managed sessions <zapi-session-id>` offer full control
over session management, while scrapy-zyte-api session management removes
some of that control to provide an easier API for supported use cases.

@@ -68,7 +68,7 @@ override <session-configs>`.
.. _session-init-default:

By default, scrapy-zyte-api will maintain up to 8 sessions per domain, each
initialized with a :ref:`browser request <zyte-api-browser>` targeting the URL
initialized with a :ref:`browser request <zapi-browser>` targeting the URL
of the first request that will use the session. Sessions are automatically
rotated among requests, and refreshed as they expire or get banned. You can
customize most of this logic through request metadata, settings and
@@ -134,7 +134,7 @@ To change the :ref:`default session initialization parameters
:reqmeta:`zyte_api_session_params` request metadata key.

It works similarly to :http:`request:sessionContextParams` from
:ref:`server-managed sessions <zyte-api-session-contexts>`, but it supports
:ref:`server-managed sessions <zapi-session-contexts>`, but it supports
arbitrary Zyte API parameters instead of a specific subset.
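
A minimal sketch of such metadata, assuming it is later passed as the ``meta`` argument of a Scrapy request. The URL is a placeholder and the choice of parameters is only illustrative:

```python
# Parameters for the session initialization request (illustrative values).
session_params = {
    "url": "https://example.com/",  # placeholder URL, not from the docs
    "browserHtml": True,
}

# Request metadata carrying the session initialization parameters.
meta = {"zyte_api_session_params": session_params}
```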

If it does not define a ``"url"``, the URL of the request :ref:`triggering
@@ -210,7 +210,7 @@ initialization request.
If your session checking implementation relies on the response body (e.g. it
uses CSS or XPath expressions), you should make sure that you are getting one,
which might not be the case if you are mostly using :ref:`Zyte API automatic
extraction <zyte-api-extract>`, e.g. when using :doc:`Zyte spider templates
extraction <zapi-extract>`, e.g. when using :doc:`Zyte spider templates
<zyte-spider-templates:index>`. For example, you can use
:setting:`ZYTE_API_AUTOMAP_PARAMS` and :setting:`ZYTE_API_PROVIDER_PARAMS` to
force :http:`request:browserHtml` or :http:`request:httpResponseBody` to be set
@@ -288,7 +288,7 @@ Here are some things you can try:
(:setting:`ZYTE_API_SESSION_POOL_SIZE`). The more different sessions you
use, the more slowly you send requests through each session.

Mind, however, that :ref:`client-managed sessions <zyte-api-session-id>`
Mind, however, that :ref:`client-managed sessions <zapi-session-id>`
expire after `15 minutes since creation or 2 minutes since the last request
<https://docs.zyte.com/zyte-api/usage/reference.html#operation/extract/request/session>`_.
At a certain point, increasing :setting:`ZYTE_API_SESSION_POOL_SIZE`
@@ -298,7 +298,7 @@ Here are some things you can try:
counterproductive.

- By default, sessions are discarded as soon as an :ref:`unsuccessful
response <zyte-api-unsuccessful-responses>` is received.
response <zapi-unsuccessful-responses>` is received.

However, on some websites sessions may remain valid even after a few
unsuccessful responses. If that is the case, you might want to increase
@@ -308,9 +308,9 @@ Here are some things you can try:
If you do not need :ref:`session checking <session-check>` and your
:ref:`initialization parameters <session-init>` are only
:http:`request:browserHtml` and :http:`request:actions`, :ref:`server-managed
sessions <zyte-api-session-contexts>` might be a more cost-effective choice, as
sessions <zapi-session-contexts>` might be a more cost-effective choice, as
they live much longer than :ref:`client-managed sessions
<zyte-api-session-id>`.
<zapi-session-id>`.
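
The trade-off between pool size and the 2-minute idle limit of client-managed sessions can be estimated with some back-of-the-envelope arithmetic. The sketch below assumes that requests are spread evenly across all sessions of a pool, which is a simplification and not a description of the actual rotation logic; the function names are hypothetical:

```python
# Idle limit quoted in the text: 2 minutes since the last request.
SESSION_IDLE_LIMIT_SECONDS = 2 * 60


def seconds_between_uses(pool_size: int, requests_per_second: float) -> float:
    """Average gap between two requests on the same session.

    Assumes requests are spread evenly over the pool (a simplification).
    """
    if pool_size <= 0 or requests_per_second <= 0:
        raise ValueError("pool_size and requests_per_second must be positive")
    return pool_size / requests_per_second


def risks_idle_expiry(pool_size: int, requests_per_second: float) -> bool:
    """True if the average gap reaches the idle limit."""
    gap = seconds_between_uses(pool_size, requests_per_second)
    return gap >= SESSION_IDLE_LIMIT_SECONDS
```

Under these assumptions, a pool of 8 sessions at 0.5 requests per second reuses each session about every 16 seconds, while a pool of 300 sessions at 2 requests per second leaves each session idle for about 150 seconds, which is longer than the idle limit.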


.. _session-configs:
@@ -371,7 +371,7 @@ To include cookies in session initialization requests, use
:http:`request:requestCookies` in :ref:`session initialization parameters
<session-init>`. But mind that those cookies are only set during that request,
:ref:`they are not added to the session cookie jar
<zyte-api-session-cookie-jar>`.
<zapi-session-cookie-jar>`.


Session retry policies
@@ -441,7 +441,7 @@ The following stats exist for scrapy-zyte-api session management:

``scrapy-zyte-api/sessions/pools/{pool}/init/failed``
Number of times that initializing a session for pool ``{pool}`` resulted in
an :ref:`unsuccessful response <zyte-api-unsuccessful-responses>`.
an :ref:`unsuccessful response <zapi-unsuccessful-responses>`.

``scrapy-zyte-api/sessions/pools/{pool}/init/param-error``
Number of times that initializing a session for pool ``{pool}`` triggered
@@ -473,7 +473,7 @@ The following stats exist for scrapy-zyte-api session management:

``scrapy-zyte-api/sessions/pools/{pool}/use/failed``
Number of times that a request that used a session from pool ``{pool}``
got an :ref:`unsuccessful response <zyte-api-unsuccessful-responses>`.
got an :ref:`unsuccessful response <zapi-unsuccessful-responses>`.

``scrapy-zyte-api/sessions/use/disabled``
Number of processed requests for which session management was disabled.
6 changes: 3 additions & 3 deletions docs/usage/stats.rst
@@ -9,11 +9,11 @@ Stats from :doc:`python-zyte-api <python-zyte-api:index>` are exposed as

For example, ``scrapy-zyte-api/status_codes/<status code>`` stats indicate the
status code of Zyte API responses (e.g. ``429`` for :ref:`rate limiting
<zyte-api-rate-limit>` or ``520`` for :ref:`temporary download errors
<zyte-api-temporary-download-errors>`).
<zapi-rate-limit>` or ``520`` for :ref:`temporary download errors
<zapi-temporary-download-errors>`).

.. note:: The actual status code that is received from the target website, i.e.
the :http:`response:statusCode` response field of a :ref:`Zyte API
successful response <zyte-api-successful-responses>`, is accounted for in
successful response <zapi-successful-responses>`, is accounted for in
the ``downloader/response_status_count/<status code>`` stat, as with any
other Scrapy response.
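
To make the stat naming concrete, here is a small sketch that pulls the Zyte API status code counts out of a plain stats dictionary. The helper name and the sample numbers are hypothetical; in a real crawl the dictionary would typically come from the Scrapy stats collector:

```python
from typing import Any, Dict

# Stat key prefix described in the text above.
STATUS_CODE_PREFIX = "scrapy-zyte-api/status_codes/"


def zyte_api_status_counts(stats: Dict[str, Any]) -> Dict[str, int]:
    """Map each Zyte API status code (as a string) to its count."""
    return {
        key[len(STATUS_CODE_PREFIX):]: value
        for key, value in stats.items()
        if key.startswith(STATUS_CODE_PREFIX)
    }


# Made-up sample stats, for illustration only.
sample_stats = {
    "scrapy-zyte-api/status_codes/200": 95,
    "scrapy-zyte-api/status_codes/429": 3,
    "scrapy-zyte-api/status_codes/520": 2,
    "downloader/response_status_count/200": 95,
}
counts = zyte_api_status_counts(sample_stats)
```

Keys under ``downloader/response_status_count/`` are deliberately left out, since they describe the target website response rather than the Zyte API response.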
2 changes: 1 addition & 1 deletion scrapy_zyte_api/_annotations.py
@@ -4,7 +4,7 @@

class ExtractFrom(str, Enum):
""":ref:`Annotation <annotations>` to specify the :ref:`extraction source
<zyte-api-extract-from>` of an automatic extraction :ref:`input <inputs>`,
<zapi-extract-from>` of an automatic extraction :ref:`input <inputs>`,
such as :class:`~zyte_common_items.Product` or
:class:`~zyte_common_items.Article`.

2 changes: 1 addition & 1 deletion scrapy_zyte_api/responses.py
@@ -57,7 +57,7 @@ def replace(self, *args, **kwargs):
def raw_api_response(self) -> Optional[Dict]:
"""Contains the raw API response from Zyte API.

For the full list of parameters, see :ref:`zyte-api-reference`.
For the full list of parameters, see :ref:`zapi-reference`.
"""
return self._raw_api_response
