Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ NEW: Add sitemap validation config options #37

Draft
wants to merge 14 commits into
base: master
Choose a base branch
from
37 changes: 35 additions & 2 deletions docs/advanced-configuration.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,8 @@ Customizing the URL Scheme

:confval:`sitemap_url_scheme` defaults to ``{lang}{version}{link}``, where ``{lang}`` and ``{version}`` get set by :confval:`language` and :confval:`version` in **conf.py**.

.. important:: As of Sphinx version 5, ``language`` defaults to ``"en"``, if that
makes the default scheme produce the incorrect URL, then change the default behavior.
.. important:: As of Sphinx version 5, :confval:`language` defaults to ``"en"``, if that makes the default scheme produce
the incorrect URL, then change the default scheme. You may also want to look at :ref:`` section below to help ensure the sitemap stays accurate.

To change the default behavior, set the value of :confval:`sitemap_url_scheme` in **conf.py** to the
desired format. For example:
Expand All @@ -28,6 +28,39 @@ Or for nested deployments, something like:
You can also omit values from the scheme for desired behavior.


.. _configuration_url_validation:

Setting up URL Validation
^^^^^^^^^^^^^^^^^^^^^^^^^

Use :confval:`sitemap_validator_urls` to setup URL validation, where a dictionary of lists is used to
validate one or more URLS for a given build.

The keys for the dictionary are a concatenation of :confval:`language` and :confval:`version` for that build, where the string ``"nil"`` is set for the key if both the language and version are not set.
For example, to setup validation for multiple builds:

.. code-block:: python

sitemap_validator_urls = {
'enlatest': ['https://my-site.com/en/latest/index.html', 'https://my-site..com/en/latest/test.html'],
'delatest': ['https://my-site.com/de/latest/index.html', 'https://my-site..com/de/latest/test.html'],
}

or an example for a single build:

.. code-block:: python

sitemap_validator_urls = {
'nil': ['https://my-site.com/en/latest/index.html', 'https://my-site..com/en/latest/test.html'],
}

For single builds, set :confval:`sitemap_validator_required` to validate that :confval:`language` and :confval:`version` are concatenated as expected, with ``"nil"`` being used without :confval:`language` and :confval:`version` being set.
For example, with :confval:`language` set to ``"en"`` and :confval:`version` set to ``"latest"``:

.. code-block:: python

sitemap_validator_required = 'enlatest'

.. _configuration_changing_filename:

Changing the Filename
Expand Down
9 changes: 9 additions & 0 deletions docs/configuration-values.rst
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,15 @@ A list of possible configuration values to configure in **conf.py**:

.. versionadded:: 2.0.0

.. confval:: sitemap_validator_urls

A list of urls to check for a specified combination of :confval:`language` and :confval:`version`.
Default is ``{}``.

See :ref:`configuration_url_validation` for more information.

.. versionadded:: 2.5.0

.. confval:: sitemap_filename

The filename used for the sitemap. Default is ``sitemap.xml``.
Expand Down
46 changes: 45 additions & 1 deletion sphinx_sitemap/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,8 @@ def setup(app):
"sitemap_url_scheme", default="{lang}{version}{link}", rebuild=""
)
app.add_config_value("sitemap_locales", default=None, rebuild="")

app.add_config_value("sitemap_validator_urls", default={}, rebuild="")
app.add_config_value("sitemap_validator_required", default=None, rebuild="")
app.add_config_value("sitemap_filename", default="sitemap.xml", rebuild="")

try:
Expand Down Expand Up @@ -111,6 +112,47 @@ def add_html_link(app, pagename, templatename, context, doctree):
env.sitemap_links.put(sitemap_link)


def validate_sitemap(app, filename):
"""
If sitemap_validator_required is set, then make sure the concatenated language and
version match the given string.

If sitemap_validator_urls is set, then use to check that all of the given URLs
for the current language and version are included in the sitemap.
"""
key = "{}{}".format(app.config.language or "", app.config.version or "")
key = key or "nil"

if (
app.config.sitemap_validator_required
and key != app.config.sitemap_validator_required
):
logger.warning(
"Sitemap failed validation. {} does not match the required {}".format(
key, app.config.sitemap_validator_required
),
type="sitemap",
subtype="validation",
)

passed = True
if app.config.sitemap_validator_urls and key in app.config.sitemap_validator_urls:
with open(filename, "r") as myfile:
sitemap = myfile.read()
# if any of the urls don't match, throw a warning
for url in app.config.sitemap_validator_urls[key]:
if url not in sitemap:
passed = False
logger.warning(
"Sitemap failed validation. {} not found in {}".format(
url, filename
),
type="sitemap",
subtype="validation",
)
return passed


def create_sitemap(app, exception):
"""Generates the sitemap.xml from the collected HTML page links"""
site_url = app.builder.config.site_url or app.builder.config.html_baseurl
Expand Down Expand Up @@ -176,6 +218,8 @@ def create_sitemap(app, exception):
filename, xml_declaration=True, encoding="utf-8", method="xml"
)

validate_sitemap(app, filename)

logger.info(
"sphinx-sitemap: %s was generated for URL %s in %s"
% (app.config.sitemap_filename, site_url, filename),
Expand Down