Replies: 1 comment
Thanks for the write-up @arwedus, it's certainly very helpful! I would initially pick up on what I think is a key point: resolving need links and populating needtables at the time a page is rendered.
This is, I think, not really possible for a "statically served" site; obviously, there is plenty of content out there on static vs dynamic websites (e.g. https://www.wix.com/blog/static-vs-dynamic-website). We are working on something like this at useblocks, but it is not trivial and there are trade-offs:
- For inserting basic link routing, this is perhaps possible; for inserting whole needtables, that's definitely more difficult.
- Note also that, in general, this pushes all of these things to "do it at render time" (as also mentioned for mermaid vs plantuml/graphviz).
---
Hi @danwos, @chrisjsewell, as we discussed a couple of times, I have an idea for solving the scaling problem that comes up with Sphinx, sphinx-needs, and large monorepos with projects that reference each other.
Since this is a rather special case for using sphinx and sphinx-needs, I wrote a whitepaper that sums up my approach. You can find my feature request to make it all work with less stress on the CI/CD environment at the end.
Multi-project Documentation Build With Sphinx, Sphinx-needs, and Bazel
This documentation build strategy works best for large monorepos which use a multi-project structure for doxygen and sphinx documentation.
Hierarchical multi-project documentation project structure
Given the sheer size of a true monorepo codebase, and given that teams work mostly autonomously on their (sub-)projects, collaborating at the interface level only, it makes sense to use a multi-project documentation build to reduce both the local documentation build time for developers and the CI/CD build time.
To make the documentation build in CI/CD environments scalable for large monorepos, I devised a multi-project set-up for Doxygen and Sphinx with inter-project linkage.
If combined with a build orchestration tool that supports hermetic, deterministic and modular builds with complete modelling of dependencies (like Bazel), the monorepo approach scales for docs-as-code as it does for compiled code.
The following diagram shows an example of the project structure:
Here's how it works:
- Set up a separate Sphinx project for each separate (sub-)project within a large monorepo.
- In a very large project, e.g. for an autonomous vehicle computer, we create subprojects at the domain level (e.g., fusion).
- Linkage between domain-level components and the top-level documentation is done via intersphinx and sphinx-needs `needs_external_needs` (see the `conf.py` sketch after this list).
- We use a metamodel with system-level and software-level needs (as defined by ASPICE and ISO 26262), so we want to link needs, like requirements, on SW level with needs on system level for bi-directional traceability.
- The bi-directional traceability required in the HTML output requires bi-directional linkage between Sphinx projects (project_A <-> project_B).
- We use breathe, doxysphinx or doxylink to integrate the C++ code documentation with Sphinx and sphinx-needs.
- Linkage between the domain-level documentation in Sphinx and the code-level documentation is done either by importing Doxygen XML output or by reading the Doxygen tag file of one or more Doxygen projects.
- We set up separate Doxygen projects at the package level for libraries.
- We use Doxygen tag files to create automatic relative links for references across different Doxygen projects.
- Linkage between individual Doxygen projects shall be unidirectional only (package A -> package B).
- We have CI jobs to generate the full documentation for every PR.
- We host all generated documentation for the monorepo on a webserver.
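To make the linkage configuration concrete, here is a minimal, illustrative `conf.py` sketch for one sub-project, covering intersphinx, `needs_external_needs` and Doxygen tag-file links via doxylink. The project names, URLs and paths are placeholders I made up, not our actual setup:

```python
# conf.py of a domain-level sub-project (all names, URLs and paths are illustrative)

extensions = [
    "sphinx.ext.intersphinx",
    "sphinx_needs",
    "sphinxcontrib.doxylink",  # or breathe / doxysphinx, depending on the project
]

# Cross-project references to plain Sphinx objects (unidirectional).
intersphinx_mapping = {
    # (base URL of the other project's HTML, path to its objects.inv or None)
    "platform": ("https://docs.example.com/platform/", None),
}

# Cross-project references to needs of other projects via their needs.json.
needs_external_needs = [
    {
        "base_url": "https://docs.example.com/platform",
        "json_path": "_external/platform_needs.json",  # copied/fetched before the build
        "id_prefix": "PLAT_",
    },
]

# Links into package-level Doxygen projects via their tag files.
doxylink = {
    "mylib": ("_doxygen/mylib.tag", "https://docs.example.com/doxygen/mylib/"),
}
```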
How to resolve the bi-directional dependencies
To achieve bi-directional linkage in Sphinx output, we split the Sphinx build for every project into two phases:
1. In the first phase, we generate `objects.inv` and `needs.json` for each project. We suppress all warnings related to unknown link targets in this stage.
2. In the second phase, all `needs_external_needs` and all `sphinx.ext.intersphinx` configurations that define linkage to other projects are now included. We exit with an error for all warnings in this stage.
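One possible way (not necessarily the one we use) to drive the two phases from a single `conf.py` is to select the phase with a Sphinx tag; the tag name, URLs and paths below are illustrative assumptions:

```python
# conf.py fragment (illustrative): the "inventory" tag selects phase 1.
#
# Phase 1:  sphinx-build -b html -t inventory src out/phase1
#           -> produces objects.inv and (via needs_build_json) needs.json;
#              warnings about unknown link targets are tolerated here.
# Phase 2:  sphinx-build -b html -W src out/html
#           -> full cross-project config is active, -W turns warnings into errors.

needs_build_json = True  # always emit needs.json next to the HTML output

if tags.has("inventory"):  # "tags" is injected into conf.py by Sphinx
    # Phase 1: build without any cross-project linkage.
    intersphinx_mapping = {}
    needs_external_needs = []
else:
    # Phase 2: full linkage; the other projects' inventories must already exist.
    intersphinx_mapping = {
        "platform": ("https://docs.example.com/platform/",
                     "../platform/out/phase1/objects.inv"),
    }
    needs_external_needs = [
        {
            "base_url": "https://docs.example.com/platform",
            "json_path": "../platform/out/phase1/needs.json",
        },
    ]
```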
Modelling the Sphinx build in Bazel
We can model the Sphinx build as a Bazel target, because I developed a custom Bazel build rule that calls sphinx-build (or, more precisely, a Python script that invokes sphinx-build).
However, all build orchestration tools require an acyclic dependency graph; cyclic dependencies are typically considered an error.
By generating the inventories for every project in a first sphinx-build pass, we break up the cyclic dependency, and we can model the resulting dependency graph in Bazel by defining two targets for each Sphinx project.
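The custom rule itself is not shown here, but a sketch of the kind of Python wrapper such a rule can invoke looks as follows; the command-line interface and the two-target split are illustrative, not the exact interface of my rule:

```python
#!/usr/bin/env python3
"""Tiny sphinx-build wrapper of the kind a custom Bazel rule can invoke.

One Bazel target calls it with --phase inventory (producing objects.inv and
needs.json), a second target calls it with --phase html and depends on the
inventory targets of all referenced projects.
"""
import argparse
import sys

from sphinx.cmd.build import build_main


def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--srcdir", required=True)
    parser.add_argument("--outdir", required=True)
    parser.add_argument("--phase", choices=["inventory", "html"], required=True)
    args = parser.parse_args()

    argv = ["-b", "html", args.srcdir, args.outdir]
    if args.phase == "inventory":
        argv += ["-t", "inventory"]      # phase 1: no cross-project linkage
    else:
        argv += ["-W", "--keep-going"]   # phase 2: fail on any warning

    return build_main(argv)


if __name__ == "__main__":
    sys.exit(main())
```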
Build performance problems
I created the modular documentation project approach because it scales better than including all pages of a large monorepo into a single sphinx project, which would have been the preferred solution.
The strongest advantage of a modular documentation build is that generated build outputs can be cached, so only the documentation of parts that were changed needs to be rebuilt. With Bazel's remote cache, this can dramatically reduce the duration of a documentation-build CI pipeline.
The disadvantage of the approach is that each project has to be built twice for a clean rebuild, and Sphinx will not re-use much of the doctree cache in the second build because the configuration has changed with respect to the first build.
Luckily, the Sphinx read phase is quite fast compared to the write phase (which is especially slow with sphinx-needs), and when working locally within a single project we only have to build the required inventories of the other projects once.
However, the Sphinx write phase is still quite slow, especially with large sphinx-needs databases (we currently have ~20,000 internal + 30,000 imported needs in our "main" project).
And because we have to re-run the "sphinx html with external links" build every time something changes in any project that is referenced, the cache hit rate for the sphinx HTML build targets is pretty bad.
As a "workaround", other projects have already chosen solutions that explicitly accept that sphinx-needs links to "project-external" needs are broken in their HTML output,
checking the validity of all sphinx-needs links with other post-processing tools for validation of the need model.
However, our users (the developers) actually like navigating across project boundaries by clicking on need links. So how can we have both a fast documentation build with few rebuilds and bi-directional HTML links when browsing the docs?
Design for a true modular sphinx-needs project build with bi-directional linkage
The need to re-build all Sphinx projects after all inventories are available arises because Sphinx's HTML output consists of static HTML pages.
I propose to create a new sphinx-needs builder that creates HTML pages with placeholders for all unresolved need links (like `@needref{id}`). The Sphinx HTML documentation is then augmented by client-side or server-side code that looks up unresolved need links in a database (in the simplest case, all generated `needs.json` files or a combined `all_needs.json`) when a page is rendered in the browser, and inserts the information from the database in place. Needtables are replaced by empty tables, and the dynamic code populates each table from the database content when the page is rendered.
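To make the placeholder idea more tangible, here is a minimal sketch of the lookup-and-substitution logic in Python. In the proposal this logic would run when a page is rendered (client-side or server-side); the sketch below shows it as a batch rewrite of static HTML instead, and the `needs.json` layout and URL scheme are assumptions on my part:

```python
"""Resolve @needref{ID} placeholders against one or more needs.json files.

Illustrative sketch only: the real proposal would do this lookup when the
page is rendered, not as a batch rewrite of static HTML.
"""
import json
import re
from pathlib import Path

PLACEHOLDER = re.compile(r"@needref\{(?P<id>[A-Za-z0-9_\-]+)\}")


def load_needs(json_file: Path, base_url: str) -> dict[str, str]:
    """Map need id -> HTML link, assuming the usual needs.json layout
    (versions -> <current_version> -> needs -> <id> -> {title, docname, ...})."""
    data = json.loads(json_file.read_text())
    needs = data["versions"][data["current_version"]]["needs"]
    return {
        need_id: f'<a href="{base_url}/{need["docname"]}.html#{need_id}">{need["title"]}</a>'
        for need_id, need in needs.items()
    }


def resolve_page(html: str, link_db: dict[str, str]) -> str:
    """Replace every placeholder the database knows about; unknown ids are
    left untouched so they remain visible as unresolved."""
    return PLACEHOLDER.sub(lambda m: link_db.get(m.group("id"), m.group(0)), html)


if __name__ == "__main__":
    # Hypothetical inputs: a combined needs database and one generated page.
    link_db = load_needs(Path("all_needs.json"), "https://docs.example.com/platform")
    page = Path("out/html/index.html")
    page.write_text(resolve_page(page.read_text(), link_db))
```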
Handling sphinx.ext.intersphinx
Bi-directional linkage between projects with sphinx.ext.intersphinx shall be forbidden; only unidirectional linkage shall be allowed.
Special sphinx-needs directives, like `.. document:: DOC_main_index`, and references via `:need:`, can be used to replace Sphinx's target-label/reference feature if links between projects in both directions shall be supported.

Summary and outlook
I have presented an approach that works at scale for generating documentation for large monorepos containing many Sphinx projects that reference each other via sphinx-needs.
This approach, which focuses on how to build a Sphinx project containing references to external needs, can be combined with ubTrace for the rendering part, or could perhaps be combined with a lean solution that resolves need links and populates needtable content at the time a page is rendered. As I am no expert in dynamic web page programming, I'd be happy to hear your take on that part, @danwos and @chrisjsewell and whoever wants to chime in 😄