Replies: 1 comment
Thanks for the write-up @arwedus, it's certainly very helpful! I would initially pick up on what I think is a key point: resolving need links and populating needtables at the time a page is rendered.
This is, I think, not really possible for a "statically served" site; obviously, there is plenty of content out there on static vs dynamic websites (e.g. https://www.wix.com/blog/static-vs-dynamic-website). We are working on something like this at useblocks, but it is not trivial and there are trade-offs:
- For inserting basic link routing, this is perhaps possible; for inserting whole needtables, that's definitely more difficult.
- Note also that, in general, this pushes all of these things to "do it at render time" (as also mentioned for mermaid vs plantuml/graphviz).
---
Hi @danwos, @chrisjsewell, as we discussed a couple of times, I have an idea for solving the scaling problem that comes up with Sphinx, sphinx-needs, and large monorepos with projects that reference each other.
Since this is a rather special case for using sphinx and sphinx-needs, I wrote a whitepaper that sums up my approach. You can find my feature request to make it all work with less stress on the CI/CD environment at the end.
Multi-project Documentation Build With Sphinx, Sphinx-needs, and Bazel
This documentation build strategy works best for large monorepos which use a multi-project structure for doxygen and sphinx documentation.
Hierarchical multi-project documentation project structure
Given the sheer size of a true monorepo codebase, and given that teams work mostly autonomously on their (sub-)projects, collaborating at the interface level only, it makes sense to use a multi-project documentation build to reduce both the local documentation build time for developers and the CI/CD build time.
To make the documentation build in CI/CD environments scalable for large monorepos, I devised a multi-project set-up for Doxygen and Sphinx with inter-project linkage.
If combined with a build orchestration tool that supports hermetic, deterministic and modular builds with complete modelling of dependencies (like Bazel), the monorepo approach scales for docs-as-code as it does for compiled code.
The following diagram shows an example of the project structure:
Here's how it works:
- Set up a separate Sphinx project for each separate (sub-)project within a large monorepo.
- In a very large project, e.g. for an autonomous vehicle computer, we create subprojects at the domain level (e.g., fusion).
- Linkage between domain-level components and the top-level documentation is done via intersphinx and sphinx-needs `needs_external_needs` (see the `conf.py` sketch after this list).
- We use a metamodel with system-level and software-level needs (as defined by ASPICE and ISO 26262), so we want to link needs, like requirements, on SW level with needs on system level for bi-directional traceability.
- The bi-directional traceability required in the HTML output requires bi-directional linkage between Sphinx projects (project_A <-> project_B).
- We use breathe, doxysphinx or doxylink to integrate the C++ code documentation with Sphinx and sphinx-needs.
- Linkage between the domain-level documentation in Sphinx and the code-level documentation is done either by importing Doxygen XML output or by reading the Doxygen tag file of one or more Doxygen projects.
- We set up separate Doxygen projects at the package level for libraries.
- We use Doxygen tag files to create automatic relative links for references across different Doxygen projects.
- Linkage between individual Doxygen projects shall be unidirectional only (package A -> package B).
- We have CI jobs to generate the full documentation for every PR.
- We host all generated documentation for the monorepo on a webserver.
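To make the linkage configuration concrete, here is a minimal, illustrative `conf.py` sketch for one sub-project, covering intersphinx, `needs_external_needs` and Doxygen tag-file links via doxylink. The project names, URLs and paths are placeholders I made up, not our actual setup:

```python
# conf.py of a domain-level sub-project (all names, URLs and paths are illustrative)

extensions = [
    "sphinx.ext.intersphinx",
    "sphinx_needs",
    "sphinxcontrib.doxylink",  # or breathe / doxysphinx, depending on the project
]

# Cross-project references to plain Sphinx objects (unidirectional).
intersphinx_mapping = {
    # (base URL of the other project's HTML, path to its objects.inv or None)
    "platform": ("https://docs.example.com/platform/", None),
}

# Cross-project references to needs of other projects via their needs.json.
needs_external_needs = [
    {
        "base_url": "https://docs.example.com/platform",
        "json_path": "_external/platform_needs.json",  # copied/fetched before the build
        "id_prefix": "PLAT_",
    },
]

# Links into package-level Doxygen projects via their tag files.
doxylink = {
    "mylib": ("_doxygen/mylib.tag", "https://docs.example.com/doxygen/mylib/"),
}
```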
How to resolve the bi-directional dependencies
To achieve bi-directional linkage in Sphinx output, we split the Sphinx build for every project into two phases:
1. In the first phase, we generate `objects.inv` and `needs.json` for each project. We suppress all warnings related to unknown link targets in this stage.
2. In the second phase, all `needs_external_needs` and all `sphinx.ext.intersphinx` configurations that define linkage to other projects are now included. We exit with an error for all warnings in this stage.
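One possible way (not necessarily the one we use) to drive the two phases from a single `conf.py` is to select the phase with a Sphinx tag; the tag name, URLs and paths below are illustrative assumptions:

```python
# conf.py fragment (illustrative): the "inventory" tag selects phase 1.
#
# Phase 1:  sphinx-build -b html -t inventory src out/phase1
#           -> produces objects.inv and (via needs_build_json) needs.json;
#              warnings about unknown link targets are tolerated here.
# Phase 2:  sphinx-build -b html -W src out/html
#           -> full cross-project config is active, -W turns warnings into errors.

needs_build_json = True  # always emit needs.json next to the HTML output

if tags.has("inventory"):  # "tags" is injected into conf.py by Sphinx
    # Phase 1: build without any cross-project linkage.
    intersphinx_mapping = {}
    needs_external_needs = []
else:
    # Phase 2: full linkage; the other projects' inventories must already exist.
    intersphinx_mapping = {
        "platform": ("https://docs.example.com/platform/",
                     "../platform/out/phase1/objects.inv"),
    }
    needs_external_needs = [
        {
            "base_url": "https://docs.example.com/platform",
            "json_path": "../platform/out/phase1/needs.json",
        },
    ]
```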
Modelling the Sphinx build in Bazel
We can model the Sphinx build as a Bazel target, because I developed a custom Bazel build rule that calls sphinx-build (or, more precisely, a Python script that invokes sphinx-build).
However, all build orchestration tools require an acyclic dependency graph; cyclic dependencies are typically considered an error.
By generating the inventories for every project in a first sphinx-build pass, we break up the cyclic dependency, and we can model the resulting dependency graph in Bazel by defining two targets for each Sphinx project.
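The custom rule itself is not shown here, but a sketch of the kind of Python wrapper such a rule can invoke looks as follows; the command-line interface and the two-target split are illustrative, not the exact interface of my rule:

```python
#!/usr/bin/env python3
"""Tiny sphinx-build wrapper of the kind a custom Bazel rule can invoke.

One Bazel target calls it with --phase inventory (producing objects.inv and
needs.json), a second target calls it with --phase html and depends on the
inventory targets of all referenced projects.
"""
import argparse
import sys

from sphinx.cmd.build import build_main


def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--srcdir", required=True)
    parser.add_argument("--outdir", required=True)
    parser.add_argument("--phase", choices=["inventory", "html"], required=True)
    args = parser.parse_args()

    argv = ["-b", "html", args.srcdir, args.outdir]
    if args.phase == "inventory":
        argv += ["-t", "inventory"]      # phase 1: no cross-project linkage
    else:
        argv += ["-W", "--keep-going"]   # phase 2: fail on any warning

    return build_main(argv)


if __name__ == "__main__":
    sys.exit(main())
```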
Build performance problems
I created the modular documentation project approach because it scales better than including all pages of a large monorepo into a single sphinx project, which would have been the preferred solution.
The strongest advantage of a modular documentation build is that generated build outputs can be cached, so only the documentation of parts that were changed needs to be rebuilt. With Bazel's remote cache, this can dramatically reduce the duration of a documentation-build CI pipeline.
The disadvantage of the approach is that each project has to be built twice for a clean rebuild, and Sphinx will not re-use much of the doctree cache in the second build because the configuration has changed with respect to the first build.
Luckily, the Sphinx read phase is quite fast compared to the write phase (which is especially slow with sphinx-needs), and when working locally within a single project we only have to build the required inventories of the other projects once.
However, the Sphinx write phase is still quite slow, especially with large sphinx-needs databases (we currently have ~20,000 internal + 30,000 imported needs in our "main" project).
And because we have to re-run the "sphinx html with external links" build every time something changes in any project that is referenced, the cache hit rate for the sphinx HTML build targets is pretty bad.
As a "workaround", other projects have already chosen solutions that explicitly accept that sphinx-needs links to "project-external" needs are broken in their HTML output,
checking the validity of all sphinx-needs links with other post-processing tools for validation of the need model.
However, our users (the developers) actually like navigating across project boundaries by clicking on need links. So how can we have both a fast documentation build with few rebuilds and bi-directional HTML links when browsing the docs?
Design for a true modular sphinx-needs project build with bi-directional linkage
The need to re-build all Sphinx projects after all inventories are available arises because Sphinx's HTML output consists of static HTML pages.
I propose to create a new sphinx-needs builder that creates HTML pages with placeholders for all unresolved need links (like `@needref{id}`). The Sphinx HTML documentation is then augmented by client-side or server-side code that looks up unresolved need links in a database (in the simplest case, all generated `needs.json` files or a combined `all_needs.json`) when a page is rendered in the browser, and inserts the information from the database in place. Needtables are replaced by empty tables, and the dynamic code populates each table from the database content when the page is rendered.
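To make the placeholder idea more tangible, here is a minimal sketch of the lookup-and-substitution logic in Python. In the proposal this logic would run when a page is rendered (client-side or server-side); the sketch below shows it as a batch rewrite of static HTML instead, and the `needs.json` layout and URL scheme are assumptions on my part:

```python
"""Resolve @needref{ID} placeholders against one or more needs.json files.

Illustrative sketch only: the real proposal would do this lookup when the
page is rendered, not as a batch rewrite of static HTML.
"""
import json
import re
from pathlib import Path

PLACEHOLDER = re.compile(r"@needref\{(?P<id>[A-Za-z0-9_\-]+)\}")


def load_needs(json_file: Path, base_url: str) -> dict[str, str]:
    """Map need id -> HTML link, assuming the usual needs.json layout
    (versions -> <current_version> -> needs -> <id> -> {title, docname, ...})."""
    data = json.loads(json_file.read_text())
    needs = data["versions"][data["current_version"]]["needs"]
    return {
        need_id: f'<a href="{base_url}/{need["docname"]}.html#{need_id}">{need["title"]}</a>'
        for need_id, need in needs.items()
    }


def resolve_page(html: str, link_db: dict[str, str]) -> str:
    """Replace every placeholder the database knows about; unknown ids are
    left untouched so they remain visible as unresolved."""
    return PLACEHOLDER.sub(lambda m: link_db.get(m.group("id"), m.group(0)), html)


if __name__ == "__main__":
    # Hypothetical inputs: a combined needs database and one generated page.
    link_db = load_needs(Path("all_needs.json"), "https://docs.example.com/platform")
    page = Path("out/html/index.html")
    page.write_text(resolve_page(page.read_text(), link_db))
```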
Handling sphinx.ext.intersphinx
Bi-directional linkage between projects with sphinx.ext.intersphinx shall be forbidden; only unidirectional linkage shall be allowed.
Special sphinx-needs directives, like `.. document:: DOC_main_index`, and references via `:need:`, can be used to replace Sphinx's target-label/reference feature if links between projects in both directions shall be supported.

Summary and outlook
I have presented an approach that works at scale for generating documentation for large monorepos containing many Sphinx projects that reference each other via sphinx-needs.
This approach, which focuses on how to build a Sphinx project containing references to external needs, can be combined with ubTrace for the rendering part, or could perhaps be combined with a lean solution that resolves need links and populates needtable content at the time a page is rendered. As I am no expert in dynamic web page programming, I'd be happy to hear your take on that part, @danwos and @chrisjsewell and whoever wants to chime in 😄