How content changes in enki

This describes the overall steps that content goes through. Several of these steps can be grouped together into a larger step. Combining the steps would reduce the amount of validation needed after each step, the size of the step-dependency graph people have to keep in their heads, and the number of files that need to be documented (as the output of one step and the input to another).

Overall Process
What happens in each step
Required languages
- Validation

Overall Process

The pipeline steps are listed here, grouped into sets of steps that could be combined.

git-fetch: Clone URL & checkout commit
git-fetch-metadata, git-assemble, git-assemble-meta
- Replace <md:metadata> and move images to ../resources/{sha}, convert CNXML to HTML and assemble all the files together, and extract the abstract and revised date for each Page
git-bake
git-bake-meta, git-link
- Create a book metadata JSON file with slugs and abstracts, and add attributes to links so REX knows the canonical book
Output-specific steps

What happens in each step

git-fetch

This just runs an authenticated git clone and checks out the correct branch/commit.

Validation:

Use POET CLI to validate the results.

git-fetch-metadata

Three things happen.

Replace <md:metadata> in CNXML and collxml files
Move images/resources into ../resources/
Copy the web style into the resources directory (for use by REX)

fetch-update-metadata

The metadata in every CNXML file is replaced with 2 fields: revised and canonical-book-uuid
The metadata in every Collection file is replaced with 2 fields: revised and version
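
The CNXML half of this replacement could be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the namespace URI bound to the md: prefix, the sample document, and the function name replace_metadata are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: the md: prefix is bound to this namespace URI.
MD_NS = "http://cnx.rice.edu/mdml"

# Hypothetical CNXML fragment, used only to exercise the sketch.
SAMPLE_CNXML = """<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml">
  <md:metadata>
    <md:title>Hypothetical page</md:title>
    <md:uuid>hypothetical-uuid</md:uuid>
  </md:metadata>
  <content/>
</document>"""


def replace_metadata(root, fields):
    """Replace the children of every <md:metadata> element with `fields`.

    Returns the number of metadata elements that were rewritten.
    """
    count = 0
    for metadata in root.iter(f"{{{MD_NS}}}metadata"):
        # Drop whatever entries were there before.
        for child in list(metadata):
            metadata.remove(child)
        # Add exactly the requested fields, in order.
        for name, value in fields.items():
            entry = ET.SubElement(metadata, f"{{{MD_NS}}}{name}")
            entry.text = value
        count += 1
    return count


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_CNXML)
    replace_metadata(
        root,
        {"revised": "2022-01-01", "canonical-book-uuid": "hypothetical-book-uuid"},
    )
    print(ET.tostring(root, encoding="unicode"))
```

For a Collection file the same function would be called with revised and version instead.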

fetch-map-resources

Move resources into a /resources/{sha} format
Update the CNXML references to these resource files
Generate neighboring JSON files for AWS to help set the content type when browsers fetch the resource
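
A rough sketch of the move-and-describe part of this step is shown below. It assumes the {sha} is a SHA-1 of the file bytes and that the neighboring JSON file is named {sha}.json; the JSON field names and the function name map_resource are hypothetical, chosen only for illustration.

```python
import hashlib
import json
import mimetypes
from pathlib import Path


def map_resource(src: Path, resources_dir: Path) -> str:
    """Move `src` to resources_dir/{sha} and write a JSON file next to it.

    Returns the sha so the caller can rewrite the CNXML reference.
    """
    data = src.read_bytes()
    # Assumption: the resource name is the SHA-1 hex digest of its contents.
    sha = hashlib.sha1(data).hexdigest()

    resources_dir.mkdir(parents=True, exist_ok=True)
    (resources_dir / sha).write_bytes(data)
    src.unlink()  # "move": remove the original once the copy is written

    # The content type is guessed from the original file extension.
    mime_type, _ = mimetypes.guess_type(src.name)
    sidecar = {
        "original_name": src.name,  # hypothetical field names
        "mime_type": mime_type or "application/octet-stream",
        "sha1": sha,
    }
    (resources_dir / f"{sha}.json").write_text(json.dumps(sidecar))
    return sha
```

The returned sha is what a reference such as src="../resources/{sha}" would be built from.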

Validation:

The results can be validated like so:

Should still validate using POET CLI (maybe a minor tweak is necessary?)
Every <link resource="..."> and <image src="..."> value should begin with ../resources/
Every <md:metadata> in the CNXML file and the COLLXML file should contain 2 entries
A style file exists in ../resources/, with its corresponding sourcemap if one exists
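
The resource-prefix check could look roughly like the sketch below. It ignores namespaces and matches on local element names only, which is an assumption about how strict the check needs to be; the function name find_bad_resource_refs is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_bad_resource_refs(cnxml_text, prefix="../resources/"):
    """Return (element, value) pairs whose resource reference lacks `prefix`."""
    root = ET.fromstring(cnxml_text)
    bad = []
    for el in root.iter():
        # Strip any "{namespace}" part so only the local name is compared.
        local = el.tag.rsplit("}", 1)[-1]
        if local == "link" and "resource" in el.attrib:
            value = el.attrib["resource"]
        elif local == "image" and "src" in el.attrib:
            value = el.attrib["src"]
        else:
            continue
        if not value.startswith(prefix):
            bad.append((local, value))
    return bad
```

An empty list means every reference in the file already points into ../resources/.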

git-assemble

This step performs several things to convert every collection.xml file into a gigantic {slug}.collection.xhtml:

Convert every CNXML file to an XHTML file
Prefix id attributes and links to those attributes with the module ID so they are unique once they are combined into one XML document
Depending on the <c:link> type, convert it to <a href="/contents/{MODULE_ID}"> (other-book link), <a href="#page_{PAGE_ID}"> (same-book link), or <a href="#page_{PAGE_ID}_{TARGET_ID}"> (element on a page), or <a href="https://cnx.org/content/{PAGE_ID}"> if the page does not exist in the REPO
- Also add class="autogenerated-content" if CNXML does not have any link text
- See https://openstax.atlassian.net/wiki/spaces/CE/pages/1759707137/Pipeline+Pipeline+Task+Definitions#Git-Links for examples
Fetch exercise JSON and TeX Math from exercises.openstax.org and convert it to HTML and MathML
- Check whether the TeX-to-MathML conversion is dead, because the code supposedly calls the MMLCloud API: https://github.com/openstax/cnx-epub/blob/master/cnxepub/formatters.py#L328
When injected exercises have a cnx-context tag, resolve whether the exercise context should link to an element on this page, another page in this book, or a page in another book: https://github.com/openstax/cnx-epub/blob/master/cnxepub/formatters.py#L382
Write the book out using this template (Do we need most of this?): https://github.com/openstax/cnx-epub/blob/master/cnxepub/formatters.py#L602
A ToC is added to the top of the gigantic XHTML file: https://github.com/openstax/cnx-epub/blob/master/cnxepub/formatters.py#L932
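
The link-conversion rule in the list above can be sketched as a small function. This is only an illustration of the four cases: the function name, its parameters, and the use of plain sets of page IDs are assumptions, and the source does not say what happens to a target ID on an other-book link, so the sketch drops it.

```python
def link_href(target_page, target_id, pages_in_book, pages_in_repo):
    """Return the href for a link to `target_page`.

    `target_id` is the ID of an element on that page, or None.
    `pages_in_book` and `pages_in_repo` are sets of page IDs.
    """
    if target_page in pages_in_book:
        if target_id:
            # Element on a page in the same book.
            return f"#page_{target_page}_{target_id}"
        # Same-book link.
        return f"#page_{target_page}"
    if target_page in pages_in_repo:
        # Other-book link. Assumption: any target ID is dropped here,
        # because the description above does not cover that case.
        return f"/contents/{target_page}"
    # The page does not exist in the repo at all.
    return f"https://cnx.org/content/{target_page}"
```

A caller would separately add class="autogenerated-content" when the CNXML link has no text.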

Validation:

XHTML validator should pass for every assembled XHTML file
Some RNG to validate the root elements (unit, chapter, page).

git-assemble-meta

Generates a {slug}.assembled-metadata.json file which contains the abstract and revised date for each Page:

{
    "{page_uuid}": { "abstract": "...", "revised": "2022-..." },
    "{page_uuid}": { "abstract": "...", "revised": "2022-..." },
    "{page_uuid}": { "abstract": "...", "revised": "2022-..." }
}

Validation:

A JSONSchema for each JSON file.
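
Until a real JSONSchema exists, the shape described above could be checked with a few lines of standard-library code. The sketch below is a hand-rolled stand-in, not a JSONSchema validator, and it assumes that both abstract and revised are always strings; the function name is hypothetical.

```python
import json

# Assumption: every page entry carries exactly these string fields.
REQUIRED_FIELDS = ("abstract", "revised")


def check_assembled_metadata(text):
    """Return a list of problems found in an assembled-metadata JSON document."""
    data = json.loads(text)
    if not isinstance(data, dict):
        return ["top level must be an object keyed by page UUID"]
    problems = []
    for page_uuid, entry in data.items():
        if not isinstance(entry, dict):
            problems.append(f"{page_uuid}: entry must be an object")
            continue
        for field in REQUIRED_FIELDS:
            if not isinstance(entry.get(field), str):
                problems.append(f"{page_uuid}: missing or non-string {field!r}")
    return problems
```

An empty list means the file matches the expected shape.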

git-bake

CS-Styles takes over from here and bakes the big XHTML file using a Ruby recipe.

Validation:

XHTML validator
The top elements that the disassembler looks for should be defined in an RNG

git-bake-meta

Create a {slug}.baked-metadata.json which contains everything in {slug}.assembled-metadata.json plus a book entry:

{
    "{page_uuid}": { "abstract": "...", "revised": "2022-..." },
    "{book_uuid}@{ver}": {
        "id": "{book_uuid}",
        "title": "Algebra",
        "revised": "2022-...",
        "slug": "algebra-trig",
        "version": "359e7eb",
        "language": "en",
        "license": {
            "url": "http://creativecommons.org/licenses/by/1.0",
            "name": "Creative Commons Attribution License"
        },
        "tree": {
            "id": "{uuid}",
            "title": "Title of the chapter",
            "contents": [
                {
                    "id": "",
                    "title": "<span>Title with</span> Markup",
                    "slug": "1-1-addition"
                }
            ]
        }
    }
}
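
The relationship between the two metadata files can be sketched as a plain dictionary merge. The function name is hypothetical, and the sketch assumes the book entry already contains id and version fields from which the "{book_uuid}@{ver}" key is built.

```python
def build_baked_metadata(assembled, book):
    """Return the assembled page metadata plus one entry for the book.

    `assembled` maps page UUIDs to their metadata; `book` is the book entry.
    """
    baked = dict(assembled)  # shallow copy, so the input is left untouched
    # Assumption: the key is the book UUID and version joined by "@".
    key = f"{book['id']}@{book['version']}"
    baked[key] = book
    return baked
```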

Validation:

JSONSchema on each generated book's JSON file.
Maybe XHTML validation on each Baked XHTML file.

git-link

For links to other books, this step adds attributes on the link so REX will be able to choose the right book to link to:

data-book-uuid="..."
data-book-slug="..."
data-page-slug="..."
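
A minimal sketch of adding these attributes follows. It assumes that links to other books are still recognizable by an href starting with /contents/ at this point in the pipeline, and that a lookup table from page ID to book and page information is available; both the table and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
OTHER_BOOK_PREFIX = "/contents/"  # assumption: marks a link to another book


def add_rex_attributes(root, lookup):
    """Add data-book-* and data-page-slug attributes to other-book links.

    `lookup` maps a page ID to a dict with book_uuid, book_slug, page_slug.
    Returns the number of links that were annotated.
    """
    changed = 0
    for a in root.iter(f"{{{XHTML_NS}}}a"):
        href = a.get("href", "")
        if not href.startswith(OTHER_BOOK_PREFIX):
            continue
        target = lookup.get(href[len(OTHER_BOOK_PREFIX):])
        if target is None:
            continue  # unknown page: leave the link untouched
        a.set("data-book-uuid", target["book_uuid"])
        a.set("data-book-slug", target["book_slug"])
        a.set("data-page-slug", target["page_slug"])
        changed += 1
    return changed
```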

Validation:

Whatever REX expects these files to have.

Output-specific

This is the end of the common parts of the pipeline. Here things diverge for each output.

Required languages

Ruby for baking
Something that supports parsing XML/JSON with source line/column numbers (Sourcemaps) to run all the other steps

Validation

TypeScript for POET CLI
Java for XHTML and RNG validation
