feat(dialtone-docs): DLT-3109 add markdown-to-JSON generator for AI docs #1178
belumontoya wants to merge 2 commits into staging
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 208c3b30bd
keywords: Array.isArray(frontmatter.keywords) ? frontmatter.keywords : [],
summary: frontmatter.ai_summary ?? null,
content,
lastUpdated: frontmatter.last_updated ? String(frontmatter.last_updated) : null,
Serialize last_updated deterministically
Convert last_updated without String(...): gray-matter parses unquoted YAML dates (e.g. 2026-03-04) as Date, and String(date) emits a locale/timezone-dependent value. That makes ai-docs.json nondeterministic across environments and can even shift the calendar day (e.g. UTC date appears as previous day in US timezones), which breaks stable indexing and date-based consumers.
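A minimal sketch of one possible fix, assuming last_updated arrives either as a Date at UTC midnight (unquoted YAML date) or as a string (quoted value). The helper name formatLastUpdated is hypothetical and not part of the PR.

```javascript
// Hypothetical helper: normalizes a frontmatter date to YYYY-MM-DD.
function formatLastUpdated(value) {
  if (value == null || value === '') return null;
  if (value instanceof Date) {
    // toISOString() throws on an invalid Date, so guard first.
    if (Number.isNaN(value.getTime())) return null;
    // toISOString() always reports UTC, so the result does not
    // depend on the time zone of the machine running the build.
    return value.toISOString().slice(0, 10);
  }
  // Quoted YAML values arrive as strings and pass through unchanged.
  return String(value);
}

// Date-only ISO strings are parsed as UTC midnight.
const parsedDate = new Date('2026-03-04');

// String(Date) uses the local time zone, so this line varies per machine.
console.log(String(parsedDate));

// Stable across environments.
console.log(formatLastUpdated(parsedDate)); // 2026-03-04
```

The guard for invalid dates is a design choice for the sketch; a build script might prefer to fail loudly instead of emitting null.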
Brad Paugh (braddialpad) left a comment
Looks really good. My only concern is the large amount of hard-to-read regex used to parse the markdown; however, it is necessary for this change and is easier to understand in the age of AI.
Couple of small comments, nothing major.
for (const doc of docs) {
  expect(doc.type, `"${doc.id}" type is null`).not.toBeNull();
  expect(ALLOWED_TYPES, `"${doc.id}" invalid type "${doc.type}"`).toContain(doc.type);
}
For all tests where we are looping through arrays like this, we should be using test.each instead of a for loop.
A single failure will mask all subsequent ones when doing it the current way.
Agreed, but one thing worth noting: docs is built in beforeAll, so test.each can't access it at test definition time. I could build it synchronously at module scope, or keep the loop but use soft assertions. Any preference, or a better idea?
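One further option, sketched here in plain JavaScript so it runs without a test runner: collect every offending document first and make a single assertion on the collected list, so one bad entry cannot hide the others. The helper findInvalidTypes, the sample docs, and the ALLOWED_TYPES values are hypothetical; the real values live in the PR's test file.

```javascript
// Hypothetical values; the PR's actual ALLOWED_TYPES is not shown in this thread.
const ALLOWED_TYPES = ['component', 'guide', 'token'];

// Returns one message per document whose type is not in the allowed list.
function findInvalidTypes(docs, allowedTypes) {
  return docs
    .filter((doc) => !allowedTypes.includes(doc.type))
    .map((doc) => `"${doc.id}" invalid type "${doc.type}"`);
}

// Hypothetical sample data standing in for the array built in beforeAll.
const docs = [
  { id: 'button', type: 'component' },
  { id: 'spacing', type: null },
  { id: 'intro', type: 'page' },
];

// Every failing document is reported, not just the first one.
const failures = findInvalidTypes(docs, ALLOWED_TYPES);
console.log(failures);
```

Inside the existing test this would reduce to a single expect(failures).toEqual([]), with docs still built in beforeAll.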
  blockquote: /^>\s?.*/gm,
  horizontalRule: /^(?:[-*_]){3,}\s*$/gm,
  emphasis: /[*_]{1,2}([^*_]+)[*_]{1,2}/g,
};
This PATTERNS is a duplicate of the one in markdownParser.js. Is that intentional? They could get out of sync in future changes if they are both being used.
Nope, nicely spotted. I'm now thinking of replacing this with the remove-markdown package instead of handling it manually.
test('type field uses allowed values', () => {
  for (const doc of docs) {
    expect(doc.type, `"${doc.id}" type is null`).not.toBeNull();
nit (optional): this first assertion is probably not necessary
- Fix non-deterministic date serialization: String(Date) is
timezone-dependent, use toISOString().split('T')[0] for stable
YYYY-MM-DD output
- Remove redundant null assertion in type field test
feat(dialtone-docs): DLT-3109 add markdown-to-JSON generator for AI docs
Obligatory GIF (super important!)
🛠️ Type Of Change
📖 Jira Ticket
https://dialpad.atlassian.net/browse/DLT-3109
📖 Description
Adds the markdown-to-JSON build pipeline for the dialtone-docs package:
- src/generators/build-ai-docs.mjs: Reads all markdown files under src/content/, parses YAML frontmatter (type, category, keywords, ai_summary), strips markdown syntax, and compiles everything into dist/ai-docs.json, a flat JSON array of document entries for AI consumption.
- src/utils/strip-markdown.mjs: Utility that strips frontmatter, code blocks, HTML, links, headings, emphasis, and other markdown syntax to produce searchable plain text.
- package.json / project.json: Added build script and NX target so pnpm nx run dialtone-docs:build triggers the generator.
- tests/tests/build-output.test.js: 11 tests validating the JSON output schema (required fields, types, no markdown artifacts in content, file path integrity, no duplicate IDs).
- tests/tests/strip-markdown.test.js: Unit tests for the strip-markdown utility (headings, code blocks, links, emphasis, frontmatter removal).
- tests/helpers/markdownParser.js: Refactored to import stripMarkdown/stripFrontmatter from the new utility instead of bundling its own copy.
💡 Context
The dialtone-docs package provides AI-discoverable documentation for the Dialtone monorepo. This PR adds Milestone 3: the build step that compiles markdown content into a structured JSON file (ai-docs.json). This JSON output will serve as the data source for MCP server and CLI search tools, enabling AI agents to search the entire documentation site programmatically.
Each JSON entry includes: id, title, type, category, keywords, summary, content (plain text), filePath, lastUpdated, and relatedPackages.
📝 Checklist
🔮 Next Steps
ai-docs.json into the MCP server and CLI as a search data source
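As a rough illustration of how a consumer might query the flat array, the sketch below filters entries by a search term. The searchDocs function, its matching rules, the sample entries, and the value types (for example, relatedPackages as an array) are all hypothetical; only the field names come from the description above.

```javascript
// Hypothetical sample entries using the field names listed in the PR description.
const aiDocs = [
  {
    id: 'components/button',
    title: 'Button',
    type: 'component',
    category: 'components',
    keywords: ['button', 'cta'],
    summary: 'Triggers an action.',
    content: 'Use a button to let people take an action.',
    filePath: 'src/content/components/button.md',
    lastUpdated: '2026-03-04',
    relatedPackages: [],
  },
  {
    id: 'guides/spacing',
    title: 'Spacing',
    type: 'guide',
    category: 'guides',
    keywords: ['spacing'],
    summary: null,
    content: 'Spacing tokens define margins and padding.',
    filePath: 'src/content/guides/spacing.md',
    lastUpdated: null,
    relatedPackages: [],
  },
];

// Naive case-insensitive substring match over title, keywords, summary, and content.
function searchDocs(docs, term) {
  const needle = term.trim().toLowerCase();
  if (!needle) return [];
  return docs.filter((doc) => {
    const haystack = [doc.title, ...(doc.keywords ?? []), doc.summary ?? '', doc.content]
      .join(' ')
      .toLowerCase();
    return haystack.includes(needle);
  });
}

console.log(searchDocs(aiDocs, 'CTA').map((doc) => doc.id));
```

A real MCP or CLI integration would likely need ranking and tokenization; this only shows that the flat shape can be filtered without any preprocessing.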