feat(dialtone-docs): DLT-3109 add markdown-to-JSON generator for AI docs by belumontoya · Pull Request #1178 · dialpad/dialtone

belumontoya · 2026-04-07T12:49:00Z

feat(dialtone-docs): DLT-3109 add markdown-to-JSON generator for AI docs

Obligatory GIF (super important!)

🛠️ Type Of Change

Feature

📖 Jira Ticket

https://dialpad.atlassian.net/browse/DLT-3109

📖 Description

Adds the markdown-to-JSON build pipeline for the dialtone-docs package:

src/generators/build-ai-docs.mjs — Reads all markdown files under src/content/, parses YAML frontmatter (type, category, keywords, ai_summary), strips markdown syntax, and compiles everything into dist/ai-docs.json — a flat JSON array of document entries for AI consumption.
src/utils/strip-markdown.mjs — Utility that strips frontmatter, code blocks, HTML, links, headings, emphasis, and other markdown syntax to produce searchable plain text.
package.json / project.json — Added build script and NX target so pnpm nx run dialtone-docs:build triggers the generator.
tests/tests/build-output.test.js — 11 tests validating the JSON output schema (required fields, types, no markdown artifacts in content, file path integrity, no duplicate IDs).
tests/tests/strip-markdown.test.js — Unit tests for the strip-markdown utility (headings, code blocks, links, emphasis, frontmatter removal).
tests/helpers/markdownParser.js — Refactored to import stripMarkdown/stripFrontmatter from the new utility instead of bundling its own copy.
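For reviewers unfamiliar with the approach, here is a minimal sketch of what a regex-based strip-markdown utility like the one described above might look like. The `PATTERNS` entries, their order, and the exact regexes are assumptions for illustration and may differ from `src/utils/strip-markdown.mjs`. In this sketch the emphasis pattern uses lookarounds so that snake_case identifiers are left intact.

```javascript
// Hypothetical regex-based stripper; the real PATTERNS object may differ.
const PATTERNS = {
  frontmatter: /^---\r?\n[\s\S]*?\r?\n---\r?\n?/, // leading YAML block
  codeBlock: /`{3}[\s\S]*?`{3}/g, // fenced code blocks
  html: /<[^>]+>/g, // HTML tags
  image: /!\[([^\]]*)\]\([^)]*\)/g, // ![alt](src) -> alt
  link: /\[([^\]]*)\]\([^)]*\)/g, // [text](href) -> text
  inlineCode: /`([^`]*)`/g, // `code` -> code
  heading: /^#{1,6}\s+/gm, // "## Title" -> "Title"
  // **bold**, __bold__, *em*, _em_, but not the underscores in snake_case
  emphasis: /(?<![\w*])(\*\*|__|\*|_)(\S(?:.*?\S)?)\1(?![\w*])/g,
};

function stripFrontmatter(text) {
  return text.replace(PATTERNS.frontmatter, '');
}

function stripMarkdown(text) {
  return stripFrontmatter(text)
    .replace(PATTERNS.codeBlock, ' ')
    .replace(PATTERNS.html, ' ')
    .replace(PATTERNS.image, '$1')
    .replace(PATTERNS.link, '$1')
    .replace(PATTERNS.inlineCode, '$1')
    .replace(PATTERNS.heading, '')
    .replace(PATTERNS.emphasis, '$2')
    .replace(/\s+/g, ' ') // collapse all whitespace into single spaces
    .trim();
}
```

Order matters here: fenced blocks are removed before inline code so their contents never leak into the plain text, and images are handled before links because `![alt](src)` also matches the link pattern.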

💡 Context

The dialtone-docs package provides AI-discoverable documentation for the Dialtone monorepo. This PR adds Milestone 3: the build step that compiles markdown content into a structured JSON file (ai-docs.json). This JSON output will serve as the data source for MCP server and CLI search tools, enabling AI agents to search the entire documentation site programmatically.

Each JSON entry includes: id, title, type, category, keywords, summary, content (plain text), filePath, lastUpdated, and relatedPackages.
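A minimal sketch of the kind of per-entry schema check described for `build-output.test.js`, written against the field list above. Which fields may be `null` and the markdown-artifact heuristic are assumptions; the real tests may use different rules.

```javascript
// Field names come from the PR description; the nullability rules are assumptions.
const REQUIRED_STRING_FIELDS = ['id', 'title', 'content', 'filePath'];
const NULLABLE_STRING_FIELDS = ['type', 'category', 'summary', 'lastUpdated'];
const ARRAY_FIELDS = ['keywords', 'relatedPackages'];
// Rough heuristic for leftover markdown: headings, fences, links, bold.
const MARKDOWN_ARTIFACT = /(^#{1,6}\s)|`{3}|\]\(|\*\*/m;

// Returns a list of human-readable problems (empty when the entry is valid).
function validateEntry(entry) {
  const problems = [];
  for (const field of REQUIRED_STRING_FIELDS) {
    if (typeof entry[field] !== 'string' || entry[field].length === 0) {
      problems.push(`${field} must be a non-empty string`);
    }
  }
  for (const field of NULLABLE_STRING_FIELDS) {
    // A missing field (undefined) is reported too, since it is not a string.
    if (entry[field] !== null && typeof entry[field] !== 'string') {
      problems.push(`${field} must be a string or null`);
    }
  }
  for (const field of ARRAY_FIELDS) {
    if (!Array.isArray(entry[field])) problems.push(`${field} must be an array`);
  }
  if (typeof entry.content === 'string' && MARKDOWN_ARTIFACT.test(entry.content)) {
    problems.push('content still contains markdown syntax');
  }
  return problems;
}

// Returns every id that appears more than once.
function findDuplicateIds(entries) {
  const seen = new Set();
  const duplicates = new Set();
  for (const { id } of entries) {
    if (seen.has(id)) duplicates.add(id);
    seen.add(id);
  }
  return [...duplicates];
}
```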

📝 Checklist

I have ensured no private Dialpad links or info are in the code or pull request description (Dialtone is a public repo!).
I have reviewed my changes.
I have added all relevant documentation.
I have considered the performance impact of my change.
I have added / updated unit tests.

🔮 Next Steps

Refactor existing test suite (consolidate 6 test files down to 3, remove hardcoded content assertions that do not scale)
Integrate ai-docs.json into the MCP server and CLI as a search data source
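As a rough illustration of how a CLI or MCP tool might query `ai-docs.json` once it is integrated, here is a naive keyword search over the entry fields listed above. The field weights, the substring matching, and the `searchDocs` name are all assumptions; a real integration would likely want proper tokenization or a search index.

```javascript
// Hypothetical weights; not taken from the PR.
const WEIGHTS = { title: 5, keywords: 3, summary: 2, content: 1 };

// Returns the ids of the best-matching entries, highest score first.
function searchDocs(docs, query, limit = 5) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return [];
  const scored = docs.map((doc) => {
    // Lowercase each searchable field once; null/missing fields become ''.
    const fields = {
      title: (doc.title ?? '').toLowerCase(),
      keywords: (doc.keywords ?? []).join(' ').toLowerCase(),
      summary: (doc.summary ?? '').toLowerCase(),
      content: (doc.content ?? '').toLowerCase(),
    };
    let score = 0;
    for (const term of terms) {
      for (const [field, weight] of Object.entries(WEIGHTS)) {
        if (fields[field].includes(term)) score += weight;
      }
    }
    return { doc, score };
  });
  return scored
    .filter(({ score }) => score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(({ doc }) => doc.id);
}
```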

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 208c3b30bd


chatgpt-codex-connector · 2026-04-07T12:52:33Z

packages/dialtone-docs/src/generators/build-ai-docs.mjs

+    keywords: Array.isArray(frontmatter.keywords) ? frontmatter.keywords : [],
+    summary: frontmatter.ai_summary ?? null,
+    content,
+    lastUpdated: frontmatter.last_updated ? String(frontmatter.last_updated) : null,


Serialize last_updated deterministically

Avoid converting last_updated with String(...): gray-matter parses unquoted YAML dates (e.g. 2026-03-04) as Date objects, and String(date) emits a timezone-dependent value. That makes ai-docs.json nondeterministic across environments and can even shift the calendar day (e.g. a UTC date appears as the previous day in US timezones), which breaks stable indexing and date-based consumers.
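A minimal sketch of a deterministic conversion, assuming `last_updated` arrives either as a `Date` (unquoted YAML date) or as a string (quoted date). It mirrors the `toISOString().split('T')[0]` approach taken in the follow-up commit; `formatLastUpdated` is a hypothetical name.

```javascript
// Hypothetical helper: normalize a parsed `last_updated` value to YYYY-MM-DD.
function formatLastUpdated(value) {
  if (value === undefined || value === null || value === '') return null;
  if (value instanceof Date) {
    // Unquoted YAML dates are parsed as midnight UTC, so the UTC calendar
    // day from toISOString() is the same in every timezone.
    return Number.isNaN(value.getTime()) ? null : value.toISOString().slice(0, 10);
  }
  // Quoted dates (and any other scalar) are converted as-is.
  return String(value);
}
```

The invalid-date guard matters because `toISOString()` throws a `RangeError` on an invalid `Date`, which would otherwise abort the whole build.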


Brad Paugh (braddialpad)

Looks really good; perhaps my only concern is the large amount of confusing regex used to parse the markdown. However, it is necessary for this change and also easier to understand in the age of AI.

Couple of small comments, nothing major.

Brad Paugh (braddialpad) · 2026-04-07T21:27:24Z

packages/dialtone-docs/tests/tests/build-output.test.js

+    for (const doc of docs) {
+      expect(doc.type, `"${doc.id}" type is null`).not.toBeNull();
+      expect(ALLOWED_TYPES, `"${doc.id}" invalid type "${doc.type}"`).toContain(doc.type);
+    }


For all tests where we are looping through arrays like this, we should be using test.each instead of a for loop.

A single failure will mask all subsequent ones when doing it the current way.

Agreed, but one thing worth noting: docs is built in beforeAll, so test.each can't access it at test definition time. I could build it synchronously at module scope, or keep the loop but use soft assertions. Any preference, or a better idea?
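One way to keep the loop without masking failures is to collect every violation and assert once on the whole list. A minimal sketch, with `collectFailures` and `checkType` as hypothetical helpers and placeholder `ALLOWED_TYPES` values:

```javascript
// Placeholder values; the real ALLOWED_TYPES list lives in build-output.test.js.
const ALLOWED_TYPES = ['component', 'guide', 'token'];

// Hypothetical helper: run `check` on every doc and keep every failure message,
// instead of stopping at the first failed expect().
function collectFailures(docs, check) {
  const failures = [];
  for (const doc of docs) {
    const problem = check(doc); // null/undefined means the doc passed
    if (problem) failures.push(`"${doc.id}": ${problem}`);
  }
  return failures;
}

const checkType = (doc) =>
  ALLOWED_TYPES.includes(doc.type) ? null : `invalid type "${doc.type}"`;
```

Inside the existing test, `expect(collectFailures(docs, checkType)).toEqual([])` would report every offending id in a single diff. If the suite runs on Vitest (the `expect(value, message)` form suggests it does), `expect.soft` is another option that keeps a test running after a failed assertion. Building `docs` synchronously at module scope is the option that would let `test.each(docs)` see the array at definition time.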

Brad Paugh (braddialpad) · 2026-04-07T21:29:36Z

packages/dialtone-docs/src/utils/strip-markdown.mjs

+  blockquote: /^>\s?.*/gm,
+  horizontalRule: /^(?:[-*_]){3,}\s*$/gm,
+  emphasis: /[*_]{1,2}([^*_]+)[*_]{1,2}/g,
+};


This PATTERNS object is a duplicate of the one in markdownParser.js. Is that intentional? If both are in use, they could drift out of sync in future changes.

Nope, nice spot! I'm actually now thinking of replacing this with the remove-markdown package instead of doing it manually.

Brad Paugh (braddialpad) · 2026-04-07T21:32:41Z

packages/dialtone-docs/tests/tests/build-output.test.js

+
+  test('type field uses allowed values', () => {
+    for (const doc of docs) {
+      expect(doc.type, `"${doc.id}" type is null`).not.toBeNull();


nit (optional): this first assertion is probably not necessary


feat(dialtone-docs): DLT-3109 add markdown-to-JSON generator for AI docs

208c3b3

belumontoya requested review from Brad Paugh (braddialpad), Francis Rupert (francisrupert), Ignacio Ropolo (iropolo) and Nina Repetto (ninarepetto) as code owners April 7, 2026 12:49

chatgpt-codex-connector bot reviewed Apr 7, 2026


Brad Paugh (braddialpad) reviewed Apr 7, 2026


fix(dialtone-docs): DLT-3109 address PR review feedback

df8b13d

- Fix non-deterministic date serialization: String(Date) is timezone-dependent, so use toISOString().split('T')[0] for stable YYYY-MM-DD output
- Remove redundant null assertion in type field test

belumontoya wants to merge 2 commits into staging from feature/DLT-3109-ai-docs-generator

belumontoya commented Apr 7, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Apr 7, 2026

Uh oh!

Brad Paugh (braddialpad) left a comment

Uh oh!

Brad Paugh (braddialpad) Apr 7, 2026

Uh oh!

belumontoya Apr 8, 2026

Uh oh!

Brad Paugh (braddialpad) Apr 7, 2026

Uh oh!

belumontoya Apr 8, 2026

Uh oh!

Brad Paugh (braddialpad) Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

belumontoya commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

feat(dialtone-docs): DLT-3109 add markdown-to-JSON generator for AI docs

Obligatory GIF (super important!)

🛠️ Type Of Change

📖 Jira Ticket

📖 Description

💡 Context

📝 Checklist

🔮 Next Steps

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Brad Paugh (braddialpad) left a comment

Choose a reason for hiding this comment

Uh oh!

Brad Paugh (braddialpad) Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

belumontoya Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Brad Paugh (braddialpad) Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

belumontoya Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Brad Paugh (braddialpad) Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

belumontoya commented Apr 7, 2026 •

edited

Loading