Commit b239fd1

Merge pull request #24170 from dvdksn/freshness-scanner
ci: freshness agent + nightly repo scan
2 parents f33d670 + cfa99e6 commit b239fd1

2 files changed: 188 additions, 0 deletions

.github/agents/docs-scanner.yaml

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/docker/cagent/refs/heads/main/cagent-schema.json
+models:
+  claude-sonnet:
+    provider: anthropic
+    model: claude-sonnet-4-5
+    max_tokens: 8192
+    temperature: 0.3
+
+agents:
+  root:
+    model: claude-sonnet
+    description: Daily documentation freshness scanner for Docker docs
+    add_prompt_files:
+      - STYLE.md
+    instruction: |
+      You are an experienced technical writer reviewing Docker documentation
+      (https://docs.docker.com/) for freshness issues. The docs are maintained
+      in this repository under content/. Your job is to read a subsection of
+      the docs, identify genuine quality problems, and file GitHub issues for
+      the ones worth fixing.
+
+      ## Setup
+
+      1. Call `get_memories` to find which subsection to scan next
+      2. Discover the structure: `list_directory content/`, then drill down
+         through `content/manuals/` to find a leaf subsection not recently
+         scanned. Skip: content/reference/, content/languages/, content/tags/,
+         content/includes/
+      3. Call `directory_tree` on that subsection and read all its files
+      4. File issues for what you find (max 3 per run)
+      5. Call `add_memory` with `scanned: <subsection> YYYY-MM-DD`
+
+      ## What good issues look like
+
+      You're looking for things a reader would actually notice as wrong or
+      confusing. Good issues are specific, verifiable, and actionable. The
+      kinds of things worth filing:
+
+      - **Stale framing**: content that describes a completed migration,
+        rollout, or transition as if it's still in progress ("is transitioning
+        to", "will replace", "ongoing integration")
+      - **Time-relative language**: "currently", "recently", "coming soon",
+        "new in X.Y" — STYLE.md prohibits these because they go stale silently
+      - **Cross-reference drift**: an internal link whose surrounding context
+        no longer matches what the linked page actually covers; a linked
+        heading that no longer exists
+      - **Sibling contradictions**: two pages in the same directory that give
+        conflicting information about the same feature or procedure
+      - **Missing deprecation notices**: a page describing a feature you know
+        is deprecated or removed, with no notice pointing users elsewhere
+
+      ## What not to file
+
+      - Broken links (htmltest catches these)
+      - Style and formatting issues (Vale and markdownlint catch these)
+      - Anything that is internally consistent — if the front matter, badges,
+        and prose all agree, the page is accurate even if it mentions beta
+        status or platform limitations
+      - Suspicions you can't support with text from the file
+
+      ## Filing issues
+
+      Check for duplicates first:
+      ```bash
+      FILE_PATH="path/to/file.md"
+      gh issue list --label "agent/generated" --state open --search "in:body \"$FILE_PATH\""
+      ```
+
+      Then create:
+      ```bash
+      ISSUE_TITLE="[docs-scanner] Brief description"
+      cat << 'EOF' | gh issue create \
+        --title "$ISSUE_TITLE" \
+        --label "agent/generated" \
+        --body-file -
+      **File:** `path/to/file.md`
+
+      ### Issue
+
+      What's wrong, with an exact quote from the file:
+
+      > quoted text
+
+      ### Suggested fix
+
+      What should change.
+
+      ---
+      *Found by nightly documentation freshness scanner*
+      EOF
+      ```
+
+      ## Output
+
+      ```
+      SCAN COMPLETE
+      Subsection: content/manuals/desktop/features/
+      Files checked: N
+      Issues created: N
+      - #123: [docs-scanner] Issue title
+      ```
+
+    toolsets:
+      - type: filesystem
+        tools:
+          - read_file
+          - read_multiple_files
+          - list_directory
+          - directory_tree
+      - type: memory
+        path: .cache/scanner-memory.db
+      - type: shell
+
+permissions:
+  allow:
+    - shell:cmd=gh issue list --*
+    - shell:cmd=gh issue create --*
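
The Setup steps in the agent instruction rely on memory entries of the form `scanned: <subsection> YYYY-MM-DD` to decide what to scan next. The agent makes that choice itself from its prompt; the sketch below only illustrates one way the "not recently scanned" selection could work. The helper name `oldest_scanned`, the sample paths, and the dates are hypothetical, and the sketch assumes subsection paths contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper, not part of the commit.
# Reads memory entries on stdin in the form
#   scanned: <subsection> YYYY-MM-DD
# and prints the subsection whose latest scan date is the oldest.
# ISO dates compare correctly as plain strings.
oldest_scanned() {
  awk '$1 == "scanned:" && NF == 3 {
         # Keep only the latest date seen for each subsection.
         if (!($2 in last) || $3 > last[$2]) last[$2] = $3
       }
       END { for (s in last) print last[s], s }' |
    sort | head -n 1 | cut -d " " -f 2
}

# Sample entries (invented for illustration).
entries='scanned: content/manuals/build/ 2025-01-10
scanned: content/manuals/compose/ 2025-01-08
scanned: content/manuals/build/ 2025-01-03'

printf '%s\n' "$entries" | oldest_scanned
```

A subsection with no entry at all has never been scanned and would come first in a real selection. This sketch only ranks subsections that already appear in memory.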
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+name: Nightly Documentation Scan
+
+on:
+  schedule:
+    # Run every day at 3am UTC
+    - cron: "0 3 * * *"
+  workflow_dispatch:
+    inputs:
+      dry-run:
+        description: "Report issues but do not create them"
+        type: boolean
+        default: false
+
+permissions:
+  contents: read
+  issues: write
+
+concurrency:
+  group: nightly-docs-scan
+  cancel-in-progress: false
+
+jobs:
+  scan:
+    runs-on: ubuntu-latest
+    timeout-minutes: 20
+    env:
+      HAS_APP_SECRETS: ${{ secrets.CAGENT_REVIEWER_APP_ID != '' }}
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v5
+        with:
+          fetch-depth: 1
+
+      - name: Ensure cache directory exists
+        run: mkdir -p "${{ github.workspace }}/.cache"
+
+      - name: Restore scanner memory
+        uses: actions/cache/restore@1bd1e32a3bdc45362d1e726936510720a7c30a57 # v4.2.0
+        with:
+          path: ${{ github.workspace }}/.cache/scanner-memory.db
+          key: docs-scanner-memory-${{ github.repository }}-${{ github.run_id }}
+          restore-keys: |
+            docs-scanner-memory-${{ github.repository }}-
+
+      - name: Generate GitHub App token
+        if: env.HAS_APP_SECRETS == 'true'
+        id: app-token
+        continue-on-error: true
+        uses: tibdex/github-app-token@3beb63f4bd073e61482598c45c71c1019b59b73a # v2
+        with:
+          app_id: ${{ secrets.CAGENT_REVIEWER_APP_ID }}
+          private_key: ${{ secrets.CAGENT_REVIEWER_APP_PRIVATE_KEY }}
+
+      - name: Run documentation scan
+        uses: docker/cagent-action@latest
+        env:
+          GH_TOKEN: ${{ steps.app-token.outputs.token || github.token }}
+        with:
+          agent: ${{ github.workspace }}/.github/agents/docs-scanner.yaml
+          prompt: "${{ inputs['dry-run'] && 'DRY RUN MODE: Do not create any GitHub issues. Report what you would create but skip the gh issue create commands.' || 'Run the nightly documentation scan as described in your instructions.' }}"
+          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
+          github-token: ${{ steps.app-token.outputs.token || github.token }}
+          timeout: 1200
+
+      - name: Save scanner memory
+        uses: actions/cache/save@1bd1e32a3bdc45362d1e726936510720a7c30a57 # v4.2.0
+        if: always()
+        with:
+          path: ${{ github.workspace }}/.cache/scanner-memory.db
+          key: docs-scanner-memory-${{ github.repository }}-${{ github.run_id }}
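
The restore and save steps use a key ending in `github.run_id`, which is unique per run, together with a `restore-keys` prefix. The exact key therefore never matches on restore, and the prefix falls back to the most recently created cache, while each save writes a fresh entry. The sketch below simulates that lookup locally as a rough model. The function name `restore_key`, the repository name `example/repo`, and the run IDs are hypothetical, and the real lookup happens inside `actions/cache`.

```shell
#!/bin/sh
# Rough local model of the cache lookup, not the actions/cache implementation.
# stdin: existing cache keys, one per line, oldest first.
# $1:    exact key requested (unique per run, so normally absent)
# $2:    restore-keys prefix
# Prints the key that would be restored, or nothing on a miss.
restore_key() {
  awk -v exact="$1" -v prefix="$2" '
    # An exact match always wins.
    $0 == exact { hit = $0; found_exact = 1 }
    # Otherwise remember the newest key that starts with the prefix.
    !found_exact && index($0, prefix) == 1 { hit = $0 }
    END { if (hit != "") print hit }'
}

# Sample keys (invented for illustration).
keys='docs-scanner-memory-example/repo-100
docs-scanner-memory-example/repo-101
other-cache-5'

printf '%s\n' "$keys" |
  restore_key 'docs-scanner-memory-example/repo-102' 'docs-scanner-memory-example/repo-'
```

The design choice this models: cache entries are immutable once written, so a fixed key could not be overwritten with updated scanner memory. A per-run key plus prefix restore gives the effect of a rolling, most-recent-wins store.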

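The workflow uses two expression idioms: `inputs['dry-run'] && '<A>' || '<B>'` acts as a conditional, and `steps.app-token.outputs.token || github.token` falls back to the default token when the App token step was skipped or failed. The shell sketch below mirrors both for illustration. The function names `select_prompt` and `select_token` and the token values are hypothetical.

```shell
#!/bin/sh
# Mirrors: inputs['dry-run'] && '<dry-run prompt>' || '<normal prompt>'
# $1 is "true" for a manual dry run; it is empty on scheduled runs,
# where the dry-run input is not set.
select_prompt() {
  if [ "$1" = "true" ]; then
    printf '%s\n' "DRY RUN MODE: Do not create any GitHub issues. Report what you would create but skip the gh issue create commands."
  else
    printf '%s\n' "Run the nightly documentation scan as described in your instructions."
  fi
}

# Mirrors: steps.app-token.outputs.token || github.token
# Uses the first argument unless it is empty, else the second.
select_token() {
  printf '%s\n' "${1:-$2}"
}

select_prompt ""                 # scheduled run
select_token "" "default-token"  # App token step skipped
```

The `&& ... ||` idiom behaves like a conditional only because the middle operand, a non-empty string, is always truthy. An empty string there would fall through to the right-hand side.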