
feat: initialize wiki-referencify prototype CLI tool #2

Open
konard wants to merge 6 commits into main from issue-1-65892e9ff94d

Conversation

@konard (Member) commented Mar 9, 2026

Summary

Implements a prototype of wiki-referencify, a globally installable npm CLI tool that links most concepts and terms in a Markdown document to their Wikipedia articles.

Fixes #1

What was implemented

Core algorithm

  • Longest-match first: Prefers finding the longest sequences of words that match a single Wikipedia article (e.g., prefers "Machine learning" over "Machine" + "learning" separately)
  • Greedy approach: Processes tokens left-to-right, taking the longest matching phrase at each position
  • Disambiguation page support: If a matched term has a disambiguation page (e.g., Atlas_(disambiguation)), links to the disambiguation page first — per spec
  • Auto-disambiguation mode: With --auto-disambiguation flag, collects contexts from exactly matched articles and skips disambiguation pages for a cleaner output
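
The longest-match and greedy bullets above can be sketched as follows. This is a minimal illustration, not the code in referencify.js: `greedyLongestMatch`, the `isArticle` callback (a stand-in for the Wikipedia lookup), and the default of 4 words are all assumptions made for the example.

```javascript
// Hypothetical sketch: scan tokens left to right and, at each position,
// take the longest phrase that `isArticle` accepts.
function greedyLongestMatch(tokens, isArticle, maxPhraseLength = 4) {
  const matches = [];
  let i = 0;
  while (i < tokens.length) {
    let matched = false;
    const maxLen = Math.min(maxPhraseLength, tokens.length - i);
    // Try the longest candidate first, then shrink one word at a time.
    for (let len = maxLen; len >= 1; len--) {
      const phrase = tokens.slice(i, i + len).join(' ');
      if (isArticle(phrase)) {
        matches.push({ start: i, length: len, phrase });
        i += len; // consume the whole phrase so matches never overlap
        matched = true;
        break;
      }
    }
    if (!matched) i += 1; // no article starts here; move on
  }
  return matches;
}

// Stand-in for the Wikipedia check: a fixed set of known titles.
const known = new Set([
  'machine learning',
  'machine',
  'learning',
  'artificial intelligence',
]);
const tokens =
  'Machine learning is a branch of artificial intelligence'.split(' ');
const matches = greedyLongestMatch(tokens, (p) => known.has(p.toLowerCase()));
console.log(matches.map((m) => m.phrase));
// → [ 'Machine learning', 'artificial intelligence' ]
```

Even though "machine" and "learning" are both known titles here, the two-word phrase wins because it is tried first at position 0.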

Wikipedia API integration

  • Uses the real Wikipedia API (https://en.wikipedia.org/w/api.php) to check if phrases exist as articles
  • File-based caching in system temp dir to avoid repeated API calls
  • Batches up to 50 titles per API request
  • Handles normalization and redirects correctly
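
A rough sketch of the batching and title resolution described above, assuming the standard MediaWiki `action=query` parameters (`titles`, `redirects`, `format`). The helper names `chunk`, `buildQueryUrls`, and `resolveTitles` are hypothetical, and the sample response is hand-written to mirror the documented shape rather than captured from a real request. Caching and the actual HTTP call are left out.

```javascript
// Endpoint quoted in the PR description.
const API_URL = 'https://en.wikipedia.org/w/api.php';

// Split a list into consecutive batches of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Build one query URL per batch of up to 50 titles, joined with "|".
function buildQueryUrls(titles, batchSize = 50) {
  return chunk(titles, batchSize).map((batch) => {
    const params = new URLSearchParams({
      action: 'query',
      format: 'json',
      redirects: '1',
      titles: batch.join('|'),
    });
    return `${API_URL}?${params.toString()}`;
  });
}

// Map each requested title to its final article title, or null if missing,
// by following the `normalized` and `redirects` lists in the response.
function resolveTitles(requested, query) {
  const normalized = new Map((query.normalized ?? []).map((n) => [n.from, n.to]));
  const redirects = new Map((query.redirects ?? []).map((r) => [r.from, r.to]));
  const existing = new Set(
    Object.values(query.pages ?? {})
      .filter((page) => !('missing' in page))
      .map((page) => page.title)
  );
  const result = new Map();
  for (const title of requested) {
    const afterNormalize = normalized.get(title) ?? title;
    const finalTitle = redirects.get(afterNormalize) ?? afterNormalize;
    result.set(title, existing.has(finalTitle) ? finalTitle : null);
  }
  return result;
}

// Hand-written sample of the `query` object (page ids are placeholders).
const sampleQuery = {
  normalized: [{ from: 'machine learning', to: 'Machine learning' }],
  redirects: [{ from: 'Old title', to: 'New title' }],
  pages: {
    1: { pageid: 1, ns: 0, title: 'Machine learning' },
    2: { pageid: 2, ns: 0, title: 'New title' },
    '-1': { ns: 0, title: 'Missing page', missing: '' },
  },
};
const resolved = resolveTitles(
  ['machine learning', 'Old title', 'Missing page'],
  sampleQuery
);
console.log(resolved);
```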

Markdown-aware processing

  • Skips headers (ATX # style and setext ===/--- style)
  • Skips existing links (doesn't re-link already linked text)
  • Skips code blocks (fenced ```, inline `, and indented)
  • Skips table headers (first row of markdown tables)
  • Respects formatting markers (bold **, italic * — prevents phrases from spanning across formatting boundaries)
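
One way to picture the skip rules above is as character ranges that the matcher must not touch. The sketch below is an assumption about the approach, not the contents of markdown-processor.js: `findSkipRanges` and `isSkipped` are hypothetical names, and only fenced code, ATX headers, inline code, and existing links are covered. Setext headers, indented code, table headers, and formatting boundaries are omitted to keep it short.

```javascript
// Hypothetical sketch: collect [start, end) character ranges to leave alone.
function findSkipRanges(markdown) {
  const patterns = [
    /^```[\s\S]*?^```[^\n]*$/gm, // fenced code blocks
    /^#{1,6} .*$/gm, // ATX headers
    /`[^`\n]+`/g, // inline code
    /\[[^\]]*\]\([^)]*\)/g, // existing links
  ];
  const ranges = [];
  for (const pattern of patterns) {
    for (const match of markdown.matchAll(pattern)) {
      ranges.push([match.index, match.index + match[0].length]);
    }
  }
  return ranges.sort((a, b) => a[0] - b[0]);
}

// True when `offset` falls inside any skip range.
function isSkipped(ranges, offset) {
  return ranges.some(([start, end]) => offset >= start && offset < end);
}

const doc = [
  '# Python basics',
  '',
  'Python has `print` and [docs](https://example.org).',
  '',
  '```',
  'Python code',
  '```',
].join('\n');

const ranges = findSkipRanges(doc);
console.log(isSkipped(ranges, doc.indexOf('Python basics'))); // header → true
console.log(isSkipped(ranges, doc.indexOf('Python has'))); // body text → false
console.log(isSkipped(ranges, doc.indexOf('Python code'))); // fenced code → true
```

Ranges from different patterns may overlap (inline code inside a fence, for example); that is harmless for a membership check like `isSkipped`.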

Filtering

  • Stopword list: Common English function words, generic adjectives, and overly broad nouns are excluded from single-word matches
  • Multi-word phrases bypass the stopword filter (they're more likely to be actual concepts)
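
The filtering rule can be sketched in a few lines. The `STOPWORDS` set below is a tiny illustrative subset, not the list shipped in the PR, and `isCandidate` is a hypothetical name.

```javascript
// Illustrative subset only; the real stopword list is larger.
const STOPWORDS = new Set(['the', 'is', 'a', 'of', 'for', 'popular']);

// Single words are checked against the stopword list;
// multi-word phrases bypass the filter entirely.
function isCandidate(phrase) {
  const words = phrase.trim().split(/\s+/);
  if (words.length > 1) return true;
  return !STOPWORDS.has(words[0].toLowerCase());
}

console.log(isCandidate('is')); // false
console.log(isCandidate('Python')); // true
console.log(isCandidate('programming language')); // true
```

A phrase that passes this filter still has to match a Wikipedia article before it is linked; the filter only decides which candidates are worth looking up.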

CLI tool

  • Globally installable via npm install -g wiki-referencify
  • Binary: wiki-referencify [options] [file]
  • Reads from stdin if no file is provided
  • Uses lino-arguments for option parsing with environment variable support

Usage: wiki-referencify [options] [file]

Options:
  --auto-disambiguation  Auto-resolve disambiguation using context  [boolean]
  --max-phrase-length    Max words to consider as a single phrase  [number]
  -v, --verbose          Enable verbose logging to stderr           [boolean]
  -h, --help             Show help                                  [boolean]

Example

Input:

# Introduction to Machine Learning

Machine learning is a branch of artificial intelligence. Python is
a popular programming language for machine learning.

Output:

# Introduction to Machine Learning

[Machine learning](https://en.wikipedia.org/wiki/Machine_learning) is a
[branch](https://en.wikipedia.org/wiki/branch_(disambiguation)) of
[artificial intelligence](https://en.wikipedia.org/wiki/artificial_intelligence_(disambiguation)).
[Python](https://en.wikipedia.org/wiki/Python) is a popular
[programming language](https://en.wikipedia.org/wiki/Programming_language) for
[machine learning](https://en.wikipedia.org/wiki/Machine_learning).
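
The links in the output above follow the usual Wikipedia URL shape: spaces in the title become underscores, and the original text is kept as the link label. A minimal sketch, assuming hypothetical helpers `toWikiUrl` and `toMarkdownLink` (the PR's own helper names are not shown on this page):

```javascript
// Turn an article title into an English Wikipedia URL.
// encodeURIComponent leaves parentheses intact, so titles such as
// "Atlas (disambiguation)" keep their readable form.
function toWikiUrl(title) {
  return `https://en.wikipedia.org/wiki/${encodeURIComponent(
    title.replace(/ /g, '_')
  )}`;
}

// Wrap the text as it appears in the document in a Markdown link.
function toMarkdownLink(text, title) {
  return `[${text}](${toWikiUrl(title)})`;
}

console.log(toMarkdownLink('machine learning', 'Machine learning'));
// → [machine learning](https://en.wikipedia.org/wiki/Machine_learning)
console.log(toWikiUrl('Atlas (disambiguation)'));
// → https://en.wikipedia.org/wiki/Atlas_(disambiguation)
```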

Architecture

src/
  cli.js              # CLI entry point using lino-arguments
  index.js            # Module exports
  index.d.ts          # TypeScript type definitions
  referencify.js      # Core algorithm orchestration
  markdown-processor.js  # Markdown parsing, skip regions, n-gram generation
  wikipedia-api.js    # Wikipedia API client with caching

Testing

  • 26 unit tests covering all core modules
  • Tests work on Node.js, Bun, and Deno (via test-anywhere framework)
  • All CI checks pass: ESLint, Prettier, jscpd duplication check

Test plan

  • Run npm test — all 26 tests pass
  • Run npm run check — lint, format, duplication all pass
  • Test CLI with stdin: printf '# Test\n\nPython is a programming language.\n' | wiki-referencify
  • Test CLI with file argument: wiki-referencify examples/basic-usage.js
  • Verify headers are not linked
  • Verify existing links are not re-linked
  • Verify longer phrases are preferred over single words
  • Verify disambiguation URLs are used when disambiguation pages exist

🤖 Generated with Claude Code

Adding .gitkeep for PR creation (default mode).
This file will be removed when the task is complete.

Issue: #1
@konard self-assigned this on Mar 9, 2026
Implements the core functionality of the globally installable npm CLI tool
`wiki-referencify` that covers concepts/terms in markdown documents with
Wikipedia links.

Key features:
- Wikipedia API integration with file-based caching
- Longest-match algorithm (prefers longer phrases over single words)
- Disambiguation page support (links to disambiguation pages per spec)
- --auto-disambiguation flag to auto-resolve ambiguous terms
- Markdown-aware processing (skips headers, existing links, code blocks)
- Stopword filtering to avoid linking trivial words
- stdin/file input modes
- lino-arguments for CLI option parsing with env var support

Fixes #1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@konard changed the title from "[WIP] Initialize repository with prototype of the globally installable npm CLI tool named wiki-referencify" to "feat: initialize wiki-referencify prototype CLI tool" on Mar 9, 2026
konard and others added 3 commits March 9, 2026 17:58
All scripts had the template placeholder 'my-package' which caused the
changeset validation CI step to fail. Updated to 'wiki-referencify' to
match the actual package name in package.json.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deno requires 'node:' prefix for Node.js built-in modules (crypto, fs,
path, os). Without this prefix, Deno's type checker fails with TS2307
errors. This fixes the Deno test failures in CI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deno requires --allow-env to access environment variables like TMPDIR.
Using tmpdir() at module load time (top-level const) caused Deno tests
to fail with NotCapable errors even without caching being exercised.

Changed to a lazy getCacheDir() function called only when actually
caching API results, so importing the module doesn't require env access.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
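
The lazy lookup described in the commit above can be illustrated with a small sketch. To keep it runnable without environment or filesystem access, the temp-dir resolver is injected; in the real module it would presumably be `tmpdir` from `node:os`. `createCacheDirGetter` and the `wiki-referencify-cache` subdirectory name are assumptions made for the example.

```javascript
// Hypothetical sketch: defer the temp-dir lookup until first use,
// so merely importing the module never touches the environment.
function createCacheDirGetter(resolveTmpDir, subdir = 'wiki-referencify-cache') {
  let cached = null;
  return function getCacheDir() {
    if (cached === null) {
      // First call: resolve once and remember the result.
      cached = `${resolveTmpDir()}/${subdir}`;
    }
    return cached;
  };
}

// Count how often the (fake) resolver runs.
let resolverCalls = 0;
const getCacheDir = createCacheDirGetter(() => {
  resolverCalls += 1;
  return '/tmp';
});

console.log(resolverCalls); // 0: nothing resolved at "module load"
getCacheDir();
getCacheDir();
console.log(resolverCalls); // 1: resolved once, on first use
console.log(getCacheDir()); // /tmp/wiki-referencify-cache
```

By contrast, a top-level `const CACHE_DIR = tmpdir() + ...` runs the lookup at import time, which is what triggered the permission errors under Deno.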
@konard marked this pull request as ready for review on March 9, 2026 18:06
@konard (Member, Author) commented Mar 9, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $8.423931
  • Calculated by Anthropic: $6.704218 USD
  • Difference: $-1.719712 (-20.41%)
    📎 Log file uploaded as Gist (5249KB)
    🔗 View complete solution draft log

The working session has now ended. Feel free to review and add any feedback on the solution draft.

@konard (Member, Author) commented Mar 9, 2026

🔄 Auto-restart 1/3

Detected uncommitted changes from the previous run. Starting a new session to review and commit them.

Uncommitted files:

?? issue-details.txt
?? lint-fix-results.txt
?? lint-results.txt
?? lint-results2.txt
?? test-results.txt

Auto-restart will stop after changes are committed or after 2 more iterations. Please wait until the working session ends before giving feedback.

@konard (Member, Author) commented Mar 9, 2026

🔄 Auto-restart 1/3 Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $0.324616
  • Calculated by Anthropic: $0.246044 USD
  • Difference: $-0.078572 (-24.20%)
    📎 Log file uploaded as Gist (5825KB)
    🔗 View complete solution draft log

The working session has now ended. Feel free to review and add any feedback on the solution draft.

@konard (Member, Author) commented Mar 9, 2026

✅ Ready to merge

This pull request is now ready to be merged:

  • All CI checks have passed
  • No merge conflicts
  • No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag


Development

Successfully merging this pull request may close these issues.

Initialize repository with prototype of the globally installable npm CLI tool named wiki-referencify
