feat: add Rust MCP server + fix KaTeX rendering issues by userFRM · Pull Request #2 · tonydavis629/markxiv

userFRM · 2026-02-05T12:03:08Z

Summary

Rust MCP server that uses the markxiv library directly — no dependency on markxiv.org
KaTeX rendering fixes for common LaTeX commands that break in browser math rendering
Figure preservation with ar5iv links: addresses the feature request "add links to figures" (#1)

Figure Links (closes #1)

Previously, <figure> blocks from pandoc output were stripped entirely. Now:

extract_figure_captions() converts <figure> blocks into numbered markdown blockquotes preserving caption text:
```
> **Figure 1:** Architecture of the proposed model
```
add_ar5iv_figure_links() enriches each figure reference with a link to the paper's ar5iv HTML page where figures are viewable:
```
> **Figure 1:** Architecture of the proposed model — [view on ar5iv](https://ar5iv.labs.arxiv.org/html/2107.02789)
```

Since ar5iv renames figure files to x1.png, x2.png etc. (unpredictable from LaTeX source), we link to the full ar5iv page rather than individual images.
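The two figure steps above can be sketched in one pass. This is a minimal, std-only illustration, not the PR's actual code: the function names `figures_to_blockquotes` and `caption_text` are hypothetical, and it assumes pandoc emits flat (non-nested) `<figure>` blocks with a plain `<figcaption>` element.

```rust
/// Replace each `<figure>...</figure>` block with a numbered blockquote built
/// from its caption, and append a link to the paper's ar5iv page.
/// Hypothetical helper: assumes flat, non-nested <figure> blocks.
fn figures_to_blockquotes(markdown: &str, arxiv_id: &str) -> String {
    const OPEN: &str = "<figure";
    const CLOSE: &str = "</figure>";
    let mut out = String::with_capacity(markdown.len());
    let mut rest = markdown;
    let mut n = 0;

    while let Some(start) = rest.find(OPEN) {
        // An unterminated <figure> is left untouched.
        let len = match rest[start..].find(CLOSE) {
            Some(len) => len,
            None => break,
        };
        let caption = caption_text(&rest[start..start + len]);
        n += 1;
        out.push_str(&rest[..start]);
        // \u{2014} is the dash used in the example output above.
        out.push_str(&format!(
            "> **Figure {n}:** {caption} \u{2014} [view on ar5iv](https://ar5iv.labs.arxiv.org/html/{arxiv_id})"
        ));
        rest = &rest[start + len + CLOSE.len()..];
    }
    out.push_str(rest);
    out
}

/// Text between <figcaption> and </figcaption>, or "" if there is none.
fn caption_text(block: &str) -> &str {
    const OPEN: &str = "<figcaption>";
    const CLOSE: &str = "</figcaption>";
    let body = match block.find(OPEN) {
        Some(start) => &block[start + OPEN.len()..],
        None => return "",
    };
    match body.find(CLOSE) {
        Some(end) => body[..end].trim(),
        None => "",
    }
}
```

Linking to the whole page keeps the sketch independent of ar5iv's image file names, which matches the reasoning above.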

KaTeX Fixes

Added regex-based post-processing in sanitize_markdown():

| Issue | Before | After |
| --- | --- | --- |
| Missing subscript | `\mathcal{X}{Y}` | `\mathcal{X}_{Y}` |
| Unsupported command | `\textsc{Algo}` | `\textbf{Algo}` |
| Algorithm pseudo-code | `\Call{Solve}{x}` | `\textbf{Solve}(x)` |
| Unsupported font | `\mathbbm{1}` | `\mathbb{1}` |
| Angle brackets in math | `$a < b$` → HTML stripped | Math blocks preserved verbatim |
| Display math inline | `text $$x^2$$ text` | `$$x^2$$` on own line |
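
The first four rows can be illustrated with a short sketch. The PR does these rewrites with the regex crate; the version below swaps in plain std string scanning so it stays self-contained, and the names `fix_katex_commands_sketch`, `rewrite_command`, and `take_arg` are hypothetical. It assumes command arguments contain no nested braces.

```rust
/// Std-only approximation of the command fixes in the table above.
/// Assumes arguments contain no nested braces.
fn fix_katex_commands_sketch(input: &str) -> String {
    // Plain renames: the replacement command takes the same single argument.
    let renamed = input
        .replace("\\textsc{", "\\textbf{")
        .replace("\\mathbbm{", "\\mathbb{");
    // \mathcal{X}{Y} -> \mathcal{X}_{Y}
    let with_subscripts = rewrite_command(&renamed, "\\mathcal{", |first, second| {
        format!("\\mathcal{{{first}}}_{{{second}}}")
    });
    // \Call{Name}{args} -> \textbf{Name}(args)
    rewrite_command(&with_subscripts, "\\Call{", |name, args| {
        format!("\\textbf{{{name}}}({args})")
    })
}

/// Rewrite every `<prefix>A}{B}` as `render(A, B)`. Occurrences that are not
/// followed by a second braced argument are copied through unchanged.
fn rewrite_command(input: &str, prefix: &str, render: impl Fn(&str, &str) -> String) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(pos) = rest.find(prefix) {
        out.push_str(&rest[..pos]);
        let after_prefix = &rest[pos + prefix.len()..];
        let two_args = take_arg(after_prefix).and_then(|(first, tail)| {
            let (second, tail) = take_arg(tail.strip_prefix('{')?)?;
            Some((first, second, tail))
        });
        match two_args {
            Some((first, second, tail)) => {
                out.push_str(&render(first, second));
                rest = tail;
            }
            None => {
                out.push_str(prefix);
                rest = after_prefix;
            }
        }
    }
    out.push_str(rest);
    out
}

/// Split text that starts just after `{` into (argument, text after `}`).
fn take_arg(s: &str) -> Option<(&str, &str)> {
    let end = s.find('}')?;
    Some((&s[..end], &s[end + 1..]))
}
```

Because the subscript rule only fires when `}` is immediately followed by `{`, input such as `\frac{\mathcal{X}}{Y}` or an already correct `\mathcal{X}_{Y}` passes through unchanged.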

Also added search() method to ArxivClient trait for arXiv keyword search.

MCP Server

A Rust-native MCP binary (markxiv-mcp) built with rmcp v0.14. Uses the markxiv library directly (no HTTP calls to markxiv.org).

Tools:

convert_paper — full arXiv paper → markdown via local pandoc pipeline
get_paper_metadata — title/authors/abstract lookup via arXiv API
search_papers — keyword search via arXiv Atom API
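
For orientation, a keyword search against the arXiv Atom API comes down to a GET request on the public `export.arxiv.org/api/query` endpoint. The sketch below only builds such a URL; `arxiv_search_url` and `percent_encode` are hypothetical names, and searching the `all:` field is an assumption about how search_papers builds its query, not something the PR states.

```rust
/// Build an arXiv API query URL for a keyword search (illustrative only).
/// The `all:` field prefix searches every metadata field.
fn arxiv_search_url(keywords: &str, max_results: usize) -> String {
    format!(
        "https://export.arxiv.org/api/query?search_query=all:{}&start=0&max_results={}",
        percent_encode(keywords),
        max_results
    )
}

/// Percent-encode every byte outside the RFC 3986 unreserved set.
fn percent_encode(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{b:02X}")),
        }
    }
    out
}
```

The response is an Atom feed with one entry per paper, which is what parse_atom_search_results() in this PR is described as parsing.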

Claude Desktop config:

{
  "mcpServers": {
    "markxiv": {
      "command": "/path/to/markxiv-mcp"
    }
  }
}

Requirements: pandoc + pdftotext installed locally. The MCP binary can be distributed pre-built so end users don't need a Rust toolchain.

How this differs from other arXiv MCP servers

Most existing arXiv MCP servers (at least ten) either fetch raw LaTeX, scrape HTML, or do basic PDF text extraction. markxiv-mcp instead runs pandoc locally on the actual LaTeX source, which yields higher-fidelity markdown. This is the same pipeline that powers markxiv.org.

Sanitize Pipeline

The sanitize_markdown() pipeline now has 4 stages:

extract_figure_captions — convert <figure> blocks to numbered markdown blockquotes
fix_katex_commands — regex fixes for unsupported LaTeX commands
normalize_display_math — ensure $$...$$ blocks are on their own lines
strip_html_tags_preserve_math — remove remaining HTML while preserving math verbatim

Test Plan

cargo test --lib — all 50 unit tests pass
cargo build -p markxiv-mcp — MCP binary compiles
cargo build — main library compiles
Test MCP tools manually with Claude Desktop or MCP inspector
Verify KaTeX fixes against real papers with known rendering issues

- Add regex-based post-processing in sanitize_markdown():
  - fix_katex_commands(): fixes \mathcal subscripts, \textsc→\textbf, \Call macro, \mathbbm→\mathbb
  - protect_math_angle_brackets(): replaces < and > with \langle/\rangle inside math delimiters before HTML stripping
- Add SearchResult type and search() method to ArxivClient trait
- Add parse_atom_search_results() for multi-entry Atom feed parsing
- Add regex dependency
- Add 7 unit tests for new KaTeX fix functions

Add a Rust-native MCP server (markxiv-mcp) that uses the markxiv library directly, with no dependency on markxiv.org or any web service.

Tools exposed:
- convert_paper: full arXiv paper → markdown via pandoc pipeline
- get_paper_metadata: title/authors/abstract lookup
- search_papers: keyword search via arXiv Atom API

Built with rmcp 0.14 (stdio transport). Users need pandoc + pdftotext installed locally. The MCP binary can be distributed pre-built so end users don't need a Rust toolchain.

The protect_math_angle_brackets() approach injected \langle/\rangle into all math blocks, which breaks inside text-mode commands like \texttt{<name>} where \langle is undefined. New approach: strip_html_tags_preserve_math() copies $...$ and $$...$$ blocks verbatim so angle brackets survive for KaTeX to handle natively. This fixes the ParseError on papers with \texttt{<...>} in math.
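
The copy-verbatim idea can be sketched as a single scan over the text. This is an illustrative, std-only approximation, not the PR's implementation: the names `strip_tags_preserve_math_sketch`, `math_len`, and `tag_len` are hypothetical, escaped `\$` is not handled, and a tag is assumed to be `<` followed by a letter or `/`.

```rust
/// Remove HTML tags outside math, copying `$...$` and `$$...$$` verbatim.
fn strip_tags_preserve_math_sketch(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(ch) = rest.chars().next() {
        if let Some(len) = math_len(rest) {
            // Math span: copy as-is so `<` and `>` reach KaTeX untouched.
            out.push_str(&rest[..len]);
            rest = &rest[len..];
        } else if let Some(len) = tag_len(rest) {
            // HTML tag outside math: drop it.
            rest = &rest[len..];
        } else {
            out.push(ch);
            rest = &rest[ch.len_utf8()..];
        }
    }
    out
}

/// Byte length of a complete math span at the start of `s`, if there is one.
fn math_len(s: &str) -> Option<usize> {
    let delim = if s.starts_with("$$") {
        "$$"
    } else if s.starts_with('$') {
        "$"
    } else {
        return None;
    };
    let end = s[delim.len()..].find(delim)?;
    Some(delim.len() + end + delim.len())
}

/// Byte length of a tag at the start of `s`: `<`, then a letter or `/`,
/// up to and including the next `>`.
fn tag_len(s: &str) -> Option<usize> {
    let first = s.strip_prefix('<')?.chars().next()?;
    if first.is_ascii_alphabetic() || first == '/' {
        s.find('>').map(|end| end + 1)
    } else {
        None
    }
}
```

With this approach `$\texttt{<name>}$` survives unchanged, because the math check runs before the tag check.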

Adds normalize_display_math() to the sanitize pipeline to ensure $$...$$ display math blocks are isolated on their own lines. Prevents markdown renderers from misparsing inline $$...$$ as two $ inline math delimiters, which caused "Can't use function '$' in math mode" errors.
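
A minimal sketch of that isolation step, assuming display math never nests and that a single newline on each side is enough for the renderer. The name `normalize_display_math_sketch` is hypothetical and the PR's own function may differ in detail.

```rust
/// Put every `$$...$$` block on its own line (std-only sketch).
fn normalize_display_math_sketch(input: &str) -> String {
    let mut out = String::with_capacity(input.len() + 8);
    let mut rest = input;
    while let Some(start) = rest.find("$$") {
        let body = &rest[start + 2..];
        // An unmatched `$$` is left as-is.
        let len = match body.find("$$") {
            Some(len) => len,
            None => break,
        };
        // Text before the block, without trailing spaces.
        out.push_str(rest[..start].trim_end_matches(' '));
        if !out.is_empty() && !out.ends_with('\n') {
            out.push('\n');
        }
        out.push_str("$$");
        out.push_str(&body[..len]);
        out.push_str("$$");
        // Text after the block, without leading spaces.
        rest = body[len + 2..].trim_start_matches(' ');
        if !rest.is_empty() && !rest.starts_with('\n') {
            out.push('\n');
        }
    }
    out.push_str(rest);
    out
}
```

Blocks that already sit on their own lines pass through unchanged, so the step is safe to run repeatedly.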

Instead of stripping <figure> blocks entirely, extract captions into markdown blockquotes and enrich them with ar5iv viewing links.

userFRM added 5 commits February 5, 2026 13:02

feat: preserve figure captions and link to ar5iv (closes tonydavis629#1)

57b32af


userFRM wants to merge 5 commits into tonydavis629:main from userFRM:feat/mcp-and-katex-fixes
