
feat: integrate external academic paper search via paper-search-mcp #191

Closed
tnkshuuhei wants to merge 3 commits into dev from feat/paper-search-mcp

Conversation

@tnkshuuhei
Member

Add MCP-based external paper search (PubMed, arXiv, Google Scholar) to supplement internal evidence matching. Papers are fetched for edges with insufficient internal evidence, cached for 24 hours, and displayed in the canvas UI alongside attested evidence.

@vercel

vercel bot commented Feb 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: muse
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): Feb 26, 2026 4:58am

@tnkshuuhei tnkshuuhei linked an issue Feb 19, 2026 that may be closed by this pull request
@tnkshuuhei tnkshuuhei mentioned this pull request Feb 19, 2026
@claude

claude bot commented Feb 19, 2026

PR Review: feat: integrate external academic paper search via paper-search-mcp

Overall the approach is sound — gating external search behind a feature flag, caching results, and using Promise.allSettled to handle partial failures gracefully. A few issues need attention before this is ready to merge.


Bugs

1. Operator precedence bug in normalizeRawPaper (lib/external-paper-search.ts)

// Line ~567
const id = generateExternalPaperId(source, doi || raw.paper_id ? String(raw.paper_id) : title);

Due to JS operator precedence, || binds tighter than ?:, so this parses as:

(doi || raw.paper_id) ? String(raw.paper_id) : title

When a paper has a DOI but no paper_id, the condition is truthy (because of doi), but the consequent evaluates String(undefined), which yields "undefined". Every such paper from the same source gets the same hash, causing ID collisions and breaking deduplication.

Fix:

const id = generateExternalPaperId(source, doi || (raw.paper_id ? String(raw.paper_id) : title));
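
The difference between the two parses can be checked in isolation. The following is a minimal sketch, assuming a hypothetical RawPaper shape and two illustrative helpers (buggySeed, fixedSeed) that are not part of the PR; it only reproduces the expression passed to generateExternalPaperId.

```typescript
// Hypothetical stand-in for the raw record returned by a search tool.
interface RawPaper {
  paper_id?: string | number;
}

// Buggy form: parses as (doi || raw.paper_id) ? String(raw.paper_id) : title
function buggySeed(doi: string | undefined, raw: RawPaper, title: string): string {
  return doi || raw.paper_id ? String(raw.paper_id) : title;
}

// Fixed form: the parentheses make the DOI win whenever it is present.
function fixedSeed(doi: string | undefined, raw: RawPaper, title: string): string {
  return doi || (raw.paper_id ? String(raw.paper_id) : title);
}

// A paper with a DOI but no paper_id, the case described above.
const doiOnly: RawPaper = {};
const buggyResult = buggySeed("10.1000/example", doiOnly, "Some title");
const fixedResult = fixedSeed("10.1000/example", doiOnly, "Some title");
```

With the buggy form the seed is the literal string "undefined"; with the fixed form it is the DOI itself.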

2. Missing source labels in EvidenceDialog.tsx

The source badge mapping only handles "pubmed" and "semantic_scholar". Papers from callSearchTool(..., "search_arxiv", ..., "arxiv") and "google_scholar" fall through to the raw string. Users will see "arxiv" and "google_scholar" instead of "arXiv" and "Google Scholar".

// Add to the ternary chain or switch to a lookup object:
const SOURCE_LABELS: Record<string, string> = {
  pubmed: "PubMed",
  semantic_scholar: "Semantic Scholar",
  arxiv: "arXiv",
  google_scholar: "Google Scholar",
};
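
A possible way to consume such a lookup, sketched with a Map so that unknown sources fall back to the raw string and inherited object keys are never matched. The names SOURCE_LABEL_MAP and sourceLabel are illustrative assumptions, not identifiers from the PR.

```typescript
// Display names for known sources; the keys mirror the lookup object above.
const SOURCE_LABEL_MAP = new Map<string, string>([
  ["pubmed", "PubMed"],
  ["semantic_scholar", "Semantic Scholar"],
  ["arxiv", "arXiv"],
  ["google_scholar", "Google Scholar"],
]);

// Hypothetical helper: returns the display name, or the raw source
// string when the source is not in the map.
function sourceLabel(source: string): string {
  return SOURCE_LABEL_MAP.get(source) ?? source;
}
```

In the badge renderer this would replace the ternary chain with a single call such as sourceLabel(paper.source).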

Code Quality

3. Sequential awaits in the workflow step

mastra/workflows/logic-model-with-evidence.ts iterates arrows with for...of + await sequentially. Each searchExternalPapersForEdge call hits 3 external APIs (already parallelised internally). A canvas with 10 edges that all miss the cache will do 10 sequential fan-outs.

Prefer Promise.all across arrows:

await Promise.all(
  canvasData.arrows.map(async (arrow: Arrow) => {
    // ...
    externalPapersByArrow[arrow.id] = await searchExternalPapersForEdge(...);
  })
);

4. Double feature-flag check

EXTERNAL_SEARCH_ENABLED is checked both inside searchExternalPapersForEdge and at the top of the workflow step. Pick one authoritative gate — the workflow step is the right place since it controls the execution path. The library function's guard is redundant and can be confusing.

5. cachedTools can trigger duplicate listToolsets() calls

async function getTools(): Promise<Record<string, any>> {
  if (!cachedTools) {
    const toolsets = await paperSearchClient.listToolsets();
    cachedTools = toolsets["paperSearch"] || {};
  }
  return cachedTools;
}

Two concurrent requests that both see cachedTools === null will both call listToolsets(), potentially spawning duplicate MCP processes. Cache the promise instead:

let cachedToolsPromise: Promise<Record<string, unknown>> | null = null;

function getTools() {
  if (!cachedToolsPromise) {
    cachedToolsPromise = paperSearchClient.listToolsets().then(ts => ts["paperSearch"] ?? {});
  }
  return cachedToolsPromise;
}
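
One caveat with caching the promise is that a rejected promise stays cached, so a single failed connection would fail every later call. A minimal sketch of a variant that clears the cache on failure follows; createToolsGetter and the injected loader are hypothetical, with the loader standing in for paperSearchClient.listToolsets().

```typescript
type Toolset = Record<string, unknown>;

// Hypothetical loader standing in for paperSearchClient.listToolsets().
type ToolLoader = () => Promise<Record<string, Toolset>>;

function createToolsGetter(load: ToolLoader): () => Promise<Toolset> {
  let pending: Promise<Toolset> | null = null;

  return function getTools(): Promise<Toolset> {
    if (!pending) {
      pending = load()
        .then((toolsets) => toolsets["paperSearch"] ?? {})
        .catch((err: unknown) => {
          // Do not keep a failed attempt cached; the next caller retries.
          pending = null;
          throw err;
        });
    }
    // Concurrent callers share the same in-flight promise.
    return pending;
  };
}
```

Callers that arrive while the first request is in flight still share one listToolsets() call; only callers that arrive after a failure trigger a new attempt.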

6. Extensive use of any

getTools returns Record<string, any>, and callSearchTool accepts Record<string, any>. With strict mode on and the MCP SDK available, it should be possible to type these more precisely, or at least use unknown and narrow at the call site.

7. Comment numbering

Step 2.5 in the workflow comment is unconventional. Consider renumbering to 1 → 2 → 3 → 4.


UX

8. Two buttons open the same dialog

When an edge has both internal evidence and external papers, two buttons render (green FileText, blue BookOpen). Clicking either opens the same combined EvidenceDialog. This is not obvious from the tooltip text ("View evidence" vs "View academic papers"). Consider collapsing into a single button or making the dialog tab-based.

9. Section header absent when only external papers exist

The Academic Papers (Reference) header only renders when hasEvidence && hasExternalPapers. When an edge has only external papers and no internal evidence, the papers are shown without any header to explain their provenance or that they are unvalidated references.


Security

10. URL validation for paper.url and paper.doi

href={paper.doi ? `https://doi.org/${paper.doi}` : paper.url}

paper.url is used directly from MCP output without validation. A javascript: scheme URL would execute in-browser. rel="noopener noreferrer" helps with window isolation but does not prevent javascript: protocol execution in href. Validate URLs before rendering:

function isSafeUrl(url: string | undefined): url is string {
  if (!url) return false;
  try {
    const { protocol } = new URL(url);
    return protocol === "https:" || protocol === "http:";
  } catch {
    return false;
  }
}

11. Google Scholar scraping

The PR description lists Google Scholar as a source via search_google_scholar. Automated scraping of Google Scholar violates their ToS and commonly results in IP blocks. Worth confirming that paper-search-mcp uses an official API (it doesn't appear to — Scholar has no public API). This could cause intermittent failures in dev and is not suitable for production.


Minor

  • buildSearchQuery truncates at 100 chars with no word-boundary awareness, so the query can end mid-word. Consider truncating at the last space before the limit.
  • The in-memory searchCache won't survive Next.js hot reloads, so the "24 hour TTL" note is aspirational in dev. A comment clarifying this is dev-only is enough.
  • arrow.externalPapers = edge.data.externalPapers as any[] in react-flow-utils.ts loses type safety; casting to ExternalPaper[] is safer and consistent with the defined types.

The feature flag approach and graceful degradation (empty array on failure) are good patterns. The ID collision bug (#1) and URL validation (#10) are the most important items to address before merging.

@tnkshuuhei tnkshuuhei changed the base branch from upgrade/mastra_v1 to dev February 25, 2026 09:13
@claude

claude bot commented Feb 25, 2026

Code Review: feat: integrate external academic paper search via paper-search-mcp

Overall this is a well-structured addition — the feature is cleanly scoped, the type definitions are solid, and the opt-in flag (EXTERNAL_SEARCH_ENABLED) is the right default posture. A few issues need attention before merging.


Bugs

1. Operator precedence bug in generateExternalPaperId (lib/external-paper-search.ts)

// Current — parsed as: (doi || raw.paper_id) ? String(raw.paper_id) : title
const id = generateExternalPaperId(source, doi || raw.paper_id ? String(raw.paper_id) : title);

When doi is present but raw.paper_id is undefined, the condition is truthy and String(raw.paper_id) evaluates to "undefined". Every paper in that situation gets a hash of the literal string "undefined" — a collision on IDs. Fix:

const id = generateExternalPaperId(source, doi || (raw.paper_id ? String(raw.paper_id) : title));

2. Two separate buttons open the same dialog (components/canvas/EvidenceEdge.tsx)

Both the green FileText button and the blue BookOpen button call setDialogOpen(true) with no distinction. Users see two buttons implying different actions, but clicking either shows the same mixed dialog. Either merge them into a single button (update icon/colour based on what content is present), or pass a defaultTab prop to the dialog so each button opens to the relevant section.


Performance

3. Sequential external search per arrow (mastra/workflows/logic-model-with-evidence.ts)

for (const arrow of canvasData.arrows) {
  externalPapersByArrow[arrow.id] = await searchExternalPapersForEdge(...);
}

Each call spawns the Python MCP process and awaits up to three network round-trips (PubMed + arXiv + Google Scholar). For a model with 10+ arrows this compounds badly. Use Promise.allSettled across arrows, the same pattern already used inside searchExternalPapersForEdge for the three backends.

4. In-memory cache is ephemeral in serverless deployments (lib/external-paper-search.ts)

const searchCache = new Map<string, CacheEntry>() lives only in module scope. On Vercel/serverless each cold start gets a fresh process — the "24-hour cache" comment is misleading. If production use is intended, externalise the cache (Redis, Upstash, etc.). If this is dev-only, update the comment accordingly.

The Map also grows without bound: entries are evicted only when a read finds them past their TTL. Under sustained load with diverse queries this is a memory leak. A simple size cap would help.
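
A size cap can be added without an external dependency. The sketch below is one possible shape, assuming a hypothetical BoundedTtlCache class with an injectable clock for testing; it evicts the oldest inserted entry once the cap is exceeded and is not the cache implemented in the PR.

```typescript
interface TtlEntry<V> {
  value: V;
  expiresAt: number;
}

class BoundedTtlCache<V> {
  private readonly entries = new Map<string, TtlEntry<V>>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxEntries: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    // Deleting first moves a re-inserted key to the newest position,
    // because Map iterates in insertion order.
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

This bounds memory within a single process only; it does not address the cold-start issue, which would still need an external store.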

5. MCP cold-start may exceed the 30 s timeout (mastra/mcp/paper-search-client.ts)

uv run --with paper-search-mcp ... downloads and installs the Python package on first invocation. On a CI runner or clean deployment this can easily exceed timeout: 30000. Consider pre-warming or documenting that uv must be run once manually before the server starts.


Security

6. Unvalidated URLs from the external MCP server

In normalizeRawPaper, raw URL strings are stored and then placed directly into href in the dialog:

href={paper.doi ? `https://doi.org/${paper.doi}` : paper.url}

A compromised or misconfigured MCP server could return javascript:... URLs. Add a protocol check before storing:

function safeUrl(raw: unknown): string | undefined {
  const s = raw ? String(raw) : '';
  return s.startsWith('https://') || s.startsWith('http://') ? s : undefined;
}

7. Google Scholar ToS

search_google_scholar is called in a production-eligible code path. Google Scholar prohibits automated scraping and may block IPs or trigger legal issues. Consider removing it from the default search set or gating it behind an additional explicit opt-in flag.


Minor Issues

8. Missing arXiv label in the UI (components/canvas/EvidenceDialog.tsx)

The ternary chain maps pubmed → "PubMed" and semantic_scholar → "Semantic Scholar" but has no case for "arxiv", so it falls through to paper.source and displays the raw lowercase string. Add paper.source === 'arxiv' ? 'arXiv' : to the chain.

9. Race condition in getTools() (lib/external-paper-search.ts)

Two concurrent calls can both observe cachedTools === null and both call listToolsets(). Fix with a promise-based singleton:

let toolsPromise: Promise<Record<string, any>> | null = null;
async function getTools() {
  if (!toolsPromise) {
    toolsPromise = paperSearchClient.listToolsets().then(ts => ts['paperSearch'] || {});
  }
  return toolsPromise;
}

10. Misleading constant comment (lib/constants.ts)

/** Whether external academic search is enabled (development only) */
export const EXTERNAL_SEARCH_ENABLED = process.env.EXTERNAL_SEARCH_ENABLED === 'true';

The guard is purely env-var-based — nothing prevents enabling it in production. Either enforce && process.env.NODE_ENV !== 'production' in the constant, or update the comment to reflect that it is environment-agnostic.
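
If the first option is chosen, the rule can be written as a small pure function so it is testable without touching the real environment. This is a sketch only; isExternalSearchEnabled and the Env type are hypothetical names, and the application would pass process.env as the argument.

```typescript
// The environment is passed in so the rule can be exercised in isolation.
type Env = Record<string, string | undefined>;

// Enabled only when the flag is exactly "true" and the build is not production.
function isExternalSearchEnabled(env: Env): boolean {
  return env["EXTERNAL_SEARCH_ENABLED"] === "true" && env["NODE_ENV"] !== "production";
}
```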

11. Title-based deduplication is fragile (lib/external-paper-search.ts)

const key = p.doi || p.title;

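When DOI is absent, the same paper can appear twice across sources because titles differ slightly in capitalisation or punctuation. Normalise the fallback: (p.doi || p.title.toLowerCase().replace(/\s+/g, ' ').trim()). Note that this expression covers case and whitespace only; punctuation differences need an additional replace step.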
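
A slightly fuller normalisation that also handles common punctuation might look like the following. It is a sketch under assumptions: PaperLike, dedupeKey, and dedupePapers are illustrative names, and the punctuation list is a deliberately small ASCII set rather than a complete one.

```typescript
interface PaperLike {
  doi?: string;
  title: string;
}

// Builds a comparison key: the DOI when present (DOIs are case-insensitive),
// otherwise a title with case, common punctuation, and whitespace normalised.
function dedupeKey(p: PaperLike): string {
  if (p.doi) return `doi:${p.doi.trim().toLowerCase()}`;
  const title = p.title
    .toLowerCase()
    .replace(/[.,;:!?'"()\[\]{}\-]/g, " ") // common ASCII punctuation becomes a space
    .replace(/\s+/g, " ")
    .trim();
  return `title:${title}`;
}

// Keeps the first paper seen for each key.
function dedupePapers<T extends PaperLike>(papers: T[]): T[] {
  const seen = new Set<string>();
  return papers.filter((p) => {
    const key = dedupeKey(p);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```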

12. Non-standard step numbering in the workflow

The new step is labelled 2.5 in both the docstring and logger.info messages. Renumber as step 3 and the existing enrich step as step 4 for a cleaner sequence.


Positive Notes

  • ExternalPaperSchema / ExternalPaper placement in types/index.ts is clean and consistent with the rest of the schema.
  • Using Promise.allSettled (not Promise.all) when calling the three search backends is the right call — one failing source does not abort the others.
  • The opt-in flag and MIN_INTERNAL_MATCHES_BEFORE_EXTERNAL threshold are good design decisions that avoid unnecessary external calls when internal evidence is already sufficient.
  • The visual distinction between attested evidence (green, internal) and external papers (blue, reference-only) in the dialog is clear and communicates the trust hierarchy well.

@tnkshuuhei
Member Author

2026-02-25 11:40:46.986 [error] [paperSearch] Error: spawn uv ENOENT
    at ChildProcess._handle.onexit (node:internal/child_process:285:19)
    at onErrorNT (node:internal/child_process:483:16)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:46.987 [error] MCPClient errored connecting to MCP server: {
  error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n    at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n    at onErrorNT (node:internal/child_process:483:16)\\n    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:46.992 [error] [paperSearch] Error: spawn uv ENOENT
    at ChildProcess._handle.onexit (node:internal/child_process:285:19)
    at onErrorNT (node:internal/child_process:483:16)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:46.992 [error] MCPClient errored connecting to MCP server: {
  error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n    at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n    at onErrorNT (node:internal/child_process:483:16)\\n    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:46.995 [error] [paperSearch] Error: spawn uv ENOENT
    at ChildProcess._handle.onexit (node:internal/child_process:285:19)
    at onErrorNT (node:internal/child_process:483:16)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:46.995 [error] MCPClient errored connecting to MCP server: {
  error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n    at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n    at onErrorNT (node:internal/child_process:483:16)\\n    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:46.998 [error] [paperSearch] Error: spawn uv ENOENT
    at ChildProcess._handle.onexit (node:internal/child_process:285:19)
    at onErrorNT (node:internal/child_process:483:16)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:46.998 [error] MCPClient errored connecting to MCP server: {
  error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n    at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n    at onErrorNT (node:internal/child_process:483:16)\\n    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.001 [error] [paperSearch] Error: spawn uv ENOENT
    at ChildProcess._handle.onexit (node:internal/child_process:285:19)
    at onErrorNT (node:internal/child_process:483:16)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.002 [error] MCPClient errored connecting to MCP server: {
  error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n    at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n    at onErrorNT (node:internal/child_process:483:16)\\n    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.005 [error] [paperSearch] Error: spawn uv ENOENT
    at ChildProcess._handle.onexit (node:internal/child_process:285:19)
    at onErrorNT (node:internal/child_process:483:16)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.006 [error] MCPClient errored connecting to MCP server: {
  error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n    at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n    at onErrorNT (node:internal/child_process:483:16)\\n    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.008 [error] [paperSearch] Error: spawn uv ENOENT
    at ChildProcess._handle.onexit (node:internal/child_process:285:19)
    at onErrorNT (node:internal/child_process:483:16)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.008 [error] MCPClient errored connecting to MCP server: {
  error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n    at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n    at onErrorNT (node:internal/child_process:483:16)\\n    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.011 [error] [paperSearch] Error: spawn uv ENOENT
    at ChildProcess._handle.onexit (node:internal/child_process:285:19)
    at onErrorNT (node:internal/child_process:483:16)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.011 [error] MCPClient errored connecting to MCP server: {
  error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n    at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n    at onErrorNT (node:internal/child_process:483:16)\\n    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.016 [error] [paperSearch] Error: spawn uv ENOENT
    at ChildProcess._handle.onexit (node:internal/child_process:285:19)
    at onErrorNT (node:internal/child_process:483:16)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.016 [error] MCPClient errored connecting to MCP server: {
  error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n    at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n    at onErrorNT (node:internal/child_process:483:16)\\n    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.019 [error] (node:4) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 SIGTERM listeners added to [process]. MaxListeners is 10. Use emitter.setMaxListeners() to increase limit
(Use `node --trace-warnings ...` to show where the warning was created)
2026-02-25 11:40:47.019 [error] [paperSearch] Error: spawn uv ENOENT
    at ChildProcess._handle.onexit (node:internal/child_process:285:19)
    at onErrorNT (node:internal/child_process:483:16)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.019 [error] MCPClient errored connecting to MCP server: {
  error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n    at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n    at onErrorNT (node:internal/child_process:483:16)\\n    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.022 [error] [paperSearch] Error: spawn uv ENOENT
    at ChildProcess._handle.onexit (node:internal/child_process:285:19)
    at onErrorNT (node:internal/child_process:483:16)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.022 [error] MCPClient errored connecting to MCP server: {
  error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n    at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n    at onErrorNT (node:internal/child_process:483:16)\\n    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.025 [error] [paperSearch] Error: spawn uv ENOENT
    at ChildProcess._handle.onexit (node:internal/child_process:285:19)
    at onErrorNT (node:internal/child_process:483:16)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.026 [error] MCPClient errored connecting to MCP server: {
  error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n    at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n    at onErrorNT (node:internal/child_process:483:16)\\n    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.028 [error] [paperSearch] Error: spawn uv ENOENT
    at ChildProcess._handle.onexit (node:internal/child_process:285:19)
    at onErrorNT (node:internal/child_process:483:16)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.029 [error] MCPClient errored connecting to MCP server: {
  error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n    at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n    at onErrorNT (node:internal/child_process:483:16)\\n    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.033 [error] [paperSearch] Error: spawn uv ENOENT
    at ChildProcess._handle.onexit (node:internal/child_process:285:19)
    at onErrorNT (node:internal/child_process:483:16)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.033 [error] MCPClient errored connecting to MCP server: {
  error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n    at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n    at onErrorNT (node:internal/child_process:483:16)\\n    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}

…ch-mcp

- Add `includeExternalPapers` option to POST /api/evidence/search
- Extract shared search core from edge-based search for query-based reuse
- Add biorxiv and medrxiv sources (5 sources total)
- Interleave results round-robin across sources for diversity
- Fix operator precedence bug in paper ID generation
- Add cache size limit (500 entries) and normalize title dedup
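
One way the round-robin interleaving mentioned in the commit message could work, sketched under the assumption that each source returns its own ranked list. interleaveRoundRobin is a hypothetical name, not necessarily the function in the commit.

```typescript
// Takes one ranked list per source and emits items rank by rank:
// the top result from every source first, then the second from each, and so on.
function interleaveRoundRobin<T>(groups: T[][]): T[] {
  const result: T[] = [];
  const longest = groups.reduce((max, group) => Math.max(max, group.length), 0);
  for (let rank = 0; rank < longest; rank++) {
    for (const group of groups) {
      // Shorter lists simply drop out once they are exhausted.
      if (rank < group.length) result.push(group[rank] as T);
    }
  }
  return result;
}
```

With this ordering, slicing the interleaved list to a small limit still leaves every source that returned results represented near the top.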
@claude

claude bot commented Feb 26, 2026

PR Review: feat - integrate external academic paper search via paper-search-mcp

Good work overall. The feature is well-structured with a clean opt-in flag, graceful degradation, and sensible separation between attested and reference evidence. Several issues worth addressing before merging.


Bugs

Race condition in getTools() (lib/external-paper-search.ts)

let cachedTools: Record<string, any> | null = null;

async function getTools(): Promise<Record<string, any>> {
  if (!cachedTools) {
    const toolsets = await paperSearchClient.listToolsets(); // ← two concurrent callers can both enter here
    cachedTools = toolsets["paperSearch"] || {};
  }
  return cachedTools;
}

Under concurrent requests, two callers can both observe cachedTools === null before either resolves, resulting in multiple listToolsets() calls. Fix with a promise-based singleton:

let toolsPromise: Promise<Record<string, any>> | null = null;

async function getTools(): Promise<Record<string, any>> {
  if (!toolsPromise) {
    toolsPromise = paperSearchClient.listToolsets().then(ts => ts["paperSearch"] || {});
  }
  return toolsPromise;
}

Cache stores truncated results, breaking larger requests

setCachedResult(query, limited) stores the already-sliced limited array. A subsequent call with a higher maxResults will silently return fewer results than requested. Cache the full deduplicated+interleaved set and slice at read time:

// Store full results:
setCachedResult(query, interleaved);          // not `limited`
// Slice on read:
return cached.papers.slice(0, maxResults);   // already done in getCachedResult ✓

Security

Unvalidated external URLs rendered as href (components/canvas/EvidenceDialog.tsx)

href={paper.doi ? `https://doi.org/${paper.doi}` : paper.url}

paper.url comes from an external MCP process. If the MCP server ever returns a javascript: or data: URI, it executes in the browser context. rel="noopener noreferrer" does not prevent this. Add validation at the schema or normalization layer:

// In normalizeRawPaper, validate URL:
url: raw.url && /^https?:\/\//.test(String(raw.url)) ? String(raw.url) : undefined,

Similarly, validate that doi values don't start with http before prefixing with https://doi.org/.
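
Both checks can be combined where the link target is computed. The sketch below assumes a hypothetical paperHref helper and a deliberately loose DOI pattern ("10.", a numeric registrant, a slash, a suffix); it is not a complete DOI validator.

```typescript
interface LinkablePaper {
  doi?: string;
  url?: string;
}

// Loose DOI shape check; an assumption for illustration, not a full validator.
const DOI_PATTERN = /^10\.\d{4,9}\/\S+$/;

// Returns a safe link target, or undefined when nothing trustworthy is available.
function paperHref(paper: LinkablePaper): string | undefined {
  const doi = paper.doi?.trim();
  if (doi && DOI_PATTERN.test(doi)) {
    return `https://doi.org/${encodeURI(doi)}`;
  }
  const url = paper.url?.trim();
  // Only http(s) URLs are allowed through; javascript: and data: are rejected.
  if (url && /^https?:\/\//i.test(url)) return url;
  return undefined;
}
```

When paperHref returns undefined, the dialog could render the title as plain text instead of a link.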


Performance

Sequential external searches in workflow (mastra/workflows/logic-model-with-evidence.ts)

for (const arrow of canvasData.arrows) {
  externalPapersByArrow[arrow.id] = await searchExternalPapersForEdge(...);
}

With 10+ arrows, this serializes all MCP calls. The per-call latency compounds directly. A Promise.allSettled with a concurrency limit (e.g., p-limit with 3 concurrent) would be significantly faster without overwhelming the MCP server.
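
If adding p-limit is not desirable, the same effect can be had with a small helper. The following is a sketch, assuming a hypothetical mapSettledWithLimit function; it runs a fixed number of worker lanes and records each outcome in the style of Promise.allSettled, so one failing arrow does not abort the rest.

```typescript
type Settled<R> =
  | { status: "fulfilled"; value: R }
  | { status: "rejected"; reason: unknown };

async function mapSettledWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<Settled<R>[]> {
  const results = new Array<Settled<R>>(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until none remain.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        const value = await worker(items[index] as T);
        results[index] = { status: "fulfilled", value };
      } catch (reason) {
        results[index] = { status: "rejected", reason };
      }
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}
```

In the workflow step, items would be canvasData.arrows and the worker would wrap the searchExternalPapersForEdge call; results come back in input order.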


Code Quality

as any[] casts in react-flow-utils.ts

arrow.externalPapers = edge.data.externalPapers as any[];

ExternalPaper is already imported; use it:

arrow.externalPapers = edge.data.externalPapers as ExternalPaper[];

source should be an enum, not z.string()

In types/index.ts, source: z.string() allows any string. The five known sources are constants — using z.enum(["pubmed", "arxiv", "google_scholar", "biorxiv", "medrxiv"]) would enable exhaustive checking in the source badge renderer and catch normalization bugs earlier.
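
The TypeScript side of that idea can be sketched without Zod. The names below (PAPER_SOURCES, PaperSource, isPaperSource, BADGE_LABELS, badgeLabel) are assumptions for illustration; the point is that a Record keyed by the union makes the compiler reject a source that has no label.

```typescript
const PAPER_SOURCES = ["pubmed", "arxiv", "google_scholar", "biorxiv", "medrxiv"] as const;
type PaperSource = (typeof PAPER_SOURCES)[number];

// Narrows an arbitrary string (for example, MCP output) to a known source.
function isPaperSource(value: string): value is PaperSource {
  return (PAPER_SOURCES as readonly string[]).indexOf(value) !== -1;
}

// Omitting any member of PaperSource here is a compile-time error.
const BADGE_LABELS: Record<PaperSource, string> = {
  pubmed: "PubMed",
  arxiv: "arXiv",
  google_scholar: "Google Scholar",
  biorxiv: "bioRxiv",
  medrxiv: "medRxiv",
};

// Unknown sources are shown as-is rather than dropped.
function badgeLabel(source: string): string {
  return isPaperSource(source) ? BADGE_LABELS[source] : source;
}
```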

url field not validated as a URL

url: z.string().optional() in ExternalPaperSchema accepts any string. z.string().url().optional() would add format validation at the type boundary.

limit parameter reused for external papers in the API route

const externalSearchPromise = includeExternalPapers
  ? searchExternalPapers(query, limit)   // limit = internal evidence limit
  : Promise.resolve([]);

limit controls how many internal evidence items to surface; reusing it for external paper count conflates two different concepts. Consider a separate externalPapersLimit param, or at minimum document the dual use.

Abstract truncated mid-word

abstract: raw.abstract ? String(raw.abstract).slice(0, 500) : undefined,

This cuts text at an arbitrary character position. Slicing at a word boundary or sentence boundary would produce cleaner UI output.
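
A word-boundary cut is short to write. The helper below is a sketch with a hypothetical name (truncateAtWord); it backs up to the last space within the limit and falls back to a hard cut when a single word exceeds the limit.

```typescript
function truncateAtWord(text: string, maxLength: number): string {
  if (text.length <= maxLength) return text;
  // Look one character past the limit so a space exactly at the limit counts.
  const window = text.slice(0, maxLength + 1);
  const lastSpace = window.lastIndexOf(" ");
  // No usable space: the first word alone is too long, so cut hard.
  const cut = lastSpace > 0 ? lastSpace : maxLength;
  return text.slice(0, cut).replace(/\s+$/, "");
}
```

The same helper would also cover the 100-character query truncation in buildSearchQuery.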


Architecture / UX

Edge color doesn't distinguish evidence quality

arrowsToEdges turns edges green (#10b981) for hasAnyContent, meaning an edge supported only by unvetted Google Scholar results looks identical to one backed by attested on-chain evidence. Consider a different stroke color (e.g., blue) for edges with only external papers, matching the blue UI treatment already used in the dialog and edge button.
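
The colour decision could be isolated in a small function. This is a sketch only: edgeStrokeColor is a hypothetical helper, the green value is the one quoted above, and the blue value is an assumed placeholder rather than the colour actually used in the dialog.

```typescript
// Returns a stroke colour for an edge, or undefined to keep the default style.
function edgeStrokeColor(hasEvidence: boolean, hasExternalPapers: boolean): string | undefined {
  if (hasEvidence) return "#10b981"; // attested evidence: existing green
  if (hasExternalPapers) return "#3b82f6"; // external papers only: assumed blue
  return undefined;
}
```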

Both edge buttons open the same dialog

Clicking the green FileText button or the blue BookOpen button both call setDialogOpen(true), opening one combined dialog regardless of which button was clicked. This is fine if intentional, but the dual-button affordance implies separate views. Either collapse to one button when both types are present, or use separate dialogs/tabs.

In-memory cache is not shared across instances

const searchCache = new Map() is module-level. In a multi-instance or serverless deployment this cache is per-process with no sharing. Given the 24-hour TTL, cold starts will cause repeated MCP calls for popular queries. Worth noting in docs or a TODO for a future Redis-backed cache.

Google Scholar rate limits / ToS

search_google_scholar is included in parallel calls. Google Scholar actively blocks scrapers. At sufficient query volume this will cause IP blocks or errors. Recommend either removing it from the default sources or documenting the production risk prominently.


Minor

  • doi: doi || undefined on line 690 is redundant — doi is already string | undefined from the ternary above it.
  • The dialog description changes from "X evidence items" to "X items", which loses signal about what the items represent. Something like "X evidence items + Y academic papers" would preserve the distinction.
  • The env: {} in paper-search-client.ts passes an empty environment to the subprocess. Depending on the platform, this may strip PATH and prevent uv from locating the Python interpreter. Consider env: process.env or relying on the default (inheriting parent env).

Summary: The feature is valuable and the overall design is sound — feature-flagged, gracefully degrading, well-typed. The race condition in getTools() and the unvalidated URL rendering are the two items I'd flag as must-fix before merge. The sequential workflow step is worth addressing for usability if logic models regularly have 5+ arrows.

Development

Successfully merging this pull request may close these issues.

IDEA: add search from external source feature
