feat: integrate external academic paper search via paper-search-mcp #191
tnkshuuhei wants to merge 3 commits into dev
Add MCP-based external paper search (PubMed, arXiv, Google Scholar) to supplement internal evidence matching. Papers are fetched for edges with insufficient internal evidence, cached for 24 hours, and displayed in the canvas UI alongside attested evidence.
PR Review: feat: integrate external academic paper search via paper-search-mcp
Overall the approach is sound: gating external search behind a feature flag, caching results, and using
Bugs
1. Operator precedence bug in
// Line ~567
const id = generateExternalPaperId(source, doi || raw.paper_id ? String(raw.paper_id) : title);
Due to JS operator precedence,
(doi || raw.paper_id) ? String(raw.paper_id) : title
When a paper has a DOI but no
Fix:
const id = generateExternalPaperId(source, doi || (raw.paper_id ? String(raw.paper_id) : title));
2. Missing source labels in
The source badge mapping only handles
// Add to the ternary chain or switch to a lookup object:
const SOURCE_LABELS: Record<string, string> = {
pubmed: "PubMed",
semantic_scholar: "Semantic Scholar",
arxiv: "arXiv",
google_scholar: "Google Scholar",
};
Code Quality
3. Sequential awaits in the workflow step
Prefer
await Promise.all(
  canvasData.arrows.map(async (arrow: Arrow) => {
    // ...
    externalPapersByArrow[arrow.id] = await searchExternalPapersForEdge(...);
  })
);
4. Double feature-flag check
5.
async function getTools(): Promise<Record<string, any>> {
  if (!cachedTools) {
    const toolsets = await paperSearchClient.listToolsets();
    cachedTools = toolsets["paperSearch"] || {};
  }
  return cachedTools;
}
Two concurrent requests that both see
let cachedToolsPromise: Promise<Record<string, unknown>> | null = null;
function getTools() {
  if (!cachedToolsPromise) {
    cachedToolsPromise = paperSearchClient.listToolsets().then(ts => ts["paperSearch"] ?? {});
  }
  return cachedToolsPromise;
}
6. Extensive use of
7. Comment numbering
Step
UX
8. Two buttons open the same dialog
When an edge has both internal evidence and external papers, two buttons render (green
9. Section header absent when only external papers exist
The
Security
10. URL validation for
href={paper.doi ? `https://doi.org/${paper.doi}` : paper.url}
function isSafeUrl(url: string | undefined): url is string {
  if (!url) return false;
  try {
    const { protocol } = new URL(url);
    return protocol === "https:" || protocol === "http:";
  } catch {
    return false;
  }
}
11. Google Scholar scraping
The PR description lists Google Scholar as a source via
Minor
The feature flag approach and graceful degradation (empty array on failure) are good patterns. The ID collision bug (#1) and URL validation (#10) are the most important to address before merging.
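To make the precedence issue from item 1 concrete, here is a minimal sketch that isolates the two expressions. The `RawPaper` interface and the function names `buggyIdSeed` and `fixedIdSeed` are hypothetical stand-ins for illustration; only the expressions themselves come from the PR.

```typescript
// Hypothetical shape for a raw paper record; only the fields named in the review are modelled.
interface RawPaper {
  doi?: string;
  paper_id?: string | number;
  title: string;
}

// Mirrors the original expression, which is parsed as
// (doi || paper_id) ? String(paper_id) : title.
function buggyIdSeed(raw: RawPaper): string {
  return raw.doi || raw.paper_id ? String(raw.paper_id) : raw.title;
}

// Parenthesised form from the suggested fix: DOI first, then paper_id, then title.
function fixedIdSeed(raw: RawPaper): string {
  return raw.doi || (raw.paper_id ? String(raw.paper_id) : raw.title);
}

const doiOnly: RawPaper = { doi: "10.1000/example", title: "Some title" };
console.log(buggyIdSeed(doiOnly)); // "undefined", because String(undefined) is evaluated
console.log(fixedIdSeed(doiOnly)); // "10.1000/example"
```

With the buggy form, every record that has a DOI but no `paper_id` yields the same seed string, which is consistent with the ID collision the review describes.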
Code Review: feat: integrate external academic paper search via paper-search-mcp
Overall this is a well-structured addition: the feature is cleanly scoped, the type definitions are solid, and the opt-in flag (
Bugs
1. Operator precedence bug in
// Current, parsed as: (doi || raw.paper_id) ? String(raw.paper_id) : title
const id = generateExternalPaperId(source, doi || raw.paper_id ? String(raw.paper_id) : title);
When
const id = generateExternalPaperId(source, doi || (raw.paper_id ? String(raw.paper_id) : title));
2. Two separate buttons open the same dialog (
Both the green
Performance
3. Sequential external search per arrow (
for (const arrow of canvasData.arrows) {
  externalPapersByArrow[arrow.id] = await searchExternalPapersForEdge(...);
}
Each call spawns the Python MCP process and awaits up to three network round-trips (PubMed + arXiv + Semantic Scholar). For a model with 10+ arrows this compounds badly. Use
4. In-memory cache is ephemeral in serverless deployments (
The Map also grows without bound: entries are evicted only on a TTL miss during a read. Under sustained load with diverse queries this is a memory leak. A simple size cap would help.
5. MCP cold-start may exceed the 30 s timeout (
Security
6. Unvalidated URLs from the external MCP server
In
href={paper.doi ? `https://doi.org/${paper.doi}` : paper.url}
A compromised or misconfigured MCP server could return
function safeUrl(raw: unknown): string | undefined {
  const s = raw ? String(raw) : '';
  return s.startsWith('https://') || s.startsWith('http://') ? s : undefined;
}
7. Google Scholar ToS
Minor Issues
8. Missing
The ternary chain maps
9. Race condition in
Two concurrent calls can both observe
let toolsPromise: Promise<Record<string, any>> | null = null;
async function getTools() {
  if (!toolsPromise) {
    toolsPromise = paperSearchClient.listToolsets().then(ts => ts['paperSearch'] || {});
  }
  return toolsPromise;
}
10. Misleading constant comment (
/** Whether external academic search is enabled (development only) */
export const EXTERNAL_SEARCH_ENABLED = process.env.EXTERNAL_SEARCH_ENABLED === 'true';
The guard is purely env-var-based: nothing prevents enabling it in production. Either enforce
11. Title-based deduplication is fragile (
const key = p.doi || p.title;
When DOI is absent, the same paper can appear twice across sources because titles differ slightly in capitalisation or punctuation. Normalise the fallback:
12. Non-standard step numbering in the workflow
The new step is labelled
Positive Notes
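On the unbounded-cache point (item 4 in the review above), one possible shape for a size cap is sketched below. This is not the PR's cache: the class name `BoundedTtlCache`, its constructor parameters, and the injectable `now` clock are all assumptions made for illustration. It relies on `Map` preserving insertion order, so the first key is the oldest insertion.

```typescript
// Minimal sketch of a size-capped TTL cache. All names here are illustrative,
// not taken from the PR.
class BoundedTtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxEntries: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop on read
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    // Delete first so a re-inserted key moves to the end of the insertion order.
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    // Evict the oldest insertions until the cache is back under the cap.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Example: cap at 500 entries with a 24-hour TTL.
const paperCache = new BoundedTtlCache<string[]>(500, 24 * 60 * 60 * 1000);
paperCache.set("some query", ["paper-1", "paper-2"]);
```

Eviction here is by insertion order rather than by last access. A true LRU would also re-insert on `get`, which may or may not be worth it for this workload. Neither variant addresses the serverless concern, since the Map still lives in a single instance.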
2026-02-25 11:40:46.986 [error] [paperSearch] Error: spawn uv ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:285:19)
at onErrorNT (node:internal/child_process:483:16)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:46.987 [error] MCPClient errored connecting to MCP server: {
error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n at onErrorNT (node:internal/child_process:483:16)\\n at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:46.992 [error] [paperSearch] Error: spawn uv ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:285:19)
at onErrorNT (node:internal/child_process:483:16)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:46.992 [error] MCPClient errored connecting to MCP server: {
error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n at onErrorNT (node:internal/child_process:483:16)\\n at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:46.995 [error] [paperSearch] Error: spawn uv ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:285:19)
at onErrorNT (node:internal/child_process:483:16)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:46.995 [error] MCPClient errored connecting to MCP server: {
error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n at onErrorNT (node:internal/child_process:483:16)\\n at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:46.998 [error] [paperSearch] Error: spawn uv ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:285:19)
at onErrorNT (node:internal/child_process:483:16)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:46.998 [error] MCPClient errored connecting to MCP server: {
error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n at onErrorNT (node:internal/child_process:483:16)\\n at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.001 [error] [paperSearch] Error: spawn uv ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:285:19)
at onErrorNT (node:internal/child_process:483:16)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.002 [error] MCPClient errored connecting to MCP server: {
error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n at onErrorNT (node:internal/child_process:483:16)\\n at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.005 [error] [paperSearch] Error: spawn uv ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:285:19)
at onErrorNT (node:internal/child_process:483:16)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.006 [error] MCPClient errored connecting to MCP server: {
error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n at onErrorNT (node:internal/child_process:483:16)\\n at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.008 [error] [paperSearch] Error: spawn uv ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:285:19)
at onErrorNT (node:internal/child_process:483:16)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.008 [error] MCPClient errored connecting to MCP server: {
error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n at onErrorNT (node:internal/child_process:483:16)\\n at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.011 [error] [paperSearch] Error: spawn uv ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:285:19)
at onErrorNT (node:internal/child_process:483:16)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.011 [error] MCPClient errored connecting to MCP server: {
error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n at onErrorNT (node:internal/child_process:483:16)\\n at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.016 [error] [paperSearch] Error: spawn uv ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:285:19)
at onErrorNT (node:internal/child_process:483:16)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.016 [error] MCPClient errored connecting to MCP server: {
error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n at onErrorNT (node:internal/child_process:483:16)\\n at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.019 [error] (node:4) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 SIGTERM listeners added to [process]. MaxListeners is 10. Use emitter.setMaxListeners() to increase limit
(Use `node --trace-warnings ...` to show where the warning was created)
2026-02-25 11:40:47.019 [error] [paperSearch] Error: spawn uv ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:285:19)
at onErrorNT (node:internal/child_process:483:16)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.019 [error] MCPClient errored connecting to MCP server: {
error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n at onErrorNT (node:internal/child_process:483:16)\\n at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.022 [error] [paperSearch] Error: spawn uv ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:285:19)
at onErrorNT (node:internal/child_process:483:16)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.022 [error] MCPClient errored connecting to MCP server: {
error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n at onErrorNT (node:internal/child_process:483:16)\\n at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.025 [error] [paperSearch] Error: spawn uv ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:285:19)
at onErrorNT (node:internal/child_process:483:16)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.026 [error] MCPClient errored connecting to MCP server: {
error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n at onErrorNT (node:internal/child_process:483:16)\\n at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.028 [error] [paperSearch] Error: spawn uv ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:285:19)
at onErrorNT (node:internal/child_process:483:16)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.029 [error] MCPClient errored connecting to MCP server: {
error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n at onErrorNT (node:internal/child_process:483:16)\\n at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
2026-02-25 11:40:47.033 [error] [paperSearch] Error: spawn uv ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:285:19)
at onErrorNT (node:internal/child_process:483:16)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21) undefined
2026-02-25 11:40:47.033 [error] MCPClient errored connecting to MCP server: {
error: '{"message":"Failed to connect to MCP server paperSearch: Error: spawn uv ENOENT\\n at ChildProcess._handle.onexit (node:internal/child_process:285:19)\\n at onErrorNT (node:internal/child_process:483:16)\\n at process.processTicksAndRejections (node:internal/process/task_queues:90:21)","domain":"MCP","category":"THIRD_PARTY","code":"MCP_CLIENT_CONNECT_FAILED","details":{"name":"paperSearch"},"cause":{"message":"spawn uv ENOENT","name":"Error","errno":-2,"code":"ENOENT","syscall":"spawn uv","path":"uv","spawnargs":["run","--with","paper-search-mcp","-m","paper_search_mcp.server"]}}'
}
…ch-mcp
- Add `includeExternalPapers` option to POST /api/evidence/search
- Extract shared search core from edge-based search for query-based reuse
- Add biorxiv and medrxiv sources (5 sources total)
- Interleave results round-robin across sources for diversity
- Fix operator precedence bug in paper ID generation
- Add cache size limit (500 entries) and normalize title dedup
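For readers who have not opened the diff, the round-robin interleaving and normalised title dedup named in this commit could look roughly like the sketch below. It is an illustration under assumptions, not the committed code: `PaperLike`, `dedupKey`, and `interleaveRoundRobin` are hypothetical names, and the title normalisation is deliberately ASCII-only.

```typescript
// Hypothetical minimal paper shape; the PR's ExternalPaper type carries more fields.
interface PaperLike {
  title: string;
  doi?: string;
  source: string;
}

// Dedup key: the DOI when present (DOIs are case-insensitive), otherwise the title
// lower-cased with runs of non-alphanumeric characters collapsed to a single space.
// Assumption: ASCII-only normalisation; non-Latin titles would need a wider rule.
function dedupKey(p: PaperLike): string {
  if (p.doi) return `doi:${p.doi.trim().toLowerCase()}`;
  const normalised = p.title.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
  return `title:${normalised}`;
}

// Take the i-th paper from each source in turn, skipping duplicates,
// until maxResults papers have been collected.
function interleaveRoundRobin(bySource: PaperLike[][], maxResults: number): PaperLike[] {
  const seen = new Set<string>();
  const out: PaperLike[] = [];
  const longest = Math.max(0, ...bySource.map((list) => list.length));
  for (let i = 0; i < longest && out.length < maxResults; i++) {
    for (const list of bySource) {
      const paper = list[i];
      if (!paper) continue; // this source has run out of results
      const key = dedupKey(paper);
      if (seen.has(key)) continue;
      seen.add(key);
      out.push(paper);
      if (out.length >= maxResults) break;
    }
  }
  return out;
}
```

Because duplicates are skipped rather than replaced, the first source listed wins when two sources return the same paper, so source order acts as a priority.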
PR Review: feat - integrate external academic paper search via paper-search-mcp
Good work overall. The feature is well-structured with a clean opt-in flag, graceful degradation, and sensible separation between attested and reference evidence. Several issues are worth addressing before merging.
Bugs
Race condition in
let cachedTools: Record<string, any> | null = null;
async function getTools(): Promise<Record<string, any>> {
  if (!cachedTools) {
    const toolsets = await paperSearchClient.listToolsets(); // ← two concurrent callers can both enter here
    cachedTools = toolsets["paperSearch"] || {};
  }
  return cachedTools;
}
Under concurrent requests, two callers can both observe
let toolsPromise: Promise<Record<string, any>> | null = null;
async function getTools(): Promise<Record<string, any>> {
  if (!toolsPromise) {
    toolsPromise = paperSearchClient.listToolsets().then(ts => ts["paperSearch"] || {});
  }
  return toolsPromise;
}
Cache stores truncated results, breaking larger requests
// Store full results:
setCachedResult(query, interleaved); // not `limited`
// Slice on read:
return cached.papers.slice(0, maxResults); // already done in getCachedResult ✓
Security
Unvalidated external URLs rendered as
href={paper.doi ? `https://doi.org/${paper.doi}` : paper.url}
// In normalizeRawPaper, validate URL:
url: raw.url && /^https?:\/\//.test(String(raw.url)) ? String(raw.url) : undefined,
Similarly, validate that
Performance
Sequential external searches in workflow (
for (const arrow of canvasData.arrows) {
  externalPapersByArrow[arrow.id] = await searchExternalPapersForEdge(...);
}
With 10+ arrows, this serializes all MCP calls. The per-call latency compounds directly. A
Code Quality
arrow.externalPapers = edge.data.externalPapers as any[];
arrow.externalPapers = edge.data.externalPapers as ExternalPaper[];
In
const externalSearchPromise = includeExternalPapers
  ? searchExternalPapers(query, limit) // limit = internal evidence limit
  : Promise.resolve([]);
Abstract truncated mid-word
abstract: raw.abstract ? String(raw.abstract).slice(0, 500) : undefined,
This cuts text at an arbitrary character position. Slicing at a word boundary or sentence boundary would produce cleaner UI output.
Architecture / UX
Edge color doesn't distinguish evidence quality
Both edge buttons open the same dialog
Clicking the green FileText button or the blue BookOpen button both call
In-memory cache is not shared across instances
Google Scholar rate limits / ToS
Minor
Summary: The feature is valuable and the overall design is sound: feature-flagged, gracefully degrading, well-typed. The race condition in
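All three reviews flag the sequential per-arrow search. An unbounded `Promise.all` would fix the serialization but could start one MCP call per arrow at once, so a middle ground is a small concurrency cap. The helper below is a generic sketch; `mapWithConcurrency` is a hypothetical name and is not part of the PR or of any library used in it.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight.
// Results are returned in input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed index.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++; // safe: no await between the check and the increment
      results[index] = await worker(items[index] as T, index);
    }
  }

  const lanes = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: lanes }, () => runLane()));
  return results;
}
```

In the workflow step this would wrap the per-arrow call, for example `mapWithConcurrency(canvasData.arrows, 3, ...)`, where the limit of 3 is an arbitrary placeholder. If the worker can reject, the whole call rejects, so a worker that catches and returns an empty array would preserve the graceful degradation the reviews praise.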