
Feat/semantic scholar #194

Draft
tnkshuuhei wants to merge 15 commits into dev from feat/semantic-scholar

Conversation


tnkshuuhei (Member) commented Feb 26, 2026

Add MCP-based external paper search (PubMed, arXiv, Google Scholar) to
supplement internal evidence matching. Papers are fetched for edges with
insufficient internal evidence, cached for 24 hours, and displayed in
the canvas UI alongside attested evidence.
…ch-mcp

- Add `includeExternalPapers` option to POST /api/evidence/search
- Extract shared search core from edge-based search for query-based reuse
- Add biorxiv and medrxiv sources (5 sources total)
- Interleave results round-robin across sources for diversity
- Fix operator precedence bug in paper ID generation
- Add cache size limit (500 entries) and normalize title dedup
…ntegration

Remove the @mastra/mcp dependency and Python-based paper-search-mcp server
in favor of a lightweight HTTP client for the Semantic Scholar Academic Graph API.
This simplifies the external paper search stack by eliminating the need for
Python/uv runtime and reducing the number of external dependencies.

vercel bot commented Feb 26, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| muse | Ready | Preview, Comment | Feb 26, 2026 7:22am |


claude bot commented Feb 26, 2026

test


claude bot commented Feb 26, 2026

PR Review: Feat/Semantic Scholar Integration

Good overall direction - adding Semantic Scholar as a fallback for under-matched edges is a sensible approach. The architecture is clean and the graceful degradation patterns are solid. Here are the issues I found, grouped by severity.


Bugs / Correctness Issues

1. External papers omitted when results are empty (app/api/evidence/search/route.ts)

When includeExternalPapers is true but the external search returns zero results (not an error), response.externalPapers is never set. Callers requesting external papers receive undefined instead of [], requiring extra null-checks client-side.

Fix: always set response.externalPapers when the flag is requested, even if empty:

if (includeExternalPapers) {
  response.externalPapers =
    externalResult.status === "fulfilled" ? externalResult.value : [];
  if (externalResult.status === "rejected") {
    logger.warn({ error: externalResult.reason }, "External paper search failed");
  }
}

2. Sequential external search in workflow loop (mastra/workflows/logic-model-with-evidence.ts)

The for...of with await inside searchExternalPapersStep searches each arrow sequentially. With N arrows this is N times slower than necessary on a cold cache. Since each arrow is independent, use Promise.all:

await Promise.all(
  canvasData.arrows.map(async (arrow) => {
    externalPapersByArrow[arrow.id] = await searchExternalPapersForEdge(fromContent, toContent);
  })
);

Performance / Infrastructure Concerns

3. In-memory cache does not survive serverless restarts (lib/external-paper-search.ts)

The module-level searchCache Map is described as lasting "24 hours", but in Next.js serverless environments each cold start creates a new process and the cache will be empty. The TTL comment is misleading for production deployments on Vercel/Lambda. Either document this limitation explicitly, or use an external cache (Redis, KV) if the TTL needs to be meaningful.
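One way to keep the current behaviour while leaving room for a durable backend is to hide the cache behind a small store interface. This is a minimal sketch under stated assumptions — `CacheStore` and `memoryStore` are hypothetical names, not code from the PR:

```typescript
// Hypothetical abstraction: call sites depend on CacheStore, so the Map can
// later be swapped for Redis/KV without touching search code.
interface CacheStore<V> {
  get(key: string): Promise<V | undefined>;
  set(key: string, value: V, ttlMs: number): Promise<void>;
}

// In-memory implementation; carries the same cold-start caveat as the original.
function memoryStore<V>(): CacheStore<V> {
  const map = new Map<string, { value: V; expires: number }>();
  return {
    async get(key) {
      const entry = map.get(key);
      // Treat expired entries as misses.
      if (!entry || entry.expires < Date.now()) return undefined;
      return entry.value;
    },
    async set(key, value, ttlMs) {
      map.set(key, { value, expires: Date.now() + ttlMs });
    },
  };
}
```

A Redis-backed implementation of the same interface could then make the 24-hour TTL meaningful on serverless deployments.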

4. EXTERNAL_SEARCH_ENABLED defaults to opt-out (lib/constants.ts)

export const EXTERNAL_SEARCH_ENABLED = process.env.EXTERNAL_SEARCH_ENABLED !== "false";

This silently enables external API calls on all deployments that have not set the env var. Opt-in (=== "true") would be safer and more explicit, especially since the API key is optional and rate limits apply.
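The suggested opt-in check could look like the sketch below. Parameterizing `env` is my addition for testability; the real constant would read `process.env` directly:

```typescript
// Hypothetical opt-in variant: external API calls stay off unless the
// deployment explicitly sets EXTERNAL_SEARCH_ENABLED="true".
function isExternalSearchEnabled(
  env: Record<string, string | undefined>
): boolean {
  // Only the exact string "true" enables the feature; unset or any other
  // value (including "false") leaves it disabled.
  return env.EXTERNAL_SEARCH_ENABLED === "true";
}
```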


Code Quality Issues

5. Unsafe as any[] cast (lib/canvas/react-flow-utils.ts)

arrow.externalPapers = edge.data.externalPapers as any[];

Should cast to ExternalPaper[] to maintain type safety (same issue exists for the adjacent evidenceMetadata as any[]).

6. Duplicate buttons open the same dialog (components/canvas/EvidenceEdge.tsx)

When an edge has both internal evidence and external papers, two separate buttons (emerald + blue) appear but both set the same dialogOpen state and open the same combined dialog. A user clicking the blue "academic papers" button would expect a different view. Consider merging them into one button, or at minimum adding a tooltip clarifying both open the same combined view.

7. Section header layout (components/canvas/EvidenceDialog.tsx)

The "Academic Papers (Reference)" <h3> is inside a border-t pt-4 div, but the paper cards are rendered as siblings outside that div. The separator line and padding only wrap the heading, leaving the cards without consistent spacing from the divider.

8. buildSearchQuery truncates mid-word (lib/external-paper-search.ts)

combined.slice(0, 100) can cut mid-word. Consider trimming to the last full word before the cutoff for better query quality.
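The word-boundary trim could be done as in this sketch; `truncateAtWord` is an illustrative helper name, not the PR's actual function:

```typescript
// Hypothetical helper: truncate to at most `max` chars without cutting
// mid-word. Falls back to a hard slice when the prefix has no space.
function truncateAtWord(text: string, max = 100): string {
  if (text.length <= max) return text;
  const lastSpace = text.lastIndexOf(" ", max);
  return (lastSpace > 0 ? text.slice(0, lastSpace) : text.slice(0, max)).trim();
}
```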


Minor / Low Priority

9. Truncated abstracts have no ellipsis indicator (lib/academic-apis/semantic-scholar.ts)

raw.abstract.slice(0, 500) drops text without appending .... Users reading the displayed text have no indication the abstract is cut off.
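A minimal sketch of the fix, with a hypothetical helper name:

```typescript
// Hypothetical helper: cap the abstract at `max` chars and append "..."
// so readers can tell the text was cut off.
function truncateAbstract(abstract: string, max = 500): string {
  return abstract.length > max ? abstract.slice(0, max) + "..." : abstract;
}
```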

10. Unvalidated paper.url in anchor tag (components/canvas/EvidenceDialog.tsx)

paper.url is passed directly as an href. The source is trustworthy and rel="noopener noreferrer" is present, but a URL scheme check (allowing only https?:) would be a small hardening improvement.

11. year: raw.year ?? undefined is a no-op (lib/academic-apis/semantic-scholar.ts)

raw.year is already typed as number | undefined; ?? undefined adds nothing - just use raw.year directly.


Summary

| # | File | Severity |
| --- | --- | --- |
| 1 | app/api/evidence/search/route.ts | Bug |
| 2 | mastra/workflows/logic-model-with-evidence.ts | Performance |
| 3 | lib/external-paper-search.ts | Misleading / Infrastructure |
| 4 | lib/constants.ts | Behaviour |
| 5 | lib/canvas/react-flow-utils.ts | Type safety |
| 6 | components/canvas/EvidenceEdge.tsx | UX |
| 7 | components/canvas/EvidenceDialog.tsx | Layout |
| 8-11 | Various | Minor |

Items 1 and 2 are the most important to address before merging.

- Add Gemini 2.0 Flash-based keyword extraction to convert card titles
  (including Japanese) into English academic search keywords
- Fix buildSearchQuery to use only card titles instead of full
  title+description, preventing garbled truncated queries
- Add empty/short query guard in Semantic Scholar client to avoid 500 errors
- Add @ai-sdk/google dependency for direct Vercel AI SDK usage

claude bot commented Feb 26, 2026

Code Review: Semantic Scholar Integration

Good overall direction — adding external academic evidence as a fallback when internal matches are sparse is a solid feature. The layered architecture (type → API client → orchestration → workflow step → UI) is clean and follows the project's patterns well. A few issues worth addressing before merging:


Bugs / Correctness

1. "Academic Papers" section heading missing when no internal evidence

In EvidenceDialog.tsx, the "Academic Papers (Reference)" heading only renders when hasEvidence is also true:

{hasExternalPapers && (
  <>
    {hasEvidence && (         // <-- heading only shows when both exist
      <div className="border-t pt-4">
        <h3 ...>Academic Papers (Reference)</h3>
      </div>
    )}
    {externalPapers.map(...)}
  </>
)}

If an edge has only external papers (no internal evidence), users see a list of blue cards with no heading explaining what they are. The heading should always render when external papers are present:

{hasExternalPapers && (
  <>
    <div className={hasEvidence ? "border-t pt-4" : ""}>
      <h3 className="mb-3 text-sm font-semibold text-gray-500">
        Academic Papers (Reference)
      </h3>
    </div>
    {externalPapers.map(...)}
  </>
)}

2. buildSearchQuery return value is unused in searchExternalPapersForEdge

In lib/external-paper-search.ts, rawQuery is only used for the empty-string guard — the actual search always uses the LLM-extracted query instead. The buildSearchQuery call is effectively dead code here:

const rawQuery = buildSearchQuery(fromContent, toContent); // only used for empty check
if (!rawQuery) return [];

const fromTitle = fromContent.split(".")[0].trim();  // duplicates buildSearchQuery logic
const toTitle = toContent.split(".")[0].trim();
const query = await extractSearchKeywords(fromTitle, toTitle);  // this is what's used

Either pass rawQuery as the fallback into extractSearchKeywords, or remove buildSearchQuery from this path entirely.


Performance

3. Sequential external paper search in workflow

searchExternalPapersStep processes all arrows in a for...of loop, so each edge waits for the previous one's LLM call + HTTP round-trip. For a logic model with 10+ arrows, this compounds badly:

for (const arrow of canvasData.arrows) {
  externalPapersByArrow[arrow.id] = await searchExternalPapersForEdge(fromContent, toContent);
}

These are independent and should run in parallel:

const results = await Promise.allSettled(
  canvasData.arrows.map(async (arrow) => {
    // ... build fromContent, toContent ...
    const papers = await searchExternalPapersForEdge(fromContent, toContent);
    return { arrowId: arrow.id, papers };
  })
);
for (const result of results) {
  if (result.status === "fulfilled") {
    externalPapersByArrow[result.value.arrowId] = result.value.papers;
  }
}

Design / Semantics

4. Green edge color for external-paper-only edges

In lib/canvas/react-flow-utils.ts, edges turn green (#10b981) whenever hasAnyContent is true — including when there are only external papers and no attested internal evidence:

stroke: hasAnyContent ? "#10b981" : "#6b7280",

Green edges visually signal "validated by attested evidence" in this system. Using the same color for "has reference papers only" blurs that distinction. Consider using the blue accent (#3b82f6) for external-only edges to match the blue UI treatment already applied in the dialog and edge buttons.
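The suggested three-tier logic could be factored into a small helper. The hex values are the ones cited in this review; the function name and parameters are hypothetical:

```typescript
// Hypothetical helper for the suggested three-tier edge colouring:
// attested internal evidence wins over external-only papers.
function edgeStroke(hasAttestedEvidence: boolean, hasExternalPapers: boolean): string {
  if (hasAttestedEvidence) return "#10b981"; // green: attested evidence
  if (hasExternalPapers) return "#3b82f6";   // blue: external reference papers only
  return "#6b7280";                          // gray: no supporting content
}
```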

5. Comments say "paper-search-mcp" but implementation is direct REST

Multiple doc comments reference paper-search-mcp:

  • app/api/evidence/search/route.ts JSDoc: "via paper-search-mcp"
  • mastra/workflows/logic-model-with-evidence.ts: "External academic paper search … (paper-search-mcp)"

The implementation calls the Semantic Scholar REST API directly — no MCP is involved. Update these to say "Semantic Scholar API".


Minor Issues

6. EXTERNAL_SEARCH_ENABLED missing from .env.example

lib/constants.ts reads process.env.EXTERNAL_SEARCH_ENABLED but it isn't documented in .env.example. Add an entry alongside SEMANTIC_SCHOLAR_API_KEY:

# Set to "false" to disable external academic paper search (Semantic Scholar). Defaults to enabled.
EXTERNAL_SEARCH_ENABLED=""

7. as any[] cast in react-flow-utils.ts

arrow.externalPapers = edge.data.externalPapers as any[];

This is consistent with the evidenceMetadata cast above it, but it would be cleaner to cast to ExternalPaper[] now that the type is available in the codebase.

8. Two buttons, one dialog

In EvidenceEdge.tsx, both the green (internal) and blue (external) buttons call setDialogOpen(true) on the same dialog. This is fine functionally, but may puzzle users who assume the two buttons open different views. A title tooltip like "View evidence and academic papers" on both, or consolidating into one button when both types are present, would reduce confusion.


Positive Notes

  • Graceful degradation throughout (failures fall back silently, external results are optional fields) is well-done.
  • The in-memory cache with TTL and max-size eviction in external-paper-search.ts is a sensible addition — just be aware it resets on server restarts / cold starts in serverless deployments.
  • Abstract truncation to 500 chars in normalizePaper is a reasonable choice to keep payloads lean.
  • Zod schema additions in types/index.ts are clean and well-placed.

…ading

- Always return externalPapers as array (not undefined) when includeExternalPapers is true
- Show "Academic Papers (Reference)" heading even when only external papers exist (no internal evidence)
…lSettled

Replace sequential for...of loop with Promise.allSettled to search all arrows
concurrently. Each arrow's LLM keyword extraction + HTTP request is independent,
so parallelization significantly reduces total latency for logic models with
multiple edges.
…olar integration

- Replace `as any[]` casts with proper `EvidenceMatch[]` / `ExternalPaper[]` types
- Differentiate edge colors: green (attested evidence), blue (external papers only), gray (none)
- Consolidate dual buttons into single button when both evidence types exist on an edge
- Update all "paper-search-mcp" references to "Semantic Scholar API"
- Change EXTERNAL_SEARCH_ENABLED default to opt-in (=== "true") for safer deployments
- Add EXTERNAL_SEARCH_ENABLED to .env.example
- Add ellipsis to truncated abstracts, remove no-op `?? undefined` expressions
- Fix word-boundary truncation in search query builder
- Remove dead code in searchExternalPapersForEdge (unused buildSearchQuery call)

claude bot commented Feb 26, 2026

PR Review: Feat/semantic scholar

Good overall structure for integrating external paper search. The parallel execution design (Promise.allSettled) and graceful degradation approach are solid. Below are the issues I found, from most critical to minor.


🐛 Bugs

1. Incorrect JSDoc comment on searchExternalPapersForEdge (lib/external-paper-search.ts)

The function is documented as "Deterministic (no LLM calls)" but it explicitly calls extractSearchKeywords, which invokes Gemini:

// Extract English academic keywords via LLM (falls back to raw titles on failure)
const query = await extractSearchKeywords(fromTitle, toTitle);

Only searchExternalPapers (the API path) is LLM-free. Fix the JSDoc before this confuses future maintainers.

2. Query truncation bug in searchExternalPapers (lib/external-paper-search.ts)

const searchQuery =
  trimmed.length > 200 ? trimmed.slice(0, trimmed.lastIndexOf(" ", 200) || 200) : trimmed;

lastIndexOf(" ", 200) returns -1 when there is no space in the first 200 characters. In JavaScript, -1 is truthy, so -1 || 200 evaluates to -1 — and slice(0, -1) silently drops the last character of the string rather than truncating to 200 chars. This won't matter for normal prose queries, but it's a latent correctness bug. Replace with:

const lastSpace = trimmed.lastIndexOf(" ", 200);
const searchQuery = trimmed.length > 200
  ? trimmed.slice(0, lastSpace > 0 ? lastSpace : 200)
  : trimmed;

3. LLM-generated keywords defeat the cache for the workflow path (lib/external-paper-search.ts)

searchExternalPapersForEdge calls extractSearchKeywords (Gemini) to build a search string, then uses that string as the cache key. Because LLM output is non-deterministic, repeated calls for the same edge may produce different keywords, causing cache misses every time. The LLM call overhead also negates the purpose of caching.

Consider using the deterministic buildSearchQuery(fromContent, toContent) result as the cache key (and optionally as a cache key prefix), reserving the LLM expansion for the actual API query while keying on the stable input.
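A sketch of the stable-key approach, with the LLM and HTTP calls injected so the shape is testable — all names here are illustrative, not the PR's actual signatures:

```typescript
// Cache keyed on the deterministic title pair, so repeated searches for the
// same edge skip both the LLM call and the HTTP request.
const cache = new Map<string, string[]>();

async function searchWithStableKey(
  fromTitle: string,
  toTitle: string,
  extract: (a: string, b: string) => Promise<string>, // LLM keyword expansion
  search: (query: string) => Promise<string[]>        // Semantic Scholar call
): Promise<string[]> {
  const stableKey = `${fromTitle}|${toTitle}`.toLowerCase();
  const hit = cache.get(stableKey);
  if (hit) return hit; // deterministic key → reliable hits on repeat calls
  const query = await extract(fromTitle, toTitle); // non-deterministic, miss-only
  const papers = await search(query);
  cache.set(stableKey, papers);
  return papers;
}
```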


⚠️ Correctness / Logic Issues

4. In-memory cache won't survive serverless restarts (lib/external-paper-search.ts)

The searchCache Map is module-level. On Vercel (or any serverless deployment), each cold start creates a fresh instance, so the claimed 24-hour TTL is misleading — the cache lifetime is actually bounded by the function instance lifetime. The EXTERNAL_CACHE_TTL_MS constant and its comments suggest persistent caching. Either document this limitation clearly, or note in lib/constants.ts that the TTL is only meaningful for long-lived server processes.

5. Cache eviction is FIFO, not LRU (lib/external-paper-search.ts)

const oldestKey = searchCache.keys().next().value; // removes insertion-order oldest

The code is commented as "LRU-style eviction" but this removes the oldest-inserted entry, not the least-recently-used one. Either rename it to "FIFO eviction" in the comment or implement actual LRU (e.g., delete and re-insert on cache hit to maintain recency order in the Map).
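The delete-and-re-insert trick on a Map gives true LRU without extra structures. A minimal sketch, not the PR's code:

```typescript
// Minimal LRU on top of Map insertion order: re-inserting on every hit
// moves the key to the "most recent" end, so Map.keys() yields the
// least-recently-used key first when evicting.
class LruCache<K, V> {
  private map = new Map<K, V>();
  constructor(private maxSize: number) {}

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    this.map.delete(key);     // refresh recency:
    this.map.set(key, value); // move to most-recent position
    return value;
  }

  set(key: K, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxSize) {
      this.map.delete(this.map.keys().next().value as K); // evict LRU entry
    }
  }
}
```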


🔒 Security / Configuration

6. .env.example enables external search by default

EXTERNAL_SEARCH_ENABLED="true"

The inline comment says "Defaults to disabled", but anyone who copies .env.example verbatim gets the feature enabled. The example should use "false" (or leave the value empty) to match the documented default.


🧹 Code Quality

7. buildSearchQuery is dead code (lib/external-paper-search.ts)

buildSearchQuery is exported but never called — searchExternalPapersForEdge duplicates the same logic inline. Either delete buildSearchQuery and use the inline version, or use buildSearchQuery inside searchExternalPapersForEdge to avoid the duplication.

8. LLM call per under-matched edge may be expensive in the workflow (mastra/workflows/logic-model-with-evidence.ts)

Step 2.5 fires extractSearchKeywords (a Gemini call) for every edge with fewer than MIN_INTERNAL_MATCHES_BEFORE_EXTERNAL internal matches, all in parallel via Promise.allSettled. For a canvas with many arrows, this could mean a large number of simultaneous LLM requests. Consider adding a concurrency cap (e.g., process in batches of N) to avoid rate-limit errors and runaway costs.
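A dependency-free concurrency cap can be built from a shared worker pool — a minimal stand-in for p-limit, with hypothetical names:

```typescript
// Run `fn` over `items` with at most `limit` promises in flight at once.
// Workers pull the next index from a shared cursor (safe in single-threaded JS).
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]); // preserves input order in the output
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
  return results;
}
```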


📝 Minor / Nits

  • semantic-scholar.ts defines a module-local BASE_URL constant that shadows the project-wide BASE_URL from @/lib/constants in other files. The naming collision won't cause a runtime error (different modules), but consider renaming to SEMANTIC_SCHOLAR_BASE_URL for clarity.
  • The normalizePaper function appends "..." after 500-char abstract truncation — good. Just noting it is correct.
  • The parallel search in route.ts (Promise.allSettled) with graceful degradation on external failure is a clean pattern — well done.
  • UI colour coding (green = attested evidence, blue = external papers) is a thoughtful affordance for distinguishing source trust levels.

Summary: The core design is sound. The most actionable items before merge are the JSDoc bug (#1), the query-truncation edge case (#2), the cache-key non-determinism (#3), and the .env.example default (#6). The rest are improvement suggestions.

…orrect JSDoc

- Use deterministic buildSearchQuery output as cache key for edge-based searches
  instead of LLM-generated keywords, ensuring cache hits for repeated edge queries
- Fix lastIndexOf returning -1 (truthy) causing slice(0, -1) instead of slice(0, 200)
- Correct JSDoc: searchExternalPapersForEdge uses LLM (Gemini), not deterministic
- Make buildSearchQuery private (no external callers) and reuse within the module
- Document cache limitations: in-memory resets on serverless cold starts, FIFO eviction
The example file previously set EXTERNAL_SEARCH_ENABLED="true" which contradicts
the documented default of disabled. Use empty string so copying .env.example
verbatim keeps external search disabled until explicitly enabled.

claude bot commented Feb 26, 2026

PR Review: Feat/semantic-scholar (#194)

This is a well-structured feature that adds Semantic Scholar integration for supplementing internal evidence matching. The architecture is clean, the degradation is graceful, and the opt-in feature flag approach is the right call for production safety. A few issues worth addressing before merge:


Bugs / Correctness

1. Missing GOOGLE_GENERATIVE_AI_API_KEY in .env.example

lib/academic-apis/extract-search-keywords.ts uses @ai-sdk/google which requires GOOGLE_GENERATIVE_AI_API_KEY to be set. This env var is not in .env.example, so it will silently fall back to raw titles for every keyword extraction call — making the LLM-based extraction effectively dead code in most deployments.

Suggested addition to .env.example:

# Required for LLM-based keyword extraction (extract-search-keywords.ts)
GOOGLE_GENERATIVE_AI_API_KEY=""

2. Double cache writes in searchExternalPapersForEdge

lib/external-paper-search.ts — after a cache miss on stableKey, the function calls searchExternalPapersCore(query, ...) which internally calls setCachedResult(query, limited) (keyed by the LLM query string), and then the outer function also calls setCachedResult(stableKey, papers). This writes two entries for every edge miss, consuming 2 of 500 cache slots per unique edge. Not a correctness bug but worth cleaning up — either remove the outer setCachedResult and let searchExternalPapersCore be the sole cache manager, or remove the inner one.

3. lastSpace > 0 edge case in searchExternalPapers

const lastSpace = trimmed.lastIndexOf(" ", 200);
const searchQuery = trimmed.length > 200 ? trimmed.slice(0, lastSpace > 0 ? lastSpace : 200) : trimmed;

lastSpace !== -1 is the semantically correct check. lastSpace > 0 would incorrectly fall through if a space happened to be at index 0 (extremely rare post-trim, but the intent is to handle the no-space-found (-1) case).


Security

4. Unvalidated paper.url used directly as href

components/canvas/EvidenceDialog.tsx:

href={paper.doi ? `https://doi.org/${paper.doi}` : paper.url}

paper.doi is always prefixed with https://doi.org/ so it is safe. But paper.url comes directly from the Semantic Scholar API and is used as a bare href. Consider adding a guard in normalizePaper() in semantic-scholar.ts:

url: raw.url?.startsWith("https://") ? raw.url : undefined,

Performance

5. No concurrency limit on parallel Semantic Scholar requests

mastra/workflows/logic-model-with-evidence.ts fires one Semantic Scholar HTTP request per edge simultaneously via Promise.allSettled. Without an API key, Semantic Scholar caps at roughly 100 requests/5 min. A logic model with many edges could exhaust this quickly. Consider capping concurrency (e.g., p-limit) or staggering unauthenticated requests.

6. gemini-2.5-flash is hardcoded in extract-search-keywords.ts

The project has a configurable MODEL env var but this file always uses Gemini regardless. This is probably intentional (needs a multilingual-capable model), but users without Gemini API access get silent fallback with no indication the extraction is failing. Worth adding a note to .env.example or the docs.


Code Quality

7. ExternalPaperSchema.source would benefit from an enum

Currently source: z.string() — with only one integration at present, using z.enum(["semantic_scholar"]) would improve type safety and document valid values for future integrations.
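With zod, `z.enum(["semantic_scholar"])` expresses this directly. The same constraint can also be sketched dependency-free as a literal union plus a runtime guard — names here are illustrative:

```typescript
// Single source of truth for valid providers; extend the array as new
// integrations are added and the union type follows automatically.
const EXTERNAL_PAPER_SOURCES = ["semantic_scholar"] as const;
type ExternalPaperSource = (typeof EXTERNAL_PAPER_SOURCES)[number];

// Runtime guard mirroring what the zod enum would validate.
function isExternalPaperSource(value: string): value is ExternalPaperSource {
  return (EXTERNAL_PAPER_SOURCES as readonly string[]).includes(value);
}
```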

8. EvidenceDialog description conflates trust levels

The new description "X items linked to this connection" groups blockchain-attested evidence and unvetted external papers into one count. Since these have meaningfully different epistemic weight, showing them separately would be clearer for users:

2 attested evidence items · 3 academic papers (reference)
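The split description could be produced by a small formatter. The wording and " · " separator follow the suggested copy above; the function name is hypothetical:

```typescript
// Hypothetical formatter: describe the two evidence types separately,
// omitting any type with a zero count and pluralizing correctly.
function describeEvidence(attestedCount: number, paperCount: number): string {
  const parts: string[] = [];
  if (attestedCount > 0) {
    parts.push(`${attestedCount} attested evidence item${attestedCount === 1 ? "" : "s"}`);
  }
  if (paperCount > 0) {
    parts.push(`${paperCount} academic paper${paperCount === 1 ? "" : "s"} (reference)`);
  }
  return parts.join(" · ");
}
```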

Positives

  • Graceful degradation: external search failure returns an empty array and logs a warning rather than breaking the API response.
  • Opt-in via EXTERNAL_SEARCH_ENABLED === "true" is the right default. .env.example correctly uses empty string.
  • Promise.allSettled for parallelization across edges is well-applied.
  • Type safety improvements replacing as any[] with proper typed casts throughout.
  • Edge color differentiation (green = attested, blue = external only, gray = none) is intuitive.
  • In-memory cache cold-start caveat is documented clearly in JSDoc.

Summary

The main pre-merge fix is adding GOOGLE_GENERATIVE_AI_API_KEY to .env.example — without it the LLM keyword extraction silently falls back on every call and the feature does not work as advertised. The double-cache and rate-limiting issues are worth addressing before production. Overall the architecture and feature flag approach are solid.

Add Step 2.5 (external academic paper search) to workflow diagrams and
descriptions. Document three-tier edge color system (green/blue/gray),
ExternalPaper type, and new file references for academic API clients.
Add attested evidence vs external papers comparison table, update UI
presentation to three-tier color system, and add external search file
references.
Document three-tier color logic (green/blue/gray), FileText/BookOpen
button differentiation, two-section evidence dialog layout, and
updated code examples matching current implementation.
Mirror all English documentation changes in Japanese: add Step 2.5
workflow diagrams, external paper search description, three-tier color
system, two-section dialog, and updated frontend component details.
Add Semantic Scholar API to technology stack in CLAUDE.md and mention
optional external paper search in README.md features section.

claude bot commented Feb 26, 2026

PR Review: Feat/Semantic Scholar Integration

Great feature addition! The Semantic Scholar integration is well-structured and adds meaningful academic context for under-evidenced edges. The code is clean and architecture decisions are sound. Here is my detailed feedback:


Strengths

  • Fault isolation: Using Promise.allSettled for parallel edge searches prevents one failure from blocking others.
  • Clear separation of concerns: The split across extract-search-keywords.ts, semantic-scholar.ts, and external-paper-search.ts is easy to follow.
  • Feature flag design: Defaulting EXTERNAL_SEARCH_ENABLED to false makes the feature opt-in without breaking existing deployments.
  • Graceful degradation: All failure paths return empty arrays rather than propagating errors into the canvas render.
  • Two-key caching strategy: Caching under both the stable title-derived key (before LLM) and the LLM-extracted query key (after) avoids redundant LLM calls for repeated edge searches.
  • Type imports: Consistent use of import type for type-only imports throughout.
  • Improved typing: Replacing as any[] with as EvidenceMatch[] in react-flow-utils.ts is a good cleanup.

Must Fix

Missing Gemini API key documentation in .env.example

The new @ai-sdk/google package requires a GOOGLE_GENERATIVE_AI_API_KEY (or GEMINI_API_KEY) environment variable. Without it, setting EXTERNAL_SEARCH_ENABLED=true will silently fall back to raw titles for every keyword extraction run, but developers will not know why.

Suggested addition to .env.example:

# Google Gemini API key (required for keyword extraction when EXTERNAL_SEARCH_ENABLED=true)
GOOGLE_GENERATIVE_AI_API_KEY=""

Should Fix

XSS risk: unvalidated paper.url used as href (components/canvas/EvidenceDialog.tsx)

paper.url from the Semantic Scholar API is passed directly into an anchor href with no URL scheme validation. While Semantic Scholar URLs are almost always https://, there is no enforcement. A malformed or injected javascript: URL would be a stored XSS vector once canvas data is persisted.

Option A — validate at render time:

href={
  paper.doi
    ? `https://doi.org/${paper.doi}`
    : paper.url?.match(/^https?:\/\//)
      ? paper.url
      : undefined
}

Option B — validate in normalizePaper() inside semantic-scholar.ts so invalid URLs are stripped at the source before reaching the type system.
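Option B could be a one-line guard applied inside normalizePaper; `sanitizeUrl` is a hypothetical helper name:

```typescript
// Hypothetical Option B guard: keep only http(s) URLs so javascript: and
// other schemes are stripped at the source, before reaching the UI.
function sanitizeUrl(url: string | undefined): string | undefined {
  return url && /^https?:\/\//.test(url) ? url : undefined;
}
```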


Suggestions

1. Mixed-language comment in English docs (docs/mastra-agents.md)

A Japanese comment was accidentally left in the English Mermaid diagram (around line 952 of the diff):

Note over Canvas: evidenceIds付きの矢印 → type="evidence", Green (#10b981)

The Japanese fragment (roughly "arrows with evidenceIds attached") should be English in docs/mastra-agents.md. DEV.ja.md is fine as-is.


2. Step numbering: "Step 2.5" is unconventional

Renaming to a clean 4-step sequence would improve log readability:

  • Step 1: Generate Logic Model Structure
  • Step 2: Batch Evidence Search
  • Step 2.5 → Step 3: External Paper Search
  • Step 3 → Step 4: Enrich Canvas

3. Server-side abstract truncation is redundant (lib/academic-apis/semantic-scholar.ts)

The UI already applies line-clamp-3 and the dialog is scrollable. The 500-char server-side truncation adds complexity without clear user benefit. If it is intended as a payload size guard, that rationale is worth a comment; otherwise consider removing it.


4. Document the intentional double-write to cache (lib/external-paper-search.ts)

In searchExternalPapersForEdge, results are stored to cache twice per miss: once inside searchExternalPapersCore (keyed by LLM-extracted keywords) and once after (keyed by the stable raw-title key). This is intentional to skip the LLM on repeated calls for the same edge, but a short inline comment would save future readers from confusion.


5. In-memory cache lifespan on serverless (awareness note)

The existing comment correctly notes the cache resets on cold starts. The 24h TTL is effectively meaningless on Vercel/Lambda deployments. Worth noting as a known limitation — a shared cache (Redis/Upstash) would make the TTL durable at production scale.


Summary

| Category | Count |
| --- | --- |
| Must fix | 1 |
| Should fix | 1 |
| Suggestions | 5 |

The core logic is solid. The two items to address before merging are the missing GOOGLE_GENERATIVE_AI_API_KEY env var documentation and the unvalidated external URL in the dialog. Everything else is polish.



Development

Successfully merging this pull request may close these issues.

IDEA: add search from external source feature

1 participant