feat: Add research-skill (start/check two-stage mode) to prevent timeout#126

Closed
yazelin wants to merge 1 commit into main from openspec/add-research-skill
Conversation


@yazelin yazelin commented Feb 25, 2026

Summary

Implement research-skill with a background-job mechanism to handle long-running external research tasks without hitting the 480s session timeout.

Problem

  • External research tasks (search + fetch + synthesize multiple sources) often exceed 480s session timeout
  • Leads to partial tool completions but overall response failure
  • User has no way to retrieve partial results or check progress later

Solution

  • New skill: research-skill with two scripts:
    • start-research: Async launch, returns job_id immediately
    • check-research: Query progress, partial results, or final summary
  • Job mechanism: Background process with status.json tracking
  • Two-stage interaction: Start → receive job_id → check progress → get results
  • Fallback paths: Maintained via script-first routing with optional MCP fallback
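The start → job_id → check loop above can be sketched against a shared status.json file. The function names and status fields below are illustrative placeholders, not the PR's actual scripts:

```python
import json
import tempfile
import uuid
from pathlib import Path

def start_research(jobs_dir: Path, query: str) -> str:
    """Stage 1: register the job and return a job_id immediately.

    In the real skill, a background worker would take over from here;
    this sketch only writes the initial status file.
    """
    job_id = uuid.uuid4().hex[:8]
    job_dir = jobs_dir / job_id
    job_dir.mkdir(parents=True)
    status = {"job_id": job_id, "status": "starting", "progress": 0, "query": query}
    (job_dir / "status.json").write_text(json.dumps(status), encoding="utf-8")
    return job_id

def check_research(jobs_dir: Path, job_id: str) -> dict:
    """Stage 2: read back whatever the background worker has recorded so far."""
    return json.loads((jobs_dir / job_id / "status.json").read_text(encoding="utf-8"))

jobs = Path(tempfile.mkdtemp())
jid = start_research(jobs, "example query")
print(check_research(jobs, jid)["status"])  # → starting
```

Because all state lives in the file, the check can happen in a completely separate session from the start, which is what lets the flow survive a session timeout.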

Changes

  • backend/src/ching_tech_os/skills/research-skill/ (new)
    • SKILL.md: Capability definition & script usage
    • scripts/start-research.py: Async job launcher
    • scripts/check-research.py: Status/result querier
  • backend/src/ching_tech_os/services/
    • bot/agents.py: Updated usage tips for research-skill routing
    • linebot_ai.py: Added _extract_research_tool_feedback() for two-stage reply handling
  • openspec/specs/ (synced from delta)
    • bot-platform/spec.md: Added 2 new requirements
    • line-bot/spec.md: Added 2 new requirements
    • research-skill/spec.md: New capability (3 requirements)
  • Tests: Updated test_skills_manager.py, test_linebot_ai_service.py for new skill

Testing

  • ✅ npm run build
  • ✅ Python compile checks
  • ✅ 41 backend tests pass (skills, routing, linebot)
  • ✅ Manual smoke tests: start → check flow works

Notes

  • Script-first routing (if enabled) will suppress overlapping MCP tools
  • Fallback to WebSearch/WebFetch available if script fails
  • No sleep loops in Claude session (prevents nested timeout)
  • Stale job detection: Tasks inactive >20min auto-marked failed

…etch/WebSearch timeout

## Summary
Implement research-skill with background job mechanism to handle long-running external research tasks (search + fetch + synthesize) without blocking 480s session timeout.

## Key Changes
- **New skill**: research-skill/start-research (async launch) & check-research (progress/result)
- **Bot routing**: Update agents.py to prioritize run_skill_script for research tasks
- **LineBot integration**:
  - _extract_research_tool_feedback() handles start/check responses
  - Start always returns job_id to user
  - Check returns progress/completion/failure with summaries
- **Main specs sync**: Updated bot-platform/line-bot, added research-skill capability
- **Tests**: Verify skill loading, routing, and linebot AI response handling

## Motivation
- Prevents timeout when multiple WebFetch calls exceed 480s session limit
- Enables user to query progress and retrieve partial results later
- Maintains fallback paths (script failure → MCP tool)

## Implementation Details
- start-research: Fork background process, return job_id immediately
- check-research: Query status.json, handle stale jobs (>20min → failed)
- JSON status files per job: status/progress/sources/final_summary
- No sleep loops in Claude session (avoid nested timeout)
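Based on the fields listed above (status/progress/sources/final_summary), one job's status.json plausibly looks like the following; the exact keys and values in the PR may differ:

```json
{
  "job_id": "a1b2c3d4",
  "status": "fetching",
  "progress": 55,
  "query": "example topic",
  "sources": [{"title": "Example", "url": "https://example.com", "snippet": ""}],
  "partial_results": [],
  "final_summary": "",
  "error": null
}
```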

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@gemini-code-assist
Summary of Changes

Hello @yazelin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses the issue of long-running external research tasks causing session timeouts by introducing an asynchronous research-skill. This new capability allows the system to initiate research in the background, provide immediate feedback to the user with a job ID, and enable subsequent checks for progress and results. This significantly improves user experience by preventing failures due to timeouts and offering transparency into the research process, ensuring that partial or final results are always retrievable.

Highlights

  • New Research Skill: Introduced a new research-skill with start-research and check-research scripts to handle long-running external research tasks asynchronously, preventing session timeouts.
  • Two-Stage Interaction: Implemented a two-stage interaction model where start-research immediately returns a job_id, and check-research is used to query progress, partial results, or the final summary.
  • Bot Routing and Feedback: Updated bot routing to prioritize the new research-skill for external research queries and added logic to the Linebot AI service to extract and format feedback from research-skill tool calls, ensuring proper user communication.
  • Persistent Job Status: Designed a background job mechanism that tracks research task status using status.json files, allowing for persistent state and progress updates even if the initial session times out.
  • Comprehensive Specification and Testing: Added detailed specification documents for the new research-skill and updated existing bot platform and Line Bot specifications, alongside new and updated backend tests to cover the new functionality.


Changelog
  • backend/src/ching_tech_os/services/bot/agents.py
    • Added new usage tips for the research-skill to guide AI agents on how to initiate and check research tasks.
  • backend/src/ching_tech_os/services/linebot_ai.py
    • Implemented _parse_json_object to safely parse JSON strings.
    • Added _extract_research_tool_feedback to process outputs from research-skill tool calls and generate appropriate user messages.
    • Integrated _extract_research_tool_feedback into process_message_with_ai to handle responses for research-skill.
  • backend/src/ching_tech_os/skills/research-skill/SKILL.md
    • Added documentation for the new research-skill, detailing its purpose, available scripts (start-research, check-research), parameters, typical workflow, and AI behavior guidelines.
  • backend/src/ching_tech_os/skills/research-skill/scripts/check-research.py
    • Added a new Python script to query the status and results of an ongoing or completed research job.
    • Implemented logic to parse input, find the job's status file, and return structured data including status, progress, partial results, and final summary.
    • Included functionality to mark stale jobs as failed if they haven't updated for a specified duration.
  • backend/src/ching_tech_os/skills/research-skill/scripts/start-research.py
    • Added a new Python script to asynchronously start a research task, performing search, fetching, and synthesizing information.
    • Implemented input validation for query and URLs, and parameter clamping for max_results and max_fetch.
    • Designed a background process using os.fork() to execute the research, immediately returning a job_id to the caller.
    • Included functions for searching (DuckDuckGo), fetching content, stripping HTML, building summaries, and writing results to markdown files.
  • backend/tests/test_linebot_ai_service.py
    • Imported the json module for testing JSON parsing.
    • Added tests for _extract_research_tool_feedback to verify correct parsing and message generation for start-research and check-research outputs.
    • Included a test case for process_message_with_ai to ensure that the job_id from start-research is correctly appended to the AI's response.
  • backend/tests/test_skills_manager.py
    • Updated the test test_native_base_file_manager_script_first to include the new research-skill.
    • Asserted that research-skill is recognized by the skill manager, has mcp__ching-tech-os__run_skill_script as an allowed tool, requires the file-manager app, and has its scripts (start-research, check-research) discoverable.
  • openspec/changes/archive/2026-02-25-add-research-skill/.openspec.yaml
    • Added metadata for the new specification archive.
  • openspec/changes/archive/2026-02-25-add-research-skill/design.md
    • Added a design document outlining the context, goals, decisions (script-first research-skill, file-based persistence, background execution, structured results, prioritized bot routing), risks, and migration plan for the research-skill.
  • openspec/changes/archive/2026-02-25-add-research-skill/proposal.md
    • Added a proposal document detailing the 'why' and 'what changes' for the research-skill, including new and modified capabilities and impact on various backend services.
  • openspec/changes/archive/2026-02-25-add-research-skill/specs/bot-platform/spec.md
    • Added new requirements for the bot platform regarding prioritized script-first routing for external research tasks and traceability of research job results.
  • openspec/changes/archive/2026-02-25-add-research-skill/specs/line-bot/spec.md
    • Added new requirements for the Line Bot, specifying two-stage interaction for long-running research tasks and detailed handling of research progress queries.
  • openspec/changes/archive/2026-02-25-add-research-skill/specs/research-skill/spec.md
    • Added a new specification document for the research-skill, defining requirements for asynchronous task initiation, progress/result querying, and status tracking with diagnostic information.
  • openspec/changes/archive/2026-02-25-add-research-skill/tasks.md
    • Added a task list detailing the implementation steps for the research-skill, from skeleton creation to integration and validation.
  • openspec/specs/bot-platform/spec.md
    • Appended new requirements for external research task routing and result traceability to the bot platform specification.
  • openspec/specs/line-bot/spec.md
    • Appended new requirements for two-stage interaction and progress query responses for long-running research tasks to the Line Bot specification.
  • openspec/specs/research-skill/spec.md
    • Added a new specification document for the research-skill, detailing its purpose and requirements for asynchronous task initiation, progress/result querying, and status tracking with diagnostic information.
Activity
  • The author, yazelin, has performed several testing steps including npm run build, Python compile checks, verified 41 backend tests, and conducted manual smoke tests for the start-check flow.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This PR introduces a new research-skill that handles long-running research tasks asynchronously, using a start/check two-stage mechanism and background jobs to avoid session timeouts. While this is a solid feature enhancement, the review found two high-severity security vulnerabilities: a Server-Side Request Forgery (SSRF) in the research script and a Malicious Message Injection vulnerability in the bot's response handling, both stemming from insufficient validation and sanitization of data fetched from external websites. In addition, the review found a bug in a regular expression and recommends refactoring some overly long functions to improve maintainability and clarity.

```python
    }

    try:
        response = client.get(url)
```


Severity: High (security)

The start-research script is vulnerable to Server-Side Request Forgery (SSRF). The urls parameter is used to fetch content from external websites, but the validation in _normalize_url only checks for the protocol scheme and the presence of a network location. It does not prevent access to internal IP addresses or hostnames (e.g., 127.0.0.1, 169.254.169.254, localhost). An attacker could exploit this to scan the internal network or access sensitive metadata services on the server.

Recommendation: Implement a strict validation check to ensure that the provided URLs do not resolve to private or reserved IP ranges. You can use a library like ipaddress to check the resolved IP of the hostname before making the request.
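The recommended check can be sketched with the stdlib `ipaddress` and `socket` modules. This is an illustrative validator, not the PR's code, and it does not defend against DNS rebinding (where the host re-resolves between the check and the fetch):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Reject URLs whose host resolves to a private, loopback,
    link-local, or reserved address (SSRF guard sketch)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

Calling this before `client.get(url)` would block the 127.0.0.1 / 169.254.169.254 / localhost cases the review names; a production guard should also pin the resolved IP for the actual request.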

Comment on lines +812 to +825
```python
if research_feedback:
    script_name = research_feedback.get("script")
    feedback_text = str(research_feedback.get("message") or "").strip()
    job_id = str(research_feedback.get("job_id") or "").strip()

    # start-research: make sure the reply always contains the job_id
    if script_name == "start-research" and feedback_text:
        if not ai_response:
            ai_response = feedback_text
        elif job_id and job_id not in ai_response:
            ai_response = ai_response.rstrip() + "\n\n" + feedback_text
    # check-research: if the AI produced no text, use the tool summary directly
    elif feedback_text and not str(ai_response or "").strip():
        ai_response = feedback_text
```


Severity: High (security)

The bot is vulnerable to Malicious Message Injection. The tool output from the research skill (specifically the final_summary and partial_results which contain content fetched from external websites) is directly incorporated into the ai_response without sanitization. Since the bot's message parser (parse_ai_response) extracts and executes special tags like [FILE_MESSAGE:...] to send files or images, an attacker can host a malicious website containing these tags. When the research skill fetches the site, the tags will be included in the bot's response and interpreted as internal instructions, allowing the attacker to spoof bot messages or trick the bot into sending arbitrary links/images.

Recommendation: Sanitize the tool output before including it in the ai_response. You should escape or remove any patterns that match the bot's internal command tags (like [FILE_MESSAGE:...]) from the text fetched from external sources.
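The recommended sanitization can be sketched as a regex strip over the fetched text. `[FILE_MESSAGE:...]` is the tag format named in the review; any other internal command tags the bot parses would need their own patterns:

```python
import re

# Strip the bot's internal command tags from externally fetched text
# before it is echoed back to the user. Illustrative sketch only.
_COMMAND_TAG_RE = re.compile(r"\[(?:FILE_MESSAGE):[^\]]*\]")

def sanitize_tool_output(text: str) -> str:
    """Remove command tags so fetched web content cannot be
    interpreted as instructions by the bot's message parser."""
    return _COMMAND_TAG_RE.sub("", text)
```

Applying this to `final_summary` and each `partial_results` snippet before they reach `ai_response` closes the injection path the review describes.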


```python
_SCRIPT_STYLE_RE = re.compile(r"(?is)<(script|style|noscript).*?>.*?</\\1>")
_TAG_RE = re.compile(r"(?is)<[^>]+>")
_WHITESPACE_RE = re.compile(r"\\s+")
```


Severity: High

The regular expression `_WHITESPACE_RE` is defined as `re.compile(r"\\s+")`. In a raw string, `\\s+` matches a literal backslash followed by `s`, not whitespace characters. This is almost certainly a bug and will prevent the whitespace-collapsing step in `_strip_html` from working.

Suggested change

```diff
-_WHITESPACE_RE = re.compile(r"\\s+")
+_WHITESPACE_RE = re.compile(r"\s+")
```
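A quick check confirms the difference: the buggy pattern never matches ordinary whitespace, only a literal backslash-`s` sequence:

```python
import re

buggy = re.compile(r"\\s+")  # matches a literal backslash followed by "s"
fixed = re.compile(r"\s+")   # matches runs of whitespace

print(buggy.sub(" ", "a   b"))  # → "a   b" (unchanged)
print(fixed.sub(" ", "a   b"))  # → "a b"
```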

Comment on lines +291 to +408
```python
def _extract_research_tool_feedback(tool_calls: list) -> dict | None:
    """Build a reply message from run_skill_script (research-skill) tool output."""
    for tool_call in reversed(tool_calls or []):
        tool_name = getattr(tool_call, "name", "")
        if tool_name != "mcp__ching-tech-os__run_skill_script":
            continue

        tool_input = getattr(tool_call, "input", {}) or {}
        if tool_input.get("skill") != "research-skill":
            continue

        script_name = str(tool_input.get("script") or "")
        raw_output = str(getattr(tool_call, "output", "") or "")
        wrapper = _parse_json_object(raw_output)
        if not wrapper:
            continue

        payload = wrapper
        nested_output = wrapper.get("output")
        if isinstance(nested_output, str):
            parsed_nested = _parse_json_object(nested_output)
            if parsed_nested:
                payload = parsed_nested

        # start-research: make sure the job_id always reaches the user
        if script_name == "start-research":
            if payload.get("success") is True:
                job_id = str(payload.get("job_id") or "").strip()
                if job_id:
                    message = (
                        f"✅ 研究任務已受理(job_id: {job_id})。\n"
                        f"請稍後提供 job_id 或輸入「查詢研究進度 {job_id}」,我會幫你查最新狀態。"
                    )
                else:
                    message = "✅ 研究任務已受理,請稍後再查詢進度。"
                return {
                    "script": script_name,
                    "job_id": job_id,
                    "message": message,
                }

            error = payload.get("error") or wrapper.get("error") or "未知錯誤"
            return {
                "script": script_name,
                "job_id": "",
                "message": f"⚠️ 研究任務啟動失敗:{error}",
            }

        # check-research: reply with progress / completion / failure by status
        if script_name == "check-research":
            if payload.get("success") is False:
                error = payload.get("error") or wrapper.get("error") or "未知錯誤"
                return {
                    "script": script_name,
                    "job_id": str(payload.get("job_id") or ""),
                    "message": f"⚠️ 查詢研究進度失敗:{error}",
                }

            status = str(payload.get("status") or "").strip()
            job_id = str(payload.get("job_id") or "").strip()

            if status == "completed":
                summary = str(payload.get("final_summary") or "").strip() or "✅ 研究任務已完成。"
                sources = payload.get("sources") or []
                source_lines = []
                for source in sources[:5]:
                    if not isinstance(source, dict):
                        continue
                    title = str(source.get("title") or source.get("url") or "來源")
                    url = str(source.get("url") or "").strip()
                    source_lines.append(f"- {title}" + (f"({url})" if url else ""))

                message = summary
                if source_lines:
                    message += "\n\n參考來源:\n" + "\n".join(source_lines)
                return {
                    "script": script_name,
                    "job_id": job_id,
                    "message": message,
                }

            if status == "failed":
                error = payload.get("error") or "研究任務失敗"
                return {
                    "script": script_name,
                    "job_id": job_id,
                    "message": f"⚠️ 研究任務失敗:{error}",
                }

            status_label = str(payload.get("status_label") or status or "進行中")
            progress = payload.get("progress")
            progress_text = ""
            if isinstance(progress, (int, float)):
                progress_text = f" {int(progress)}%"

            partial_lines = []
            partial_results = payload.get("partial_results") or []
            for item in partial_results[:2]:
                if not isinstance(item, dict):
                    continue
                snippet = str(item.get("snippet") or "").strip()
                if not snippet:
                    continue
                title = str(item.get("title") or item.get("url") or "來源")
                partial_lines.append(f"- {title}:{snippet[:120]}" + ("..." if len(snippet) > 120 else ""))

            message = f"⏳ 研究任務進行中({status_label}{progress_text})。"
            if job_id:
                message += f"\njob_id: {job_id}"
            if partial_lines:
                message += "\n\n目前已取得資料:\n" + "\n".join(partial_lines)
            return {
                "script": script_name,
                "job_id": job_id,
                "message": message,
            }

    return None
```


Severity: Medium

The `_extract_research_tool_feedback` function is quite long and complex: it handles both `start-research` and `check-research` logic, and within `check-research` it also handles several statuses (`completed`, `failed`, in progress). This hurts readability and makes maintenance harder.
Consider refactoring it into smaller, more focused helpers. For example:

  • _handle_start_research_feedback(...)
  • _handle_check_research_feedback(...)

`_handle_check_research_feedback` could be split further into one helper per status (`completed`, `failed`, in progress). This would make the code more modular and easier to understand.

Comment on lines +234 to +254
```python
def _build_final_summary(query: str, fetched_results: list[dict]) -> str:
    """Build the final synthesis from fetched content."""
    ok_results = [item for item in fetched_results if item.get("fetch_status") == "ok" and item.get("content")]
    failed_results = [item for item in fetched_results if item.get("fetch_status") != "ok"]

    if not ok_results:
        failed_count = len(failed_results)
        if failed_count:
            return f"針對「{query}」目前未取得可用內容,共有 {failed_count} 個來源擷取失敗。"
        return f"針對「{query}」目前未取得可用內容。"

    lines = [f"研究主題:{query}", "", "重點整理:"]
    for idx, item in enumerate(ok_results[:4], start=1):
        lines.append(f"{idx}. {item['title']}")
        lines.append(f" {_truncate(item['content'], 320)}")

    if failed_results:
        lines.append("")
        lines.append(f"備註:另有 {len(failed_results)} 個來源擷取失敗。")

    return "\n".join(lines).strip()
```


Severity: Medium

The function name `_build_final_summary`, along with the "重點整理" ("key points") heading it emits, suggests that it performs summarization. However, the current implementation merely concatenates titles and truncated content snippets without doing any real summarization. This may confuse future maintainers.

To reflect its behavior more accurately, consider renaming it to something like `_format_fetched_results`.

Comment on lines +291 to +419
```python
def _do_research(
    job_dir: Path,
    status_path: Path,
    job_id: str,
    query: str,
    seed_urls: list[str],
    max_results: int,
    max_fetch: int,
) -> None:
    """Background process: run the research pipeline."""
    status_data = {
        "job_id": job_id,
        "status": "starting",
        "status_label": "啟動中",
        "progress": 0,
        "query": query,
        "sources": [],
        "partial_results": [],
        "final_summary": "",
        "error": None,
        "created_at": datetime.now().isoformat(),
    }
    _write_status(status_path, status_data)

    try:
        headers = {"User-Agent": USER_AGENT}
        with httpx.Client(timeout=HTTP_TIMEOUT_SEC, follow_redirects=True, headers=headers) as client:
            # 1) Search for sources
            status_data["status"] = "searching"
            status_data["status_label"] = "搜尋中"
            status_data["progress"] = 15
            _write_status(status_path, status_data)

            candidate_sources: list[dict] = []
            seen: set[str] = set()

            for idx, url in enumerate(seed_urls[:MAX_SEED_URLS], start=1):
                if url in seen:
                    continue
                seen.add(url)
                candidate_sources.append(
                    {
                        "title": f"指定來源 {idx}",
                        "url": url,
                        "snippet": "",
                    }
                )

            if len(candidate_sources) < max_results:
                search_results = _search_duckduckgo(client, query, max_results=max_results)
                for item in search_results:
                    normalized_url = _normalize_url(str(item.get("url", "")))
                    if not normalized_url or normalized_url in seen:
                        continue
                    seen.add(normalized_url)
                    candidate_sources.append(
                        {
                            "title": str(item.get("title", "來源")),
                            "url": normalized_url,
                            "snippet": str(item.get("snippet", "")),
                        }
                    )
                    if len(candidate_sources) >= max_results:
                        break

            if not candidate_sources:
                raise RuntimeError("找不到可用的研究來源")

            status_data["sources"] = candidate_sources
            status_data["progress"] = 30
            _write_status(status_path, status_data)

            # 2) Fetch content
            status_data["status"] = "fetching"
            status_data["status_label"] = "擷取中"
            status_data["progress"] = 35
            _write_status(status_path, status_data)

            to_fetch = candidate_sources[: max(1, max_fetch)]
            total_fetch = len(to_fetch)
            fetched_results: list[dict] = []

            for idx, source in enumerate(to_fetch, start=1):
                fetched = _fetch_source(client, source)
                fetched_results.append(fetched)
                status_data["partial_results"] = [
                    {
                        "title": item.get("title"),
                        "url": item.get("url"),
                        "fetch_status": item.get("fetch_status"),
                        "snippet": item.get("snippet", ""),
                        "error": item.get("error"),
                    }
                    for item in fetched_results
                ]
                status_data["progress"] = min(85, 35 + int(idx / total_fetch * 50))
                _write_status(status_path, status_data)

            # 3) Synthesize results
            status_data["status"] = "synthesizing"
            status_data["status_label"] = "統整中"
            status_data["progress"] = 90
            _write_status(status_path, status_data)

            final_summary = _build_final_summary(query, fetched_results)
            result_path = job_dir / "result.md"
            _write_result_markdown(
                result_path=result_path,
                query=query,
                final_summary=final_summary,
                fetched_results=fetched_results,
            )

            date_str = job_dir.parent.name
            status_data["status"] = "completed"
            status_data["status_label"] = "完成"
            status_data["progress"] = 100
            status_data["final_summary"] = final_summary
            status_data["error"] = None
            status_data["result_file_path"] = str(result_path)
            status_data["result_ctos_path"] = f"ctos://linebot/research/{date_str}/{job_id}/result.md"
            _write_status(status_path, status_data)
    except (httpx.HTTPError, OSError, RuntimeError, ValueError) as exc:
        status_data["status"] = "failed"
        status_data["status_label"] = "失敗"
        status_data["error"] = str(exc)
        status_data["progress"] = status_data.get("progress", 0)
        _write_status(status_path, status_data)
```


Severity: Medium

The `_do_research` function is over 100 lines long and contains the entire research pipeline (search, fetch, synthesize). That makes it hard to read and maintain. Its logic is already divided into blocks by comments.

Consider extracting each step into its own helper function, e.g. `_search_sources`, `_fetch_all_sources`, and `_synthesize_results`. This would improve modularity and make the main workflow easier to follow.


yazelin commented Feb 25, 2026

This has been merged as part of PR #127, so this PR is no longer needed.

@yazelin yazelin closed this Feb 25, 2026