Update the functions of RAG, using Cross-Encoder and Vector index #6
beebatter wants to merge 2 commits into congde:main
Conversation
Pull request overview
This PR upgrades the RAG retrieval pipeline from keyword matching toward embedding-based vector search (Chroma), adds cross-encoder reranking for improved relevance, and introduces an LLM-based intent router to reduce accidental RAG triggers on casual chat.
Changes:
- Enable real embeddings + Chroma vectorstore ingestion (including metadata sanitization via `filter_complex_metadata`) and add metadata filtering support in search.
- Add cross-encoder reranking (`BAAI/bge-reranker-base`) in the context-answering flow and refactor `should_use_rag` to use an LLM classifier with keyword fallback.
- Add an integration evaluation script (`test_rag_eval.py`) and adjust KB init chunking configuration.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| backend/modules/rag/core/knowledge_base.py | Enables embeddings + Chroma creation with metadata filtering; extends search API with optional metadata filter. |
| backend/modules/rag/services/rag_service.py | Adds reranker-based retrieval path and LLM-driven intent routing for RAG usage. |
| init_rag_knowledge.py | Initializes KB with structure chunking parameters (size/overlap). |
| test_rag_eval.py | Adds a local integration/latency evaluation runner over 20 test cases. |
| frontend/package-lock.json | Updates lockfile dependency spec for react-scripts to match package.json. |
Files not reviewed (1)
- frontend/package-lock.json: Language not supported
Comments suppressed due to low confidence (2)
backend/modules/rag/core/knowledge_base.py:316
- The parameter name `filter` shadows Python's built-in `filter()`, which hurts readability and confuses IDEs/type checkers. Consider renaming it to something more descriptive such as `metadata_filter` / `where`, and updating the log messages and call sites to match.
def search_similar(self, query: str, k: int = 3, filter: Optional[Dict[str, Any]] = None) -> List[Document]:
"""
相似度搜索
Args:
query: 查询文本
k: 返回结果数量
filter: Chroma 元数据过滤器
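A minimal sketch of the suggested rename, with a toy class standing in for the real `KnowledgeBase` (the `metadata_filter` name and the pass-through to Chroma's `filter` kwarg follow the comment's suggestion; the surrounding class is illustrative, not the PR's actual code):

```python
from typing import Any, Dict, List, Optional

class KnowledgeBase:
    """Toy stand-in for the real class, showing only the renamed signature."""

    def __init__(self, vectorstore):
        self.vectorstore = vectorstore

    def search_similar(
        self,
        query: str,
        k: int = 3,
        metadata_filter: Optional[Dict[str, Any]] = None,  # renamed from `filter`
    ) -> List[Any]:
        # Chroma's similarity_search still takes the metadata filter via its
        # `filter` kwarg; renaming only our parameter avoids shadowing the
        # built-in filter() without changing the underlying call.
        return self.vectorstore.similarity_search(query, k=k, filter=metadata_filter)
```

The rename is purely at the API surface; callers pass `metadata_filter=...` while the vectorstore call is unchanged.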
test_rag_eval.py:2
- `json` is never used in the script; consider removing the import to avoid lint/static-analysis warnings.
import json
    response = requests.post(API_URL, json=payload)
    latency = time.time() - start_time
`requests.post` is called without a `timeout`, so if the backend stops responding or the network fails, the script can hang indefinitely, skewing the benchmark results and hurting the CI/local experience. Consider adding a reasonable timeout to the request (and exposing it as a constant/parameter).
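A hedged sketch of what the fix could look like (the `REQUEST_TIMEOUT` constant and the `call_chat_api` wrapper are illustrative names, not the script's actual code):

```python
import requests

REQUEST_TIMEOUT = 30  # seconds; tune to the backend's worst-case latency

def call_chat_api(api_url: str, payload: dict) -> dict:
    try:
        # `timeout` bounds both connect and read; without it the eval script
        # can block forever when the backend stops responding
        response = requests.post(api_url, json=payload, timeout=REQUEST_TIMEOUT)
        response.raise_for_status()
        return response.json()
    except requests.Timeout:
        # record the timeout as a failed case instead of stalling the whole run
        return {"error": "timeout"}
```

Treating a timeout as a recorded failure keeps the latency statistics meaningful instead of letting one stuck request block the evaluation.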
@copilot open a new pull request to apply changes based on this feedback
| "react-markdown": "^9.0.0", | ||
| "react-router-dom": "^6.8.0", | ||
| "react-scripts": "5.0.1", | ||
| "react-scripts": "^5.0.1", |
The PR description says the unused imports in ChatContainer.js were cleaned up to eliminate the React eslint warnings, but the only frontend change in this diff is package-lock.json, and frontend/src/components/ChatContainer.js still contains `useState` and many icon imports. If that cleanup is intended, please include the corresponding source changes in this PR; otherwise, update the PR description so it matches the actual changes.
@copilot open a new pull request to apply changes based on this feedback
| is_rag_needed = "true" in decision.lower() | ||
| logger.info(f"LLM 意图判断 RAG 分类: {decision} -> {is_rag_needed}") | ||
| if is_rag_needed: | ||
| return True |
Parsing the LLM classification result with `"true" in decision.lower()` will misjudge strings such as "not true" or "untrue" as True, wrongly triggering RAG. Consider strict parsing (e.g. `decision.strip().lower() == "true"`) and falling back to the keyword logic when the output cannot be parsed.
| is_rag_needed = "true" in decision.lower() | |
| logger.info(f"LLM 意图判断 RAG 分类: {decision} -> {is_rag_needed}") | |
| if is_rag_needed: | |
| return True | |
| normalized = decision.strip().lower() | |
| if normalized in ("true", "false"): | |
| is_rag_needed = normalized == "true" | |
| logger.info(f"LLM 意图判断 RAG 分类: {decision} -> {is_rag_needed}") | |
| if is_rag_needed: | |
| return True | |
| else: | |
| logger.warning(f"无法解析 LLM 意图分类结果 '{decision}',回退至关键词检测") |
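The strict parsing suggested above can also be factored into a small helper, where `None` signals "unparseable, fall back to keywords" (the helper name is an assumption, not the PR's code):

```python
from typing import Optional

def parse_rag_decision(decision: str) -> Optional[bool]:
    """Strictly parse an LLM true/false classification.

    Returns True or False only for an exact answer; returns None for
    anything else (e.g. "not true", "untrue") so the caller can fall
    back to keyword matching instead of mis-triggering RAG.
    """
    normalized = decision.strip().lower().rstrip(".")
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    return None
```

Centralizing the parse in one function keeps the trigger logic and the fallback decision in a single, easily testable place.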
    message_lower = message.lower()
    has_trigger = any(trigger in message_lower for trigger in rag_triggers)

    # 检查情绪是否需要专业建议
    needs_professional = emotion and any(prof in emotion for prof in professional_emotions)

    # 检查知识库是否可用
    rag_available = self.rag_service.is_knowledge_available()

-   should_use = (has_trigger or needs_professional) and rag_available
+   should_use = has_trigger or needs_professional

    if should_use:
-       logger.info(f"触发RAG: trigger={has_trigger}, emotion={needs_professional}")
+       logger.info(f"触发RAG(关键词回退): trigger={has_trigger}, emotion={needs_professional}")

    return should_use
Here `needs_professional = emotion and ...` evaluates to `None` when `emotion is None`, so `should_use = has_trigger or needs_professional` can become `None` and the function may `return None`, which is inconsistent with the declared `-> bool` return type and may affect callers. Consider coercing `needs_professional`/`should_use` to `bool` explicitly (e.g. `needs_professional = bool(emotion) and ...`, or `should_use = bool(has_trigger or needs_professional)`).
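A minimal sketch of the suggested coercion, with the routine extracted as a standalone function for illustration (the parameter list is an assumption; in the PR these are attributes and locals of the service):

```python
from typing import List, Optional

def should_use_rag_fallback(
    message: str,
    emotion: Optional[str],
    rag_triggers: List[str],
    professional_emotions: List[str],
) -> bool:
    message_lower = message.lower()
    has_trigger = any(trigger in message_lower for trigger in rag_triggers)
    # bool(emotion) keeps the expression False (never None) when emotion is None
    needs_professional = bool(emotion) and any(
        prof in emotion for prof in professional_emotions
    )
    # final coercion guarantees the annotated -> bool always holds
    return bool(has_trigger or needs_professional)
```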
@copilot open a new pull request to apply changes based on this feedback
@copilot open a new pull request to apply changes based on the comments in this thread
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Motivation and Background
The original RAG module relied on basic string/keyword matching. In real testing, when users described psychological symptoms with metaphors or colloquial phrasing (e.g. "I feel like a deflated balloon", "tossing and turning like I've had ten iced americanos"), the system could not map them to professional psychological-intervention knowledge (recall was 0).
To improve the professionalism and comprehension of the emotional-companion system, I upgraded the underlying RAG retrieval pipeline to an embedding-based vector design.
Key Changes
- Real vector retrieval with semantic mapping: wired in an embedding model and the Chroma vector store, and used `filter_complex_metadata` to fix the crash that occurred when LangChain wrote complex metadata into Chroma.
- Chunking strategy optimization: rewrote the splitting logic as structure chunking (size: 800, overlap: 150), so that long psychology texts keep their full context and document structure after splitting.
- Cross-encoder semantic reranking: added `BAAI/bge-reranker-base` to the post-retrieval path; the Top-20 recalled documents get a high-precision second scoring pass, and only the Top-K are returned, substantially raising the relevance floor.
- LLM intent-routing classifier: refactored `should_use_rag` to replace the static keyword list with dynamic LLM classification, accurately distinguishing "casual chat" from "professional help-seeking" so that small talk no longer accidentally hits the medical knowledge base.
- Frontend dead-code cleanup: also fixed the React eslint warnings in [ChatContainer.js] caused by unused imports (`useState`, various icons, etc.).
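Based on the description above (Top-20 recall from Chroma, then `BAAI/bge-reranker-base` rescoring down to Top-K), the rerank step can be sketched roughly as below. The scorer is injected so the sketch stays model-agnostic; in practice it would be something like sentence-transformers' `CrossEncoder("BAAI/bge-reranker-base").predict`. Function and parameter names here are illustrative, not the PR's actual code:

```python
from typing import Callable, List, Sequence, Tuple

def rerank(
    query: str,
    docs: List[str],
    scorer: Callable[[Sequence[Tuple[str, str]]], Sequence[float]],
    top_k: int = 3,
) -> List[str]:
    """Rescore retrieved documents with a cross-encoder-style scorer, keep Top-K.

    `scorer` receives (query, document) pairs and returns one relevance
    score per pair; higher means more relevant.
    """
    scores = scorer([(query, doc) for doc in docs])
    # sort retrieved docs by the cross-encoder score, highest first
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]
```

The bi-encoder (embedding) stage stays fast and recall-oriented, while the cross-encoder sees query and document together and can therefore score subtle, metaphorical matches that pure embedding distance misses.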
Testing & Validation
Added the automated evaluation script [test_rag_eval.py] with 20 test cases across difficulty tiers (covering metaphor generalization, chit-chat false-trigger prevention, and professional help-seeking).
Test results:
- All 20 cases successfully recalled relevant context, fully resolving the original "zero recall on metaphorical queries" problem.
- Single-retrieval latency: averages under 0.2s, which is excellent.
Test command: check out the branch, start the backend, then run `python test_rag_eval.py` to see the scoring report against the baseline in the terminal.