-
Notifications
You must be signed in to change notification settings - Fork 1.3k
[issue:1933] RAG engine plugin architecture #1967
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
huanghuoguoguo
wants to merge
26
commits into
master
Choose a base branch
from
refactor/core-plugin-rag-sdk
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…rvice architecture.
…dgeBase BREAKING CHANGE: RAGPluginAdapter has been removed. All plugin communication is now handled directly by RuntimeKnowledgeBase. Architecture change: - Before: RuntimeKnowledgeBase → RAGPluginAdapter → plugin_connector - After: RuntimeKnowledgeBase → plugin_connector (direct) Changes to kbmgr.py (RuntimeKnowledgeBase): - Remove RAGPluginAdapter import and usage - Inline plugin communication methods: - _on_kb_create(): Notify plugin when KB is created - _on_kb_delete(): Notify plugin when KB is deleted - _ingest_document(): Call plugin for document ingestion - _retrieve(): Call plugin for retrieval - _delete_document(): Call plugin to delete document - Simplify dispose(): Only notify plugin, no built-in VDB assumption Changes to base.py (KnowledgeBaseInterface): - Remove get_type() abstract method (outdated internal/external concept) - Add get_rag_engine_plugin_id() abstract method Changes to localagent.py: - Remove get_type() call - Simplify top_k retrieval from KB entity Deleted files: - pkg/rag/knowledge/plugin_adapter.py Benefits: - Reduced abstraction layer, simpler code - Plugin communication logic centralized in RuntimeKnowledgeBase - Easier to understand and maintain 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
BREAKING CHANGE: ExternalKnowledgeBase has been completely removed. All knowledge bases are now unified under the single KnowledgeBase model, differentiated by their rag_engine_plugin_id. Deleted files: - pkg/api/http/controller/groups/knowledge/external.py (ExternalKBController with /external-bases routes) - pkg/api/http/service/external_kb.py (ExternalKnowledgeBaseService) - pkg/rag/knowledge/external.py (ExternalKnowledgeBase implementation) Modified files: - pkg/entity/persistence/rag.py: Remove ExternalKnowledgeBase SQLAlchemy table definition - pkg/core/app.py: Remove external_kb_service attribute from LangBotApplication - pkg/core/stages/build_app.py: Remove external_kb_service initialization Migration notes: - Existing external knowledge base data should be migrated manually - API consumers should use /api/v1/knowledge/bases for all KB operations - Use /api/v1/knowledge/engines to discover available RAG engines 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove deprecated list_knowledge_retrievers functionality from the plugin communication layer. This aligns with the SDK change that removed the LIST_KNOWLEDGE_RETRIEVERS action. Changes: - connector.py: Remove list_knowledge_retrievers() method - handler.py: Remove list_knowledge_retrievers() handler The functionality is replaced by the new /api/v1/knowledge/engines endpoint which lists available RAGEngine components with their capabilities and configuration schemas. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace type-based checks with capability-based checks for file operations, aligning with the unified knowledge base architecture. Changes to knowledge.py: - store_file(): Replace get_type() check with doc_ingestion capability check - delete_file(): Replace get_type() check with doc_ingestion capability check - list_rag_engines(): Remove list_knowledge_retrievers call, simplify to only list RAGEngine components (KnowledgeRetriever type removed) Changes to pipelines.py: - Minor cleanup related to knowledge base references The capability-based approach allows RAG engines to declare their supported features (doc_ingestion, chunking_config, rerank, hybrid_search) and the system responds accordingly, rather than hardcoding behavior based on internal/external type distinction. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
BREAKING CHANGE: The internal/external knowledge base distinction has been removed from the frontend. All knowledge bases are now displayed in a unified list, differentiated by their RAG engine. Changes to page.tsx: - Remove Tab component (内置/外置 tabs) - Remove selectedKbType state - Unified knowledge base list display - Single "Create Knowledge Base" button for all types Changes to KBDetailDialog.tsx: - Remove kbType prop - Simplify dialog logic for unified KB handling - Documents menu item conditionally shown based on doc_ingestion capability Changes to KBForm.tsx: - Remove retriever type handling code - Simplify form for unified KB creation - Dynamic form rendering based on RAG engine's creation_schema Changes to KBCardVO.ts: - Remove 'type' field from KBCardVO interface Changes to BackendClient.ts: - Remove all external KB related methods: - getExternalKnowledgeBases() - getExternalKnowledgeBase() - createExternalKnowledgeBase() - updateExternalKnowledgeBase() - deleteExternalKnowledgeBase() - retrieveFromExternalKnowledgeBase() Changes to api/index.ts: - Remove ExternalKnowledgeBase interface definition UI/UX improvements: - Users no longer need to understand internal vs external distinction - RAG engine selection is now the primary differentiator - Documents panel visibility is capability-driven (doc_ingestion) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Unify embed_model field naming to embedding_model_uuid only - Add structured error responses with error_type for RAG actions - Fix file_size and mime_type detection in _store_file_task - Improve error handling with detailed error context (error_type, original_error) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Frontend: Refactor Knowledge Base form using DynamicForm components. - Frontend: Remove obsolete jsonSchemaConverter utility. - Backend: Update VectorManager and PluginHandler to support new RAG architecture. - Chore: Update dependencies in pyproject.toml.
- Remove DEBUG stderr outputs in handler.py - Move repeated `import json` to file top - Add warning log for unimplemented delete_by_filter 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Define MUTABLE_FIELDS, CREATE_FIELDS, ALL_DB_FIELDS as class constants in KnowledgeBase entity to eliminate duplication. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create RAGRuntimeService to encapsulate RAG capability implementation (Embedding, VectorOps). - Refactor PluginHandler to delegate RAG actions to RAGRuntimeService. - Move KnowledgeService enrichment and creation logic to RAGManager. - Register RAGRuntimeService in Application and BuildAppStage. - Clean up legacy code in KnowledgeService.
- Use self.ap.logger consistently in kbmgr.py and runtime.py, removing module-level loggers. - Fix type hints for retrieve_knowledge in handler.py and connector.py to match implementation returning dict.
- Fix missing key prop in PluginComponentList by using ternary instead of Fragment - Fix RAGEngine.name type to I18nObject and use extractI18nObject() for rendering - Preserves multi-language support 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
P0 fixes: - Fix ALL_DB_FIELDS missing collection_id and emoji fields - Move rag_engine_plugin_id to CREATE_FIELDS (immutable after creation) - Fix creation_settings mutable default value (dict -> None) - Rename vector delete method to delete_by_file_id for correct semantics - Fix delete_by_filter to raise NotImplementedError instead of silent no-op - Add database migration script (dbm019) for new columns and table cleanup P1 fixes: - Clean up design-hesitation comments in connector.py - Add _parse_plugin_id() with format validation for all RAG methods - Make _retrieve() raise exceptions instead of silently returning empty results - Extract _make_rag_error_response() helper for clean error formatting - Remove unused imports from handler.py P2 fixes: - Fix runtime.py indentation inconsistencies - Simplify get_file_stream to use storage abstraction uniformly - Reduce redundant DB queries in knowledge service (extract _check_doc_capability) - Fix engines.py URL encoding: use <path:plugin_id> instead of __ replacement - Add read-only mode for engine settings in KBForm edit mode - Simplify page.tsx handleKBCardClick to pass only kbId string Co-authored-by: Cursor <cursoragent@cursor.com>
- Frontend: add retrieval_settings param to retrieveKnowledgeBase API call
- Backend: return {uuid} from PUT knowledge base to match frontend expectation
- Backend: validate query is non-empty in retrieve endpoint (400 on empty)
- Backend: rename vector_delete ids→file_ids for semantic clarity, keep
backward compat by accepting both 'file_ids' and 'ids' in RPC handler
- Backend: ensure rag_engine.name fallback is always I18nObject-compatible
dict, preventing frontend extractI18nObject from receiving plain strings
- Migration: fix misleading docstring about external_kb data migration
Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
eh: Feature
enhance: 新功能添加 / add new features
External Plugin
插件问题或新增插件 / Issues on plugin itself
m: Plugins
插件加载及管理模块 / Plugins loading and management
pd: Need testing
pending: 待测试的PR / PR waiting to be tested
size:XXL
This PR changes 1000+ lines, ignoring generated files.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Overview
This PR implements the RAG Engine Plugin Architecture for LangBot, enabling third-party plugins to provide custom RAG (Retrieval-Augmented Generation) implementations. This is a major architectural refactor that unifies the previously separate "internal" and "external" knowledge base concepts.
Related
Key Changes
1. Unified Knowledge Base Model
KnowledgeBaseentity withrag_engine_plugin_idfield2. RAGEngine Plugin Component
RAGEnginecomponent type for pluginsdoc_ingestion,chunking_config,rerank,hybrid_search)3. RPC Action Protocol
New actions for plugin-host communication:
RAG_KB_CREATEDRAG_KB_DELETEDRAG_INGEST_DOCUMENTRAG_RETRIEVERAG_DELETE_DOCUMENTRAG_EMBED_DOCUMENTSRAG_EMBED_QUERYRAG_VECTOR_UPSERTRAG_VECTOR_SEARCHRAG_VECTOR_DELETERAG_GET_FILE_STREAM4. New API Endpoints
GET /api/v1/knowledge/engines - List available RAG engines
GET /api/v1/knowledge/engines//creation-schema - Get creation form schema
GET /api/v1/knowledge/engines//retrieval-schema - Get retrieval form schema
5. RAGRuntimeService
New service bridging plugin requests to LangBot infrastructure (embedding models, vector database, file storage).
Breaking Changes
/api/v1/knowledge/external-basesendpointsExternalKnowledgeBasetable dropped; new columns added toKnowledgeBaseKnowledgeRetrievercomponent deprecated, useRAGEngineinsteadFile Changes Summary
Added
src/langbot/pkg/rag/service/runtime.py- RAGRuntimeServicesrc/langbot/pkg/api/http/controller/groups/knowledge/engines.py- Engine APIsrc/langbot/pkg/persistence/migrations/dbm019_rag_engine_plugin_architecture.py- DB migrationModified
src/langbot/pkg/rag/knowledge/kbmgr.py- Plugin communication integrationsrc/langbot/pkg/plugin/handler.py- RAG action handlerssrc/langbot/pkg/plugin/connector.py- RAG connector methodssrc/langbot/pkg/api/http/service/knowledge.py- Capability-based checkssrc/langbot/pkg/entity/persistence/rag.py- Updated KB entityweb/src/app/home/knowledge/- Unified UI componentsDeleted
src/langbot/pkg/rag/knowledge/external.pysrc/langbot/pkg/api/http/controller/groups/knowledge/external.pysrc/langbot/pkg/api/http/service/external_kb.pysrc/langbot/pkg/rag/knowledge/services/(old service implementations)web/src/app/home/knowledge/components/external-kb-*/Migration Notes
langbot-plugin-rag) to use built-in RAG features/external-basesto/basesendpointsChecklist