feat(wren-ai-service): Add Instructions for SQL Generation #1376

paopa · 2025-03-07T08:49:15Z

Overview

This PR introduces a new instructions indexing and retrieval pipeline and endpoints to enhance SQL generation by incorporating custom instructions during query generation.

Key Changes

- Added new pipeline configurations for instructions_indexing and instructions_retrieval across different LLM providers (OpenAI, Azure, Deepseek, Google AI, Groq, Ollama)
- Integrated instructions retrieval into SQL generation pipelines
- Added new settings for instructions similarity threshold and top-k retrieval
- Enhanced the service container to support instructions service

Configuration Updates

Added new settings:
- instructions_similarity_threshold (default: 0.7)
- instructions_top_k (default: 10)
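
A minimal sketch of how these two settings might be grouped and sanity-checked. Only the field names and defaults come from this PR; the `InstructionsRetrievalSettings` class and the assumption that similarity scores fall in [0, 1] are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionsRetrievalSettings:
    """Hypothetical container; only the field names and defaults come from the PR."""

    instructions_similarity_threshold: float = 0.7
    instructions_top_k: int = 10

    def __post_init__(self) -> None:
        # Assumption: similarity scores are normalized to [0, 1].
        if not 0.0 <= self.instructions_similarity_threshold <= 1.0:
            raise ValueError("instructions_similarity_threshold must be within [0, 1]")
        if self.instructions_top_k < 1:
            raise ValueError("instructions_top_k must be at least 1")


defaults = InstructionsRetrievalSettings()
stricter = InstructionsRetrievalSettings(
    instructions_similarity_threshold=0.85, instructions_top_k=3
)
```

A higher threshold with a smaller top-k keeps only the closest matches, which trades recall for a shorter prompt.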

Pipeline Changes

- Added support for instructions in SQL generation prompts
- Modified prompt construction to include retrieved instructions
- Added new Instructions pipeline class for indexing and retrieval

Testing Recommendations

- Verify instructions indexing works with different LLM providers
- Test SQL generation with custom instructions
- Validate instruction retrieval with different similarity thresholds
- Check integration with existing SQL generation pipelines

Summary by CodeRabbit

New Features
- Introduced enhanced instruction pipelines that support advanced indexing, retrieval, and deletion capabilities with real-time event status tracking.
- Extended query and SQL generation processes to accept additional instruction data, improving user query handling.
Tests
- Added comprehensive tests to validate the indexing, retrieval, and deletion functionalities, ensuring robust performance and reliability.

coderabbitai · 2025-03-07T08:49:22Z

Walkthrough

This pull request introduces new pipeline entries for handling instructions in the Wren AI service. It adds two new entries—instructions_indexing and instructions_retrieval—across multiple configuration files and enhances the service architecture by incorporating new parameters and service methods. The changes update API endpoints, modify pipelines for SQL generation and indexing, and introduce a new InstructionsService with associated tests. Additionally, minor refactorings and improvements in YAML formatting were performed.

Changes

| File(s) | Change Summary |
| --- | --- |
| `deployment/kustomizations/base/cm.yaml`, `docker/config.example.yaml`, `wren-ai-service/docs/config_examples/config..yaml`, `wren-ai-service/tools/config/config.yaml` | Added new pipeline/pipe entries for `instructions_indexing` and `instructions_retrieval` using `litellm_embedder.default` and `qdrant`; updated model list formatting in YAML files. |
| `wren-ai-service/src/config.py`, `wren-ai-service/src/globals.py` | Introduced new configuration attributes (`instructions_similarity_threshold`, `instructions_top_k`) and updated the service container to integrate instructions pipelines. |
| `wren-ai-service/src/pipelines/generation/*`, `wren-ai-service/src/pipelines/generation/utils/sql.py` | Updated method signatures in SQL generation pipelines to accept an optional `instructions` parameter and adjusted the `construct_instructions` function accordingly. |
| `wren-ai-service/src/pipelines/indexing/`, `wren-ai-service/src/pipelines/retrieval/` | Added new files and modifications to implement pipelines for indexing and retrieving instructions including new classes, functions, and updated `__all__` declarations. |
| `wren-ai-service/src/web/v1/routers/instructions.py`, `wren-ai-service/src/web/v1/routers/__init__.py` | Introduced a new FastAPI router with endpoints for posting, deleting, and retrieving instructions status. |
| `wren-ai-service/src/web/v1/services/instructions.py`, `wren-ai-service/src/web/v1/services/ask.py`, `wren-ai-service/src/web/v1/services/semantics_preparation.py` | Added a new `InstructionsService` and integrated instructions retrieval into the ask service; updated deletion handling logic in semantics preparation. |
| `wren-ai-service/tests/pytest/pipelines/indexing/test_instructions.py`, `wren-ai-service/tests/pytest/services/test_instructions.py` | Added new test suites to cover the indexing, retrieval, deletion, and overall functionality of instructions pipelines and the InstructionsService. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant C as Client
    participant R as FastAPI Router (/instructions)
    participant S as InstructionsService
    participant P as Instructions Pipeline
    participant DS as Document Store (qdrant)
    
    C->>R: POST /instructions with instructions payload
    R->>S: Trigger indexing via background task
    S->>P: Invoke instructions_indexing pipeline
    P->>P: Convert instructions to documents
    P->>P: Generate embeddings with litellm_embedder.default
    P->>DS: Write documents to qdrant
    P-->>S: Return indexing result
    S-->>R: Update event status in TTL cache
    R-->>C: Respond with event ID

sequenceDiagram
    autonumber
    participant C as Client
    participant A as AskService
    participant IR as Instructions Retrieval Pipeline
    participant DS as Document Store (qdrant)
    participant SQL as SQL Generation Pipeline
    
    C->>A: Send query with project ID
    A->>IR: Call instructions_retrieval pipeline
    IR->>DS: Retrieve documents using qdrant
    IR-->>A: Return retrieved instructions
    A->>SQL: Pass instructions along with SQL samples
    SQL-->>A: Generate SQL query response
    A-->>C: Return final response
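
The retrieval flow in the second diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not the service's code: `retrieve_instructions`, `generate_sql`, `ask`, and the in-memory `_INDEXED` store are hypothetical stand-ins for the instructions_retrieval pipeline, the SQL generation pipeline, AskService, and qdrant.

```python
import asyncio

# Hypothetical in-memory stand-in for instructions indexed in qdrant, keyed by project ID.
_INDEXED = {
    "project-1": [
        {
            "instruction": "Exclude soft-deleted rows",
            "question": "How many active users are there?",
        },
    ],
}


async def retrieve_instructions(query: str, project_id: str) -> list:
    """Stand-in for the instructions_retrieval pipeline; a real one would embed the query."""
    return _INDEXED.get(project_id, [])


async def generate_sql(query: str, sql_samples: list, instructions=None) -> dict:
    """Stand-in for SQL generation; it only echoes what would reach the prompt."""
    return {
        "query": query,
        "sql_samples": sql_samples,
        "instructions": instructions or [],
    }


async def ask(query: str, project_id: str) -> dict:
    # Step 1: fetch instructions scoped to the project.
    instructions = await retrieve_instructions(query, project_id)
    # Step 2: pass them to SQL generation next to the SQL samples (omitted here).
    sql_samples: list = []
    return await generate_sql(query, sql_samples, instructions=instructions)


result = asyncio.run(ask("How many active users are there?", "project-1"))
```

Because `instructions` is optional on the generation side, a project with no indexed instructions goes through the same path with an empty list.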

Possibly related PRs

chore(wren-ai-service): minor updates #1326: Introduced similar pipeline entries for instructions indexing and retrieval using the same embedder and document store.
Create config.azure.yaml #1248: Added new pipeline entries for instructions processing with similar configuration changes and embedder usage.
chore(wren-ai-service): minor update #1210: Enhanced handling of instructions within pipeline contexts, similar to the modifications implemented here.

Poem

I'm a coding rabbit, hopping with delight,
New instructions added, making pipelines bright.
Indexing and retrieval now run so fine,
With Qdrant and litellm in perfect line.
I nibble on tests and code with glee,
Proud to see our Wren AI leap free!
🐇✨



coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (16)

wren-ai-service/src/pipelines/generation/utils/sql.py (1)
552-558: Consider addressing the TODO comment

There's a TODO comment about refactoring the format of the instructions. Consider specifying what refactoring is needed or creating a follow-up task for this refactoring.
-        # todo: refactor the format of the instructions
+        # TODO: Refactor the format of the instructions to support structured metadata and additional properties
wren-ai-service/src/web/v1/services/ask.py (1)
367-375: Implementation of instructions retrieval looks good with minor typo.

The code successfully retrieves instructions using the instructions_retrieval pipeline, but there's a typo in the first TODO comment ("retireve" should be "retrieve").
-# todo: consider to retireve at the same time with sql_samples
+# todo: consider to retrieve at the same time with sql_samples
wren-ai-service/tests/pytest/pipelines/indexing/test_instructions.py (1)

21-21: Use a unique pipeline instance for each test to prevent cross-test interference.

While the tests appear to be functioning correctly, consider creating a fresh pipeline instance inside each test function (or as a fixture) to avoid potential cross-test interference when tests are run concurrently or in parallel.

wren-ai-service/src/web/v1/services/instructions.py (2)

34-41: Be mindful of large TTL cache sizes.

TTLCache is configured with maxsize=1_000_000, which may lead to high memory usage in long-running environments. Ensure your system capacity can handle this or consider a smaller cache limit if usage patterns do not require storing that many events.

78-86: Skip indexing instructions with empty questions if desired.

Currently, each question spawns a separate instruction for indexing. If empty or trivial questions are present, it might create redundant documents. Consider filtering them out if your use case requires it.
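
One way to apply this suggestion, as a sketch only: flatten each request into one question-instruction pair per question and skip blank questions along the way. The `InstructionRequest` shape and the `flatten_instructions` helper are hypothetical and do not mirror the service's actual models.

```python
from dataclasses import dataclass, field


@dataclass
class InstructionRequest:
    """Hypothetical request shape: one instruction with several example questions."""

    id: str
    instruction: str
    questions: list = field(default_factory=list)


def flatten_instructions(requests: list) -> list:
    """Emit one pair per non-empty question; blank questions are skipped."""
    flattened = []
    for request in requests:
        for question in request.questions:
            question = question.strip()
            if not question:
                continue  # skip empty or whitespace-only questions
            flattened.append(
                {
                    "instruction_id": request.id,
                    "instruction": request.instruction,
                    "question": question,
                }
            )
    return flattened


pairs = flatten_instructions(
    [
        InstructionRequest(
            "i1",
            "Exclude test accounts",
            ["How many users signed up?", "   ", "List new users"],
        )
    ]
)
```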

wren-ai-service/tests/pytest/services/test_instructions.py (1)

13-25: Recreate index in a fixture teardown to ensure isolation.

While recreate_index=True is used, consider a teardown step that consistently wipes the index after each test to avoid accidental state leaks if future tests mutate the store without re-creating.

wren-ai-service/src/web/v1/routers/instructions.py (2)

88-108: Ensure thread safety when storing events
Storing event_id references in the service dictionary could expose race conditions in high-concurrency scenarios. If multiple indexing requests come in simultaneously, consider using an asynchronous or thread-safe data structure/persistence to avoid data overwrites.

116-139: Return more specific HTTP status codes
Currently, the endpoint sets a 500 status code when event.status == "failed". In certain failures (e.g., invalid IDs), a 400 or 404 might be more appropriate. Consider refining error-handling to reflect the nature of failures accurately.
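
A sketch of what a finer-grained mapping could look like. The error codes `INVALID_ID` and `NOT_FOUND` and the `status_for` helper are hypothetical; the PR does not define them.

```python
from http import HTTPStatus

# Hypothetical error codes; anything unrecognized falls back to 500.
_ERROR_STATUS = {
    "INVALID_ID": HTTPStatus.BAD_REQUEST,
    "NOT_FOUND": HTTPStatus.NOT_FOUND,
}


def status_for(event_status: str, error_code=None) -> HTTPStatus:
    """Pick an HTTP status from the event status and an optional error code."""
    if event_status != "failed":
        return HTTPStatus.OK
    return _ERROR_STATUS.get(error_code, HTTPStatus.INTERNAL_SERVER_ERROR)
```

Keeping the mapping in one table makes it easy to see which failures are treated as client errors and which remain server errors.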

wren-ai-service/src/pipelines/indexing/instructions.py (4)

20-25: Clarify the role of question vs. instruction
The Instruction model includes both instruction and question fields. Explain or rename them to avoid ambiguity. It’s not entirely clear how each field is used downstream.

28-51: Validate instructions before document conversion
The InstructionsConverter returns Document objects with minimal validation. If instructions are malformed, the indexing pipeline may fail downstream. Consider adding guards or skipping invalid instructions.

80-87: Allow for empty instructions
When instructions list is empty, the pipeline returns an empty dict quietly. Confirm if this is the correct behavior or if a warning should be logged for visibility.

123-177: Add coverage for error-handling paths
The Instructions pipeline has a well-defined flow for indexing but lacks explicit error-handling logic if components fail. Consider adding try/except blocks or centralized error handling to prevent partial writes or inconsistent states.

wren-ai-service/src/pipelines/retrieval/instructions.py (4)

18-35: Handle empty or inconsistent metadata
When constructing outputs, confirm that the documents’ meta fields (e.g., "instruction", "instruction_id") have valid contents. An unexpected None could propagate. Optionally, add fallback values or warning logs.

88-103: Expose reason for filtered documents
filtered_documents can remove many relevant results if similarity_threshold or top_k is strict. Consider logging how many documents were filtered out, for transparency in debugging.
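
A minimal sketch of the suggested logging, assuming documents are plain dicts with a `score` key and that the threshold comparison is inclusive; the real pipeline step may differ on both points. The `filter_by_score` name and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("instructions_retrieval_sketch")


def filter_by_score(documents: list, similarity_threshold: float = 0.7, top_k: int = 10) -> list:
    """Keep documents at or above the threshold, best first, capped at top_k."""
    above = [d for d in documents if d["score"] >= similarity_threshold]
    above.sort(key=lambda d: d["score"], reverse=True)
    kept = above[:top_k]
    # Report both filters separately so a strict threshold and a small top_k
    # can be told apart when debugging.
    logger.info(
        "instructions retrieval: %d candidates, %d below threshold %.2f, %d cut by top_k=%d",
        len(documents),
        len(documents) - len(above),
        similarity_threshold,
        len(above) - len(kept),
        top_k,
    )
    return kept
```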

127-139: Combine results more transparently
formatted_output merges default instructions with retrieved documents. Log or document the final ordering logic in case the user expects default instructions to appear first or last.

144-171: Standardize pipeline usage
The Instructions retrieval pipeline is straightforward, but ensure that the naming of pipeline steps matches the indexing pipeline. Differences in naming or usage across indexing vs. retrieval might confuse maintainers.

Would you like help aligning naming conventions between indexing and retrieval?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f0f7a6 and 8719a3d.

📒 Files selected for processing (27)

deployment/kustomizations/base/cm.yaml (2 hunks)
docker/config.example.yaml (2 hunks)
wren-ai-service/docs/config_examples/config.azure.yaml (1 hunks)
wren-ai-service/docs/config_examples/config.deepseek.yaml (1 hunks)
wren-ai-service/docs/config_examples/config.google_ai_studio.yaml (1 hunks)
wren-ai-service/docs/config_examples/config.groq.yaml (1 hunks)
wren-ai-service/docs/config_examples/config.ollama.yaml (1 hunks)
wren-ai-service/src/config.py (1 hunks)
wren-ai-service/src/globals.py (4 hunks)
wren-ai-service/src/pipelines/generation/followup_sql_generation.py (4 hunks)
wren-ai-service/src/pipelines/generation/sql_generation.py (4 hunks)
wren-ai-service/src/pipelines/generation/utils/sql.py (1 hunks)
wren-ai-service/src/pipelines/indexing/__init__.py (2 hunks)
wren-ai-service/src/pipelines/indexing/instructions.py (1 hunks)
wren-ai-service/src/pipelines/indexing/sql_pairs.py (2 hunks)
wren-ai-service/src/pipelines/retrieval/__init__.py (2 hunks)
wren-ai-service/src/pipelines/retrieval/instructions.py (1 hunks)
wren-ai-service/src/web/v1/routers/__init__.py (2 hunks)
wren-ai-service/src/web/v1/routers/instructions.py (1 hunks)
wren-ai-service/src/web/v1/services/__init__.py (2 hunks)
wren-ai-service/src/web/v1/services/ask.py (3 hunks)
wren-ai-service/src/web/v1/services/instructions.py (1 hunks)
wren-ai-service/src/web/v1/services/semantics_preparation.py (2 hunks)
wren-ai-service/tests/pytest/pipelines/indexing/test_instructions.py (1 hunks)
wren-ai-service/tests/pytest/services/test_instructions.py (1 hunks)
wren-ai-service/tools/config/config.example.yaml (2 hunks)
wren-ai-service/tools/config/config.full.yaml (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: pytest
GitHub Check: pytest

🔇 Additional comments (49)

wren-ai-service/docs/config_examples/config.groq.yaml (1)

127-132: New Pipeline Entries for Instructions Indexing and Retrieval

The new entries for instructions_indexing and instructions_retrieval are correctly defined with an embedder set to litellm_embedder.default and a document store of qdrant. Their structure is consistent with the other pipeline definitions in this configuration file. Please ensure that any related runtime logic or documentation references these entries appropriately.

wren-ai-service/docs/config_examples/config.deepseek.yaml (1)

145-150: Consistent Addition of Instructions Pipelines

The additions of instructions_indexing and instructions_retrieval using litellm_embedder.default and document_store: qdrant are properly integrated into the pipelines section. This change aligns well with the configuration style observed in other parts of the file. Consider verifying that any changes in pipeline behavior due to these new entries are covered by your integration tests.

wren-ai-service/tools/config/config.full.yaml (1)

152-157: Integration of Instructions Pipeline in Full Configuration

The configuration now includes two new pipeline entries, instructions_indexing and instructions_retrieval, which are both correctly configured with the expected embedder and document store. Their placement within the full configuration file maintains consistency with other similar pipeline entries. Ensure that the documentation and any related configuration references are updated accordingly.

wren-ai-service/docs/config_examples/config.google_ai_studio.yaml (1)

134-139: Addition of Instructions Pipelines for Google AI Studio Config

The new pipeline entries for instructions_indexing and instructions_retrieval are implemented with the same consistent parameters (litellm_embedder.default and qdrant). The additions are clear and follow the expected YAML syntax. Please double-check that any service consuming these pipelines is aware of the new configuration options.

wren-ai-service/tools/config/config.example.yaml (1)

154-159: Incorporating Instructions Pipelines into the Example Configuration

The new entries for instructions_indexing and instructions_retrieval have been added in a manner consistent with the rest of the pipeline entries. They utilize the same embedder and document store settings as other similar pipelines. This consistency helps ensure that the new SQL generation instruction capabilities integrate smoothly across different deployments. Verify that the example configuration documentation reflects these changes.

docker/config.example.yaml (3)

5-30: Well-structured model configuration

The restructuring of the models section under the llm type to use a hyphen-prefixed list format improves readability and standardizes the YAML structure. This format is more consistent with YAML best practices.

37-41: Consistent formatting for embedder models

The embedder model configuration follows the same improved structure as the LLM models, maintaining consistency throughout the configuration file.

136-141: New instructions pipeline entries align with PR objectives

The addition of instructions_indexing and instructions_retrieval pipelines successfully implements the core functionality described in the PR objectives. These entries correctly use the litellm_embedder.default embedder and the qdrant document store, maintaining consistency with other similar pipeline configurations.

wren-ai-service/src/web/v1/services/__init__.py (2)

66-66: Import added in alphabetical order

The InstructionsService import is correctly added in alphabetical order within the import section, adhering to good code organization practices.

89-89: Service properly exposed through `__all__`

Adding InstructionsService to the __all__ list ensures that it's correctly exposed as part of the module's public API, allowing it to be imported by other components.

wren-ai-service/src/web/v1/routers/__init__.py (2)

8-8: Import added in alphabetical order

The instructions module import is correctly added in alphabetical order within the import list, which is good practice for code organization.

32-32: Router integration

The instructions.router is properly included in the main router, which will expose the new instructions endpoints in the API. This completes the integration of the new feature into the API routing system.

wren-ai-service/src/config.py (1)

35-36: Configuration parameters for instructions feature

The addition of instructions_similarity_threshold and instructions_top_k parameters with sensible defaults (0.7 and 10 respectively) aligns with the PR objectives. These parameters are appropriately placed in the indexing and retrieval config section, following a similar pattern to the existing sql_pairs parameters.

Would you like to add documentation comments for these new parameters to match the style of other documented parameters in this file?

wren-ai-service/src/pipelines/indexing/__init__.py (1)

93-93: Proper addition of the Instructions module to the indexing pipeline

The addition of the Instructions import and its inclusion in the __all__ list correctly exposes the new instructions functionality to the rest of the codebase. This aligns well with the PR objective of implementing a new instructions indexing and retrieval pipeline.

Also applies to: 102-102

wren-ai-service/docs/config_examples/config.ollama.yaml (1)

124-129: Correctly configured instructions pipeline entries

The addition of instructions_indexing and instructions_retrieval pipeline entries with appropriate configurations for embedder and document store matches the PR objectives. This implementation ensures that the Ollama LLM provider can properly utilize the new instructions functionality.

wren-ai-service/docs/config_examples/config.azure.yaml (1)

135-140: Well-implemented instructions pipeline for Azure

The new pipeline entries for instructions indexing and retrieval are properly configured with the Azure embedder and document store. This implementation maintains consistency with the other LLM providers and ensures that custom instructions can be incorporated during SQL generation when using Azure.

wren-ai-service/src/pipelines/retrieval/__init__.py (1)

2-2: Instructions retrieval pipeline properly integrated

The addition of the Instructions import and its inclusion in the __all__ list correctly exposes the instructions retrieval functionality. This complements the indexing pipeline and completes the implementation of the instructions feature.

Also applies to: 14-14

deployment/kustomizations/base/cm.yaml (1)

184-189: New pipeline entries for instructions handling look good

The addition of instructions_indexing and instructions_retrieval pipelines is well-structured and consistent with existing pipeline definitions. These entries use the same embedder and document store as other similar pipelines, which maintains consistency in the configuration.

wren-ai-service/src/pipelines/generation/followup_sql_generation.py (4)

78-78: Parameter addition for instructions looks good

The optional instructions parameter with proper type hints matches the PR's goal of incorporating custom instructions into the SQL generation process.

95-96: Correctly passing instructions to construct_instructions

The parameter is correctly passed to the construct_instructions function.

160-161: Parameter addition to run method looks good

The optional instructions parameter is consistently added to the class's run method with the same type definition as in the prompt function.

176-177: Correctly including instructions in pipeline inputs

The instructions parameter is properly included in the inputs dictionary passed to the pipeline execution.

wren-ai-service/src/pipelines/generation/sql_generation.py (4)

68-69: Parameter addition for instructions looks good

The optional instructions parameter with proper type hints matches the PR's goal of incorporating custom instructions into the SQL generation process.

81-82: Correctly passing instructions to construct_instructions

The parameter is correctly passed to the construct_instructions function.

148-149: Parameter addition to run method looks good

The optional instructions parameter is consistently added to the class's run method with the same type definition as in the prompt function.

161-162: Correctly including instructions in pipeline inputs

The instructions parameter is properly included in the inputs dictionary passed to the pipeline execution.

wren-ai-service/src/pipelines/generation/utils/sql.py (2)

537-538: Parameter addition to construct_instructions function looks good

The optional instructions parameter is added with appropriate type hints, maintaining consistency with changes in other files.

542-543: Good variable rename to avoid parameter name clash

Renaming the variable from instructions to _instructions avoids a name clash with the parameter of the same name.
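
The rename can be illustrated with a simplified sketch. The signature and prompt layout below are hypothetical; the actual `construct_instructions` takes other arguments that this PR page does not show.

```python
from typing import Optional


def construct_instructions(base_instructions: str, instructions: Optional[list] = None) -> str:
    """Append user-provided instructions to the base prompt text (hypothetical layout)."""
    # The accumulator is named `_instructions` so it does not shadow the
    # `instructions` parameter.
    _instructions = base_instructions
    if instructions:
        _instructions += "\n\nUser instructions:\n"
        _instructions += "\n".join(
            f"{index}. {item['instruction']}"
            for index, item in enumerate(instructions, start=1)
        )
    return _instructions
```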

wren-ai-service/src/web/v1/services/semantics_preparation.py (4)

92-96: Code reformatting looks good.

The reformatting of the assignment to self._prepare_semantics_statuses improves readability by properly structuring the multi-line assignment with parentheses.

100-108: Consistency in formatting applied correctly.

Similar to the previous block, this reformatting maintains consistent style throughout the file while improving readability.

138-138: Parameter addition supports extensibility.

The addition of **kwargs to the method signature allows for passing arbitrary keyword arguments, making the method more flexible for future extensions.

149-149: Instructions successfully integrated into deletion process.

Adding "instructions" to the list of names ensures that instruction data is properly cleaned up when semantics are deleted, which aligns with the PR objective of adding instruction handling capabilities.

wren-ai-service/src/web/v1/services/ask.py (2)

426-426: Instructions successfully integrated into followup SQL generation.

The retrieved instructions are correctly passed to the followup_sql_generation pipeline, aligning with the PR objective of incorporating custom instructions during the query generation process.

440-440: Instructions successfully integrated into SQL generation.

The retrieved instructions are correctly passed to the sql_generation pipeline, ensuring consistent instruction handling across both initial queries and follow-up queries.

wren-ai-service/src/pipelines/indexing/sql_pairs.py (2)

128-128: Default parameter increases flexibility.

Making the embedding parameter optional with a default empty dictionary provides more flexibility in how the function can be called, which is particularly useful when this pipeline is integrated with the new instructions functionality.

218-218: Default parameter consistency maintained.

Similarly, making the sql_pairs parameter in the clean method optional with a default empty list maintains consistency with the above change and increases the method's flexibility.

wren-ai-service/src/globals.py (4)

29-29: New service properly added to container class.

The addition of the instructions_service field to the ServiceContainer class properly extends the service architecture to support the new instructions functionality.

71-73: Instructions indexing pipeline correctly configured.

The instructions indexing pipeline is properly integrated into the semantics preparation service, which allows for managing instruction data alongside other semantics components.

99-103: Instructions retrieval pipeline well-configured with settings.

The code correctly initializes the instructions retrieval pipeline with configurable similarity threshold and top-k parameters, which aligns with the PR objective of adding new settings for managing instructions similarity thresholds and retrieval options.

247-254: Instructions service properly instantiated.

The instructions_service is correctly instantiated with the appropriate pipeline configuration, completing the integration of the new instructions functionality into the service container.

wren-ai-service/tests/pytest/pipelines/indexing/test_instructions.py (1)

44-49: Consider checking the exact content of document.meta.

You currently verify that specific keys exist in document metadata. For deeper coverage, consider asserting the exact values of instruction_id, instruction, or is_default to ensure the pipeline’s indexing logic is fully correct.

wren-ai-service/src/web/v1/services/instructions.py (1)

125-127: Validate instruction IDs.

Ensure that invalid or empty instruction IDs are effectively handled. Currently, the code does not appear to guard against empty or malformed IDs. If needed, add validation to prevent indexing or deleting instructions with invalid IDs.

wren-ai-service/tests/pytest/services/test_instructions.py (1)

229-246: Validate coverage for instructions referencing multiple questions.

Your tests demonstrate indexing a single question per instruction. Consider adding a case with multiple questions in a single instruction to confirm that each question is properly expanded into separate documents.

wren-ai-service/src/web/v1/routers/instructions.py (2)

79-82: Consider adding validation for instruction fields
Currently, the PostRequest model references InstructionsService.Instruction but does not impose additional constraints (e.g., max length). For robustness and security, ensure that the instruction data is validated (e.g., length, special characters) before indexing.

148-154: Handle missing event IDs more gracefully
Accessing container.instructions_service[event_id] may raise a KeyError if event_id is not found, leading to a 500 response. Consider adding explicit error handling to return a 404 or custom error response if the ID is invalid or no longer in memory.
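
A sketch of the suggested handling, using a plain dict in place of the service's event cache. The `lookup_event` helper and the response shape are hypothetical.

```python
def lookup_event(events: dict, event_id: str) -> tuple:
    """Return (status_code, body); a missing or expired event yields 404, not an unhandled KeyError."""
    try:
        event = events[event_id]
    except KeyError:
        return 404, {"event_id": event_id, "error": "event not found or expired"}
    return 200, event


events = {"evt-1": {"event_id": "evt-1", "status": "finished"}}
found = lookup_event(events, "evt-1")
missing = lookup_event(events, "evt-404")
```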

wren-ai-service/src/pipelines/indexing/instructions.py (2)

53-75: Confirm index existence before deletion
When deleting documents, ensure the targeted collection or index (instructions dataset) exists in the document store. Consider logging or handling the case where no matching documents are found for the specified IDs.

97-110: Consider concurrency checks for partial deletes
The clean function could experience conflicts if multiple concurrent deletions target different sets of instruction IDs. Consider concurrency handling or locking if partial overlap in instruction IDs might cause issues.

wren-ai-service/src/pipelines/retrieval/instructions.py (2)

38-54: Ensure count_documents aligns with indexing filter logic
count_documents applies a project ID filter. Confirm that the filters used here match the ones used during indexing so that the counted dataset is consistent with what was stored.

105-125: Use consistent approach for retrieving default instructions
default_instructions uses synchronous store.filter_documents, unlike the async calls in other pipeline steps. For consistency, consider bridging to an async method or describing why a sync call is acceptable here.

wren-ai-service/src/pipelines/retrieval/instructions.py

paopa added 16 commits March 7, 2025 16:54

chore: update the all config for instructions indexing

3dbaeb7

feat: instructions indexing and clean pipelien

249edeb

chore: update all config for instructions retrieval

12f2099

feat: implement instruction retrieval pipeline

2ae1054

chore: update config.yaml of docker and k8s for instructions indexing and retrieval pipe

f406c69

feat: implement instructions service

d8eefc2

feat: add instruction service into global

d173d1e

feat: implement instruction endpoint

bbca126

feat: trigger delete instruction from vector store when delete semantics

b92ff18

refactor: simplify the code base for deleting semantics

9fdcb94

feat: retrieve the default is true instructions

97ac8f0

chore: add a comment to explain default instruction and correct the document content for instruction

04590c1

feat: flatten the request questions to one-on-one question instruction pair

88c7efc

feat: retrieve the instructions for ask process

fef523c

feat: correct the instruction format for prompt and fix the wrong pass for object

277a4b5

chore: set the pipeline config to default embedder

da50bc0

paopa force-pushed the feat/instruction-pipeline-and-endpoints branch from 9127e1a to da50bc0 Compare March 7, 2025 08:59

paopa added 3 commits March 7, 2025 17:02

chore: early return when document count is 0

6ba52b6

feat: add test case for instructions indexing

d628a1f

feat: test case for instruction service

8719a3d

paopa added the module/ai-service and ci/ai-service labels Mar 7, 2025

paopa marked this pull request as ready for review March 7, 2025 09:22

coderabbitai bot reviewed Mar 7, 2025
