vsearch #174
Conversation
Caution: Review failed. The pull request is closed.

Walkthrough

Adds an optional, feature-gated vector search ("vsearch") using FastEmbed and Qdrant; updates dependencies and config; extends AppState and the controller for dual-mode search (keyword vs vsearch); and adds UI controls and styling to select and propagate search_type.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Controller
    participant Embedding as TextEmbedding
    participant Qdrant
    Client->>Controller: POST /search (query, search_type=vsearch)
    Controller->>Controller: trim & validate query
    Controller->>Embedding: embed(query)
    Embedding-->>Controller: embedding vector
    Controller->>Qdrant: search_points(embedding, limit+offset)
    Qdrant-->>Controller: point IDs & scores
    Controller->>Controller: collect IDs from payload or point IDs
    Controller-->>Client: results + search_type metadata
```
```mermaid
sequenceDiagram
    participant Client
    participant Controller
    participant Tantivy
    participant Qdrant as VSearch
    Client->>Controller: POST /search (query, search_type)
    Controller->>Controller: trim & validate query
    alt search_type == "keyword"
        Controller->>Tantivy: search(query, limit, offset)
        Tantivy-->>Controller: results
    else search_type == "vsearch"
        Controller->>Controller: embed(query)
        Controller->>VSearch: search_points(embedding, limit+offset)
        VSearch-->>Controller: results
    end
    Controller-->>Client: results + metadata (search_type, timing)
```
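For orientation, here is a minimal, self-contained sketch of the search_type decision shown in the second diagram. It mirrors the feature-gated check used by the PR's controller, but the function name and the `main` driver are illustrative only, not code from this PR:

```rust
/// Sketch: normalize the requested search_type the way the diagrams describe.
/// Anything other than an explicit "vsearch" request, or any build without the
/// `vsearch` feature, falls back to the keyword path.
fn normalize_search_type(requested: Option<&str>) -> &'static str {
    if cfg!(feature = "vsearch") && requested == Some("vsearch") {
        "vsearch"
    } else {
        "keyword"
    }
}

fn main() {
    // Without the feature enabled, both of these print "keyword".
    println!("{}", normalize_search_type(Some("vsearch")));
    println!("{}", normalize_search_type(None));
}
```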
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@src/controller.rs`:
- Around line 119-123: The embedding model is being constructed per-request via
TextEmbedding::try_new(...) (with InitOptions and EmbeddingModel::BGESmallZHV15)
which is expensive; instead initialize the TextEmbedding once at startup, add it
to AppState (e.g. an Arc<Mutex<TextEmbedding>> field), create it in main.rs
during bootstrap, and then in controller replace the per-request
TextEmbedding::try_new call with using
state.embedding_model.lock().unwrap().embed(...) (or equivalent locking) to
reuse the single shared model instance.
- Line 124: The code currently calls model.embed(vec![&search], None).unwrap()
and later uses query_vec.into_iter().next().unwrap(), which can panic on embed
failures or empty results; change these to propagate or handle errors instead:
replace the first unwrap by handling the Result from model.embed (using ? to
return an Err from the enclosing function or match/if let to log and return an
error) and replace the second unwrap by checking for Some(value) from
query_vec.into_iter().next() (e.g., match or if let Some(vec) { ... } else {
return Err(...) } ), ensuring the surrounding function signature returns a
Result or otherwise returns a controlled error path; reference model.embed,
query_vec, and the into_iter().next() usage when making these changes.
In `@templates/search.html`:
- Around line 48-51: The export anchor in templates/search.html currently builds
its href with only search and offset and therefore omits the search_type
parameter; update the export link generation (the <a> element with title
"...导出") to include &search_type={{ search_type }} so the exported results
respect the current search mode (keep using the existing variables search,
offset and search_type).
🧹 Nitpick comments (5)
src/config.rs (1)
15-16: Consider adding a serde default for smoother feature adoption. When users enable the `vsearch` feature, deserialization will fail if `qdrant_grpc` is missing from their existing `config.toml`. Adding a default value would provide a better experience:

♻️ Suggested improvement
```diff
 #[cfg(feature = "vsearch")]
+#[serde(default = "default_qdrant_grpc")]
 pub qdrant_grpc: String,
 }
+
+#[cfg(feature = "vsearch")]
+fn default_qdrant_grpc() -> String {
+    "http://localhost:6334".to_owned()
+}
```

src/bin/main.rs (1)
36-39: Consider adding context to the panic message for Qdrant initialization. Using `unwrap()` is acceptable for fail-fast behavior at startup (consistent with other initialization code), but adding context via `expect()` would help with debugging configuration issues.

💡 Suggested improvement
```diff
 #[cfg(feature = "vsearch")]
 let qclient = Qdrant::from_url(CONFIG.qdrant_grpc.as_str())
     .build()
-    .unwrap();
+    .expect("Failed to build Qdrant client");
```

src/controller.rs (3)
83-88: Minor: The `cfg!` macro evaluates at compile time, so the runtime check is partially redundant. When the vsearch feature is disabled, `cfg!(feature = "vsearch")` is `false`, so the condition short-circuits. This is fine but could be simplified.

💡 Alternative for clarity
```diff
-    let search_type =
-        if cfg!(feature = "vsearch") && input.search_type.as_deref() == Some("vsearch") {
-            "vsearch".to_owned()
-        } else {
-            "keyword".to_owned()
-        };
+    #[cfg(feature = "vsearch")]
+    let search_type = match input.search_type.as_deref() {
+        Some("vsearch") => "vsearch".to_owned(),
+        _ => "keyword".to_owned(),
+    };
+    #[cfg(not(feature = "vsearch"))]
+    let search_type = "keyword".to_owned();
```
141-151: Consider using a more defensive approach for point ID extraction. The nested `unwrap()` calls on lines 143-148 could panic if a point has no ID. While this should not happen in practice, defensive coding would prevent crashes.

💡 Safer extraction
```diff
 for point in &search_result.result {
-    let id = point
-        .id
-        .as_ref()
-        .unwrap()
-        .point_id_options
-        .as_ref()
-        .unwrap();
-    if let PointIdOptions::Num(id) = id {
-        ids.insert(*id as u32);
+    if let Some(point_id) = &point.id {
+        if let Some(PointIdOptions::Num(id)) = &point_id.point_id_options {
+            ids.insert(*id as u32);
+        }
     }
 }
```
127: Remove redundant limit computation in `SearchPointsBuilder`. `search_limit` is computed as `limit + offset` and passed to the constructor, but then immediately overridden by `.limit(limit)`. Pass `limit` directly to the constructor instead and remove the redundant `.limit()` call.

♻️ Suggested simplification
```diff
 let client = state.qclient;
-let search_limit = limit + offset;
 if let Ok(search_result) = client
     .search_points(
         SearchPointsBuilder::new(
             "cases",
             query_vec.into_iter().next().unwrap(),
-            search_limit as u64,
+            limit as u64,
         )
         .with_payload(false)
-        .limit(limit as u64)
         .offset(offset as u64),
     )
```
```rust
    } else {
        #[cfg(feature = "vsearch")]
        if search_type == "vsearch" {
            {
                let mut model = TextEmbedding::try_new(
                    InitOptions::new(EmbeddingModel::BGESmallZHV15)
                        .with_show_download_progress(true),
                )
                .unwrap();
                let query_vec = model.embed(vec![&search], None).unwrap();

                let client = state.qclient;
                let search_limit = limit + offset;
                if let Ok(search_result) = client
                    .search_points(
                        SearchPointsBuilder::new(
                            "cases",
                            query_vec.into_iter().next().unwrap(),
                            search_limit as u64,
                        )
                        .with_payload(false)
                        .limit(limit as u64)
                        .offset(offset as u64),
                    )
                    .await
                {
                    for point in &search_result.result {
                        let id = point
                            .id
                            .as_ref()
                            .unwrap()
                            .point_id_options
                            .as_ref()
                            .unwrap();
                        if let PointIdOptions::Num(id) = id {
                            ids.insert(*id as u32);
                        }
                    }
                } else {
                    tracing::error!("Qdrant search_points failed");
                }
            }
        }
```
total is never populated for vsearch, breaking pagination display.
For keyword search, total is set on line 98. For vsearch, total remains 0. This will likely confuse users and break pagination UI. Qdrant can return the total count if you need it.
Also, line 117 has a redundant check—search_type == "vsearch" is always true in this branch since we're in the else of search_type == "keyword".
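The comment does not show how to obtain that count. One possible approach, sketched under the assumption that the qdrant-client version used here exposes the count API (`CountPointsBuilder` and `Qdrant::count`), is to count the points in the `cases` collection, since every point is a candidate in vector search:

```rust
// Sketch only: populate `total` for the vsearch branch by asking Qdrant for
// the number of points in the collection. The count API usage is an
// assumption about the qdrant-client version, not code from this PR.
use qdrant_client::qdrant::CountPointsBuilder;
use qdrant_client::Qdrant;

async fn total_cases(client: &Qdrant) -> usize {
    match client.count(CountPointsBuilder::new("cases").exact(true)).await {
        Ok(response) => response.result.map(|r| r.count as usize).unwrap_or(0),
        Err(e) => {
            tracing::error!("Qdrant count failed: {e}");
            0
        }
    }
}
```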
💡 Fix for redundant check

```diff
 } else {
     #[cfg(feature = "vsearch")]
-    if search_type == "vsearch" {
     {
         // vsearch logic...
     }
-    }
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
    } else {
        #[cfg(feature = "vsearch")]
        {
            {
                let mut model = TextEmbedding::try_new(
                    InitOptions::new(EmbeddingModel::BGESmallZHV15)
                        .with_show_download_progress(true),
                )
                .unwrap();
                let query_vec = model.embed(vec![&search], None).unwrap();
                let client = state.qclient;
                let search_limit = limit + offset;
                if let Ok(search_result) = client
                    .search_points(
                        SearchPointsBuilder::new(
                            "cases",
                            query_vec.into_iter().next().unwrap(),
                            search_limit as u64,
                        )
                        .with_payload(false)
                        .limit(limit as u64)
                        .offset(offset as u64),
                    )
                    .await
                {
                    for point in &search_result.result {
                        let id = point
                            .id
                            .as_ref()
                            .unwrap()
                            .point_id_options
                            .as_ref()
                            .unwrap();
                        if let PointIdOptions::Num(id) = id {
                            ids.insert(*id as u32);
                        }
                    }
                } else {
                    tracing::error!("Qdrant search_points failed");
                }
            }
        }
```
```rust
let mut model = TextEmbedding::try_new(
    InitOptions::new(EmbeddingModel::BGESmallZHV15)
        .with_show_download_progress(true),
)
.unwrap();
```
Performance: Embedding model is initialized on every request.
TextEmbedding::try_new() loads model weights and is expensive (potentially hundreds of milliseconds or more). This should be initialized once at startup and shared via AppState, similar to how searcher is handled.
🔧 Recommended approach
Initialize the embedding model once at startup in main.rs and add it to AppState:
```rust
// In AppState (src/lib.rs):
#[cfg(feature = "vsearch")]
pub embedding_model: Arc<Mutex<TextEmbedding>>,

// In main.rs:
#[cfg(feature = "vsearch")]
let embedding_model = Arc::new(Mutex::new(
    TextEmbedding::try_new(
        InitOptions::new(EmbeddingModel::BGESmallZHV15)
            .with_show_download_progress(true),
    )
    .expect("Failed to initialize embedding model"),
));
```

Then in the controller, use `state.embedding_model.lock().unwrap().embed(...)`.
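A controller-side companion sketch, combining the shared model with non-panicking error handling as this review suggests. The helper name, the `Option` return, and the blocking `Mutex` are illustrative assumptions, not code from this PR:

```rust
use std::sync::{Arc, Mutex};

use fastembed::TextEmbedding;

/// Sketch only: embed the query through the shared model from AppState instead
/// of constructing TextEmbedding per request. Errors are logged and mapped to
/// None so the caller can render an empty result page rather than panic.
#[cfg(feature = "vsearch")]
fn embed_query(embedding_model: &Arc<Mutex<TextEmbedding>>, search: &str) -> Option<Vec<f32>> {
    let embedded = {
        // Hold the lock only for the duration of the embed call.
        let mut model = embedding_model.lock().unwrap();
        model.embed(vec![search], None)
    };
    match embedded {
        Ok(vectors) => vectors.into_iter().next().or_else(|| {
            tracing::error!("Empty embedding result");
            None
        }),
        Err(e) => {
            tracing::error!("Embedding failed: {e}");
            None
        }
    }
}
```

Since the lock is blocking, an async-aware lock or a `spawn_blocking` call may be a better fit inside the handler; either way the model is built once rather than per request.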
🤖 Prompt for AI Agents
In `@src/controller.rs` around lines 119 - 123, The embedding model is being
constructed per-request via TextEmbedding::try_new(...) (with InitOptions and
EmbeddingModel::BGESmallZHV15) which is expensive; instead initialize the
TextEmbedding once at startup, add it to AppState (e.g. an
Arc<Mutex<TextEmbedding>> field), create it in main.rs during bootstrap, and
then in controller replace the per-request TextEmbedding::try_new call with
using state.embedding_model.lock().unwrap().embed(...) (or equivalent locking)
to reuse the single shared model instance.
```rust
        .with_show_download_progress(true),
)
.unwrap();
let query_vec = model.embed(vec![&search], None).unwrap();
```
Multiple unwrap() calls can cause panics on malformed data.
Line 124: model.embed(...).unwrap() will panic if embedding fails.
Line 132: query_vec.into_iter().next().unwrap() will panic if the vector is empty.
Consider graceful error handling.
🔧 Suggested fix with error handling
```diff
-let query_vec = model.embed(vec![&search], None).unwrap();
+let query_vec = match model.embed(vec![&search], None) {
+    Ok(v) => v,
+    Err(e) => {
+        tracing::error!("Embedding failed: {e}");
+        return into_response(&SearchPage {
+            search,
+            search_type,
+            offset,
+            cases: vec![],
+            total: 0,
+            enable_vsearch: true,
+        });
+    }
+};
+
+let Some(embedding) = query_vec.into_iter().next() else {
+    tracing::error!("Empty embedding result");
+    return into_response(&SearchPage { /* ... */ });
+};
```

Also applies to: 132-132
🤖 Prompt for AI Agents
In `@src/controller.rs` at line 124, The code currently calls
model.embed(vec![&search], None).unwrap() and later uses
query_vec.into_iter().next().unwrap(), which can panic on embed failures or
empty results; change these to propagate or handle errors instead: replace the
first unwrap by handling the Result from model.embed (using ? to return an Err
from the enclosing function or match/if let to log and return an error) and
replace the second unwrap by checking for Some(value) from
query_vec.into_iter().next() (e.g., match or if let Some(vec) { ... } else {
return Err(...) } ), ensuring the surrounding function signature returns a
Result or otherwise returns a controlled error path; reference model.embed,
query_vec, and the into_iter().next() usage when making these changes.
Summary by CodeRabbit