Add Search and AI Recommendations skill MCP-423#9
AyaanH123 wants to merge 7 commits into mongodb:main
- Rewrite description to be more concise and natural (634 → 506 chars)
- Remove hedging language (consider → review, may have → if users have)
- Trim verbose explanations in Core Principles, Discovery Phase, and Optimization sections
- Consolidate all reference file mentions into Common Use Cases section
- Simplify edge case descriptions and execution steps
- Move evals to testing directory
- Results: 247 fewer tokens (12,010 → 11,763), instruction specificity improved 0.68 → 0.86
d37d575 to b3293ba
awjian left a comment
Did a first pass; a few recommendations and corrections for accuracy.
        numCandidates: 150,
        limit: 50 // Get more candidates
      }
    },
    { $limit: 20 }, // But only keep top 20 for merging
`$vectorSearch.limit: 50` followed by `$limit: 20` is redundant.
Suggested change:

    -     numCandidates: 150,
    -     limit: 50 // Get more candidates
    -   }
    - },
    - { $limit: 20 }, // But only keep top 20 for merging
    +     numCandidates: 150, // Get more candidates
    +     limit: 20
    +   }
    + }
    pipeline: [
      { $search: { /* ... */ } },
      { $limit: 20 } // Also limit lexical results
    ]
Not sure how comprehensive you want to be, but you need the same `$addFields` for the `searchScore` here.
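A minimal sketch of what this comment asks for, assuming the vector branch exposes its score in a field named `score` (the later `$max: "$score"` suggests so). The index name `default` and the query text are hypothetical placeholders, not values from the PR.

```javascript
// Lexical branch of the hybrid pipeline, written as a plain object so it can be
// inspected without a database connection.
// Assumptions (not from the PR): index name "default", the query on "description".
const lexicalBranch = {
  $unionWith: {
    coll: "collection",
    pipeline: [
      { $search: { index: "default", text: { query: "security", path: "description" } } },
      // Surface the Atlas Search relevance score under the same field name
      // the vector branch is assumed to use, so both branches can be merged on it.
      { $addFields: { score: { $meta: "searchScore" } } },
      { $limit: 20 } // Also limit lexical results
    ]
  }
};
```

Without the `$addFields` stage, documents coming from the lexical branch carry no `score` field, so any later stage that compares or aggregates `$score` only sees values from the vector branch.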
        limit: 20
      }
    },
    { $limit: 10 },
    **Or use MongoDB's built-in score fusion:**
    ```javascript
    db.collection.aggregate([
      {
        $vectorSearch: { /* ... */ }
      },
      { $group: { _id: null, docs: { $push: "$$ROOT" } } },
      { $unwind: "$docs" },
      { $replaceRoot: { newRoot: "$docs" } },
      {
        $unionWith: {
          coll: "collection",
          pipeline: [{ $search: { /* ... */ } }]
        }
      },
      {
        $group: {
          _id: "$_id",
          maxScore: { $max: "$score" },
          doc: { $first: "$$ROOT" }
        }
      }
    ])

    ### Limiting Results Between Stages

    In hybrid search, limit results from each pipeline before combining:
Shouldn't you use `$rankFusion` for these instead of `$unionWith`?
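For reference, a hedged sketch of how the same hybrid query might look with `$rankFusion`, assuming a server version that supports the stage. The index names, query text, placeholder vector, pipeline names (`vector`, `lexical`), and equal weights are all illustrative assumptions, not values from the PR.

```javascript
// Placeholder query vector; a real one comes from the embedding model.
const queryVector = [0.1, 0.2, 0.3];

// Hybrid search expressed with $rankFusion instead of $unionWith + $group.
// Each named sub-pipeline is ranked independently and the ranks are fused.
const hybridPipeline = [
  {
    $rankFusion: {
      input: {
        pipelines: {
          // Hypothetical index name "vector_index".
          vector: [
            {
              $vectorSearch: {
                index: "vector_index",
                path: "embedding",
                queryVector,
                numCandidates: 150,
                limit: 20
              }
            }
          ],
          // Hypothetical index name "default".
          lexical: [
            { $search: { index: "default", text: { query: "security", path: "description" } } },
            { $limit: 20 }
          ]
        }
      },
      // Equal weights are an assumption; tune per use case.
      combination: { weights: { vector: 1, lexical: 1 } }
    }
  },
  { $limit: 10 }
];
```

The array can be passed to `db.collection.aggregate(hybridPipeline)`. Because the stage fuses ranks itself, the manual `$group`/`$unwind`/`$replaceRoot` bookkeeping from the quoted example is not needed in this form.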
    }
    ```

    **Decision guide:**
prakul has done a lot of thinking on this for auto-embedding; I recommend checking with him on these guidelines.
    ### 2. Lexical Prefilters (Fast - Text Search Criteria)

    Pre-filtering using full-text search before vector similarity computation.

    **When to use:**
    - Text search criteria ("documents that mention 'security'")
    - Fuzzy text matching needed
    - Complex text queries (phrase matching, wildcards)

    **Performance:** ⚡⚡ Fast - filters before similarity, but text search has overhead

    **Example:**
    ```javascript
    db.collection.aggregate([
      {
        $vectorSearch: {
          queryVector: [...],
          path: "embedding",
          filter: {
            text: {
              query: "security",
              path: "description"
            }
          }
        }
      }
This is incorrect: lexical prefilters are supported by `$search`, not `$vectorSearch`.
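To illustrate the distinction, here is a hedged sketch of the kind of pre-filter the `$vectorSearch` stage does accept: an MQL-style comparison on a field indexed as a filter field. The field `category`, the index name, and the placeholder vector are hypothetical; the lexical form the reviewer refers to belongs under `$search` and is not shown here.

```javascript
// Placeholder query vector; a real one comes from the embedding model.
const queryVector = [0.1, 0.2, 0.3];

// $vectorSearch with an MQL pre-filter rather than a text operator.
// Assumption: "category" is declared as a filter field in the vector index
// (hypothetical index name "vector_index").
const filteredVectorSearch = [
  {
    $vectorSearch: {
      index: "vector_index",
      path: "embedding",
      queryVector,
      numCandidates: 150,
      limit: 20,
      // Comparison operators such as $eq, $in, $gte are the supported shape here;
      // a nested `text: { query, path }` operator is not.
      filter: { category: { $eq: "security" } }
    }
  }
];
```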
    embeddingParameters: {
      model: "voyage-4",
      outputDimension: 512 // 256, 512, 1024, 2048, 4096
    }
Embedding parameters are set at index time, not query time.
    ---

    ## Similar Items (moreLikeThis)
Remove this; it's not a frequently used operator.
    - `minimum`: Use lowest score from array elements
    - `mean`: Average scores from array elements

    **Key considerations:**
Performance can be degraded due to the complexity of parent-child joins.
- Rename skill directory from search-and-ai-recommendations to search-and-ai
- Replace reference files with focused, topic-specific files: lexical-search-indexing.md, lexical-search-querying.md, vector-search.md, hybrid-search.md
- Update SKILL.md to direct agent to the correct reference file(s) per task rather than providing inline schemas
- Initial skill scope excludes Voyage API integration, auto-embedding, and reranking
dacharyc left a comment
Hey @AyaanH123 - overall this looks good, but I have a handful of suggestions for SKILL.md: moving content out of the main skill body where it applies to only one of the search types, plus a few similar optimizations we might make.
I'll follow up with specific suggestions related to the references.
dacharyc left a comment
A handful more comments across the reference files, which I didn't get to in the initial review.
    ---

    ## Query Optimization
Were there sections on `$sort` and `$match` after `$search` here?
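For context on this question, a common Atlas Search guideline is to express filtering and sorting inside `$search` rather than in `$match` and `$sort` stages that follow it. A minimal sketch under stated assumptions: the index name `default`, the boolean field `published`, and the date field `publishedAt` are hypothetical and would need to be covered by the search index.

```javascript
// Filtering and sorting handled inside $search itself.
// Assumptions (hypothetical): index "default" maps "description" as text,
// "published" as boolean, and "publishedAt" as a sortable date.
const searchWithFilterAndSort = [
  {
    $search: {
      index: "default",
      compound: {
        // Scored text criterion.
        must: [{ text: { query: "security", path: "description" } }],
        // Non-scoring filter, used in place of a trailing $match.
        filter: [{ equals: { path: "published", value: true } }]
      },
      // Sort option on the stage, used in place of a trailing $sort.
      sort: { publishedAt: -1 }
    }
  },
  { $limit: 10 }
];
```

The pipeline contains no `$match` or `$sort` stage; it could be run with `db.collection.aggregate(searchWithFilterAndSort)`.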
dacharyc left a comment
Thanks for incorporating my feedback, @AyaanH123! I've done another pass here and it looks like you've successfully addressed everything I raised, and nothing new has been introduced, so I'm feeling good about this from my side.
If you end up making more substantive changes as a result of awjian's pending feedback, I'll probably want to make one final pass across those files, but I don't anticipate anything blocking so I'll go ahead and ✅ here for expediency.
No description provided.