Add Search and AI Recommendations skill MCP-423 #9

Open
AyaanH123 wants to merge 7 commits into mongodb:main from AyaanH123:add-search-ai-recommendations-skill

@AyaanH123 (Collaborator)

No description provided.

- Rewrite description to be more concise and natural (634 → 506 chars)
- Remove hedging language (consider → review, may have → if users have)
- Trim verbose explanations in Core Principles, Discovery Phase, and Optimization sections
- Consolidate all reference file mentions into Common Use Cases section
- Simplify edge case descriptions and execution steps
- Move evals to testing directory
- Results: 247 fewer tokens (12,010 → 11,763), instruction specificity improved 0.68 → 0.86

@AyaanH123 force-pushed the add-search-ai-recommendations-skill branch from d37d575 to b3293ba on March 11, 2026 15:55

@awjian left a comment

Did a first pass; a few recommendations and corrections for accuracy.

Comment on lines +29 to +33
```javascript
      numCandidates: 150,
      limit: 50 // Get more candidates
    }
  },
  { $limit: 20 }, // But only keep top 20 for merging
```

`$vectorSearch.limit: 50` and `$limit: 20` is redundant

Suggested change

```diff
-      numCandidates: 150,
-      limit: 50 // Get more candidates
-    }
-  },
-  { $limit: 20 }, // But only keep top 20 for merging
+      numCandidates: 150, // Get more candidates
+      limit: 20
+    }
+  }
```
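The suggestion works because `$vectorSearch` already returns at most `limit` documents after considering `numCandidates` nearest neighbours, so a trailing `$limit` repeats work the stage has done. A minimal sketch of a stage builder that keeps the cap in one place; the helper name, the `vector_index` index name, and the `embedding` path are hypothetical. The two range checks mirror the documented rule that `numCandidates` can't be below `limit` and can't exceed 10,000.

```javascript
// Hypothetical helper: builds one $vectorSearch stage whose own `limit`
// is the only cap on results, so no trailing { $limit } stage is needed.
function buildVectorSearchStage(queryVector, { limit = 20, numCandidates = 150 } = {}) {
  // numCandidates may not be smaller than limit, and may not exceed 10,000.
  if (numCandidates < limit) {
    throw new RangeError("numCandidates must be >= limit");
  }
  if (numCandidates > 10000) {
    throw new RangeError("numCandidates must be <= 10000");
  }
  return {
    $vectorSearch: {
      index: "vector_index", // hypothetical index name
      path: "embedding",     // hypothetical vector field
      queryVector,
      numCandidates,         // nearest-neighbour candidates considered
      limit,                 // documents actually returned
    },
  };
}

// Usage in mongosh (collection name is hypothetical):
// db.products.aggregate([buildVectorSearchStage(queryVector)])
```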

```javascript
pipeline: [
  { $search: { /* ... */ } },
  { $limit: 20 } // Also limit lexical results
]
```

not sure how comprehensive you want to be, but you need to have the same `$addFields` for the `searchScore` here
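When `$vectorSearch` and `$search` results are merged by hand (here with `$unionWith`), each branch has to project its relevance score into the same field, or the lexical documents arrive without one. A sketch, assuming a shared `score` field name, hypothetical `vector_index`/`default` index names, and a hypothetical `description` text field:

```javascript
// Vector branch: expose the vector similarity score as `score`.
function vectorBranch(queryVector, limit = 20) {
  return [
    {
      $vectorSearch: {
        index: "vector_index", // hypothetical
        path: "embedding",     // hypothetical
        queryVector,
        numCandidates: limit * 10,
        limit,
      },
    },
    { $addFields: { score: { $meta: "vectorSearchScore" } } },
  ];
}

// Lexical branch: same field name, but the value comes from searchScore.
function lexicalBranch(queryText, limit = 20) {
  return [
    {
      $search: {
        index: "default", // hypothetical
        text: { query: queryText, path: "description" },
      },
    },
    { $limit: limit },
    { $addFields: { score: { $meta: "searchScore" } } },
  ];
}

// Manual merge: vector results first, lexical results appended after them.
function unionPipeline(queryVector, queryText, collName) {
  return [
    ...vectorBranch(queryVector),
    { $unionWith: { coll: collName, pipeline: lexicalBranch(queryText) } },
  ];
}
```

Even with matching field names, `vectorSearchScore` and `searchScore` are on different scales, so ranking the union by `score` compares unlike numbers; that is the gap the built-in fusion stages address.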

```javascript
      limit: 20
    }
  },
  { $limit: 10 },
```

this should be after `$rankFusion`
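`$rankFusion` builds one ranking from all of its input pipelines, so a cap on the number of fused results belongs after the stage rather than inside a single branch. A sketch, assuming a server version that supports `$rankFusion` (check the stage's reference page) and hypothetical index, field, and input-pipeline names; the equal weights are illustrative only:

```javascript
// Hybrid pipeline: two ranked inputs fused by $rankFusion, then a single
// $limit applied to the fused ranking.
function rankFusionPipeline(queryVector, queryText, { perBranch = 20, finalLimit = 10 } = {}) {
  return [
    {
      $rankFusion: {
        input: {
          pipelines: {
            // Each named input is a ranked pipeline.
            vector: [
              {
                $vectorSearch: {
                  index: "vector_index", // hypothetical
                  path: "embedding",     // hypothetical
                  queryVector,
                  numCandidates: perBranch * 10,
                  limit: perBranch,      // $vectorSearch caps itself
                },
              },
            ],
            text: [
              {
                $search: {
                  index: "default", // hypothetical
                  text: { query: queryText, path: "description" },
                },
              },
              { $limit: perBranch }, // $search has no limit option of its own
            ],
          },
        },
        combination: { weights: { vector: 1, text: 1 } },
      },
    },
    // Trim the fused ranking, not an individual branch.
    { $limit: finalLimit },
  ];
}
```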

Comment on lines +84 to +106

**Or use MongoDB's built-in score fusion:**
```javascript
db.collection.aggregate([
  {
    $vectorSearch: { /* ... */ }
  },
  { $group: { _id: null, docs: { $push: "$$ROOT" } } },
  { $unwind: "$docs" },
  { $replaceRoot: { newRoot: "$docs" } },
  {
    $unionWith: {
      coll: "collection",
      pipeline: [{ $search: { /* ... */ } }]
    }
  },
  {
    $group: {
      _id: "$_id",
      maxScore: { $max: "$score" },
      doc: { $first: "$$ROOT" }
    }
  }
])
```

this example doesn't use `$scoreFusion`?

### Limiting Results Between Stages

In hybrid search, limit results from each pipeline before combining:

shouldn't you use `$rankFusion` for these instead of `$unionWith`?
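The quoted "built-in score fusion" example merges results with `$group`/`$unionWith` and keeps a `$max` score, which is a manual merge rather than `$scoreFusion`. A sketch of the server-side stage, assuming a server version that supports `$scoreFusion` (check its reference page) and hypothetical index, field, and input names. `minMaxScaler` normalization rescales each input's scores before the weighted average, so the unbounded `searchScore` and the vector score become comparable; the 0.7/0.3 weights are illustrative, not a recommendation.

```javascript
// $scoreFusion: normalize each input's scores, then combine them as a
// weighted average, and cap the fused result set afterwards.
function scoreFusionPipeline(queryVector, queryText, { weights = { vector: 0.7, text: 0.3 }, finalLimit = 10 } = {}) {
  return [
    {
      $scoreFusion: {
        input: {
          pipelines: {
            vector: [
              {
                $vectorSearch: {
                  index: "vector_index", // hypothetical
                  path: "embedding",     // hypothetical
                  queryVector,
                  numCandidates: 200,
                  limit: 20,
                },
              },
            ],
            text: [
              {
                $search: {
                  index: "default", // hypothetical
                  text: { query: queryText, path: "description" },
                },
              },
              { $limit: 20 },
            ],
          },
          // Rescale each input's scores before combining them.
          normalization: "minMaxScaler",
        },
        combination: { weights, method: "avg" },
      },
    },
    { $limit: finalLimit },
  ];
}
```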

**Decision guide:**
prakul has done a lot of thinking on this for auto-embedding; I recommend checking with him on these guidelines

Comment on lines +191 to +216
### 2. Lexical Prefilters (Fast - Text Search Criteria)

Pre-filtering using full-text search before vector similarity computation.

**When to use:**
- Text search criteria ("documents that mention 'security'")
- Fuzzy text matching needed
- Complex text queries (phrase matching, wildcards)

**Performance:** ⚡⚡ Fast - filters before similarity, but text search has overhead

**Example:**
```javascript
db.collection.aggregate([
  {
    $vectorSearch: {
      queryVector: [...],
      path: "embedding",
      filter: {
        text: {
          query: "security",
          path: "description"
        }
      }
    }
  }
```

this is incorrect: lexical prefilters are supported by `$search`, not `$vectorSearch`
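What `$vectorSearch`'s own `filter` does accept is an MQL match expression over fields indexed with type `filter`, not Atlas Search operators such as `text`, so "documents that mention security" can't be expressed there. A sketch of the accepted form, assuming a hypothetical `category` field that was added to the vector index as a `filter` field (the 3-dimension vector is a toy value):

```javascript
// Hypothetical vector index definition: the vector field plus a filter field.
const vectorIndexDefinition = {
  fields: [
    { type: "vector", path: "embedding", numDimensions: 3, similarity: "cosine" },
    { type: "filter", path: "category" },
  ],
};

// $vectorSearch prefilter: an MQL expression on indexed filter fields only.
function filteredVectorStage(queryVector, categories) {
  return {
    $vectorSearch: {
      index: "vector_index", // hypothetical
      path: "embedding",
      queryVector,
      numCandidates: 100,
      limit: 10,
      filter: { category: { $in: categories } }, // MQL, not { text: ... }
    },
  };
}

// Lists the paths the index exposes for prefiltering.
function filterPaths(indexDef) {
  return indexDef.fields.filter((f) => f.type === "filter").map((f) => f.path);
}
```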

Comment on lines +286 to +289
```javascript
embeddingParameters: {
  model: "voyage-4",
  outputDimension: 512 // 256, 512, 1024, 2048, 4096
}
```

pretty sure this is hallucinated

embedding parameters are set at index time, not query time
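The auto-embedding index options the comment refers to aren't modelled here, but a regular vector index shows the same split: `numDimensions` and `similarity` live in the index definition, and a query only supplies a vector of matching length. A sketch that enforces this when building the stage; the helper name, field path, and 512-dimension value are hypothetical.

```javascript
// Index-time settings: dimension and similarity are fixed here.
const indexDef = {
  fields: [
    { type: "vector", path: "embedding", numDimensions: 512, similarity: "cosine" },
  ],
};

// Query time: only the vector and search knobs are supplied. Vectors whose
// length doesn't match the indexed dimension are rejected up front.
function vectorQueryFor(indexDefinition, path, queryVector, limit = 10) {
  const field = indexDefinition.fields.find((f) => f.type === "vector" && f.path === path);
  if (!field) throw new Error(`no vector field indexed at ${path}`);
  if (queryVector.length !== field.numDimensions) {
    throw new RangeError(`expected ${field.numDimensions} dims, got ${queryVector.length}`);
  }
  return {
    $vectorSearch: {
      index: "vector_index", // hypothetical
      path,
      queryVector,
      numCandidates: limit * 10,
      limit,
    },
  };
}
```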


---

## Similar Items (moreLikeThis)
remove this, not a frequently used operator

- `minimum`: Use lowest score from array elements
- `mean`: Average scores from array elements

**Key considerations:**
performance can degrade due to the complexity of parent-child joins
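The `minimum`/`mean` options quoted above are values of the `score.embedded.aggregate` setting on the `embeddedDocuments` operator, which scores each parent document from its matching child documents; that per-child scoring and roll-up is the parent-child work the comment flags. A sketch with hypothetical field names (`items`, `items.tags`), assuming `items` is indexed as `embeddedDocuments`:

```javascript
// Aggregation modes for rolling child scores up to the parent document.
const EMBEDDED_AGGREGATES = ["sum", "maximum", "minimum", "mean"];

// $search stage that matches child documents in `items` and scores each
// parent with the chosen aggregate of its matching children's scores.
function embeddedSearch(queryText, aggregate = "mean") {
  if (!EMBEDDED_AGGREGATES.includes(aggregate)) {
    throw new RangeError(`unsupported aggregate: ${aggregate}`);
  }
  return {
    $search: {
      index: "default", // hypothetical
      embeddedDocuments: {
        path: "items", // array of child documents
        operator: { text: { path: "items.tags", query: queryText } },
        score: { embedded: { aggregate } },
      },
    },
  };
}
```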

@nirinchev changed the title from "Add Search and AI Recommendations skill" to "Add Search and AI Recommendations skill MCP-423" on Mar 12, 2026

- Rename skill directory from search-and-ai-recommendations to search-and-ai
- Replace reference files with focused, topic-specific files: lexical-search-indexing.md, lexical-search-querying.md, vector-search.md, hybrid-search.md
- Update SKILL.md to direct agent to the correct reference file(s) per task rather than providing inline schemas
- Initial skill scope excludes Voyage API integration, auto-embedding, and reranking

@AyaanH123 requested review from a team as code owners on March 17, 2026 13:09
@AyaanH123 requested a review from awjian on March 17, 2026 13:12
@dacharyc (Collaborator) left a comment

Hey @AyaanH123 - overall this looks good, but I have a handful of suggestions for SKILL.md to move content out of the main skill body where it only applies to one of the search types, and a few similar optimizations we might make.

I'll follow up with specific suggestions related to the references.

@dacharyc (Collaborator) left a comment

A handful more comments across the reference files, which I didn't get to in the initial review.


---

## Query Optimization
were there sections on `$sort` and `$match` after `$search` here?
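For context on that question: MongoDB's Atlas Search performance guidance recommends not following `$search` with `$match` or `$sort`, and instead moving the filter into a `compound.filter` clause and the ordering into `$search`'s `sort` option so the work stays inside the search index. A sketch with hypothetical fields, assuming `status` is indexed as a `token` string field (required for `equals` on strings) and `releasedAt` as a date field:

```javascript
// Anti-pattern: filtering and sorting after $search run outside the index.
const slower = [
  { $search: { index: "default", text: { query: "security", path: "description" } } },
  { $match: { status: "active" } },
  { $sort: { releasedAt: -1 } },
];

// Preferred: non-scoring filter in compound.filter, ordering via $search sort.
function indexedSearch(queryText) {
  return [
    {
      $search: {
        index: "default", // hypothetical
        compound: {
          must: [{ text: { query: queryText, path: "description" } }],
          filter: [{ equals: { path: "status", value: "active" } }],
        },
        sort: { releasedAt: -1 },
      },
    },
    { $limit: 10 },
  ];
}
```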

@dacharyc (Collaborator) left a comment

Thanks for incorporating my feedback, @AyaanH123! I've done another pass here and it looks like you've successfully addressed everything I raised, and nothing new has been introduced, so I'm feeling good about this from my side.

If you end up making more substantive changes as a result of awjian's pending feedback, I'll probably want to make one final pass across those files, but I don't anticipate anything blocking so I'll go ahead and ✅ here for expediency.
