Conversation
## Out of scope (for this skill)

- Performance Advisor / slow query logs
@davidjahn Any particular reason to exclude Performance Advisor from this skill? It's already possible to access PA recommendations from the MCP server if the user has configured MCP with an Atlas service account
Main reason is I'm having trouble getting Performance Advisor access to work w/ the MongoDB MCP server.
I'm planning to add Atlas Performance Advisor support in a fast followup PR. Wanted to start with something simple that is working
3. **Match or suggest an index**
   - **If an existing index can support the query**
   - Prefer equality on indexed fields, then range, then sort; compound indexes follow left-prefix.
you mean then sort and then range, right? (absent other info)
yes will fix, planning to finalize this initial PR (for only index suggestion support) today
3. **Match or suggest an index**
   - **If an existing index can support the query**
   - Prefer equality on indexed fields, then range, then sort; compound indexes follow left-prefix.
   - Shape the query so the planner can use that index (e.g. filter/sort order consistent with the index key).
sort order, yes but filter order doesn't matter...
yes will fix this
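For reference, the ordering the thread settles on (equality first, then sort, then range — the "ESR" rule) can be sketched as a small helper. This is an illustrative sketch, not a MongoDB API, and the field names (`status`, `createdAt`, `amount`) are hypothetical:

```javascript
// Sketch: build a compound index key following the ESR rule —
// Equality fields first, then Sort fields, then Range fields.
function esrIndexKey(equalityFields, sortSpec, rangeFields) {
  const key = {};
  for (const f of equalityFields) key[f] = 1;                    // equality predicates first
  for (const [f, dir] of Object.entries(sortSpec)) key[f] = dir; // sort fields, matching direction
  for (const f of rangeFields) key[f] = 1;                       // range predicates last
  return key;
}

// For db.orders.find({ status: "open", amount: { $gte: 100 } }).sort({ createdAt: -1 }):
const key = esrIndexKey(["status"], { createdAt: -1 }, ["amount"]);
// key → { status: 1, createdAt: -1, amount: 1 }
```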
- **Server**: `user-MongoDB`
- **Arguments**: `database` (string), `collection` (string)
- **Use**: Call after you know the target db and collection from the user's query or code. Use the returned `classicIndexes[].key` to see index key patterns; ignore search indexes unless the query is a search query.
what about various improvements for aggregations or find based on the actual query - things like suggesting missing projections for find or pointing out inefficiencies in aggregation pipelines? sort without limit, that sort of thing?
I'm limiting the scope of this initial PR to only suggesting indexes on queries.
I will add query structure suggestions in a followup PR this week if I have time (looking like I probably will have time to do this before EOW)
Pull request overview
Adds a new mongodb-query-optimizer agent skill intended to trigger only on explicit performance/indexing requests, and provides reference material + a small eval workspace to validate correct invocation/boundary behavior.
Changes:
- Introduces `skills/mongodb-query-optimizer/SKILL.md` with MCP branching guidance for `collection-indexes`, `explain`, and Atlas Performance Advisor.
- Adds optimization reference docs (aggregation, explain interpretation, covered queries, multikey/arrays).
- Adds a testing workspace + eval registry with iteration-1 cases for invocation/boundary checks.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 22 comments.
| File | Description |
|---|---|
| `skills/mongodb-query-optimizer/SKILL.md` | New skill definition and workflow (invocation rules, MCP tool usage, index principles). |
| `skills/mongodb-query-optimizer/references/aggregation-optimization.md` | New aggregation optimization reference. |
| `skills/mongodb-query-optimizer/references/covered-queries.md` | New covered-query reference. |
| `skills/mongodb-query-optimizer/references/explain-interpretation.md` | New explain-output interpretation reference. |
| `skills/mongodb-query-optimizer/references/multikey-arrays.md` | New multikey/array query reference. |
| `testing/mongodb-query-optimizer/evals/evals.json` | Eval registry describing the iteration-1 cases. |
| `testing/mongodb-query-optimizer/mongodb-query-optimizer-workspace/README.md` | Documentation for how to run the workspace evals. |
| `testing/mongodb-query-optimizer/mongodb-query-optimizer-workspace/iteration-1/benchmark.json` | Benchmark metadata for iteration-1. |
| `testing/mongodb-query-optimizer/mongodb-query-optimizer-workspace/iteration-1/why-slow/eval_metadata.json` | Case metadata for a “why is this slow” prompt. |
| `testing/mongodb-query-optimizer/mongodb-query-optimizer-workspace/iteration-1/slow-queries-cluster/eval_metadata.json` | Case metadata for “slow queries on my cluster” prompt. |
| `testing/mongodb-query-optimizer/mongodb-query-optimizer-workspace/iteration-1/optimize-this-query/eval_metadata.json` | Case metadata for optimizing a simple find+sort. |
| `testing/mongodb-query-optimizer/mongodb-query-optimizer-workspace/iteration-1/optimize-aggregation-pipeline/eval_metadata.json` | Case metadata for optimizing an aggregation pipeline. |
| `testing/mongodb-query-optimizer/mongodb-query-optimizer-workspace/iteration-1/negative-routine-find/eval_metadata.json` | Negative/boundary case metadata. |
| `testing/mongodb-query-optimizer/mongodb-query-optimizer-workspace/iteration-1/how-to-index/eval_metadata.json` | Case metadata for “how to index this $match” prompt. |
| `testing/mongodb-query-optimizer/mongodb-query-optimizer-workspace/iteration-1/fix-slow-queries/eval_metadata.json` | Case metadata for general slow-query help prompt. |
| `testing/mongodb-query-optimizer/mongodb-query-optimizer-workspace/iteration-1/explain-interpretation/eval_metadata.json` | Case metadata for explain interpretation prompt. |
| `testing/mongodb-query-optimizer/mongodb-query-optimizer-workspace/iteration-1/elemMatch-array-query/eval_metadata.json` | Case metadata for array correctness/perf prompt. |
| `testing/mongodb-query-optimizer/mongodb-query-optimizer-workspace/iteration-1/covered-query-optimization/eval_metadata.json` | Case metadata for covered-query optimization prompt. |
| `testing/mongodb-query-optimizer/mongodb-query-optimizer-workspace/iteration-1/array-field-query/eval_metadata.json` | Case metadata for array-field query optimization prompt. |
| `testing/mongodb-query-optimizer/mongodb-query-optimizer-workspace/iteration-1/aggregation-unwind-performance/eval_metadata.json` | Case metadata for $unwind performance prompt. |
Aggregation stages have a 100MB memory limit per stage (MongoDB 4.2+, configurable with `internalQueryExecMaxBlockingSortBytes`). When exceeded:
- Without `allowDiskUse`: pipeline fails
- With `allowDiskUse: true`: spills to disk (slower but completes)

**Use allowDiskUse when:**
- Processing large datasets that exceed memory limits
- $group or $sort operations on high-cardinality data
- Acceptable to trade speed for completion

**Better solutions:**
- Filter more aggressively early in pipeline
- Add indexes to enable `$sort` to use index order
- Increase available memory (Atlas tier upgrade)
- Consider materialized views for repeated aggregations
internalQueryExecMaxBlockingSortBytes is specifically a blocking sort limit; it doesn’t represent a universal “per stage” aggregation memory limit (e.g., $group has separate limits/knobs). Please adjust this text so it doesn’t point readers at the wrong setting and instead focuses on allowDiskUse + early filtering/indexing.
Suggested change:

Some aggregation operations (such as blocking `$sort` and certain `$group` patterns) have a 100MB in-memory limit per operation. When an operation exceeds its in-memory limit:
- Without `allowDiskUse`: the pipeline (or stage) fails
- With `allowDiskUse: true`: the operation can spill to disk (slower but completes)

**Use allowDiskUse when:**
- Processing large datasets that may exceed in-memory limits
- `$group` or `$sort` operations on high-cardinality or wide documents
- It is acceptable to trade speed for completion

**Better solutions:**
- Filter more aggressively early in the pipeline to reduce the working set
- Add or tune indexes so `$match` and `$sort` can use index order instead of in-memory sorts
- Increase available memory (for example, via an Atlas tier upgrade)
- Consider materialized views or pre-aggregated collections for repeated aggregations
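The "filter early, then allow spilling" advice above can be illustrated with a minimal pipeline shape. This is a sketch; the collection and field names (`orders`, `status`, `region`, `amount`) are hypothetical:

```javascript
// Sketch: filter early so the blocking $group sees a smaller working set,
// and permit spilling for large groups. All names here are hypothetical.
const pipeline = [
  { $match: { status: "shipped" } },                         // early filter: can use an index
  { $group: { _id: "$region", total: { $sum: "$amount" } } } // blocking stage runs on less data
];
// Run with spilling permitted for oversized groups:
// db.orders.aggregate(pipeline, { allowDiskUse: true })

// The shape check reviewers apply: the filter should come before blocking stages.
const filtersEarly = Object.keys(pipeline[0])[0] === "$match";
```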
{
  "id": "covered-query-optimization",
  "prompt": "I want to make this query as fast as possible without reading documents. Query: db.users.find({ status: 'active', plan: 'premium' }, { status: 1, plan: 1, email: 1, _id: 0 }).sort({ email: 1 }). Can this be a covered query?",
  "expect_skill_invocation": true,
The workspace README and testing/mongodb-query-optimizer/evals/evals.json use expect_invoke, but this case metadata uses expect_skill_invocation. If your eval harness keys off expect_invoke, these cases won’t be asserted correctly. Please standardize the field name across the registry + per-case eval_metadata.json (either rename here to expect_invoke or update the registry/docs/harness to match).
Suggested change:

"expect_invoke": true,
- **Invocation**: Cases with `expect_invoke: true` should trigger the mongodb-query-optimizer skill (explicit optimization / performance / slow-query asks).
- **Boundary**: Cases with `expect_invoke: false` should **not** trigger the optimizer when the user only wants a query generated without optimization language.

## Running evals

Use your agent harness or skill-validator flow to:

1. Load each `eval_metadata.json` prompt.
2. Assert the skill is selected when `expect_invoke` is true and not selected when false (or that the optimizer workflow is not applied for negative cases).
This README instructs using expect_invoke, but the per-case eval_metadata.json files use expect_skill_invocation. Please align the README with the actual schema (or rename the metadata field) so the instructions and fixtures are consistent.
Suggested change:

- **Invocation**: Cases with `expect_skill_invocation: true` should trigger the mongodb-query-optimizer skill (explicit optimization / performance / slow-query asks).
- **Boundary**: Cases with `expect_skill_invocation: false` should **not** trigger the optimizer when the user only wants a query generated without optimization language.

## Running evals

Use your agent harness or skill-validator flow to:

1. Load each `eval_metadata.json` prompt.
2. Assert the skill is selected when `expect_skill_invocation` is true and not selected when false (or that the optimizer workflow is not applied for negative cases).
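Until the field name is standardized one way or the other, a small normalization shim could keep a harness working with both spellings. This is a hypothetical helper (only the two field names from this PR are real); fixing the fixtures themselves is still the better outcome:

```javascript
// Sketch: accept both spellings used in this PR, preferring expect_invoke.
// Hypothetical shim — standardizing the fixtures is the real fix.
function normalizeCase(meta) {
  const out = { ...meta };
  if ("expect_skill_invocation" in out && !("expect_invoke" in out)) {
    out.expect_invoke = out.expect_skill_invocation;  // migrate legacy field
    delete out.expect_skill_invocation;
  }
  return out;
}

// normalizeCase({ id: "x", expect_skill_invocation: true })
//   → { id: "x", expect_invoke: true }
```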
4. **No array fields** are indexed (multikey indexes prevent covering)
5. **No sub-document fields** unless the exact sub-document path is indexed
This requirement incorrectly states that multikey indexes prevent covered queries. Multikey indexes can still cover queries under certain constraints (not returning the array field, avoiding $elemMatch, etc.). Please update this requirement so covered-query guidance matches MongoDB behavior.
Suggested change:

4. **Array fields / multikey indexes** are only used in ways that still allow covering (for example, you do **not** project the array field itself and you avoid operators like `$elemMatch` that require materializing array elements)
5. **Sub-document fields** are only used when the exact indexed path(s) provide all queried and returned values
Copilot is correct, original claim is wrong/too general.
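One necessary (not sufficient) condition for covering — every projected field present in the index key, with `_id` either excluded or indexed — can be sketched as a check. This is a hypothetical helper, not a MongoDB API:

```javascript
// Sketch: necessary-condition check for a covered projection — every
// included field must be in the index key, and _id must be excluded
// unless it is indexed. Hypothetical helper, not a MongoDB API.
function projectionCoverableBy(indexKey, projection) {
  const indexed = new Set(Object.keys(indexKey));
  // _id is returned by default; it must be excluded or present in the index
  if (!("_id" in projection) && !indexed.has("_id")) return false;
  for (const [field, included] of Object.entries(projection)) {
    if (field === "_id") {
      if (included && !indexed.has("_id")) return false; // _id: 0 is always fine
      continue;
    }
    if (included && !indexed.has(field)) return false;
  }
  return true;
}

// For the covered-query eval case above, with index { status: 1, plan: 1, email: 1 }:
// projectionCoverableBy({ status: 1, plan: 1, email: 1 },
//                       { status: 1, plan: 1, email: 1, _id: 0 }) → true
```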
**Limitations:**
- Cannot cover queries (must read documents)
- Compound multikey restriction: at most **one** array field per compound index
This bullet says multikey indexes cannot cover queries. MongoDB can produce covered queries with multikey indexes under certain constraints (e.g., avoid projecting the array field and avoid $elemMatch). Please correct/qualify this limitation so the skill’s guidance stays accurate.
{
  "id": "optimize-this-query",
  "prompt": "How do I optimize this MongoDB find? db.orders.find({ status: 'open' }).sort({ createdAt: -1 })",
  "expect_skill_invocation": true,
The workspace README and testing/mongodb-query-optimizer/evals/evals.json use expect_invoke, but this case metadata uses expect_skill_invocation. If your eval harness keys off expect_invoke, these cases won’t be asserted correctly. Please standardize the field name across the registry + per-case eval_metadata.json (either rename here to expect_invoke or update the registry/docs/harness to match).
Suggested change:

"expect_invoke": true,
{
  "id": "optimize-aggregation-pipeline",
  "prompt": "Can you optimize this aggregation pipeline? It's running slow: db.orders.aggregate([{ $lookup: { from: 'customers', localField: 'customerId', foreignField: '_id', as: 'customer' } }, { $match: { status: 'shipped' } }, { $group: { _id: '$customer.region', total: { $sum: '$amount' } } }])",
  "expect_skill_invocation": true,
The workspace README and testing/mongodb-query-optimizer/evals/evals.json use expect_invoke, but this case metadata uses expect_skill_invocation. If your eval harness keys off expect_invoke, these cases won’t be asserted correctly. Please standardize the field name across the registry + per-case eval_metadata.json (either rename here to expect_invoke or update the registry/docs/harness to match).
Suggested change:

"expect_invoke": true,
{
  "id": "elemMatch-array-query",
  "prompt": "Why is my array query returning wrong results? db.orders.find({ 'items.sku': 'ABC123', 'items.quantity': { $gte: 5 } }). Sometimes it matches orders where ABC123 has quantity 2 and a different item has quantity 10.",
  "expect_skill_invocation": true,
The workspace README and testing/mongodb-query-optimizer/evals/evals.json use expect_invoke, but this case metadata uses expect_skill_invocation. If your eval harness keys off expect_invoke, these cases won’t be asserted correctly. Please standardize the field name across the registry + per-case eval_metadata.json (either rename here to expect_invoke or update the registry/docs/harness to match).
Suggested change:

"expect_invoke": true,
- More index entries per document (proportional to array size)
- Larger index size on disk and in memory
- Higher write cost (must update multiple index entries per document)
- Cannot cover queries (MongoDB limitation)
This list of multikey “Costs” says multikey indexes cannot cover queries. MongoDB can produce covered queries using a multikey index under specific constraints (e.g., don’t project the array field and avoid $elemMatch). Please qualify or correct this statement so readers don’t incorrectly rule out covered-query optimizations.
Suggested change:

- Covered queries are limited: multikey indexes can only cover a query when the indexed array field itself is not projected and operators such as `$elemMatch` are avoided
**Requirements:**
- All queried fields are in the index
- All returned fields are in the index (or `_id` only)
- Projection must exclude `_id` unless `_id` is in the index
- No array fields indexed (multikey indexes cannot cover)
This requirement says multikey indexes cannot cover queries. MongoDB can produce covered queries with multikey indexes under specific constraints (e.g., don’t project the array field and avoid $elemMatch). Please correct/qualify this so the skill doesn’t incorrectly block covered-query advice.
asya999
left a comment
I’ll leave more comments later today
## Minimize $unwind usage
That’s not a thing. $unwind is not expensive. However the match before unwind example is good.
ok going to rephrase this here
**Consider alternatives:**
- `$filter` array operator to process arrays without unwinding
This is only appropriate if there is an unwind and then group by _id
ok going to specify that here
```javascript
  { $match: { active: true } },        // Reduce left side
  { $lookup: {
      from: "inventory",
      localField: "product_id",        // Ensure this is indexed
      foreignField: "_id",             // _id is always indexed
      pipeline: [
        { $match: { inStock: true } }, // Reduce right side
        { $project: { name: 1, price: 1 } } // Limit fields
```
Probably can exclude _id?
```javascript
        { $project: { name: 1, price: 1 } } // Limit fields
      ],
      as: "product"
  }}
```
Also I would expect to see $unwind here since there is only one product.
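Applying both comments above (dropping `_id` from the inner projection, and flattening the single-element lookup result with `$unwind`) might look like this. The collection and field names come from the quoted diff; this is a sketch, not the final PR content:

```javascript
// Sketch of the reviewed $lookup with the two suggested tweaks applied:
// exclude _id in the inner $project, and $unwind the single-element result.
const stages = [
  { $match: { active: true } },                        // reduce left side
  { $lookup: {
      from: "inventory",
      localField: "product_id",                        // should be indexed
      foreignField: "_id",                             // _id is always indexed
      pipeline: [
        { $match: { inStock: true } },                 // reduce right side
        { $project: { _id: 0, name: 1, price: 1 } }    // limit fields, drop _id
      ],
      as: "product"
  }},
  { $unwind: "$product" }                              // one product expected → flatten
];
```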
1. **Filter before grouping** to reduce input documents
2. **Use selective _id** for grouping to reduce number of groups
3. **Project only needed fields** before $group to reduce memory per document
That’s wrong. Only needed fields should be kept in group but no projections should be done before group.
```javascript
[
  { $match: { date: { $gte: ISODate("2024-01-01") } } }, // Reduce input
  { $project: { category: 1, amount: 1 } }, // Only needed fields
```
Anti pattern - this is literally an example of wrong pipeline.
This should be example of wrong in the “don’t use $project unnecessarily or too early”
**Causes:**
- No suitable index exists
- Query optimizer chose not to use available index (rare)
- Query shape doesn't match any index prefix
Same as no suitable index.
### SORT (in-memory sort)

**Expensive:** Sorting results in memory after retrieval.
**Why this is costly:**
- Uses memory (32MB limit per sort in MongoDB 3.4+, configurable with `internalQueryExecMaxBlockingSortBytes`)
Remove internal config reference. Limit is 100MBs.
**Why this is costly:**
- Uses memory (32MB limit per sort in MongoDB 3.4+, configurable with `internalQueryExecMaxBlockingSortBytes`)
- Requires buffering all results before returning first document
- Blocking operation (can't stream results)
These two say the same thing.
### SHARDING_FILTER

In sharded clusters, filters out documents not owned by the shard. Normal overhead in sharded environment.
ugh - this has more or less overhead depending on the index used. They should prefer to have an index with shard key fields so the filtering can happen against the index, but I'm not sure if there is a concise way to say it - an example could help.
dacharyc
left a comment
I've got a handful of comments so far from the review-skill analysis/LLM-as-judge scoring - I'll take a deeper look later today for things that the skill won't catch, but wanted to give you this as a starting point.
- **`explain`** — Invoke with server = MongoDB MCP server, toolName = `explain`, arguments = object with `database`, `collection`, `method` (one find/aggregate/count shape), and `verbosity`.
  - **Always load `references/explain-interpretation.md`** when running explain to interpret output correctly
  - **Verbosity selection:**
    - Use `"executionStats"` when query likely completes in <10 seconds (small collections, simple queries, good indexes)
For this path, "query likely completes in <10 seconds" - an agent can't estimate this. Can we replace with concrete criteria (e.g., collection size, index availability)?
shouldn't it just use maxTimeMs or whatever client short-circuit timeout in ALL cases?
- **Running explain()** → **Always** load `references/explain-interpretation.md` when calling the explain MCP tool
- **Aggregation pipelines** → Load `references/aggregation-optimization.md` **first** when dealing with aggregate pipelines
- **Queries on array fields** → Load `references/multikey-arrays.md` when fields have array values
- **Covered query optimization** → Load `references/covered-queries.md` when the query is able to have all fields fully covered
For this line, "when the query is able to have all fields fully covered" - the agent can't determine this pre-analysis. Can we add observable triggers (user asks about covered queries, or explain shows FETCH with eliminable fields)?
| Operation value | Use when |
|----------------|----------|
| `slowQueryLogs` | User asks what’s slow on the cluster, which queries to fix first—**prioritize by slowest and most frequent** when the response exposes those dimensions. Optional: `namespaces` to scope to a collection; `since` for a time window. |
| `suggestedIndexes` | PA index recommendations—**validate** they apply to the user’s query and will have good impact before recommending creation. |
For this line, "validate they apply to the user's query and will have good impact" - "good impact" is undefined. Can we be specific about what the agent should check for? Maybe something like: confirm the suggested index matches the user's query predicates and sort.
yeah I'm a bit concerned it'll get hit by one of the existing bugs that causes "bad" PA index recommendations... but I'd hate to see us encode that here rather than just fixing them...
Index `{a:1, b:1}` supports `sort({a:1, b:1})` and reverse `sort({a:-1, b:-1})`, but NOT mixed directions like `sort({a:1, b:-1})`. For mixed sorts, create index matching exact pattern.

### Covered queries
For these two sections - the inline "Covered queries" section (lines 134–144) and "Array fields and multikey indexes" section (lines 146–156) duplicate their respective reference files. Since SKILL.md loads on every invocation (~2,390 tokens) and references load conditionally, can we consider trimming these to one-line triggers?
Agreed but also a lot of things are not quite correct and I'd prefer we have to fix it in one place only, not two.
The review-skill skill's LLM-as-judge-scoring notes this file doesn't contain any novel information - it's just restating things that are covered in MongoDB documentation + is already in the model's base training data set. Can we examine this content and see if any of it can be trimmed, or if the entire file could be removed from the skill? Did you do any testing with and without the skill loaded for concepts covered in this file to see if including it makes any difference in agent performance on tasks that would reference this content?
This reference file scored a "3" on Novelty in the LLM-as-judge scoring, which meets the threshold for novelty but hints that it is not highly novel. The takeaway is that some of this content may be restating standard MongoDB documentation/data that is already in the LLM base training data. Have we tested agent performance with and without this file included in the skill to see if it actually improves outputs? I'm wondering if we can trim some of the non-novel content and still see any output improvement, or if we should consider restructuring to be more workflow-focused or otherwise find ways to emphasize any novel guidance aspects we're including here. Or if it may make sense to omit this reference if it doesn't actually improve agent outputs.
In addition, some of the things that are flagged as being "novel" are incorrect or should not be mentioned as they are not to be changed by user/agent (like the internal sort/group settings).
}
}
}
}
Is this a real example? The query isn’t included. 28 keys and 25 docs examined seems sus also.
- Queries returning most documents
- Collection fits entirely in cache

**Optimizer may choose COLLSCAN if:**
Never when there is a qualified index
```javascript
db.orders.aggregate([
  { $lookup: { ... } }, // Processes all documents first
  { $match: { status: "shipped" } } // Too late, already did expensive work
])
```
not a good example as in current version we automatically swap $match in front of $lookup in this case.
```javascript
db.orders.aggregate([
  { $match: { status: "shipped", region: "US" } }, // Uses index, reduces data early
  { $lookup: { ... } },
```
should there be an unwind here? The only way this pipeline makes sense is if $group is grouping on something that came out of $lookup and grouping by array things is ... uncommon.
### Cardinality and selectivity

High-cardinality fields (many distinct values) are more selective—prefer for equality predicates. Low-cardinality fields (e.g., `status`) help in compound indexes. Good selectivity: `totalKeysExamined` ≈ `nReturned` in explain output.
wait, no. All equality fields are equally valuable - selectivity only matters overall for the query, not for individual fields...
Index `{a:1, b:1}` supports `sort({a:1, b:1})` and reverse `sort({a:-1, b:-1})`, but NOT mixed directions like `sort({a:1, b:-1})`. For mixed sorts, create index matching exact pattern.

### Covered queries
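The direction rule quoted above (same directions, or all reversed, over a left prefix of the key) can be sketched as a small check. This is a hypothetical helper for illustrating the rule, not a planner API, and it deliberately ignores cases where equality predicates free up prefix fields:

```javascript
// Sketch: an index key supports a sort when the sort fields are a left
// prefix of the key and directions match exactly or are all inverted.
// Hypothetical helper — not a MongoDB/planner API.
function indexSupportsSort(indexKey, sortSpec) {
  const idxFields = Object.keys(indexKey);
  const sortFields = Object.keys(sortSpec);
  if (!sortFields.every((f, i) => idxFields[i] === f)) return false; // left-prefix check
  const ratios = sortFields.map(f => sortSpec[f] / indexKey[f]);
  return ratios.every(r => r === 1) || ratios.every(r => r === -1);   // same or fully reversed
}

// indexSupportsSort({ a: 1, b: 1 }, { a: 1, b: 1 })   → true
// indexSupportsSort({ a: 1, b: 1 }, { a: -1, b: -1 }) → true
// indexSupportsSort({ a: 1, b: 1 }, { a: 1, b: -1 })  → false
```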
"iteration": "iteration-1",
"path": "iteration-1/array-field-query",
"prompt": "How do I optimize a query on an array field? db.products.find({ tags: 'electronics', 'specs.features': { $all: ['wifi', 'bluetooth'] } }). The tags array has hundreds of values per document.",
"expect_invoke": true,
there is some confusion here. Tags is an array but "specs.features" is being queried with an array operator...
"path": "iteration-1/aggregation-unwind-performance",
"prompt": "My aggregation is timing out. It uses $unwind on an array with thousands of elements per document. Pipeline: [{ $unwind: '$events' }, { $match: { 'events.type': 'purchase' } }, { $group: { _id: '$userId', count: { $sum: 1 } } }]. How do I fix this?",
"expect_invoke": true,
"tags": ["aggregation", "unwind", "performance", "optimization", "references"]
when we remove unwind section from agg section we should ensure all the tags referencing it are also removed - the issue here is not $unwind it's the fact that there is no $match in front of it.
"prompt": "My aggregation is timing out. It uses $unwind on an array with thousands of elements per document. Pipeline: [{ $unwind: '$events' }, { $match: { 'events.type': 'purchase' } }, { $group: { _id: '$userId', count: { $sum: 1 } } }]. How do I fix this?",
"expect_skill_invocation": true,
"rationale": "Performance issue with aggregation pipeline — skill should invoke and load references/aggregation-optimization.md. Key problem: $unwind multiplies document count before filtering. Should suggest: 1) Add $match before $unwind to reduce documents, 2) Consider $filter array operator instead of $unwind + $match, 3) May need index on userId for $group, 4) Discuss allowDiskUse if needed. Should explain $unwind performance implications with large arrays.",
"optional_mcp": ["collection-indexes", "explain"]
all of this is incorrect except for adding $match before $unwind - same as previous example so I'm not sure the value of having them both. You don't need index on userId for $group, it cannot be used here.
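The one fix the reviewer endorses — a `$match` in front of `$unwind` — could look like this for the eval prompt's pipeline. The fields come from the prompt quoted above; keeping the second `$match` after `$unwind` preserves per-element filtering:

```javascript
// Sketch: filter whole documents before $unwind so fewer documents are
// exploded; keep the second $match to filter the unwound elements.
// Pipeline fields come from the eval prompt quoted above.
const rewritten = [
  { $match: { "events.type": "purchase" } },   // drop documents with no purchases at all
  { $unwind: "$events" },
  { $match: { "events.type": "purchase" } },   // keep only the purchase elements
  { $group: { _id: "$userId", count: { $sum: 1 } } }
];
```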
Pull request overview
Copilot reviewed 22 out of 22 changed files in this pull request and generated 11 comments.
| ) | ||
| ``` | ||
|
|
||
| **Why:** `findOneAndUpdate` writes a copy of the pre-change document to a side collection for retryable writes. This overhead is unnecessary if you don't need the returned document. No newline at end of file |
The rationale for avoiding findOneAndUpdate is overstated as written. The extra overhead (returning the document and, for retryable writes, possibly recording pre/post images) is conditional and not simply “writes a copy … to a side collection” in all cases. Please tighten this explanation to describe the actual conditions that introduce the extra work and keep the recommendation precise.
| **Why:** `findOneAndUpdate` writes a copy of the pre-change document to a side collection for retryable writes. This overhead is unnecessary if you don't need the returned document. | |
| **Why:** `findOneAndUpdate` must build and return a result document and, when retryable writes with pre/post images are enabled, may perform additional internal bookkeeping to record those images. If you don't need the modified document back, `updateOne` avoids this extra work. |
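A short sketch of the contrast the suggestion describes (collection and field names are hypothetical):

```javascript
// Need the modified document back: findOneAndUpdate builds and
// returns a result document.
const doc = db.users.findOneAndUpdate(
  { _id: userId },
  { $inc: { loginCount: 1 } },
  { returnDocument: 'after' }
)

// Don't need it back: updateOne skips that work.
db.users.updateOne({ _id: userId }, { $inc: { loginCount: 1 } })
```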
| { | ||
| "id": "optimize-this-query", | ||
| "prompt": "How do I optimize this MongoDB find? db.orders.find({ status: 'open' }).sort({ createdAt: -1 })", | ||
| "expect_skill_invocation": true, |
This case uses expect_skill_invocation, while the registry (testing/mongodb-query-optimizer/evals/evals.json) and workspace README refer to expect_invoke. Align the field name across the workspace so eval runners don’t need special-casing.
| "expect_skill_invocation": true, | |
| "expect_invoke": true, |
| **Optimizer willmay choose COLLSCAN in absence of an eligible index.f:** | ||
|
|
||
| - Index selectivity is poor for this query |
This heading sentence is garbled, which makes the guidance hard to read and search. Please fix the wording/typos so it reads as a coherent statement about when the optimizer may pick a COLLSCAN.
| Blocking stages (such as in-memory $sort and $group) have a 100MB memory limit per stage. Now, the default behavior when this limit is exceeded is to spill to disk automatically (allowDiskUse defaults to true). | ||
|
|
This states that spilling to disk is automatic because allowDiskUse “defaults to true”. That’s not generally correct—disk spilling for blocking stages depends on allowDiskUse (or a server-level default like allowDiskUseByDefault), and many deployments default to not spilling. Please adjust this to describe the actual condition under which spilling occurs, and mention how to enable it if desired.
| Blocking stages (such as in-memory $sort and $group) have a 100MB memory limit per stage. Now, the default behavior when this limit is exceeded is to spill to disk automatically (allowDiskUse defaults to true). | |
| Blocking stages (such as in-memory $sort and $group) have a 100MB memory limit per stage. Whether they can spill to disk when this limit is exceeded depends on the aggregation's allowDiskUse setting (or a server-level default such as allowDiskUseByDefault). | |
| If allowDiskUse is false and a blocking stage exceeds its memory limit, the operation fails with an error instead of spilling. To allow spilling, run the aggregation with { allowDiskUse: true } or configure an appropriate server-level default. |
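To make the corrected behavior concrete, a per-operation opt-in might look like this (the pipeline is illustrative):

```javascript
// If a blocking $group or $sort exceeds its 100MB limit and
// allowDiskUse is not enabled (per operation or via a server-level
// default such as allowDiskUseByDefault), the aggregation fails
// rather than spilling.
db.orders.aggregate(
  [
    { $group: { _id: '$region', total: { $sum: '$amount' } } },
    { $sort: { total: -1 } }
  ],
  { allowDiskUse: true }  // allow blocking stages to spill to disk
)
```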
|
|
||
| - **Equality** fields first (e.g. `{field: value}`, `{$in: [...]}` matching \<= 200 elements, `{field: {$eq: value}}`) | ||
| - **Sort** fields next | ||
| - **Range** fields last (e.g. `$gt`, `$lt`, `$gte`, `$lte`, `{$in: [...]}`matching \> 200 elements in the array), `$ne`) |
The ESR bullet for “Range” has a typo and mismatched punctuation/parentheses (e.g., {$in: [...]}matching ...), which makes the rule harder to parse. Please fix spacing and the parenthesis structure so the examples read cleanly and unambiguously.
| - **Range** fields last (e.g. `$gt`, `$lt`, `$gte`, `$lte`, `{$in: [...]}`matching \> 200 elements in the array), `$ne`) | |
| - **Range** fields last (e.g. `$gt`, `$lt`, `$gte`, `$lte`, `{$in: [...]}` matching \> 200 elements in the array, `$ne`) |
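Once the punctuation is fixed, the rule could also be illustrated with a hedged sketch (collection and field names are hypothetical):

```javascript
// ESR ordering: Equality (status), Sort (total), Range (createdAt).
db.orders.createIndex({ status: 1, total: -1, createdAt: 1 })
db.orders
  .find({ status: 'open', createdAt: { $gte: new Date('2024-01-01') } })
  .sort({ total: -1 })
```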
| "id": "covered-query-optimization", | ||
| "prompt": "I want to make this query as fast as possible without reading documents. Query: db.users.find({ status: 'active', plan: 'premium' }, { status: 1, plan: 1, email: 1, _id: 0 }).sort({ email: 1 }). Can this be a covered query?", | ||
| "expect_skill_invocation": true, | ||
| "rationale": "User asks about covered query optimization explicitly. Skill should invoke and load references/covered-queries.md. Should explain requirements (all fields in index, _id: 0), suggest compound index { status: 1, plan: 1, email: 1 } following ESR, verify with explain showing totalDocsExamined: 0.", |
The rationale references references/covered-queries.md, but that file does not exist under skills/mongodb-query-optimizer/references/. Update the reference to an existing doc (e.g., the “Covered Queries” section in references/core-indexing-principles.md) or add the missing reference file so the eval guidance is actionable.
| "rationale": "User asks about covered query optimization explicitly. Skill should invoke and load references/covered-queries.md. Should explain requirements (all fields in index, _id: 0), suggest compound index { status: 1, plan: 1, email: 1 } following ESR, verify with explain showing totalDocsExamined: 0.", | |
| "rationale": "User asks about covered query optimization explicitly. Skill should invoke and load references/core-indexing-principles.md (Covered Queries section). Should explain requirements (all fields in index, _id: 0), suggest compound index { status: 1, plan: 1, email: 1 } following ESR, verify with explain showing totalDocsExamined: 0.", |
| "id": "optimize-aggregation-pipeline", | ||
| "prompt": "Can you optimize this aggregation pipeline? It's running slow: db.orders.aggregate([{ $lookup: { from: 'customers', localField: 'customerId', foreignField: '_id', as: 'customer' } }, { $match: { status: 'shipped' } }, { $group: { _id: '$customer.region', total: { $sum: '$amount' } } }])", | ||
| "expect_skill_invocation": true, | ||
| "rationale": "Explicit 'optimize' + aggregation pipeline — skill should invoke and MUST load references/aggregation-optimization.md. Key issue: $match after $lookup (should be before). Should suggest moving $match earlier and checking indexes.", |
The rationale says to load references/aggregation-optimization.md, but the repo contains references/aggregation-pipeline-examples.md instead (and no aggregation-optimization.md). Update this reference so the eval points at a real file, otherwise the evaluation instructions are impossible to follow.
| "rationale": "Explicit 'optimize' + aggregation pipeline — skill should invoke and MUST load references/aggregation-optimization.md. Key issue: $match after $lookup (should be before). Should suggest moving $match earlier and checking indexes.", | |
| "rationale": "Explicit 'optimize' + aggregation pipeline — skill should invoke and MUST load references/aggregation-pipeline-examples.md. Key issue: $match after $lookup (should be before). Should suggest moving $match earlier and checking indexes.", |
| "id": "aggregation-unwind-performance", | ||
| "prompt": "My aggregation is timing out. It uses $unwind on an array with thousands of elements per document. Pipeline: [{ $unwind: '$events' }, { $match: { 'events.type': 'purchase' } }, { $group: { _id: '$userId', count: { $sum: 1 } } }]. How do I fix this?", | ||
| "expect_skill_invocation": true, | ||
| "rationale": "Performance issue with aggregation pipeline — skill should invoke and load references/aggregation-optimization.md. Key problem: $unwind multiplies document count before filtering. Should suggest: 1) Add $match before $unwind to reduce documents, 2) Consider $filter array operator instead of $unwind + $match, 3) May need index on userId for $group, 4) Discuss allowDiskUse if needed. Should explain $unwind performance implications with large arrays.", |
The rationale says to load references/aggregation-optimization.md, but that file does not exist under skills/mongodb-query-optimizer/references/ (the closest match is aggregation-pipeline-examples.md). Update the reference so this eval metadata doesn’t point to a missing file.
| "rationale": "Performance issue with aggregation pipeline — skill should invoke and load references/aggregation-optimization.md. Key problem: $unwind multiplies document count before filtering. Should suggest: 1) Add $match before $unwind to reduce documents, 2) Consider $filter array operator instead of $unwind + $match, 3) May need index on userId for $group, 4) Discuss allowDiskUse if needed. Should explain $unwind performance implications with large arrays.", | |
| "rationale": "Performance issue with aggregation pipeline — skill should invoke and load references/aggregation-pipeline-examples.md. Key problem: $unwind multiplies document count before filtering. Should suggest: 1) Add $match before $unwind to reduce documents, 2) Consider $filter array operator instead of $unwind + $match, 3) May need index on userId for $group, 4) Discuss allowDiskUse if needed. Should explain $unwind performance implications with large arrays.", |
| **Before** — `$exists: true` on a regular index still requires a document fetch: | ||
|
|
||
| ```javascript | ||
| db.collection.createIndex({ a: 1 }) | ||
| db.collection.find({ a: { $exists: true } }) | ||
| // Cannot efficiently answer — null semantics require checking each document | ||
| ``` | ||
|
|
||
| **After** — Use a sparse index, which only contains entries where the field exists: | ||
|
|
||
| ```javascript | ||
| db.collection.createIndex({ a: 1 }, { sparse: true }) | ||
| db.collection.find({ a: { $exists: true } }) | ||
| // Answered directly from the index — no document fetch needed | ||
| ``` | ||
|
|
||
| **Why:** Regular indexes store `null` for both missing and explicitly-null fields, so `$exists` can't distinguish them without fetching. Sparse indexes only store entries for documents where the field exists. |
The $exists + sparse-index guidance appears incorrect/misleading: for a single-field index, documents missing the field typically have no index entry, so { a: { $exists: true } } can usually be satisfied via an index scan without requiring a sparse index. Also, sparse indexes change query semantics in some cases and shouldn’t be recommended solely to “avoid a document fetch” here. Please revise this example to reflect correct index behavior (and call out the real tradeoffs/when sparse or partial indexes help).
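If the example is revised, a partial index may be the cleaner illustration, since its semantics are explicit rather than implied (names hypothetical):

```javascript
// The index contains only documents where `a` exists, and the
// planner can use it when the query predicate implies the
// partialFilterExpression.
db.collection.createIndex(
  { a: 1 },
  { partialFilterExpression: { a: { $exists: true } } }
)
db.collection.find({ a: { $exists: true } })
```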
| **After** — Use aggregation-based update to generate smaller oplog deltas: | ||
|
|
||
| ```javascript | ||
| db.coll.updateOne({ _id: X }, [{ $replaceWith: { $literal: entireNewDocument } }]) | ||
| ``` | ||
|
|
||
| **Why:** `replaceOne` writes the full document to the oplog. The aggregation update syntax lets MongoDB compute deltas, resulting in smaller oplog entries when only a few fields are changed. |
This suggests that using an aggregation-pipeline update with $replaceWith produces “smaller oplog deltas” than replaceOne. An update pipeline that replaces the whole document is still a full replacement in effect, and may not reduce oplog size vs replaceOne. If the goal is smaller oplog entries, the example should demonstrate updating only changed fields (e.g., $set/$unset) and explain when MongoDB can record diffs vs full replacements.
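The field-level alternative this comment points toward could be sketched as (field names hypothetical):

```javascript
// Updating only the changed fields lets the server record a small
// delta in the oplog instead of an effective full replacement.
db.coll.updateOne(
  { _id: X },
  { $set: { status: 'shipped', updatedAt: new Date() } }
)
```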
MongoDB Agent Skill Submission
Skill Information
Skill Name: mongodb-query-optimizer
Skill Directory: skills/mongodb-query-optimizer
Use Case
This skill will trigger in a few cases when explicitly requested:
- collection-indexes MCP server command to suggest using an existing index or create a new one
- atlas-get-performance-advisor to make suggestions based on Atlas Performance Advisor results from a cluster

This skill will not perform write operations against a cluster (such as creating or deleting an index); however, it will make suggestions to the user to create such indexes, or to modify the structure of a query that is being created.
Value Proposition
When a user writes a query or aggregation, either explicitly ("write me a MongoDB query to fetch documents with a particular user id") or implicitly ("modify my application to display all comments from a particular user on their profile page"), this skill should ensure best practices for optimizing queries.
This most critically includes ensuring a query is indexed, but also includes:
Special Considerations
This skill will optionally need an MCP server with the MongoDB cluster's connection string configured.
It will also optionally need the MCP server configured with Atlas access for the cluster to use specialized tools like Atlas Performance Advisor.
Validation Prompts
(with MCP server configured, skill loaded, and application codebase loaded in IDE)
(general query questions)
Author Self-Validation
Ran skill-validator locally
skill-validator output
I executed /skill-validator mongodb-query-optimizer using Claude Sonnet 4.5 (model ID: us.anthropic.claude-sonnet-4-5-20250929-v1:0) running on AWS Bedrock with the skill-validator-ent tool on my local machine.
Review Summary: mongodb-query-optimizer
1. Structural Validation
✅ PASSED — No errors or warnings
The skill has:
2. SKILL.md Scores
Overall: 4.0 / 5.0 — Strong, production-ready quality
LLM Assessment: "Clear, well-structured skill with precise MCP tool invocation instructions and good optimization workflow. Some verbosity in tables and examples could be trimmed, but overall actionable and tightly scoped to MongoDB query optimization."
3. Reference File Scores
Reference Aggregate: 4.4 / 5.0 — Excellent supporting documentation
Key strengths:
4. Novelty Assessment
✅ Novelty score: 3.4 (threshold: 3.0)
Mean novelty: (4 + 3 + 2 + 3 + 3) / 5 = 3.0 → Aggregate 3.4 (weighted by importance)
The skill provides valuable proprietary content that justifies its context window cost.
Novel Information Identified:
SKILL.md (novelty: 4):
(server, toolName, arguments as separate params, not nested) and server naming convention (user-MongoDB)
Reference files:
aggregation-optimization.md (novelty: 3):
internalQueryExecMaxBlockingSortBytes parameter (MongoDB 4.2+)
multikey-arrays.md (novelty: 3):
$expr with $size syntax for partial filters on array sizes
explain-interpretation.md (novelty: 3):
internalQueryExecMaxBlockingSortBytes for 32MB sort limit (MongoDB 3.4+)
covered-queries.md (novelty: 2):
5. Content Review
6. Recommendation
✅ READY TO PUBLISH — Excellent quality, publication-ready
The skill demonstrates:
Strengths:
Quality metrics:
The token count (9,142) is well within acceptable range, and the instruction specificity (0.71) indicates agents will be able to follow directives precisely. The skill successfully balances comprehensive guidance with actionable specificity.
The skill is production-ready and recommended for publication.
SME Review
SME: @asya999