
@rdimitrov
Member

Motivation and Context

The following PR adds a query parameter for searching by the distribution type (remote, npm, nuget, etc). This is so registry clients supporting nuget can query just for those (less load on the registry server + less load on the client to go through all results).

Note that this is not a breaking change.

How Has This Been Tested?

Locally + unit tests

Breaking Changes

No

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Fixes: #781

Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
@rdimitrov rdimitrov requested a review from domdomegg November 19, 2025 10:59
description: |
  Filter by distribution type. Allows discovering servers based on how they are made available to users.
  Example values include: 'remote', 'npm', 'pypi', 'oci', 'nuget', 'mcpb'
Member


Would it be worth changing remote to sse and streamable-http? I know that at Anthropic we maybe have plans to deprecate sse so being able to filter by the actual remote type might be useful - and it also aligns with filtering by the package type.

Member Author


I think that's reasonable 👍 Would it be a problem if we continue to have remote though as an option that aggregates all remote servers?

Member


maybe a slight preference for making this composable in some way, e.g. you can specify the type multiple times or you can comma-separate types or something? then we don't need remote, you can just specify streamable-http,sse

@domdomegg
Member

Meta: I might want one extra reviewer to check this as well - adding an endpoint is something that is hard to take back later so better to double check. Maybe we can discuss later today in maintainers meeting?

@rdimitrov
Member Author

> Meta: I might want one extra reviewer to check this as well - adding an endpoint is something that is hard to take back later so better to double check. Maybe we can discuss later today in maintainers meeting?

Just to chime in that this adds a property to an existing endpoint and not a new one 👍

Member

@tadasant tadasant left a comment


In concept, I am supportive of adding "filter" functionality to the registry spec. It's clear that for ETL processes pulling down entries from the Registry or a sub-registry, they will often just want to pull down a subset based on downstream support of features.

However, I'm not a fan of adding a type query parameter here. If we do this for type, there's an explosion of other filters we would create a precedent for adding. For example uses_template_variables, has_git_repository, uses_semantic_versioning, etc etc. It wouldn't be obvious to me why we would support type and not those others.

To land this, I think we need to make the feature more generic. Rather than encoding every possible "filter" into the spec, the caller should be able to express a filter that the server then applies on its response.

One option is to take a page from WoT Discovery and use JSONPath. The queries it would create are ugly, but it'd be a nice standardized way to allow consumers to filter whatever fields they care about. My main concern with this approach would be whether the server-side implementation would perform poorly, and whether we would need to lock down what is allowed. A solution for that: the generic schema could support JSONPath, but the public registry API could just support very specific predefined JSONPath queries (this type one being one of them).

I think I'd be good with a JSONPath solution, though also am curious for more/alternative ideas that would fulfill the "flexible for consumers" kind of requirement here.

@claude do you have any ideas? Propose a few alternatives to JSONPath, and share your opinion on whether JSONPath is a good solution here for a generic filtering capability.

@claude
Contributor

claude bot commented Nov 19, 2025

Claude finished @rdimitrov's task


Generic Filtering Approaches for Registry API

I've analyzed the PR and filtering requirements. Here are several alternatives to JSONPath, along with my evaluation:

Proposed Alternatives

1. OData-style Query Parameters (Recommended)

Uses query string syntax inspired by OData but simplified for this use case.

Examples:

?filter=packages/any(p: p/registryType eq 'npm')
?filter=version eq '1.0.0' and remotes ne null
?filter=repository/source eq 'github'

Pros:

  • Industry standard (Microsoft, SAP, etc.)
  • Human-readable and URL-friendly
  • Easily validated and rate-limited
  • Can be incrementally adopted (start with simple equality, add operators later)
  • Good tooling support in many languages

Cons:

  • Custom parser needed (though libraries exist)
  • Slightly verbose syntax
  • Need to define supported operators carefully

2. GraphQL-style Field Selection

Allow clients to specify what they want using GraphQL-like syntax in query params.

Examples:

?where={packages:{registryType:"npm"}}
?where={repository:{source:"github"},version:"1.0.0"}

Pros:

  • Modern, widely understood syntax
  • JSON-based so familiar to developers
  • Easy to parse (just JSON)
  • Can express complex nested conditions

Cons:

  • Can become large in URLs
  • Requires URL encoding which hurts readability
  • May encourage overly complex queries

3. RSQL/FIQL (RESTful Service Query Language)

A query language designed specifically for REST APIs.

Examples:

?filter=packages.registryType==npm
?filter=version==1.0.0;repository.source==github
?filter=packages.registryType=in=(npm,pypi)

Pros:

  • Designed for REST/URLs specifically
  • Clean, compact syntax
  • Well-defined operators (==, !=, =lt=, =gt=, =in=, etc.)
  • Libraries available in multiple languages

Cons:

  • Less well-known than OData or GraphQL
  • Operators like =in= may look unusual to some developers
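To make the RSQL option concrete, here is a minimal Go evaluator for a small subset of it: == comparisons joined by ; (AND) and , (OR), with AND binding tighter than OR. It runs against a hypothetical flattened Server map rather than the real registry model, and it does not handle parentheses, quoted values or operators such as =in= and =lt=, so treat it as a sketch of the semantics rather than a conforming RSQL parser.

```go
package main

import (
	"fmt"
	"strings"
)

// Server is a hypothetical flattened view of a registry entry: each dotted field
// maps to every value it holds (two packages give two registryType values).
type Server map[string][]string

// matchRSQL evaluates a small RSQL subset: "==" comparisons joined by ";" (AND)
// and "," (OR), with AND binding tighter than OR as in RSQL. Every term is
// validated, even after a group has already matched.
func matchRSQL(expr string, s Server) (bool, error) {
	matched := false
	for _, group := range strings.Split(expr, ",") {
		groupOK := true
		for _, term := range strings.Split(group, ";") {
			field, value, ok := strings.Cut(term, "==")
			if !ok || field == "" || value == "" {
				return false, fmt.Errorf("unsupported term %q", term)
			}
			if !hasValue(s[field], value) {
				groupOK = false
			}
		}
		matched = matched || groupOK
	}
	return matched, nil
}

// hasValue reports whether value appears among a field's values; a missing
// field has no values and never matches.
func hasValue(values []string, value string) bool {
	for _, v := range values {
		if v == value {
			return true
		}
	}
	return false
}
```

For example, version==1.0.0;repository.source==github requires both comparisons to hold, while replacing ; with , accepts either one.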

4. Predefined Named Filters

Create a set of common filter presets that can be extended over time.

Examples:

?filter=nuget-only
?filter=npm-with-github
?filter=has-remotes

Pros:

  • Dead simple for common cases
  • Easy to implement and maintain
  • Can be optimized per filter
  • Great developer experience for common patterns
  • Backwards compatible when adding new filters

Cons:

  • Not flexible for arbitrary queries
  • Need to add new filters over time
  • Doesn't solve the "explosion of filters" problem, just organizes it
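A minimal Go sketch of how predefined named filters could be wired up: each name maps to a predicate, and unknown names are rejected with the list of supported ones. The Server struct, the predicates, and the reading of nuget-only as "servers that ship a NuGet package" are assumptions for illustration, not the registry's actual model.

```go
package main

import (
	"fmt"
	"sort"
)

// Server is a hypothetical, simplified registry entry for this sketch.
type Server struct {
	RegistryTypes []string // package registry types, e.g. "npm", "nuget"
	RepoSource    string   // e.g. "github"
	RemoteCount   int      // number of remote transports
}

func hasType(s Server, t string) bool {
	for _, rt := range s.RegistryTypes {
		if rt == t {
			return true
		}
	}
	return false
}

// presets maps each named filter from the examples above to a predicate.
var presets = map[string]func(Server) bool{
	// Interpreted here as "servers offering a NuGet package".
	"nuget-only":      func(s Server) bool { return hasType(s, "nuget") },
	"npm-with-github": func(s Server) bool { return hasType(s, "npm") && s.RepoSource == "github" },
	"has-remotes":     func(s Server) bool { return s.RemoteCount > 0 },
}

// applyPreset returns the servers matching a named filter, or an error that
// lists the supported names when the filter is unknown.
func applyPreset(name string, servers []Server) ([]Server, error) {
	pred, ok := presets[name]
	if !ok {
		names := make([]string, 0, len(presets))
		for n := range presets {
			names = append(names, n)
		}
		sort.Strings(names)
		return nil, fmt.Errorf("unknown filter %q (available: %v)", name, names)
	}
	var out []Server
	for _, s := range servers {
		if pred(s) {
			out = append(out, s)
		}
	}
	return out, nil
}
```

Since each preset is just a map entry, adding one leaves the behaviour of existing names untouched, and each predicate can be optimized independently.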

5. Simplified JSONPath (Restricted Subset)

Use JSONPath but limit to simple expressions only.

Examples:

?filter=$.packages[?(@.registryType=='npm')]
?filter=$[?@.repository.source=='github']

Pros:

  • Standard specification (RFC 9535)
  • Powerful when needed
  • Good library support

Cons:

  • Complex syntax, poor readability in URLs
  • Overkill for most filtering needs
  • Harder to validate and secure
  • Performance concerns with complex expressions
  • The examples from WoT are indeed ugly in practice

My Recommendation

I recommend a hybrid approach:

  1. Short term (for this PR): Use simple query parameters for the most common filters:

    • ?type=npm (current PR approach)
    • ?type=npm,pypi (comma-separated for OR)
    • Keep it simple and solve the immediate NuGet client need
  2. Medium term (next few months): Implement RSQL/FIQL or simplified OData for generic filtering:

    • Both are designed for REST APIs
    • URL-friendly
    • Can be incrementally adopted
    • Easy to whitelist/blacklist certain fields or operators
    • RSQL slight edge for simplicity; OData for familiarity
  3. Include an escape hatch: For complex ETL needs, consider:

    • Allowing bulk export/streaming endpoints
    • Supporting filter presets for complex queries
    • Providing client SDKs that handle complexity

Evaluation of JSONPath

Is JSONPath a good solution here?

No, I don't think so, for these reasons:

  1. URL readability: JSONPath expressions become unwieldy in URLs. Compare:

    • JSONPath: ?filter=$.packages[?(@.registryType=='npm')]
    • RSQL: ?filter=packages.registryType==npm
    • Simple: ?type=npm
  2. Overkill: JSONPath's power (selecting array elements, recursive descent, complex predicates) isn't needed for filtering tabular results. The registry is returning a flat list of servers, not navigating complex document structures.

  3. Security: JSONPath can be expensive to evaluate and hard to limit. RSQL/OData make it easier to whitelist operators and fields.

  4. Developer experience: Most developers won't be familiar with JSONPath syntax, whereas simple equality/comparison operators are universal.

  5. Performance: JSONPath expressions may encourage inefficient queries that scan entire datasets rather than using indexes.


Implementation Path Forward

For this specific PR, I suggest:

  1. Keep the simple type parameter but enhance it to accept multiple values:

    ?type=npm           # Single type
    ?type=npm,pypi      # Multiple types (OR logic)
    
  2. Document this as v0.1 of filtering with a note that more sophisticated filtering will come in future versions

  3. Create a follow-up issue to design the generic filtering approach using RSQL or OData

  4. In the meantime, clients like NuGet can use ?type=nuget to solve their immediate needs

This approach:

  • ✅ Solves the immediate problem (NuGet filtering)
  • ✅ Doesn't create bad precedent (multiple values via comma-separation is extensible)
  • ✅ Respects the API freeze (additive change only)
  • ✅ Buys time to design generic filtering properly
  • ✅ Aligns with discussion in review comments about making it composable


Analysis of Current PR Implementation

The current PR adds a simple type query parameter that filters by distribution type. The implementation:

  • ✅ Is non-breaking (optional parameter)
  • ✅ Validates input against known types
  • ✅ Works at the database level for efficiency
  • ✅ Solves the immediate NuGet client need
  • ⚠️ Could be enhanced to support multiple types (comma-separated)
  • ⚠️ Needs decision on whether to keep as-is or evolve to generic filtering

@tadasant
Member

After poking around some more, I can't find any good precedent for introducing something like what I'm suggesting into a major piece of ecosystem infra. In fact it looks like NuGet used to use OData and migrated away in their v3, and no other reasonable precedent exists.

I actually wonder if 4. Predefined Named Filters from Claude's response might be the best way to go. No performance concerns, still flexible and every sub-registry can have their own "predefined" sets that they care about. It's actually not that different from your original proposal conceptually, where you are introducing a new notion of type that could mean remote, could mean local npm, etc. We just tweak by:

  • Stepping up the abstraction layer (filter instead of type as the key)
  • Only define the key and not the values in the sub-registry spec (we don't need to make decisions on whether 'remote', 'npm', 'pypi', 'oci', 'nuget', 'mcpb' etc are the right values - any sub-registry can come up with any filters they please; maybe the official registry can offer some popular ones that folks might standardize around, but we could tweak them without breaking API changes over time)

What do you think @rdimitrov? Would that still fulfill your needs with the benefit of avoiding the explosion of modeling for us?

@claude what's your deeper take on this path for the MCP Registry?

@claude
Contributor

claude bot commented Nov 20, 2025

Claude finished @tadasant's task


Deep Analysis: Predefined Named Filters for MCP Registry

After reviewing the PR implementation and ecosystem architecture, I have strong opinions on the predefined filters approach. TL;DR: I recommend it highly, with some refinements.


Why Predefined Named Filters Work Well Here

The MCP Registry has unique characteristics that make predefined filters particularly suitable:

1. Subregistry ETL Is Core to the Architecture

From the ecosystem vision docs, the registry is explicitly designed for subregistries to ETL from it:

"Additionally, we expect clients pull from subregistries. These subregistries add value to the registry ecosystem by providing curation, or extending it with additional metadata. The Official MCP registry expects a lot of API requests from ETL jobs from these subregistries."

Key insight: Subregistries will be pulling the entire dataset or large subsets regularly. They're not doing one-off queries that need complex filtering. The pattern is:

  • ETL job runs periodically (hourly, daily)
  • Pulls entire dataset or broad category (e.g., "all npm packages")
  • Processes locally with their own custom logic
  • Adds curation, ratings, enhanced metadata

This means the registry doesn't need to support arbitrary complex queries. It needs to support efficient bulk operations with a few common filters.
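The bulk-pull pattern described above boils down to following pagination cursors until the registry reports no further page. A minimal Go sketch, assuming cursor-based pagination; the HTTP call is abstracted into a hypothetical FetchFunc, and the request shape mentioned in its comment (including the cursor parameter name) is an assumption.

```go
package main

import "errors"

// Page is a hypothetical page of results: the server names returned plus the
// cursor for the next page (empty when there are no more pages).
type Page struct {
	Servers    []string
	NextCursor string
}

// FetchFunc stands in for an HTTP request such as
// GET /v0/servers?type=npm&cursor=... (assumed shape, not taken from the spec).
type FetchFunc func(cursor string) (Page, error)

// pullAll follows cursors until the registry reports no next page, which is
// the bulk step an ETL job runs before applying its own local processing.
func pullAll(fetch FetchFunc) ([]string, error) {
	var all []string
	cursor := ""
	for {
		page, err := fetch(cursor)
		if err != nil {
			return nil, err
		}
		all = append(all, page.Servers...)
		if page.NextCursor == "" {
			return all, nil
		}
		// Guard against a misbehaving server handing back the same cursor.
		if page.NextCursor == cursor {
			return nil, errors.New("registry returned the same cursor twice")
		}
		cursor = page.NextCursor
	}
}
```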

2. Finite, Stable Set of Distribution Types

Looking at the current implementation, there are exactly 6 distribution types:

  • remote (SSE, streamable-http)
  • npm
  • pypi
  • oci
  • nuget
  • mcpb

This is not an unbounded problem space. Package ecosystems are stable and well-defined. We're not likely to add 50 more distribution types. Even if we do add a few over time (e.g., cargo, gem, composer), this remains manageable.

3. API Freeze Makes Extensibility Critical

You're in an API freeze (v0.1). The spec says:

"For the next month or more, the API will remain stable with no breaking changes, allowing integrators to confidently implement support."

Predefined filters are additive and non-breaking. You can:

  • Start with: ?filter=nuget-only
  • Add later: ?filter=npm-with-github, ?filter=has-remote-transport, ?filter=uses-semver
  • Never break existing clients

Generic filter languages (JSONPath, RSQL, OData) would be hard to evolve without breaking changes because:

  • Parser behavior changes are breaking
  • Removing operator support (or changing what an operator means) is breaking
  • Field name changes break queries

Recommended Implementation: Hybrid Approach

I recommend a two-tier system that combines the benefits of both approaches:

Tier 1: Simple Field Filters (What This PR Does)

Keep simple query params for common, stable fields:

?type=npm              # Single type
?type=npm,pypi         # Multiple types (OR semantics)
?version=latest        # Already exists
?search=filesystem     # Already exists

Why this tier?

  • These fields are fundamental to the data model
  • They're stable (won't change)
  • They map directly to database queries (efficient)
  • They're composable with each other

Enhancement for this PR: Support comma-separated values for type:

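// Handle the type parameter, accepting comma-separated values (OR semantics)
if input.Type != "" {
    var types []string
    for _, t := range strings.Split(input.Type, ",") {
        t = strings.TrimSpace(t)
        if !model.IsValidDistributionType(t) {
            return nil, huma.Error400BadRequest(fmt.Sprintf("Invalid type parameter: '%s' is not a valid distribution type", t))
        }
        // Collect the trimmed value: reassigning the loop variable alone
        // would leave the slice returned by strings.Split untouched
        types = append(types, t)
    }
    filter.ConfigTypes = types // plural: passed to the database layer as an array
}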

Tier 2: Named Composite Filters (Future Addition)

Add a preset or profile parameter for complex, curated queries:

?preset=official-recommended    # Servers from verified publishers
?preset=github-only            # Only servers with GitHub repos
?preset=production-ready       # Servers with semver, docs, tests
?preset=enterprise-safe        # Servers with security audits, SLAs

Why this tier?

  • Solves the "explosion of filters" problem through curation
  • Allows complex logic without exposing it in the API
  • Sub-registries can define their own presets
  • Can be optimized server-side (materialized views, caching)
  • Provides semantic meaning ("production-ready" is clearer than a complex filter expression)

Addressing the "Explosion of Filters" Concern

@tadasant raised a valid concern:

"If we do this for type, there's an explosion of other filters we would create a precedent for adding. For example uses_template_variables, has_git_repository, uses_semantic_versioning, etc etc."

My take: There are actually two categories of filters:

Category A: Core Structural Filters (Bounded Set)

These map to fundamental data model properties:

  • type - distribution type (6 values, rarely changes)
  • version - version filtering (already exists)
  • search - name search (already exists)
  • updated_since - temporal filtering (already exists)

These should be simple query params because:

  • They're intrinsic to what a server is
  • They're efficiently indexed in the database
  • They're stable across versions
  • They compose well: ?type=npm,pypi&version=latest&search=weather
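A small Go sketch of the composition rule implied here: different parameters are ANDed together, while the comma-separated values inside ?type are ORed. The Entry and Query types, the single type per entry, and the case-insensitive substring reading of ?search are assumptions made for the example.

```go
package main

import "strings"

// Entry is a hypothetical, flattened registry entry; real servers can carry
// several distribution types, which this sketch simplifies to one.
type Entry struct {
	Name     string
	Type     string
	IsLatest bool
}

// Query mirrors the Category A parameters; zero values mean "no constraint".
type Query struct {
	Types  []string // ?type=npm,pypi  (OR within the parameter)
	Latest bool     // ?version=latest
	Search string   // ?search=weather (case-insensitive substring, an assumption)
}

// matches ANDs the parameters together while ORing the values inside ?type.
func matches(q Query, e Entry) bool {
	if len(q.Types) > 0 {
		found := false
		for _, t := range q.Types {
			if e.Type == t {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	if q.Latest && !e.IsLatest {
		return false
	}
	if q.Search != "" && !strings.Contains(strings.ToLower(e.Name), strings.ToLower(q.Search)) {
		return false
	}
	return true
}
```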

Category B: Semantic/Quality Filters (Unbounded Set)

These represent judgments or complex conditions:

  • uses_template_variables - requires parsing package config
  • has_git_repository - requires checking repository field
  • production_ready - subjective, requires multiple conditions
  • enterprise_safe - highly subjective, requires external data

These should be named presets because:

  • They're subjective or complex
  • They may require external data or computation
  • They change over time as standards evolve
  • They benefit from curation

Concrete Proposal for This PR

For immediate merge:

  1. Enhance the type parameter to support multiple values:

    ?type=npm              # Single type
    ?type=npm,pypi,nuget   # Multiple types (OR logic)
    
  2. Document this as the "simple field filter" pattern:

    # In OpenAPI spec
    - name: type
      in: query
      description: |
        Filter by distribution type. Supports multiple comma-separated values for OR logic.
        
        Example values: 'remote', 'npm', 'pypi', 'oci', 'nuget', 'mcpb'
        
        Examples:
        - ?type=npm                    # Only npm packages
        - ?type=npm,pypi               # npm OR pypi packages
        - ?type=remote                 # Only remote servers
        - ?type=npm,pypi,nuget         # Any of npm, pypi, or nuget
  3. Add implementation note in docs/design/:

    "The registry supports two types of filtering: simple field filters (like type, version) that map directly to data model properties, and named presets (future) for complex composite queries. This keeps the API simple while allowing extensibility."

For follow-up issues:

  1. Create issue: "Design named preset/profile filter system"
  2. Create issue: "Should remote expand to sse,streamable-http?" (per review comment)

Performance and Scaling Considerations

One concern with generic filtering is performance. The predefined approach handles this well:

Simple field filters:

  • Map to database indexes
  • Can be optimized in SQL
  • Predictable query patterns

Named presets:

  • Can be materialized views
  • Can be cached aggressively
  • Can be optimized per-preset
  • Can evolve without API changes

Generic filter languages (JSONPath, RSQL):

  • Unpredictable query patterns
  • Difficult to index effectively
  • May require full table scans
  • Hard to rate-limit fairly

Given that the registry expects "a lot of API requests from ETL jobs," performance predictability is important.


Does This Meet @rdimitrov's Needs?

Looking at the original motivation:

"The following PR adds a query parameter for searching by the distribution type (remote, npm, nuget, etc). This is so registry clients supporting nuget can query just for those (less load on the registry server + less load on the client to go through all results)."

Yes, this approach fully solves the problem:

  • NuGet clients can use ?type=nuget
  • It's efficient (database-level filtering)
  • It reduces load on both server and client
  • It's simple and obvious

The enhancement to support ?type=npm,pypi,nuget makes it even more flexible for clients that support multiple ecosystems.


Comparison to Other Registry APIs

Let me check what precedents exist in practice:

npm Registry:

  • Search endpoint (/-/v1/search) takes simple query params: ?text=search, ?size=20
  • No complex filtering - clients ETL and process locally

PyPI (JSON API):

  • Simple endpoints: /pypi/{package}/json
  • No filtering API - clients scrape the entire simple index
  • Focus on bulk downloads

Docker Hub:

  • Simple params: ?name=foo, ?is_official=true
  • No complex query language
  • Pagination-focused

NuGet:

  • Migrated away from OData in v3
  • Search API with simple parameters
  • Focus on text search and pagination

Pattern: Major package registries have converged on simple parameters + client-side processing. None of them exposes a complex filter language today; NuGet's move away from OData in v3 is the clearest example.


My Recommendation Summary

For this PR: ✅ Approve with enhancement

  • Add comma-separated value support: ?type=npm,pypi
  • Keep validation of individual types
  • Document the pattern clearly

Future direction: 🎯 Two-tier filtering

  • Tier 1: Simple field filters for core properties (type, version, search)
  • Tier 2: Named presets for complex queries (preset=production-ready)

Rationale:

  1. ✅ Solves immediate NuGet client need
  2. ✅ Aligns with ecosystem architecture (bulk ETL, not ad-hoc queries)
  3. ✅ Matches patterns from successful package registries
  4. ✅ Non-breaking and extensible
  5. ✅ Performance-predictable
  6. ✅ Simple developer experience
  7. ✅ Respects API freeze constraints

What I don't recommend:

  • ❌ Generic filter languages (JSONPath, RSQL, OData) - overkill and performance unpredictable
  • ❌ GraphQL - wrong tool for this use case
  • ❌ Unlimited field filtering - creates precedent problems

Implementation Notes

If you enhance this PR to support comma-separated types, here's what needs to change:

  1. Update internal/database/database.go - Change ConfigType *string to ConfigTypes []string
  2. Update internal/database/postgres.go - Use = ANY($n) instead of = $n in SQL
  3. Update internal/api/handlers/v0/servers.go - Parse comma-separated values
  4. Update tests - Add test cases for multiple types
  5. Update OpenAPI spec - Document comma-separated format

This is a small, isolated change that significantly improves composability without adding complexity.
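A Go sketch of steps 1 and 2, assuming a single text column per server named config_type (a simplification; the real schema may store distribution types differently) and a hypothetical ServerFilter struct. Binding the whole slice to one = ANY($n) placeholder keeps the SQL text fixed no matter how many types are requested. With pgx a []string argument maps to a Postgres text[]; with lib/pq it would need wrapping in pq.Array.

```go
package main

import (
	"fmt"
	"strings"
)

// ServerFilter is a hypothetical stand-in for the database filter struct, with
// the single ConfigType *string replaced by a ConfigTypes slice.
type ServerFilter struct {
	Search      *string
	ConfigTypes []string
}

// buildWhere assembles a WHERE clause with numbered Postgres placeholders.
// The column names (name, config_type) are assumptions for this sketch, and
// LIKE wildcards in the search input are not escaped here.
func buildWhere(f ServerFilter) (string, []any) {
	var conds []string
	var args []any
	if f.Search != nil {
		args = append(args, "%"+*f.Search+"%")
		conds = append(conds, fmt.Sprintf("name ILIKE $%d", len(args)))
	}
	if len(f.ConfigTypes) > 0 {
		// One placeholder binds the whole slice, so the SQL text stays the
		// same however many types the client asked for.
		args = append(args, f.ConfigTypes)
		conds = append(conds, fmt.Sprintf("config_type = ANY($%d)", len(args)))
	}
	if len(conds) == 0 {
		return "", nil
	}
	return "WHERE " + strings.Join(conds, " AND "), args
}
```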


Does this analysis help clarify the path forward? I'm happy to dive deeper into any aspect or help with implementation details if needed.


@tadasant
Member

I don't agree with Claude's suggestion to split into 2 functionalities (would rather we just focus on filter as a key), FWIW.

@jonathanhefner
Member

Rather than type, perhaps category? Then servers could automatically be placed in zero or more categories (e.g., ["remote", "sse", "streamable-http"]). And we could add arbitrary categories in the future.

Though that does introduce the question of whether ?category= with multiple categories should be a union versus an intersection.
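A Go sketch of both readings of ?category=a,b, assuming categories are derived automatically from a server's remotes and packages in the spirit of the ["remote", "sse", "streamable-http"] example. The Remote and Package types and the derivation rules are hypothetical.

```go
package main

// Remote and Package are hypothetical, trimmed-down shapes for this sketch.
type Remote struct{ Transport string }     // e.g. "sse", "streamable-http"
type Package struct{ RegistryType string } // e.g. "npm", "nuget"

// categories derives a server's categories automatically: one per remote
// transport, one per package registry, plus "remote" when any remote exists.
func categories(remotes []Remote, packages []Package) map[string]bool {
	cats := make(map[string]bool)
	for _, r := range remotes {
		cats["remote"] = true
		cats[r.Transport] = true
	}
	for _, p := range packages {
		cats[p.RegistryType] = true
	}
	return cats
}

// matchUnion treats ?category=a,b as "a OR b"; an empty request matches everything.
func matchUnion(cats map[string]bool, want []string) bool {
	if len(want) == 0 {
		return true
	}
	for _, w := range want {
		if cats[w] {
			return true
		}
	}
	return false
}

// matchIntersection treats ?category=a,b as "a AND b".
func matchIntersection(cats map[string]bool, want []string) bool {
	for _, w := range want {
		if !cats[w] {
			return false
		}
	}
	return true
}
```

A server with an SSE remote and an npm package matches ?category=sse,nuget under union semantics but not under intersection semantics.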

@tadasant
Member

> Rather than type, perhaps category? Then servers could automatically be placed in zero or more categories (e.g., ["remote", "sse", "streamable-http"]). And we could add arbitrary categories in the future.
>
> Though that does introduce the question of whether ?category= with multiple categories should be a union versus an intersection.

@jonathanhefner did you see my suggestion of filter instead of type? Curious what you think of filter.

I don't love category personally, because category is typically a mutually exclusive designator (items can only be in one category). tags would be more typically like what you are proposing (item can be in multiple), but indeed the comma-separation being union or intersection is a very real problem.

I do think making the key filter and allowing custom values solves all these problems (a sub-registry could implement remote-and-npm as a filter just as easily as remote-or-npm). Maybe eventually (or even out of the gate) we make a list of "recommended filter values to implement".

@jonathanhefner
Member

> did you see my suggestion of filter instead of type? Curious what you think of filter.
>
> I don't love category personally, because category is typically a mutually exclusive designator (items can only be in one category).

I was interpreting filter as mutually exclusive. 😆 But also, I was interpreting filter as manually defined rather than automatically defined. For example, if we add support for a new package registry, would we automatically have a new filter for that registry? For category or tag, I would assume "yes". If the answer for filter is also "yes", then I think that sounds good! 😃

@tadasant
Member

> For example, if we add support for a new package registry, would we automatically have a new filter for that registry? For category or tag, I would assume "yes". If the answer for filter is also "yes", then I think that sounds good! 😃

Hmm - I think my position would be that I don't think we should automatically register these values (whether they be categories, tags, or filters) and rather either leave them completely free form, or manually provide guidance (for example, start with saying remote, npm, pypi, oci, nuget, and mcpb are the recommended values to implement; but not required for spec compliance). We could "automate" the registration if we find this list becomes hard to maintain for some reason.

But yes, I think whether it's categories, tags, or filters, we could do some automatic registration scheme all the same at some point if we want to.

@rdimitrov
Member Author

Hey, sorry for being out for a bit and not replying.

I usually like the direction of making things flexible, but I’m a bit worried about leaving values undefined in the registry spec.

Whether that's type or filter or some other parameter we add, if we do it without specifying how its values are interpreted we’re opening the door to divergence between implementations. One registry might interpret it one way, another registry in a completely different way (same goes for clients) and suddenly clients have to implement special-case logic for each registry. We can see how this adds fragmentation and is probably the opposite of what we want from a “standard” API.

In my view the API spec is effectively the contract we are creating for the ecosystem and I think it's fair to be more explicit and deterministic about how some of these things are defined and expected to work. In a way it's our responsibility to ensure clients have predictable behavior, registries have clear requirements and we keep interoperability intact.

I imagine @BobDickinson can chime in here too since it is related to the problem he's helping us solve on the server.json schema side.

@tadasant
Member

tadasant commented Dec 11, 2025

> Hey, sorry for being out for a bit and not replying.
>
> I usually like the direction of making things flexible, but I’m a bit worried about leaving values undefined in the registry spec.
>
> Whether that's type or filter or some other parameter we add, if we do it without specifying how its values are interpreted we’re opening the door to divergence between implementations. One registry might interpret it one way, another registry in a completely different way (same goes for clients) and suddenly clients have to implement special-case logic for each registry. We can see how this adds fragmentation and is probably the opposite of what we want from a “standard” API.
>
> In my view the API spec is effectively the contract we are creating for the ecosystem and I think it's fair to be more explicit and deterministic about how some of these things are defined and expected to work. In a way it's our responsibility to ensure clients have predictable behavior, registries have clear requirements and we keep interoperability intact.
>
> I imagine @BobDickinson can chime in here too since it is related to the problem he's helping us solve on the server.json schema side.

What is your current view on what the right design is that addresses (or deliberately trades off) all the concerns outlined to-date?

I think the following works quite well:

  • filter= is the query parameter key
  • It takes a comma separated string (that implies OR logic)
  • We provide example values of sse, streamable, npm, pypi, oci, nuget, and mcpb
  • Implementation guidance is that it is compliant to add a new value if you need some combination of something different (e.g. "streamable HTTP of all server.json's of schema version 2025-10-17 or later"). If it's generalizable enough for other folks, open a PR with it as another recommended example.

I think this works because:

  • It provides foundational structure (the filter field) that registry implementors are encouraged to rally around.
  • We continue to follow MCP's general principle of empowering use cases in the wild before trying to standardize them across vendors (example values that implementors can choose to extend as they nudge forward the set of use cases they're solving for)
  • We do not create a long-tail bottleneck of maintainers adding one-off filter combinations to the spec [this is the big one I feel fairly strongly that we need some solution for] and block implementors on being spec-compliant on this bottleneck when they have legitimate long tail filter scenarios they want to enable
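A minimal Go sketch of this proposal: filter= takes comma-separated values combined with OR, the registry seeds its table with the example values, and one registry-specific value covers the "streamable HTTP of schema version 2025-10-17 or later" case. The value name streamable-2025-10-17-plus, mapping streamable to the streamable-http transport, storing the schema version as a plain date string, and rejecting unknown values with an error are all assumptions; the proposal does not say how unknown values should be handled.

```go
package main

import (
	"fmt"
	"strings"
)

// Server is a hypothetical, simplified registry entry for this sketch.
type Server struct {
	Transports    []string // remote transports, e.g. "sse", "streamable-http"
	RegistryTypes []string // package registry types, e.g. "npm", "nuget"
	SchemaDate    string   // server.json schema version as YYYY-MM-DD
}

func contains(list []string, v string) bool {
	for _, x := range list {
		if x == v {
			return true
		}
	}
	return false
}

// byPackage builds a predicate matching servers that ship a given package type.
func byPackage(t string) func(Server) bool {
	return func(s Server) bool { return contains(s.RegistryTypes, t) }
}

// filters lists the values this registry supports: the example values from
// the proposal, plus one registry-specific custom value.
var filters = map[string]func(Server) bool{
	"sse":        func(s Server) bool { return contains(s.Transports, "sse") },
	"streamable": func(s Server) bool { return contains(s.Transports, "streamable-http") },
	"npm":        byPackage("npm"),
	"pypi":       byPackage("pypi"),
	"oci":        byPackage("oci"),
	"nuget":      byPackage("nuget"),
	"mcpb":       byPackage("mcpb"),
	// Custom value: streamable HTTP servers on schema 2025-10-17 or later.
	// YYYY-MM-DD dates compare correctly as plain strings.
	"streamable-2025-10-17-plus": func(s Server) bool {
		return contains(s.Transports, "streamable-http") && s.SchemaDate >= "2025-10-17"
	},
}

// selected reports whether s matches any of the comma-separated values in a
// non-empty filter parameter; callers skip filtering when it is absent.
func selected(param string, s Server) (bool, error) {
	match := false
	for _, v := range strings.Split(param, ",") {
		v = strings.TrimSpace(v)
		pred, ok := filters[v]
		if !ok {
			return false, fmt.Errorf("unsupported filter value %q", v)
		}
		if pred(s) {
			match = true
		}
	}
	return match, nil
}
```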

@rdimitrov
Member Author

rdimitrov commented Dec 12, 2025

Yeah, of course, let me sanity-check my understanding of the filter direction as I think it touches a broader question about what we consider part of the API contract.

I also want to acknowledge that this argument applies to my type proposal as well (so not exactly in one’s favour or the other 😃). Overall, because we haven’t standardised the package types yet, values like npm and pypi are only examples and aren’t meaningful from a spec/contract perspective unless they’re required (which they are not). Adding a loosely defined property like filter on top feels like it compounds the ambiguity (type at least implies using the available package types, which is why I prefer it).

So up in the thread when we say things like “define the key but not the values” or “recommended but not required values” I'm cautious about supporting it, as it seems to imply:

  • registries can be compliant while ignoring or redefining filter=npm
  • clients can’t rely on consistent cross-registry behavior
  • the spec introduces one more parameter without deterministic semantics

In my mind it's not spec compliant if an implementation changes while the spec stays the same and the result is broken interoperability. For example, if tomorrow we decide to rename npm to node that would break clients and probably sub-registries all over the place while still being “spec-compliant” which is usually a sign that the contract wasn’t well specified in the first place.

I’m all in and very supportive of keeping things simple and extensible but this makes me wonder what “spec-compliant” is actually guaranteeing here? Are we comfortable with a situation where clients and registries cannot work together, yet still claim spec compliance?

If we agree on that, then whether we have type and/or filter properties is okay from my side, since either way we should explicitly define the allowed values (or at least a subset of them). In any case, I don't think we should lose a lot of time discussing this; I'd rather make a decision so we can move forward. But it would help to acknowledge our stance on what spec-compliant does and doesn't mean, because that also defines what counts as a breaking change.

@tadasant
Member

@rdimitrov all fair points; you've convinced me that filter=* with no further definition is probably the wrong path forward, primarily because of this point:

In my mind it's not spec compliant if an implementation changes while the spec stays the same and the result is broken interoperability. For example, if tomorrow we decide to rename npm to node that would break clients and probably sub-registries all over the place while still being “spec-compliant” which is usually a sign that the contract wasn’t well specified in the first place.

How about this modification of my earlier proposal:

  • filter= is the query parameter key
  • It takes a comma-separated string (implying OR logic)
  • We provide example reserved values of sse, streamable, npm, pypi, oci, nuget, and mcpb. Sub-registries SHOULD implement these. If they choose to support these values, they MUST implement them in alignment with how our official spec treats them.
  • Sub-registries MAY implement additional filter values, but MUST vendor-prefix them if they do so, e.g. com.pulsemcp%2Fsse-and-streamable, com.pulsemcp%2F2025-12-11-or-newer
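A minimal Go sketch of how a registry might parse and apply the proposed filter parameter, assuming the reserved values listed above and a vendor-prefix form of `<dotted namespace>/<name>` (inferred from the com.pulsemcp examples, not defined anywhere in the proposal). `ParseFilter`, `Matches`, `Server` and `isVendorPrefixed` are hypothetical names for illustration, not the registry's actual code.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// reservedFilters mirrors the example reserved values from the proposal.
// Illustrative only; the final spec may define a different set.
var reservedFilters = map[string]bool{
	"sse": true, "streamable": true, "npm": true, "pypi": true,
	"oci": true, "nuget": true, "mcpb": true,
}

// isVendorPrefixed is a hypothetical check: a dotted namespace, a slash, then a name.
func isVendorPrefixed(t string) bool {
	ns, name, ok := strings.Cut(t, "/")
	return ok && strings.Contains(ns, ".") && name != ""
}

// ParseFilter splits a comma-separated filter value into terms. Reserved values
// are accepted as-is; any other value must carry a vendor prefix, otherwise an
// error is returned instead of silently ignoring it.
func ParseFilter(raw string) ([]string, error) {
	var terms []string
	for _, t := range strings.Split(raw, ",") {
		t = strings.TrimSpace(t)
		if t == "" {
			continue
		}
		if !reservedFilters[t] && !isVendorPrefixed(t) {
			return nil, fmt.Errorf("unknown filter %q: non-reserved values must be vendor-prefixed", t)
		}
		terms = append(terms, t)
	}
	return terms, nil
}

// Server is a hypothetical, trimmed-down registry entry.
type Server struct {
	Name  string
	Types []string // package types and remote transport types, e.g. "npm", "sse"
}

// Matches applies OR semantics: a server matches if any of its types equals any
// requested term. Vendor-prefixed terms never match here; a real registry would
// interpret them with its own logic.
func Matches(s Server, terms []string) bool {
	for _, want := range terms {
		for _, have := range s.Types {
			if want == have {
				return true
			}
		}
	}
	return false
}

func main() {
	// url.ParseQuery decodes %2F back to "/", so vendor-prefixed values arrive intact.
	q, err := url.ParseQuery("filter=npm,com.pulsemcp%2Fsse-and-streamable")
	if err != nil {
		panic(err)
	}
	terms, err := ParseFilter(q.Get("filter"))
	fmt.Println(terms, err)

	s := Server{Name: "example", Types: []string{"npm"}}
	fmt.Println(Matches(s, terms))
}
```

Rejecting unknown, un-prefixed values is one way to address the interoperability concern raised earlier: a client sending filter=node would get an error rather than a silently different result set. When filter is absent, a registry would presumably skip `Matches` entirely and return all servers.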

If we find sub-registries using patterns often, we can lift those patterns into the reserved non-prefixed values over time.

@rdimitrov
Member Author

Update: after our last community meeting, we decided to go with @tadasant's latest proposal above, i.e. a filter property that supports a reserved set of values and lets other registry implementations extend on top of it.

I'll rebase my implementation to reflect that 👍

Development

Successfully merging this pull request may close these issues.

Discover remote servers (filter by type)

5 participants