- 
                Notifications
    You must be signed in to change notification settings 
- Fork 134
Proposal to refactor the MCP Registry controller #2301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
          
     Merged
      
      
    
      
        
          +322
        
        
          −0
        
        
          
        
      
    
  
  
     Merged
                    Changes from all commits
      Commits
    
    
            Show all changes
          
          
            9 commits
          
        
        Select commit
          Hold shift + click to select a range
      
      abc4960
              
                draft
              
              
                dmartinol ef84812
              
                draft
              
              
                dmartinol baa626b
              
                Plan for moving registry source handling logic to thv-registry-api
              
              
                dmartinol f531c34
              
                renamed to match validation pattern
              
              
                dmartinol c336698
              
                reviewed API paths
              
              
                dmartinol 0035ce5
              
                Merge branch 'stacklok:main' into refactor_proposal
              
              
                dmartinol fd98768
              
                Merge branch 'main' into refactor_proposal
              
              
                dmartinol f2fbdbe
              
                added option 4 for sync updates
              
              
                dmartinol 51a162e
              
                changed to PUT /registry/sync
              
              
                dmartinol File filter
Filter by extension
Conversations
          Failed to load comments.   
        
        
          
      Loading
        
  Jump to
        
          Jump to file
        
      
      
          Failed to load files.   
        
        
          
      Loading
        
  Diff view
Diff view
There are no files selected for viewing
        
          
          
            322 changes: 322 additions & 0 deletions
          
          322 
        
  docs/proposals/THV-2301-refactor-mcpregistry-controller.md
  
  
      
      
   
        
      
      
    
  
    
      This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
      Learn more about bidirectional Unicode characters
    
  
  
    
              | Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -0,0 +1,322 @@ | ||
| # Refactor MCPRegistry controller | ||
| ## Requirement Overview | ||
|  | ||
| The MCPRegistry operator uses the `MCPRegistry` CRD to deploy Registry API servers pre-populated with curated MCP server definitions. The controller fetches registry data from multiple source types (`ConfigMap`, Git repositories, or external Registry APIs), applies filtering rules, and stores the result in a `ConfigMap` mounted by the Registry API server. | ||
|  | ||
| Currently, these data source management capabilities exist only in the Kubernetes operator. However, ToolHive users typically start with the CLI before deploying to Kubernetes. To provide a consistent experience, we need to move data source management into `thv-registry-api` itself, making these features available in both CLI and Kubernetes environments. | ||
|  | ||
| **Affected features:** | ||
| - Data source configuration (ConfigMap, Git, Registry API) | ||
| - Periodic and on-demand synchronization | ||
| - Tag and name filtering | ||
|  | ||
| This proposal describes moving functionality from the `thv-operator` module ([toolhive](https://github.com/stacklok/toolhive)) to the `thv-registry-api` module ([toolhive-registry-server](https://github.com/stacklok/toolhive-registry-server)). | ||
|  | ||
| ### Design Constraints | ||
|  | ||
| This refactoring maintains feature parity without introducing new capabilities: | ||
|  | ||
| - **API stability**: The `MCPRegistry` CRD spec remains unchanged | ||
| - **Status evolution**: Status field changes will be documented if required | ||
| - **CLI extension**: New runtime flags may be added to `thv-registry-api` | ||
| - **Storage**: File-based storage will be implemented; database storage is out of scope | ||
|  | ||
| ## Current status | ||
| ### Data source configuration | ||
| `MCPRegistry` resources define data sources and filtering rules in the `spec.source` and `spec.filter` fields: | ||
| ```yaml | ||
| spec: | ||
| source: | ||
| type: configmap | ||
| configmap: | ||
| name: minimal-registry-data | ||
| syncPolicy: | ||
| interval: "30m" | ||
| filter: | ||
| tags: | ||
| include: ["database", "production"] | ||
| exclude: ["experimental", "deprecated", "beta"] | ||
| ``` | ||
| ### Current Implementation | ||
| The `MCPRegistry` controller implements data source management in these operator packages: | ||
| | Package| Purpose | | ||
| |--------|---------| | ||
| |`cmd/thv-operator/pkg/sources`|`SourceHandler` interface and implementations for `ConfigMap`, Git, and Registry API sources; StorageManager implementation using `ConfigMap` | | ||
| |`cmd/thv-operator/pkg/httpclient`|HTTP client for Registry API sources| | ||
| |`cmd/thv-operator/pkg/git`|Git repository operations| | ||
| |`cmd/thv-operator/pkg/sync`|`Manager` interface for periodic and on-demand synchronization| | ||
| |`cmd/thv-operator/pkg/filtering`|`FilterService` interface and implementation for tag and name filtering| | ||
|  | ||
| These packages run exclusively within the operator controller with no external dependencies. | ||
|  | ||
| ### Controller Responsibilities | ||
| The controller currently handles two distinct concerns: | ||
| 1. Deployment management: Creates and maintains Registry API server Deployments and Services | ||
| 1. Data synchronization: Fetches, filters, and stores registry data in `ConfigMap`s | ||
|  | ||
| After deploying the Registry API server, the controller does not communicate with it. Instead, the controller directly fetches registry data from the configured source (Git, ConfigMap, or remote API) and stores it in a `ConfigMap`. The deployed Registry API server reads from this same `ConfigMap` to serve requests from external clients. | ||
|  | ||
| ### Status Reporting | ||
| The MCPRegistry status tracks deployment, synchronization and storage state information. | ||
|  | ||
| #### API deployment status | ||
| ```yaml | ||
| apiStatus: | ||
| endpoint: http://thv-git-api.toolhive-system:8080 | ||
| phase: Ready | ||
| readySince: "2025-10-22T08:30:38Z" | ||
| ``` | ||
| #### Synchronization status | ||
| ```yaml | ||
| syncStatus: | ||
| phase: Complete | ||
| lastSyncTime: "2025-10-22T08:30:27Z" | ||
| serverCount: 88 | ||
| lastSyncHash: c013b6b36286ab... | ||
| ``` | ||
| #### Storage reference | ||
| ```yaml | ||
| storageRef: | ||
| type: configmap | ||
| configMapRef: | ||
| name: thv-git-registry-storage | ||
| ``` | ||
| #### Standard conditions | ||
| ```yaml | ||
| conditions: | ||
| - type: APIReady | ||
| status: "True" | ||
| lastTransitionTime: "2025-10-22T08:30:38Z" | ||
| ``` | ||
|  | ||
| ## Refactoring Proposal | ||
|  | ||
| ### Component Responsibilities | ||
|  | ||
| **Controller:** | ||
|         
                  jhrozek marked this conversation as resolved.
              Show resolved
            Hide resolved | ||
| - Deploy and manage Registry API (`Deployment`, `Service`) | ||
| - Create and update configuration `ConfigMap` from `MCPRegistry` spec | ||
|         
                  jhrozek marked this conversation as resolved.
              Show resolved
            Hide resolved | ||
| - Monitor deployment health and update `apiStatus` | ||
| - Trigger manual syncs via Registry API endpoint | ||
| - Update `syncStatus`, see [Sync Status Updates](#sync-status-updates) options | ||
|  | ||
| **Registry API:** | ||
| - Watch configuration file for changes | ||
| - Execute periodic and manual synchronization | ||
| - Fetch, filter, and store registry data | ||
| - Expose sync status via HTTP endpoint | ||
|  | ||
| ### Configuration Management | ||
|  | ||
| **Flow:** | ||
| 1. Controller creates `ConfigMap` `<registry-name>-config` containing `MCPRegistry` spec | ||
|         
                  jhrozek marked this conversation as resolved.
              Show resolved
            Hide resolved | ||
| 2. `ConfigMap` mounted at `/config/spec.yaml` in Registry API pod | ||
| 3. Registry API launched with `--config /config/spec.yaml` flag | ||
| 4. Registry API watches file and triggers sync on changes | ||
|  | ||
| ### Storage | ||
|  | ||
| **File-based (Phase 1):** | ||
| - Registry API stores data at path defined by `--storage-path` flag | ||
| - Controller sets `--storage-path /data` using mounted `emptyDir` volume | ||
|         
                  dmartinol marked this conversation as resolved.
              Show resolved
            Hide resolved | ||
| - Data format: ToolHive registry JSON schema (unchanged from current) | ||
|         
                  dmartinol marked this conversation as resolved.
              Show resolved
            Hide resolved | ||
| - Persistence: Data survives pod restarts, lost on deployment deletion | ||
|  | ||
| **Database-backed (Future):** Deferred to separate proposal for multi-replica support. | ||
|         
                  dmartinol marked this conversation as resolved.
              Show resolved
            Hide resolved | ||
|  | ||
| ### Sync Status Updates | ||
|  | ||
| **Options:** | ||
|  | ||
| **Option 1 - HTTP Polling:** | ||
| - Registry API exposes `GET /registry/status` returning sync history, stored in the configured storage path | ||
| - Controller polls every 60 seconds and updates `syncStatus` | ||
| - Pros: Simple, no new CRDs, works for CLI and Kubernetes | ||
| - Cons: 60-second status latency | ||
|         
                  dmartinol marked this conversation as resolved.
              Show resolved
            Hide resolved | ||
|  | ||
| **Option 2 - Event CRD:** | ||
| - New `SyncCompletionEvent` CRD created by Registry API on sync completion | ||
| - Controller watches events and updates `MCPRegistry.status.syncStatus` | ||
| - Pros: Real-time updates | ||
| - Cons: Requires RBAC, adds CRD complexity, Kubernetes-only | ||
|  | ||
| **Option 3 - Remove syncStatus:** | ||
| - Remove `syncStatus` from `MCPRegistry` entirely | ||
| - Users query Registry API `GET /registry/status` directly for sync information | ||
| - Pros: Simplest implementation | ||
| - Cons: Breaking change, loses kubectl visibility | ||
|  | ||
| **Option 4 - Hybrid polling (Recommended):** | ||
| - Registry API exposes `GET /registry/status` returning sync history, stored in the configured storage path | ||
| - Controller polls every 60 seconds and creates an instance of the new `SyncCompletionEvent` CRD on sync completion | ||
| - Controller watches events and updates `MCPRegistry.status.syncStatus` | ||
| - Pros: Anticipate changes for the final solution (option 2) | ||
| - Cons: Adds CRD complexity, 60-second status latency | ||
|  | ||
| **Recommendation:** Option 4 provides best balance of simplicity and functionality. | ||
|  | ||
| **Status endpoint response:** | ||
| ```json | ||
| { | ||
| "lastSync": { | ||
| "type": "automatic", | ||
| "startTime": "2025-10-22T08:30:38Z", | ||
| "completionTime": "2025-10-22T08:32:38Z", | ||
| "hash": "c013b6b36286ab...", | ||
| "serverCount": 88, | ||
| "result": "completed", | ||
| "attemptCount": 0, | ||
| "errorMessage": null | ||
| } | ||
| } | ||
| ``` | ||
|  | ||
| ### Automatic Sync | ||
|  | ||
| - Registry API uses `time.Ticker` for periodic sync based on `syncPolicy.interval` | ||
| - On each interval: check source hash, sync only if changed | ||
| - Record result in sync history | ||
|         
                  jhrozek marked this conversation as resolved.
              Show resolved
            Hide resolved | ||
|  | ||
| ### Manual Sync | ||
|  | ||
| - Registry API exposes `PUT /registry/sync` endpoint | ||
| - Controller calls endpoint when `toolhive.stacklok.dev/sync-trigger` annotation changes | ||
|         
                  dmartinol marked this conversation as resolved.
              Show resolved
            Hide resolved | ||
| - Endpoint returns 201 (created) or 409 (sync in progress) | ||
| - Sync result recorded in history | ||
|  | ||
| ### Sync Failure Handling | ||
|  | ||
| **Current behavior:** | ||
| - Fixed 5-minute retry interval | ||
| - Unlimited retry attempts | ||
| - `syncStatus.attemptCount` increments on failure, resets on success | ||
|         
                  jhrozek marked this conversation as resolved.
              Show resolved
            Hide resolved | ||
|  | ||
| **Proposed (Phase 1):** | ||
| Match current behavior - Registry API retries every 5 minutes indefinitely. | ||
|  | ||
| **Future enhancement:** | ||
| Add configurable retry policy to `syncPolicy.retryPolicy` (tracked separately). | ||
|  | ||
| ### Data Filtering | ||
|  | ||
| - Registry API applies name and tag filters during sync processing | ||
| - Filter changes trigger immediate sync (hash-based detection) | ||
| - Move `cmd/thv-operator/pkg/filtering` package to `thv-registry-api` | ||
|  | ||
| ### API Status Updates | ||
|  | ||
| Controller continues managing `apiStatus` with no changes: | ||
| - Monitors Deployment and Service health | ||
| - Updates phase: NotStarted, Deploying, Ready, Unhealthy, Error | ||
| - Existing `apiStatus` structure preserved | ||
|  | ||
| ## Implementation Plan | ||
|  | ||
| ### Phase 1: Move Data Source Management to Registry API | ||
|  | ||
| **Goal:** Full-functioning solution with sync logic in `thv-registry-api`, maintaining backward compatibility with existing MCPRegistry resources. | ||
|  | ||
| **Task Order Rationale:** Registry API must be enhanced first before operator changes, to avoid breaking existing deployments. | ||
|         
                  dmartinol marked this conversation as resolved.
              Show resolved
            Hide resolved | ||
|  | ||
| #### Registry API Development | ||
|  | ||
| 1. **Add configuration file support** | ||
| - Implement `--config` flag in `thv-registry-api serve` command | ||
| - Add YAML configuration parsing for `source`, `syncPolicy`, and `filter` fields | ||
| - Implement file watcher for configuration changes (using `fsnotify` or similar) | ||
| - Graceful reload on configuration changes | ||
|  | ||
| 2. **Add file-based storage** | ||
| - Implement `--storage-path` flag for registry data location | ||
| - Add file-based storage backend matching current ConfigMap JSON format | ||
| - Atomic file writes (temp file + rename pattern) | ||
| - Load existing data on startup if present | ||
|         
                  dmartinol marked this conversation as resolved.
              Show resolved
            Hide resolved | ||
|  | ||
| 3. **Port source handler packages** | ||
| - Copy `cmd/thv-operator/pkg/sources` to `thv-registry-api` | ||
| - Copy `cmd/thv-operator/pkg/httpclient` to `thv-registry-api` | ||
| - Copy `cmd/thv-operator/pkg/git` to `thv-registry-api` | ||
| - Adapt packages to work without Kubernetes client dependencies | ||
| - Update import paths | ||
|  | ||
| 4. **Port sync manager** | ||
| - Copy `cmd/thv-operator/pkg/sync` to `thv-registry-api` | ||
| - Adapt sync logic to use file-based storage instead of ConfigMap | ||
| - Implement retry logic (5-minute interval, unlimited attempts) | ||
| - Remove Kubernetes-specific dependencies | ||
|         
                  dmartinol marked this conversation as resolved.
              Show resolved
            Hide resolved | ||
|  | ||
| 5. **Port filtering logic** | ||
| - Copy `cmd/thv-operator/pkg/filtering` to `thv-registry-api` | ||
| - Keep filtering interface and implementation unchanged | ||
| - Update import paths | ||
|  | ||
| 6. **Implement automatic sync** | ||
| - Add `time.Ticker` based periodic sync using `syncPolicy.interval` | ||
| - Implement hash-based change detection before syncing | ||
| - Store sync history in file at storage path | ||
|  | ||
| 7. **Add sync status endpoint** | ||
| - Implement `GET /registry/status` returning sync history | ||
| - Return format matching proposed JSON schema | ||
| - Read from sync history file | ||
|         
                  dmartinol marked this conversation as resolved.
              Show resolved
            Hide resolved | ||
|  | ||
| 8. **Add manual sync endpoint** | ||
| - Implement `POST /registry/sync` for manual sync triggering | ||
| - Return 202 (accepted) or 409 (sync in progress) | ||
| - Prevent concurrent syncs | ||
|  | ||
| 9. **Add integration tests** | ||
| - Test configuration file watching and reload | ||
| - Test all source types (ConfigMap, Git, API) | ||
| - Test filtering logic | ||
| - Test automatic and manual sync | ||
| - Test retry behavior on failures | ||
|  | ||
| #### Operator Updates | ||
|  | ||
| 10. **Update operator to use new Registry API** | ||
| - Modify MCPRegistry controller to create configuration ConfigMap | ||
| - Update Deployment spec to mount config ConfigMap at `/config/spec.yaml` | ||
| - Update Deployment spec to mount emptyDir volume at `/data` | ||
| - Add `--config /config/spec.yaml --storage-path /data` flags to registry-api container | ||
| - Add polling logic to call `GET /registry/status` every 60 seconds (depends on [selected update option](#sync-status-updates)) | ||
| - Update `syncStatus` from polling results | ||
|  | ||
| 11. **Add manual sync trigger via API** | ||
| - Detect `toolhive.stacklok.dev/sync-trigger` annotation changes | ||
| - Call `POST /registry/sync` on Registry API endpoint | ||
| - Update `lastManualSyncTrigger` after successful API call | ||
|  | ||
| 12. **Remove sync logic from operator** | ||
| - Remove sync execution code from MCPRegistry controller (keep status update only) | ||
| - Mark `cmd/thv-operator/pkg/sources` as deprecated (keep for reference) | ||
| - Mark `cmd/thv-operator/pkg/sync` as deprecated (keep for reference) | ||
| - Mark `cmd/thv-operator/pkg/filtering` as deprecated (keep for reference) | ||
|  | ||
| 13. **Update operator integration tests** | ||
| - Update tests to verify controller creates config ConfigMap | ||
| - Test controller polls Registry API status endpoint | ||
| - Test manual sync via annotation | ||
| - Test backward compatibility with existing MCPRegistry resources | ||
|  | ||
| 14. **Update documentation** | ||
| - Document new Registry API flags and endpoints | ||
| - Update MCPRegistry CRD examples | ||
| - Add migration guide for users (if any operator upgrade steps needed) | ||
|  | ||
| ### Phase 2: Future Enhancements | ||
|         
                  jhrozek marked this conversation as resolved.
              Show resolved
            Hide resolved | ||
|  | ||
| **Goal:** Improvements deferred from Phase 1 to keep initial scope manageable. | ||
|  | ||
| 1. **Configurable retry policy** | ||
| - Add `spec.syncPolicy.retryPolicy` field to MCPRegistry CRD | ||
| - Support `maxAttempts`, `retryInterval`, `backoff` configuration | ||
| - Implement in Registry API sync logic | ||
| - Update operator to pass retry config via ConfigMap | ||
|  | ||
| 2. **Real-time status updates (if needed)** | ||
| - Evaluate if 60-second polling latency is acceptable | ||
| - If not: Implement Option 2 (Event CRD) from proposal | ||
| - Add `SyncCompletionEvent` CRD | ||
| - Update Registry API to create events | ||
| - Update controller to watch events | ||
      
      Oops, something went wrong.
        
    
  
  Add this suggestion to a batch that can be applied as a single commit.
  This suggestion is invalid because no changes were made to the code.
  Suggestions cannot be applied while the pull request is closed.
  Suggestions cannot be applied while viewing a subset of changes.
  Only one suggestion per line can be applied in a batch.
  Add this suggestion to a batch that can be applied as a single commit.
  Applying suggestions on deleted lines is not supported.
  You must change the existing code in this line in order to create a valid suggestion.
  Outdated suggestions cannot be applied.
  This suggestion has been applied or marked resolved.
  Suggestions cannot be applied from pending reviews.
  Suggestions cannot be applied on multi-line comments.
  Suggestions cannot be applied while the pull request is queued to merge.
  Suggestion cannot be applied right now. Please check back later.
  
    
  
    
Uh oh!
There was an error while loading. Please reload this page.