Design: gh CLI DIFC filtering proxy #2093

@lpcox

Problem

In GitHub Agentic Workflows, agents receive pre-populated GitHub data before the MCP gateway starts enforcing DIFC policies. This data is fetched via the gh CLI during workflow setup jobs (search_issues, search_prs, activation steps) and delivered through environment variables, temporary files, and prompt interpolation — none of which passes through the guard system.

An agent operating under a policy like {"allow-only":{"repos":["org/repo"],"min-integrity":"approved"}} may receive unrestricted data from gh issue list or gh pr list that would otherwise be filtered.
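
To make the gap concrete, here is a minimal Go sketch of what applying such a policy to a single item could look like. The struct names, the `Permits` helper, and the integrity ordering in `integrityRank` are illustrative assumptions, not the real types from internal/config or the guard's actual evaluation logic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AllowOnly mirrors the JSON shape of the policy quoted above.
// Illustrative only; the real type lives in internal/config.
type AllowOnly struct {
	Repos        []string `json:"repos"`
	MinIntegrity string   `json:"min-integrity"`
}

// Policy is the top-level wrapper with the "allow-only" key.
type Policy struct {
	AllowOnly AllowOnly `json:"allow-only"`
}

// integrityRank is an ASSUMED ordering, used only for this sketch.
var integrityRank = map[string]int{"none": 0, "unapproved": 1, "approved": 2, "merged": 3}

// Permits reports whether an item from repo with the given integrity label
// passes the policy. Unknown labels fail closed.
func (p Policy) Permits(repo, integrity string) bool {
	repoAllowed := false
	for _, r := range p.AllowOnly.Repos {
		if r == repo {
			repoAllowed = true
			break
		}
	}
	have, okHave := integrityRank[integrity]
	need, okNeed := integrityRank[p.AllowOnly.MinIntegrity]
	return repoAllowed && okHave && okNeed && have >= need
}

// parsePolicy decodes a JSON policy string.
func parsePolicy(raw string) (Policy, error) {
	var p Policy
	err := json.Unmarshal([]byte(raw), &p)
	return p, err
}

func main() {
	p, err := parsePolicy(`{"allow-only":{"repos":["org/repo"],"min-integrity":"approved"}}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Permits("org/repo", "approved"))
	fmt.Println(p.Permits("org/repo", "unapproved"))
	fmt.Println(p.Permits("other/repo", "approved"))
}
```

Data pre-populated through gh issue list or gh pr list never reaches a check of this kind today, which is the gap the proposal targets.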

Proposed Solution

A read-only HTTP filtering proxy that intercepts gh CLI API calls and applies the same DIFC guard policies enforced by the MCP gateway. The proxy reuses the existing GitHub guard WASM module.

Design document: docs/GH_CLI_PROXY_DESIGN.md on branch design/gh-cli-proxy

Key Design Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Interception | GH_HOST redirect to local HTTP server | No TLS complexity; gh-only scope |
| Guard reuse | Same WASM module, no call_backend | Guard gets full API responses inline |
| Binary | awmg proxy subcommand | Single binary, shared init |
| API coverage | REST + GraphQL in V1 | Core gh commands use GraphQL |
| Write ops | Passthrough | Writes handled by MCP gateway safe-outputs |
| Pagination | Backfill filtered pages | Maintain expected per_page counts |
| Unknown endpoints | Conservative labels, coarse filtering | Fail-safe |

Research Findings

How gh CLI talks to GitHub API

  • Uses Go net/http — respects HTTPS_PROXY and GH_HOST (confirmed experimentally)
  • GH_HOST=localhost:PORT redirects all API calls to local server (adds /api/v3/ prefix)
  • Mix of REST and GraphQL: gh issue list/gh pr list use GraphQL; gh search/gh api use REST
  • Auth via Authorization: token ghp_... header
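
The findings above suggest a forwarding layer roughly like the following Go sketch. It assumes gh sends REST calls under /api/v3/ (as observed) and GraphQL calls to /api/graphql when GH_HOST points at a non-github.com host, and that the upstream is https://api.github.com. The helper names `upstreamPath`, `isRead`, and `newProxy` are hypothetical, and the mutation check is deliberately naive; a real proxy would parse the GraphQL document.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// upstreamPath maps a path as sent by gh under GH_HOST to the
// corresponding api.github.com path.
func upstreamPath(p string) string {
	switch {
	case p == "/api/graphql": // assumed GraphQL path in GH_HOST mode
		return "/graphql"
	case strings.HasPrefix(p, "/api/v3/"):
		return strings.TrimPrefix(p, "/api/v3")
	default:
		return p
	}
}

// isRead classifies a request (using the upstream path) as a read that
// should be filtered. Everything else is treated as a write and passed through.
func isRead(method, path, graphqlQuery string) bool {
	if path == "/graphql" {
		// Naive heuristic: anything not starting with "mutation" is a query.
		return !strings.HasPrefix(strings.TrimSpace(graphqlQuery), "mutation")
	}
	return method == http.MethodGet || method == http.MethodHead
}

// newProxy builds a reverse proxy that rewrites gh's paths for the upstream.
// The Authorization header is forwarded unchanged by ReverseProxy.
func newProxy(upstream *url.URL) *httputil.ReverseProxy {
	return &httputil.ReverseProxy{
		Director: func(r *http.Request) {
			r.URL.Scheme = upstream.Scheme
			r.URL.Host = upstream.Host
			r.Host = upstream.Host
			r.URL.Path = upstreamPath(r.URL.Path)
			r.URL.RawPath = ""
		},
	}
}

func main() {
	upstream, err := url.Parse("https://api.github.com")
	if err != nil {
		panic(err)
	}
	proxy := newProxy(upstream)
	_ = proxy // would be served on a local port that GH_HOST points to

	fmt.Println(upstreamPath("/api/v3/repos/org/repo/issues"))
	fmt.Println(isRead("POST", "/graphql", "query { viewer { login } }"))
}
```

Response filtering would hook into the proxy's `ModifyResponse` field for requests that `isRead` marks as reads; that part is omitted here.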

Pre-population data flows in gh-aw workflows

  • search_issues/search_prs parallel jobs fetch data before agent starts
  • Data propagated via GH_AW_* env vars, /tmp/gh-aw/ files, prompt interpolation
  • safeinputs-gh tool wraps authenticated gh commands at agent runtime
  • None of this currently passes through DIFC enforcement

Guard architecture compatibility

  • WasmGuard.LabelResource(toolName, args) and LabelResponse(toolName, result) are transport-agnostic
  • Proxy maps REST URLs + GraphQL queries → same tool names the guard already recognizes
  • BackendCaller can be a no-op stub (ErrNotSupported) since guard receives full API responses
  • FilterCollection() + Evaluator from internal/difc are directly reusable
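
As a rough illustration of the mapping and the stub, the Go sketch below translates a few REST paths into tool names plus arguments and defines a no-op backend caller. The `BackendCaller` interface shown here is a simplified stand-in whose method signature is an assumption, the tool names are assumed to match those the guard recognizes, and the route table is a tiny hypothetical subset.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// ErrNotSupported is returned by the stub; the name comes from the design,
// the message is illustrative.
var ErrNotSupported = errors.New("backend calls are not supported in proxy mode")

// BackendCaller is a simplified stand-in for the interface in internal/guard.
// The real method set may differ.
type BackendCaller interface {
	Call(tool string, args map[string]any) (any, error)
}

// noopCaller satisfies BackendCaller but never reaches a backend, because
// the proxy already hands the guard the full API response.
type noopCaller struct{}

func (noopCaller) Call(string, map[string]any) (any, error) {
	return nil, ErrNotSupported
}

// route ties an HTTP method and path pattern to a guard tool name.
type route struct {
	method string
	re     *regexp.Regexp
	tool   string
}

// routes is a hypothetical subset; tool names are assumptions.
var routes = []route{
	{"GET", regexp.MustCompile(`^/repos/([^/]+)/([^/]+)/issues$`), "list_issues"},
	{"GET", regexp.MustCompile(`^/repos/([^/]+)/([^/]+)/pulls$`), "list_pull_requests"},
	{"GET", regexp.MustCompile(`^/search/issues$`), "search_issues"},
}

// mapREST returns the tool name and arguments for a REST request, or
// ok=false when the endpoint is unmapped.
func mapREST(method, path string) (tool string, args map[string]any, ok bool) {
	for _, rt := range routes {
		if rt.method != method {
			continue
		}
		m := rt.re.FindStringSubmatch(path)
		if m == nil {
			continue
		}
		args = map[string]any{}
		if len(m) == 3 {
			args["owner"], args["repo"] = m[1], m[2]
		}
		return rt.tool, args, true
	}
	return "", nil, false
}

func main() {
	var caller BackendCaller = noopCaller{}
	_, err := caller.Call("list_issues", nil)
	fmt.Println(err)

	tool, args, ok := mapREST("GET", "/repos/org/repo/issues")
	fmt.Println(tool, args, ok)
}
```

When `mapREST` reports `ok == false`, the caller would fall back to the fail-safe path for unknown endpoints rather than forwarding the response unfiltered.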

API coverage gap

The gh CLI covers more endpoints than the GitHub MCP server (~50 tools). Unmapped endpoints get conservative labels (broadest secrecy, lowest integrity) to fail-safe. Key gaps: Actions runs/workflows, commit comparisons, git trees/blobs, org members, collaborators.

Implementation Phases

  1. Core proxy: HTTP forwarding, path rewriting, write passthrough
  2. Route mapping + coarse filtering: REST + GraphQL → tool names, LabelResource checks
  3. Response filtering + pagination backfill: LabelResponse, FilterCollection, page backfill
  4. Integration: awmg proxy subcommand, workflow integration, smoke tests
  5. Advanced: Agent messaging headers, streaming filtering, metrics
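
The pagination backfill in phase 3 could work along the lines of the Go sketch below: keep fetching upstream pages until the filtered result reaches per_page items or the upstream is exhausted. The `Item`, `Cursor`, and `backfill` names are hypothetical, the `Allowed` field stands in for the guard's verdict, and how the cursor would be carried between gh requests is left open.

```go
package main

import "fmt"

// Item is a stand-in for one API result; Allowed stands in for the
// guard's filtering verdict.
type Item struct {
	ID      int
	Allowed bool
}

// Cursor records where the next filtered page should resume upstream.
type Cursor struct {
	Page int // next upstream page to fetch (1-based)
	Skip int // items already consumed from that page
}

// fetchFunc returns one upstream page and whether more pages follow.
type fetchFunc func(page int) (items []Item, more bool)

// backfill collects up to perPage kept items starting at cur, fetching at
// most maxFetches upstream pages. It returns the items, the cursor to
// resume from, and whether more upstream data may remain.
func backfill(fetch fetchFunc, keep func(Item) bool, cur Cursor, perPage, maxFetches int) ([]Item, Cursor, bool) {
	var out []Item
	for n := 0; n < maxFetches && len(out) < perPage; n++ {
		items, more := fetch(cur.Page)
		for i := cur.Skip; i < len(items); i++ {
			if len(out) == perPage {
				// Filled up mid-page: resume from this item next time.
				return out, Cursor{Page: cur.Page, Skip: i}, true
			}
			if keep(items[i]) {
				out = append(out, items[i])
			}
		}
		cur = Cursor{Page: cur.Page + 1, Skip: 0}
		if !more {
			return out, cur, false
		}
	}
	return out, cur, true
}

// fakeUpstream simulates 9 items in pages of 3; even IDs are allowed.
func fakeUpstream(page int) ([]Item, bool) {
	const size, total = 3, 9
	var items []Item
	for id := (page-1)*size + 1; id <= page*size && id <= total; id++ {
		items = append(items, Item{ID: id, Allowed: id%2 == 0})
	}
	return items, page*size < total
}

func main() {
	keep := func(it Item) bool { return it.Allowed }
	out, cur, more := backfill(fakeUpstream, keep, Cursor{Page: 1}, 3, 10)
	fmt.Println(out, cur, more)
}
```

The `maxFetches` bound is a design choice in this sketch: it caps the extra upstream requests a heavily filtered listing can trigger, at the cost of sometimes returning a short page.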

Shared Packages (direct import from MCPG)

  • internal/guard: WasmGuard, BackendCaller
  • internal/difc: Evaluator, FilterCollection, label types
  • internal/config: GuardPolicy, AllowOnlyPolicy
  • internal/logger: JSONLLogger, LogDifcFilteredItem

Open Question

  • gh CLI caching: Does gh cache API responses locally? Initial research suggests no persistent caching, but this needs verification during Phase 1.
