Description
Problem
In GitHub Agentic Workflows, agents receive pre-populated GitHub data before the MCP gateway starts enforcing DIFC policies. This data is fetched via the gh CLI during workflow setup jobs (search_issues, search_prs, activation steps) and delivered through environment variables, temporary files, and prompt interpolation — none of which passes through the guard system.
An agent operating under a policy like {"allow-only":{"repos":["org/repo"],"min-integrity":"approved"}} may receive unrestricted data from gh issue list or gh pr list that would otherwise be filtered.
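To make the policy shape concrete, here is a minimal sketch that decodes the example policy above and checks a repository against its allow-list. The `Policy` and `AllowOnly` types and both helper functions are hypothetical stand-ins, not the real `internal/config` types, and `min-integrity` is only parsed here because its ordering semantics are not spelled out in this issue.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AllowOnly mirrors the "allow-only" object from the example policy.
// Hypothetical type: the real definitions live in internal/config.
type AllowOnly struct {
	Repos        []string `json:"repos"`
	MinIntegrity string   `json:"min-integrity"`
}

// Policy is a hypothetical wrapper for the top-level policy document.
type Policy struct {
	AllowOnly AllowOnly `json:"allow-only"`
}

// parsePolicy decodes a JSON policy document.
func parsePolicy(raw string) (Policy, error) {
	var p Policy
	err := json.Unmarshal([]byte(raw), &p)
	return p, err
}

// repoAllowed reports whether repo appears in the allow-list.
// Assumption: exact "owner/name" matching, no wildcards.
func repoAllowed(p Policy, repo string) bool {
	for _, r := range p.AllowOnly.Repos {
		if r == repo {
			return true
		}
	}
	return false
}

func main() {
	p, err := parsePolicy(`{"allow-only":{"repos":["org/repo"],"min-integrity":"approved"}}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(repoAllowed(p, "org/repo"), repoAllowed(p, "other/repo"))
}
```

Data from `gh issue list` against `other/repo` would fail this check, which is the kind of result the pre-population path currently delivers unfiltered.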
Proposed Solution
A read-only HTTP filtering proxy that intercepts gh CLI API calls and applies the same DIFC guard policies enforced by the MCP gateway. The proxy reuses the existing GitHub guard WASM module.
Design document: docs/GH_CLI_PROXY_DESIGN.md on branch design/gh-cli-proxy
Key Design Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Interception | GH_HOST redirect to local HTTP server | No TLS complexity; gh-only scope |
| Guard reuse | Same WASM module, no call_backend | Guard gets full API responses inline |
| Binary | awmg proxy subcommand | Single binary, shared init |
| API coverage | REST + GraphQL in V1 | Core gh commands use GraphQL |
| Write ops | Passthrough | Writes handled by MCP gateway safe-outputs |
| Pagination | Backfill filtered pages | Maintain expected per_page counts |
| Unknown endpoints | Conservative labels, coarse filtering | Fail-safe |
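The pagination row can be illustrated with a small sketch of one possible backfill loop: keep reading upstream pages until `per_page` permitted items have been collected or the upstream runs out. Everything here is an assumption for illustration: the `Item` type, the `Allowed` flag standing in for the guard decision, the `fetchPage` callback, and the `maxPages` cap are not taken from the design document.

```go
package main

import "fmt"

// Item is a stand-in for one API result (issue, PR, ...).
type Item struct {
	ID      int
	Allowed bool // hypothetical: outcome of the guard decision for this item
}

// fetchPage is a hypothetical upstream fetcher. It returns the items on a
// 1-based page, or an empty slice when the upstream has no more data.
type fetchPage func(page, perPage int) []Item

// backfill reads upstream pages from startPage until perPage allowed items
// are collected, the upstream is exhausted, or maxPages pages were read.
// Allowed items beyond perPage are returned as leftover so the caller can
// serve them on the next client page instead of dropping them.
func backfill(fetch fetchPage, startPage, perPage, maxPages int) (out, leftover []Item, nextPage int) {
	var kept []Item
	page := startPage
	for n := 0; n < maxPages && len(kept) < perPage; n++ {
		items := fetch(page, perPage)
		if len(items) == 0 {
			break // upstream exhausted
		}
		page++
		for _, it := range items {
			if it.Allowed {
				kept = append(kept, it)
			}
		}
	}
	if len(kept) > perPage {
		return kept[:perPage], kept[perPage:], page
	}
	return kept, nil, page
}

// fakeUpstream simulates an API with `total` items where only even IDs
// pass the filter.
func fakeUpstream(total int) fetchPage {
	return func(page, perPage int) []Item {
		var items []Item
		for id := (page-1)*perPage + 1; id <= page*perPage && id <= total; id++ {
			items = append(items, Item{ID: id, Allowed: id%2 == 0})
		}
		return items
	}
}

func main() {
	out, leftover, next := backfill(fakeUpstream(6), 1, 2, 5)
	fmt.Println(out, leftover, next)
}
```

The `maxPages` cap is a design choice in this sketch: without it, a heavily filtered listing could turn one client request into an unbounded number of upstream calls.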
Research Findings
How gh CLI talks to GitHub API
- Uses Go `net/http`; respects `HTTPS_PROXY` and `GH_HOST` (confirmed experimentally)
- `GH_HOST=localhost:PORT` redirects all API calls to local server (adds `/api/v3/` prefix)
- Mix of REST and GraphQL: `gh issue list` / `gh pr list` use GraphQL; `gh search` / `gh api` use REST
- Auth via `Authorization: token ghp_...` header
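Given the findings above, the forwarding step of the proxy could look roughly like the following sketch: strip the `/api/v3` prefix that `gh` adds under `GH_HOST`, keep the query string, and pass the `Authorization` header through unchanged. The function names and the `https://api.github.com` upstream base are assumptions for illustration, and GraphQL path handling is deliberately left out because the issue does not state which path `gh` uses for it.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// rewritePath removes the /api/v3 prefix that gh adds when GH_HOST points
// at the local proxy. Paths without the prefix are returned unchanged.
func rewritePath(p string) string {
	if p == "/api/v3" {
		return "/"
	}
	if strings.HasPrefix(p, "/api/v3/") {
		return strings.TrimPrefix(p, "/api/v3")
	}
	return p
}

// upstreamRequest builds the request to send upstream. upstreamBase is an
// assumed parameter (for example "https://api.github.com").
func upstreamRequest(in *http.Request, upstreamBase string) (*http.Request, error) {
	u := upstreamBase + rewritePath(in.URL.Path)
	if in.URL.RawQuery != "" {
		u += "?" + in.URL.RawQuery
	}
	out, err := http.NewRequest(in.Method, u, in.Body)
	if err != nil {
		return nil, err
	}
	// Forward the caller's token as-is; the proxy does not mint credentials.
	if auth := in.Header.Get("Authorization"); auth != "" {
		out.Header.Set("Authorization", auth)
	}
	return out, nil
}

func main() {
	in, err := http.NewRequest("GET", "http://localhost:8080/api/v3/repos/org/repo/issues?per_page=30", nil)
	if err != nil {
		panic(err)
	}
	in.Header.Set("Authorization", "token ghp_example")
	out, err := upstreamRequest(in, "https://api.github.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(out.Method, out.URL.String())
}
```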
Pre-population data flows in gh-aw workflows
- `search_issues` / `search_prs` parallel jobs fetch data before agent starts
- Data propagated via `GH_AW_*` env vars, `/tmp/gh-aw/` files, prompt interpolation
- `safeinputs-gh` tool wraps authenticated `gh` commands at agent runtime
- None of this currently passes through DIFC enforcement
Guard architecture compatibility
- `WasmGuard.LabelResource(toolName, args)` and `LabelResponse(toolName, result)` are transport-agnostic
- Proxy maps REST URLs + GraphQL queries → same tool names the guard already recognizes
- `BackendCaller` can be a no-op stub (`ErrNotSupported`) since guard receives full API responses
- `FilterCollection()` + `Evaluator` from `internal/difc` are directly reusable
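The REST half of the URL-to-tool mapping could be sketched as a small route table. The tool names below (`list_issues`, `get_issue`, `list_pull_requests`, `search_issues`) and the argument keys are illustrative assumptions; the real names must match whatever the guard WASM module already recognizes, and GraphQL queries would need a separate mapping based on the request body.

```go
package main

import (
	"fmt"
	"regexp"
)

// route maps one REST endpoint shape to a guard tool name.
type route struct {
	method  string
	pattern *regexp.Regexp
	tool    string // hypothetical tool name, see lead-in
}

var routes = []route{
	{"GET", regexp.MustCompile(`^/repos/([^/]+)/([^/]+)/issues$`), "list_issues"},
	{"GET", regexp.MustCompile(`^/repos/([^/]+)/([^/]+)/issues/(\d+)$`), "get_issue"},
	{"GET", regexp.MustCompile(`^/repos/([^/]+)/([^/]+)/pulls$`), "list_pull_requests"},
	{"GET", regexp.MustCompile(`^/search/issues$`), "search_issues"},
}

// mapRoute returns the tool name and extracted arguments for a request.
// ok is false for unmapped endpoints, which the caller should treat
// conservatively.
func mapRoute(method, path string) (tool string, args map[string]string, ok bool) {
	for _, r := range routes {
		if r.method != method {
			continue
		}
		m := r.pattern.FindStringSubmatch(path)
		if m == nil {
			continue
		}
		args = map[string]string{}
		if len(m) >= 3 {
			args["owner"], args["repo"] = m[1], m[2]
		}
		if len(m) >= 4 {
			args["issue_number"] = m[3]
		}
		return r.tool, args, true
	}
	return "", nil, false
}

func main() {
	tool, args, ok := mapRoute("GET", "/repos/org/repo/issues/42")
	fmt.Println(tool, args, ok)
	_, _, ok = mapRoute("GET", "/repos/org/repo/actions/runs")
	fmt.Println(ok)
}
```

The path is expected to have the `/api/v3` prefix already removed before it reaches `mapRoute`.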
API coverage gap
The gh CLI covers more endpoints than the GitHub MCP server (~50 tools). Unmapped endpoints get conservative labels (broadest secrecy, lowest integrity) to fail-safe. Key gaps: Actions runs/workflows, commit comparisons, git trees/blobs, org members, collaborators.
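As a rough illustration of the fail-safe behaviour for unmapped endpoints, the sketch below assigns a conservative label whenever a tool name is unknown and then applies a minimum-integrity check. The `Label` shape, the two integrity levels, and the sample label table are all invented for this example; the real label types come from `internal/difc`, and real labels come from the guard rather than a static map.

```go
package main

import "fmt"

// Integrity is a hypothetical ordered integrity level.
type Integrity int

const (
	IntegrityNone     Integrity = iota // lowest: nothing is assumed about provenance
	IntegrityApproved                  // stands in for the "approved" policy level
)

// Label is a simplified stand-in for a DIFC label.
type Label struct {
	Secret    bool // true models the broadest secrecy
	Integrity Integrity
}

// conservative is what unmapped endpoints receive:
// broadest secrecy, lowest integrity.
var conservative = Label{Secret: true, Integrity: IntegrityNone}

// known holds made-up labels for mapped tools, for illustration only.
var known = map[string]Label{
	"list_issues": {Secret: false, Integrity: IntegrityApproved},
}

// labelFor returns the label for a tool, falling back to conservative.
func labelFor(tool string) Label {
	if l, ok := known[tool]; ok {
		return l
	}
	return conservative
}

// meetsMinIntegrity reports whether l satisfies the policy floor.
func meetsMinIntegrity(l Label, floor Integrity) bool {
	return l.Integrity >= floor
}

func main() {
	for _, tool := range []string{"list_issues", ""} {
		l := labelFor(tool)
		fmt.Printf("%q secret=%v pass=%v\n", tool, l.Secret, meetsMinIntegrity(l, IntegrityApproved))
	}
}
```

Under a `min-integrity` of `approved`, the unmapped case fails the check, so an endpoint the proxy does not understand is filtered rather than passed through.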
Implementation Phases
1. Core proxy: HTTP forwarding, path rewriting, write passthrough
2. Route mapping + coarse filtering: REST + GraphQL → tool names, `LabelResource` checks
3. Response filtering + pagination backfill: `LabelResponse`, `FilterCollection`, page backfill
4. Integration: `awmg proxy` subcommand, workflow integration, smoke tests
5. Advanced: Agent messaging headers, streaming filtering, metrics
Shared Packages (direct import from MCPG)
- `internal/guard`: `WasmGuard`, `BackendCaller`
- `internal/difc`: `Evaluator`, `FilterCollection`, label types
- `internal/config`: `GuardPolicy`, `AllowOnlyPolicy`
- `internal/logger`: `JSONLLogger`, `LogDifcFilteredItem`
Open Question
- gh CLI caching: Does `gh` cache API responses locally? Initial research suggests no persistent caching, but this needs verification during Phase 1.