Implement GraphQL-based issue fetching for 91% API call reduction #28
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
This PR implements efficient GraphQL-based issue fetching to replace multiple REST API calls with a single GraphQL query, reducing API calls by up to 91% and significantly improving latency for issue processing.
Problem
The previous implementation used REST API calls with the following performance issues:
For an issue with 250 comments, this resulted in ~11 REST API calls and significant cumulative latency.
Solution
This PR introduces a GraphQL-based fetching system with:
Core Components
New
graphql_client.pymodule with:GraphQLExecutor: Wrapper around PyGithub's GraphQL API with retry logicIssueSnapshot: Immutable dataclass representing complete issue stateCommentNode: Dataclass for type-safe comment representationKey Features
Single Request Fetching: Fetches issue metadata, labels, and up to 100 comments in one GraphQL query
Smart Pagination with Early Exit: Stops fetching additional pages as soon as both the disabled marker and AI evaluation comment are found, preventing unnecessary API calls.
Exponential Backoff Retry: Handles transient errors (502/503) with configurable retry logic:
Full Backward Compatibility: All functions support both
Issue(PyGithub) andIssueSnapshottypes using Union types:Performance Improvements
Latency: 40-70% reduction for multi-page issues due to single round-trip fetching.
Technical Details
GraphQL Query
The implementation uses a single optimized query that fetches:
Integration Points
get_github_issue(): Now uses GraphQL by defaultget_pygithub_issue(): Helper for operations requiring PyGithub Issue (mutations)Union[Issue, IssueSnapshot]Error Handling
Testing
Migration & Compatibility
No breaking changes: All existing code continues to work without modification. The GraphQL fetching is transparent to consumers of the API.
For maintainers: No configuration changes required. GraphQL is used automatically.
For contributors: New code should use
IssueSnapshotwhere possible for consistency.Documentation
Dependencies
No new dependencies required. Uses existing
PyGithub>=2.7.0with built-in GraphQL support viarequester.graphql_query().Related Issue
Closes #[issue_number]
Implements the complete specification from the GraphQL Issue Fetch Optimization issue, including:
Original prompt
This section details on the original issue you should resolve
<issue_title>Use graphql to get issue context</issue_title>
<issue_description># GraphQL Issue Fetch Optimization Specification
User Story
As the repo agent, I want to fetch an issue and all of its (potentially paginated) comments plus core metadata (labels, author, state, last updated) in one network round trip using the GitHub GraphQL API so that I reduce latency, API quota consumption, and race conditions compared to multiple REST calls.
Current Problem
Current flow (REST, via PyGithub):
issue.get_comments()), each page up to 100 comments.Issues:
Goals
Non-Goals (Initial)
Data Requirements
For single issue processing:
GraphQL Query (Single Issue)
Batching Strategy
hasNextPagetrue and disabling marker not yet found and AI comment not yet found AND we need full scan => fetch subsequent pages lazily until criteria satisfied (stop early once found for either). Each additional fetch is still one request; worst case factor = ceil(totalComments / 100).Data Model Abstraction
Introduce
IssueSnapshotdataclass:Integration Points
get_github_issueandissue.get_comments()with new functiongraphql_fetch_issue(owner, repo, number, need_full_comment_scan: bool).Library & Auth
We will leverage the existing PyGithub client instead of introducing
requestsorhttpx.Approach:
Githubinstance:gh = Github(token).gh.graphql(query_string, **variables); PyGithub injects the requiredAuthorization: bearer <token>header and targets the standardhttps://api.github.com/graphqlendpoint (no extra env var needed unless supporting GHES; in that case PyGithub can be initialized with a custombase_url).Wrapper (conceptual):
Rate limit / cost m...
💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click here to start the survey.