Commit cb19d07

ryanaiagent and xuanyang15 authored and committed
fix: Optimize Stale Agent with GraphQL and Search API to resolve 429 Quota errors
Merge #3700

### Description

This PR refactors the `adk_stale_agent` to address `429 RESOURCE_EXHAUSTED` errors encountered during workflow execution. The previous implementation was inefficient in fetching issue history (using pagination over the REST API) and lacked server-side filtering, causing excessive API calls and huge token consumption that breached Gemini API quotas.

The new implementation switches to a **GraphQL-first approach**, implements server-side filtering via the Search API, adds robust concurrency controls, and significantly improves code maintainability through modular refactoring.

### Root Cause of Failure

The previous workflow failed with the following error due to passing too much context to the LLM and processing too many irrelevant issues:

```text
google.genai.errors.ClientError: 429 RESOURCE_EXHAUSTED. Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_paid_tier_input_token_count
```

### Key Changes

#### 1. Optimization: REST → GraphQL (`agent.py`)

* **Old:** Fetched issue comments and timeline events using multiple paginated REST API calls (`/timeline`).
* **New:** Implemented `get_issue_state` using a single **GraphQL** query. This fetches comments, `userContentEdits`, and specific timeline events (Labels, Renames) in one network request.
* **Refactoring:** The complex analysis logic has been decomposed into focused helper functions (`_fetch_graphql_data`, `_build_history_timeline`, `_replay_history_to_find_state`) for better readability and testing.
* **Configurable:** Added `GRAPHQL_COMMENT_LIMIT` and `GRAPHQL_TIMELINE_LIMIT` settings to tune context depth.
* **Impact:** Drastically reduces the data payload size and eliminates multiple API round-trips, significantly lowering the token count sent to the LLM.

#### 2. Optimization: Server-Side Filtering (`utils.py`)

* **Old:** Fetched *all* open issues via REST and filtered them in Python memory.
* **New:** Uses the GitHub Search API (`get_old_open_issue_numbers`) with `created:<DATE` syntax.
* **Impact:** Only fetches issue numbers that actually meet the age threshold, preventing the agent from wasting cycles and tokens on brand-new issues.

#### 3. Concurrency & Rate Limiting (`main.py` & `settings.py`)

* **Old:** Sequential execution loop.
* **New:** Implemented `asyncio.gather` with a configurable `CONCURRENCY_LIMIT` (set to 3).
* **New:** Added `urllib3` retry strategies (exponential backoff) in `utils.py` to handle GitHub API rate limits (HTTP 429) gracefully.

#### 4. Logic Improvements ("Ghost Edits")

* **New Feature:** The agent now detects "Ghost Edits" (where an author updates the issue description without posting a new comment).
* **Action:** If a silent edit is detected on a stale candidate, the agent now alerts maintainers instead of marking it stale, preventing false positives.

### File Comparison Summary

| File | Change |
| :--- | :--- |
| `main.py` | Switched from `InMemoryRunner` loop to `asyncio` chunked processing. Added execution timing and API usage logging. |
| `agent.py` | Replaced REST logic with GraphQL query. Added logic to handle silent body edits. Decomposed giant `get_issue_state` into helper functions with docstrings. Added `_format_days` helper. |
| `utils.py` | Added `HTTPAdapter` with Retries. Added `get_old_open_issue_numbers` using Search API. |
| `settings.py` | Removed `ISSUES_PER_RUN`; added configuration for `CONCURRENCY_LIMIT`, `SLEEP_BETWEEN_CHUNKS`, and GraphQL limits. |
| `PROMPT_INSTRUCTIONS.txt` | Simplified decision tree; removed date calculation responsibility from LLM. |

### Verification

The new logic minimizes token usage by offloading date calculations to Python and strictly limiting the context passed to the LLM to semantic intent analysis (e.g., "Is this a question?").

* **Metric Check:** The workflow now tracks API calls per issue to ensure we stay within limits.
* **Safety:** Silent edits by users now correctly reset the "Stale" timer.
* **Maintainability:** All complex logic is now isolated in typed helper functions with comprehensive docstrings.

Co-authored-by: Xuan Yang <xygoogle@google.com>
COPYBARA_INTEGRATE_REVIEW=#3700 from ryanaiagent:feat/improve-stale-agent 888064e
PiperOrigin-RevId: 838885530
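As an illustration of the server-side filtering described under Key Changes #2, the following is a minimal sketch of how a `created:<DATE` search query might be built. The helper names `build_old_issue_query` and `build_search_url` are hypothetical and do not come from this commit; the actual `get_old_open_issue_numbers` in `utils.py` may differ. The sketch assumes the standard GitHub search qualifiers `repo:`, `is:issue`, `is:open`, and `created:`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import urlencode


def build_old_issue_query(
    owner: str,
    repo: str,
    min_age_days: float,
    now: Optional[datetime] = None,
) -> str:
    """Build a search query matching open issues created before the cutoff.

    Hypothetical helper: only the `created:<DATE` qualifier is taken from
    the commit message; the rest is an assumption for illustration.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=min_age_days)).strftime("%Y-%m-%d")
    return f"repo:{owner}/{repo} is:issue is:open created:<{cutoff}"


def build_search_url(query: str, page: int = 1, per_page: int = 100) -> str:
    """Return the Search API URL for one page of results (no request is sent)."""
    params = urlencode({"q": query, "per_page": per_page, "page": page})
    return f"https://api.github.com/search/issues?{params}"


if __name__ == "__main__":
    q = build_old_issue_query("google", "adk-python", min_age_days=7)
    print(q)
    print(build_search_url(q))
```

Because the age filter is part of the query, issues younger than the threshold never reach the agent, which is the effect the commit message attributes to this change.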
1 parent 2a1a41d commit cb19d07

7 files changed: +930 -393 lines changed

.github/workflows/stale-bot.yml

Lines changed: 6 additions & 20 deletions
@@ -1,57 +1,43 @@
-# .github/workflows/stale-issue-auditor.yml
-
-# Best Practice: Always have a 'name' field at the top.
 name: ADK Stale Issue Auditor

-# The 'on' block defines the triggers.
 on:
-  # The 'workflow_dispatch' trigger allows manual runs.
   workflow_dispatch:

-  # The 'schedule' trigger runs the bot on a timer.
   schedule:
-    # This runs at 6:00 AM UTC (e.g., 10 PM PST).
+    # This runs at 6:00 AM UTC (10 PM PST)
     - cron: '0 6 * * *'

-# The 'jobs' block contains the work to be done.
 jobs:
-  # A unique ID for the job.
   audit-stale-issues:
-    # The runner environment.
     runs-on: ubuntu-latest
+    timeout-minutes: 60

-    # Permissions for the job's temporary GITHUB_TOKEN.
-    # These are standard and syntactically correct.
     permissions:
       issues: write
       contents: read

-    # The sequence of steps for the job.
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5

       - name: Set up Python
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@v6
         with:
           python-version: '3.11'

       - name: Install dependencies
-        # The '|' character allows for multi-line shell commands.
         run: |
           python -m pip install --upgrade pip
           pip install requests google-adk

       - name: Run Auditor Agent Script
-        # The 'env' block for setting environment variables.
         env:
           GITHUB_TOKEN: ${{ secrets.ADK_TRIAGE_AGENT }}
           GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
-          OWNER: google
+          OWNER: ${{ github.repository_owner }}
           REPO: adk-python
-          ISSUES_PER_RUN: 100
+          CONCURRENCY_LIMIT: 3
           LLM_MODEL_NAME: "gemini-2.5-flash"
           PYTHONPATH: contributing/samples

-        # The final 'run' command.
         run: python -m adk_stale_agent.main
Lines changed: 64 additions & 36 deletions
@@ -1,40 +1,68 @@
-You are a highly intelligent and transparent repository auditor for '{OWNER}/{REPO}'.
-Your job is to analyze all open issues and report on your findings before taking any action.
+You are a highly intelligent repository auditor for '{OWNER}/{REPO}'.
+Your job is to analyze a specific issue and report findings before taking action.

 **Primary Directive:** Ignore any events from users ending in `[bot]`.
-**Reporting Directive:** For EVERY issue you analyze, you MUST output a concise, human-readable summary, starting with "Analysis for Issue #[number]:".
+**Reporting Directive:** Output a concise summary starting with "Analysis for Issue #[number]:".
+
+**THRESHOLDS:**
+- Stale Threshold: {stale_threshold_days} days.
+- Close Threshold: {close_threshold_days} days.

 **WORKFLOW:**
-1. **Context Gathering**: Call `get_repository_maintainers` and `get_all_open_issues`.
-2. **Per-Issue Analysis**: For each issue, call `get_issue_state`, passing in the maintainers list.
-3. **Decision & Reporting**: Based on the summary from `get_issue_state`, follow this strict decision tree in order.
-
---- **DECISION TREE & REPORTING TEMPLATES** ---
-
-**STEP 1: CHECK FOR ACTIVITY (IS THE ISSUE ACTIVE?)**
-- **Condition**: Was the last human action NOT from a maintainer? (i.e., `last_human_commenter_is_maintainer` is `False`).
-- **Action**: The author or a third party has acted. The issue is ACTIVE.
-  - **Report and Action**: If '{STALE_LABEL_NAME}' is present, report: "Analysis for Issue #[number]: Issue is ACTIVE. The last action was a [action type] by a non-maintainer. To get the [action type], you MUST use the value from the 'last_human_action_type' field in the summary you received from the tool." Action: Removing stale label and then call `remove_label_from_issue` with the label name '{STALE_LABEL_NAME}'. Otherwise, report: "Analysis for Issue #[number]: Issue is ACTIVE. No stale label to remove. Action: None."
-- **If this condition is met, stop processing this issue.**
-
-**STEP 2: IF PENDING, MANAGE THE STALE LIFECYCLE.**
-- **Condition**: The last human action WAS from a maintainer (`last_human_commenter_is_maintainer` is `True`). The issue is PENDING.
-- **Action**: You must now determine the correct state.
-
-  - **First, check if the issue is already STALE.**
-    - **Condition**: Is the `'{STALE_LABEL_NAME}'` label present in `current_labels`?
-    - **Action**: The issue is STALE. Your only job is to check if it should be closed.
-      - **Get Time Difference**: Call `calculate_time_difference` with the `stale_label_applied_at` timestamp.
-      - **Decision & Report**: If `hours_passed` > **{CLOSE_HOURS_AFTER_STALE_THRESHOLD}**: Report "Analysis for Issue #[number]: STALE. Close threshold met ({CLOSE_HOURS_AFTER_STALE_THRESHOLD} hours) with no author activity." Action: Closing issue and then call `close_as_stale`. Otherwise, report "Analysis for Issue #[number]: STALE. Close threshold not yet met. Action: None."
-
-  - **ELSE (the issue is PENDING but not yet stale):**
-    - **Analyze Intent**: Semantically analyze the `last_maintainer_comment_text`. Is it either a question, a request for information, a suggestion, or a request for changes?
-    - **If YES (it is either a question, a request for information, a suggestion, or a request for changes)**:
-      - **CRITICAL CHECK**: Now, you must verify the author has not already responded. Compare the `last_author_event_time` and the `last_maintainer_comment_time`.
-      - **IF the author has NOT responded** (i.e., `last_author_event_time` is older than `last_maintainer_comment_time` or is null):
-        - **Get Time Difference**: Call `calculate_time_difference` with the `last_maintainer_comment_time`.
-        - **Decision & Report**: If `hours_passed` > **{STALE_HOURS_THRESHOLD}**: Report "Analysis for Issue #[number]: PENDING. Stale threshold met ({STALE_HOURS_THRESHOLD} hours)." Action: Marking as stale and then call `add_stale_label_and_comment` and if label name '{REQUEST_CLARIFICATION_LABEL}' is missing then call `add_label_to_issue` with the label name '{REQUEST_CLARIFICATION_LABEL}'. Otherwise, report: "Analysis for Issue #[number]: PENDING. Stale threshold not met. Action: None."
-      - **ELSE (the author HAS responded)**:
-        - **Report**: "Analysis for Issue #[number]: PENDING, but author has already responded to the last maintainer request. Action: None."
-    - **If NO (it is not a request):**
-      - **Report**: "Analysis for Issue #[number]: PENDING. Maintainer's last comment was not a request. Action: None."
+1. **Context Gathering**: Call `get_issue_state`.
+2. **Decision**: Follow this strict decision tree using the data returned by the tool.
+
+--- **DECISION TREE** ---
+
+**STEP 1: CHECK IF ALREADY STALE**
+- **Condition**: Is `is_stale` (from tool) **True**?
+- **Action**:
+  - **Check Role**: Look at `last_action_role`.
+
+  - **IF 'author' OR 'other_user'**:
+    - **Context**: The user has responded. The issue is now ACTIVE.
+    - **Action 1**: Call `remove_label_from_issue` with '{STALE_LABEL_NAME}'.
+    - **Action 2 (ALERT CHECK)**: Look at `maintainer_alert_needed`.
+      - **IF True**: User edited description silently.
+        -> **Action**: Call `alert_maintainer_of_edit`.
+      - **IF False**: User commented normally. No alert needed.
+    - **Report**: "Analysis for Issue #[number]: ACTIVE. User activity detected. Removed stale label."
+
+  - **IF 'maintainer'**:
+    - **Check Time**: Check `days_since_stale_label`.
+    - **If `days_since_stale_label` > {close_threshold_days}**:
+      - **Action**: Call `close_as_stale`.
+      - **Report**: "Analysis for Issue #[number]: STALE. Close threshold met. Closing."
+    - **Else**:
+      - **Report**: "Analysis for Issue #[number]: STALE. Waiting for close threshold. No action."
+
+**STEP 2: CHECK IF ACTIVE (NOT STALE)**
+- **Condition**: `is_stale` is **False**.
+- **Action**:
+  - **Check Role**: If `last_action_role` is 'author' or 'other_user':
+    - **Context**: The issue is Active.
+    - **Action (ALERT CHECK)**: Look at `maintainer_alert_needed`.
+      - **IF True**: The user edited the description silently, and we haven't alerted yet.
+        -> **Action**: Call `alert_maintainer_of_edit`.
+        -> **Report**: "Analysis for Issue #[number]: ACTIVE. Silent update detected (Description Edit). Alerted maintainer."
+      - **IF False**:
+        -> **Report**: "Analysis for Issue #[number]: ACTIVE. Last action was by user. No action."
+
+  - **Check Role**: If `last_action_role` is 'maintainer':
+    - **Proceed to STEP 3.**
+
+**STEP 3: ANALYZE MAINTAINER INTENT**
+- **Context**: The last person to act was a Maintainer.
+- **Action**: Read the text in `last_comment_text`.
+  - **Question Check**: Does the text ask a question, request clarification, ask for logs, or suggest trying a fix?
+  - **Time Check**: Is `days_since_activity` > {stale_threshold_days}?
+
+  - **DECISION**:
+    - **IF (Question == YES) AND (Time == YES)**:
+      - **Action**: Call `add_stale_label_and_comment`.
+      - **Check**: If '{REQUEST_CLARIFICATION_LABEL}' is not in `current_labels`, call `add_label_to_issue` for it.
+      - **Report**: "Analysis for Issue #[number]: STALE. Maintainer asked question [days_since_activity] days ago. Marking stale."
+    - **IF (Question == YES) BUT (Time == NO)**:
+      - **Report**: "Analysis for Issue #[number]: PENDING. Maintainer asked question, but threshold not met yet. No action."
+    - **IF (Question == NO)** (e.g., "I am working on this"):
+      - **Report**: "Analysis for Issue #[number]: ACTIVE. Maintainer gave status update (not a question). No action."
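To make the three-step decision tree in the new prompt easier to follow, here is a hedged Python rendering of the same branching. It is only a reading aid: in the agent these decisions are made by the LLM, and `IssueState`, `decide`, and the `is_question` flag (standing in for the LLM's intent judgement) are hypothetical. The field and tool names are taken from the prompt text; the 7-day defaults follow the README, and the label name `request clarification` is assumed from the README.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IssueState:
    """Subset of the fields the prompt reads from `get_issue_state`."""
    is_stale: bool
    last_action_role: str            # 'author', 'other_user', or 'maintainer'
    maintainer_alert_needed: bool = False
    days_since_stale_label: float = 0.0
    days_since_activity: float = 0.0
    current_labels: List[str] = field(default_factory=list)
    is_question: bool = False        # assumption: result of the LLM intent check


def decide(
    state: IssueState,
    stale_threshold_days: float = 7.0,
    close_threshold_days: float = 7.0,
    clarification_label: str = "request clarification",
) -> List[str]:
    """Return the tool calls the decision tree would make, in order."""
    user_acted = state.last_action_role in ("author", "other_user")

    # STEP 1: the issue already carries the stale label.
    if state.is_stale:
        if user_acted:
            actions = ["remove_label_from_issue"]
            if state.maintainer_alert_needed:
                actions.append("alert_maintainer_of_edit")
            return actions
        if state.days_since_stale_label > close_threshold_days:
            return ["close_as_stale"]
        return []

    # STEP 2: not stale, and a user acted last.
    if user_acted:
        return ["alert_maintainer_of_edit"] if state.maintainer_alert_needed else []

    # STEP 3: not stale, and a maintainer acted last.
    if state.is_question and state.days_since_activity > stale_threshold_days:
        actions = ["add_stale_label_and_comment"]
        if clarification_label not in state.current_labels:
            actions.append("add_label_to_issue")
        return actions
    return []
```

For example, `decide(IssueState(is_stale=True, last_action_role="maintainer", days_since_stale_label=8))` yields a single `close_as_stale` call, matching the "Close threshold met" branch of Step 1.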

contributing/samples/adk_stale_agent/README.md

Lines changed: 55 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -1,65 +1,89 @@
 # ADK Stale Issue Auditor Agent

-This directory contains an autonomous agent designed to audit a GitHub repository for stale issues, helping to maintain repository hygiene and ensure that all open items are actionable.
+This directory contains an autonomous, **GraphQL-powered** agent designed to audit a GitHub repository for stale issues. It maintains repository hygiene by ensuring all open items are actionable and responsive.

-The agent operates as a "Repository Auditor," proactively scanning all open issues rather than waiting for a specific trigger. It uses a combination of deterministic Python tools and the semantic understanding of a Large Language Model (LLM) to make intelligent decisions about the state of a conversation.
+Unlike traditional "Stale Bots" that only look at timestamps, this agent uses a **Unified History Trace** and an **LLM (Large Language Model)** to understand the *context* of a conversation. It distinguishes between a maintainer asking a question (stale candidate) vs. a maintainer providing a status update (active).

 ---

 ## Core Logic & Features

-The agent's primary goal is to identify issues where a maintainer has requested information from the author, and to manage the lifecycle of that issue based on the author's response (or lack thereof).
+The agent operates as a "Repository Auditor," proactively scanning open issues using a high-efficiency decision tree.

-**The agent follows a precise decision tree:**
+### 1. Smart State Verification (GraphQL)
+Instead of making multiple expensive API calls, the agent uses a single **GraphQL** query per issue to reconstruct the entire history of the conversation. It combines:
+* **Comments**
+* **Description/Body Edits** ("Ghost Edits")
+* **Title Renames**
+* **State Changes** (Reopens)

-1. **Audits All Open Issues:** On each run, the agent fetches a batch of the oldest open issues in the repository.
-2. **Identifies Pending Issues:** It analyzes the full timeline of each issue to see if the last human action was a comment from a repository maintainer.
-3. **Semantic Intent Analysis:** If the last comment was from a maintainer, the agent uses the LLM to determine if the comment was a **question or a request for clarification**.
-4. **Marks as Stale:** If the maintainer's question has gone unanswered by the author for a configurable period (e.g., 7 days), the agent will:
-    * Apply a `stale` label to the issue.
-    * Post a comment notifying the author that the issue is now considered stale and will be closed if no further action is taken.
-    * Proactively add a `request clarification` label if it's missing, to make the issue's state clear.
-5. **Handles Activity:** If any non-maintainer (the author or a third party) comments on an issue, the agent will automatically remove the `stale` label, marking the issue as active again.
-6. **Closes Stale Issues:** If an issue remains in the `stale` state for another configurable period (e.g., 7 days) with no new activity, the agent will post a final comment and close the issue.
+It sorts these events chronologically to determine the **Last Active Actor**.

-### Self-Configuration
+### 2. The "Last Actor" Rule
+The agent follows a precise logic flow based on who acted last:

-A key feature of this agent is its ability to self-configure. It does not require a hard-coded list of maintainer usernames. On each run, it uses the GitHub API to dynamically fetch the list of users with write access to the repository, ensuring its logic is always based on the current team.
+* **If Author/User acted last:** The issue is **ACTIVE**.
+    * This includes comments, title changes, and *silent* description edits.
+    * **Action:** The agent immediately removes the `stale` label.
+    * **Silent Update Alert:** If the user edited the description but *did not* comment, the agent posts a specific alert: *"Notification: The author has updated the issue description..."* to ensure maintainers are notified (since GitHub does not trigger notifications for body edits).
+    * **Spam Prevention:** The agent checks if it has already alerted about a specific silent edit to avoid spamming the thread.
+
+* **If Maintainer acted last:** The issue is **POTENTIALLY STALE**.
+    * The agent passes the text of the maintainer's last comment to the LLM.
+
+### 3. Semantic Intent Analysis (LLM)
+If the maintainer was the last person to speak, the LLM analyzes the comment text to determine intent:
+* **Question/Request:** "Can you provide logs?" / "Please try v2.0."
+    * **Verdict:** **STALE** (Waiting on Author).
+    * **Action:** If the time threshold is met, the agent adds the `stale` label. It also checks for the `request clarification` label and adds it if missing.
+* **Status Update:** "We are working on a fix." / "Added to backlog."
+    * **Verdict:** **ACTIVE** (Waiting on Maintainer).
+    * **Action:** No action taken. The issue remains open without stale labels.
+
+### 4. Lifecycle Management
+* **Marking Stale:** After `STALE_HOURS_THRESHOLD` (default: 7 days) of inactivity following a maintainer's question.
+* **Closing:** After `CLOSE_HOURS_AFTER_STALE_THRESHOLD` (default: 7 days) of continued inactivity while marked stale.
+
+---
+
+## Performance & Safety
+
+* **GraphQL Optimized:** Fetches comments, edits, labels, and timeline events in a single network request to minimize latency and API quota usage.
+* **Search API Filtering:** Uses the GitHub Search API to pre-filter issues created recently, ensuring the bot doesn't waste cycles analyzing brand-new issues.
+* **Rate Limit Aware:** Includes intelligent sleeping and retry logic (exponential backoff) to handle GitHub API rate limits (HTTP 429) gracefully.
+* **Execution Metrics:** Logs the time taken and API calls consumed for every issue processed.

 ---

 ## Configuration

-The agent is configured entirely via environment variables, which should be set as secrets in the GitHub Actions workflow environment.
+The agent is configured via environment variables, typically set as secrets in GitHub Actions.

 ### Required Secrets

 | Secret Name | Description |
 | :--- | :--- |
-| `GITHUB_TOKEN` | A GitHub Personal Access Token (PAT) with the required permissions. It's recommended to use a PAT from a dedicated "bot" account.
-| `GOOGLE_API_KEY` | An API key for the Google AI (Gemini) model used for the agent's reasoning.
-
-### Required PAT Permissions
-
-The `GITHUB_TOKEN` requires the following **Repository Permissions**:
-* **Issues**: `Read & write` (to read issues, add labels, comment, and close)
-* **Administration**: `Read-only` (to read the list of repository collaborators/maintainers)
+| `GITHUB_TOKEN` | A GitHub Personal Access Token (PAT) or Service Account Token with `repo` scope. |
+| `GOOGLE_API_KEY` | An API key for the Google AI (Gemini) model used for reasoning. |

 ### Optional Configuration

-These environment variables can be set in the workflow file to override the defaults in `settings.py`.
+These variables control the timing thresholds and model selection.

 | Variable Name | Description | Default |
 | :--- | :--- | :--- |
-| `STALE_HOURS_THRESHOLD` | The number of hours of inactivity after a maintainer's question before an issue is marked as `stale`. | `168` (7 days) |
-| `CLOSE_HOURS_AFTER_STALE_THRESHOLD` | The number of hours after being marked `stale` before an issue is closed. | `168` (7 days) |
-| `ISSUES_PER_RUN` | The maximum number of oldest open issues to process in a single workflow run. | `100` |
-| `LLM_MODEL_NAME`| LLM model to use. | `gemini-2.5-flash` |
+| `STALE_HOURS_THRESHOLD` | Hours of inactivity after a maintainer's question before marking as `stale`. | `168` (7 days) |
+| `CLOSE_HOURS_AFTER_STALE_THRESHOLD` | Hours after being marked `stale` before the issue is closed. | `168` (7 days) |
+| `LLM_MODEL_NAME`| The specific Gemini model version to use. | `gemini-2.5-flash` |
+| `OWNER` | Repository owner (auto-detected in Actions). | (Environment dependent) |
+| `REPO` | Repository name (auto-detected in Actions). | (Environment dependent) |

 ---

 ## Deployment

-To deploy this agent, a GitHub Actions workflow file (`.github/workflows/stale-bot.yml`) is included. This workflow runs on a daily schedule and executes the agent's main script.
+To deploy this agent, a GitHub Actions workflow file (`.github/workflows/stale-bot.yml`) is recommended.
+
+### Directory Structure Note
+Because this agent resides within the `adk-python` package structure, the workflow must ensure the script is executed correctly to handle imports.

-Ensure the necessary repository secrets are configured and the `stale` and `request clarification` labels exist in the repository.
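The "Unified History Trace" and "Last Actor" rule described in the README diff above can be sketched as a small replay over mixed events. This is an assumption-laden illustration, not the commit's `_build_history_timeline` or `_replay_history_to_find_state`: the `Event` type, the `summarize` function, and the choice to classify a user as a maintainer before checking authorship are all hypothetical. The spam-prevention check for already-posted alerts is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class Event:
    kind: str        # 'comment', 'body_edit', 'rename', or 'reopen'
    actor: str       # login of the user who acted
    at: datetime     # when the event happened


def last_human_event(events: Iterable[Event]) -> Optional[Event]:
    """Return the most recent event, ignoring logins ending in '[bot]'."""
    humans = [e for e in events if not e.actor.endswith("[bot]")]
    return max(humans, key=lambda e: e.at, default=None)


def classify_role(actor: str, author: str, maintainers: Set[str]) -> str:
    """Map a login to 'maintainer', 'author', or 'other_user'."""
    if actor in maintainers:   # assumption: maintainer status takes precedence
        return "maintainer"
    return "author" if actor == author else "other_user"


def summarize(events: Iterable[Event], author: str, maintainers: Set[str]) -> Dict[str, object]:
    """Find who acted last and whether that action was a silent body edit."""
    last = last_human_event(events)
    if last is None:
        return {"last_action_role": None, "last_action_kind": None,
                "maintainer_alert_needed": False}
    role = classify_role(last.actor, author, maintainers)
    return {
        "last_action_role": role,
        "last_action_kind": last.kind,
        # A non-maintainer body edit as the latest action is a "ghost edit".
        "maintainer_alert_needed": last.kind == "body_edit" and role != "maintainer",
    }
```

Merging every event type into one chronological list is what lets a description edit count as activity; a comments-only view would miss it and could mark the issue stale.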
