@snomiao snomiao commented Oct 20, 2025

Summary

Implements automated test evidence checking for PRs in desktop and ComfyUI repos, solving issue #61 in a smarter way.

Changes

  • ✅ Created app/tasks/gh-test-evidence/gh-test-evidence.ts task
  • ✅ Scans open PRs in Comfy-Org/desktop and comfyanonymous/ComfyUI
  • ✅ Uses GPT-4o-mini (cheaper, faster) to analyze PR bodies
  • ✅ Detects test explanations, screenshots, and videos
  • ✅ Posts warning comments when evidence is missing
  • ✅ Auto-updates comments when PR changes
  • ✅ Deletes comments when all evidence is present
  • ✅ Follows ComfyUI_frontend workflow message format
  • ✅ Added comprehensive test file
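
The post, update, and delete behavior listed above can be read as a single decision function. The following is a minimal sketch under stated assumptions, not the code in gh-test-evidence.ts: the `TestEvidence` field names, the `CommentAction` type, the `decideCommentAction` helper, and the warning text are all hypothetical. It also assumes evidence counts as complete when there is a test explanation plus at least one screenshot or video.

```typescript
// Hypothetical shape of the analysis result; the field names are assumptions.
interface TestEvidence {
  hasTestExplanation: boolean;
  hasScreenshot: boolean;
  hasVideo: boolean;
}

// What the task should do with its warning comment on a given PR.
type CommentAction =
  | { kind: "create"; body: string }
  | { kind: "update"; body: string }
  | { kind: "delete" }
  | { kind: "none" };

// Assumption: an explanation plus one piece of visual proof is "complete".
function listMissing(evidence: TestEvidence): string[] {
  const missing: string[] = [];
  if (!evidence.hasTestExplanation) missing.push("test explanation");
  if (!evidence.hasScreenshot && !evidence.hasVideo) {
    missing.push("screenshot or video");
  }
  return missing;
}

// existingBody is the body of the bot's current comment, or null if none exists.
function decideCommentAction(
  evidence: TestEvidence,
  existingBody: string | null,
): CommentAction {
  const missing = listMissing(evidence);
  if (missing.length === 0) {
    // Nothing is missing: remove a stale warning, otherwise stay silent.
    return existingBody === null ? { kind: "none" } : { kind: "delete" };
  }
  const body = `Missing test evidence: ${missing.join(", ")}`;
  if (existingBody === null) return { kind: "create", body };
  // Skip the API call when the warning text would not change.
  return existingBody === body ? { kind: "none" } : { kind: "update", body };
}
```

Keeping the decision pure makes the four cases easy to unit test without touching the GitHub API.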

Smart Improvements

  1. Efficient AI model: Uses GPT-4o-mini instead of GPT-4o for faster, cheaper analysis
  2. Clean architecture: Follows existing task patterns (coreping, gh-bounty)
  3. Idempotent: Re-analyzes only when PR updates
  4. Smart comment management: Updates existing comments instead of spamming
  5. Database tracking: Uses MongoDB to track task state
  6. Type-safe: Full TypeScript with Zod validation
  7. Bot marker: Uses HTML comment marker for identifying bot comments
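
As an illustration of item 7, here is a minimal sketch of how an HTML comment marker can identify the bot's own comment. HTML comments are not shown in rendered GitHub Markdown, so the marker stays invisible to readers. The marker text, the `PrComment` shape, and both helper names are assumptions made for this example; the PR does not show the actual values.

```typescript
// Hypothetical marker text; the PR only states that an HTML comment marker is used.
const BOT_MARKER = "<!-- test-evidence-bot -->";

// Minimal comment shape for this sketch (an assumption, not the Octokit type).
interface PrComment {
  id: number;
  body: string;
}

// Prefix the visible message with the invisible marker.
function withBotMarker(message: string): string {
  return `${BOT_MARKER}\n${message}`;
}

// Find the comment this bot posted earlier, if any, by looking for the marker.
function findBotComment(comments: PrComment[]): PrComment | undefined {
  return comments.find((comment) => comment.body.includes(BOT_MARKER));
}
```

Matching on a marker rather than on the author login lets the lookup keep working if the bot account changes, at the cost of trusting that nobody else pastes the marker.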

Testing

  • Added gh-test-evidence.spec.ts with test structure
  • Follows project test patterns
  • Tests cover: draft PRs, missing evidence, complete evidence, comment updates

Workflow Integration

Added to app/tasks/run-gh-tasks.ts to run on schedule with other GitHub tasks.

Closes #61

🤖 Generated with Claude Code

Implements automated test evidence checking for PRs in desktop and ComfyUI repos.

- Creates gh-test-evidence task to scan open PRs
- Uses GPT-4o-mini to analyze PR bodies for test evidence
- Posts warning comments when test explanations or visual proof are missing
- Auto-updates or deletes comments based on PR changes
- Follows the same comment pattern as ComfyUI_frontend workflow

Resolves #61

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings October 20, 2025 18:15

vercel bot commented Oct 20, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| comfy-pr | Ready | Preview | Comment | Nov 4, 2025 11:24pm |

Copilot AI left a comment

Pull Request Overview

This PR implements automated test evidence checking for pull requests in the Comfy-Org/desktop and comfyanonymous/ComfyUI repositories. The solution uses GPT-4o-mini to analyze PR descriptions for test explanations, screenshots, and videos, then posts/updates/deletes warning comments based on what evidence is present.

Key changes:

  • New automated task that scans open PRs and validates test evidence using AI
  • Smart comment management system that updates existing comments instead of creating duplicates
  • Database-backed state tracking to avoid redundant analysis
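
The state tracking in the last bullet can be sketched as follows. This is a hedged illustration only: the PR says MongoDB tracks task state but does not show what is compared, so this sketch assumes the previously analyzed PR body is stored and compared, and it uses an in-memory `Map` as a stand-in for the database. `EvidenceState`, `MemoryStateStore`, and `analyzeIfChanged` are hypothetical names, and the analysis callback is synchronous here for brevity where a real model call would be asynchronous.

```typescript
// Hypothetical record kept per PR; the real schema is not shown in the PR.
interface EvidenceState {
  prUrl: string;
  analyzedBody: string;
}

// In-memory stand-in for the MongoDB collection mentioned in the PR.
class MemoryStateStore {
  private readonly states = new Map<string, EvidenceState>();

  get(prUrl: string): EvidenceState | undefined {
    return this.states.get(prUrl);
  }

  save(state: EvidenceState): void {
    this.states.set(state.prUrl, state);
  }
}

// Runs analyze only when the PR body differs from the last analyzed body.
// Returns true if analysis ran, false if it was skipped.
function analyzeIfChanged(
  store: MemoryStateStore,
  pr: { url: string; body: string },
  analyze: (body: string) => void,
): boolean {
  const previous = store.get(pr.url);
  if (previous !== undefined && previous.analyzedBody === pr.body) {
    return false; // unchanged since the last run, skip the model call
  }
  analyze(pr.body);
  store.save({ prUrl: pr.url, analyzedBody: pr.body });
  return true;
}
```

Saving state only after `analyze` returns means a failed analysis is retried on the next scheduled run.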

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| app/tasks/run-gh-tasks.ts | Registers the new test evidence task and reformats existing task entries for consistency |
| app/tasks/gh-test-evidence/gh-test-evidence.ts | Core implementation of the test evidence checker with OpenAI integration, comment management, and database persistence |
| app/tasks/gh-test-evidence/gh-test-evidence.spec.ts | Test suite structure with mocked dependencies for validating the task behavior |

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

snomiao and others added 2 commits October 21, 2025 04:12
…dd CI cleanup

- Extract main logic into runCorePingTask() function for better testability
- Add isCI check to properly close DB and exit in CI environments
- Add todo comment about deprecating custom webhook types in favor of @octokit/webhooks-types
- Add llm-api, @keyv/mongo, and @octokit/webhooks-types dependencies
- Remove trailing whitespace

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Resolve conflicts in coreping.ts by keeping new refactored structure
- Resolve conflicts in run-gh-tasks.ts by including all task imports
- Resolve conflicts in package.json by keeping both new dependencies
- Accept incoming bun.lock changes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
snomiao and others added 2 commits October 22, 2025 13:50
Switch from gpt-4o-mini to gpt-5-mini for analyzing PR test evidence.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Fixed typo 'Explaination' -> 'Explanation' throughout the codebase:
- Updated schema field name in TestEvidenceSchema
- Updated all references in code and tests
- Updated OpenAI prompt and JSON schema
- Updated warning message generation

Addresses review comments from Copilot.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.



Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

@snomiao snomiao left a comment

Thanks @copilot! All spelling corrections from 'Explaination' to 'Explanation' have been addressed in commit 317d1ce. The fixes include:

  • TestEvidenceSchema field name
  • All code references
  • OpenAI prompt and JSON schema
  • Warning message generation
  • Test files

Corrects zod version that was accidentally changed during merge from ^4.0.5 to ^4.0.0.
This should resolve the Vercel build failure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…ubRepoUrl

The function parseUrlRepoOwner does not exist in @/src/parseOwnerRepo.
The correct function name is parseGithubRepoUrl.

This fixes the TypeScript compilation error that was causing the Vercel build to fail.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

socket-security bot commented Oct 30, 2025

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

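| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| --- | --- | --- | --- | --- | --- | --- |
| Added | @octokit/webhooks-types@7.6.1 | 100 | 100 | 87 | 78 | 100 |
| Added | @biomejs/biome@2.3.2 | 94 | 100 | 100 | 99 | 100 |
| Added | @keyv/mongo@3.0.5 | 100 | 100 | 100 | 100 | 100 |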

socket-security bot commented Oct 30, 2025

All alerts resolved. Learn more about Socket for GitHub.

This PR previously contained dependency changes with security issues that have been resolved, removed, or ignored.

Replace bun:mock with MSW (Mock Service Worker) for more realistic HTTP mocking:
- Mock GitHub API endpoints (pulls, comments) and OpenAI API
- Add proper MSW server lifecycle (beforeAll, afterEach, afterAll)
- Mock database module to avoid MongoDB connection in tests
- All tests passing (4/4)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
snomiao and others added 3 commits October 30, 2025 06:07
…nation'

Address Copilot review feedback to use singular form 'explanation' for consistency.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Use the centralized MSW setup from @/src/test/msw-setup instead of duplicating server configuration. This addresses the review comment to use the unified MSW setup.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

snomiao commented Oct 30, 2025

Fixed spelling from 'explanations' to 'explanation' in b963f43

snomiao commented Oct 30, 2025

Refactored to use unified MSW setup from @/src/test/msw-setup in 0800956

Comment on lines 2 to 5
type S = GithubApiComponents["schemas"];
// todo(sno): deprecate this and use @octokit/webhooks-types
export type WEBHOOK_EVENTS = {
branch_protection_configuration: S[`webhook-branch-protection-configuration${string}` & keyof S];

Let's remove this file.

Updated OpenAI model from invalid 'gpt-5-mini' to correct 'gpt-4o-mini'
for test evidence analysis.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>