Add Agent Fight results from last month #5

Merged: justinpecott merged 2 commits into main from agent-fight on Oct 14, 2025
Conversation

@justinpecott (Owner) commented:

Last month I forked this repo and did battle between Claude Code, Copilot, and GPT Codex. This PR contains the tasks and results. Short story: Claude won and is the base of this repo. Code from Copilot and GPT is saved in the agent-fight directory for posterity.

Copilot AI review requested due to automatic review settings October 14, 2025 01:14

Copilot AI left a comment


Pull Request Overview

This PR documents a comparison between three AI coding agents (Claude Code, GitHub Copilot, and GPT/Codex) performing identical API client development tasks. The author tested each agent on four sequential tasks to refactor and enhance an OAuth2-enabled API client, ultimately choosing Claude's implementation as the main branch.

  • Added comprehensive documentation of the agent comparison methodology and results
  • Preserved code implementations from all three agents for reference and analysis
  • Included sensitive API credentials and configuration details for the test API

Reviewed Changes

Copilot reviewed 14 out of 16 changed files in this pull request and generated 4 comments.

Summary per file:

  • agent-fight/tasks.md: Documents the four development tasks assigned to each AI agent and their performance
  • agent-fight/results.md: Brief summary declaring Claude as the winner
  • agent-fight/justinsweb-api.md: Contains OAuth2 credentials and API endpoint details
  • agent-fight/gpt/*: Complete GPT/Codex implementation with Python files and configuration
  • agent-fight/copilot/*: Complete Copilot implementation with Python files and configuration

> This project needs to be generalized to a generic API client. Hence, we will no longer be calling it "asp" for Alexa Smart Properties.
>
> 1. Rename all files and folders, existing and generated, from "asp" to "api-blaster". This inclides all "asp" and ".asp" prefixed file and folders.

Copilot AI commented on Oct 14, 2025 (the same comment was left on three occurrences of this line):

Corrected spelling of 'inclides' to 'includes'.

Suggested change:

```diff
-1. Rename all files and folders, existing and generated, from "asp" to "api-blaster". This inclides all "asp" and ".asp" prefixed file and folders.
+1. Rename all files and folders, existing and generated, from "asp" to "api-blaster". This includes all "asp" and ".asp" prefixed file and folders.
```

Copilot uses AI. Check for mistakes.

claude bot commented Oct 14, 2025

Pull Request Review: Agent Fight Results

Thank you for sharing the results of your AI coding assistant comparison! This is an interesting historical record. Here's my detailed review:

Overview

This PR adds code artifacts from comparing Claude Code, GitHub Copilot, and GPT Codex implementations of the api-blaster utility to the agent-fight/ directory.


🟢 Strengths

1. Good Organization

  • Clearly separated implementations in dedicated subdirectories (copilot/, gpt/)
  • Each implementation is self-contained with its own documentation

2. Comprehensive Documentation

  • Both implementations include README files and setup instructions
  • Copilot version has detailed .github/copilot-instructions.md with validation commands
  • GPT version includes AGENTS.md with coding guidelines

3. Copilot Implementation Quality

  • Well-refactored with separation of concerns (api-blaster.py vs api-blaster-example.py)
  • Includes get_authenticated_session() function for better modularity
  • Uses modern uv package manager with proper pyproject.toml
  • Lock file (uv.lock) ensures reproducible builds

🟡 Areas for Improvement

1. Code Quality & Best Practices

Copilot Version (agent-fight/copilot/api-blaster.py):

  • Lines 109-113: Creates a new OAuth2Session for authorization flow but doesn't reuse the existing pattern consistently
  • Line 38: Hardcoded example API call in library code - should be in the example file only
  • Module loading in example: Uses importlib.util.spec_from_file_location which is fragile. Consider making it a proper package

GPT Version (agent-fight/gpt/):

The diff was truncated, but from what's visible:

  • Incomplete README (line 21 cuts off mid-sentence: "Asks you for your client id, clien")
  • Should verify completeness of all files

2. Potential Bugs

Both Implementations:

  • No input validation: User inputs for client_id, client_secret, and redirect_uri are not validated
  • Error handling: Limited error handling for network failures during OAuth flow
  • Token expiry edge case: If expires_at is present but the token is already expired, the code will fail on first API call
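
One way to close the expiry edge case is to treat any token that is expired, or about to expire, as invalid before the first API call. A minimal sketch (hypothetical helper; assumes the oauthlib-style token dict with an expires_at epoch timestamp):

```python
import time

def token_is_expired(token: dict, slack: int = 60) -> bool:
    """Return True if the token is expired or expires within `slack` seconds."""
    expires_at = token.get("expires_at")
    if expires_at is None:
        return False  # no expiry recorded; assume the token is still valid
    return time.time() >= expires_at - slack
```

Callers would refresh (or re-authorize) whenever this returns True, instead of failing on the first request.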

Copilot Version Specific:

  • Line 15 (api-blaster-example.py): spec.loader.exec_module() could fail if the module has syntax errors - needs try/except
  • Missing dependencies check: The example imports but doesn't verify the OAuth module is available
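
Wrapping the dynamic import makes that failure mode explicit. A sketch of what the guard could look like (hypothetical helper, not the repo's actual code):

```python
import importlib.util
import sys

def load_module_from_path(name: str, path: str):
    """Load a module from an arbitrary file path, raising ImportError on failure."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot create an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
    except Exception as exc:  # syntax errors, missing dependencies, etc.
        raise ImportError(f"failed to load {path}: {exc}") from exc
    sys.modules[name] = module
    return module
```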

3. Security Concerns

⚠️ Important Security Issues:

  1. Input sanitization missing:
     • User-provided URLs (redirect_uri, authorization_response) are not validated
     • Could potentially be exploited with malicious URLs
  2. File permissions:
     • Token and config files are created with default permissions
     • Should set restrictive permissions (0600) on .api-blaster/ files containing secrets:

       ```python
       import os
       os.chmod(self.auth_file_path, 0o600)
       os.chmod(self.conf_file_path, 0o600)
       ```
  3. No HTTPS validation:
     • Should validate that OAuth URLs use HTTPS
     • Should warn if redirect_uri is not localhost or HTTPS
  4. Token exposure in memory:
     • Tokens are kept in memory longer than necessary
     • Consider using the secrets module for sensitive data handling
  5. Error messages leak sensitive info:
     • Error messages print full response content, which may contain tokens
     • Lines like print(f"Response content: {response.text}") are risky
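
The first three items could be addressed with a couple of small helpers. A sketch (hypothetical function names; the permission bits assume a POSIX filesystem):

```python
import os
from urllib.parse import urlparse

def validate_redirect_uri(uri: str) -> None:
    """Allow HTTPS anywhere, but plain HTTP only on the local loopback."""
    parsed = urlparse(uri)
    if parsed.scheme == "https":
        return
    if parsed.scheme == "http" and parsed.hostname in ("localhost", "127.0.0.1"):
        return
    raise ValueError(f"redirect_uri must be HTTPS or a localhost URL: {uri!r}")

def write_secret_file(path: str, data: str) -> None:
    """Create the file owner-only (0600) before any secret bytes are written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(data)
```

Creating the file with 0600 from the start avoids the window where a chmod applied after writing would briefly leave secrets readable by other users.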

4. Performance Considerations

  • Blocking I/O: The OAuth flow uses input(), which blocks; acceptable for a CLI utility
  • No timeout on token fetch: Network calls should have timeouts:

    ```python
    token = oauth_session.fetch_token(
        self.token_url,
        client_secret=self.config["client_secret"],
        authorization_response=redirect_response,
        timeout=30,  # add timeout
    )
    ```

  • File I/O: Multiple file reads/writes are fine for this use case

5. Test Coverage

Critical Gap: No tests in this PR

Recommendations:

  • Add unit tests for configuration loading/saving
  • Add tests for token validation logic
  • Add integration tests with mocked OAuth endpoints
  • Test error paths (missing files, invalid JSON, network failures)
  • Consider adding tests to compare behavior across all three implementations

Example test structure:

```python
# tests/test_oauth_blaster.py
import pytest
from unittest.mock import patch, mock_open

from oauth_blaster import OAuthBlaster

def test_load_config_from_file():
    # Test configuration loading
    pass

def test_token_refresh():
    # Test automatic token refresh
    pass
```
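
A more concrete version of the first test, using a stand-in loader to illustrate the mocking pattern (hypothetical load_config helper; the real OAuthBlaster config API may differ):

```python
import json
from unittest.mock import mock_open, patch

def load_config(path: str) -> dict:
    # Stand-in for the client's JSON config loader.
    with open(path) as fh:
        return json.load(fh)

def test_load_config_from_file():
    fake = '{"client_id": "abc", "client_secret": "xyz"}'
    with patch("builtins.open", mock_open(read_data=fake)):
        cfg = load_config(".api-blaster/conf.json")
    assert cfg["client_id"] == "abc"
    assert cfg["client_secret"] == "xyz"
```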

📋 Specific Recommendations

For Copilot Implementation:

  1. Remove the API call logic from api-blaster.py (lines 104-117) - keep it library-only
  2. Make it a proper Python package with __init__.py
  3. Add type hints consistently throughout
  4. Add docstrings to all functions

For GPT Implementation:

  1. Verify all files are complete (README appears truncated)
  2. Add validation similar to copilot version
  3. Ensure consistency in error handling

General:

  1. Add a comparison summary document explaining:

    • What tasks were given to each agent
    • Evaluation criteria
    • Specific strengths/weaknesses observed
    • Why Claude "won" (as mentioned in the PR description)
  2. Add .gitignore entries if not already present:

     ```
     agent-fight/*/.api-blaster/
     agent-fight/*/.venv/
     ```
  3. Consider adding a test suite that runs the same tests against all three implementations


🎯 Verdict

This is valuable historical documentation of an AI assistant comparison. The code quality is generally good, especially the Copilot implementation's modular structure. However, before merging:

Must Fix:

  • Security: Add file permissions and input validation
  • Completeness: Verify GPT implementation files are complete
  • Documentation: Add comparison summary

Should Fix:

  • Add test coverage
  • Add timeouts to network calls
  • Improve error handling

Nice to Have:

  • Refactor copilot version into proper package
  • Add comparative analysis document
  • Standardize documentation across implementations

Would you like me to help implement any of these fixes?

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

scrub the sensitive info please

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

claude bot commented Oct 14, 2025

Pull Request Review: Agent Fight Results

Overview

This PR documents an interesting experiment comparing three AI coding agents (Claude Code, GitHub Copilot, and GPT Codex) on the same set of tasks. The PR adds historical code artifacts from each agent in an agent-fight/ directory.

Positive Aspects ✅

Documentation

  • Clear narrative: The tasks.md file provides excellent documentation of the methodology and experiences with each agent
  • Transparent comparison: Results are honestly presented, including challenges faced with each agent
  • Valuable historical record: Preserving all three implementations helps demonstrate the evolution of the codebase

Code Quality - Winner (Claude Code on main)

The code that ultimately landed on main shows several improvements:

  • Better separation of concerns: Created separate oauth_blaster.py module vs single-file implementations
  • Proper encapsulation: OAuthBlaster class with clear public API
  • Type hints: Added typing annotations
  • Better error handling: Validates required fields with clear error messages
  • Documentation: Includes comprehensive docstrings for the class and methods

Comparative Analysis

  • Copilot approach: function-based design, simpler for single-script use
  • GPT approach: class-based with inheritance pattern
  • Claude approach: most sophisticated architecture, with a factory function pattern

Concerns and Recommendations ⚠️

1. Security - Hardcoded Credentials

All three implementations contain hardcoded OAuth endpoints for cognito.justinsweb.com. Consider adding a note in README about replacing these with actual OAuth providers.

2. Dependency Management Inconsistency

Main branch lacks pyproject.toml while agent-fight branches have it. Add dependency files to main branch for consistency.

3. Missing Test Coverage

None of the implementations include tests. Recommend adding unit tests for token management, configuration validation, and mocked OAuth flows.

4. Code Quality - Input Validation

Missing validation for OAuth endpoints, redirect URI HTTPS check, and scope format validation.

5. Error Handling - Token Refresh

The auto-refresh mechanism lacks error handling for failures.
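
One option is to catch refresh failures and fall back to a full re-authorization. A minimal sketch with a stand-in exception type (a real OAuth library raises its own error classes, so the except clause would name those instead):

```python
class TokenRefreshError(Exception):
    """Stand-in for the OAuth library's refresh-failure exception."""

def get_valid_token(session, token_url: str, refresh_token: str, reauthorize):
    """Try the refresh grant first; restart the interactive flow if it fails."""
    try:
        return session.refresh_token(token_url, refresh_token=refresh_token)
    except TokenRefreshError:
        # Refresh token revoked or expired: re-run the authorization flow.
        return reauthorize()
```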

6. Documentation Gaps

Missing from main README: customization guide, troubleshooting, security best practices.

Summary

Verdict: ✅ APPROVE - This is valuable documentation of the development process. Code quality is acceptable for a utility script, though there is room for improvement in testing and error handling.

Action Items:

  1. Merge as-is (documentation value outweighs minor issues)
  2. Follow-up: Add tests
  3. Follow-up: Add pyproject.toml to main branch
  4. Follow-up: Enhance error handling
  5. Follow-up: Improve README with security guidance

Great work documenting this experiment!

@justinpecott justinpecott merged commit af30c5f into main Oct 14, 2025
1 check passed
@justinpecott justinpecott deleted the agent-fight branch October 14, 2025 03:35
@claude claude bot mentioned this pull request Oct 15, 2025
5 tasks