Skip to content

Conversation

@balaraj74
Copy link

Summary

This PR implements the bi-directional GitHub sync feature as requested in issue #4760 (originally discussed in PR #3565).

Currently, EvalAI only supports one-way sync — changes made in the linked GitHub repository are pulled into EvalAI. However, when challenge organizers make changes through the EvalAI web UI (like updating the description, dates, or evaluation details), those changes are not synced back to the GitHub repository. This creates a mismatch between what's in GitHub and what's actually running on EvalAI.

This PR adds the missing reverse sync capability so that changes made via the EvalAI UI are automatically pushed back to the linked GitHub repository.


What's Changed

New Files Added

  1. apps/challenges/github_interface.py

    • A clean interface class (GithubInterface) for communicating with the GitHub API
    • Handles reading/writing file content via the GitHub Contents API
    • Supports updating challenge_config.yaml with proper field processing
    • Commits are prefixed with evalai_bot: for easy identification
  2. apps/challenges/github_sync_config.py

    • Configuration file defining which fields should be synced
    • Challenge fields: title, description, start_date, end_date, evaluation_details, terms_and_conditions, submission_guidelines, etc.
    • ChallengePhase fields: name, description, start_date, end_date, max_submissions, etc.
  3. apps/challenges/github_utils.py

    • Celery tasks for async sync: github_challenge_sync, github_challenge_phase_sync
    • Request-scoped context management to prevent infinite loops
    • Helper functions to determine which field changed
  4. apps/challenges/migrations/0114_add_github_token_field.py

    • Adds github_token field to the Challenge model
    • This token is used to authenticate GitHub API requests for the reverse sync

Modified Files

  1. apps/challenges/models.py

    • Added github_token field to store the GitHub PAT for each challenge
    • Added post_save signals for both Challenge and ChallengePhase
    • Signals trigger the sync tasks when models are updated
  2. apps/challenges/serializers.py

    • Added github_token to both ChallengeSerializer and ZipChallengeSerializer

How It Works

  1. Challenge organizer configures their challenge with:

    • github_repository (e.g., cloudcv/evalai-starter)
    • github_branch (e.g., master)
    • github_token (GitHub Personal Access Token with repo access)
  2. When the organizer updates a challenge field through the UI:

    • Django's post_save signal fires
    • The signal checks if GitHub sync is configured
    • If yes, it queues a Celery task to sync the change
  3. The Celery task:

    • Fetches the current challenge_config.yaml from GitHub
    • Updates only the changed field
    • Commits the change with a descriptive message like:
      evalai_bot: Update challenge_config.yaml - changed field: description

Loop Prevention

Several safeguards are in place to prevent infinite sync loops:

  • Skip on create: Only updates trigger sync, not initial creation
  • Per-request deduplication: Each model instance is only synced once per request
  • GitHub source detection: Changes originating from GitHub are not synced back
  • Bot commit prefix: Commits from EvalAI are prefixed with evalai_bot: for identification

Testing

To test this feature:

  1. Create a challenge with a linked GitHub repository
  2. Add a GitHub Personal Access Token with repo scope
  3. Update a challenge field (e.g., description) via the UI
  4. Check the GitHub repository — you should see a new commit with the change

Related Issues


Checklist

  • Added new github_token field with migration
  • Implemented GithubInterface for GitHub API communication
  • Added Celery tasks for async sync
  • Added post_save signals for Challenge and ChallengePhase
  • Updated serializers to include github_token
  • Added loop prevention mechanisms

This implements the bi-directional GitHub sync feature that allows
changes made through the EvalAI UI to be automatically synced back
to the linked GitHub repository.

Closes Cloud-CV#4760
Copilot AI review requested due to automatic review settings December 3, 2025 06:14
Copilot finished reviewing on behalf of balaraj74 December 3, 2025 06:19
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements bi-directional GitHub synchronization for challenges, enabling changes made through the EvalAI web UI to be automatically pushed back to linked GitHub repositories. Previously, EvalAI only supported one-way sync (GitHub → EvalAI), creating potential inconsistencies between the repository and the running challenge configuration.

Key Changes:

  • Adds github_token field to Challenge model for storing GitHub Personal Access Tokens
  • Implements Django post_save signals to trigger sync on Challenge and ChallengePhase updates
  • Creates GitHub API interface layer with YAML processing for challenge_config.yaml updates
  • Implements Celery tasks for async synchronization with loop prevention mechanisms

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 19 comments.

Show a summary per file
File Description
apps/challenges/models.py Adds github_token field and post_save signals for Challenge and ChallengePhase to trigger GitHub sync
apps/challenges/serializers.py Exposes github_token in ChallengeSerializer and ZipChallengeSerializer for API access
apps/challenges/migrations/0114_add_github_token_field.py Database migration adding github_token CharField to Challenge model
apps/challenges/github_interface.py New GitHub API client for reading/writing repository files and updating challenge_config.yaml
apps/challenges/github_utils.py Celery tasks and sync triggers with request-scoped context for loop prevention
apps/challenges/github_sync_config.py Configuration defining syncable fields for Challenge and ChallengePhase models

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

"cli_version",
"github_repository",
"github_branch",
"github_token",
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The github_token field should be marked as write_only=True in the serializer to prevent it from being exposed in API responses. GitHub Personal Access Tokens are sensitive credentials that should only be accepted on write operations but never returned in GET responses. Consider declaring it explicitly with write_only=True to ensure the token is never leaked through the API.

Copilot uses AI. Check for mistakes.
Comment on lines 165 to 226
@shared_task
def github_challenge_phase_sync(
serialized_challenge_phase, serialized_challenge, changed_field
):
"""
Celery task to sync a ChallengePhase change to GitHub.
Args:
serialized_challenge_phase: JSON serialized ChallengePhase object
serialized_challenge: JSON serialized Challenge object (parent)
changed_field: The field that was changed
"""
try:
challenge_phase = deserialize_object(serialized_challenge_phase)
challenge = deserialize_object(serialized_challenge)

if not challenge_phase or not challenge:
logger.error(
"Failed to deserialize challenge_phase or challenge for GitHub sync"
)
return False

# Verify GitHub config is present on parent challenge
if not challenge.github_repository or not challenge.github_token:
logger.debug(
f"Challenge {challenge.pk} missing github_repository or github_token, "
"skipping phase sync"
)
return False

# Create GitHub interface
github = GithubInterface(
challenge.github_repository,
challenge.github_branch,
challenge.github_token,
)

# Verify repository access
if not github.is_repository():
logger.error(
f"Cannot access GitHub repository: {challenge.github_repository}"
)
return False

# Sync the changed field
success = github.update_challenge_phase_config(challenge_phase, changed_field)
if success:
logger.info(
f"Successfully synced {changed_field} for challenge phase "
f"{challenge_phase.pk}"
)
else:
logger.error(
f"Failed to sync {changed_field} for challenge phase "
f"{challenge_phase.pk}"
)
return success

except Exception as e:
logger.error(f"Error in github_challenge_phase_sync: {str(e)}")
return False

Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new Celery task github_challenge_phase_sync lacks test coverage. This critical functionality should have comprehensive tests covering serialization, GitHub API interaction, error handling, and the various conditional branches. Tests should be added to tests/unit/challenges/ directory.

Copilot uses AI. Check for mistakes.
Comment on lines +322 to +342
@receiver(signals.post_save, sender="challenges.Challenge")
def challenge_details_sync(sender, instance, created, **kwargs):
"""
Signal to sync Challenge changes to GitHub repository.
Triggered on post_save when a Challenge is updated.
"""
# Skip on create - only sync updates
if created:
return

# Skip if no GitHub config
if not instance.github_repository or not instance.github_token:
return

# Import here to avoid circular imports
from challenges.github_utils import trigger_challenge_sync

# Get update_fields from kwargs if available
update_fields = kwargs.get("update_fields")
trigger_challenge_sync(instance, update_fields)

Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new signal challenge_details_sync lacks test coverage. Tests should verify that the signal fires correctly on updates (but not creates), properly checks for GitHub configuration, and triggers the sync task with correct parameters. Add tests to tests/unit/challenges/test_models.py.

Copilot uses AI. Check for mistakes.
Comment on lines 358 to 362
return (
True
if self.update_data_from_path(file_path, new_text, changed_field)
else False
)
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conditional expression uses a ternary operator that's harder to read. Consider replacing return (True if self.update_data_from_path(file_path, new_text, changed_field) else False) with a clearer if-else statement or a direct return of a boolean check.

Suggested change
return (
True
if self.update_data_from_path(file_path, new_text, changed_field)
else False
)
return self.update_data_from_path(file_path, new_text, changed_field)

Copilot uses AI. Check for mistakes.
success = self.update_data_from_path(
"challenge_config.yaml", yaml_content, changed_field
)
return True if success else False
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant boolean conversion: return True if success else False can be simplified to return bool(success) or just return success is not None since the function should return a boolean indicating success or failure.

Suggested change
return True if success else False
return bool(success)

Copilot uses AI. Check for mistakes.
Comment on lines 278 to 335
def trigger_challenge_phase_sync(challenge_phase, update_fields=None):
"""
Trigger GitHub sync for a ChallengePhase if conditions are met.
Args:
challenge_phase: ChallengePhase model instance
update_fields: List of fields that were updated (from model.save())
"""
# Get parent challenge
challenge = challenge_phase.challenge

# Skip if not configured for GitHub sync
if not challenge.github_repository or not challenge.github_token:
return

# Skip if already synced in this request (prevent loops)
if is_synced("ChallengePhase", challenge_phase.pk):
logger.debug(
f"ChallengePhase {challenge_phase.pk} already synced in this request"
)
return

# Skip if request came from GitHub (prevent loops)
if is_github_source():
logger.debug("Skipping phase sync - request originated from GitHub")
return

# Determine which field changed
changed_field = get_changed_field_from_update_fields(
update_fields, CHALLENGE_PHASE_SYNC_FIELDS
)

# If not from update_fields, try payload keys
if not changed_field:
changed_field = get_changed_field_from_payload(
get_payload_keys(), CHALLENGE_PHASE_SYNC_FIELDS
)

if not changed_field:
logger.debug(
f"Could not determine changed field for challenge phase "
f"{challenge_phase.pk}"
)
return

# Mark as synced to prevent re-entry
mark_synced("ChallengePhase", challenge_phase.pk)

# Serialize and queue the sync task
serialized_challenge_phase = serializers.serialize("json", [challenge_phase])
serialized_challenge = serializers.serialize("json", [challenge])
github_challenge_phase_sync.delay(
serialized_challenge_phase, serialized_challenge, changed_field
)
logger.info(
f"Queued GitHub sync for challenge phase {challenge_phase.pk}, "
f"field: {changed_field}"
)
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trigger_challenge_phase_sync function lacks test coverage. Tests should verify all conditional branches, loop prevention, and proper integration with the parent challenge. Add tests to tests/unit/challenges/ directory.

Copilot uses AI. Check for mistakes.
Comment on lines +486 to +509
@receiver(signals.post_save, sender="challenges.ChallengePhase")
def challenge_phase_details_sync(sender, instance, created, **kwargs):
"""
Signal to sync ChallengePhase changes to GitHub repository.
Triggered on post_save when a ChallengePhase is updated.
"""
# Skip on create - only sync updates
if created:
return

# Get parent challenge
challenge = instance.challenge

# Skip if no GitHub config on parent challenge
if not challenge.github_repository or not challenge.github_token:
return

# Import here to avoid circular imports
from challenges.github_utils import trigger_challenge_phase_sync

# Get update_fields from kwargs if available
update_fields = kwargs.get("update_fields")
trigger_challenge_phase_sync(instance, update_fields)

Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new signal challenge_phase_details_sync lacks test coverage. Tests should verify proper signal behavior for ChallengePhase updates, parent challenge configuration checks, and sync task triggering. Add tests to tests/unit/challenges/test_models.py.

Copilot uses AI. Check for mistakes.
Comment on lines +17 to +18
# Thread-local storage for per-request sync context
_sync_context = threading.local()
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using threading.local() for sync context in a Celery/async environment may not work as intended. Celery tasks run in separate worker processes, not threads, so thread-local storage won't be shared across requests or provide the intended loop prevention. Consider using a cache backend (Redis/Memcached) with short TTL, or a request-scoped middleware pattern for web requests, to properly track sync state across the distributed system.

Copilot uses AI. Check for mistakes.
success = self.update_data_from_path(
"challenge_config.yaml", yaml_content, changed_field
)
return True if success else False
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant boolean conversion: return True if success else False can be simplified to return bool(success) or return success is not None for better code clarity.

Suggested change
return True if success else False
return bool(success)

Copilot uses AI. Check for mistakes.
"sqs_retention_period",
"github_repository",
"github_branch",
"github_token",
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The github_token field should be marked as write_only=True in the serializer to prevent it from being exposed in API responses. GitHub Personal Access Tokens are sensitive credentials that should only be accepted on write operations (POST/PUT/PATCH) but never returned in GET responses. Consider adding: github_token = serializers.CharField(write_only=True, required=False, allow_blank=True)

Copilot uses AI. Check for mistakes.
- Mark github_token as write_only in serializers to prevent API exposure
- Fix undefined response variable in make_request exception handler
- Simplify ternary operators to use bool() for cleaner code
- Remove inconsistent help_text from migration
- Add fallback field detection using original value comparison
- Add write_only github_token to ZipChallengeSerializer
- Apply black formatting to github_interface.py and github_utils.py
- Fix import ordering in github_utils.py per isort requirements
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rework and clarify bi‑directional GitHub sync feature (#3565)

1 participant