feat: Implement bi-directional GitHub sync for challenges (#4760) #4825

balaraj74 · 2025-12-03T06:14:53Z

Summary

This PR implements the bi-directional GitHub sync feature as requested in issue #4760 (originally discussed in PR #3565).

Currently, EvalAI only supports one-way sync — changes made in the linked GitHub repository are pulled into EvalAI. However, when challenge organizers make changes through the EvalAI web UI (like updating the description, dates, or evaluation details), those changes are not synced back to the GitHub repository. This creates a mismatch between what's in GitHub and what's actually running on EvalAI.

This PR adds the missing reverse sync capability so that changes made via the EvalAI UI are automatically pushed back to the linked GitHub repository.

What's Changed

New Files Added

apps/challenges/github_interface.py
- A clean interface class (GithubInterface) for communicating with the GitHub API
- Handles reading/writing file content via the GitHub Contents API
- Supports updating challenge_config.yaml with proper field processing
- Commits are prefixed with evalai_bot: for easy identification
apps/challenges/github_sync_config.py
- Configuration file defining which fields should be synced
- Challenge fields: title, description, start_date, end_date, evaluation_details, terms_and_conditions, submission_guidelines, etc.
- ChallengePhase fields: name, description, start_date, end_date, max_submissions, etc.
apps/challenges/github_utils.py
- Celery tasks for async sync: github_challenge_sync, github_challenge_phase_sync
- Request-scoped context management to prevent infinite loops
- Helper functions to determine which field changed
apps/challenges/migrations/0114_add_github_token_field.py
- Adds github_token field to the Challenge model
- This token is used to authenticate GitHub API requests for the reverse sync

Modified Files

apps/challenges/models.py
- Added github_token field to store the GitHub PAT for each challenge
- Added post_save signals for both Challenge and ChallengePhase
- Signals trigger the sync tasks when models are updated
apps/challenges/serializers.py
- Added github_token to both ChallengeSerializer and ZipChallengeSerializer

How It Works

Challenge organizer configures their challenge with:
- github_repository (e.g., cloudcv/evalai-starter)
- github_branch (e.g., master)
- github_token (GitHub Personal Access Token with repo access)
When the organizer updates a challenge field through the UI:
- Django's post_save signal fires
- The signal checks if GitHub sync is configured
- If yes, it queues a Celery task to sync the change
The Celery task:
- Fetches the current challenge_config.yaml from GitHub
- Updates only the changed field
- Commits the change with a descriptive message like:
  evalai_bot: Update challenge_config.yaml - changed field: description

Loop Prevention

Several safeguards are in place to prevent infinite sync loops:

Skip on create: Only updates trigger sync, not initial creation
Per-request deduplication: Each model instance is only synced once per request
GitHub source detection: Changes originating from GitHub are not synced back
Bot commit prefix: Commits from EvalAI are prefixed with evalai_bot: for identification

Testing

To test this feature:

Create a challenge with a linked GitHub repository
Add a GitHub Personal Access Token with repo scope
Update a challenge field (e.g., description) via the UI
Check the GitHub repository — you should see a new commit with the change

Related Issues

Closes Rework and clarify bi‑directional GitHub sync feature (#3565) #4760
Implements functionality originally proposed in PR GitHub: Add bi-directional sync #3565

Checklist

Added new github_token field with migration
Implemented GithubInterface for GitHub API communication
Added Celery tasks for async sync
Added post_save signals for Challenge and ChallengePhase
Updated serializers to include github_token
Added loop prevention mechanisms

This implements the bi-directional GitHub sync feature that allows changes made through the EvalAI UI to be automatically synced back to the linked GitHub repository. Closes Cloud-CV#4760

Copilot

Pull request overview

This PR implements bi-directional GitHub synchronization for challenges, enabling changes made through the EvalAI web UI to be automatically pushed back to linked GitHub repositories. Previously, EvalAI only supported one-way sync (GitHub → EvalAI), creating potential inconsistencies between the repository and the running challenge configuration.

Key Changes:

Adds github_token field to Challenge model for storing GitHub Personal Access Tokens
Implements Django post_save signals to trigger sync on Challenge and ChallengePhase updates
Creates GitHub API interface layer with YAML processing for challenge_config.yaml updates
Implements Celery tasks for async synchronization with loop prevention mechanisms

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 19 comments.

Show a summary per file

File	Description
`apps/challenges/models.py`	Adds `github_token` field and post_save signals for Challenge and ChallengePhase to trigger GitHub sync
`apps/challenges/serializers.py`	Exposes `github_token` in ChallengeSerializer and ZipChallengeSerializer for API access
`apps/challenges/migrations/0114_add_github_token_field.py`	Database migration adding github_token CharField to Challenge model
`apps/challenges/github_interface.py`	New GitHub API client for reading/writing repository files and updating challenge_config.yaml
`apps/challenges/github_utils.py`	Celery tasks and sync triggers with request-scoped context for loop prevention
`apps/challenges/github_sync_config.py`	Configuration defining syncable fields for Challenge and ChallengePhase models

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-03T06:21:40Z

apps/challenges/serializers.py

            "cli_version",
            "github_repository",
            "github_branch",
+            "github_token",


The github_token field should be marked as write_only=True in the serializer to prevent it from being exposed in API responses. GitHub Personal Access Tokens are sensitive credentials that should only be accepted on write operations but never returned in GET responses. Consider declaring it explicitly with write_only=True to ensure the token is never leaked through the API.

Copilot · 2025-12-03T06:21:41Z

apps/challenges/github_utils.py

+@shared_task
+def github_challenge_phase_sync(
+    serialized_challenge_phase, serialized_challenge, changed_field
+):
+    """
+    Celery task to sync a ChallengePhase change to GitHub.
+
+    Args:
+        serialized_challenge_phase: JSON serialized ChallengePhase object
+        serialized_challenge: JSON serialized Challenge object (parent)
+        changed_field: The field that was changed
+    """
+    try:
+        challenge_phase = deserialize_object(serialized_challenge_phase)
+        challenge = deserialize_object(serialized_challenge)
+
+        if not challenge_phase or not challenge:
+            logger.error(
+                "Failed to deserialize challenge_phase or challenge for GitHub sync"
+            )
+            return False
+
+        # Verify GitHub config is present on parent challenge
+        if not challenge.github_repository or not challenge.github_token:
+            logger.debug(
+                f"Challenge {challenge.pk} missing github_repository or github_token, "
+                "skipping phase sync"
+            )
+            return False
+
+        # Create GitHub interface
+        github = GithubInterface(
+            challenge.github_repository,
+            challenge.github_branch,
+            challenge.github_token,
+        )
+
+        # Verify repository access
+        if not github.is_repository():
+            logger.error(
+                f"Cannot access GitHub repository: {challenge.github_repository}"
+            )
+            return False
+
+        # Sync the changed field
+        success = github.update_challenge_phase_config(challenge_phase, changed_field)
+        if success:
+            logger.info(
+                f"Successfully synced {changed_field} for challenge phase "
+                f"{challenge_phase.pk}"
+            )
+        else:
+            logger.error(
+                f"Failed to sync {changed_field} for challenge phase "
+                f"{challenge_phase.pk}"
+            )
+        return success
+
+    except Exception as e:
+        logger.error(f"Error in github_challenge_phase_sync: {str(e)}")
+        return False
+


The new Celery task github_challenge_phase_sync lacks test coverage. This critical functionality should have comprehensive tests covering serialization, GitHub API interaction, error handling, and the various conditional branches. Tests should be added to tests/unit/challenges/ directory.

Copilot · 2025-12-03T06:21:41Z

apps/challenges/models.py

+@receiver(signals.post_save, sender="challenges.Challenge")
+def challenge_details_sync(sender, instance, created, **kwargs):
+    """
+    Signal to sync Challenge changes to GitHub repository.
+    Triggered on post_save when a Challenge is updated.
+    """
+    # Skip on create - only sync updates
+    if created:
+        return
+
+    # Skip if no GitHub config
+    if not instance.github_repository or not instance.github_token:
+        return
+
+    # Import here to avoid circular imports
+    from challenges.github_utils import trigger_challenge_sync
+
+    # Get update_fields from kwargs if available
+    update_fields = kwargs.get("update_fields")
+    trigger_challenge_sync(instance, update_fields)
+


The new signal challenge_details_sync lacks test coverage. Tests should verify that the signal fires correctly on updates (but not creates), properly checks for GitHub configuration, and triggers the sync task with correct parameters. Add tests to tests/unit/challenges/test_models.py.

Copilot · 2025-12-03T06:21:41Z

apps/challenges/github_interface.py

+                return (
+                    True
+                    if self.update_data_from_path(file_path, new_text, changed_field)
+                    else False
+                )


The conditional expression uses a ternary operator that's harder to read. Consider replacing return (True if self.update_data_from_path(file_path, new_text, changed_field) else False) with a clearer if-else statement or a direct return of a boolean check.

Suggested change

return (

True

if self.update_data_from_path(file_path, new_text, changed_field)

else False

)

return self.update_data_from_path(file_path, new_text, changed_field)

Copilot · 2025-12-03T06:21:42Z

apps/challenges/github_interface.py

+            success = self.update_data_from_path(
+                "challenge_config.yaml", yaml_content, changed_field
+            )
+            return True if success else False


Redundant boolean conversion: return True if success else False can be simplified to return bool(success) or just return success is not None since the function should return a boolean indicating success or failure.

Suggested change

return True if success else False

return bool(success)

Copilot · 2025-12-03T06:21:45Z

apps/challenges/github_utils.py

+def trigger_challenge_phase_sync(challenge_phase, update_fields=None):
+    """
+    Trigger GitHub sync for a ChallengePhase if conditions are met.
+
+    Args:
+        challenge_phase: ChallengePhase model instance
+        update_fields: List of fields that were updated (from model.save())
+    """
+    # Get parent challenge
+    challenge = challenge_phase.challenge
+
+    # Skip if not configured for GitHub sync
+    if not challenge.github_repository or not challenge.github_token:
+        return
+
+    # Skip if already synced in this request (prevent loops)
+    if is_synced("ChallengePhase", challenge_phase.pk):
+        logger.debug(
+            f"ChallengePhase {challenge_phase.pk} already synced in this request"
+        )
+        return
+
+    # Skip if request came from GitHub (prevent loops)
+    if is_github_source():
+        logger.debug("Skipping phase sync - request originated from GitHub")
+        return
+
+    # Determine which field changed
+    changed_field = get_changed_field_from_update_fields(
+        update_fields, CHALLENGE_PHASE_SYNC_FIELDS
+    )
+
+    # If not from update_fields, try payload keys
+    if not changed_field:
+        changed_field = get_changed_field_from_payload(
+            get_payload_keys(), CHALLENGE_PHASE_SYNC_FIELDS
+        )
+
+    if not changed_field:
+        logger.debug(
+            f"Could not determine changed field for challenge phase "
+            f"{challenge_phase.pk}"
+        )
+        return
+
+    # Mark as synced to prevent re-entry
+    mark_synced("ChallengePhase", challenge_phase.pk)
+
+    # Serialize and queue the sync task
+    serialized_challenge_phase = serializers.serialize("json", [challenge_phase])
+    serialized_challenge = serializers.serialize("json", [challenge])
+    github_challenge_phase_sync.delay(
+        serialized_challenge_phase, serialized_challenge, changed_field
+    )
+    logger.info(
+        f"Queued GitHub sync for challenge phase {challenge_phase.pk}, "
+        f"field: {changed_field}"
+    )


The trigger_challenge_phase_sync function lacks test coverage. Tests should verify all conditional branches, loop prevention, and proper integration with the parent challenge. Add tests to tests/unit/challenges/ directory.

Copilot · 2025-12-03T06:21:45Z

apps/challenges/models.py

+@receiver(signals.post_save, sender="challenges.ChallengePhase")
+def challenge_phase_details_sync(sender, instance, created, **kwargs):
+    """
+    Signal to sync ChallengePhase changes to GitHub repository.
+    Triggered on post_save when a ChallengePhase is updated.
+    """
+    # Skip on create - only sync updates
+    if created:
+        return
+
+    # Get parent challenge
+    challenge = instance.challenge
+
+    # Skip if no GitHub config on parent challenge
+    if not challenge.github_repository or not challenge.github_token:
+        return
+
+    # Import here to avoid circular imports
+    from challenges.github_utils import trigger_challenge_phase_sync
+
+    # Get update_fields from kwargs if available
+    update_fields = kwargs.get("update_fields")
+    trigger_challenge_phase_sync(instance, update_fields)
+


The new signal challenge_phase_details_sync lacks test coverage. Tests should verify proper signal behavior for ChallengePhase updates, parent challenge configuration checks, and sync task triggering. Add tests to tests/unit/challenges/test_models.py.

Copilot · 2025-12-03T06:21:45Z

apps/challenges/github_utils.py

+# Thread-local storage for per-request sync context
+_sync_context = threading.local()


Using threading.local() for sync context in a Celery/async environment may not work as intended. Celery tasks run in separate worker processes, not threads, so thread-local storage won't be shared across requests or provide the intended loop prevention. Consider using a cache backend (Redis/Memcached) with short TTL, or a request-scoped middleware pattern for web requests, to properly track sync state across the distributed system.

Copilot · 2025-12-03T06:21:45Z

apps/challenges/github_interface.py

+            success = self.update_data_from_path(
+                "challenge_config.yaml", yaml_content, changed_field
+            )
+            return True if success else False


Redundant boolean conversion: return True if success else False can be simplified to return bool(success) or return success is not None for better code clarity.

Suggested change

return True if success else False

return bool(success)

Copilot · 2025-12-03T06:21:46Z

apps/challenges/serializers.py

            "sqs_retention_period",
            "github_repository",
            "github_branch",
+            "github_token",


The github_token field should be marked as write_only=True in the serializer to prevent it from being exposed in API responses. GitHub Personal Access Tokens are sensitive credentials that should only be accepted on write operations (POST/PUT/PATCH) but never returned in GET responses. Consider adding: github_token = serializers.CharField(write_only=True, required=False, allow_blank=True)

- Mark github_token as write_only in serializers to prevent API exposure - Fix undefined response variable in make_request exception handler - Simplify ternary operators to use bool() for cleaner code - Remove inconsistent help_text from migration - Add fallback field detection using original value comparison - Add write_only github_token to ZipChallengeSerializer

- Apply black formatting to github_interface.py and github_utils.py - Fix import ordering in github_utils.py per isort requirements

feat: Implement bi-directional GitHub sync for challenges

cbb69a6

This implements the bi-directional GitHub sync feature that allows changes made through the EvalAI UI to be automatically synced back to the linked GitHub repository. Closes Cloud-CV#4760

Copilot AI review requested due to automatic review settings December 3, 2025 06:14

Copilot started reviewing on behalf of balaraj74 December 3, 2025 06:15 View session

Copilot finished reviewing on behalf of balaraj74 December 3, 2025 06:19

Copilot AI reviewed Dec 3, 2025

View reviewed changes

balaraj74 added 2 commits December 3, 2025 17:21

style: Fix black and isort formatting issues

0a30fc6

- Apply black formatting to github_interface.py and github_utils.py - Fix import ordering in github_utils.py per isort requirements

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

feat: Implement bi-directional GitHub sync for challenges (#4760) #4825

feat: Implement bi-directional GitHub sync for challenges (#4760) #4825

balaraj74 commented Dec 3, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 3, 2025

Uh oh!

Copilot AI Dec 3, 2025

Uh oh!

Copilot AI Dec 3, 2025

Uh oh!

Copilot AI Dec 3, 2025

Uh oh!

Copilot AI Dec 3, 2025

Uh oh!

Copilot AI Dec 3, 2025

Uh oh!

Copilot AI Dec 3, 2025

Uh oh!

Copilot AI Dec 3, 2025

Uh oh!

Copilot AI Dec 3, 2025

Uh oh!

Copilot AI Dec 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		# Thread-local storage for per-request sync context
		_sync_context = threading.local()

Uh oh!

feat: Implement bi-directional GitHub sync for challenges (#4760) #4825

Are you sure you want to change the base?

feat: Implement bi-directional GitHub sync for challenges (#4760) #4825

Conversation

balaraj74 commented Dec 3, 2025

Summary

What's Changed

New Files Added

Modified Files

How It Works

Loop Prevention

Testing

Related Issues

Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant