-
-
Notifications
You must be signed in to change notification settings - Fork 916
feat: Implement bi-directional GitHub sync for challenges (#4760) #4825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat: Implement bi-directional GitHub sync for challenges (#4760) #4825
Conversation
This implements the bi-directional GitHub sync feature that allows changes made through the EvalAI UI to be automatically synced back to the linked GitHub repository. Closes Cloud-CV#4760
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements bi-directional GitHub synchronization for challenges, enabling changes made through the EvalAI web UI to be automatically pushed back to linked GitHub repositories. Previously, EvalAI only supported one-way sync (GitHub → EvalAI), creating potential inconsistencies between the repository and the running challenge configuration.
Key Changes:
- Adds
github_tokenfield to Challenge model for storing GitHub Personal Access Tokens - Implements Django post_save signals to trigger sync on Challenge and ChallengePhase updates
- Creates GitHub API interface layer with YAML processing for challenge_config.yaml updates
- Implements Celery tasks for async synchronization with loop prevention mechanisms
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 19 comments.
Show a summary per file
| File | Description |
|---|---|
apps/challenges/models.py |
Adds github_token field and post_save signals for Challenge and ChallengePhase to trigger GitHub sync |
apps/challenges/serializers.py |
Exposes github_token in ChallengeSerializer and ZipChallengeSerializer for API access |
apps/challenges/migrations/0114_add_github_token_field.py |
Database migration adding github_token CharField to Challenge model |
apps/challenges/github_interface.py |
New GitHub API client for reading/writing repository files and updating challenge_config.yaml |
apps/challenges/github_utils.py |
Celery tasks and sync triggers with request-scoped context for loop prevention |
apps/challenges/github_sync_config.py |
Configuration defining syncable fields for Challenge and ChallengePhase models |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "cli_version", | ||
| "github_repository", | ||
| "github_branch", | ||
| "github_token", |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The github_token field should be marked as write_only=True in the serializer to prevent it from being exposed in API responses. GitHub Personal Access Tokens are sensitive credentials that should only be accepted on write operations but never returned in GET responses. Consider declaring it explicitly with write_only=True to ensure the token is never leaked through the API.
| @shared_task | ||
| def github_challenge_phase_sync( | ||
| serialized_challenge_phase, serialized_challenge, changed_field | ||
| ): | ||
| """ | ||
| Celery task to sync a ChallengePhase change to GitHub. | ||
| Args: | ||
| serialized_challenge_phase: JSON serialized ChallengePhase object | ||
| serialized_challenge: JSON serialized Challenge object (parent) | ||
| changed_field: The field that was changed | ||
| """ | ||
| try: | ||
| challenge_phase = deserialize_object(serialized_challenge_phase) | ||
| challenge = deserialize_object(serialized_challenge) | ||
|
|
||
| if not challenge_phase or not challenge: | ||
| logger.error( | ||
| "Failed to deserialize challenge_phase or challenge for GitHub sync" | ||
| ) | ||
| return False | ||
|
|
||
| # Verify GitHub config is present on parent challenge | ||
| if not challenge.github_repository or not challenge.github_token: | ||
| logger.debug( | ||
| f"Challenge {challenge.pk} missing github_repository or github_token, " | ||
| "skipping phase sync" | ||
| ) | ||
| return False | ||
|
|
||
| # Create GitHub interface | ||
| github = GithubInterface( | ||
| challenge.github_repository, | ||
| challenge.github_branch, | ||
| challenge.github_token, | ||
| ) | ||
|
|
||
| # Verify repository access | ||
| if not github.is_repository(): | ||
| logger.error( | ||
| f"Cannot access GitHub repository: {challenge.github_repository}" | ||
| ) | ||
| return False | ||
|
|
||
| # Sync the changed field | ||
| success = github.update_challenge_phase_config(challenge_phase, changed_field) | ||
| if success: | ||
| logger.info( | ||
| f"Successfully synced {changed_field} for challenge phase " | ||
| f"{challenge_phase.pk}" | ||
| ) | ||
| else: | ||
| logger.error( | ||
| f"Failed to sync {changed_field} for challenge phase " | ||
| f"{challenge_phase.pk}" | ||
| ) | ||
| return success | ||
|
|
||
| except Exception as e: | ||
| logger.error(f"Error in github_challenge_phase_sync: {str(e)}") | ||
| return False | ||
|
|
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new Celery task github_challenge_phase_sync lacks test coverage. This critical functionality should have comprehensive tests covering serialization, GitHub API interaction, error handling, and the various conditional branches. Tests should be added to tests/unit/challenges/ directory.
| @receiver(signals.post_save, sender="challenges.Challenge") | ||
| def challenge_details_sync(sender, instance, created, **kwargs): | ||
| """ | ||
| Signal to sync Challenge changes to GitHub repository. | ||
| Triggered on post_save when a Challenge is updated. | ||
| """ | ||
| # Skip on create - only sync updates | ||
| if created: | ||
| return | ||
|
|
||
| # Skip if no GitHub config | ||
| if not instance.github_repository or not instance.github_token: | ||
| return | ||
|
|
||
| # Import here to avoid circular imports | ||
| from challenges.github_utils import trigger_challenge_sync | ||
|
|
||
| # Get update_fields from kwargs if available | ||
| update_fields = kwargs.get("update_fields") | ||
| trigger_challenge_sync(instance, update_fields) | ||
|
|
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new signal challenge_details_sync lacks test coverage. Tests should verify that the signal fires correctly on updates (but not creates), properly checks for GitHub configuration, and triggers the sync task with correct parameters. Add tests to tests/unit/challenges/test_models.py.
| return ( | ||
| True | ||
| if self.update_data_from_path(file_path, new_text, changed_field) | ||
| else False | ||
| ) |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conditional expression uses a ternary operator that's harder to read. Consider replacing return (True if self.update_data_from_path(file_path, new_text, changed_field) else False) with a clearer if-else statement or a direct return of a boolean check.
| return ( | |
| True | |
| if self.update_data_from_path(file_path, new_text, changed_field) | |
| else False | |
| ) | |
| return self.update_data_from_path(file_path, new_text, changed_field) |
apps/challenges/github_interface.py
Outdated
| success = self.update_data_from_path( | ||
| "challenge_config.yaml", yaml_content, changed_field | ||
| ) | ||
| return True if success else False |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant boolean conversion: return True if success else False can be simplified to return bool(success) or just return success is not None since the function should return a boolean indicating success or failure.
| return True if success else False | |
| return bool(success) |
| def trigger_challenge_phase_sync(challenge_phase, update_fields=None): | ||
| """ | ||
| Trigger GitHub sync for a ChallengePhase if conditions are met. | ||
| Args: | ||
| challenge_phase: ChallengePhase model instance | ||
| update_fields: List of fields that were updated (from model.save()) | ||
| """ | ||
| # Get parent challenge | ||
| challenge = challenge_phase.challenge | ||
|
|
||
| # Skip if not configured for GitHub sync | ||
| if not challenge.github_repository or not challenge.github_token: | ||
| return | ||
|
|
||
| # Skip if already synced in this request (prevent loops) | ||
| if is_synced("ChallengePhase", challenge_phase.pk): | ||
| logger.debug( | ||
| f"ChallengePhase {challenge_phase.pk} already synced in this request" | ||
| ) | ||
| return | ||
|
|
||
| # Skip if request came from GitHub (prevent loops) | ||
| if is_github_source(): | ||
| logger.debug("Skipping phase sync - request originated from GitHub") | ||
| return | ||
|
|
||
| # Determine which field changed | ||
| changed_field = get_changed_field_from_update_fields( | ||
| update_fields, CHALLENGE_PHASE_SYNC_FIELDS | ||
| ) | ||
|
|
||
| # If not from update_fields, try payload keys | ||
| if not changed_field: | ||
| changed_field = get_changed_field_from_payload( | ||
| get_payload_keys(), CHALLENGE_PHASE_SYNC_FIELDS | ||
| ) | ||
|
|
||
| if not changed_field: | ||
| logger.debug( | ||
| f"Could not determine changed field for challenge phase " | ||
| f"{challenge_phase.pk}" | ||
| ) | ||
| return | ||
|
|
||
| # Mark as synced to prevent re-entry | ||
| mark_synced("ChallengePhase", challenge_phase.pk) | ||
|
|
||
| # Serialize and queue the sync task | ||
| serialized_challenge_phase = serializers.serialize("json", [challenge_phase]) | ||
| serialized_challenge = serializers.serialize("json", [challenge]) | ||
| github_challenge_phase_sync.delay( | ||
| serialized_challenge_phase, serialized_challenge, changed_field | ||
| ) | ||
| logger.info( | ||
| f"Queued GitHub sync for challenge phase {challenge_phase.pk}, " | ||
| f"field: {changed_field}" | ||
| ) |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trigger_challenge_phase_sync function lacks test coverage. Tests should verify all conditional branches, loop prevention, and proper integration with the parent challenge. Add tests to tests/unit/challenges/ directory.
| @receiver(signals.post_save, sender="challenges.ChallengePhase") | ||
| def challenge_phase_details_sync(sender, instance, created, **kwargs): | ||
| """ | ||
| Signal to sync ChallengePhase changes to GitHub repository. | ||
| Triggered on post_save when a ChallengePhase is updated. | ||
| """ | ||
| # Skip on create - only sync updates | ||
| if created: | ||
| return | ||
|
|
||
| # Get parent challenge | ||
| challenge = instance.challenge | ||
|
|
||
| # Skip if no GitHub config on parent challenge | ||
| if not challenge.github_repository or not challenge.github_token: | ||
| return | ||
|
|
||
| # Import here to avoid circular imports | ||
| from challenges.github_utils import trigger_challenge_phase_sync | ||
|
|
||
| # Get update_fields from kwargs if available | ||
| update_fields = kwargs.get("update_fields") | ||
| trigger_challenge_phase_sync(instance, update_fields) | ||
|
|
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new signal challenge_phase_details_sync lacks test coverage. Tests should verify proper signal behavior for ChallengePhase updates, parent challenge configuration checks, and sync task triggering. Add tests to tests/unit/challenges/test_models.py.
| # Thread-local storage for per-request sync context | ||
| _sync_context = threading.local() |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using threading.local() for sync context in a Celery/async environment may not work as intended. Celery tasks run in separate worker processes, not threads, so thread-local storage won't be shared across requests or provide the intended loop prevention. Consider using a cache backend (Redis/Memcached) with short TTL, or a request-scoped middleware pattern for web requests, to properly track sync state across the distributed system.
apps/challenges/github_interface.py
Outdated
| success = self.update_data_from_path( | ||
| "challenge_config.yaml", yaml_content, changed_field | ||
| ) | ||
| return True if success else False |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant boolean conversion: return True if success else False can be simplified to return bool(success) or return success is not None for better code clarity.
| return True if success else False | |
| return bool(success) |
| "sqs_retention_period", | ||
| "github_repository", | ||
| "github_branch", | ||
| "github_token", |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The github_token field should be marked as write_only=True in the serializer to prevent it from being exposed in API responses. GitHub Personal Access Tokens are sensitive credentials that should only be accepted on write operations (POST/PUT/PATCH) but never returned in GET responses. Consider adding: github_token = serializers.CharField(write_only=True, required=False, allow_blank=True)
- Mark github_token as write_only in serializers to prevent API exposure - Fix undefined response variable in make_request exception handler - Simplify ternary operators to use bool() for cleaner code - Remove inconsistent help_text from migration - Add fallback field detection using original value comparison - Add write_only github_token to ZipChallengeSerializer
- Apply black formatting to github_interface.py and github_utils.py - Fix import ordering in github_utils.py per isort requirements
Summary
This PR implements the bi-directional GitHub sync feature as requested in issue #4760 (originally discussed in PR #3565).
Currently, EvalAI only supports one-way sync — changes made in the linked GitHub repository are pulled into EvalAI. However, when challenge organizers make changes through the EvalAI web UI (like updating the description, dates, or evaluation details), those changes are not synced back to the GitHub repository. This creates a mismatch between what's in GitHub and what's actually running on EvalAI.
This PR adds the missing reverse sync capability so that changes made via the EvalAI UI are automatically pushed back to the linked GitHub repository.
What's Changed
New Files Added
apps/challenges/github_interface.pyGithubInterface) for communicating with the GitHub APIchallenge_config.yamlwith proper field processingevalai_bot:for easy identificationapps/challenges/github_sync_config.pyapps/challenges/github_utils.pygithub_challenge_sync,github_challenge_phase_syncapps/challenges/migrations/0114_add_github_token_field.pygithub_tokenfield to the Challenge modelModified Files
apps/challenges/models.pygithub_tokenfield to store the GitHub PAT for each challengepost_savesignals for bothChallengeandChallengePhaseapps/challenges/serializers.pygithub_tokento bothChallengeSerializerandZipChallengeSerializerHow It Works
Challenge organizer configures their challenge with:
github_repository(e.g.,cloudcv/evalai-starter)github_branch(e.g.,master)github_token(GitHub Personal Access Token with repo access)When the organizer updates a challenge field through the UI:
post_savesignal firesThe Celery task:
challenge_config.yamlfrom GitHubevalai_bot: Update challenge_config.yaml - changed field: descriptionLoop Prevention
Several safeguards are in place to prevent infinite sync loops:
evalai_bot:for identificationTesting
To test this feature:
reposcopeRelated Issues
Checklist
github_tokenfield with migrationGithubInterfacefor GitHub API communicationpost_savesignals for Challenge and ChallengePhasegithub_token