-
Notifications
You must be signed in to change notification settings - Fork 0
Fix hardcoded collection intervals in multiple spots #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Fix for GitHub Issue chaoss#3194 where releases collection was failing with psycopg2.errors.UniqueViolation on releases_pkey constraint. Root cause: GitHub API returns release IDs with trailing spaces that don't match existing trimmed database records. Changes: - Add str().strip() to release_id processing in get_release_inf() - Enhance duplicate detection in insert_release() with proper trimming - Add early duplicate detection to prevent unnecessary database operations Signed-off-by: mohsinm-dev <mohsin.mdev@gmail.com>
deduplicate convert_type_of_value
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
Signed-off-by: Ulincsys <ulincsys@gmail.com>
The keys on this dictionary are defined in: analyzecommit.generate_commit_record() - Update reference to use proper 'cmt_author_timestamp' key - Add warning log when replacing TZdata to show commit hash Signed-off-by: Ulincsys <ulincsys@gmail.com>
Facade timezone fix redux
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
Add dockerfile for performing an empty database build
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
…roperly convert things into a chord Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Sajal-Kulshreshtha <sajalkulshreshtha9@gmail.com>
Signed-off-by: Sajal-Kulshreshtha <sajalkulshreshtha9@gmail.com>
Signed-off-by: Sajal-Kulshreshtha <sajalkulshreshtha9@gmail.com>
Signed-off-by: Sajal-Kulshreshtha <sajalkulshreshtha9@gmail.com>
facade: refactor facade_phase to return the group itself so that celery can properly convert things into a chord
fix(cli): display timezone in output for GitHub key expiry (chaoss#3251)
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
…te-code scheduling: refactor send_messages to de-duplicate code path
Signed-off-by: Sean P. Goggins <s@goggins.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
main: bump celery and a related dependency - attempt to fix secondary+core
commented out abused API endpoint
Signed-off-by: PredictiveManish <manisht0914@gmail.com>
Signed-off-by: PredictiveManish <manisht0914@gmail.com>
Signed-off-by: PredictiveManish <manisht0914@gmail.com>
WalkthroughAdded configurable per-task collection intervals and runtime config loading from augur.json; removed legacy contributor and pull request report routes and related view/report utilities; added a repo clones metric endpoint; moved/added config type conversion utility; introduced an Alembic migration and multiple Docker/versioning and documentation updates. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Monitor as augur_collection_monitor
participant Config as AugurConfig
participant Builders as CollectionRequestBuilders
Note over Config: Tasks section contains\ncore, secondary, facade, ml\ncollection interval keys
Monitor->>Config: get_section("Tasks")
Config-->>Monitor: tasks_config (interval values)
Monitor->>Builders: build_primary_repo_collect_request(days_until_collect_again=core_collection_interval_days)
Monitor->>Builders: build_secondary_repo_collect_request(days_until_collect_again=secondary_collection_interval_days)
Monitor->>Builders: build_facade_repo_collect_request(days_until_collect_again=facade_collection_interval_days)
Monitor->>Builders: build_ml_repo_collect_request(days_until_collect_again=ml_collection_interval_days)
Note right of Builders: Builders receive configurable\nper-task intervals instead of hard-coded literals
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
augur/application/config.py(1 hunks)augur/tasks/github/releases/core.py(3 hunks)augur/tasks/start_tasks.py(5 hunks)augur/tasks/util/collection_util.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
augur/tasks/util/collection_util.py (2)
augur/application/config.py (1)
get_section(159-182)augur/application/db/lib.py (1)
get_section(52-77)
augur/tasks/start_tasks.py (1)
augur/tasks/util/collection_util.py (3)
get_collection_intervals_from_config(216-233)CollectionRequest(116-151)get_valid_repos(132-151)
augur/tasks/github/releases/core.py (1)
augur/application/db/models/augur_data.py (1)
Release(1832-1860)
🪛 Ruff (0.14.1)
augur/tasks/util/collection_util.py
226-226: Do not catch blind exception: Exception
(BLE001)
augur/tasks/start_tasks.py
159-159: Unused function argument: logger
(ARG001)
179-179: get_collection_intervals_from_config may be undefined, or defined from star imports
(F405)
180-180: CollectionRequest may be undefined, or defined from star imports
(F405)
184-184: Unused function argument: logger
(ARG001)
199-199: get_collection_intervals_from_config may be undefined, or defined from star imports
(F405)
200-200: CollectionRequest may be undefined, or defined from star imports
(F405)
206-206: Unused function argument: logger
(ARG001)
206-206: Unused function argument: enabled_phase_names
(ARG001)
223-223: get_collection_intervals_from_config may be undefined, or defined from star imports
(F405)
224-224: CollectionRequest may be undefined, or defined from star imports
(F405)
229-229: Unused function argument: logger
(ARG001)
229-229: Unused function argument: enabled_phase_names
(ARG001)
240-240: get_collection_intervals_from_config may be undefined, or defined from star imports
(F405)
241-241: CollectionRequest may be undefined, or defined from star imports
(F405)
augur/tasks/github/releases/core.py
72-72: Release may be undefined, or defined from star imports
(F405)
72-72: Release may be undefined, or defined from star imports
(F405)
🔇 Additional comments (8)
augur/application/config.py (1)
81-85: LGTM! Configuration keys properly defined.The new collection interval keys are well-structured and align with the values used in the fallback logic of
get_collection_intervals_from_config(). The existingcollection_intervalat line 81 remains for backward compatibility.augur/tasks/github/releases/core.py (2)
26-26: LGTM! Release ID normalization is appropriate.Converting release_id to string and stripping whitespace ensures consistent formatting for both regular releases and tag-only releases.
Also applies to: 54-54
81-84: Duplicate suppression logic is sound.The early return prevents inserting duplicate releases, assuming the
existing_release_idsset is populated correctly (contingent on the verification above).augur/tasks/util/collection_util.py (1)
216-233: LGTM! Robust config retrieval with appropriate fallback.The function properly retrieves collection intervals from configuration with per-key defaults and a fallback mechanism. The broad exception catch on line 226 is appropriate here to ensure graceful degradation if the config section is unavailable.
augur/tasks/start_tasks.py (4)
159-182: LGTM! Config-driven intervals properly integrated.The function signature update and interval retrieval from config are implemented correctly. The
intervals['core']value is appropriately passed to the CollectionRequest constructor.
184-203: LGTM! Secondary collection intervals properly configured.Consistent implementation with the primary collection request. The
intervals['secondary']value is correctly applied.
206-227: LGTM! Facade collection intervals properly configured.The facade collection request correctly uses
intervals['facade']from the configuration.
229-243: LGTM! ML collection intervals properly configured.The machine learning collection request correctly uses
intervals['ml']from the configuration, completing the consistent refactoring across all collection types.
Signed-off-by: PredictiveManish <manisht0914@gmail.com>
|
Just as a heads up: you filed this Pull request against your fork of the augur repository, not the actual upstream augur repository (notice how this is pull request number 1 when the issue it is solving is chaoss#3000-ish?). when making your pull request, make sure the "base branch" (usually on the left in the selection menu) is set to the chaoss/augur repository's |
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
…format Fixing the API key Visibility issue in debug mode
Co-authored-by: Adrian Edwards <17362949+MoralCode@users.noreply.github.com> Signed-off-by: Manish Tiwari <manish.tiwari.09@zohomail.in>
testing docker config for empty database
Fix: python3.9 compatibility message in docs chaoss#3266
Alternative mypy setup
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
f829663 to
8201ed8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
augur/tasks/start_tasks.py (4)
159-159: Remove the unuseddays_until_collect_againparameter.The PR description states that "unused days_until_collect_again parameters from build functions" were removed, but this parameter is still present in the function signature. Since the collection interval is now sourced from configuration (lines 178-181), this parameter is no longer used and should be removed to avoid confusion.
Apply this diff:
-def build_primary_repo_collect_request(session, logger, enabled_phase_names, days_until_collect_again = 15): +def build_primary_repo_collect_request(session, logger, enabled_phase_names):
185-185: Remove the unuseddays_until_collect_againparameter.Similar to the primary collection function, this parameter is no longer used since the interval is now config-driven (lines 200-203). Additionally, note that the parameter's default value (1) doesn't match the config fallback default (10), which could have caused confusion.
Apply this diff:
-def build_secondary_repo_collect_request(session, logger, enabled_phase_names, days_until_collect_again = 1): +def build_secondary_repo_collect_request(session, logger, enabled_phase_names):
209-209: Remove the unuseddays_until_collect_againparameter.This parameter is unused since the interval is now retrieved from configuration (lines 225-228).
Apply this diff:
-def build_facade_repo_collect_request(session, logger, enabled_phase_names, days_until_collect_again = 10): +def build_facade_repo_collect_request(session, logger, enabled_phase_names):
233-233: Remove the unuseddays_until_collect_againparameter.This parameter is unused since the interval is now retrieved from configuration (lines 243-246).
Apply this diff:
-def build_ml_repo_collect_request(session, logger, enabled_phase_names, days_until_collect_again = 40): +def build_ml_repo_collect_request(session, logger, enabled_phase_names):
🧹 Nitpick comments (1)
augur/tasks/start_tasks.py (1)
178-181: Consider centralizing config retrieval to reduce duplication.This pattern of fetching the Tasks configuration section is repeated in all four build functions (lines 178-181, 200-203, 225-228, 243-246). Consider retrieving the Tasks config once in the
augur_collection_monitorfunction and passing the relevant intervals to each build function, or creating a helper function to encapsulate this logic.Example approach - fetch config once in the calling function:
# In augur_collection_monitor function tasks_config = get_section("Tasks") # Then pass to build functions or extract a helper: def get_collection_interval(interval_name, default): tasks_config = get_section("Tasks") return tasks_config.get(interval_name, default)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
augur/application/config.py(1 hunks)augur/tasks/start_tasks.py(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- augur/application/config.py
🧰 Additional context used
🧬 Code graph analysis (1)
augur/tasks/start_tasks.py (2)
augur/application/db/lib.py (1)
get_section(52-77)augur/tasks/util/collection_util.py (1)
CollectionRequest(116-151)
🪛 Ruff (0.14.2)
augur/tasks/start_tasks.py
181-181: CollectionRequest may be undefined, or defined from star imports
(F405)
203-203: CollectionRequest may be undefined, or defined from star imports
(F405)
228-228: CollectionRequest may be undefined, or defined from star imports
(F405)
246-246: CollectionRequest may be undefined, or defined from star imports
(F405)
🔇 Additional comments (2)
augur/tasks/start_tasks.py (2)
31-31: LGTM!The import of
get_sectionis necessary to fetch configuration values from the database.
159-248: Verification complete: thedays_until_collect_againparameter is not used by external callers.The search confirms that these build functions are only referenced within
augur/tasks/start_tasks.pyand are never imported elsewhere. All internal calls (lines 268, 271, 276, 279) pass only three arguments and do not use thedays_until_collect_againparameter. The parameter can be safely removed from all four function signatures.
Fix variables issue in chaoss#3183
remove the "empty database" docker build step to avoid blocking release Adrian and I are merging this with only "one" approval as its been discussed in maintainer meetings.
6e4db56 to
b99a182
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
augur/tasks/git/util/facade_worker/facade_worker/utilitymethods.py (1)
108-122: Critical: Fix string formatting in git command.There's a critical string formatting issue on line 111. The format string contains
"git --git-dir %s log..."but the string substitution uses% (absolute_repo_path)at the end of line 111, which expects a%splaceholder. However, the string only contains%%H(an escaped percent sign for the git format), not%sfor the repo path.This will cause the command to fail or behave incorrectly.
Apply this diff to fix the formatting:
def get_parent_commits_set(absolute_repo_path): - parents = subprocess.Popen(["git --git-dir %s log --ignore-missing " - "--pretty=format:'%%H'" % (absolute_repo_path)], + parents = subprocess.Popen([f"git --git-dir {absolute_repo_path} log --ignore-missing " + "--pretty=format:'%H'"], stdout=subprocess.PIPE, shell=True)Note: The static analysis tool also flagged the use of
shell=Trueas a security concern. Consider using a list of arguments without shell=True if possible.augur/application/db/lib.py (1)
226-273: Rewrite timezone validation logic - inverted condition will cause valid timestamps to fail.The conditional at lines 249-252 is inverted. When a timezone offset is valid (found in
postgres_valid_timezones), the code raises the original exception, preventing the record from being inserted. When invalid, it normalizes to UTC. This is backwards.Current code:
if int(tzdata) not in postgres_valid_timezones: tzdata = "+0000" else: raise eShould be:
if int(tzdata) not in postgres_valid_timezones: tzdata = "+0000"Remove the
else: raise eclause entirely. If the timezone is valid, allow the record to proceed with the normalized timestamp. The original exceptioneindicates a database error—if timezone normalization resolves the issue (e.g., an invalid offset was causing PostgreSQL to reject the insert), the record should be successfully saved..github/workflows/build_docker.yml (1)
202-240: Add VERSION build-arg to Podman build steps.The test-e2e-podman job's build steps (lines 202-240) do not pass the VERSION build-arg, even though the Dockerfiles now require it via
ARG VERSIONdeclarations. This will cause the Podman builds to fail or produce images without proper version labels.The test-e2e-podman job needs the same version extraction step as test-e2e. Add after line 200:
- name: Checkout repository uses: actions/checkout@v4 + +- name: Extract project version + id: version + run: | + VERSION=$(python -c "exec(open('metadata.py').read()); print(__version__)") + echo "version=$VERSION" >> $GITHUB_OUTPUT + echo "Using version: $VERSION"Then update each Podman build step to include build-args. For example:
- name: Build database container uses: redhat-actions/buildah-build@v2 with: context: . containerfiles: | ./docker/database/Dockerfile + build-args: | + VERSION=${{ steps.version.outputs.version }} platforms: linux/amd64 tags: ghcr.io/${{ github.repository_owner }}/augur_database:test layers: trueApply similar changes to the keyman, rabbitmq, and backend Podman build steps.
🧹 Nitpick comments (8)
augur/tasks/util/random_key_auth.py (1)
6-6: Remove unused import.The
hashlibimport is not used anywhere in this file.Apply this diff to remove the unused import:
-import hashlibaugur/api/view/routes.py (1)
7-7: Consider replacing star import with explicit imports.The star import
from .utils import *makes it unclear which symbols are imported and prevents static analysis tools from verifying their availability. Based on the code,render_module,renderRepos, andrender_messageappear to be the primary imports needed.Apply this diff to use explicit imports:
-from .utils import * +from .utils import render_module, renderRepos, render_messageNote: This is a pre-existing issue, not introduced by this PR.
augur/tasks/github/releases/core.py (1)
3-3: Consider replacing star import with explicit imports.The star import makes it less clear where
Releaseand other models are defined. Consider using explicit imports:-from augur.application.db.models import * +from augur.application.db.models import ReleaseThis improves code clarity and helps static analysis tools. Based on static analysis hints.
augur/tasks/util/collection_util.py (1)
600-628: Consider more robust platform detection.The current approach uses substring matching (
"github" in repo.repo_gitand"gitlab" in repo.repo_git) to determine the platform. This could lead to false positives if a repository URL contains these strings in unexpected ways (e.g., a self-hosted GitLab instance with "github" in its domain name).Consider using a more robust detection method:
- Check if
Repohas aplatformorrepo_typefield- Use URL parsing to extract and compare the hostname
- Use a regex pattern for more precise matching
Example with URL parsing:
+from urllib.parse import urlparse + repo = get_repo_by_repo_git(repo_git) platform_name = "github" -# this needs to be here and not up a level since it should be set/reset for each repo. -# otherwise a gitlab repo would reset it and cause subsequent github repos to use gitlab phases. phases = None -if "github" in repo.repo_git: + +parsed_url = urlparse(repo.repo_git) +hostname = parsed_url.hostname or "" + +if "github.com" in hostname: phases = col_hook.phases - # use default platform name - -elif "gitlab" in repo.repo_git: +elif "gitlab.com" in hostname: platform_name = "gitlab" if col_hook.gitlab_phases is None: continue phases = col_hook.gitlab_phasesaugur/tasks/git/util/facade_worker/facade_worker/config.py (1)
254-264: Consider removing commented-out code.The
get_last_collected_commit_datehelper has been commented out. If this code is no longer needed, consider removing it entirely to keep the codebase clean. If it might be needed in the future, document why it's being kept.augur/tasks/init/celery_app.py (1)
244-248: LGTM! Config-driven scheduling implemented correctly.The conditional scheduling based on
mat_views_intervalaligns well with the PR's goal of making collection intervals configurable. The logic correctly schedules the task only when the interval is positive.Minor style cleanup: Remove the
fprefix from the log statements on lines 245 and 248 since they don't contain placeholders.- logger.info(f"Scheduling refresh materialized view every night at 1am CDT") + logger.info("Scheduling refresh materialized view every night at 1am CDT")- logger.info(f"Refresh materialized view task is disabled.") + logger.info("Refresh materialized view task is disabled.")README.md (1)
14-14: Improve link accessibility with descriptive text.The link text "here" is not descriptive for screen readers and users scanning the document. Consider using more descriptive text such as "the v0.90.3 release" or "version 0.90.3 on GitHub".
Apply this diff to improve accessibility:
-Augur is now releasing a dramatically improved new version. It is also available [here](https://github.com/chaoss/augur/releases/tag/v0.90.3). +Augur is now releasing a dramatically improved new version. It is also available [on GitHub as v0.90.3](https://github.com/chaoss/augur/releases/tag/v0.90.3).Based on static analysis (markdownlint).
augur/application/db/util.py (1)
73-79: Enhance boolean parsing logic for robustness.The current boolean parsing only recognizes lowercase "false" as False, while all other values (including empty strings, "False", "FALSE", "0", "no", "off") are treated as True. This could lead to unexpected configuration behavior.
Consider a more robust implementation:
elif data_type == "bool": value = config_dict["value"] - if value.lower() == "false": + if isinstance(value, bool): + config_dict["value"] = value + elif str(value).lower() in ("false", "0", "no", "off", ""): config_dict["value"] = False else: config_dict["value"] = TrueThis handles common false-like strings and prevents empty strings from being treated as True.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
augur/application/schema/repo_load_sample.csvis excluded by!**/*.csvtests/test_workers/test_facade/test_facade_contributor_interface/test_repos.csvis excluded by!**/*.csvuv.lockis excluded by!**/*.lock
📒 Files selected for processing (50)
.github/workflows/build_docker.yml(5 hunks)CITATION.cff(1 hunks)README.md(2 hunks)augur/api/metrics/repo_meta.py(1 hunks)augur/api/routes/__init__.py(0 hunks)augur/api/routes/contributor_reports.py(0 hunks)augur/api/routes/pull_request_reports.py(0 hunks)augur/api/view/init.py(0 hunks)augur/api/view/routes.py(1 hunks)augur/api/view/utils.py(0 hunks)augur/application/cli/github.py(2 hunks)augur/application/config.py(4 hunks)augur/application/db/data_parse.py(4 hunks)augur/application/db/engine.py(1 hunks)augur/application/db/lib.py(4 hunks)augur/application/db/models/augur_data.py(3 hunks)augur/application/db/session.py(3 hunks)augur/application/db/util.py(1 hunks)augur/application/schema/alembic/versions/34_add_contrib_to_config.py(1 hunks)augur/application/util.py(1 hunks)augur/tasks/git/facade_tasks.py(3 hunks)augur/tasks/git/util/facade_worker/facade_worker/config.py(3 hunks)augur/tasks/git/util/facade_worker/facade_worker/repofetch.py(1 hunks)augur/tasks/git/util/facade_worker/facade_worker/utilitymethods.py(1 hunks)augur/tasks/github/facade_github/tasks.py(7 hunks)augur/tasks/github/releases/core.py(3 hunks)augur/tasks/init/celery_app.py(1 hunks)augur/tasks/start_tasks.py(6 hunks)augur/tasks/util/collection_util.py(1 hunks)augur/tasks/util/random_key_auth.py(1 hunks)augur/templates/repo-info.j2(1 hunks)augur/util/keys.py(1 hunks)docker-compose.yml(1 hunks)docker/backend/Dockerfile(1 hunks)docker/database/Dockerfile(1 hunks)docker/keyman/Dockerfile(1 hunks)docker/rabbitmq/Dockerfile(1 hunks)docker/rabbitmq/definitions.json(1 hunks)docs/source/conf.py(1 hunks)docs/source/development-guide/create-a-metric/api-development.rst(0 hunks)docs/source/docker/docker-compose.rst(1 hunks)docs/source/docker/getting-started.rst(1 hunks)docs/source/docker/quick-start.rst(1 hunks)docs/source/getting-started/collecting-data.rst(1 hunks)docs/source/getting-started/command-line-interface/db.rst(1 hunks)docs/source/getting-started/installation.rst(1 hunks)docs/source/getting-started/using-docker.rst(2 hunks)docs/source/procedures/creating-releases.rst(1 hunks)metadata.py(1 hunks)pyproject.toml(3 hunks)
💤 Files with no reviewable changes (6)
- augur/api/view/init.py
- docs/source/development-guide/create-a-metric/api-development.rst
- augur/api/view/utils.py
- augur/api/routes/contributor_reports.py
- augur/api/routes/init.py
- augur/api/routes/pull_request_reports.py
✅ Files skipped from review due to trivial changes (6)
- docs/source/getting-started/using-docker.rst
- docs/source/docker/quick-start.rst
- docs/source/docker/getting-started.rst
- CITATION.cff
- augur/application/util.py
- augur/application/db/models/augur_data.py
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/source/getting-started/collecting-data.rst
🧰 Additional context used
🧬 Code graph analysis (11)
augur/tasks/github/facade_github/tasks.py (4)
augur/application/db/__init__.py (1)
get_session(29-39)augur/application/db/util.py (1)
execute_session_query(31-43)augur/tasks/git/util/facade_worker/facade_worker/config.py (1)
FacadeHelper(81-253)augur/application/db/models/augur_operations.py (1)
CollectionStatus(1139-1283)
augur/tasks/util/random_key_auth.py (1)
augur/util/keys.py (1)
mask_key(1-5)
augur/tasks/github/releases/core.py (2)
augur/application/db/models/augur_data.py (1)
Release(1832-1860)augur/application/db/util.py (1)
execute_session_query(31-43)
augur/tasks/git/facade_tasks.py (4)
augur/tasks/git/util/facade_worker/facade_worker/utilitymethods.py (1)
get_parent_commits_set(108-122)augur/tasks/git/dependency_tasks/tasks.py (1)
process_dependency_metrics(10-14)augur/tasks/git/dependency_libyear_tasks/tasks.py (1)
process_libyear_dependency_metrics(7-12)augur/tasks/git/scc_value_tasks/tasks.py (1)
process_scc_value_metrics(9-13)
augur/tasks/init/celery_app.py (1)
augur/tasks/db/refresh_materialized_views.py (1)
refresh_materialized_views(12-210)
augur/application/schema/alembic/versions/34_add_contrib_to_config.py (2)
augur/application/db/session.py (1)
DatabaseSession(45-283)augur/application/config.py (2)
AugurConfig(118-368)add_section_from_json(288-314)
augur/api/view/routes.py (1)
augur/api/view/utils.py (1)
render_module(222-224)
augur/tasks/start_tasks.py (2)
augur/application/config.py (2)
AugurConfig(118-368)get_value(163-188)augur/tasks/util/collection_util.py (1)
CollectionRequest(116-151)
augur/api/metrics/repo_meta.py (2)
augur/api/util.py (1)
register_metric(54-83)augur/application/db/engine.py (1)
engine(138-139)
augur/application/config.py (1)
augur/application/db/util.py (2)
execute_session_query(31-43)convert_type_of_value(63-90)
augur/application/db/lib.py (1)
augur/application/db/util.py (2)
execute_session_query(31-43)convert_type_of_value(63-90)
🪛 markdownlint-cli2 (0.18.1)
README.md
14-14: Link text should be descriptive
(MD059, descriptive-link-text)
🪛 Ruff (0.14.2)
augur/tasks/github/facade_github/tasks.py
202-202: FacadeHelper may be undefined, or defined from star imports
(F405)
205-205: CollectionStatus may be undefined, or defined from star imports
(F405)
205-205: CollectionStatus may be undefined, or defined from star imports
(F405)
augur/tasks/util/random_key_auth.py
32-32: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
augur/tasks/github/releases/core.py
72-72: Release may be undefined, or defined from star imports
(F405)
72-72: Release may be undefined, or defined from star imports
(F405)
augur/tasks/init/celery_app.py
245-245: f-string without any placeholders
Remove extraneous f prefix
(F541)
248-248: f-string without any placeholders
Remove extraneous f prefix
(F541)
augur/application/schema/alembic/versions/34_add_contrib_to_config.py
10-10: from augur.application.config import * used; unable to detect undefined names
(F403)
25-25: AugurConfig may be undefined, or defined from star imports
(F405)
46-49: f-string without any placeholders
Remove extraneous f prefix
(F541)
52-55: f-string without any placeholders
Remove extraneous f prefix
(F541)
56-56: Do not use bare except
(E722)
56-57: try-except-pass detected, consider logging the exception
(S110)
augur/api/view/routes.py
226-226: render_module may be undefined, or defined from star imports
(F405)
augur/tasks/start_tasks.py
179-179: CollectionRequest may be undefined, or defined from star imports
(F405)
198-198: CollectionRequest may be undefined, or defined from star imports
(F405)
220-220: CollectionRequest may be undefined, or defined from star imports
(F405)
235-235: CollectionRequest may be undefined, or defined from star imports
(F405)
270-270: facade_phase may be undefined, or defined from star imports
(F405)
274-274: machine_learning_phase may be undefined, or defined from star imports
(F405)
augur/tasks/git/util/facade_worker/facade_worker/utilitymethods.py
110-110: subprocess call with shell=True identified, security issue
(S602)
augur/application/db/lib.py
252-252: Use raise without specifying exception name
Remove exception name
(TRY201)
🔇 Additional comments (40)
docs/source/getting-started/command-line-interface/db.rst (1)
81-88: Documentation example now correctly matches the specified format.The example CSV entries have been reordered to align with the documented format specification (
<git_repo_url>,<repo_group_id>), improving consistency between the specification and the worked example. The URLs and repo IDs are syntactically correct.Minor note: Line 66 contains a trailing comma after
<repo_group_id>,which does not appear in the actual examples—consider removing it for clarity if it's unintentional.augur/util/keys.py (1)
1-5: LGTM! Solid security utility implementation.The masking logic correctly handles edge cases by returning all stars when the key is too short or not a string, preventing inadvertent exposure of short secrets. The default parameters are sensible for typical API keys.
augur/tasks/util/random_key_auth.py (3)
27-30: Excellent security improvement!The validation for empty keys with explicit error logging prevents silent failures and improves debuggability.
32-32: Static analysis warning is a false positive.The S311 warning about
choice()not being suitable for cryptographic purposes can be safely ignored here. You're selecting from a pool of existing keys for load balancing/rotation, not generating cryptographic material. The security of the keys themselves doesn't depend on the randomness of the selection algorithm.
34-36: Significant security improvement by masking keys in logs.Logging the masked key instead of plaintext prevents accidental exposure of sensitive credentials in log files, which is a critical security best practice.
augur/api/metrics/repo_meta.py (2)
1244-1298: Change appears unrelated to PR objectives.The PR summary states the focus is on "fixing hardcoded collection intervals" in configuration and task collection code (config.py, start_tasks.py, collection_util.py). However, this change introduces a new metric endpoint for repository clone data, which doesn't appear related to collection interval configuration.
Can you clarify how this new
clonesmetric relates to the collection interval fixes mentioned in the PR description?
1244-1298: Database schema verified—no issues found.The table
augur_data.repo_clones_dataand all referenced columns (clone_data_timestamp,count_clones,unique_clones) exist across the schema definitions. The implementation follows established patterns, uses parameterized queries correctly, and integrates properly with the existing codebase including the data collection task and ORM model.augur/api/view/routes.py (1)
226-226: LGTM! Correctly removes server-side reports parameter.The change aligns with the PR's goal of moving from server-side report rendering to client-side dynamic loading. The template no longer expects reports or images to be passed from the server.
augur/tasks/github/releases/core.py (2)
26-26: LGTM! Consistent release_id normalization.The
release_idis now properly normalized by converting to string and stripping whitespace in both the non-tag-only and tag-only branches. This ensures consistency for duplicate checking downstream.Also applies to: 54-54
79-84: Good duplicate prevention logic, but depends on fixing line 74.The duplicate check is well-implemented with consistent normalization and efficient set membership testing. The early return avoids unnecessary database operations and provides clear logging.
However, this logic will only work correctly once the Row unpacking issue on line 74 is fixed.
augur/application/cli/github.py (1)
59-59: LGTM: Improved time formatting with timezone awareness.The changes enhance the CLI output by centering the reset time headers and adding timezone information to the displayed timestamps. The use of
astimezone()and the%Z (UTC%z)format string provides clearer context for users.Also applies to: 91-94
augur/tasks/git/facade_tasks.py (2)
124-124: LGTM: Simplified commit collection without date filtering.The removal of
start_datefromget_parent_commits_setcalls aligns with the broader simplification of the commit history collection process. This change is consistent with updates toaugur/tasks/git/util/facade_worker/facade_worker/utilitymethods.py.Also applies to: 247-247
500-508: LGTM: Simplified facade sequence construction.The change from building a list and returning
chain(*facade_sequence)to directly returning agroup()object simplifies the task orchestration pattern.augur/tasks/git/util/facade_worker/facade_worker/config.py (2)
130-130: LGTM: New configuration option added.The
facade_contributor_full_recollectconfiguration option is properly initialized from the worker_options.
170-177: LGTM: Improved log routing by level.The refactored
log_activitymethod now properly routes log messages based on their level:
- Error →
logger.error- Debug/Verbose →
logger.debug- Others →
logger.infoThis provides better log filtering and organization.
augur/application/db/session.py (2)
96-109: LGTM: Improved parameter naming and input normalization.Renaming
datatodata_inputimproves clarity, and the explicit else branch for normalizing input to a list is more robust.
171-212: The review comment is incorrect—the code already returnsNone, not the string"success".The function signature shows the return type as
Optional[List[dict]], and the actual code at line 213 returnsNone(the# successis merely a comment, not the return value). All callers in the codebase already check forNoneor use the returned list of dicts whenreturn_columnsis provided. There is no evidence of a"success"string being the previous return value, and no callers depend on such a value.Likely an incorrect or invalid review comment.
augur/application/db/lib.py (3)
16-16: LGTM: Proper separation of concerns.Moving
convert_type_of_valueimport toaugur.application.db.utilimproves code organization.
275-282: LGTM: Added configurability for batch size.The
batch_sizeparameter (defaulting to 1000) provides flexibility for tuning batch insert performance.
286-299: LGTM: Consistent parameter naming and return semantics.The rename from
datatodata_inputis consistent with changes inaugur/application/db/session.py, and the explicitNonereturn in the no-return-columns path clarifies the function's behavior.Also applies to: 401-403
augur/application/db/engine.py (1)
13-29: No action required—all callers have been properly updated.The search confirms that all four call sites of
parse_database_string()in the codebase properly unpack the tuple return value using destructuring assignment. The function signature change is not breaking because callers have already been updated to handle the five-element tuple(username, password, host, port, database).docker-compose.yml (1)
76-76: LGTM! Config directory now configurable.The addition of
CONFIG_DATADIRenables runtime configuration loading fromaugur.json, supporting the PR's goal of making collection intervals configurable.docker/keyman/Dockerfile (1)
4-6: LGTM! Dynamic versioning adopted.The switch from hardcoded version labels to build-time
VERSIONarguments is a good practice that enables consistent versioning across all Docker images via the CI workflow.docker/backend/Dockerfile (1)
23-25: LGTM! Consistent with dynamic versioning pattern.This change mirrors the keyman Dockerfile, ensuring all service images use the same build-time versioning approach.
augur/tasks/git/util/facade_worker/facade_worker/repofetch.py (1)
438-438: LGTM! More precise git reset command.Changing the reset target from
origintoorigin/{remotedefault}makes the command more explicit and correct by targeting the detected default branch rather than relying on the symbolicoriginref.docs/source/conf.py (1)
26-30: LGTM! Cleaner metadata import approach.Replacing
exec()with explicit imports is more maintainable and follows Python best practices. This change properly integrates with the new metadata.py versioning.docs/source/docker/docker-compose.rst (1)
29-29: LGTM! Documentation clarified.The updated wording better explains that all environment variables must be present in the
.envfile, with placeholder values acceptable for unused variables.docker/rabbitmq/definitions.json (1)
2-2: Verify RabbitMQ 4.1 compatibility with application dependencies and runtime behavior.The version bump from 3.12 to 4.1 introduces several breaking changes, including removal of classic queue replication, AMQP 1.0 as a core protocol, AMQP 0-9-1 frame_max increase from 4096 to 8192, and Erlang/OTP compatibility changes.
The definitions.json file itself contains no queue configurations or classic mirrored queue definitions, so it is compatible with the upgrade. However, verification is needed on:
- AMQP client libraries: Confirm that any AMQP clients used by the Augur application support the increased frame_max of 8192 bytes or are configured with frame_max ≥ 8192.
- Erlang/OTP version: Verify that the environment provides an Erlang/OTP version compatible with RabbitMQ 4.1.
- Runtime queue behavior: If queues are created or configured at runtime by the application code, ensure they don't rely on classic queue mirroring or other removed features.
docker/database/Dockerfile (1)
5-7: LGTM: Dynamic versioning implementation.The introduction of ARG VERSION and dynamic LABEL version=${VERSION} follows Docker best practices for build-time version injection. This change aligns with the workflow updates that extract the version from metadata.py and pass it as a build argument.
README.md (1)
1-1: LGTM: Version bump aligns with metadata.py.The version header update to v0.90.3 is consistent with the changes in metadata.py and reflects the centralized version management approach.
docker/rabbitmq/Dockerfile (2)
5-6: LGTM: Dynamic versioning implementation.The addition of ARG VERSION and LABEL version=${VERSION} follows the same pattern as other Dockerfiles in this PR, enabling consistent build-time version injection.
1-1: RabbitMQ 4.1 compatibility verified — all configuration and code dependencies are compatible.Verification results:
- Celery 5.5 (via kombu/py-amqp) is compatible with RabbitMQ 4.1, as Python AMQP clients default to the server-recommended frame_max and negotiate compatible values
- Config files (definitions.json, advanced.config, augur.conf) contain no unsupported patterns for 4.1
- No classic queue mirroring, etcd clustering, or incompatible plugins detected
- update_config.py script uses standard rabbitmqctl and is compatible
Note: If upgrading from RabbitMQ 3.12, follow the staged upgrade path (3.12 → 3.13 → 4.1) or use blue/green migration, as direct 3.12 → 4.1 upgrades are not supported.
metadata.py (1)
8-9: LGTM: Version constants updated correctly.The version bump to 0.90.3 is consistent across all related files and serves as the single source of truth for version information, as documented in the release process documentation.
augur/application/db/util.py (1)
63-90: Function correctly implements type coercion with error handling.The function provides appropriate type conversion for common config value types (str, int, bool, float) and includes error handling for unsupported types via logging or print statements.
.github/workflows/build_docker.yml (2)
66-71: LGTM: Version extraction centralizes version management.The version extraction from metadata.py and propagation as a GitHub Actions output enables consistent versioning across all Docker images, aligning with the centralized version management approach documented in the release process.
58-62: Cleanup step improves CI runner disk space management.The addition of cleanup to remove unnecessary files (/usr/share/dotnet, AGENT_TOOLSDIRECTORY) helps prevent disk space issues during Docker builds on GitHub-hosted runners.
pyproject.toml (3)
28-29: LGTM: Compatible release constraints improve dependency flexibility.Updating celery from
==5.2.7to~=5.5and click from==8.0.3to~=8.1allows patch-level updates while preventing breaking changes, following Python packaging best practices.
102-107: LGTM: Type checking infrastructure added to dev dependencies.The addition of mypy and type stubs (types-requests, types-pyyaml, types-python-dateutil, types-toml) enables static type checking for improved code quality and developer experience.
158-163: LGTM: Mypy configuration provides sensible starting point.The mypy configuration starts with a focused scope (augur/application/db/*.py) and permissive settings, which is a good approach for gradual type checking adoption. The configuration can be tightened over time as the codebase adds more type annotations.
docs/source/procedures/creating-releases.rst (1)
5-110: LGTM: Comprehensive documentation of release process.The documentation effectively describes:
- The main/release branch workflow and their purposes
- The new centralized version management approach using metadata.py
- How the CI/CD pipeline automatically propagates versions to Docker builds
- Special cases like hotfixes
This aligns well with the technical changes in the PR and provides clear guidance for future releases.
| query = session.query(CollectionStatus).filter(CollectionStatus.repo_id == repo.repo_id) | ||
| collection_status = execute_session_query(query,'one') | ||
| last_collected_date = collection_status.facade_data_last_collected if not facade_helper.facade_contributor_full_recollect else None | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix missing CollectionStatus import.
CollectionStatus is referenced here without being imported, so this code will raise a NameError the first time the task runs. Please import it alongside Contributor.
-from augur.application.db.models import Contributor
+from augur.application.db.models import CollectionStatus, Contributor📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| query = session.query(CollectionStatus).filter(CollectionStatus.repo_id == repo.repo_id) | |
| collection_status = execute_session_query(query,'one') | |
| last_collected_date = collection_status.facade_data_last_collected if not facade_helper.facade_contributor_full_recollect else None | |
| from augur.application.db.models import CollectionStatus, Contributor |
🧰 Tools
🪛 Ruff (0.14.2)
205-205: CollectionStatus may be undefined, or defined from star imports
(F405)
205-205: CollectionStatus may be undefined, or defined from star imports
(F405)
🤖 Prompt for AI Agents
In augur/tasks/github/facade_github/tasks.py around lines 205 to 208,
CollectionStatus is used but not imported which will raise a NameError; add
CollectionStatus to the import list where Contributor is imported (i.e., import
CollectionStatus alongside Contributor from the same module/package) so the
symbol is available for the query. Ensure the import follows existing import
style and is placed with other model imports at the top of the file.
Handle missing collection_status rows safely.
execute_session_query(query, 'one') raises when no row exists (common for freshly added repos before collection_status is populated), which will crash the task. Please fall back gracefully instead of assuming the record is present.
- query = session.query(CollectionStatus).filter(CollectionStatus.repo_id == repo.repo_id)
- collection_status = execute_session_query(query,'one')
- last_collected_date = collection_status.facade_data_last_collected if not facade_helper.facade_contributor_full_recollect else None
+ query = session.query(CollectionStatus).filter(CollectionStatus.repo_id == repo.repo_id)
+ collection_status = execute_session_query(query, 'first')
+ last_collected_date = None
+ if collection_status and not facade_helper.facade_contributor_full_recollect:
+ last_collected_date = collection_status.facade_data_last_collected🧰 Tools
🪛 Ruff (0.14.2)
205-205: CollectionStatus may be undefined, or defined from star imports
(F405)
205-205: CollectionStatus may be undefined, or defined from star imports
(F405)
🤖 Prompt for AI Agents
In augur/tasks/github/facade_github/tasks.py around lines 205 to 208, the code
assumes execute_session_query(query,'one') always returns a CollectionStatus and
will raise if none exists; update this to handle missing rows safely by either
using a query variant that returns None (e.g.,
execute_session_query(query,'one_or_none') or session.query(...).one_or_none())
or catching the exception returned by execute_session_query and treating a
missing row as None, then set last_collected_date = None when collection_status
is None (or when facade_helper.facade_contributor_full_recollect is True) so the
task continues without crashing for newly added repos.
| release_id_data = execute_session_query(query, 'all')#pd.read_sql(release_id_data_sql, self.db, params={'repo_id': repo_id}) | ||
| release_id_data = [str(r_id).strip() for r_id in release_id_data]#release_id_data.apply(lambda x: x.str.strip()) | ||
| release_id_data = execute_session_query(query, 'all') | ||
| existing_release_ids = {str(r_id).strip() for r_id in release_id_data} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical bug: The past review comment is still unaddressed.
The past reviewer correctly identified that line 74 has a Row unpacking issue. SQLAlchemy's query.all() returns Row objects, and for single-column queries, each Row must be indexed to extract the value.
Current code:
existing_release_ids = {str(r_id).strip() for r_id in release_id_data}This produces strings like "('some_id',)" instead of "some_id", which breaks the duplicate check on line 82.
Apply the fix suggested in the past review:
- existing_release_ids = {str(r_id).strip() for r_id in release_id_data}
+ existing_release_ids = {str(r_id[0]).strip() for r_id in release_id_data}🤖 Prompt for AI Agents
In augur/tasks/github/releases/core.py around line 74, the set comprehension
treats SQLAlchemy Row objects as strings, producing values like "('some_id',)";
update the comprehension to extract the first column from each Row (e.g., use
r_id[0]) and skip or filter out None values before converting to string and
stripping so existing_release_ids contains the actual ID strings (e.g., use
str(r_id[0]).strip() for each non-None row).
| **Python 3.9 is not yet supported because TensorFlow, which we use in our machine learning workers, does not yet support Python 3.9.** | ||
|
|
||
| Our REST API & data collection workers write in Python 3.6. We query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository, so GitLab and GitHub access tokens are **required** for data collection. | ||
| Our REST API & data collection workers query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository. Values for GitLab and GitHub access tokens are **required** for data collection and must be provided (an invalid token can be provided if you don't plan to use one platform) . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarify the token requirement and fix punctuation.
The statement "must be provided (an invalid token can be provided if...)" is somewhat contradictory and could mislead users. Additionally, there's an extra space before the period at the end of the line.
Consider restructuring this sentence for clarity and to explicitly explain what "invalid token" means in context.
Apply this diff to improve clarity and fix punctuation:
-Our REST API & data collection workers query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository. Values for GitLab and GitHub access tokens are **required** for data collection and must be provided (an invalid token can be provided if you don't plan to use one platform) .
+Our REST API & data collection workers query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository. GitHub and GitLab access tokens are **required** for data collection. If you don't plan to use one platform, you may provide a placeholder or invalid token for that platform.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Our REST API & data collection workers query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository. Values for GitLab and GitHub access tokens are **required** for data collection and must be provided (an invalid token can be provided if you don't plan to use one platform) . | |
| Our REST API & data collection workers query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository. GitHub and GitLab access tokens are **required** for data collection. If you don't plan to use one platform, you may provide a placeholder or invalid token for that platform. |
🤖 Prompt for AI Agents
In docs/source/getting-started/installation.rst around line 34, clarify and
rephrase the sentence about GitHub/GitLab tokens and remove the stray space
before the period: state that values for GitHub and GitLab access tokens are
required fields for data collection, but if you do not intend to collect data
from one provider you may supply a placeholder or deliberately invalid token
(e.g., "none" or "placeholder") so the configuration remains valid; reword the
sentence to avoid contradiction and fix punctuation so there is no extra space
before the period.
Summary
Resolves hardcoded collection intervals by moving them to configuration, making them easily configurable without code changes.
Changes Made
Technical Details
Testing
Before/After Example
Fixes chaoss#3338
Summary by CodeRabbit
New Features
Removed Features
Documentation
Other