Skip to content

Conversation

@mohsinm-dev
Copy link
Owner

@mohsinm-dev mohsinm-dev commented Oct 25, 2025

Summary

Resolves hardcoded collection intervals by moving them to configuration, making them easily configurable without code changes.

Changes Made

  • Added collection interval settings to the Tasks configuration section
  • Created get_collection_intervals_from_config() helper function with fallback support
  • Removed unused days_until_collect_again parameters from build functions
  • Updated all CollectionRequest calls to use config-driven values instead of hardcoded ones

Technical Details

  • Files Modified: config.py, start_tasks.py, collection_util.py
  • Backward Compatible: Function signature changes don't break external calls
  • Robust Fallback: System works even if config section is missing
  • Follows Conventions: Uses existing Augur config patterns

Testing

  • Syntax validation passed for all modified files
  • Configuration structure verified
  • Function signatures updated correctly
  • No regressions detected in basic functionality

Before/After Example

# Before: Hardcoded value ignored parameter
def build_secondary_repo_collect_request(session, logger, enabled_phase_names, days_until_collect_again = 1):
    request = CollectionRequest("secondary", phases, max_repo=60, days_until_collect_again=10)

# After: Config-driven value
def build_secondary_repo_collect_request(session, logger, enabled_phase_names):
    intervals = get_collection_intervals_from_config()
    request = CollectionRequest("secondary", phases, max_repo=60, days_until_collect_again=intervals['secondary'])

Fixes chaoss#3338

Summary by CodeRabbit

  • New Features

    • Collection intervals for core, secondary, facade, and ML tasks are now configurable via app config (with sensible defaults).
    • New "clones" metric endpoint to retrieve daily clone totals and uniques.
  • Removed Features

    • Built-in contributor and pull-request report pages/images removed from server-side rendering and package exports.
  • Documentation

    • Added/updated docs on adjusting per-task re-run intervals and Docker/.env guidance.
  • Other

    • API key display now masks keys and duplicate releases are skipped during insert.

MoralCode and others added 30 commits July 15, 2025 09:14
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Fix for GitHub Issue chaoss#3194 where releases collection was failing with
psycopg2.errors.UniqueViolation on releases_pkey constraint.

Root cause: GitHub API returns release IDs with trailing spaces that don't
match existing trimmed database records.

Changes:
- Add str().strip() to release_id processing in get_release_inf()
- Enhance duplicate detection in insert_release() with proper trimming
- Add early duplicate detection to prevent unnecessary database operations

Signed-off-by: mohsinm-dev <mohsin.mdev@gmail.com>
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
Signed-off-by: Ulincsys <ulincsys@gmail.com>
The keys on this dictionary are defined in:

analyzecommit.generate_commit_record()

- Update reference to use proper 'cmt_author_timestamp' key
- Add warning log when replacing TZdata to show commit hash

Signed-off-by: Ulincsys <ulincsys@gmail.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
Add dockerfile for performing an empty database build
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
…roperly convert things into a chord

Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Sajal-Kulshreshtha <sajalkulshreshtha9@gmail.com>
Signed-off-by: Sajal-Kulshreshtha <sajalkulshreshtha9@gmail.com>
Signed-off-by: Sajal-Kulshreshtha <sajalkulshreshtha9@gmail.com>
Signed-off-by: Sajal-Kulshreshtha <sajalkulshreshtha9@gmail.com>
facade: refactor facade_phase to return the group itself so that celery can properly convert things into a chord
fix(cli): display timezone in output for GitHub key expiry (chaoss#3251)
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
…te-code

scheduling: refactor send_messages to de-duplicate code path
Signed-off-by: Sean P. Goggins <s@goggins.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Sean P. Goggins <s@goggins.com>
…-duplicate-code"

This reverts commit c8f22bb, reversing
changes made to fefb48a.
main: bump celery and a related dependency - attempt to fix secondary+core
Signed-off-by: PredictiveManish <manisht0914@gmail.com>
Signed-off-by: PredictiveManish <manisht0914@gmail.com>
Signed-off-by: PredictiveManish <manisht0914@gmail.com>
@coderabbitai
Copy link

coderabbitai bot commented Oct 25, 2025

Walkthrough

Added configurable per-task collection intervals and runtime config loading from augur.json; removed legacy contributor and pull request report routes and related view/report utilities; added a repo clones metric endpoint; moved/added config type conversion utility; introduced an Alembic migration and multiple Docker/versioning and documentation updates.

Changes

Cohort / File(s) Summary
Configuration core
augur/application/config.py, augur/application/db/util.py, docker-compose.yml
Config now supports runtime loading from augur.json (lookup via CONFIG_DATADIR), added facade_contributor_full_recollect and four Task interval keys, and moved/added convert_type_of_value into augur.application.db.util. CONFIG_DATADIR env var added to docker-compose.
Task scheduling / intervals
augur/tasks/start_tasks.py, augur/tasks/util/collection_util.py, augur/tasks/init/celery_app.py
start_tasks reads per-task interval values from AugurConfig and passes them into build_*_repo_collect_request; collection dispatch unified for GitHub/GitLab in collection_util; materialized view scheduling now respects mat_views_interval > 0.
Facade/gitrepo worker changes
augur/tasks/git/facade_tasks.py, augur/tasks/git/util/facade_worker/.../config.py, .../repofetch.py, .../utilitymethods.py, augur/tasks/github/facade_github/tasks.py
Facade worker gains facade_contributor_full_recollect flag; parent-commit collection removed start_date filtering; git reset now targets origin/{remotedefault}; facade contributor collection now filters by last_collected_date (via DB session) when applicable.
API removals / view simplification
augur/api/routes/contributor_reports.py (removed), augur/api/routes/pull_request_reports.py (removed), augur/api/routes/__init__.py, augur/api/view/init.py, augur/api/view/routes.py, augur/api/view/utils.py, augur/templates/repo-info.j2
Deleted contributor and pull-request report route modules; removed in-memory reports structure and report-loading/requesting utilities; repo view/template markup no longer renders server-side report images or validates reports.
New metric endpoint
augur/api/metrics/repo_meta.py
Added clones(repo_group_id, repo_id=None, begin_date=None, end_date=None) endpoint querying augur_data.repo_clones_data with optional date bounds.
Database / DB API changes
augur/application/db/lib.py, augur/application/db/session.py, augur/application/db/engine.py, augur/application/schema/alembic/versions/34_add_contrib_to_config.py
Bulk insert functions renamed/parameter-updated and return semantics aligned; parse_database_string now annotated to return a 5-tuple; new Alembic migration adds facade_contributor_full_recollect config key.
Utilities / keys / auth
augur/util/keys.py, augur/tasks/util/random_key_auth.py
Added mask_key utility and used it in random-key auth flow; auth flow now validates empty key lists and masks logged keys.
Metric/release/other logic tweaks
augur/tasks/github/releases/core.py, augur/application/db/models/augur_data.py, augur/application/db/data_parse.py
Release_id normalization and duplicate-skip guard; minor model logging/typo fixes; return type annotations tightened for several data-parse helpers.
Docker & CI versioning
docker/*/Dockerfile, .github/workflows/build_docker.yml, docker/rabbitmq/definitions.json
Dockerfiles switched to ARG VERSION + LABEL version=${VERSION}; rabbitmq base bumped to 4.1-management-alpine and definitions rabbit_version to "4.1"; CI workflow extracts version and passes it as build-arg.
Docs and metadata
docs/source/**, README.md, metadata.py, CITATION.cff, pyproject.toml
Multiple documentation edits (getting-started, docker, release process, API dev guide), version bump to 0.90.3 in metadata and README, dependency and mypy config updates, and a minor CITATION date change.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Monitor as augur_collection_monitor
  participant Config as AugurConfig
  participant Builders as CollectionRequestBuilders

  Note over Config: Tasks section contains\ncore, secondary, facade, ml\ncollection interval keys
  Monitor->>Config: get_section("Tasks")
  Config-->>Monitor: tasks_config (interval values)
  Monitor->>Builders: build_primary_repo_collect_request(days_until_collect_again=core_collection_interval_days)
  Monitor->>Builders: build_secondary_repo_collect_request(days_until_collect_again=secondary_collection_interval_days)
  Monitor->>Builders: build_facade_repo_collect_request(days_until_collect_again=facade_collection_interval_days)
  Monitor->>Builders: build_ml_repo_collect_request(days_until_collect_again=ml_collection_interval_days)
  Note right of Builders: Builders receive configurable\nper-task intervals instead of hard-coded literals
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay special attention to the large removals of report modules and view/report utilities to ensure no remaining imports or references cause runtime errors.
  • Verify config loading precedence (env CONFIG_DATADIR → augur.json) and type coercion behavior via convert_type_of_value.
  • Review facade worker changes for unintended behavior from removing start_date filters and the string-format bug noted in utilitymethods.
  • Confirm DB insert API signature changes are propagated to all call sites.
  • Validate Docker/CI changes correctly pass VERSION build-arg in all build steps.

Poem

🐰 I nibble configs, numbers in a row,

Core and facade and ML now know,
I masked a key, I pruned old views,
Bumped versions, patched the news,
Hoppity-hop — the codebase grows.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 31.48% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The PR title "Fix hardcoded collection intervals in multiple spots" directly corresponds to a significant and deliberate change documented in the PR objectives, which explicitly states the primary goal is to "replace hardcoded collection interval values with configuration-driven settings." The title accurately identifies this core feature by referring to the real and substantive modification of collection interval handling across the codebase. While the PR includes numerous other changes (Docker updates, API removals, documentation, database migrations, type fixes, etc.), the title effectively captures the main feature that the PR author intended to address, consistent with the principle that titles need not be comprehensive of every detail in the changeset.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix-hardcoded-collection-intervals

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7548d2e and f829663.

📒 Files selected for processing (4)
  • augur/application/config.py (1 hunks)
  • augur/tasks/github/releases/core.py (3 hunks)
  • augur/tasks/start_tasks.py (5 hunks)
  • augur/tasks/util/collection_util.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
augur/tasks/util/collection_util.py (2)
augur/application/config.py (1)
  • get_section (159-182)
augur/application/db/lib.py (1)
  • get_section (52-77)
augur/tasks/start_tasks.py (1)
augur/tasks/util/collection_util.py (3)
  • get_collection_intervals_from_config (216-233)
  • CollectionRequest (116-151)
  • get_valid_repos (132-151)
augur/tasks/github/releases/core.py (1)
augur/application/db/models/augur_data.py (1)
  • Release (1832-1860)
🪛 Ruff (0.14.1)
augur/tasks/util/collection_util.py

226-226: Do not catch blind exception: Exception

(BLE001)

augur/tasks/start_tasks.py

159-159: Unused function argument: logger

(ARG001)


179-179: get_collection_intervals_from_config may be undefined, or defined from star imports

(F405)


180-180: CollectionRequest may be undefined, or defined from star imports

(F405)


184-184: Unused function argument: logger

(ARG001)


199-199: get_collection_intervals_from_config may be undefined, or defined from star imports

(F405)


200-200: CollectionRequest may be undefined, or defined from star imports

(F405)


206-206: Unused function argument: logger

(ARG001)


206-206: Unused function argument: enabled_phase_names

(ARG001)


223-223: get_collection_intervals_from_config may be undefined, or defined from star imports

(F405)


224-224: CollectionRequest may be undefined, or defined from star imports

(F405)


229-229: Unused function argument: logger

(ARG001)


229-229: Unused function argument: enabled_phase_names

(ARG001)


240-240: get_collection_intervals_from_config may be undefined, or defined from star imports

(F405)


241-241: CollectionRequest may be undefined, or defined from star imports

(F405)

augur/tasks/github/releases/core.py

72-72: Release may be undefined, or defined from star imports

(F405)


72-72: Release may be undefined, or defined from star imports

(F405)

🔇 Additional comments (8)
augur/application/config.py (1)

81-85: LGTM! Configuration keys properly defined.

The new collection interval keys are well-structured and align with the values used in the fallback logic of get_collection_intervals_from_config(). The existing collection_interval at line 81 remains for backward compatibility.

augur/tasks/github/releases/core.py (2)

26-26: LGTM! Release ID normalization is appropriate.

Converting release_id to string and stripping whitespace ensures consistent formatting for both regular releases and tag-only releases.

Also applies to: 54-54


81-84: Duplicate suppression logic is sound.

The early return prevents inserting duplicate releases, assuming the existing_release_ids set is populated correctly (contingent on the verification above).

augur/tasks/util/collection_util.py (1)

216-233: LGTM! Robust config retrieval with appropriate fallback.

The function properly retrieves collection intervals from configuration with per-key defaults and a fallback mechanism. The broad exception catch on line 226 is appropriate here to ensure graceful degradation if the config section is unavailable.

augur/tasks/start_tasks.py (4)

159-182: LGTM! Config-driven intervals properly integrated.

The function signature update and interval retrieval from config are implemented correctly. The intervals['core'] value is appropriately passed to the CollectionRequest constructor.


184-203: LGTM! Secondary collection intervals properly configured.

Consistent implementation with the primary collection request. The intervals['secondary'] value is correctly applied.


206-227: LGTM! Facade collection intervals properly configured.

The facade collection request correctly uses intervals['facade'] from the configuration.


229-243: LGTM! ML collection intervals properly configured.

The machine learning collection request correctly uses intervals['ml'] from the configuration, completing the consistent refactoring across all collection types.

@MoralCode
Copy link

Just as a heads up: you filed this Pull request against your fork of the augur repository, not the actual upstream augur repository (notice how this is pull request number 1 when the issue it is solving is chaoss#3000-ish?).

when making your pull request, make sure the "base branch" (usually on the left in the selection menu) is set to the chaoss/augur repository's main branch, not the main branch in your fork (which may just appear as main on its own)

MoralCode and others added 12 commits October 27, 2025 15:12
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
…format

Fixing the API key Visibility issue in debug mode
Co-authored-by: Adrian Edwards <17362949+MoralCode@users.noreply.github.com>
Signed-off-by: Manish Tiwari <manish.tiwari.09@zohomail.in>
Signed-off-by: Sean P. Goggins <s@goggins.com>
testing docker config for empty database
Fix: python3.9 compatibility message in docs chaoss#3266
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
@mohsinm-dev mohsinm-dev force-pushed the fix-hardcoded-collection-intervals branch from f829663 to 8201ed8 Compare October 30, 2025 05:06
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
augur/tasks/start_tasks.py (4)

159-159: Remove the unused days_until_collect_again parameter.

The PR description states that "unused days_until_collect_again parameters from build functions" were removed, but this parameter is still present in the function signature. Since the collection interval is now sourced from configuration (lines 178-181), this parameter is no longer used and should be removed to avoid confusion.

Apply this diff:

-def build_primary_repo_collect_request(session, logger, enabled_phase_names, days_until_collect_again = 15):
+def build_primary_repo_collect_request(session, logger, enabled_phase_names):

185-185: Remove the unused days_until_collect_again parameter.

Similar to the primary collection function, this parameter is no longer used since the interval is now config-driven (lines 200-203). Additionally, note that the parameter's default value (1) doesn't match the config fallback default (10), which could have caused confusion.

Apply this diff:

-def build_secondary_repo_collect_request(session, logger, enabled_phase_names, days_until_collect_again = 1):
+def build_secondary_repo_collect_request(session, logger, enabled_phase_names):

209-209: Remove the unused days_until_collect_again parameter.

This parameter is unused since the interval is now retrieved from configuration (lines 225-228).

Apply this diff:

-def build_facade_repo_collect_request(session, logger, enabled_phase_names, days_until_collect_again = 10):
+def build_facade_repo_collect_request(session, logger, enabled_phase_names):

233-233: Remove the unused days_until_collect_again parameter.

This parameter is unused since the interval is now retrieved from configuration (lines 243-246).

Apply this diff:

-def build_ml_repo_collect_request(session, logger, enabled_phase_names, days_until_collect_again = 40):
+def build_ml_repo_collect_request(session, logger, enabled_phase_names):
🧹 Nitpick comments (1)
augur/tasks/start_tasks.py (1)

178-181: Consider centralizing config retrieval to reduce duplication.

This pattern of fetching the Tasks configuration section is repeated in all four build functions (lines 178-181, 200-203, 225-228, 243-246). Consider retrieving the Tasks config once in the augur_collection_monitor function and passing the relevant intervals to each build function, or creating a helper function to encapsulate this logic.

Example approach - fetch config once in the calling function:

# In augur_collection_monitor function
tasks_config = get_section("Tasks")

# Then pass to build functions or extract a helper:
def get_collection_interval(interval_name, default):
    tasks_config = get_section("Tasks")
    return tasks_config.get(interval_name, default)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f829663 and 8201ed8.

📒 Files selected for processing (2)
  • augur/application/config.py (1 hunks)
  • augur/tasks/start_tasks.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • augur/application/config.py
🧰 Additional context used
🧬 Code graph analysis (1)
augur/tasks/start_tasks.py (2)
augur/application/db/lib.py (1)
  • get_section (52-77)
augur/tasks/util/collection_util.py (1)
  • CollectionRequest (116-151)
🪛 Ruff (0.14.2)
augur/tasks/start_tasks.py

181-181: CollectionRequest may be undefined, or defined from star imports

(F405)


203-203: CollectionRequest may be undefined, or defined from star imports

(F405)


228-228: CollectionRequest may be undefined, or defined from star imports

(F405)


246-246: CollectionRequest may be undefined, or defined from star imports

(F405)

🔇 Additional comments (2)
augur/tasks/start_tasks.py (2)

31-31: LGTM!

The import of get_section is necessary to fetch configuration values from the database.


159-248: Verification complete: the days_until_collect_again parameter is not used by external callers.

The search confirms that these build functions are only referenced within augur/tasks/start_tasks.py and are never imported elsewhere. All internal calls (lines 268, 271, 276, 279) pass only three arguments and do not use the days_until_collect_again parameter. The parameter can be safely removed from all four function signatures.

@MoralCode MoralCode force-pushed the fix-hardcoded-collection-intervals branch from 6e4db56 to b99a182 Compare October 31, 2025 13:23
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
augur/tasks/git/util/facade_worker/facade_worker/utilitymethods.py (1)

108-122: Critical: Fix string formatting in git command.

There's a critical string formatting issue on line 111. The format string contains "git --git-dir %s log..." but the string substitution uses % (absolute_repo_path) at the end of line 111, which expects a %s placeholder. However, the string only contains %%H (an escaped percent sign for the git format), not %s for the repo path.

This will cause the command to fail or behave incorrectly.

Apply this diff to fix the formatting:

 def get_parent_commits_set(absolute_repo_path):
 	
-	parents = subprocess.Popen(["git --git-dir %s log --ignore-missing "
-								"--pretty=format:'%%H'" % (absolute_repo_path)],
+	parents = subprocess.Popen([f"git --git-dir {absolute_repo_path} log --ignore-missing "
+								"--pretty=format:'%H'"],
 	stdout=subprocess.PIPE, shell=True)

Note: The static analysis tool also flagged the use of shell=True as a security concern. Consider using a list of arguments without shell=True if possible.

augur/application/db/lib.py (1)

226-273: Rewrite timezone validation logic - inverted condition will cause valid timestamps to fail.

The conditional at lines 249-252 is inverted. When a timezone offset is valid (found in postgres_valid_timezones), the code raises the original exception, preventing the record from being inserted. When invalid, it normalizes to UTC. This is backwards.

Current code:

if int(tzdata) not in postgres_valid_timezones:
    tzdata = "+0000"
else:
    raise e

Should be:

if int(tzdata) not in postgres_valid_timezones:
    tzdata = "+0000"

Remove the else: raise e clause entirely. If the timezone is valid, allow the record to proceed with the normalized timestamp. The original exception e indicates a database error—if timezone normalization resolves the issue (e.g., an invalid offset was causing PostgreSQL to reject the insert), the record should be successfully saved.

.github/workflows/build_docker.yml (1)

202-240: Add VERSION build-arg to Podman build steps.

The test-e2e-podman job's build steps (lines 202-240) do not pass the VERSION build-arg, even though the Dockerfiles now require it via ARG VERSION declarations. This will cause the Podman builds to fail or produce images without proper version labels.

The test-e2e-podman job needs the same version extraction step as test-e2e. Add after line 200:

 - name: Checkout repository
   uses: actions/checkout@v4
+
+- name: Extract project version
+  id: version
+  run: |
+    VERSION=$(python -c "exec(open('metadata.py').read()); print(__version__)")
+    echo "version=$VERSION" >> $GITHUB_OUTPUT
+    echo "Using version: $VERSION"

Then update each Podman build step to include build-args. For example:

 - name: Build database container
   uses: redhat-actions/buildah-build@v2
   with:
     context: .
     containerfiles: |
       ./docker/database/Dockerfile
+    build-args: |
+      VERSION=${{ steps.version.outputs.version }}
     platforms: linux/amd64
     tags: ghcr.io/${{ github.repository_owner }}/augur_database:test
     layers: true

Apply similar changes to the keyman, rabbitmq, and backend Podman build steps.

🧹 Nitpick comments (8)
augur/tasks/util/random_key_auth.py (1)

6-6: Remove unused import.

The hashlib import is not used anywhere in this file.

Apply this diff to remove the unused import:

-import hashlib
augur/api/view/routes.py (1)

7-7: Consider replacing star import with explicit imports.

The star import from .utils import * makes it unclear which symbols are imported and prevents static analysis tools from verifying their availability. Based on the code, render_module, renderRepos, and render_message appear to be the primary imports needed.

Apply this diff to use explicit imports:

-from .utils import *
+from .utils import render_module, renderRepos, render_message

Note: This is a pre-existing issue, not introduced by this PR.

augur/tasks/github/releases/core.py (1)

3-3: Consider replacing star import with explicit imports.

The star import makes it less clear where Release and other models are defined. Consider using explicit imports:

-from augur.application.db.models import *
+from augur.application.db.models import Release

This improves code clarity and helps static analysis tools. Based on static analysis hints.

augur/tasks/util/collection_util.py (1)

600-628: Consider more robust platform detection.

The current approach uses substring matching ("github" in repo.repo_git and "gitlab" in repo.repo_git) to determine the platform. This could lead to false positives if a repository URL contains these strings in unexpected ways (e.g., a self-hosted GitLab instance with "github" in its domain name).

Consider using a more robust detection method:

  • Check if Repo has a platform or repo_type field
  • Use URL parsing to extract and compare the hostname
  • Use a regex pattern for more precise matching

Example with URL parsing:

+from urllib.parse import urlparse
+
 repo = get_repo_by_repo_git(repo_git)
 platform_name = "github"
-# this needs to be here and not up a level since it should be set/reset for each repo.
-# otherwise a gitlab repo would reset it and cause subsequent github repos to use gitlab phases.
 phases = None
-if "github" in repo.repo_git:
+
+parsed_url = urlparse(repo.repo_git)
+hostname = parsed_url.hostname or ""
+
+if "github.com" in hostname:
     phases = col_hook.phases
-    # use default platform name
-
-elif "gitlab" in repo.repo_git:
+elif "gitlab.com" in hostname:
     platform_name = "gitlab"
     if col_hook.gitlab_phases is None:
         continue
     phases = col_hook.gitlab_phases
augur/tasks/git/util/facade_worker/facade_worker/config.py (1)

254-264: Consider removing commented-out code.

The get_last_collected_commit_date helper has been commented out. If this code is no longer needed, consider removing it entirely to keep the codebase clean. If it might be needed in the future, document why it's being kept.

augur/tasks/init/celery_app.py (1)

244-248: LGTM! Config-driven scheduling implemented correctly.

The conditional scheduling based on mat_views_interval aligns well with the PR's goal of making collection intervals configurable. The logic correctly schedules the task only when the interval is positive.

Minor style cleanup: Remove the f prefix from the log statements on lines 245 and 248 since they don't contain placeholders.

-            logger.info(f"Scheduling refresh materialized view every night at 1am CDT")
+            logger.info("Scheduling refresh materialized view every night at 1am CDT")
-            logger.info(f"Refresh materialized view task is disabled.")
+            logger.info("Refresh materialized view task is disabled.")
README.md (1)

14-14: Improve link accessibility with descriptive text.

The link text "here" is not descriptive for screen readers and users scanning the document. Consider using more descriptive text such as "the v0.90.3 release" or "version 0.90.3 on GitHub".

Apply this diff to improve accessibility:

-Augur is now releasing a dramatically improved new version. It is also available [here](https://github.com/chaoss/augur/releases/tag/v0.90.3).
+Augur is now releasing a dramatically improved new version. It is also available [on GitHub as v0.90.3](https://github.com/chaoss/augur/releases/tag/v0.90.3).

Based on static analysis (markdownlint).

augur/application/db/util.py (1)

73-79: Enhance boolean parsing logic for robustness.

The current boolean parsing only recognizes lowercase "false" as False, while all other values (including empty strings, "False", "FALSE", "0", "no", "off") are treated as True. This could lead to unexpected configuration behavior.

Consider a more robust implementation:

 elif data_type == "bool":
     value = config_dict["value"]
     
-    if value.lower() == "false":
+    if isinstance(value, bool):
+        config_dict["value"] = value
+    elif str(value).lower() in ("false", "0", "no", "off", ""):
         config_dict["value"] = False
     else:
         config_dict["value"] = True

This handles common false-like strings and prevents empty strings from being treated as True.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5751600 and b99a182.

⛔ Files ignored due to path filters (3)
  • augur/application/schema/repo_load_sample.csv is excluded by !**/*.csv
  • tests/test_workers/test_facade/test_facade_contributor_interface/test_repos.csv is excluded by !**/*.csv
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (50)
  • .github/workflows/build_docker.yml (5 hunks)
  • CITATION.cff (1 hunks)
  • README.md (2 hunks)
  • augur/api/metrics/repo_meta.py (1 hunks)
  • augur/api/routes/__init__.py (0 hunks)
  • augur/api/routes/contributor_reports.py (0 hunks)
  • augur/api/routes/pull_request_reports.py (0 hunks)
  • augur/api/view/init.py (0 hunks)
  • augur/api/view/routes.py (1 hunks)
  • augur/api/view/utils.py (0 hunks)
  • augur/application/cli/github.py (2 hunks)
  • augur/application/config.py (4 hunks)
  • augur/application/db/data_parse.py (4 hunks)
  • augur/application/db/engine.py (1 hunks)
  • augur/application/db/lib.py (4 hunks)
  • augur/application/db/models/augur_data.py (3 hunks)
  • augur/application/db/session.py (3 hunks)
  • augur/application/db/util.py (1 hunks)
  • augur/application/schema/alembic/versions/34_add_contrib_to_config.py (1 hunks)
  • augur/application/util.py (1 hunks)
  • augur/tasks/git/facade_tasks.py (3 hunks)
  • augur/tasks/git/util/facade_worker/facade_worker/config.py (3 hunks)
  • augur/tasks/git/util/facade_worker/facade_worker/repofetch.py (1 hunks)
  • augur/tasks/git/util/facade_worker/facade_worker/utilitymethods.py (1 hunks)
  • augur/tasks/github/facade_github/tasks.py (7 hunks)
  • augur/tasks/github/releases/core.py (3 hunks)
  • augur/tasks/init/celery_app.py (1 hunks)
  • augur/tasks/start_tasks.py (6 hunks)
  • augur/tasks/util/collection_util.py (1 hunks)
  • augur/tasks/util/random_key_auth.py (1 hunks)
  • augur/templates/repo-info.j2 (1 hunks)
  • augur/util/keys.py (1 hunks)
  • docker-compose.yml (1 hunks)
  • docker/backend/Dockerfile (1 hunks)
  • docker/database/Dockerfile (1 hunks)
  • docker/keyman/Dockerfile (1 hunks)
  • docker/rabbitmq/Dockerfile (1 hunks)
  • docker/rabbitmq/definitions.json (1 hunks)
  • docs/source/conf.py (1 hunks)
  • docs/source/development-guide/create-a-metric/api-development.rst (0 hunks)
  • docs/source/docker/docker-compose.rst (1 hunks)
  • docs/source/docker/getting-started.rst (1 hunks)
  • docs/source/docker/quick-start.rst (1 hunks)
  • docs/source/getting-started/collecting-data.rst (1 hunks)
  • docs/source/getting-started/command-line-interface/db.rst (1 hunks)
  • docs/source/getting-started/installation.rst (1 hunks)
  • docs/source/getting-started/using-docker.rst (2 hunks)
  • docs/source/procedures/creating-releases.rst (1 hunks)
  • metadata.py (1 hunks)
  • pyproject.toml (3 hunks)
💤 Files with no reviewable changes (6)
  • augur/api/view/init.py
  • docs/source/development-guide/create-a-metric/api-development.rst
  • augur/api/view/utils.py
  • augur/api/routes/contributor_reports.py
  • augur/api/routes/init.py
  • augur/api/routes/pull_request_reports.py
✅ Files skipped from review due to trivial changes (6)
  • docs/source/getting-started/using-docker.rst
  • docs/source/docker/quick-start.rst
  • docs/source/docker/getting-started.rst
  • CITATION.cff
  • augur/application/util.py
  • augur/application/db/models/augur_data.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/source/getting-started/collecting-data.rst
🧰 Additional context used
🧬 Code graph analysis (11)
augur/tasks/github/facade_github/tasks.py (4)
augur/application/db/__init__.py (1)
  • get_session (29-39)
augur/application/db/util.py (1)
  • execute_session_query (31-43)
augur/tasks/git/util/facade_worker/facade_worker/config.py (1)
  • FacadeHelper (81-253)
augur/application/db/models/augur_operations.py (1)
  • CollectionStatus (1139-1283)
augur/tasks/util/random_key_auth.py (1)
augur/util/keys.py (1)
  • mask_key (1-5)
augur/tasks/github/releases/core.py (2)
augur/application/db/models/augur_data.py (1)
  • Release (1832-1860)
augur/application/db/util.py (1)
  • execute_session_query (31-43)
augur/tasks/git/facade_tasks.py (4)
augur/tasks/git/util/facade_worker/facade_worker/utilitymethods.py (1)
  • get_parent_commits_set (108-122)
augur/tasks/git/dependency_tasks/tasks.py (1)
  • process_dependency_metrics (10-14)
augur/tasks/git/dependency_libyear_tasks/tasks.py (1)
  • process_libyear_dependency_metrics (7-12)
augur/tasks/git/scc_value_tasks/tasks.py (1)
  • process_scc_value_metrics (9-13)
augur/tasks/init/celery_app.py (1)
augur/tasks/db/refresh_materialized_views.py (1)
  • refresh_materialized_views (12-210)
augur/application/schema/alembic/versions/34_add_contrib_to_config.py (2)
augur/application/db/session.py (1)
  • DatabaseSession (45-283)
augur/application/config.py (2)
  • AugurConfig (118-368)
  • add_section_from_json (288-314)
augur/api/view/routes.py (1)
augur/api/view/utils.py (1)
  • render_module (222-224)
augur/tasks/start_tasks.py (2)
augur/application/config.py (2)
  • AugurConfig (118-368)
  • get_value (163-188)
augur/tasks/util/collection_util.py (1)
  • CollectionRequest (116-151)
augur/api/metrics/repo_meta.py (2)
augur/api/util.py (1)
  • register_metric (54-83)
augur/application/db/engine.py (1)
  • engine (138-139)
augur/application/config.py (1)
augur/application/db/util.py (2)
  • execute_session_query (31-43)
  • convert_type_of_value (63-90)
augur/application/db/lib.py (1)
augur/application/db/util.py (2)
  • execute_session_query (31-43)
  • convert_type_of_value (63-90)
🪛 markdownlint-cli2 (0.18.1)
README.md

14-14: Link text should be descriptive

(MD059, descriptive-link-text)

🪛 Ruff (0.14.2)
augur/tasks/github/facade_github/tasks.py

202-202: FacadeHelper may be undefined, or defined from star imports

(F405)


205-205: CollectionStatus may be undefined, or defined from star imports

(F405)


205-205: CollectionStatus may be undefined, or defined from star imports

(F405)

augur/tasks/util/random_key_auth.py

32-32: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

augur/tasks/github/releases/core.py

72-72: Release may be undefined, or defined from star imports

(F405)


72-72: Release may be undefined, or defined from star imports

(F405)

augur/tasks/init/celery_app.py

245-245: f-string without any placeholders

Remove extraneous f prefix

(F541)


248-248: f-string without any placeholders

Remove extraneous f prefix

(F541)

augur/application/schema/alembic/versions/34_add_contrib_to_config.py

10-10: from augur.application.config import * used; unable to detect undefined names

(F403)


25-25: AugurConfig may be undefined, or defined from star imports

(F405)


46-49: f-string without any placeholders

Remove extraneous f prefix

(F541)


52-55: f-string without any placeholders

Remove extraneous f prefix

(F541)


56-56: Do not use bare except

(E722)


56-57: try-except-pass detected, consider logging the exception

(S110)

augur/api/view/routes.py

226-226: render_module may be undefined, or defined from star imports

(F405)

augur/tasks/start_tasks.py

179-179: CollectionRequest may be undefined, or defined from star imports

(F405)


198-198: CollectionRequest may be undefined, or defined from star imports

(F405)


220-220: CollectionRequest may be undefined, or defined from star imports

(F405)


235-235: CollectionRequest may be undefined, or defined from star imports

(F405)


270-270: facade_phase may be undefined, or defined from star imports

(F405)


274-274: machine_learning_phase may be undefined, or defined from star imports

(F405)

augur/tasks/git/util/facade_worker/facade_worker/utilitymethods.py

110-110: subprocess call with shell=True identified, security issue

(S602)

augur/application/db/lib.py

252-252: Use raise without specifying exception name

Remove exception name

(TRY201)

🔇 Additional comments (40)
docs/source/getting-started/command-line-interface/db.rst (1)

81-88: Documentation example now correctly matches the specified format.

The example CSV entries have been reordered to align with the documented format specification (<git_repo_url>,<repo_group_id>), improving consistency between the specification and the worked example. The URLs and repo IDs are syntactically correct.

Minor note: Line 66 contains a trailing comma after <repo_group_id>, which does not appear in the actual examples—consider removing it for clarity if it's unintentional.

augur/util/keys.py (1)

1-5: LGTM! Solid security utility implementation.

The masking logic correctly handles edge cases by returning all stars when the key is too short or not a string, preventing inadvertent exposure of short secrets. The default parameters are sensible for typical API keys.

augur/tasks/util/random_key_auth.py (3)

27-30: Excellent security improvement!

The validation for empty keys with explicit error logging prevents silent failures and improves debuggability.


32-32: Static analysis warning is a false positive.

The S311 warning about choice() not being suitable for cryptographic purposes can be safely ignored here. You're selecting from a pool of existing keys for load balancing/rotation, not generating cryptographic material. The security of the keys themselves doesn't depend on the randomness of the selection algorithm.


34-36: Significant security improvement by masking keys in logs.

Logging the masked key instead of plaintext prevents accidental exposure of sensitive credentials in log files, which is a critical security best practice.

augur/api/metrics/repo_meta.py (2)

1244-1298: Change appears unrelated to PR objectives.

The PR summary states the focus is on "fixing hardcoded collection intervals" in configuration and task collection code (config.py, start_tasks.py, collection_util.py). However, this change introduces a new metric endpoint for repository clone data, which doesn't appear related to collection interval configuration.

Can you clarify how this new clones metric relates to the collection interval fixes mentioned in the PR description?


1244-1298: Database schema verified—no issues found.

The table augur_data.repo_clones_data and all referenced columns (clone_data_timestamp, count_clones, unique_clones) exist across the schema definitions. The implementation follows established patterns, uses parameterized queries correctly, and integrates properly with the existing codebase including the data collection task and ORM model.

augur/api/view/routes.py (1)

226-226: LGTM! Correctly removes server-side reports parameter.

The change aligns with the PR's goal of moving from server-side report rendering to client-side dynamic loading. The template no longer expects reports or images to be passed from the server.

augur/tasks/github/releases/core.py (2)

26-26: LGTM! Consistent release_id normalization.

The release_id is now properly normalized by converting to string and stripping whitespace in both the non-tag-only and tag-only branches. This ensures consistency for duplicate checking downstream.

Also applies to: 54-54


79-84: Good duplicate prevention logic, but depends on fixing line 74.

The duplicate check is well-implemented with consistent normalization and efficient set membership testing. The early return avoids unnecessary database operations and provides clear logging.

However, this logic will only work correctly once the Row unpacking issue on line 74 is fixed.

augur/application/cli/github.py (1)

59-59: LGTM: Improved time formatting with timezone awareness.

The changes enhance the CLI output by centering the reset time headers and adding timezone information to the displayed timestamps. The use of astimezone() and the %Z (UTC%z) format string provides clearer context for users.

Also applies to: 91-94

augur/tasks/git/facade_tasks.py (2)

124-124: LGTM: Simplified commit collection without date filtering.

The removal of start_date from get_parent_commits_set calls aligns with the broader simplification of the commit history collection process. This change is consistent with updates to augur/tasks/git/util/facade_worker/facade_worker/utilitymethods.py.

Also applies to: 247-247


500-508: LGTM: Simplified facade sequence construction.

The change from building a list and returning chain(*facade_sequence) to directly returning a group() object simplifies the task orchestration pattern.

augur/tasks/git/util/facade_worker/facade_worker/config.py (2)

130-130: LGTM: New configuration option added.

The facade_contributor_full_recollect configuration option is properly initialized from the worker_options.


170-177: LGTM: Improved log routing by level.

The refactored log_activity method now properly routes log messages based on their level:

  • Error → logger.error
  • Debug/Verbose → logger.debug
  • Others → logger.info

This provides better log filtering and organization.

augur/application/db/session.py (2)

96-109: LGTM: Improved parameter naming and input normalization.

Renaming data to data_input improves clarity, and the explicit else branch for normalizing input to a list is more robust.


171-212: The review comment is incorrect—the code already returns None, not the string "success".

The function signature shows the return type as Optional[List[dict]], and the actual code at line 213 returns None (the # success is merely a comment, not the return value). All callers in the codebase already check for None or use the returned list of dicts when return_columns is provided. There is no evidence of a "success" string being the previous return value, and no callers depend on such a value.

Likely an incorrect or invalid review comment.

augur/application/db/lib.py (3)

16-16: LGTM: Proper separation of concerns.

Moving convert_type_of_value import to augur.application.db.util improves code organization.


275-282: LGTM: Added configurability for batch size.

The batch_size parameter (defaulting to 1000) provides flexibility for tuning batch insert performance.


286-299: LGTM: Consistent parameter naming and return semantics.

The rename from data to data_input is consistent with changes in augur/application/db/session.py, and the explicit None return in the no-return-columns path clarifies the function's behavior.

Also applies to: 401-403

augur/application/db/engine.py (1)

13-29: No action required—all callers have been properly updated.

The search confirms that all four call sites of parse_database_string() in the codebase properly unpack the tuple return value using destructuring assignment. The function signature change is not breaking because callers have already been updated to handle the five-element tuple (username, password, host, port, database).

docker-compose.yml (1)

76-76: LGTM! Config directory now configurable.

The addition of CONFIG_DATADIR enables runtime configuration loading from augur.json, supporting the PR's goal of making collection intervals configurable.

docker/keyman/Dockerfile (1)

4-6: LGTM! Dynamic versioning adopted.

The switch from hardcoded version labels to build-time VERSION arguments is a good practice that enables consistent versioning across all Docker images via the CI workflow.

docker/backend/Dockerfile (1)

23-25: LGTM! Consistent with dynamic versioning pattern.

This change mirrors the keyman Dockerfile, ensuring all service images use the same build-time versioning approach.

augur/tasks/git/util/facade_worker/facade_worker/repofetch.py (1)

438-438: LGTM! More precise git reset command.

Changing the reset target from origin to origin/{remotedefault} makes the command more explicit and correct by targeting the detected default branch rather than relying on the symbolic origin ref.

docs/source/conf.py (1)

26-30: LGTM! Cleaner metadata import approach.

Replacing exec() with explicit imports is more maintainable and follows Python best practices. This change properly integrates with the new metadata.py versioning.

docs/source/docker/docker-compose.rst (1)

29-29: LGTM! Documentation clarified.

The updated wording better explains that all environment variables must be present in the .env file, with placeholder values acceptable for unused variables.

docker/rabbitmq/definitions.json (1)

2-2: Verify RabbitMQ 4.1 compatibility with application dependencies and runtime behavior.

The version bump from 3.12 to 4.1 introduces several breaking changes, including removal of classic queue replication, AMQP 1.0 as a core protocol, AMQP 0-9-1 frame_max increase from 4096 to 8192, and Erlang/OTP compatibility changes.

The definitions.json file itself contains no queue configurations or classic mirrored queue definitions, so it is compatible with the upgrade. However, verification is needed on:

  • AMQP client libraries: Confirm that any AMQP clients used by the Augur application support the increased frame_max of 8192 bytes or are configured with frame_max ≥ 8192.
  • Erlang/OTP version: Verify that the environment provides an Erlang/OTP version compatible with RabbitMQ 4.1.
  • Runtime queue behavior: If queues are created or configured at runtime by the application code, ensure they don't rely on classic queue mirroring or other removed features.
docker/database/Dockerfile (1)

5-7: LGTM: Dynamic versioning implementation.

The introduction of ARG VERSION and dynamic LABEL version=${VERSION} follows Docker best practices for build-time version injection. This change aligns with the workflow updates that extract the version from metadata.py and pass it as a build argument.

README.md (1)

1-1: LGTM: Version bump aligns with metadata.py.

The version header update to v0.90.3 is consistent with the changes in metadata.py and reflects the centralized version management approach.

docker/rabbitmq/Dockerfile (2)

5-6: LGTM: Dynamic versioning implementation.

The addition of ARG VERSION and LABEL version=${VERSION} follows the same pattern as other Dockerfiles in this PR, enabling consistent build-time version injection.


1-1: RabbitMQ 4.1 compatibility verified — all configuration and code dependencies are compatible.

Verification results:

  • Celery 5.5 (via kombu/py-amqp) is compatible with RabbitMQ 4.1, as Python AMQP clients default to the server-recommended frame_max and negotiate compatible values
  • Config files (definitions.json, advanced.config, augur.conf) contain no unsupported patterns for 4.1
  • No classic queue mirroring, etcd clustering, or incompatible plugins detected
  • update_config.py script uses standard rabbitmqctl and is compatible

Note: If upgrading from RabbitMQ 3.12, follow the staged upgrade path (3.12 → 3.13 → 4.1) or use blue/green migration, as direct 3.12 → 4.1 upgrades are not supported.

metadata.py (1)

8-9: LGTM: Version constants updated correctly.

The version bump to 0.90.3 is consistent across all related files and serves as the single source of truth for version information, as documented in the release process documentation.

augur/application/db/util.py (1)

63-90: Function correctly implements type coercion with error handling.

The function provides appropriate type conversion for common config value types (str, int, bool, float) and includes error handling for unsupported types via logging or print statements.

.github/workflows/build_docker.yml (2)

66-71: LGTM: Version extraction centralizes version management.

The version extraction from metadata.py and propagation as a GitHub Actions output enables consistent versioning across all Docker images, aligning with the centralized version management approach documented in the release process.


58-62: Cleanup step improves CI runner disk space management.

The addition of cleanup to remove unnecessary files (/usr/share/dotnet, AGENT_TOOLSDIRECTORY) helps prevent disk space issues during Docker builds on GitHub-hosted runners.

pyproject.toml (3)

28-29: LGTM: Compatible release constraints improve dependency flexibility.

Updating celery from ==5.2.7 to ~=5.5 and click from ==8.0.3 to ~=8.1 allows patch-level updates while preventing breaking changes, following Python packaging best practices.


102-107: LGTM: Type checking infrastructure added to dev dependencies.

The addition of mypy and type stubs (types-requests, types-pyyaml, types-python-dateutil, types-toml) enables static type checking for improved code quality and developer experience.


158-163: LGTM: Mypy configuration provides sensible starting point.

The mypy configuration starts with a focused scope (augur/application/db/*.py) and permissive settings, which is a good approach for gradual type checking adoption. The configuration can be tightened over time as the codebase adds more type annotations.

docs/source/procedures/creating-releases.rst (1)

5-110: LGTM: Comprehensive documentation of release process.

The documentation effectively describes:

  • The main/release branch workflow and their purposes
  • The new centralized version management approach using metadata.py
  • How the CI/CD pipeline automatically propagates versions to Docker builds
  • Special cases like hotfixes

This aligns well with the technical changes in the PR and provides clear guidance for future releases.

Comment on lines +205 to 208
query = session.query(CollectionStatus).filter(CollectionStatus.repo_id == repo.repo_id)
collection_status = execute_session_query(query,'one')
last_collected_date = collection_status.facade_data_last_collected if not facade_helper.facade_contributor_full_recollect else None

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix missing CollectionStatus import.

CollectionStatus is referenced here without being imported, so this code will raise a NameError the first time the task runs. Please import it alongside Contributor.

-from augur.application.db.models import Contributor
+from augur.application.db.models import CollectionStatus, Contributor
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
query = session.query(CollectionStatus).filter(CollectionStatus.repo_id == repo.repo_id)
collection_status = execute_session_query(query,'one')
last_collected_date = collection_status.facade_data_last_collected if not facade_helper.facade_contributor_full_recollect else None
from augur.application.db.models import CollectionStatus, Contributor
🧰 Tools
🪛 Ruff (0.14.2)

205-205: CollectionStatus may be undefined, or defined from star imports

(F405)


205-205: CollectionStatus may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents
In augur/tasks/github/facade_github/tasks.py around lines 205 to 208,
CollectionStatus is used but not imported which will raise a NameError; add
CollectionStatus to the import list where Contributor is imported (i.e., import
CollectionStatus alongside Contributor from the same module/package) so the
symbol is available for the query. Ensure the import follows existing import
style and is placed with other model imports at the top of the file.

⚠️ Potential issue | 🟠 Major

Handle missing collection_status rows safely.

execute_session_query(query, 'one') raises when no row exists (common for freshly added repos before collection_status is populated), which will crash the task. Please fall back gracefully instead of assuming the record is present.

-        query = session.query(CollectionStatus).filter(CollectionStatus.repo_id == repo.repo_id)
-        collection_status = execute_session_query(query,'one')
-        last_collected_date = collection_status.facade_data_last_collected if not facade_helper.facade_contributor_full_recollect else None
+        query = session.query(CollectionStatus).filter(CollectionStatus.repo_id == repo.repo_id)
+        collection_status = execute_session_query(query, 'first')
+        last_collected_date = None
+        if collection_status and not facade_helper.facade_contributor_full_recollect:
+            last_collected_date = collection_status.facade_data_last_collected
🧰 Tools
🪛 Ruff (0.14.2)

205-205: CollectionStatus may be undefined, or defined from star imports

(F405)


205-205: CollectionStatus may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents
In augur/tasks/github/facade_github/tasks.py around lines 205 to 208, the code
assumes execute_session_query(query,'one') always returns a CollectionStatus and
will raise if none exists; update this to handle missing rows safely by either
using a query variant that returns None (e.g.,
execute_session_query(query,'one_or_none') or session.query(...).one_or_none())
or catching the exception returned by execute_session_query and treating a
missing row as None, then set last_collected_date = None when collection_status
is None (or when facade_helper.facade_contributor_full_recollect is True) so the
task continues without crashing for newly added repos.

release_id_data = execute_session_query(query, 'all')#pd.read_sql(release_id_data_sql, self.db, params={'repo_id': repo_id})
release_id_data = [str(r_id).strip() for r_id in release_id_data]#release_id_data.apply(lambda x: x.str.strip())
release_id_data = execute_session_query(query, 'all')
existing_release_ids = {str(r_id).strip() for r_id in release_id_data}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical bug: The past review comment is still unaddressed.

The past reviewer correctly identified that line 74 has a Row unpacking issue. SQLAlchemy's query.all() returns Row objects, and for single-column queries, each Row must be indexed to extract the value.

Current code:

existing_release_ids = {str(r_id).strip() for r_id in release_id_data}

This produces strings like "('some_id',)" instead of "some_id", which breaks the duplicate check on line 82.

Apply the fix suggested in the past review:

-    existing_release_ids = {str(r_id).strip() for r_id in release_id_data}
+    existing_release_ids = {str(r_id[0]).strip() for r_id in release_id_data}
🤖 Prompt for AI Agents
In augur/tasks/github/releases/core.py around line 74, the set comprehension
treats SQLAlchemy Row objects as strings, producing values like "('some_id',)";
update the comprehension to extract the first column from each Row (e.g., use
r_id[0]) and skip or filter out None values before converting to string and
stripping so existing_release_ids contains the actual ID strings (e.g., use
str(r_id[0]).strip() for each non-None row).

**Python 3.9 is not yet supported because TensorFlow, which we use in our machine learning workers, does not yet support Python 3.9.**

Our REST API & data collection workers write in Python 3.6. We query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository, so GitLab and GitHub access tokens are **required** for data collection.
Our REST API & data collection workers query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository. Values for GitLab and GitHub access tokens are **required** for data collection and must be provided (an invalid token can be provided if you don't plan to use one platform) .
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Clarify the token requirement and fix punctuation.

The statement "must be provided (an invalid token can be provided if...)" is somewhat contradictory and could mislead users. Additionally, there's an extra space before the period at the end of the line.

Consider restructuring this sentence for clarity and to explicitly explain what "invalid token" means in context.

Apply this diff to improve clarity and fix punctuation:

-Our REST API & data collection workers query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository. Values for GitLab and GitHub access tokens are **required** for data collection and must be provided (an invalid token can be provided if you don't plan to use one platform) .
+Our REST API & data collection workers query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository. GitHub and GitLab access tokens are **required** for data collection. If you don't plan to use one platform, you may provide a placeholder or invalid token for that platform.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Our REST API & data collection workers query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository. Values for GitLab and GitHub access tokens are **required** for data collection and must be provided (an invalid token can be provided if you don't plan to use one platform) .
Our REST API & data collection workers query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository. GitHub and GitLab access tokens are **required** for data collection. If you don't plan to use one platform, you may provide a placeholder or invalid token for that platform.
🤖 Prompt for AI Agents
In docs/source/getting-started/installation.rst around line 34, clarify and
rephrase the sentence about GitHub/GitLab tokens and remove the stray space
before the period: state that values for GitHub and GitLab access tokens are
required fields for data collection, but if you do not intend to collect data
from one provider you may supply a placeholder or deliberately invalid token
(e.g., "none" or "placeholder") so the configuration remains valid; reword the
sentence to avoid contradiction and fix punctuation so there is no extra space
before the period.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Collection intervals hard coded in multiple spots