Fix hardcoded collection intervals in multiple spots #1

mohsinm-dev · 2025-10-25T12:52:26Z

Summary

Resolves hardcoded collection intervals by moving them to configuration, making them easily configurable without code changes.

Changes Made

Added collection interval settings to the Tasks configuration section
Created get_collection_intervals_from_config() helper function with fallback support
Removed unused days_until_collect_again parameters from build functions
Updated all CollectionRequest calls to use config-driven values instead of hardcoded ones

Technical Details

Files Modified: config.py, start_tasks.py, collection_util.py
Backward Compatible: Function signature changes don't break external calls
Robust Fallback: System works even if config section is missing
Follows Conventions: Uses existing Augur config patterns

Testing

Syntax validation passed for all modified files
Configuration structure verified
Function signatures updated correctly
No regressions detected in basic functionality

Before/After Example

# Before: Hardcoded value ignored parameter
def build_secondary_repo_collect_request(session, logger, enabled_phase_names, days_until_collect_again = 1):
    request = CollectionRequest("secondary", phases, max_repo=60, days_until_collect_again=10)

# After: Config-driven value
def build_secondary_repo_collect_request(session, logger, enabled_phase_names):
    intervals = get_collection_intervals_from_config()
    request = CollectionRequest("secondary", phases, max_repo=60, days_until_collect_again=intervals['secondary'])

Fixes chaoss#3338

Summary by CodeRabbit

New Features
- Collection intervals for core, secondary, facade, and ML tasks are now configurable via app config (with sensible defaults).
- New "clones" metric endpoint to retrieve daily clone totals and uniques.
Removed Features
- Built-in contributor and pull-request report pages/images removed from server-side rendering and package exports.
Documentation
- Added/updated docs on adjusting per-task re-run intervals and Docker/.env guidance.
Other
- API key display now masks keys and duplicate releases are skipped during insert.

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

Fix for GitHub Issue chaoss#3194 where releases collection was failing with psycopg2.errors.UniqueViolation on releases_pkey constraint. Root cause: GitHub API returns release IDs with trailing spaces that don't match existing trimmed database records. Changes: - Add str().strip() to release_id processing in get_release_inf() - Enhance duplicate detection in insert_release() with proper trimming - Add early duplicate detection to prevent unnecessary database operations Signed-off-by: mohsinm-dev <mohsin.mdev@gmail.com>

deduplicate convert_type_of_value

Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>

Signed-off-by: Ulincsys <ulincsys@gmail.com>

The keys on this dictionary are defined in: analyzecommit.generate_commit_record() - Update reference to use proper 'cmt_author_timestamp' key - Add warning log when replacing TZdata to show commit hash Signed-off-by: Ulincsys <ulincsys@gmail.com>

Facade timezone fix redux

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>

Add dockerfile for performing an empty database build

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

…roperly convert things into a chord Signed-off-by: Adrian Edwards <adredwar@redhat.com>

Signed-off-by: Sajal-Kulshreshtha <sajalkulshreshtha9@gmail.com>

facade: refactor facade_phase to return the group itself so that celery can properly convert things into a chord

fix(cli): display timezone in output for GitHub key expiry (chaoss#3251)

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

…te-code scheduling: refactor send_messages to de-duplicate code path

Signed-off-by: Sean P. Goggins <s@goggins.com>

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

Signed-off-by: Sean P. Goggins <s@goggins.com>

…-duplicate-code" This reverts commit c8f22bb, reversing changes made to fefb48a.

main: bump celery and a related dependency - attempt to fix secondary+core

commented out abused API endpoint

Signed-off-by: PredictiveManish <manisht0914@gmail.com>

coderabbitai · 2025-10-25T12:52:50Z

Walkthrough

Added configurable per-task collection intervals and runtime config loading from augur.json; removed legacy contributor and pull request report routes and related view/report utilities; added a repo clones metric endpoint; moved/added config type conversion utility; introduced an Alembic migration and multiple Docker/versioning and documentation updates.

Changes

Cohort / File(s)	Summary
Configuration core `augur/application/config.py`, `augur/application/db/util.py`, `docker-compose.yml`	Config now supports runtime loading from `augur.json` (lookup via `CONFIG_DATADIR`), added `facade_contributor_full_recollect` and four Task interval keys, and moved/added `convert_type_of_value` into `augur.application.db.util`. `CONFIG_DATADIR` env var added to docker-compose.
Task scheduling / intervals `augur/tasks/start_tasks.py`, `augur/tasks/util/collection_util.py`, `augur/tasks/init/celery_app.py`	`start_tasks` reads per-task interval values from `AugurConfig` and passes them into `build_*_repo_collect_request`; collection dispatch unified for GitHub/GitLab in `collection_util`; materialized view scheduling now respects `mat_views_interval` > 0.
Facade/gitrepo worker changes `augur/tasks/git/facade_tasks.py`, `augur/tasks/git/util/facade_worker/.../config.py`, `.../repofetch.py`, `.../utilitymethods.py`, `augur/tasks/github/facade_github/tasks.py`	Facade worker gains `facade_contributor_full_recollect` flag; parent-commit collection removed start_date filtering; git reset now targets `origin/{remotedefault}`; facade contributor collection now filters by last_collected_date (via DB session) when applicable.
API removals / view simplification `augur/api/routes/contributor_reports.py` (removed), `augur/api/routes/pull_request_reports.py` (removed), `augur/api/routes/__init__.py`, `augur/api/view/init.py`, `augur/api/view/routes.py`, `augur/api/view/utils.py`, `augur/templates/repo-info.j2`	Deleted contributor and pull-request report route modules; removed in-memory `reports` structure and report-loading/requesting utilities; repo view/template markup no longer renders server-side report images or validates reports.
New metric endpoint `augur/api/metrics/repo_meta.py`	Added `clones(repo_group_id, repo_id=None, begin_date=None, end_date=None)` endpoint querying `augur_data.repo_clones_data` with optional date bounds.
Database / DB API changes `augur/application/db/lib.py`, `augur/application/db/session.py`, `augur/application/db/engine.py`, `augur/application/schema/alembic/versions/34_add_contrib_to_config.py`	Bulk insert functions renamed/parameter-updated and return semantics aligned; `parse_database_string` now annotated to return a 5-tuple; new Alembic migration adds `facade_contributor_full_recollect` config key.
Utilities / keys / auth `augur/util/keys.py`, `augur/tasks/util/random_key_auth.py`	Added `mask_key` utility and used it in random-key auth flow; auth flow now validates empty key lists and masks logged keys.
Metric/release/other logic tweaks `augur/tasks/github/releases/core.py`, `augur/application/db/models/augur_data.py`, `augur/application/db/data_parse.py`	Release_id normalization and duplicate-skip guard; minor model logging/typo fixes; return type annotations tightened for several data-parse helpers.
Docker & CI versioning `docker/*/Dockerfile`, `.github/workflows/build_docker.yml`, `docker/rabbitmq/definitions.json`	Dockerfiles switched to ARG VERSION + LABEL version=${VERSION}; rabbitmq base bumped to 4.1-management-alpine and definitions rabbit_version to "4.1"; CI workflow extracts version and passes it as build-arg.
Docs and metadata `docs/source/**`, `README.md`, `metadata.py`, `CITATION.cff`, `pyproject.toml`	Multiple documentation edits (getting-started, docker, release process, API dev guide), version bump to 0.90.3 in metadata and README, dependency and mypy config updates, and a minor CITATION date change.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Monitor as augur_collection_monitor
  participant Config as AugurConfig
  participant Builders as CollectionRequestBuilders

  Note over Config: Tasks section contains\ncore, secondary, facade, ml\ncollection interval keys
  Monitor->>Config: get_section("Tasks")
  Config-->>Monitor: tasks_config (interval values)
  Monitor->>Builders: build_primary_repo_collect_request(days_until_collect_again=core_collection_interval_days)
  Monitor->>Builders: build_secondary_repo_collect_request(days_until_collect_again=secondary_collection_interval_days)
  Monitor->>Builders: build_facade_repo_collect_request(days_until_collect_again=facade_collection_interval_days)
  Monitor->>Builders: build_ml_repo_collect_request(days_until_collect_again=ml_collection_interval_days)
  Note right of Builders: Builders receive configurable\nper-task intervals instead of hard-coded literals

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Pay special attention to the large removals of report modules and view/report utilities to ensure no remaining imports or references cause runtime errors.
Verify config loading precedence (env CONFIG_DATADIR → augur.json) and type coercion behavior via convert_type_of_value.
Review facade worker changes for unintended behavior from removing start_date filters and the string-format bug noted in utilitymethods.
Confirm DB insert API signature changes are propagated to all call sites.
Validate Docker/CI changes correctly pass VERSION build-arg in all build steps.

Poem

🐰 I nibble configs, numbers in a row,

Core and facade and ML now know,
I masked a key, I pruned old views,
Bumped versions, patched the news,
Hoppity-hop — the codebase grows.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 31.48% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The PR title "Fix hardcoded collection intervals in multiple spots" directly corresponds to a significant and deliberate change documented in the PR objectives, which explicitly states the primary goal is to "replace hardcoded collection interval values with configuration-driven settings." The title accurately identifies this core feature by referring to the real and substantive modification of collection interval handling across the codebase. While the PR includes numerous other changes (Docker updates, API removals, documentation, database migrations, type fixes, etc.), the title effectively captures the main feature that the PR author intended to address, consistent with the principle that titles need not be comprehensive of every detail in the changeset.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix-hardcoded-collection-intervals

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7548d2e and f829663.

📒 Files selected for processing (4)

augur/application/config.py (1 hunks)
augur/tasks/github/releases/core.py (3 hunks)
augur/tasks/start_tasks.py (5 hunks)
augur/tasks/util/collection_util.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (3)

augur/tasks/util/collection_util.py (2)

augur/application/config.py (1)

get_section (159-182)

augur/application/db/lib.py (1)

get_section (52-77)

augur/tasks/start_tasks.py (1)

augur/tasks/util/collection_util.py (3)

get_collection_intervals_from_config (216-233)

CollectionRequest (116-151)

get_valid_repos (132-151)

augur/tasks/github/releases/core.py (1)

augur/application/db/models/augur_data.py (1)

Release (1832-1860)

🪛 Ruff (0.14.1)

augur/tasks/util/collection_util.py

226-226: Do not catch blind exception: Exception

(BLE001)

augur/tasks/start_tasks.py

159-159: Unused function argument: logger

(ARG001)

179-179: get_collection_intervals_from_config may be undefined, or defined from star imports

(F405)

180-180: CollectionRequest may be undefined, or defined from star imports

(F405)

184-184: Unused function argument: logger

(ARG001)

199-199: get_collection_intervals_from_config may be undefined, or defined from star imports

(F405)

200-200: CollectionRequest may be undefined, or defined from star imports

(F405)

206-206: Unused function argument: logger

(ARG001)

206-206: Unused function argument: enabled_phase_names

(ARG001)

223-223: get_collection_intervals_from_config may be undefined, or defined from star imports

(F405)

224-224: CollectionRequest may be undefined, or defined from star imports

(F405)

229-229: Unused function argument: logger

(ARG001)

229-229: Unused function argument: enabled_phase_names

(ARG001)

240-240: get_collection_intervals_from_config may be undefined, or defined from star imports

(F405)

241-241: CollectionRequest may be undefined, or defined from star imports

(F405)

augur/tasks/github/releases/core.py

72-72: Release may be undefined, or defined from star imports

(F405)

72-72: Release may be undefined, or defined from star imports

(F405)

🔇 Additional comments (8)

augur/application/config.py (1)

81-85: LGTM! Configuration keys properly defined.

The new collection interval keys are well-structured and align with the values used in the fallback logic of get_collection_intervals_from_config(). The existing collection_interval at line 81 remains for backward compatibility.

augur/tasks/github/releases/core.py (2)

26-26: LGTM! Release ID normalization is appropriate.

Converting release_id to string and stripping whitespace ensures consistent formatting for both regular releases and tag-only releases.

Also applies to: 54-54

81-84: Duplicate suppression logic is sound.

The early return prevents inserting duplicate releases, assuming the existing_release_ids set is populated correctly (contingent on the verification above).

augur/tasks/util/collection_util.py (1)

216-233: LGTM! Robust config retrieval with appropriate fallback.

The function properly retrieves collection intervals from configuration with per-key defaults and a fallback mechanism. The broad exception catch on line 226 is appropriate here to ensure graceful degradation if the config section is unavailable.

augur/tasks/start_tasks.py (4)

159-182: LGTM! Config-driven intervals properly integrated.

The function signature update and interval retrieval from config are implemented correctly. The intervals['core'] value is appropriately passed to the CollectionRequest constructor.

184-203: LGTM! Secondary collection intervals properly configured.

Consistent implementation with the primary collection request. The intervals['secondary'] value is correctly applied.

206-227: LGTM! Facade collection intervals properly configured.

The facade collection request correctly uses intervals['facade'] from the configuration.

229-243: LGTM! ML collection intervals properly configured.

The machine learning collection request correctly uses intervals['ml'] from the configuration, completing the consistent refactoring across all collection types.

augur/tasks/github/releases/core.py

Signed-off-by: PredictiveManish <manisht0914@gmail.com>

MoralCode · 2025-10-27T18:53:33Z

Just as a heads up: you filed this Pull request against your fork of the augur repository, not the actual upstream augur repository (notice how this is pull request number 1 when the issue it is solving is chaoss#3000-ish?).

when making your pull request, make sure the "base branch" (usually on the left in the selection menu) is set to the chaoss/augur repository's main branch, not the main branch in your fork (which may just appear as main on its own)

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

…format Fixing the API key Visibility issue in debug mode

Co-authored-by: Adrian Edwards <17362949+MoralCode@users.noreply.github.com> Signed-off-by: Manish Tiwari <manish.tiwari.09@zohomail.in>

Signed-off-by: Sean P. Goggins <s@goggins.com>

testing docker config for empty database

Fix: python3.9 compatibility message in docs chaoss#3266

Alternative mypy setup

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

augur/tasks/start_tasks.py (4)
159-159: Remove the unused days_until_collect_again parameter.

The PR description states that "unused days_until_collect_again parameters from build functions" were removed, but this parameter is still present in the function signature. Since the collection interval is now sourced from configuration (lines 178-181), this parameter is no longer used and should be removed to avoid confusion.

Apply this diff:
-def build_primary_repo_collect_request(session, logger, enabled_phase_names, days_until_collect_again = 15):
+def build_primary_repo_collect_request(session, logger, enabled_phase_names):
185-185: Remove the unused days_until_collect_again parameter.

Similar to the primary collection function, this parameter is no longer used since the interval is now config-driven (lines 200-203). Additionally, note that the parameter's default value (1) doesn't match the config fallback default (10), which could have caused confusion.

Apply this diff:
-def build_secondary_repo_collect_request(session, logger, enabled_phase_names, days_until_collect_again = 1):
+def build_secondary_repo_collect_request(session, logger, enabled_phase_names):
209-209: Remove the unused days_until_collect_again parameter.

This parameter is unused since the interval is now retrieved from configuration (lines 225-228).

Apply this diff:
-def build_facade_repo_collect_request(session, logger, enabled_phase_names, days_until_collect_again = 10):
+def build_facade_repo_collect_request(session, logger, enabled_phase_names):
233-233: Remove the unused days_until_collect_again parameter.

This parameter is unused since the interval is now retrieved from configuration (lines 243-246).

Apply this diff:
-def build_ml_repo_collect_request(session, logger, enabled_phase_names, days_until_collect_again = 40):
+def build_ml_repo_collect_request(session, logger, enabled_phase_names):

🧹 Nitpick comments (1)

augur/tasks/start_tasks.py (1)
178-181: Consider centralizing config retrieval to reduce duplication.

This pattern of fetching the Tasks configuration section is repeated in all four build functions (lines 178-181, 200-203, 225-228, 243-246). Consider retrieving the Tasks config once in the augur_collection_monitor function and passing the relevant intervals to each build function, or creating a helper function to encapsulate this logic.

Example approach - fetch config once in the calling function:
# In augur_collection_monitor function
tasks_config = get_section("Tasks")

# Then pass to build functions or extract a helper:
def get_collection_interval(interval_name, default):
    tasks_config = get_section("Tasks")
    return tasks_config.get(interval_name, default)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f829663 and 8201ed8.

📒 Files selected for processing (2)

augur/application/config.py (1 hunks)
augur/tasks/start_tasks.py (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

augur/application/config.py

🧰 Additional context used

🧬 Code graph analysis (1)

augur/tasks/start_tasks.py (2)

augur/application/db/lib.py (1)

get_section (52-77)

augur/tasks/util/collection_util.py (1)

CollectionRequest (116-151)

🪛 Ruff (0.14.2)

augur/tasks/start_tasks.py

181-181: CollectionRequest may be undefined, or defined from star imports

(F405)

203-203: CollectionRequest may be undefined, or defined from star imports

(F405)

228-228: CollectionRequest may be undefined, or defined from star imports

(F405)

246-246: CollectionRequest may be undefined, or defined from star imports

(F405)

🔇 Additional comments (2)

augur/tasks/start_tasks.py (2)

31-31: LGTM!

The import of get_section is necessary to fetch configuration values from the database.

159-248: Verification complete: the days_until_collect_again parameter is not used by external callers.

The search confirms that these build functions are only referenced within augur/tasks/start_tasks.py and are never imported elsewhere. All internal calls (lines 268, 271, 276, 279) pass only three arguments and do not use the days_until_collect_again parameter. The parameter can be safely removed from all four function signatures.

Fix variables issue in chaoss#3183

remove the "empty database" docker build step to avoid blocking release Adrian and I are merging this with only "one" approval as its been discussed in maintainer meetings.

…arameter

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

augur/tasks/git/util/facade_worker/facade_worker/utilitymethods.py (1)
108-122: Critical: Fix string formatting in git command.

There's a critical string formatting issue on line 111. The format string contains "git --git-dir %s log..." but the string substitution uses % (absolute_repo_path) at the end of line 111, which expects a %s placeholder. However, the string only contains %%H (an escaped percent sign for the git format), not %s for the repo path.

This will cause the command to fail or behave incorrectly.

Apply this diff to fix the formatting:
 def get_parent_commits_set(absolute_repo_path):
 	
-	parents = subprocess.Popen(["git --git-dir %s log --ignore-missing "
-								"--pretty=format:'%%H'" % (absolute_repo_path)],
+	parents = subprocess.Popen([f"git --git-dir {absolute_repo_path} log --ignore-missing "
+								"--pretty=format:'%H'"],
 	stdout=subprocess.PIPE, shell=True)
Note: The static analysis tool also flagged the use of shell=True as a security concern. Consider using a list of arguments without shell=True if possible.
augur/application/db/lib.py (1)
226-273: Rewrite timezone validation logic - inverted condition will cause valid timestamps to fail.

The conditional at lines 249-252 is inverted. When a timezone offset is valid (found in postgres_valid_timezones), the code raises the original exception, preventing the record from being inserted. When invalid, it normalizes to UTC. This is backwards.

Current code:
if int(tzdata) not in postgres_valid_timezones:
    tzdata = "+0000"
else:
    raise e
Should be:
if int(tzdata) not in postgres_valid_timezones:
    tzdata = "+0000"
Remove the else: raise e clause entirely. If the timezone is valid, allow the record to proceed with the normalized timestamp. The original exception e indicates a database error—if timezone normalization resolves the issue (e.g., an invalid offset was causing PostgreSQL to reject the insert), the record should be successfully saved.
.github/workflows/build_docker.yml (1)
202-240: Add VERSION build-arg to Podman build steps.

The test-e2e-podman job's build steps (lines 202-240) do not pass the VERSION build-arg, even though the Dockerfiles now require it via ARG VERSION declarations. This will cause the Podman builds to fail or produce images without proper version labels.

The test-e2e-podman job needs the same version extraction step as test-e2e. Add after line 200:
 - name: Checkout repository
   uses: actions/checkout@v4
+
+- name: Extract project version
+  id: version
+  run: |
+    VERSION=$(python -c "exec(open('metadata.py').read()); print(__version__)")
+    echo "version=$VERSION" >> $GITHUB_OUTPUT
+    echo "Using version: $VERSION"
Then update each Podman build step to include build-args. For example:
 - name: Build database container
   uses: redhat-actions/buildah-build@v2
   with:
     context: .
     containerfiles: |
       ./docker/database/Dockerfile
+    build-args: |
+      VERSION=${{ steps.version.outputs.version }}
     platforms: linux/amd64
     tags: ghcr.io/${{ github.repository_owner }}/augur_database:test
     layers: true
Apply similar changes to the keyman, rabbitmq, and backend Podman build steps.

🧹 Nitpick comments (8)

augur/tasks/util/random_key_auth.py (1)
6-6: Remove unused import.

The hashlib import is not used anywhere in this file.

Apply this diff to remove the unused import:
-import hashlib
augur/api/view/routes.py (1)
7-7: Consider replacing star import with explicit imports.

The star import from .utils import * makes it unclear which symbols are imported and prevents static analysis tools from verifying their availability. Based on the code, render_module, renderRepos, and render_message appear to be the primary imports needed.

Apply this diff to use explicit imports:
-from .utils import *
+from .utils import render_module, renderRepos, render_message
Note: This is a pre-existing issue, not introduced by this PR.
augur/tasks/github/releases/core.py (1)
3-3: Consider replacing star import with explicit imports.

The star import makes it less clear where Release and other models are defined. Consider using explicit imports:
-from augur.application.db.models import *
+from augur.application.db.models import Release
This improves code clarity and helps static analysis tools. Based on static analysis hints.
augur/tasks/util/collection_util.py (1)
600-628: Consider more robust platform detection.

The current approach uses substring matching ("github" in repo.repo_git and "gitlab" in repo.repo_git) to determine the platform. This could lead to false positives if a repository URL contains these strings in unexpected ways (e.g., a self-hosted GitLab instance with "github" in its domain name).

Consider using a more robust detection method:

Check if Repo has a platform or repo_type field

Use URL parsing to extract and compare the hostname

Use a regex pattern for more precise matching

Example with URL parsing:
+from urllib.parse import urlparse
+
 repo = get_repo_by_repo_git(repo_git)
 platform_name = "github"
-# this needs to be here and not up a level since it should be set/reset for each repo.
-# otherwise a gitlab repo would reset it and cause subsequent github repos to use gitlab phases.
 phases = None
-if "github" in repo.repo_git:
+
+parsed_url = urlparse(repo.repo_git)
+hostname = parsed_url.hostname or ""
+
+if "github.com" in hostname:
     phases = col_hook.phases
-    # use default platform name
-
-elif "gitlab" in repo.repo_git:
+elif "gitlab.com" in hostname:
     platform_name = "gitlab"
     if col_hook.gitlab_phases is None:
         continue
     phases = col_hook.gitlab_phases
augur/tasks/git/util/facade_worker/facade_worker/config.py (1)

254-264: Consider removing commented-out code.

The get_last_collected_commit_date helper has been commented out. If this code is no longer needed, consider removing it entirely to keep the codebase clean. If it might be needed in the future, document why it's being kept.
augur/tasks/init/celery_app.py (1)
244-248: LGTM! Config-driven scheduling implemented correctly.

The conditional scheduling based on mat_views_interval aligns well with the PR's goal of making collection intervals configurable. The logic correctly schedules the task only when the interval is positive.

Minor style cleanup: Remove the f prefix from the log statements on lines 245 and 248 since they don't contain placeholders.
-            logger.info(f"Scheduling refresh materialized view every night at 1am CDT")
+            logger.info("Scheduling refresh materialized view every night at 1am CDT")
-            logger.info(f"Refresh materialized view task is disabled.")
+            logger.info("Refresh materialized view task is disabled.")
README.md (1)
14-14: Improve link accessibility with descriptive text.

The link text "here" is not descriptive for screen readers and users scanning the document. Consider using more descriptive text such as "the v0.90.3 release" or "version 0.90.3 on GitHub".

Apply this diff to improve accessibility:
-Augur is now releasing a dramatically improved new version. It is also available [here](https://github.com/chaoss/augur/releases/tag/v0.90.3).
+Augur is now releasing a dramatically improved new version. It is also available [on GitHub as v0.90.3](https://github.com/chaoss/augur/releases/tag/v0.90.3).
Based on static analysis (markdownlint).
augur/application/db/util.py (1)
73-79: Enhance boolean parsing logic for robustness.

The current boolean parsing only recognizes lowercase "false" as False, while all other values (including empty strings, "False", "FALSE", "0", "no", "off") are treated as True. This could lead to unexpected configuration behavior.

Consider a more robust implementation:
 elif data_type == "bool":
     value = config_dict["value"]
     
-    if value.lower() == "false":
+    if isinstance(value, bool):
+        config_dict["value"] = value
+    elif str(value).lower() in ("false", "0", "no", "off", ""):
         config_dict["value"] = False
     else:
         config_dict["value"] = True
This handles common false-like strings and prevents empty strings from being treated as True.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5751600 and b99a182.

⛔ Files ignored due to path filters (3)

augur/application/schema/repo_load_sample.csv is excluded by !**/*.csv
tests/test_workers/test_facade/test_facade_contributor_interface/test_repos.csv is excluded by !**/*.csv
uv.lock is excluded by !**/*.lock

📒 Files selected for processing (50)

.github/workflows/build_docker.yml (5 hunks)
CITATION.cff (1 hunks)
README.md (2 hunks)
augur/api/metrics/repo_meta.py (1 hunks)
augur/api/routes/__init__.py (0 hunks)
augur/api/routes/contributor_reports.py (0 hunks)
augur/api/routes/pull_request_reports.py (0 hunks)
augur/api/view/init.py (0 hunks)
augur/api/view/routes.py (1 hunks)
augur/api/view/utils.py (0 hunks)
augur/application/cli/github.py (2 hunks)
augur/application/config.py (4 hunks)
augur/application/db/data_parse.py (4 hunks)
augur/application/db/engine.py (1 hunks)
augur/application/db/lib.py (4 hunks)
augur/application/db/models/augur_data.py (3 hunks)
augur/application/db/session.py (3 hunks)
augur/application/db/util.py (1 hunks)
augur/application/schema/alembic/versions/34_add_contrib_to_config.py (1 hunks)
augur/application/util.py (1 hunks)
augur/tasks/git/facade_tasks.py (3 hunks)
augur/tasks/git/util/facade_worker/facade_worker/config.py (3 hunks)
augur/tasks/git/util/facade_worker/facade_worker/repofetch.py (1 hunks)
augur/tasks/git/util/facade_worker/facade_worker/utilitymethods.py (1 hunks)
augur/tasks/github/facade_github/tasks.py (7 hunks)
augur/tasks/github/releases/core.py (3 hunks)
augur/tasks/init/celery_app.py (1 hunks)
augur/tasks/start_tasks.py (6 hunks)
augur/tasks/util/collection_util.py (1 hunks)
augur/tasks/util/random_key_auth.py (1 hunks)
augur/templates/repo-info.j2 (1 hunks)
augur/util/keys.py (1 hunks)
docker-compose.yml (1 hunks)
docker/backend/Dockerfile (1 hunks)
docker/database/Dockerfile (1 hunks)
docker/keyman/Dockerfile (1 hunks)
docker/rabbitmq/Dockerfile (1 hunks)
docker/rabbitmq/definitions.json (1 hunks)
docs/source/conf.py (1 hunks)
docs/source/development-guide/create-a-metric/api-development.rst (0 hunks)
docs/source/docker/docker-compose.rst (1 hunks)
docs/source/docker/getting-started.rst (1 hunks)
docs/source/docker/quick-start.rst (1 hunks)
docs/source/getting-started/collecting-data.rst (1 hunks)
docs/source/getting-started/command-line-interface/db.rst (1 hunks)
docs/source/getting-started/installation.rst (1 hunks)
docs/source/getting-started/using-docker.rst (2 hunks)
docs/source/procedures/creating-releases.rst (1 hunks)
metadata.py (1 hunks)
pyproject.toml (3 hunks)

💤 Files with no reviewable changes (6)

augur/api/view/init.py
docs/source/development-guide/create-a-metric/api-development.rst
augur/api/view/utils.py
augur/api/routes/contributor_reports.py
augur/api/routes/init.py
augur/api/routes/pull_request_reports.py

✅ Files skipped from review due to trivial changes (6)

docs/source/getting-started/using-docker.rst
docs/source/docker/quick-start.rst
docs/source/docker/getting-started.rst
CITATION.cff
augur/application/util.py
augur/application/db/models/augur_data.py

🚧 Files skipped from review as they are similar to previous changes (1)

docs/source/getting-started/collecting-data.rst

🧰 Additional context used

🧬 Code graph analysis (11)

augur/tasks/github/facade_github/tasks.py (4)

augur/application/db/__init__.py (1)

get_session (29-39)

augur/application/db/util.py (1)

execute_session_query (31-43)

augur/tasks/git/util/facade_worker/facade_worker/config.py (1)

FacadeHelper (81-253)

augur/application/db/models/augur_operations.py (1)

CollectionStatus (1139-1283)

augur/tasks/util/random_key_auth.py (1)

augur/util/keys.py (1)

mask_key (1-5)

augur/tasks/github/releases/core.py (2)

augur/application/db/models/augur_data.py (1)

Release (1832-1860)

augur/application/db/util.py (1)

execute_session_query (31-43)

augur/tasks/git/facade_tasks.py (4)

augur/tasks/git/util/facade_worker/facade_worker/utilitymethods.py (1)

get_parent_commits_set (108-122)

augur/tasks/git/dependency_tasks/tasks.py (1)

process_dependency_metrics (10-14)

augur/tasks/git/dependency_libyear_tasks/tasks.py (1)

process_libyear_dependency_metrics (7-12)

augur/tasks/git/scc_value_tasks/tasks.py (1)

process_scc_value_metrics (9-13)

augur/tasks/init/celery_app.py (1)

augur/tasks/db/refresh_materialized_views.py (1)

refresh_materialized_views (12-210)

augur/application/schema/alembic/versions/34_add_contrib_to_config.py (2)

augur/application/db/session.py (1)

DatabaseSession (45-283)

augur/application/config.py (2)

AugurConfig (118-368)

add_section_from_json (288-314)

augur/api/view/routes.py (1)

augur/api/view/utils.py (1)

render_module (222-224)

augur/tasks/start_tasks.py (2)

augur/application/config.py (2)

AugurConfig (118-368)

get_value (163-188)

augur/tasks/util/collection_util.py (1)

CollectionRequest (116-151)

augur/api/metrics/repo_meta.py (2)

augur/api/util.py (1)

register_metric (54-83)

augur/application/db/engine.py (1)

engine (138-139)

augur/application/config.py (1)

augur/application/db/util.py (2)

execute_session_query (31-43)

convert_type_of_value (63-90)

augur/application/db/lib.py (1)

augur/application/db/util.py (2)

execute_session_query (31-43)

convert_type_of_value (63-90)

🪛 markdownlint-cli2 (0.18.1)

README.md

14-14: Link text should be descriptive

(MD059, descriptive-link-text)

🪛 Ruff (0.14.2)

augur/tasks/github/facade_github/tasks.py

202-202: FacadeHelper may be undefined, or defined from star imports

(F405)

205-205: CollectionStatus may be undefined, or defined from star imports

(F405)

205-205: CollectionStatus may be undefined, or defined from star imports

(F405)

augur/tasks/util/random_key_auth.py

32-32: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

augur/tasks/github/releases/core.py

72-72: Release may be undefined, or defined from star imports

(F405)

72-72: Release may be undefined, or defined from star imports

(F405)

augur/tasks/init/celery_app.py

245-245: f-string without any placeholders

Remove extraneous f prefix

(F541)

248-248: f-string without any placeholders

Remove extraneous f prefix

(F541)

augur/application/schema/alembic/versions/34_add_contrib_to_config.py

10-10: from augur.application.config import * used; unable to detect undefined names

(F403)

25-25: AugurConfig may be undefined, or defined from star imports

(F405)

46-49: f-string without any placeholders

Remove extraneous f prefix

(F541)

52-55: f-string without any placeholders

Remove extraneous f prefix

(F541)

56-56: Do not use bare except

(E722)

56-57: try-except-pass detected, consider logging the exception

(S110)

augur/api/view/routes.py

226-226: render_module may be undefined, or defined from star imports

(F405)

augur/tasks/start_tasks.py

179-179: CollectionRequest may be undefined, or defined from star imports

(F405)

198-198: CollectionRequest may be undefined, or defined from star imports

(F405)

220-220: CollectionRequest may be undefined, or defined from star imports

(F405)

235-235: CollectionRequest may be undefined, or defined from star imports

(F405)

270-270: facade_phase may be undefined, or defined from star imports

(F405)

274-274: machine_learning_phase may be undefined, or defined from star imports

(F405)

augur/tasks/git/util/facade_worker/facade_worker/utilitymethods.py

110-110: subprocess call with shell=True identified, security issue

(S602)

augur/application/db/lib.py

252-252: Use raise without specifying exception name

Remove exception name

(TRY201)

🔇 Additional comments (40)

docs/source/getting-started/command-line-interface/db.rst (1)

81-88: Documentation example now correctly matches the specified format.

The example CSV entries have been reordered to align with the documented format specification (<git_repo_url>,<repo_group_id>), improving consistency between the specification and the worked example. The URLs and repo IDs are syntactically correct.

Minor note: Line 66 contains a trailing comma after <repo_group_id>, which does not appear in the actual examples—consider removing it for clarity if it's unintentional.

augur/util/keys.py (1)

1-5: LGTM! Solid security utility implementation.

The masking logic correctly handles edge cases by returning all stars when the key is too short or not a string, preventing inadvertent exposure of short secrets. The default parameters are sensible for typical API keys.

augur/tasks/util/random_key_auth.py (3)

27-30: Excellent security improvement!

The validation for empty keys with explicit error logging prevents silent failures and improves debuggability.

32-32: Static analysis warning is a false positive.

The S311 warning about choice() not being suitable for cryptographic purposes can be safely ignored here. You're selecting from a pool of existing keys for load balancing/rotation, not generating cryptographic material. The security of the keys themselves doesn't depend on the randomness of the selection algorithm.

34-36: Significant security improvement by masking keys in logs.

Logging the masked key instead of plaintext prevents accidental exposure of sensitive credentials in log files, which is a critical security best practice.

augur/api/metrics/repo_meta.py (2)

1244-1298: Change appears unrelated to PR objectives.

The PR summary states the focus is on "fixing hardcoded collection intervals" in configuration and task collection code (config.py, start_tasks.py, collection_util.py). However, this change introduces a new metric endpoint for repository clone data, which doesn't appear related to collection interval configuration.

Can you clarify how this new clones metric relates to the collection interval fixes mentioned in the PR description?

1244-1298: Database schema verified—no issues found.

The table augur_data.repo_clones_data and all referenced columns (clone_data_timestamp, count_clones, unique_clones) exist across the schema definitions. The implementation follows established patterns, uses parameterized queries correctly, and integrates properly with the existing codebase including the data collection task and ORM model.

augur/api/view/routes.py (1)

226-226: LGTM! Correctly removes server-side reports parameter.

The change aligns with the PR's goal of moving from server-side report rendering to client-side dynamic loading. The template no longer expects reports or images to be passed from the server.

augur/tasks/github/releases/core.py (2)

26-26: LGTM! Consistent release_id normalization.

The release_id is now properly normalized by converting to string and stripping whitespace in both the non-tag-only and tag-only branches. This ensures consistency for duplicate checking downstream.

Also applies to: 54-54

79-84: Good duplicate prevention logic, but depends on fixing line 74.

The duplicate check is well-implemented with consistent normalization and efficient set membership testing. The early return avoids unnecessary database operations and provides clear logging.

However, this logic will only work correctly once the Row unpacking issue on line 74 is fixed.

augur/application/cli/github.py (1)

59-59: LGTM: Improved time formatting with timezone awareness.

The changes enhance the CLI output by centering the reset time headers and adding timezone information to the displayed timestamps. The use of astimezone() and the %Z (UTC%z) format string provides clearer context for users.

Also applies to: 91-94

augur/tasks/git/facade_tasks.py (2)

124-124: LGTM: Simplified commit collection without date filtering.

The removal of start_date from get_parent_commits_set calls aligns with the broader simplification of the commit history collection process. This change is consistent with updates to augur/tasks/git/util/facade_worker/facade_worker/utilitymethods.py.

Also applies to: 247-247

500-508: LGTM: Simplified facade sequence construction.

The change from building a list and returning chain(*facade_sequence) to directly returning a group() object simplifies the task orchestration pattern.

augur/tasks/git/util/facade_worker/facade_worker/config.py (2)

130-130: LGTM: New configuration option added.

The facade_contributor_full_recollect configuration option is properly initialized from the worker_options.

170-177: LGTM: Improved log routing by level.

The refactored log_activity method now properly routes log messages based on their level:

Error → logger.error

Debug/Verbose → logger.debug

Others → logger.info

This provides better log filtering and organization.

augur/application/db/session.py (2)

96-109: LGTM: Improved parameter naming and input normalization.

Renaming data to data_input improves clarity, and the explicit else branch for normalizing input to a list is more robust.

171-212: The review comment is incorrect—the code already returns None, not the string "success".

The function signature shows the return type as Optional[List[dict]], and the actual code at line 213 returns None (the # success is merely a comment, not the return value). All callers in the codebase already check for None or use the returned list of dicts when return_columns is provided. There is no evidence of a "success" string being the previous return value, and no callers depend on such a value.

Likely an incorrect or invalid review comment.

augur/application/db/lib.py (3)

16-16: LGTM: Proper separation of concerns.

Moving convert_type_of_value import to augur.application.db.util improves code organization.

275-282: LGTM: Added configurability for batch size.

The batch_size parameter (defaulting to 1000) provides flexibility for tuning batch insert performance.

286-299: LGTM: Consistent parameter naming and return semantics.

The rename from data to data_input is consistent with changes in augur/application/db/session.py, and the explicit None return in the no-return-columns path clarifies the function's behavior.

Also applies to: 401-403

augur/application/db/engine.py (1)

13-29: No action required—all callers have been properly updated.

The search confirms that all four call sites of parse_database_string() in the codebase properly unpack the tuple return value using destructuring assignment. The function signature change is not breaking because callers have already been updated to handle the five-element tuple (username, password, host, port, database).

docker-compose.yml (1)

76-76: LGTM! Config directory now configurable.

The addition of CONFIG_DATADIR enables runtime configuration loading from augur.json, supporting the PR's goal of making collection intervals configurable.

docker/keyman/Dockerfile (1)

4-6: LGTM! Dynamic versioning adopted.

The switch from hardcoded version labels to build-time VERSION arguments is a good practice that enables consistent versioning across all Docker images via the CI workflow.

docker/backend/Dockerfile (1)

23-25: LGTM! Consistent with dynamic versioning pattern.

This change mirrors the keyman Dockerfile, ensuring all service images use the same build-time versioning approach.

augur/tasks/git/util/facade_worker/facade_worker/repofetch.py (1)

438-438: LGTM! More precise git reset command.

Changing the reset target from origin to origin/{remotedefault} makes the command more explicit and correct by targeting the detected default branch rather than relying on the symbolic origin ref.

docs/source/conf.py (1)

26-30: LGTM! Cleaner metadata import approach.

Replacing exec() with explicit imports is more maintainable and follows Python best practices. This change properly integrates with the new metadata.py versioning.

docs/source/docker/docker-compose.rst (1)

29-29: LGTM! Documentation clarified.

The updated wording better explains that all environment variables must be present in the .env file, with placeholder values acceptable for unused variables.

docker/rabbitmq/definitions.json (1)

2-2: Verify RabbitMQ 4.1 compatibility with application dependencies and runtime behavior.

The version bump from 3.12 to 4.1 introduces several breaking changes, including removal of classic queue replication, AMQP 1.0 as a core protocol, AMQP 0-9-1 frame_max increase from 4096 to 8192, and Erlang/OTP compatibility changes.

The definitions.json file itself contains no queue configurations or classic mirrored queue definitions, so it is compatible with the upgrade. However, verification is needed on:

AMQP client libraries: Confirm that any AMQP clients used by the Augur application support the increased frame_max of 8192 bytes or are configured with frame_max ≥ 8192.

Erlang/OTP version: Verify that the environment provides an Erlang/OTP version compatible with RabbitMQ 4.1.

Runtime queue behavior: If queues are created or configured at runtime by the application code, ensure they don't rely on classic queue mirroring or other removed features.

docker/database/Dockerfile (1)

5-7: LGTM: Dynamic versioning implementation.

The introduction of ARG VERSION and dynamic LABEL version=${VERSION} follows Docker best practices for build-time version injection. This change aligns with the workflow updates that extract the version from metadata.py and pass it as a build argument.

README.md (1)

1-1: LGTM: Version bump aligns with metadata.py.

The version header update to v0.90.3 is consistent with the changes in metadata.py and reflects the centralized version management approach.

docker/rabbitmq/Dockerfile (2)

5-6: LGTM: Dynamic versioning implementation.

The addition of ARG VERSION and LABEL version=${VERSION} follows the same pattern as other Dockerfiles in this PR, enabling consistent build-time version injection.

1-1: RabbitMQ 4.1 compatibility verified — all configuration and code dependencies are compatible.

Verification results:

Celery 5.5 (via kombu/py-amqp) is compatible with RabbitMQ 4.1, as Python AMQP clients default to the server-recommended frame_max and negotiate compatible values

Config files (definitions.json, advanced.config, augur.conf) contain no unsupported patterns for 4.1

No classic queue mirroring, etcd clustering, or incompatible plugins detected

update_config.py script uses standard rabbitmqctl and is compatible

Note: If upgrading from RabbitMQ 3.12, follow the staged upgrade path (3.12 → 3.13 → 4.1) or use blue/green migration, as direct 3.12 → 4.1 upgrades are not supported.

metadata.py (1)

8-9: LGTM: Version constants updated correctly.

The version bump to 0.90.3 is consistent across all related files and serves as the single source of truth for version information, as documented in the release process documentation.

augur/application/db/util.py (1)

63-90: Function correctly implements type coercion with error handling.

The function provides appropriate type conversion for common config value types (str, int, bool, float) and includes error handling for unsupported types via logging or print statements.

.github/workflows/build_docker.yml (2)

66-71: LGTM: Version extraction centralizes version management.

The version extraction from metadata.py and propagation as a GitHub Actions output enables consistent versioning across all Docker images, aligning with the centralized version management approach documented in the release process.

58-62: Cleanup step improves CI runner disk space management.

The addition of cleanup to remove unnecessary files (/usr/share/dotnet, AGENT_TOOLSDIRECTORY) helps prevent disk space issues during Docker builds on GitHub-hosted runners.

pyproject.toml (3)

28-29: LGTM: Compatible release constraints improve dependency flexibility.

Updating celery from ==5.2.7 to ~=5.5 and click from ==8.0.3 to ~=8.1 allows patch-level updates while preventing breaking changes, following Python packaging best practices.

102-107: LGTM: Type checking infrastructure added to dev dependencies.

The addition of mypy and type stubs (types-requests, types-pyyaml, types-python-dateutil, types-toml) enables static type checking for improved code quality and developer experience.

158-163: LGTM: Mypy configuration provides sensible starting point.

The mypy configuration starts with a focused scope (augur/application/db/*.py) and permissive settings, which is a good approach for gradual type checking adoption. The configuration can be tightened over time as the codebase adds more type annotations.

docs/source/procedures/creating-releases.rst (1)

5-110: LGTM: Comprehensive documentation of release process.

The documentation effectively describes:

The main/release branch workflow and their purposes

The new centralized version management approach using metadata.py

How the CI/CD pipeline automatically propagates versions to Docker builds

Special cases like hotfixes

This aligns well with the technical changes in the PR and provides clear guidance for future releases.

coderabbitai · 2025-10-31T13:32:57Z

augur/tasks/github/facade_github/tasks.py

+        query = session.query(CollectionStatus).filter(CollectionStatus.repo_id == repo.repo_id)
+        collection_status = execute_session_query(query,'one')
+        last_collected_date = collection_status.facade_data_last_collected if not facade_helper.facade_contributor_full_recollect else None



⚠️ Potential issue | 🔴 Critical

Fix missing CollectionStatus import.

CollectionStatus is referenced here without being imported, so this code will raise a NameError the first time the task runs. Please import it alongside Contributor.

-from augur.application.db.models import Contributor +from augur.application.db.models import CollectionStatus, Contributor

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

query = session.query(CollectionStatus).filter(CollectionStatus.repo_id == repo.repo_id)

collection_status = execute_session_query(query,'one')

last_collected_date = collection_status.facade_data_last_collected if not facade_helper.facade_contributor_full_recollect else None

from augur.application.db.models import CollectionStatus, Contributor

🧰 Tools

🪛 Ruff (0.14.2)

205-205: CollectionStatus may be undefined, or defined from star imports

(F405)

205-205: CollectionStatus may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents

In augur/tasks/github/facade_github/tasks.py around lines 205 to 208, CollectionStatus is used but not imported which will raise a NameError; add CollectionStatus to the import list where Contributor is imported (i.e., import CollectionStatus alongside Contributor from the same module/package) so the symbol is available for the query. Ensure the import follows existing import style and is placed with other model imports at the top of the file.

⚠️ Potential issue | 🟠 Major

Handle missing collection_status rows safely.

execute_session_query(query, 'one') raises when no row exists (common for freshly added repos before collection_status is populated), which will crash the task. Please fall back gracefully instead of assuming the record is present.

- query = session.query(CollectionStatus).filter(CollectionStatus.repo_id == repo.repo_id) - collection_status = execute_session_query(query,'one') - last_collected_date = collection_status.facade_data_last_collected if not facade_helper.facade_contributor_full_recollect else None + query = session.query(CollectionStatus).filter(CollectionStatus.repo_id == repo.repo_id) + collection_status = execute_session_query(query, 'first') + last_collected_date = None + if collection_status and not facade_helper.facade_contributor_full_recollect: + last_collected_date = collection_status.facade_data_last_collected

🧰 Tools

🪛 Ruff (0.14.2)

205-205: CollectionStatus may be undefined, or defined from star imports

(F405)

205-205: CollectionStatus may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents

In augur/tasks/github/facade_github/tasks.py around lines 205 to 208, the code assumes execute_session_query(query,'one') always returns a CollectionStatus and will raise if none exists; update this to handle missing rows safely by either using a query variant that returns None (e.g., execute_session_query(query,'one_or_none') or session.query(...).one_or_none()) or catching the exception returned by execute_session_query and treating a missing row as None, then set last_collected_date = None when collection_status is None (or when facade_helper.facade_contributor_full_recollect is True) so the task continues without crashing for newly added repos.

coderabbitai · 2025-10-31T13:32:57Z

augur/tasks/github/releases/core.py

-    release_id_data = execute_session_query(query, 'all')#pd.read_sql(release_id_data_sql, self.db, params={'repo_id': repo_id})
-    release_id_data = [str(r_id).strip() for r_id in release_id_data]#release_id_data.apply(lambda x: x.str.strip())
+    release_id_data = execute_session_query(query, 'all')
+    existing_release_ids = {str(r_id).strip() for r_id in release_id_data}


⚠️ Potential issue | 🔴 Critical

Critical bug: The past review comment is still unaddressed.

The past reviewer correctly identified that line 74 has a Row unpacking issue. SQLAlchemy's query.all() returns Row objects, and for single-column queries, each Row must be indexed to extract the value.

Current code:

existing_release_ids = {str(r_id).strip() for r_id in release_id_data}

This produces strings like "('some_id',)" instead of "some_id", which breaks the duplicate check on line 82.

Apply the fix suggested in the past review:

- existing_release_ids = {str(r_id).strip() for r_id in release_id_data} + existing_release_ids = {str(r_id[0]).strip() for r_id in release_id_data}

🤖 Prompt for AI Agents

In augur/tasks/github/releases/core.py around line 74, the set comprehension treats SQLAlchemy Row objects as strings, producing values like "('some_id',)"; update the comprehension to extract the first column from each Row (e.g., use r_id[0]) and skip or filter out None values before converting to string and stripping so existing_release_ids contains the actual ID strings (e.g., use str(r_id[0]).strip() for each non-None row).

coderabbitai · 2025-10-31T13:32:57Z

docs/source/getting-started/installation.rst

-**Python 3.9 is not yet supported because TensorFlow, which we use in our machine learning workers, does not yet support Python 3.9.**
-
-Our REST API & data collection workers write in Python 3.6. We query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository, so GitLab and GitHub access tokens are **required** for data collection.
+Our REST API & data collection workers query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository. Values for GitLab and GitHub access tokens are **required** for data collection and must be provided (an invalid token can be provided if you don't plan to use one platform) .


⚠️ Potential issue | 🟡 Minor

Clarify the token requirement and fix punctuation.

The statement "must be provided (an invalid token can be provided if...)" is somewhat contradictory and could mislead users. Additionally, there's an extra space before the period at the end of the line.

Consider restructuring this sentence for clarity and to explicitly explain what "invalid token" means in context.

Apply this diff to improve clarity and fix punctuation:

-Our REST API & data collection workers query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository. Values for GitLab and GitHub access tokens are **required** for data collection and must be provided (an invalid token can be provided if you don't plan to use one platform) . +Our REST API & data collection workers query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository. GitHub and GitLab access tokens are **required** for data collection. If you don't plan to use one platform, you may provide a placeholder or invalid token for that platform.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Our REST API & data collection workers query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository. Values for GitLab and GitHub access tokens are **required** for data collection and must be provided (an invalid token can be provided if you don't plan to use one platform) .

Our REST API & data collection workers query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository. GitHub and GitLab access tokens are **required** for data collection. If you don't plan to use one platform, you may provide a placeholder or invalid token for that platform.

🤖 Prompt for AI Agents

In docs/source/getting-started/installation.rst around line 34, clarify and rephrase the sentence about GitHub/GitLab tokens and remove the stray space before the period: state that values for GitHub and GitLab access tokens are required fields for data collection, but if you do not intend to collect data from one provider you may supply a placeholder or deliberately invalid token (e.g., "none" or "placeholder") so the configuration remains valid; reword the sentence to avoid contradiction and fix punctuation so there is no extra space before the period.

MoralCode and others added 30 commits July 15, 2025 09:14

deduplicate convert_type_of_value

b8c2ca4

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

move to db.util

b289d66

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

Merge pull request chaoss#3229 from MoralCode/dedup-conversion-function

6cd41f6

deduplicate convert_type_of_value

add date filter to contributer resolution logic queries

8539825

Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>

dont use start date

fe85b3f

Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>

Try a new version of the TZdata fix

b17d8f9

Signed-off-by: Ulincsys <ulincsys@gmail.com>

Merge pull request chaoss#3252 from chaoss/facade-timezone-fix-redux

d5bff52

Facade timezone fix redux

Merge branch 'main' into partial-recollection-facade

0df6bdd

add a dockerfile that can build the empty DB container

ca922ad

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

add empty db to GHCR container builds

4631d90

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

remove unused method

6a6c760

Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>

Merge pull request chaoss#3259 from MoralCode/empty-db-build

2c9a5f1

Add dockerfile for performing an empty database build

refactor send_messages to remove a almost entirely duplicate code path

33debfd

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

refactor facade_phase to return the group itself so that celery can p…

83d0db1

…roperly convert things into a chord Signed-off-by: Adrian Edwards <adredwar@redhat.com>

fix(cli): display timezone in output for GitHub key expiry (chaoss#3251)

bc35b38

Signed-off-by: Sajal-Kulshreshtha <sajalkulshreshtha9@gmail.com>

removed day from formatted time

60d3036

Signed-off-by: Sajal-Kulshreshtha <sajalkulshreshtha9@gmail.com>

removed extra lines

2598699

Signed-off-by: Sajal-Kulshreshtha <sajalkulshreshtha9@gmail.com>

Align timestamp formatting with header layout

139d5bc

Signed-off-by: Sajal-Kulshreshtha <sajalkulshreshtha9@gmail.com>

Merge pull request chaoss#3273 from MoralCode/feature/facade-wtf

a329409

facade: refactor facade_phase to return the group itself so that celery can properly convert things into a chord

Merge pull request chaoss#3261 from skools-here/fix-display-timezone-cli

fefb48a

fix(cli): display timezone in output for GitHub key expiry (chaoss#3251)

bump rabbit image to 4.1

b8e1c79

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

Merge pull request chaoss#3271 from MoralCode/collection_util-duplica…

c8f22bb

…te-code scheduling: refactor send_messages to de-duplicate code path

commented out abused API endpoint

db8843d

Signed-off-by: Sean P. Goggins <s@goggins.com>

bump celery and a related dependency

5899e01

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

updated

493dddd

Signed-off-by: Sean P. Goggins <s@goggins.com>

Revert "Merge pull request chaoss#3271 from MoralCode/collection_util…

bcd58f5

…-duplicate-code" This reverts commit c8f22bb, reversing changes made to fefb48a.

Merge pull request chaoss#3291 from chaoss/dependencies/celery-update

4131e03

main: bump celery and a related dependency - attempt to fix secondary+core

Merge pull request chaoss#3289 from chaoss/api-abuse-patch

84a4b8e

commented out abused API endpoint

PredictiveManish added 3 commits October 25, 2025 14:06

Reverting the unnecessary changes

8ad15bd

Signed-off-by: PredictiveManish <manisht0914@gmail.com>

Moving mask_key() to augur.util

6700088

Signed-off-by: PredictiveManish <manisht0914@gmail.com>

Removed key_fingerprint for easy interpretation

bc9bd96

Signed-off-by: PredictiveManish <manisht0914@gmail.com>

coderabbitai bot reviewed Oct 25, 2025

View reviewed changes

augur/tasks/github/releases/core.py Show resolved Hide resolved

PredictiveManish and others added 2 commits October 26, 2025 10:08

Fix: python3.9 compatibility message in docs chaoss#3266

f3f4065

Signed-off-by: PredictiveManish <manisht0914@gmail.com>

Merge branch 'chaoss:main' into docs-update

524d25d

MoralCode and others added 12 commits October 27, 2025 15:12

install mypy

c615eb0

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

add mypy config to toml file

718034a

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

add missing types packages as suggested by mypy

2fbffd1

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

broaden checking to other application db files too

7f5056e

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

type fixes for Application DB files

85e317c

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

Merge pull request chaoss#3330 from PredictiveManish/fix-api-visible-…

c2e3fe1

…format Fixing the API key Visibility issue in debug mode

Update docs/source/getting-started/installation.rst

b902e0d

Co-authored-by: Adrian Edwards <17362949+MoralCode@users.noreply.github.com> Signed-off-by: Manish Tiwari <manish.tiwari.09@zohomail.in>

testing

15f1f27

Signed-off-by: Sean P. Goggins <s@goggins.com>

Merge pull request chaoss#3351 from chaoss/docker-patch-spg-1

16d8b60

testing docker config for empty database

Merge pull request chaoss#3341 from PredictiveManish/docs-update

ce9fb34

Fix: python3.9 compatibility message in docs chaoss#3266

Merge pull request chaoss#3315 from chaoss/mypy

0c57454

Alternative mypy setup

remove empty db image build

5acb58b

Signed-off-by: Adrian Edwards <adredwar@redhat.com>

mohsinm-dev force-pushed the fix-hardcoded-collection-intervals branch from f829663 to 8201ed8 Compare October 30, 2025 05:06

coderabbitai bot reviewed Oct 30, 2025

View reviewed changes

sgoggins and others added 6 commits October 30, 2025 15:16

Merge pull request chaoss#3332 from PredictiveManish/fix-variables-issue

56b4ee0

Fix variables issue in chaoss#3183

Merge pull request chaoss#3355 from chaoss/remove-empty-db

4221cba

remove the "empty database" docker build step to avoid blocking release Adrian and I are merging this with only "one" approval as its been discussed in maintainer meetings.

Fix hardcoded collection intervals by reading from config at call time

e29a1d8

Move config injection to call site and use days_until_collect_again p…

9ad93b6

…arameter

add unit values to the new variables

28045ef

add note to the docs about the new config items

b99a182

MoralCode force-pushed the fix-hardcoded-collection-intervals branch from 6e4db56 to b99a182 Compare October 31, 2025 13:23

coderabbitai bot reviewed Oct 31, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix hardcoded collection intervals in multiple spots #1

Fix hardcoded collection intervals in multiple spots #1

Uh oh!

mohsinm-dev commented Oct 25, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Oct 25, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

MoralCode commented Oct 27, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Oct 31, 2025

Uh oh!

coderabbitai bot Oct 31, 2025

Uh oh!

coderabbitai bot Oct 31, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

11 participants

	Our REST API & data collection workers query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository. Values for GitLab and GitHub access tokens are required for data collection and must be provided (an invalid token can be provided if you don't plan to use one platform) .
	Our REST API & data collection workers query the GitHub & GitLab API to collect data about issues, pull requests, contributors, and other information about a repository. GitHub and GitLab access tokens are required for data collection. If you don't plan to use one platform, you may provide a placeholder or invalid token for that platform.

Fix hardcoded collection intervals in multiple spots #1

Are you sure you want to change the base?

Fix hardcoded collection intervals in multiple spots #1

Uh oh!

Conversation

mohsinm-dev commented Oct 25, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes Made

Technical Details

Testing

Before/After Example

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

MoralCode commented Oct 27, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 31, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 31, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 31, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

11 participants

mohsinm-dev commented Oct 25, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 25, 2025 •

edited

Loading