Fix silent backup failures: retry alerts, detect unmounted drives #54

claudio-pi · 2026-02-10T15:01:20Z

Summary

_send_alert() no longer silently swallows failures — retries 3x with exponential backoff (1s, 2s, 4s) and logs on final failure. This was the root cause of never receiving "backups are stale" alerts.
backup_run() validates mount points for /mnt/* and /media/* destinations before running rsync. Prevents silent failures when an external drive disconnects.
_check_backup_freshness() detects unmounted drives and triggers a specific "drive not mounted" alert instead of the generic "backups are stale" message.
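
The retry behaviour described for _send_alert() can be sketched roughly as below. This is an illustration only, not the code from this PR: retry_with_backoff and its arguments are hypothetical names. With three attempts the sketch sleeps 1s and then 2s between tries, so the exact schedule in lib/health-check.sh may be counted differently.

```shell
# Hypothetical helper (assumption, not taken from lib/health-check.sh).
# Usage: retry_with_backoff <attempts> <initial_delay_seconds> <command> [args...]
retry_with_backoff() {
    local attempts="$1" delay="$2" attempt=1
    shift 2
    while [ "$attempt" -le "$attempts" ]; do
        # Run the command; stop as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        # Sleep only between attempts, doubling the delay each time.
        if [ "$attempt" -lt "$attempts" ]; then
            sleep "$delay"
            delay=$((delay * 2))
        fi
        attempt=$((attempt + 1))
    done
    # Final failure is reported on stderr instead of being swallowed.
    echo "retry_with_backoff: giving up after $attempts attempts: $*" >&2
    return 1
}
```

A caller would wrap the delivery command, for example `retry_with_backoff 3 1 some_command arg`, and can still append `|| true` if a failed alert must not abort the surrounding script.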

Context

The SSD disconnected due to a USB controller error (~Feb 8). Backups then failed every hour for days. The health check detected the failures, but _send_alert() discarded the curl response with > /dev/null 2>&1 || true, so the delivery failure went unnoticed and the alert never reached Telegram.
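
To make the failure mode concrete, here is a minimal sketch of a single delivery attempt that keeps curl's result instead of discarding it. The names send_alert_once, log_error and ALERT_URL are assumptions made for illustration, as is the `text` form field. None of them are taken from lib/health-check.sh.

```shell
# Hypothetical logger (assumption): writes a timestamped line to stderr.
log_error() {
    printf '%s ERROR %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >&2
}

# Hypothetical single delivery attempt. ALERT_URL is assumed to be set by the caller.
send_alert_once() {
    local message="$1" output
    # --fail makes curl exit non-zero on HTTP errors (status >= 400);
    # --show-error keeps the error text even though --silent hides progress.
    if output="$(curl --silent --show-error --fail --max-time 10 \
            --data-urlencode "text=${message}" "$ALERT_URL" 2>&1)"; then
        return 0
    fi
    # The failure is logged and propagated rather than hidden behind `|| true`.
    log_error "alert delivery failed: ${output}"
    return 1
}
```

The point of the sketch is that the caller now sees a non-zero exit status and a log line when delivery fails, which is what the original redirect and `|| true` suppressed.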

Test plan

bash -n syntax validation passes
All 17 health-check.bats tests pass
All 57 Python tests pass
Mount check is conditional: only activates for /mnt/* and /media/* paths (won't break local directory backups or tests using /tmp)
CISO review: GO with conditions (all addressed)

- _send_alert() now retries 3 times with exponential backoff and logs failures instead of silently discarding them (was `> /dev/null || true`)
- backup_run() validates that /mnt/* and /media/* destinations are actually mounted before attempting rsync
- _check_backup_freshness() detects unmounted backup destinations and triggers a specific "drive not mounted" alert

gemini-code-assist · 2026-02-10T15:01:47Z

Summary of Changes

Hello @claudio-pi, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness and reliability of the backup system by addressing critical failure points in both the backup process and the alerting mechanism. It ensures that backup operations are not performed on unmounted drives and that all health check alerts are reliably delivered, preventing silent failures and providing more precise notifications about the state of backups.

Highlights

Alert Reliability: The _send_alert() function no longer silently swallows failures. It now retries sending alerts up to 3 times with exponential backoff (1s, 2s, 4s) and logs any final failures, ensuring that critical 'backups are stale' alerts are not lost.
Backup Destination Validation: The backup_run() function now validates mount points for /mnt/* and /media/* destinations before initiating rsync. This prevents silent backup failures that occur when an external drive becomes disconnected, as it detects empty mount point directories.
Specific Unmounted Drive Alerts: The _check_backup_freshness() function has been enhanced to specifically detect unmounted drives. If a backup destination is found to be unmounted, it triggers a distinct 'drive not mounted' alert instead of the more generic 'backups are stale' message, providing clearer diagnostic information.

Changelog

lib/backup.sh
- Added a conditional check within backup_run to verify if destination paths resembling external drives (/mnt/*, /media/*) are actual mounted filesystems, returning an error if not.
lib/health-check.sh
- Implemented a retry mechanism with exponential backoff for the _send_alert function, along with internal logging of failures after all retries are exhausted.
- Modified _check_backup_freshness to include a check for unmounted backup destinations, specifically for paths under /mnt/* or /media/*.
- Updated the main health check logic to differentiate between a generic 'Backups are stale' alert and a specific 'Backup drive not mounted' alert based on the _check_backup_freshness result.
- Added comments to _send_alert calls, clarifying that || true is used to prevent script abortion, as the function now handles internal failure logging.

Activity

Bash syntax validation (bash -n) passed.
All 17 health-check.bats tests passed.
All 57 Python tests passed.
The new mount check functionality is conditional, only applying to /mnt/* and /media/* paths to avoid breaking other backup scenarios.
A CISO review was conducted, and all conditions raised were addressed.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces crucial reliability improvements by addressing silent failures in backup and alerting mechanisms. The addition of retry logic with exponential backoff in _send_alert is a robust solution to prevent lost alerts. Similarly, validating mount points before backup and in health checks is an excellent proactive measure against failures from disconnected drives. The code is well-structured and the changes are clear. I have one suggestion to improve maintainability by reducing code duplication in the health check script, aligning with our guidelines on refactoring common logic.

lib/health-check.sh

cubic-dev-ai

1 issue found across 2 files

Confidence score: 3/5

There is a concrete user-facing risk: mountpoint in lib/backup.sh will reject valid backup destinations under mounted drives (e.g., /mnt/ssd/backups), which can cause backups to fail unexpectedly.
This is a medium-severity logic issue (6/10) in a critical path, so merge risk is moderate despite being localized.
Pay close attention to lib/backup.sh - mount check logic rejects valid subdirectories under mounted drives.

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="lib/backup.sh">

<violation number="1" location="lib/backup.sh:30">
P2: The mount check incorrectly rejects valid subdirectories under mounted drives because `mountpoint` only succeeds for the mount root. A destination like `/mnt/ssd/backups` will fail even when `/mnt/ssd` is mounted. Consider checking the mount point of the destination’s filesystem instead of the destination path itself.</violation>
</file>
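
One possible shape for a check that looks at the destination's filesystem rather than the destination path itself is sketched below. It is an illustration under stated assumptions, not the code that was merged: destination_is_mounted and mount_root_component are hypothetical names, and the sketch assumes the drive is mounted directly under /mnt or /media (for example /mnt/ssd). `findmnt --target` reports the mount point of the filesystem containing a path, so a result of `/` is treated here as "nothing is mounted above this path".

```shell
# Hypothetical helper: print "/mnt/ssd" for "/mnt/ssd/backups/daily".
# Fails for paths outside /mnt/* and /media/*.
mount_root_component() {
    local path="$1" prefix rest
    case "$path" in
        /mnt/?*)   prefix=/mnt ;;
        /media/?*) prefix=/media ;;
        *) return 1 ;;
    esac
    rest="${path#"$prefix"/}"               # e.g. "ssd/backups/daily"
    printf '%s/%s\n' "$prefix" "${rest%%/*}"
}

# Hypothetical check: succeed only if the destination sits on a mounted,
# non-root filesystem.
destination_is_mounted() {
    local dest="$1" target root
    if command -v findmnt >/dev/null 2>&1; then
        # -n drops the header line, -o TARGET prints only the mount point.
        target="$(findmnt -n -o TARGET --target "$dest" 2>/dev/null)" || return 1
        [ -n "$target" ] && [ "$target" != "/" ]
        return
    fi
    # Fallback when findmnt is unavailable: test the first component below
    # /mnt or /media with mountpoint.
    root="$(mount_root_component "$dest")" || return 1
    mountpoint -q "$root"
}
```

With this shape, `/mnt/ssd/backups` passes whenever `/mnt/ssd` is mounted, which is the case the plain `mountpoint` call on the destination rejects.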

lib/backup.sh

- Use findmnt --target instead of mountpoint -q to correctly handle subdirectories under mount points (e.g. /mnt/ssd/backups when /mnt/ssd is the actual mount). Falls back to mountpoint on the root component when findmnt is unavailable.
- _check_backup_freshness now returns distinct exit codes (0=fresh, 1=stale, 2=unmounted) so the caller uses the exit code instead of re-running the mount check, which eliminates the duplicated logic.
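
The exit-code convention (0=fresh, 1=stale, 2=unmounted) lets the caller choose the alert text without repeating the mount check. A minimal sketch of such a caller follows. report_freshness is a hypothetical name and the real dispatch in lib/health-check.sh may look different. The two alert strings are the ones quoted in the review summary.

```shell
# Hypothetical dispatcher: runs the given check command and prints the alert
# text that matches its exit code (nothing is printed when backups are fresh).
report_freshness() {
    local status=0
    # `|| status=$?` captures the exit code without aborting under `set -e`.
    "$@" || status=$?
    case "$status" in
        0) ;;                                   # fresh: no alert
        1) echo "Backups are stale" ;;
        2) echo "Backup drive not mounted" ;;
        *) echo "Backup freshness check failed (exit $status)" ;;
    esac
}
```

A caller could use it as `message="$(report_freshness _check_backup_freshness)"` and send an alert only when `message` is non-empty.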

lib/health-check.sh

Source telegram.sh in health-check.sh and delegate alert sending to telegram_send_message, which already handles retries via telegram_api, message chunking for >4096 chars, and parse-mode fallback. Removes ~30 lines of duplicated curl/retry logic from _send_alert.

gemini-code-assist bot reviewed Feb 10, 2026

cubic-dev-ai bot reviewed Feb 10, 2026

edgarjs reviewed Feb 10, 2026

edgarjs approved these changes Feb 10, 2026

claudio-pi merged commit 8fcd9b7 into main Feb 10, 2026
4 checks passed

claudio-pi deleted the fix/backup-alert-reliability branch February 10, 2026 16:49
