-
Notifications
You must be signed in to change notification settings - Fork 0
Fix silent backup failures: retry alerts, detect unmounted drives #54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- _send_alert() now retries 3 times with exponential backoff and logs failures instead of silently discarding them (was `> /dev/null || true`) - backup_run() validates that /mnt/* and /media/* destinations are actually mounted before attempting rsync - _check_backup_freshness() detects unmounted backup destinations and triggers a specific "drive not mounted" alert
Summary of ChangesHello @claudio-pi, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the robustness and reliability of the backup system by addressing critical failure points in both the backup process and the alerting mechanism. It ensures that backup operations are not performed on unmounted drives and that all health check alerts are reliably delivered, preventing silent failures and providing more precise notifications about the state of backups. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces crucial reliability improvements by addressing silent failures in backup and alerting mechanisms. The addition of retry logic with exponential backoff in _send_alert is a robust solution to prevent lost alerts. Similarly, validating mount points before backup and in health checks is an excellent proactive measure against failures from disconnected drives. The code is well-structured and the changes are clear. I have one suggestion to improve maintainability by reducing code duplication in the health check script, aligning with our guidelines on refactoring common logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 issue found across 2 files
Confidence score: 3/5
- There is a concrete user-facing risk:
mountpointinlib/backup.shwill reject valid backup destinations under mounted drives (e.g.,/mnt/ssd/backups), which can cause backups to fail unexpectedly. - This is a medium-severity logic issue (6/10) in a critical path, so merge risk is moderate despite being localized.
- Pay close attention to
lib/backup.sh- mount check logic rejects valid subdirectories under mounted drives.
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="lib/backup.sh">
<violation number="1" location="lib/backup.sh:30">
P2: The mount check incorrectly rejects valid subdirectories under mounted drives because `mountpoint` only succeeds for the mount root. A destination like `/mnt/ssd/backups` will fail even when `/mnt/ssd` is mounted. Consider checking the mount point of the destination’s filesystem instead of the destination path itself.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
- Use findmnt --target instead of mountpoint -q to correctly handle subdirectories under mount points (e.g. /mnt/ssd/backups when /mnt/ssd is the actual mount). Falls back to mountpoint on the root component when findmnt is unavailable. - _check_backup_freshness now returns distinct exit codes (0=fresh, 1=stale, 2=unmounted) so the caller uses the exit code instead of re-running the mount check — eliminates the duplicated logic.
Source telegram.sh in health-check.sh and delegate alert sending to telegram_send_message, which already handles retries via telegram_api, message chunking for >4096 chars, and parse-mode fallback. Removes ~30 lines of duplicated curl/retry logic from _send_alert.
Summary
_send_alert()no longer silently swallows failures — retries 3x with exponential backoff (1s, 2s, 4s) and logs on final failure. This was the root cause of never receiving "backups are stale" alerts.backup_run()validates mount points for/mnt/*and/media/*destinations before running rsync. Prevents silent failures when an external drive disconnects._check_backup_freshness()detects unmounted drives and triggers a specific "drive not mounted" alert instead of the generic "backups are stale" message.Context
The SSD disconnected due to a USB controller error (~Feb 8). Backups failed every hour for days, health check detected it, but
_send_alert()discarded the curl response with> /dev/null 2>&1 || true— so the alert never reached Telegram.Test plan
bash -nsyntax validation passes/mnt/*and/media/*paths (won't break local directory backups or tests using/tmp)