Server monitoring
Wormly is used for basic monitoring to verify a site is alive. An account costs AUD 25 per month. That includes alerts via SMS when a server is down. Here are some examples of the Australian configuration:
- https://openfoodnetwork.org.au (check every 30 seconds, alert after 5 minutes)
  - Expected text: Food, unincorporated
  - Expected HTTP response: 200 OK
  - Min. SSL cert. validity (days): 5
- https://openfoodnetwork.org.au/ (check every 12 hours, alert after 5 minutes)
  - Expected HTTP response: 200 OK
  - Min. SSL cert. validity (days): 20
- https://openfoodnetwork.org.au/api/status/job_queue (check every 2 minutes, alert after 5 minutes)
  - Expected text: {"alive":true}
  - Expected HTTP response: 200 OK
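To make the meaning of these settings concrete, here is a minimal Python sketch of how one such check could be evaluated. It is not Wormly's implementation or API: `CheckSpec`, `evaluate` and `should_alert` are hypothetical names, and the exact comparison rules (substring match for the expected text, whole days for certificate validity) are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class CheckSpec:
    """Hypothetical description of one uptime check (not a Wormly object)."""
    url: str
    expected_status: int = 200
    expected_text: Optional[str] = None
    min_cert_days: Optional[int] = None


def evaluate(spec: CheckSpec, status: int, body: str,
             cert_expires: Optional[datetime] = None,
             now: Optional[datetime] = None) -> List[str]:
    """Return a list of failure reasons; an empty list means the check passed."""
    failures = []
    if status != spec.expected_status:
        failures.append(f"expected HTTP {spec.expected_status}, got {status}")
    # Assumption: "expected text" is a plain substring match on the body.
    if spec.expected_text is not None and spec.expected_text not in body:
        failures.append(f"expected text {spec.expected_text!r} not found")
    if spec.min_cert_days is not None and cert_expires is not None:
        now = now or datetime.now(timezone.utc)
        days_left = (cert_expires - now).days
        if days_left < spec.min_cert_days:
            failures.append(f"certificate expires in {days_left} days")
    return failures


def should_alert(failing_since: Optional[datetime], now: datetime,
                 alert_after: timedelta = timedelta(minutes=5)) -> bool:
    """Alert only once a check has been failing for at least `alert_after`."""
    return failing_since is not None and now - failing_since >= alert_after


# Two of the checks listed above, expressed as specs.
HOMEPAGE = CheckSpec("https://openfoodnetwork.org.au",
                     expected_text="Food, unincorporated", min_cert_days=5)
JOB_QUEUE = CheckSpec("https://openfoodnetwork.org.au/api/status/job_queue",
                      expected_text='{"alive":true}')
```

The caller is expected to fetch the page and the certificate expiry itself and pass the results in, which keeps the comparison logic free of network access and easy to test.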
The different alert times are configured via alert groups. We have separate groups for the delayed job check and for SSL/TLS certificates so that a dev doesn't freak out when a certificate is due to expire in 20 days. They then have time to fix the configuration.
All managed instances are monitored by New Relic. Application Performance Monitoring (APM) is only activated for au-prod because it slows down server response times by 30% and our plan may not allow for all the data of all instances.
Alerts are set up for three infrastructure conditions:
- Host not responding - selected hosts only; new hosts need to be added to this condition manually.
- Memory almost full - all hosts (90%)
- Disk almost full - all hosts (90%)
Notifications go to the Slack channel #devops-alerts. You can also set up your email or mobile phone app to receive notifications.
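For a quick local sanity check of the two "almost full" conditions, the 90% threshold can be sketched in Python as follows. This is an illustration only, not New Relic's alert configuration: the function names are hypothetical, and treating exactly 90% as "almost full" is an assumption.

```python
import shutil

THRESHOLD_PERCENT = 90  # matches the "almost full" conditions listed above


def percent_used(used: float, total: float) -> float:
    """Return usage as a percentage of the total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def almost_full(used: float, total: float,
                threshold: float = THRESHOLD_PERCENT) -> bool:
    """True when usage has reached the threshold (assumed inclusive)."""
    return percent_used(used, total) >= threshold


def disk_almost_full(path: str = "/") -> bool:
    """Apply the threshold to the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return almost_full(usage.used, usage.total)
```

Running `disk_almost_full("/")` on a host reports whether its root filesystem would currently meet the disk condition under these assumptions.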
Our account has not-for-profit status through the Open Food Foundation in Australia.
Simple availability checker: https://kuma.openfoodnetwork.org.uk/status/global
We used Datadog for several years but it got too expensive because they charge per host and each country has its own server. So we switched to New Relic.