| 1 | +.. _system_watchdog: |
| 2 | + |
| 3 | +######## |
| 4 | +Watchdog |
| 5 | +######## |
| 6 | + |
| 7 | +VyOS supports hardware watchdog timers to automatically reboot the system if |
| 8 | +it becomes unresponsive. This is particularly useful for remote or embedded |
| 9 | +systems where physical access is limited. |
| 10 | + |
| 11 | +A watchdog timer is a hardware or software mechanism that automatically resets |
| 12 | +the system if the operating system fails to respond within a configured timeout |
| 13 | +period. While running normally, the system periodically notifies the watchdog |
| 14 | +that it is still alive. If no notification arrives within the timeout period, |
| 15 | +the watchdog resets the system. |
| 16 | + |
| 17 | +Configuration |
| 18 | +============= |
| 19 | + |
| 20 | +The watchdog feature is configured under the ``system watchdog`` configuration |
| 21 | +tree. The presence of the ``system watchdog`` node enables the watchdog feature. |
| 22 | + |
| 23 | +.. cfgcmd:: set system watchdog |
| 24 | + |
| 25 | + Enable hardware watchdog support. This command creates the watchdog |
| 26 | + configuration node, which automatically enables watchdog functionality. |
| 27 | + |
| 28 | +.. cfgcmd:: set system watchdog module <module-name> |
| 29 | + |
| 30 | + Specify the kernel module to load for the watchdog device. |
| 31 | + |
| 32 | + **In most cases, this option is not required**, because the kernel automatically |
| 33 | + loads the appropriate hardware watchdog module for your system. Use this option |
| 34 | + only if the kernel fails to load the required module automatically, or if you |
| 35 | + want to use the software watchdog (``softdog``) instead of a hardware |
| 36 | + watchdog. |
| 37 | + |
| 38 | + Common modules include: |
| 39 | + |
| 40 | + * ``softdog`` - Software watchdog timer (available on all systems) |
| 41 | + * ``iTCO_wdt`` - Intel TCO watchdog timer |
| 42 | + * ``sp5100_tco`` - AMD SP5100 TCO watchdog timer |
| 43 | + * ``i6300esb`` - Intel 6300ESB watchdog timer |
| 44 | + |
| 45 | + .. warning:: ``softdog`` is not a real hardware watchdog; it is implemented |
| 46 | + using kernel timers. Use it only if the system does not provide a real |
| 47 | + hardware watchdog. Hardware watchdogs are more reliable because their timer |
| 48 | + runs independently of the operating system kernel. |
| 49 | + |
| 50 | + If no module is specified, VyOS attempts to use an existing |
| 51 | + ``/dev/watchdog0`` device. |
| 52 | + |
| 53 | + Example: |
| 54 | + |
| 55 | + .. code-block:: none |
| 56 | + |
| 57 | + set system watchdog module softdog |
| 58 | + |
| 59 | +.. cfgcmd:: set system watchdog timeout <seconds> |
| 60 | + |
| 61 | + Set the watchdog timeout for normal runtime operation in seconds. |
| 62 | + |
| 63 | + Valid range: 1-86400 seconds (1 second to 24 hours) |
| 64 | + |
| 65 | + Default: 10 seconds |
| 66 | + |
| 67 | + This is the maximum time allowed between the system's notifications to the |
| 68 | + watchdog. If the system does not notify the watchdog within this time, the |
| 69 | + watchdog triggers a reboot. |
| 70 | + |
| 71 | + Example: |
| 72 | + |
| 73 | + .. code-block:: none |
| 74 | + |
| 75 | + set system watchdog timeout 30 |
| 76 | + |
| 77 | +.. cfgcmd:: set system watchdog shutdown-timeout <seconds> |
| 78 | + |
| 79 | + Set the watchdog timeout during system shutdown in seconds. |
| 80 | + |
| 81 | + Valid range: 1-86400 seconds (1 second to 24 hours) |
| 82 | + |
| 83 | + Default: 120 seconds |
| 84 | + |
| 85 | + This extended timeout allows the system to complete a graceful shutdown |
| 86 | + without triggering the watchdog. |
| 87 | + |
| 88 | + .. warning:: Setting this value below the recommended minimum of 120 |
| 89 | + seconds may cause unclean shutdowns, because the system may not have |
| 90 | + enough time to properly stop all services and flush disk buffers |
| 91 | + before the watchdog resets it. |
| 92 | + |
| 93 | + Example: |
| 94 | + |
| 95 | + .. code-block:: none |
| 96 | + |
| 97 | + set system watchdog shutdown-timeout 180 |
| 98 | + |
| 99 | +.. cfgcmd:: set system watchdog reboot-timeout <seconds> |
| 100 | + |
| 101 | + Set the watchdog timeout during system reboot in seconds. |
| 102 | + |
| 103 | + Valid range: 1-86400 seconds (1 second to 24 hours) |
| 104 | + |
| 105 | + Default: 120 seconds |
| 106 | + |
| 107 | + This extended timeout allows the system to complete the reboot process |
| 108 | + without triggering the watchdog during the transition. |
| 109 | + |
| 110 | + .. warning:: Setting this value below the recommended minimum of 120 |
| 111 | + seconds may cause unclean reboots, because the system may not have |
| 112 | + enough time to properly stop all services before the watchdog |
| 113 | + resets it. |
| 114 | + |
| 115 | + Example: |
| 116 | + |
| 117 | + .. code-block:: none |
| 118 | + |
| 119 | + set system watchdog reboot-timeout 180 |
| 120 | + |
| 121 | +Examples |
| 122 | +======== |
| 123 | + |
| 124 | +Basic Configuration with Software Watchdog |
| 125 | +------------------------------------------- |
| 126 | + |
| 127 | +This example configures a basic software watchdog with default timeouts: |
| 128 | + |
| 129 | +.. code-block:: none |
| 130 | + |
| 131 | + set system watchdog |
| 132 | + set system watchdog module softdog |
| 133 | + |
| 134 | +This will: |
| 135 | + |
| 136 | +* Enable the watchdog feature |
| 137 | +* Load the ``softdog`` kernel module |
| 138 | +* Use a 10-second runtime timeout (default) |
| 139 | +* Use 120-second shutdown and reboot timeouts (default) |
| 140 | + |
| 141 | +Advanced Configuration |
| 142 | +---------------------- |
| 143 | + |
| 144 | +This example shows a more customized configuration suitable for a production |
| 145 | +system: |
| 146 | + |
| 147 | +.. code-block:: none |
| 148 | + |
| 149 | + set system watchdog |
| 150 | + set system watchdog module iTCO_wdt |
| 151 | + set system watchdog timeout 30 |
| 152 | + set system watchdog shutdown-timeout 300 |
| 153 | + set system watchdog reboot-timeout 300 |
| 154 | + |
| 155 | +This configuration: |
| 156 | + |
| 157 | +* Enables the watchdog feature |
| 158 | +* Loads the Intel TCO hardware watchdog module |
| 159 | +* Sets a 30-second runtime timeout |
| 160 | +* Allows 5 minutes for shutdown and reboot operations |
| 161 | + |
| 162 | +Best Practices |
| 163 | +============== |
| 164 | + |
| 165 | +* **Start with conservative timeouts**: Use longer timeouts initially and |
| 166 | + reduce them as you gain confidence in system stability. |
| 167 | + |
| 168 | +* **Test before deployment**: Verify the watchdog works as expected in a |
| 169 | + non-production environment before deploying to production systems. |
| 170 | + |
| 171 | +* **Choose appropriate modules**: Use hardware watchdog modules (like |
| 172 | + ``iTCO_wdt``) when available, as they are more reliable than software |
| 173 | + watchdogs. |
| 174 | + |
| 175 | +* **Consider shutdown time**: Set ``shutdown-timeout`` and ``reboot-timeout`` |
| 176 | + values high enough to allow for normal shutdown procedures, especially on |
| 177 | + systems with many services or slow storage. |
| 178 | + |
| 179 | +* **Monitor watchdog events**: Check system logs after any unexpected reboots |
| 180 | + to determine if the watchdog triggered the reboot. |
| 181 | + |
| 182 | +* **Remote systems**: For systems without physical console access, use |
| 183 | + conservative timeout values to avoid false-positive reboots during high |
| 184 | + load conditions. |
| 185 | + |
| 186 | +.. note:: The watchdog configuration takes effect as soon as it is committed. |
| 187 | + The systemd reload that this requires happens automatically during commit. |
| 188 | + |
| 189 | +.. warning:: Incorrect watchdog configuration on remote systems can result |
| 190 | + in unexpected reboots. Always test watchdog settings in a controlled |
| 191 | + environment before deploying to production systems. |
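
As a rough illustration of the "start with conservative timeouts" and
"consider shutdown time" recommendations in the Best Practices list, the
following sketch begins with deliberately long timeouts. The specific values
(60, 300, 300) are assumptions chosen only for illustration, not tested
recommendations. No ``module`` is set, so the sketch also assumes that a usable
``/dev/watchdog0`` device already exists on the system.

.. code-block:: none

   set system watchdog
   set system watchdog timeout 60
   set system watchdog shutdown-timeout 300
   set system watchdog reboot-timeout 300

Once the system has run stably with these settings, the runtime timeout could
be lowered step by step, for example toward the 10-second default, while
keeping the shutdown and reboot timeouts at or above 120 seconds.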