Commit d78584b

T7101: Add hardware watchdog support via systemd
1 parent 0cc6bbe commit d78584b

2 files changed: +192 -0 lines

docs/configuration/system/index.rst

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ System
 task-scheduler
 time-zone
 updates
+watchdog


 .. toctree::
Lines changed: 191 additions & 0 deletions
@@ -0,0 +1,191 @@
.. _system_watchdog:

########
Watchdog
########

VyOS supports hardware watchdog timers to automatically reboot the system if
it becomes unresponsive. This is particularly useful for remote or embedded
systems where physical access is limited.

A watchdog timer is a hardware or software timer that resets the system unless
it is refreshed regularly. While the system is healthy, systemd periodically
notifies the watchdog that the system is still running, which restarts the
countdown. If the watchdog is not notified within the configured timeout
period, it assumes the system has hung and resets it.
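
To make the mechanism concrete, here is a minimal Python model of a watchdog
timer. It is an illustrative sketch only and does not reflect how VyOS, systemd,
or the kernel implement it. The ``WatchdogModel`` class and its methods are
hypothetical, and time is passed in explicitly so that the behaviour is
deterministic.

.. code-block:: python

   class WatchdogModel:
       """Hypothetical model of a watchdog timer (illustration only)."""

       def __init__(self, timeout: float, start: float = 0.0) -> None:
           if timeout <= 0:
               raise ValueError("timeout must be positive")
           self.timeout = timeout
           # Time of the most recent notification ("kick") from the system.
           self.last_kick = start

       def kick(self, now: float) -> None:
           # A healthy system calls this periodically to restart the countdown.
           self.last_kick = now

       def expired(self, now: float) -> bool:
           # The watchdog fires once a full timeout passes without a kick.
           return now - self.last_kick >= self.timeout


   wd = WatchdogModel(timeout=10)
   for t in range(0, 30, 5):  # kicks at t = 0, 5, ..., 25
       wd.kick(t)
   print(wd.expired(30))  # False: last kick at t=25, only 5 s ago
   print(wd.expired(35))  # True: 10 s without a kick, the system would be reset

If the system stops kicking because it has hung, ``expired`` eventually returns
``True``. That is the point at which a real watchdog would reset the machine.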

Configuration
=============

The watchdog feature is configured under the ``system watchdog`` configuration
tree. The presence of the ``system watchdog`` node enables the watchdog feature.

.. cfgcmd:: set system watchdog

   Enable watchdog support. Creating the ``system watchdog`` configuration
   node is sufficient to enable the feature; all other options below are
   optional.

.. cfgcmd:: set system watchdog module <module-name>

   Specify the kernel module to load for the watchdog device.

   **In most cases, this option is not required**, as the kernel automatically
   loads the appropriate hardware watchdog module for your system. Only use
   this option if the kernel does not load the required module automatically,
   or if you want to use the software watchdog (``softdog``) instead of a
   hardware watchdog.

   Common modules include:

   * ``softdog`` - Software watchdog timer (available on all systems)
   * ``iTCO_wdt`` - Intel TCO watchdog timer
   * ``sp5100_tco`` - AMD SP5100 TCO watchdog timer
   * ``i6300esb`` - Intel 6300ESB watchdog timer

   .. warning:: ``softdog`` is not a real hardware watchdog; it is implemented
      using kernel timers. Use it only if the system does not provide a real
      hardware watchdog. Hardware watchdog modules are more reliable because
      they operate independently of the operating system kernel and can still
      reset the system if the kernel itself hangs.

   If no module is specified, VyOS attempts to use an existing
   ``/dev/watchdog0`` device, if one is available.

   Example:

   .. code-block:: none

      set system watchdog module softdog
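
Before deciding whether ``module`` is needed, it can help to check which
watchdog devices the kernel has already registered. The sketch below assumes a
Linux kernel that lists watchdog devices under ``/sys/class/watchdog``, each
with an ``identity`` attribute naming the driver. Whether that attribute exists
depends on the kernel configuration, so it may be missing. The
``list_watchdogs`` helper is hypothetical and not part of VyOS.

.. code-block:: python

   from pathlib import Path


   def list_watchdogs(base: str = "/sys/class/watchdog") -> dict[str, str]:
       """Map watchdog device names (e.g. "watchdog0") to their driver identity."""
       devices: dict[str, str] = {}
       root = Path(base)
       if not root.is_dir():
           return devices  # no watchdog class directory: nothing registered
       for entry in sorted(root.iterdir()):
           identity_file = entry / "identity"
           # The identity attribute names the driver, if the kernel exposes it.
           if identity_file.is_file():
               devices[entry.name] = identity_file.read_text().strip()
           else:
               devices[entry.name] = "unknown"
       return devices


   if __name__ == "__main__":
       for name, identity in list_watchdogs().items():
           print(f"/dev/{name}: {identity}")

An empty result suggests that no watchdog driver has been loaded. That is the
situation in which the ``module`` option is useful.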

.. cfgcmd:: set system watchdog timeout <seconds>

   Set the watchdog timeout for normal runtime operation in seconds.

   Valid range: 1-86400 seconds (1 second to 24 hours)

   Default: 10 seconds

   This is the interval during which the system must respond to the watchdog.
   If the system does not respond within this time, the watchdog will trigger
   a reboot.

   Example:

   .. code-block:: none

      set system watchdog timeout 30
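
For scripts that generate VyOS configuration, the documented range and default
can be mirrored in a small validation function. This is a hedged sketch: the
``runtime_timeout`` helper is hypothetical. It only reproduces the range and
default listed above, and is not how VyOS validates the value.

.. code-block:: python

   from typing import Optional

   MIN_TIMEOUT = 1               # seconds, documented lower bound
   MAX_TIMEOUT = 86400           # seconds (24 hours), documented upper bound
   DEFAULT_RUNTIME_TIMEOUT = 10  # seconds, used when "timeout" is not set


   def runtime_timeout(value: Optional[int] = None) -> int:
       """Return the effective runtime timeout, rejecting out-of-range values."""
       if value is None:
           return DEFAULT_RUNTIME_TIMEOUT
       if not MIN_TIMEOUT <= value <= MAX_TIMEOUT:
           raise ValueError(
               f"timeout must be between {MIN_TIMEOUT} and {MAX_TIMEOUT} seconds"
           )
       return value


   print(runtime_timeout())    # 10 (default)
   print(runtime_timeout(30))  # 30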

.. cfgcmd:: set system watchdog shutdown-timeout <seconds>

   Set the watchdog timeout during system shutdown in seconds.

   Valid range: 1-86400 seconds (1 second to 24 hours)

   Default: 120 seconds

   This extended timeout allows the system to complete a graceful shutdown
   without triggering the watchdog.

   .. warning:: Setting this value below the recommended minimum of 120
      seconds may cause unclean shutdowns: the watchdog can reset the system
      before all services have stopped and disk buffers have been flushed.

   Example:

   .. code-block:: none

      set system watchdog shutdown-timeout 180

.. cfgcmd:: set system watchdog reboot-timeout <seconds>

   Set the watchdog timeout during system reboot in seconds.

   Valid range: 1-86400 seconds (1 second to 24 hours)

   Default: 120 seconds

   This extended timeout allows the system to complete the reboot process
   without triggering the watchdog during the transition.

   .. warning:: Setting this value below the recommended minimum of 120
      seconds may cause unclean reboots: the watchdog can reset the system
      before all services have stopped.

   Example:

   .. code-block:: none

      set system watchdog reboot-timeout 180
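
The 120-second recommendation for ``shutdown-timeout`` and ``reboot-timeout``
can be checked in a script before committing. In this hypothetical helper,
unset values fall back to the documented 120-second default. Values below the
recommended minimum produce a warning rather than an error, because they are
still inside the valid range. The function is an illustration, not part of
VyOS.

.. code-block:: python

   from typing import Optional

   DEFAULT_TRANSITION_TIMEOUT = 120  # seconds, documented default
   RECOMMENDED_MIN_TRANSITION = 120  # seconds, documented recommended minimum


   def transition_timeout_warnings(
       shutdown: Optional[int] = None, reboot: Optional[int] = None
   ) -> list[str]:
       """Return warnings for shutdown/reboot timeouts below the recommendation."""
       warnings = []
       for option, value in (("shutdown-timeout", shutdown), ("reboot-timeout", reboot)):
           effective = DEFAULT_TRANSITION_TIMEOUT if value is None else value
           if not 1 <= effective <= 86400:
               raise ValueError(f"{option} must be between 1 and 86400 seconds")
           if effective < RECOMMENDED_MIN_TRANSITION:
               warnings.append(
                   f"{option} {effective}s is below the recommended "
                   f"{RECOMMENDED_MIN_TRANSITION}s and may cause unclean restarts"
               )
       return warnings


   print(transition_timeout_warnings(shutdown=180, reboot=60))
   # ['reboot-timeout 60s is below the recommended 120s and may cause unclean restarts']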

Examples
========

Basic Configuration with Software Watchdog
-------------------------------------------

This example configures a basic software watchdog with default timeouts:

.. code-block:: none

   set system watchdog
   set system watchdog module softdog

This will:

* Enable the watchdog feature
* Load the ``softdog`` kernel module
* Use a 10-second runtime timeout (default)
* Use 120-second shutdown and reboot timeouts (default)

Advanced Configuration
----------------------

This example shows a more customized configuration suitable for a production
system:

.. code-block:: none

   set system watchdog
   set system watchdog module iTCO_wdt
   set system watchdog timeout 30
   set system watchdog shutdown-timeout 300
   set system watchdog reboot-timeout 300

This configuration:

* Enables the watchdog feature
* Loads the Intel TCO hardware watchdog module
* Sets a 30-second runtime timeout
* Allows 5 minutes for shutdown and reboot operations
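
When the same watchdog settings are rolled out to many routers, it may be
convenient to generate the commands from a small set of options. The sketch
below uses a hypothetical helper that is not a VyOS API. It only emits the
``set system watchdog`` commands documented on this page, and it reproduces
the advanced configuration above when given the same values.

.. code-block:: python

   from typing import Optional


   def watchdog_commands(
       module: Optional[str] = None,
       timeout: Optional[int] = None,
       shutdown_timeout: Optional[int] = None,
       reboot_timeout: Optional[int] = None,
   ) -> list[str]:
       """Build VyOS 'set' commands; options left as None keep their defaults."""
       commands = ["set system watchdog"]  # the node itself enables the feature
       options = (
           ("module", module),
           ("timeout", timeout),
           ("shutdown-timeout", shutdown_timeout),
           ("reboot-timeout", reboot_timeout),
       )
       for name, value in options:
           if value is not None:
               commands.append(f"set system watchdog {name} {value}")
       return commands


   for line in watchdog_commands("iTCO_wdt", 30, 300, 300):
       print(line)

Options that are left unset are simply omitted, so VyOS applies its documented
defaults for them.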

Best Practices
==============

* **Start with conservative timeouts**: Use longer timeouts initially and
  reduce them as you gain confidence in system stability.

* **Test before deployment**: Verify that the watchdog works as expected in a
  non-production environment before deploying to production systems.

* **Choose appropriate modules**: Use hardware watchdog modules (like
  ``iTCO_wdt``) when available, as they are more reliable than software
  watchdogs.

* **Consider shutdown time**: Set ``shutdown-timeout`` and ``reboot-timeout``
  values high enough to allow for normal shutdown procedures, especially on
  systems with many services or slow storage.

* **Monitor watchdog events**: Check the system logs after any unexpected
  reboot to determine whether the watchdog triggered it.

* **Remote systems**: For systems without physical console access, use
  conservative timeout values to avoid false-positive reboots under high
  load.
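
Besides the system logs, some watchdog drivers record whether the previous
reset was caused by the watchdog. The sketch below assumes the kernel exposes a
``bootstatus`` attribute under ``/sys/class/watchdog/<device>/``. It tests the
``WDIOF_CARDRESET`` flag (``0x0020``) defined by the Linux watchdog API. Not
every driver reports this flag, so treat a negative result as inconclusive.
The ``rebooted_by_watchdog`` helper is hypothetical.

.. code-block:: python

   from pathlib import Path
   from typing import Optional

   WDIOF_CARDRESET = 0x0020  # Linux watchdog flag: card previously reset the CPU


   def rebooted_by_watchdog(
       device: str = "watchdog0", base: str = "/sys/class/watchdog"
   ) -> Optional[bool]:
       """Return True/False from the driver's bootstatus, or None if unavailable."""
       status_file = Path(base) / device / "bootstatus"
       try:
           status = int(status_file.read_text().strip())  # kernel prints decimal
       except (OSError, ValueError):
           return None  # attribute missing, unreadable, or not a number
       return bool(status & WDIOF_CARDRESET)


   if __name__ == "__main__":
       result = rebooted_by_watchdog()
       if result is None:
           print("bootstatus not available; check the system logs instead")
       elif result:
           print("last reset was caused by the watchdog")
       else:
           print("no watchdog reset reported by the driver")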

.. note:: Watchdog configuration takes effect when you commit it. The commit
   process automatically reloads systemd so that the new settings are
   applied.

.. warning:: Incorrect watchdog configuration on remote systems can result
   in unexpected reboots. Always test watchdog settings in a controlled
   environment before deploying to production systems.
