docs/configuration/system/index.rst (1 addition, 0 deletions)
@@ -26,6 +26,7 @@ System
task-scheduler
time-zone
updates
watchdog


.. toctree::
docs/configuration/system/watchdog.rst (191 additions, 0 deletions)
@@ -0,0 +1,191 @@
.. _system_watchdog:

########
Watchdog
########

VyOS supports hardware and software watchdog timers, which automatically reboot
the system if it becomes unresponsive. This is particularly useful for remote or
embedded systems where physical access is limited.

A watchdog timer is a hardware or software mechanism that resets the system
when the operating system stops responding. While the system is healthy, it
periodically notifies the watchdog that it is still running. If the watchdog
receives no notification within the configured timeout period, it resets the
system.
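
The notify-or-reset cycle can be modelled in a few lines. The following Python
sketch is a simplified simulation, not VyOS or kernel code: the
``WatchdogTimer`` class is hypothetical, and time is passed in explicitly
instead of being read from a clock so that the behaviour is easy to follow.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass
   class WatchdogTimer:
       """Hypothetical model of a watchdog that fires if not notified in time."""

       timeout: float  # seconds allowed between two notifications
       last_notify: float = 0.0

       def notify(self, now: float) -> None:
           """The system reports that it is still alive."""
           self.last_notify = now

       def expired(self, now: float) -> bool:
           """True once more than `timeout` seconds passed without a notification."""
           return now - self.last_notify > self.timeout


   wd = WatchdogTimer(timeout=10)
   for t in (3, 6, 9):      # healthy system: notifications keep arriving
       wd.notify(t)
   print(wd.expired(15))    # 6 s since the last notification: False
   print(wd.expired(20))    # 11 s since the last notification: True

A real watchdog does not wait to be asked: once the deadline passes, it resets
the machine on its own, which is why a hardware watchdog still works when the
operating system has hung.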

Configuration
=============

The watchdog feature is configured under the ``system watchdog`` configuration
tree. The presence of the ``system watchdog`` node enables the watchdog feature.

.. cfgcmd:: set system watchdog

   Enable watchdog support by creating the ``system watchdog`` configuration
   node. Without further options, VyOS uses an existing watchdog device, if
   available, together with the default timeouts.

.. cfgcmd:: set system watchdog module <module-name>

   Specify the kernel module to load for the watchdog device.

   **In most cases, this option is not required**, as the kernel automatically
   loads the appropriate hardware watchdog module for your system. Use this
   option only if the kernel does not load the required module on its own, or
   if you want to use the software watchdog (``softdog``), which is not tied to
   any hardware and is therefore not loaded automatically.

   Common modules include:

   * ``softdog`` - Software watchdog timer (available on all systems)
   * ``iTCO_wdt`` - Intel TCO watchdog timer
   * ``sp5100_tco`` - AMD SP5100 TCO watchdog timer
   * ``i6300esb`` - Intel 6300ESB watchdog timer (also emulated by QEMU/KVM)

   .. warning:: ``softdog`` is not a hardware watchdog; it is implemented
      using kernel timers. Use it only if the system has no hardware
      watchdog. Hardware watchdogs are more reliable because they operate
      independently of the operating system kernel, whereas ``softdog`` may
      be unable to reset the system if the kernel itself hangs.

   If no module is specified, VyOS uses an existing ``/dev/watchdog0`` device
   if one is available.

   Example:

   .. code-block:: none

      set system watchdog module softdog
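
Before deciding whether the ``module`` option is needed, it can help to check
which watchdog devices the kernel has already registered. The following
read-only Python sketch lists them from ``/sys/class/watchdog``. It assumes
the kernel exposes the standard ``identity`` and ``timeout`` sysfs attributes
(they depend on the ``CONFIG_WATCHDOG_SYSFS`` kernel option), and
``list_watchdogs`` is a hypothetical helper, not part of VyOS. It never opens
``/dev/watchdog*``, because opening that device can arm the timer.

.. code-block:: python

   from __future__ import annotations

   from pathlib import Path


   def list_watchdogs(
       sysfs_root: str = "/sys/class/watchdog",
   ) -> dict[str, dict[str, str | None]]:
       """Map each registered watchdog (e.g. "watchdog0") to its sysfs info.

       Attributes the kernel does not expose are reported as None.
       """
       devices: dict[str, dict[str, str | None]] = {}
       root = Path(sysfs_root)
       if not root.is_dir():
           return devices  # no watchdog driver has registered a device
       for dev in sorted(root.glob("watchdog*")):
           info: dict[str, str | None] = {}
           for attr in ("identity", "timeout"):
               path = dev / attr
               info[attr] = path.read_text().strip() if path.is_file() else None
           devices[dev.name] = info
       return devices


   if __name__ == "__main__":
       found = list_watchdogs()
       if not found:
           print("No watchdog device registered")
       for name, info in found.items():
           print(f"/dev/{name}: identity={info['identity']} timeout={info['timeout']}")

An empty result suggests that no hardware watchdog driver was loaded, which is
the situation the ``module`` option (for example ``softdog``) is meant for.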

.. cfgcmd:: set system watchdog timeout <seconds>

   Set the watchdog timeout, in seconds, for normal runtime operation.

   Valid range: 1-86400 seconds (1 second to 24 hours)

   Default: 10 seconds

   This is the maximum time allowed between two notifications to the watchdog.
   If the system does not notify the watchdog within this time, the watchdog
   reboots the system.

   Example:

   .. code-block:: none

      set system watchdog timeout 30

.. cfgcmd:: set system watchdog shutdown-timeout <seconds>

   Set the watchdog timeout, in seconds, that applies while the system shuts
   down.

   Valid range: 60-86400 seconds (60 seconds to 24 hours)

   Default: 120 seconds

   This longer timeout gives the system enough time to complete a graceful
   shutdown without triggering the watchdog.

   .. warning:: Values below the recommended minimum of 120 seconds may cause
      unclean shutdowns, because the system may not have enough time to stop
      all services and flush disk buffers.

   Example:

   .. code-block:: none

      set system watchdog shutdown-timeout 180

.. cfgcmd:: set system watchdog reboot-timeout <seconds>

   Set the watchdog timeout, in seconds, that applies while the system reboots.

   Valid range: 60-86400 seconds (60 seconds to 24 hours)

   Default: 120 seconds

   This longer timeout gives the system enough time to complete the reboot
   process without triggering the watchdog during the transition.

   .. warning:: Values below the recommended minimum of 120 seconds may cause
      unclean reboots, because the system may not have enough time to stop all
      services before restarting.

   Example:

   .. code-block:: none

      set system watchdog reboot-timeout 180
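
The three timeout options share the same pattern: a valid range, a default
and, for shutdown and reboot, a recommended minimum. The Python sketch below
encodes those documented values so that planned settings can be checked
before they are rolled out. ``check_watchdog`` is a hypothetical helper; it
does not replace the checks VyOS itself performs.

.. code-block:: python

   from typing import Dict, List, Tuple

   # (minimum, maximum, default) in seconds, as documented above.
   LIMITS: Dict[str, Tuple[int, int, int]] = {
       "timeout": (1, 86400, 10),
       "shutdown-timeout": (60, 86400, 120),
       "reboot-timeout": (60, 86400, 120),
   }
   RECOMMENDED_MIN = 120  # recommended floor for shutdown-/reboot-timeout


   def check_watchdog(settings: Dict[str, int]) -> Tuple[Dict[str, int], List[str]]:
       """Return (effective values with defaults applied, warnings).

       Raises KeyError for unknown options and ValueError for values
       outside the documented range.
       """
       unknown = set(settings) - set(LIMITS)
       if unknown:
           raise KeyError(f"unknown option(s): {sorted(unknown)}")
       effective: Dict[str, int] = {}
       warnings: List[str] = []
       for name, (low, high, default) in LIMITS.items():
           value = settings.get(name, default)
           if not low <= value <= high:
               raise ValueError(f"{name} must be {low}-{high} seconds, got {value}")
           if name != "timeout" and value < RECOMMENDED_MIN:
               warnings.append(f"{name} {value} is below the recommended {RECOMMENDED_MIN} s")
           effective[name] = value
       return effective, warnings


   print(check_watchdog({"timeout": 30, "shutdown-timeout": 90}))

Here ``shutdown-timeout 90`` passes the range check but produces a warning,
and ``reboot-timeout`` falls back to its 120-second default.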

Examples
========

Basic Configuration with Software Watchdog
-------------------------------------------

This example configures a basic software watchdog with default timeouts:

.. code-block:: none

   set system watchdog
   set system watchdog module softdog

This will:

* Enable the watchdog feature
* Load the ``softdog`` kernel module
* Use a 10-second runtime timeout (default)
* Use 120-second shutdown and reboot timeouts (default)
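
After committing this configuration, you may want to confirm that the
``softdog`` module is actually loaded. On Linux, each line of
``/proc/modules`` begins with the name of a loaded module, and the Python
sketch below looks the name up there. ``module_loaded`` is a hypothetical
helper; a driver built directly into the kernel never appears in
``/proc/modules``, so a ``False`` result is not conclusive on its own.

.. code-block:: python

   def module_loaded(name: str, proc_modules: str = "/proc/modules") -> bool:
       """Return True if kernel module `name` is listed in /proc/modules.

       Each non-empty line of /proc/modules starts with a module name.
       Built-in drivers are not listed, so False is not conclusive.
       """
       with open(proc_modules) as f:
           return any(line.split()[0] == name for line in f if line.strip())

For example, ``module_loaded("softdog")`` returns ``True`` once ``softdog``
has been loaded as a module.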

Advanced Configuration
----------------------

This example shows a more customized configuration suitable for a production
system:

.. code-block:: none

   set system watchdog
   set system watchdog module iTCO_wdt
   set system watchdog timeout 30
   set system watchdog shutdown-timeout 300
   set system watchdog reboot-timeout 300

This configuration:

* Enables the watchdog feature
* Loads the Intel TCO hardware watchdog module
* Sets a 30-second runtime timeout
* Allows 5 minutes for shutdown and reboot operations
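
When the same watchdog settings are rolled out to many routers, it can help
to generate the commands from a template. The Python sketch below renders the
configuration above from ordinary values. ``watchdog_commands`` is a
hypothetical helper, not a VyOS API; it only produces the command text.

.. code-block:: python

   from typing import List, Optional

   TIMEOUT_OPTIONS = ("timeout", "shutdown_timeout", "reboot_timeout")


   def watchdog_commands(module: Optional[str] = None, **timeouts: int) -> List[str]:
       """Render 'set system watchdog ...' lines from Python values.

       Keyword names use underscores and are converted to the hyphenated
       CLI option names (shutdown_timeout -> shutdown-timeout).
       """
       unknown = set(timeouts) - set(TIMEOUT_OPTIONS)
       if unknown:
           raise TypeError(f"unexpected option(s): {sorted(unknown)}")
       commands = ["set system watchdog"]
       if module:
           commands.append(f"set system watchdog module {module}")
       for option in TIMEOUT_OPTIONS:  # fixed order keeps output stable
           if option in timeouts:
               cli_name = option.replace("_", "-")
               commands.append(f"set system watchdog {cli_name} {timeouts[option]}")
       return commands


   for line in watchdog_commands("iTCO_wdt", timeout=30,
                                 shutdown_timeout=300, reboot_timeout=300):
       print(line)

Running it prints the same five commands as the configuration above.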

Best Practices
==============

* **Start with conservative timeouts**: Use longer timeouts initially and
  reduce them as you gain confidence in system stability.

* **Test before deployment**: Verify the watchdog works as expected in a
  non-production environment before deploying to production systems.

* **Choose appropriate modules**: Use hardware watchdog modules (like
  ``iTCO_wdt``) when available, as they are more reliable than software
  watchdogs.

* **Consider shutdown time**: Set ``shutdown-timeout`` and ``reboot-timeout``
  values high enough to allow for normal shutdown procedures, especially on
  systems with many services or slow storage.

* **Monitor watchdog events**: Check the system logs after any unexpected
  reboot to determine whether the watchdog triggered it.

* **Remote systems**: For systems without physical console access, use
  conservative timeout values to avoid false-positive reboots during periods
  of high load.
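
One way to turn the *Consider shutdown time* advice into a number is to scale
the slowest shutdown you have observed by a safety factor. The Python sketch
below does this under stated assumptions: the factor of 2 is an arbitrary
illustrative choice, not a VyOS recommendation, and the result is clamped to
the documented recommended minimum (120 seconds) and maximum (86400 seconds).
``suggest_shutdown_timeout`` is a hypothetical helper.

.. code-block:: python

   import math


   def suggest_shutdown_timeout(measured_seconds: float, safety_factor: float = 2.0) -> int:
       """Suggest a shutdown-/reboot-timeout from the slowest observed shutdown.

       Assumption for illustration: scale by `safety_factor`, then clamp to the
       documented recommended minimum (120 s) and maximum (86400 s).
       """
       if measured_seconds < 0 or safety_factor < 1:
           raise ValueError("need measured_seconds >= 0 and safety_factor >= 1")
       suggested = math.ceil(measured_seconds * safety_factor)
       return max(120, min(86400, suggested))


   print(suggest_shutdown_timeout(45))   # 90 s is raised to the 120 s minimum
   print(suggest_shutdown_timeout(140))  # 280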

.. note:: The watchdog configuration takes effect as soon as it is committed.
   Applying it requires a systemd reload, which the commit performs
   automatically.

.. warning:: Incorrect watchdog configuration on remote systems can result
   in unexpected reboots. Always test watchdog settings in a controlled
   environment before deploying them to production systems.