
T7488: add utility for automatic rollback of section on apply stage error #4552


Merged
sever-sever merged 4 commits into vyos:current from the reset-section branch on Jun 12, 2025

Conversation

jestabro
Contributor

@jestabro jestabro commented Jun 10, 2025

Change summary

This needs the corresponding PR for vyatta-cfg (vyos/vyatta-cfg#102) to take effect within a config session. The current PR will need to be merged first.

Provide a utility for automatic rollback of a config section in case of an apply stage error.
This is the required tool for the VPP restart work of PR vyos/vyos-vpp#34.

Under the modern backend, this will simply be a post-commit hook. For current use under the legacy backend, however, it requires some workarounds, notably because the legacy locking mechanism prevents a post-commit hook from calling commit. This is already possible under vyconf, which uses distinct locks for data and session.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes)
  • Migration from an old Vyatta component to vyos-1x, please link to related PR inside obsoleted component
  • Other (please describe):

Related Task(s)

Related PR(s)

vyos/vyatta-cfg#102
vyos/vyos-vpp#34

How to test / Smoketest result

Tested by @natali-rs1985 in the context of vyos/vyos-vpp#34

Checklist:

  • I have read the CONTRIBUTING document
  • I have linked this PR to one or more Phabricator Task(s)
  • I have run the components SMOKETESTS if applicable
  • My commit headlines contain a valid Task id
  • My change requires a change to the documentation
  • I have updated the documentation accordingly

@jestabro jestabro self-assigned this Jun 10, 2025

github-actions bot commented Jun 10, 2025

👍
No issues in PR Title / Commit Title


@sever-sever sever-sever requested a review from Copilot June 10, 2025 19:48

@jestabro jestabro force-pushed the reset-section branch 3 times, most recently from e7a7b7d to 91523ff Compare June 10, 2025 21:39
@jestabro jestabro requested review from Copilot and sever-sever June 10, 2025 21:40

@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces an automatic rollback mechanism for configuration sections when an apply-stage error occurs. It adds a new error flag, emits a hint file on failure, and provides a helper script to roll back the affected section.

  • Introduce ERROR_COMMIT_APPLY in vyshim and configd to signal apply-stage failures
  • Create reset_section.py helper to roll back or retry a failed section based on a hint file
  • Extend ConfigSession with a shared mode to prevent premature session teardown

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File                          Description
----------------------------  ---------------------------------------------------------------------
src/shim/vyshim.c             Add ERROR_COMMIT_APPLY flag, parse session PID, and write a hint file
src/services/vyos-configd     Add ERROR_COMMIT_APPLY response code and separate commit vs apply logic
src/helpers/reset_section.py  New CLI helper for reloading or rolling back a section using the hint
python/vyos/configsession.py  Add shared parameter to skip teardown in shared-session scenarios
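The effect of the `shared` parameter can be illustrated with a stand-in class. This is a hedged sketch, not `vyos.configsession.ConfigSession`: `SketchSession`, its `teardown` method, the context-manager protocol, and the `torn_down` list are all hypothetical, and only the `shared=False` keyword mirrors the constructor signature quoted in this review.

```python
class SketchSession:
    """Hypothetical stand-in for a config session with a shared mode."""

    def __init__(self, session_id, shared=False, torn_down=None):
        self.session_id = session_id
        self.shared = shared
        # Record of torn-down session ids, used here instead of real cleanup.
        self._torn_down = torn_down if torn_down is not None else []

    def teardown(self):
        # A shared session is owned by someone else (for example the CLI
        # session that hit the error), so this object must not end it.
        if self.shared:
            return
        self._torn_down.append(self.session_id)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.teardown()
        return False


log = []
with SketchSession(8472, shared=True, torn_down=log):
    pass
with SketchSession(9000, torn_down=log):
    pass
print(log)  # only the non-shared session was torn down
```

The point of the flag in this sketch is ownership: a helper that attaches to an existing session leaves its lifetime to whoever created it.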
Comments suppressed due to low confidence (3)

src/helpers/reset_section.py:54

  • [nitpick] The variable name 'reload' shadows a built-in and the 'rollback' variable is never used. Rename 'reload' (e.g., to 'is_reload') and remove or utilize the unused 'rollback' flag.
reload = args.reload

python/vyos/configsession.py:149

  • The new 'shared' parameter alters teardown behavior but isn't documented. Please update the constructor docstring to explain its purpose and the effect on session cleanup.
def __init__(self, session_id, app=APP, shared=False):

src/helpers/reset_section.py:1

  • Consider adding automated tests for the reset_section helper to verify both reload and rollback flows, including cases where the hint file is present or absent.
#!/usr/bin/env python3
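
The flag handling and the hint-file cases raised in these comments might be exercised along the following lines. Nothing here is taken from `reset_section.py`: the `--reload`, `--rollback`, and `--hint` flag spellings, the `plan_action` function, and the hint file contents are assumptions. Only the idea of a reload or rollback choice driven by a hint file comes from the review.

```python
import argparse
import tempfile
from pathlib import Path


def parse_args(argv):
    parser = argparse.ArgumentParser(description='reset a config section')
    # Exactly one of the two modes must be chosen.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('--reload', action='store_true')
    group.add_argument('--rollback', action='store_true')
    parser.add_argument('--hint', type=Path, required=True)
    return parser.parse_args(argv)


def plan_action(argv):
    """Return a description of what would be done, without doing it."""
    args = parse_args(argv)
    is_reload = args.reload  # descriptive name, as the review suggests
    if not args.hint.exists():
        # No apply-stage error was recorded.
        return 'nothing to do'
    section = args.hint.read_text().strip()
    verb = 'reload' if is_reload else 'rollback'
    return f'{verb} {section}'


with tempfile.TemporaryDirectory() as tmp:
    hint = Path(tmp) / 'hint'
    print(plan_action(['--rollback', '--hint', str(hint)]))  # hint absent
    hint.write_text('vpp\n')
    print(plan_action(['--rollback', '--hint', str(hint)]))  # hint present
    print(plan_action(['--reload', '--hint', str(hint)]))
```

Returning a description instead of acting keeps the decision logic testable without a running config session.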

jestabro added 4 commits June 10, 2025 18:12
Leave hint if vyos-configd encounters an error in the generate/apply
stages: this only detects 'first-order' differences, meaning those
originating from the called config mode script, and not its
dependencies. This is useful for supporting automatic rollback for
certain cases of apply stage error.
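
Under one reading of "first-order", the hint is left only when the failure comes from the config mode script that was called directly. The sketch below illustrates that reading and is not the vyshim or vyos-configd implementation: `leave_hint`, the JSON layout, and the file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def leave_hint(hint_path, session_pid, called_script, failed_script):
    """Write a hint only when the failure is in the called script itself."""
    if failed_script != called_script:
        # Failure came from a dependency: not a first-order case.
        return False
    hint = {'session_pid': session_pid, 'script': called_script}
    hint_path.write_text(json.dumps(hint))
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'apply_error_hint.json'  # hypothetical file name
    print(leave_hint(path, 8472, 'vpp', 'interfaces_ethernet'))
    print(leave_hint(path, 8472, 'vpp', 'vpp'))
    print(json.loads(path.read_text()))
```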

CI integration 👍 passed!

Details

CI logs

  • CLI Smoketests (no interfaces) 👍 passed
  • CLI Smoketests (interfaces only) 👍 passed
  • Config tests 👍 passed
  • RAID1 tests 👍 passed
  • TPM tests 👍 passed

Member

@dmbaturin dmbaturin left a comment


This is a good start that will eventually be generalized to all sections.

Member

@sever-sever sever-sever left a comment


Works as expected:

vyos@r14# set vpp settings buffers page-size 1G 
[edit]
vyos@r14# 
[edit]
vyos@r14# commit
[ vpp ]
Traceback (most recent call last):
  File "/usr/libexec/vyos/services/vyos-configd", line 156, in run_script
    script.apply(c)
  File "/usr/libexec/vyos//conf_mode/vpp.py", line 676, in apply
    vpp_control = VPPControl(attempts=20, interval=500)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/vyos/vpp/control_vpp.py", line 109, in __init__
    raise VPPIOError(2, 'Cannot connect to VPP API')
vpp_papi.vpp_papi.VPPIOError: [Errno 2] Cannot connect to VPP API

[[vpp]] failed
Commit failed

[edit]
vyos@r14# run show vpp interfaces 
Kernel    Dataplane    Type    IP Address     MAC                  MTU  State
--------  -----------  ------  -------------  -----------------  -----  -------
          eth1         dpdk    100.64.0.1/24  52:54:00:28:23:f1   1500  up
          eth1.11      dpdk                   00:00:00:00:00:00   1500  up
          eth1.12      dpdk                   00:00:00:00:00:00   1500  up
          eth1.14      dpdk                   00:00:00:00:00:00   1500  up
          local0       local                  00:00:00:00:00:00      0  down
eth1      tap4096      virtio                 02:fe:84:20:15:ae   9000  up
eth1.11   tap4096.11   virtio                 00:00:00:00:00:00      0  up
eth1.12   tap4096.12   virtio                 00:00:00:00:00:00      0  up
eth1.14   tap4096.14   virtio                 00:00:00:00:00:00      0  up
[edit]
vyos@r14# run show conf com | match vpp
set vpp settings interface eth1 driver 'dpdk'
set vpp settings ipv6 heap-size '32G'
set vpp settings physmem max-size '100G'
set vpp settings unix poll-sleep-usec '222'
[edit]
vyos@r14# compare 
No changes between working and active configurations.

[edit]
vyos@r14# 

Logs:

Jun 12 19:38:48 r14 vyos-configd[8114]: commit_scripts: ['vpp']
Jun 12 19:38:48 r14 vyos-configd[8114]: Received message: {"type": "node", "last": true, "data": "/usr/libexec/vyos/conf_mode/vpp.py"}
Jun 12 19:38:48 r14 systemd[1]: Reloading.
Jun 12 19:38:48 r14 vpp[20698]: received signal SIGTERM, PC 0x7ff9aee93545
Jun 12 19:38:48 r14 vpp[20698]: received SIGTERM from PID 1 UID 0, exiting...
Jun 12 19:38:48 r14 systemd[1]: Stopping vector packet processing engine...
Jun 12 19:38:48 r14 systemd[1]: vpp.service: Deactivated successfully.
Jun 12 19:38:48 r14 systemd[1]: Stopped vector packet processing engine.
Jun 12 19:38:48 r14 systemd[1]: vpp.service: Consumed 3.693s CPU time.
Jun 12 19:38:48 r14 systemd[1]: Starting vector packet processing engine...
Jun 12 19:38:48 r14 systemd[1]: Started vector packet processing engine.
Jun 12 19:38:48 r14 vpp[21202]: vpp[21202]: vlib_physmem_shared_map_create: clib_pmalloc_create_shared_arena: unsupported page size (1048576KB)
Jun 12 19:38:48 r14 vpp[21202]: vpp[21202]: vlib_buffer_main_init: failed to allocate buffer pool(s)
Jun 12 19:38:48 r14 vpp[21202]: vlib_physmem_shared_map_create: clib_pmalloc_create_shared_arena: unsupported page size (1048576KB)
Jun 12 19:38:48 r14 vpp[21202]: vlib_buffer_main_init: failed to allocate buffer pool(s)
Jun 12 19:38:48 r14 systemd[1]: vpp.service: Deactivated successfully.
Jun 12 19:38:48 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:49 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:49 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:50 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:50 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:51 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:51 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:52 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:52 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:53 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:53 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:54 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:54 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:55 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:55 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:56 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:56 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:57 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:57 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:58 r14 python3[8114]: VPP API connection timeout: [Errno 111] Connection refused
Jun 12 19:38:58 r14 vyos-configd[8114]: Traceback (most recent call last):
Jun 12 19:38:58 r14 vyos-configd[8114]:   File "/usr/libexec/vyos/services/vyos-configd", line 156, in run_script
Jun 12 19:38:58 r14 vyos-configd[8114]:     script.apply(c)
Jun 12 19:38:58 r14 vyos-configd[8114]:   File "/usr/libexec/vyos//conf_mode/vpp.py", line 676, in apply
Jun 12 19:38:58 r14 vyos-configd[8114]:     vpp_control = VPPControl(attempts=20, interval=500)
Jun 12 19:38:58 r14 vyos-configd[8114]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jun 12 19:38:58 r14 vyos-configd[8114]:   File "/usr/lib/python3/dist-packages/vyos/vpp/control_vpp.py", line 109, in __init__
Jun 12 19:38:58 r14 vyos-configd[8114]:     raise VPPIOError(2, 'Cannot connect to VPP API')
Jun 12 19:38:58 r14 vyos-configd[8114]: vpp_papi.vpp_papi.VPPIOError: [Errno 2] Cannot connect to VPP API
Jun 12 19:38:58 r14 vyos-configd[8114]: Sending reply: ERROR_COMMIT_APPLY with output
Jun 12 19:38:58 r14 vyos-configd[8114]: scripts_called: ['vpp']
Jun 12 19:38:59 r14 systemd[1]: opt-vyatta-config-tmp-new_config_8472.mount: Deactivated successfully.
Jun 12 19:39:01 r14 vyos-configd[8114]: Received message: {"type": "init"}
Jun 12 19:39:01 r14 vyos-configd[8114]: config session pid is 8472
Jun 12 19:39:01 r14 vyos-configd[8114]: config session sudo_user is vyos
Jun 12 19:39:01 r14 vyos-configd[8114]: commit_scripts: ['vpp']
Jun 12 19:39:01 r14 vyos-configd[8114]: Received message: {"type": "node", "last": true, "data": "/usr/libexec/vyos/conf_mode/vpp.py"}
Jun 12 19:39:01 r14 kernel: pci 0000:07:00.0: [1af4:1041] type 00 class 0x020000
Jun 12 19:39:01 r14 kernel: pci 0000:07:00.0: reg 0x14: [mem 0xfdc80000-0xfdc80fff]
Jun 12 19:39:01 r14 kernel: pci 0000:07:00.0: reg 0x20: [mem 0x383800000000-0x383800003fff 64bit pref]
Jun 12 19:39:01 r14 kernel: pci 0000:07:00.0: reg 0x30: [mem 0xfdc00000-0xfdc7ffff pref]
Jun 12 19:39:01 r14 kernel: pci 0000:07:00.0: BAR 6: assigned [mem 0xfdc00000-0xfdc7ffff pref]
Jun 12 19:39:01 r14 kernel: pci 0000:07:00.0: BAR 4: assigned [mem 0x383800000000-0x383800003fff 64bit pref]
Jun 12 19:39:01 r14 kernel: pci 0000:07:00.0: BAR 1: assigned [mem 0xfdc80000-0xfdc80fff]
Jun 12 19:39:01 r14 vyos_net_name[21264]: Started with arguments: ['/lib/udev/vyos_net_name', 'eth1', '52:54:00:28:23:f1']
Jun 12 19:39:01 r14 vyos_net_name[21264]: boot configuration complete
Jun 12 19:39:01 r14 vyos_net_name[21264]: Finished
Jun 12 19:39:01 r14 (udev-worker)[21263]: Network interface NamePolicy= disabled on kernel command line.
Jun 12 19:39:01 r14 kernel: 8021q: adding VLAN 0 to HW filter on device eth1
Jun 12 19:39:01 r14 vyos-configd[8114]: Sending reply: SUCCESS with output
Jun 12 19:39:01 r14 vyos-configd[8114]: scripts_called: ['vpp']
Jun 12 19:39:02 r14 systemd[1]: opt-vyatta-config-tmp-new_config_8472.mount: Deactivated successfully.
Jun 12 19:39:03 r14 commit[21465]: Successful change to active configuration by user vyos on /dev/pts/0
Jun 12 19:39:03 r14 vyos-configd[8114]: Received message: {"type": "init"}
Jun 12 19:39:03 r14 vyos-configd[8114]: config session pid is 8472
Jun 12 19:39:03 r14 vyos-configd[8114]: config session sudo_user is vyos
Jun 12 19:39:03 r14 vyos-configd[8114]: commit_scripts: ['vpp']
Jun 12 19:39:03 r14 vyos-configd[8114]: Received message: {"type": "node", "last": true, "data": "/usr/libexec/vyos/conf_mode/vpp.py"}
Jun 12 19:39:03 r14 systemd[1]: Reloading.
Jun 12 19:39:03 r14 systemd[1]: Starting vector packet processing engine...
Jun 12 19:39:03 r14 systemd[1]: Started vector packet processing engine.

@sever-sever sever-sever merged commit 8065232 into vyos:current Jun 12, 2025
15 of 16 checks passed
@github-actions github-actions bot added the mirror-initiated This PR initiated for mirror sync workflow label Jun 12, 2025
@vyosbot vyosbot added mirror-completed and removed mirror-initiated This PR initiated for mirror sync workflow labels Jun 12, 2025

4 participants