
Add progressive batching for VRF/VXLAN device cleanup#202

Open
ypcisco wants to merge 1 commit into Azure:202506 from ypcisco:vrf_vxlan_progressive_batch_delete

Conversation


@ypcisco ypcisco commented Feb 1, 2026

What I did

  • Implemented progressive batching for VRF and VXLAN device cleanup in the vrfmgr and vxlanmgr constructors.
  • Batch size grows geometrically with a 1.25x factor: 1→2→3→4→5→...→500 (max).
  • Start small (batch size = 1) to prevent netlink congestion at startup.
  • Devices are deleted in parallel within each batch; batches are executed sequentially.
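The batching scheme above can be sketched as follows. This is a minimal illustration, not the actual change: `deleteDevice`, `progressiveCleanup`, and `MAX_BATCH_SIZE` are hypothetical names standing in for the real netlink delete path, and deletions within a batch are issued sequentially here rather than in parallel as in the PR.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Stand-in for the real netlink RTM_DELLINK call issued per device.
static void deleteDevice(const std::string &name)
{
    (void)name; // no-op in this sketch
}

static constexpr unsigned int MAX_BATCH_SIZE = 500;

// Delete devices in progressively larger batches: start at 1 to avoid
// netlink congestion at startup, then grow ~1.25x per batch up to the cap.
void progressiveCleanup(const std::vector<std::string> &devices)
{
    unsigned int batchSize = 1;
    size_t processed = 0;
    while (processed < devices.size())
    {
        size_t end = std::min(devices.size(), processed + batchSize);
        for (size_t i = processed; i < end; ++i)
        {
            deleteDevice(devices[i]); // parallelized within a batch in the real change
        }
        processed = end;

        // Grow by a quarter of the current size (at least 1), i.e. roughly 1.25x.
        unsigned int increment = std::max(1U, batchSize >> 2);
        batchSize = std::min(MAX_BATCH_SIZE, batchSize + increment);
    }
}
```

With `batchSize >> 2` as the increment, the sequence is 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, ... and saturates at 500, matching the 1→2→3→4→5→...→500 progression described above.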

Why I did it

  • Config reload / cold restart in scaled VNET scenarios takes a long time due to sequential, synchronous deletion.
  • vrfmgrd and vxlanmgrd are blocked during cleanup, significantly delaying creation of new devices.

How I verified it

  • Created test_VRFMgr_ConstructorCleanup in test_vrf.py.
  • Created test_vxlanmgr_constructor_cleanup in test_vxlan_tunnel.py.

Signed-off-by: Yash Pandit <ypcisco@gmail.com>
@ypcisco ypcisco requested a review from prsunny as a code owner February 1, 2026 07:56
processed += batch_vrfs.size();
batch_vrfs.clear();

// Grow the batch by a quarter of its current size (at least 1), i.e. ~1.25x.
unsigned int increment = std::max(1U, current_batch_size >> 2);
@anish-n anish-n Feb 14, 2026

Just curious, what is the value of doing it in increments? Why don't we just set the batch size to some static number like 100 and do it in a batch like that? I see that we have stated there is startup netlink congestion if we batch too soon, but have we seen performance issues if we batch too soon?
