[sonic-mgmt] Fix flakiness of "arp/test_stress_arp.py"#21616

Merged
bingwang-ms merged 1 commit into sonic-net:master from vkjammala-arista:fix-test-stress-arp
Dec 18, 2025

@vkjammala-arista
Contributor

Description of PR

Summary: Fix flakiness of arp/test_stress_arp.py
Fixes # https://github.com/aristanetworks/sonic-qual.msft/issues/948

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

  1. arp/test_stress_arp.py::test_ipv6_nd_incomplete is failing during teardown with
 Match Messages:
E               2025 Nov 16 15:46:05.479379 bjw3-can-7260-13 ERR monit[644974]: 'dualtorNeighborCheck' status failed (1) -- no output

The syslogs show stale ARP entries in the ASIC of the standby DUT. They were populated by the previous test case, test_ipv4_arp, which sends many ARP response packets for different IP addresses in the subnet 172.16.0.1/16:

2025 Dec  3 16:08:01.555784 bjw3-can-7260-14 ERR dualtor_neighbor_check.py: NEIGHBOR       MAC                PORT       MUX_STATE    IN_MUX_TOGGLE    NEIGHBOR_IN_ASIC    TUNNEL_IN_ASIC    HWSTATUS
2025 Dec  3 16:08:01.555903 bjw3-can-7260-14 ERR dualtor_neighbor_check.py: -------------  -----------------  ---------  -----------  ---------------  ------------------  ----------------  ------------
2025 Dec  3 16:08:01.555956 bjw3-can-7260-14 ERR dualtor_neighbor_check.py: 172.16.21.124  00:00:01:02:18:7e  Ethernet0  standby      no               yes                 no                inconsistent
2025 Dec  3 16:08:01.556004 bjw3-can-7260-14 ERR dualtor_neighbor_check.py: 172.16.21.125  00:00:01:02:18:7f  Ethernet0  standby      no               yes                 no                inconsistent
...
  2. Currently, test_ipv6_nd_incomplete can pick a different DUT than its config fixtures (the test and the config fixtures, such as config_facts, rely on different DUT-selection fixtures), which might result in unexpected behaviour.
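
The inconsistent rows in the syslog excerpt above can be extracted mechanically when triaging a failure like this. The following is a minimal sketch and is not part of this PR: the column names come from the header row in the excerpt, while the function name `parse_neighbor_rows` and the parsing approach are assumptions made for illustration.

```python
import re

# Sample lines copied from the syslog excerpt in this PR description.
SAMPLE_SYSLOG = """\
2025 Dec  3 16:08:01.555784 bjw3-can-7260-14 ERR dualtor_neighbor_check.py: NEIGHBOR       MAC                PORT       MUX_STATE    IN_MUX_TOGGLE    NEIGHBOR_IN_ASIC    TUNNEL_IN_ASIC    HWSTATUS
2025 Dec  3 16:08:01.555903 bjw3-can-7260-14 ERR dualtor_neighbor_check.py: -------------  -----------------  ---------  -----------  ---------------  ------------------  ----------------  ------------
2025 Dec  3 16:08:01.555956 bjw3-can-7260-14 ERR dualtor_neighbor_check.py: 172.16.21.124  00:00:01:02:18:7e  Ethernet0  standby      no               yes                 no                inconsistent
2025 Dec  3 16:08:01.556004 bjw3-can-7260-14 ERR dualtor_neighbor_check.py: 172.16.21.125  00:00:01:02:18:7f  Ethernet0  standby      no               yes                 no                inconsistent
"""

# Column names mirror the header row of the table in the log.
COLUMNS = (
    "neighbor", "mac", "port", "mux_state",
    "in_mux_toggle", "neighbor_in_asic", "tunnel_in_asic", "hwstatus",
)

MARKER = "dualtor_neighbor_check.py: "


def parse_neighbor_rows(syslog_text):
    """Return one dict per data row logged by dualtor_neighbor_check.py."""
    rows = []
    for line in syslog_text.splitlines():
        _, found, payload = line.partition(MARKER)
        if not found:
            continue  # not a neighbor-check line
        fields = payload.split()
        # Data rows start with an IPv4/IPv6 address; this skips the header
        # row and the dashed separator row.
        if len(fields) != len(COLUMNS):
            continue
        if not re.fullmatch(r"[0-9a-fA-F:.]+", fields[0]):
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


rows = parse_neighbor_rows(SAMPLE_SYSLOG)
stale = [r for r in rows if r["hwstatus"] == "inconsistent"]
for r in stale:
    print(r["neighbor"], r["port"], r["mux_state"], r["neighbor_in_asic"])
```

In the excerpt, both rows are in the standby mux state with NEIGHBOR_IN_ASIC set to yes, which is the combination the check reports as inconsistent.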

How did you do it?

  1. Clear the ARP and FDB tables on both ToRs before and after each test case by updating the arp_cache_fdb_cleanup fixture to iterate over duthosts.

  2. Use enum_rand_one_per_hwsku_frontend_hostname for test_ipv6_nd_incomplete to select the duthost, as the other config fixtures (config_facts, ip_and_intf_info, etc.) already do. [The test was recently updated in PR20780 to use the enum_rand_one_per_hwsku_frontend_hostname fixture, but that change seems to have been mistakenly reverted by PR20932.]
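
The cleanup change in item 1 can be pictured roughly as follows. This is a hedged sketch rather than the code merged in this PR: `FakeDut`, `clear_arp_and_fdb`, and the exact commands in `CLEANUP_COMMANDS` are stand-ins chosen so the example runs without a testbed, and a context manager is used here in place of a pytest yield fixture to show the same setup/teardown shape.

```python
from contextlib import contextmanager

# Assumed cleanup commands, for illustration only; the real fixture may use
# different helpers to clear the ARP and FDB tables.
CLEANUP_COMMANDS = ("sonic-clear arp", "sonic-clear fdb all")


class FakeDut:
    """Hypothetical stand-in for a DUT host object; it only records commands."""

    def __init__(self, hostname):
        self.hostname = hostname
        self.commands = []

    def command(self, cmd):
        self.commands.append(cmd)


def clear_arp_and_fdb(duthosts):
    """Run the cleanup commands on every DUT, not only the one under test."""
    for dut in duthosts:
        for cmd in CLEANUP_COMMANDS:
            dut.command(cmd)


@contextmanager
def arp_cache_fdb_cleanup(duthosts):
    """Clean all DUTs before the test body and again afterwards."""
    clear_arp_and_fdb(duthosts)
    try:
        yield
    finally:
        # Runs even if the test body raises, so the next test starts clean.
        clear_arp_and_fdb(duthosts)


upper_tor = FakeDut("upper-tor")
lower_tor = FakeDut("lower-tor")

with arp_cache_fdb_cleanup([upper_tor, lower_tor]):
    pass  # the test body would send ARP/ND traffic here

for dut in (upper_tor, lower_tor):
    print(dut.hostname, dut.commands)
```

Iterating over every DUT matters on dual-ToR because, as described above, entries left on the standby ToR by one test case can trip the neighbor check during a later test case's teardown.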

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

@xwjiang-ms xwjiang-ms left a comment

LGTM

Contributor

@lolyu lolyu left a comment

LGTM, thanks for the fix

@lolyu lolyu added Request for 202505 branch Request for 202511 branch Request to backport a change to 202511 branch labels Dec 15, 2025
@lolyu
Contributor

lolyu commented Dec 15, 2025

Hi @vkjammala-arista, could you please fix the pr checker? Looks you didn't sign your commit.

1) Use "enum_rand_one_per_hwsku_frontend_hostname" for "test_ipv6_nd_incomplete"
   to select the duthost (as being used by other config fixtures like "config_facts",
   "ip_and_intf_info" etc). [Test was updated recently in PR20780 to use
   "enum_rand_one_per_hwsku_frontend_hostname" fixture but it seems mistakenly
   reverted by sonic-net#20932]

2) Most of testcases are sending protocol packets (arp, ipv6_echo requests etc) from
   server towards ToR device. On dualtor, these protocol packets will be processed by
   both the ToR duts and thus NEIGH_TABLE in APPL_DB will be populated in both the
   duts (though hardware/ASIC_DB programming will happen only on active ToR).

   Fix: It's always better to clear arp and fdb tables on both the ToRs before and
   after the testcase by updating "arp_cache_fdb_cleanup" fixture to iterate over
   "duthosts" for cleanup.

Signed-off-by: Vinod <vkjammala@arista.com>
@mssonicbld
Collaborator

/azp run

@github-actions github-actions bot requested a review from xwjiang-ms December 15, 2025 05:03
@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@vkjammala-arista
Contributor Author

Hi @vkjammala-arista, could you please fix the pr checker? Looks you didn't sign your commit.

Thanks @lolyu fixed it!

@bingwang-ms bingwang-ms merged commit 05cbabc into sonic-net:master Dec 18, 2025
18 checks passed
@mssonicbld
Collaborator

@vkjammala-arista PR conflicts with 202505 branch

@lolyu
Contributor

lolyu commented Dec 18, 2025

Hi @vkjammala-arista please help cherry-pick into 202505

gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 21, 2025
Signed-off-by: Vinod <vkjammala@arista.com>
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
vrajeshe pushed a commit to Akshath-17/sonic-mgmt that referenced this pull request Jan 4, 2026
Signed-off-by: Vinod <vkjammala@arista.com>
Signed-off-by: Venkata Gouri Rajesh Etla <vrajeshe@cisco.com>
@vkjammala-arista
Contributor Author

Hi @vkjammala-arista please help cherry-pick into 202505

Hi @lolyu, i have created #21850 for 202505 branch

venu-nexthop pushed a commit to venu-nexthop/sonic-mgmt that referenced this pull request Jan 13, 2026
Signed-off-by: Vinod <vkjammala@arista.com>
yifan-nexthop pushed a commit to nexthop-ai/sonic-mgmt that referenced this pull request Jan 14, 2026
Signed-off-by: Vinod <vkjammala@arista.com>
Signed-off-by: YiFan Wang <yifan@nexthop.ai>
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Jan 20, 2026
Signed-off-by: Vinod <vkjammala@arista.com>
@mssonicbld
Collaborator

Cherry-pick PR to 202511: #22066

PriyanshTratiya pushed a commit to PriyanshTratiya/sonic-mgmt that referenced this pull request Jan 21, 2026
Signed-off-by: Vinod <vkjammala@arista.com>
Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com>
StormLiangMS pushed a commit that referenced this pull request Jan 29, 2026
Signed-off-by: Vinod <vkjammala@arista.com>
Co-authored-by: vkjammala-arista <152394203+vkjammala-arista@users.noreply.github.com>
ytzur1 pushed a commit to ytzur1/sonic-mgmt that referenced this pull request Feb 2, 2026
Signed-off-by: Vinod <vkjammala@arista.com>
Signed-off-by: Yael Tzur <ytzur@nvidia.com>
abhishek-nexthop pushed a commit to nexthop-ai/sonic-mgmt that referenced this pull request Feb 6, 2026
Signed-off-by: Vinod <vkjammala@arista.com>
lakshmi-nexthop pushed a commit to lakshmi-nexthop/sonic-mgmt that referenced this pull request Feb 11, 2026
…) (sonic-net#22066)

Signed-off-by: Vinod <vkjammala@arista.com>
Co-authored-by: vkjammala-arista <152394203+vkjammala-arista@users.noreply.github.com>
Signed-off-by: Lakshmi Yarramaneni <lakshmi@nexthop.ai>
nnelluri-cisco pushed a commit to nnelluri-cisco/sonic-mgmt that referenced this pull request Feb 12, 2026
Signed-off-by: Vinod <vkjammala@arista.com>
Signed-off-by: nnelluri-cisco <nnelluri@cisco.com>
rraghav-cisco pushed a commit to rraghav-cisco/sonic-mgmt that referenced this pull request Feb 13, 2026
Signed-off-by: Vinod <vkjammala@arista.com>
Signed-off-by: Raghavendran Ramanathan <rraghav@cisco.com>
rraghav-cisco pushed a commit to rraghav-cisco/sonic-mgmt that referenced this pull request Feb 18, 2026
Signed-off-by: Vinod <vkjammala@arista.com>
Signed-off-by: Raghavendran Ramanathan <rraghav@cisco.com>
anilal-amd pushed a commit to anilal-amd/anilal-forked-sonic-mgmt that referenced this pull request Feb 19, 2026
Signed-off-by: Vinod <vkjammala@arista.com>
Signed-off-by: Zhuohui Tan <zhuohui.tan@amd.com>
6 participants