
[action] [PR:21616] [sonic-mgmt] Fix flakiness of "arp/test_stress_arp.py" #22066

Merged
StormLiangMS merged 1 commit into sonic-net:202511 from
mssonicbld:cherry/202511/21616
Jan 29, 2026


@mssonicbld
Collaborator

Description of PR

Summary: Fix flakiness of arp/test_stress_arp.py
Fixes # https://github.com/aristanetworks/sonic-qual.msft/issues/948

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
  • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

  1. arp/test_stress_arp.py::test_ipv6_nd_incomplete is failing during teardown with
 Match Messages:
E 2025 Nov 16 15:46:05.479379 bjw3-can-7260-13 ERR monit[644974]: 'dualtorNeighborCheck' status failed (1) -- no output

The syslogs show stale ARP entries in the ASIC of the standby DUT. These entries were populated by the preceding test case, test_ipv4_arp, which sends a burst of ARP response packets for many different IP addresses in the subnet 172.16.0.1/16:

2025 Dec 3 16:08:01.555784 bjw3-can-7260-14 ERR dualtor_neighbor_check.py: NEIGHBOR MAC PORT MUX_STATE IN_MUX_TOGGLE NEIGHBOR_IN_ASIC TUNNEL_IN_ASIC HWSTATUS
2025 Dec 3 16:08:01.555903 bjw3-can-7260-14 ERR dualtor_neighbor_check.py: ------------- ----------------- --------- ----------- --------------- ------------------ ---------------- ------------
2025 Dec 3 16:08:01.555956 bjw3-can-7260-14 ERR dualtor_neighbor_check.py: 172.16.21.124 00:00:01:02:18:7e Ethernet0 standby no yes no inconsistent
2025 Dec 3 16:08:01.556004 bjw3-can-7260-14 ERR dualtor_neighbor_check.py: 172.16.21.125 00:00:01:02:18:7f Ethernet0 standby no yes no inconsistent
...
  2. Currently, test_ipv6_nd_incomplete can pick a different DUT than its config fixtures (config_facts, etc.), because the test and those config fixtures use different fixtures to select the DUT. This might result in unexpected behaviour.
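To make the teardown failure in item 1 easier to read, here is a rough sketch of a log parser. `find_inconsistent_neighbors` is a hypothetical helper, not part of sonic-mgmt, and it assumes the `dualtor_neighbor_check.py` rows are whitespace-separated in the header order shown above. It returns only rows whose HWSTATUS is `inconsistent`; in the sample rows these are neighbors on a port in `standby` that are still present in the ASIC.

```python
# Hypothetical helper (not sonic-mgmt code): pick out neighbor rows flagged
# "inconsistent" in dualtor_neighbor_check.py syslog output.
# Assumption: fields are whitespace-separated and follow the header order below.
from typing import Dict, List

COLUMNS = [
    "NEIGHBOR", "MAC", "PORT", "MUX_STATE", "IN_MUX_TOGGLE",
    "NEIGHBOR_IN_ASIC", "TUNNEL_IN_ASIC", "HWSTATUS",
]
MARKER = "dualtor_neighbor_check.py:"


def find_inconsistent_neighbors(log_lines: List[str]) -> List[Dict[str, str]]:
    """Return parsed rows whose HWSTATUS column is 'inconsistent'."""
    rows = []
    for line in log_lines:
        if MARKER not in line:
            continue  # not a neighbor-check line
        # Keep only the table payload that follows the script name.
        fields = line.split(MARKER, 1)[1].split()
        # Skip the header row and anything with an unexpected number of columns.
        if len(fields) != len(COLUMNS) or fields[0] == "NEIGHBOR":
            continue
        row = dict(zip(COLUMNS, fields))
        # The dashed separator row also has 8 fields but is filtered out here.
        if row["HWSTATUS"] == "inconsistent":
            rows.append(row)
    return rows


sample = [
    "2025 Dec 3 16:08:01.555956 bjw3-can-7260-14 ERR dualtor_neighbor_check.py: "
    "172.16.21.124 00:00:01:02:18:7e Ethernet0 standby no yes no inconsistent",
]
for row in find_inconsistent_neighbors(sample):
    print(row["NEIGHBOR"], row["MUX_STATE"], row["NEIGHBOR_IN_ASIC"])  # prints: 172.16.21.124 standby yes
```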

How did you do it?

  1. Clear the ARP and FDB tables on both ToRs before and after the test case by updating the arp_cache_fdb_cleanup fixture to iterate over duthosts for cleanup.

  2. Use enum_rand_one_per_hwsku_frontend_hostname in test_ipv6_nd_incomplete to select the duthost, as the other config fixtures (config_facts, ip_and_intf_info, etc.) already do. [The test was recently updated in PR20780 to use the enum_rand_one_per_hwsku_frontend_hostname fixture, but that change appears to have been mistakenly reverted by PR20932.]
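The cleanup change in item 1 can be sketched as a setup/teardown generator. This is a minimal sketch, not the actual `arp_cache_fdb_cleanup` fixture: it assumes each host object exposes a `command()` method and that `sonic-clear arp` and `sonic-clear fdb all` are the flush commands used. `FakeDutHost` and `arp_cache_fdb_cleanup_sketch` are hypothetical names; in sonic-mgmt the generator would be decorated with `@pytest.fixture` and would receive the real `duthosts` fixture.

```python
# Sketch only: clear neighbor and FDB state on *every* DUT, before and after
# the test. FakeDutHost is a hypothetical stand-in that records commands.
from typing import Iterator, List

# Assumed flush commands; the real fixture may run a different set.
CLEANUP_COMMANDS = ["sonic-clear arp", "sonic-clear fdb all"]


class FakeDutHost:
    """Hypothetical stand-in for a sonic-mgmt DUT host object."""

    def __init__(self, hostname: str) -> None:
        self.hostname = hostname
        self.executed: List[str] = []

    def command(self, cmd: str) -> None:
        # A real host object would run `cmd` on the device; this one only logs it.
        self.executed.append(cmd)


def clear_neighbor_and_fdb(duthosts: List[FakeDutHost]) -> None:
    # Iterate over all DUTs: on dual ToR both ToRs learn neighbors into
    # APPL_DB, so cleaning only the selected DUT can leave stale state behind.
    for duthost in duthosts:
        for cmd in CLEANUP_COMMANDS:
            duthost.command(cmd)


def arp_cache_fdb_cleanup_sketch(duthosts: List[FakeDutHost]) -> Iterator[None]:
    clear_neighbor_and_fdb(duthosts)  # setup: before the test case
    yield
    clear_neighbor_and_fdb(duthosts)  # teardown: after the test case


tors = [FakeDutHost("upper-tor"), FakeDutHost("lower-tor")]
fixture = arp_cache_fdb_cleanup_sketch(tors)
next(fixture)        # setup runs on both ToRs
# ... the test body would run here ...
next(fixture, None)  # teardown runs; the generator is then exhausted
print(len(tors[1].executed))  # prints 4
```

Driving the generator by hand mirrors what pytest does with a yield fixture: code before `yield` is setup, code after it is teardown.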

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

1) Use "enum_rand_one_per_hwsku_frontend_hostname" for "test_ipv6_nd_incomplete"
   to select the duthost, as the other config fixtures ("config_facts",
   "ip_and_intf_info", etc.) already do. [The test was recently updated in PR20780
   to use the "enum_rand_one_per_hwsku_frontend_hostname" fixture, but that change
   appears to have been mistakenly reverted by sonic-net#20932.]

2) Most test cases send protocol packets (ARP, IPv6 echo requests, etc.) from the
   server towards the ToR device. On dual ToR, these packets are processed by both
   ToR DUTs, so NEIGH_TABLE in APPL_DB is populated on both DUTs (although
   hardware/ASIC_DB programming happens only on the active ToR).

   Fix: Clear the ARP and FDB tables on both ToRs before and after the test case
   by updating the "arp_cache_fdb_cleanup" fixture to iterate over "duthosts" for
   cleanup.

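Item 1) can be pictured with a toy example. `DutStub`, `config_facts_stub` and `run_nd_incomplete_stub` are hypothetical stand-ins, and `duthosts` is modeled as a plain dict. The only point illustrated is that when the test body and its config fixture both index `duthosts` with the same `enum_rand_one_per_hwsku_frontend_hostname` value, they cannot end up on different ToRs.

```python
# Toy model (not sonic-mgmt code): the test body and a config fixture resolve
# the DUT from the same hostname parameter, so they always agree.
from typing import Dict


class DutStub:
    """Hypothetical stand-in for a DUT host object."""

    def __init__(self, hostname: str) -> None:
        self.hostname = hostname


def config_facts_stub(duthosts: Dict[str, DutStub], hostname: str) -> Dict[str, str]:
    # Mirrors a config fixture keyed by enum_rand_one_per_hwsku_frontend_hostname.
    return {"hostname": duthosts[hostname].hostname}


def run_nd_incomplete_stub(
    duthosts: Dict[str, DutStub], hostname: str, config_facts: Dict[str, str]
) -> str:
    # Same selection idiom as the fix: duthosts[enum_rand_one_per_hwsku_frontend_hostname]
    duthost = duthosts[hostname]
    # Had the test picked its DUT through a different fixture, this could fail on dual ToR.
    assert config_facts["hostname"] == duthost.hostname, "test and config fixtures disagree on DUT"
    return duthost.hostname


duthosts = {name: DutStub(name) for name in ("upper-tor", "lower-tor")}
selected = "lower-tor"  # value the shared hostname parameter takes for this run
facts = config_facts_stub(duthosts, selected)
print(run_nd_incomplete_stub(duthosts, selected, facts))  # prints lower-tor
```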
Signed-off-by: Vinod <vkjammala@arista.com>
@mssonicbld
Collaborator Author

Original PR: #21616

@mssonicbld
Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@StormLiangMS StormLiangMS merged commit 9b48370 into sonic-net:202511 Jan 29, 2026
17 of 18 checks passed
lakshmi-nexthop pushed a commit to lakshmi-nexthop/sonic-mgmt that referenced this pull request Feb 11, 2026
…) (sonic-net#22066)

Co-authored-by: vkjammala-arista <152394203+vkjammala-arista@users.noreply.github.com>
Signed-off-by: Lakshmi Yarramaneni <lakshmi@nexthop.ai>
