[msft-202412] Enhance test_verify_fec_histogram logic to handle a stable link with stale transient FEC symbol errors#1014

Open
kewei-arista wants to merge 1 commit into Azure:202412 from kewei-arista:pr-sonic.6_msft_202412

Conversation

@kewei-arista

Cherry pick sonic-net/sonic-mgmt#21685 to msft-202412.

Description of PR

The test_verify_fec_histogram test in platform_tests/test_intf_fec.py checks the critical FEC bins of a link and may fail if the link shows errors there. However, the test does not account for transient symbol errors that accumulate during an interface state transition (or another transient state), so it can also fail on a stable link that still carries stale errors at test time.

For example, it may fail with the following BIN output:

(Pdb) intf_name
'Ethernet280'
(Pdb) fec_hist
[{'symbol errors per codeword': 'BIN0', 'codewords': '99235826365'},
 {'symbol errors per codeword': 'BIN1', 'codewords': '406154'},
 {'symbol errors per codeword': 'BIN2', 'codewords': '1781'},
 {'symbol errors per codeword': 'BIN3', 'codewords': '0'},
 {'symbol errors per codeword': 'BIN4', 'codewords': '0'},
 {'symbol errors per codeword': 'BIN5', 'codewords': '0'},
 {'symbol errors per codeword': 'BIN6', 'codewords': '0'},
 {'symbol errors per codeword': 'BIN7', 'codewords': '0'},
 {'symbol errors per codeword': 'BIN8', 'codewords': '0'},
 {'symbol errors per codeword': 'BIN9', 'codewords': '0'},
 {'symbol errors per codeword': 'BIN10', 'codewords': '0'},
 {'symbol errors per codeword': 'BIN11', 'codewords': '0'},
 {'symbol errors per codeword': 'BIN12', 'codewords': '0'},
 {'symbol errors per codeword': 'BIN13', 'codewords': '0'},
 {'symbol errors per codeword': 'BIN14', 'codewords': '0'},
 {'symbol errors per codeword': 'BIN15', 'codewords': '1'}]

Note that BIN15 holds only 1 codeword and that count has not incremented for a long time, so this link is stable. However, the test can still fail in this case.

This change enhances the test case to handle these stale errors. It uses the first FEC histogram snapshot to check whether a test interface is susceptible to this issue. If so, it extends the waiting time to 10 minutes per loop, 20 minutes in total, and verifies that no critical BIN increments during that 20-minute period. The rationale is that an unstable link is very likely to show changes in these BINs over such a long window; on a marginal link the changes can usually be seen every second.
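The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names, the `get_hist` callable, the `normal_wait_s` parameter, and the choice of BIN7 to BIN15 as the critical range are all assumptions made for the example. Only the 10-minute wait and the two loops come from the description.

```python
import time

# Assumption for illustration only: BIN7..BIN15 are treated as "critical".
ASSUMED_CRITICAL_BINS = range(7, 16)


def critical_counts(fec_hist, critical_bins=ASSUMED_CRITICAL_BINS):
    """Return {bin index: codeword count} for the critical bins of one snapshot."""
    wanted = {"BIN{}".format(i): i for i in critical_bins}
    counts = {}
    for row in fec_hist:
        name = row["symbol errors per codeword"]
        if name in wanted:
            counts[wanted[name]] = int(row["codewords"])
    return counts


def has_stale_critical_errors(first_snapshot):
    """True if the first snapshot already shows a non-zero critical bin."""
    return any(v > 0 for v in critical_counts(first_snapshot).values())


def incremented_bins(baseline, current):
    """Critical bin indexes whose count grew between two snapshots."""
    before = critical_counts(baseline)
    after = critical_counts(current)
    return sorted(i for i in after if after[i] > before.get(i, 0))


def check_link_stable(get_hist, normal_wait_s, loops=2,
                      extended_wait_s=600, sleep=time.sleep):
    """Hypothetical check: wait longer when stale errors are present.

    get_hist is a caller-supplied function returning a parsed histogram.
    Returns (stable, bins_that_grew).
    """
    baseline = get_hist()
    # Extend the wait to 10 minutes per loop only for susceptible interfaces.
    stale = has_stale_critical_errors(baseline)
    wait_s = extended_wait_s if stale else normal_wait_s
    for _ in range(loops):
        sleep(wait_s)
        grew = incremented_bins(baseline, get_hist())
        if grew:
            return False, grew
    return True, []
```

With a baseline of zero in every critical bin, "incremented" is the same as "non-zero", so the sketch uses one comparison for both the clean and the stale case. Passing `sleep` as a parameter lets the logic be exercised without actually waiting 20 minutes.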

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • msft-202412
  • msft-202503

Approach

What is the motivation for this PR?

Improve the pass rate by handling these corner cases.

How did you do it?

Wait long enough to make sure the links are actually stable.

How did you verify/test it?

Confirmed that the test now passes on a stable link that carries stale transient FEC symbol errors.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

…stale transient FEC symbol errors

Signed-off-by: kewei <kewei@arista.com>
@r12f
Contributor

r12f commented Feb 16, 2026

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).
