Add pop batch size support for ZMQ Consumer #1084

Merged
qiluo-msft merged 4 commits into sonic-net:master from vivekrnv:pop_batch_zmq
Oct 8, 2025
Conversation

@vivekrnv
Contributor

What I did

Add pop batch size support to ZmqConsumerStateTable to reduce peak memory and to speed up CRM counter and DASH feedback updates when applying DASH configuration at scale.

Example:

Suppose a GNMI server pushes X entries to orchagent. With the current logic, ZmqConsumerStateTable moves all X entries into the m_toSync map at once.

Dashorch then creates X entries in the bulker. However, the max_bulk size is limited (currently 1000) and is much smaller than the size of m_toSync in this scale scenario.

So the effective memory during this time is 2 * X * (size per object), one copy in m_toSync plus one copy in the bulker, until all of those entries are applied to the ASIC.

  1. With this change, only pop batch size entries are popped into m_toSync and added to the bulker at a time. Peak memory utilization is therefore cut in half in the DASH scale case.

  2. A side effect of this change is that post-processing for each batch of pop batch size items runs immediately in orchagent, so CRM updates and the GNMI feedback loop are not delayed. Without this change, post-processing starts only after all entries in m_toSync have been applied to syncd, and the size of m_toSync is not capped in the current design.
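The memory argument above can be sketched as a small counting model. This is a simplified illustration, not orchagent code: `peakCopies` and its parameters are hypothetical names, and the model assumes that entries not yet popped are held once in the consumer's receive queue, while each popped entry is held twice (in m_toSync and in the bulker) until its batch is applied.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Simplified model (an assumption, not orchagent code): return the largest
// number of entry copies alive at any moment while `totalEntries` received
// entries are drained in batches of at most `popBatchSize`.
std::size_t peakCopies(std::size_t totalEntries, std::size_t popBatchSize) {
    assert(popBatchSize > 0);           // a zero batch size would never drain
    std::size_t queued = totalEntries;  // entries still in the receive queue
    std::size_t peak = queued;
    while (queued > 0) {
        const std::size_t batch = std::min(popBatchSize, queued);
        queued -= batch;
        // The popped batch is now held in m_toSync and again in the bulker.
        peak = std::max(peak, queued + 2 * batch);
        // Once the batch is applied, both copies are released.
    }
    return peak;
}
```

Under these assumptions, `peakCopies(100000, 100000)` returns 200000 (the 2 * X case, where everything is popped at once), while `peakCopies(100000, 1000)` returns 101000, which is roughly half.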

How I verified it

Unit test, plus applying DASH config and confirming that everything works.

Before the update:

```
[ RUN      ] ZmqConsumerStateTablePopSize.test
Consumer thread started
Entering select
Producer sent 150 elements
pops: 150
Consumer thread joined
tests/zmq_state_ut.cpp:636: Failure
Expected equality of these values:
 popCount
   Which is: 1
 4
popCount: 1, expected: 4
tests/zmq_state_ut.cpp:639: Failure
Expected equality of these values:
 recvdSizes[i]
   Which is: 150
 expectedSizes[i]
   Which is: 40
recvdSizes[0]: 150, expected: 40
[  FAILED ] ZmqConsumerStateTablePopSize.test (16017 ms)
```

After the update:

```
[----------] 1 test from ZmqConsumerStateTablePopSize
[ RUN      ] ZmqConsumerStateTablePopSize.test
Consumer thread started
Entering select
Producer sent 150 elements
pops: 40
Entering select
pops: 40
Entering select
pops: 40
Entering select
pops: 30
Consumer thread joined
[       OK ] ZmqConsumerStateTablePopSize.test (2008 ms)
[----------] 1 test from ZmqConsumerStateTablePopSize (2008 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (2012 ms total)
[  PASSED  ] 1 test.
```
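
The pop counts in the passing run are consistent with a pop batch size of 40 (inferred from the log, not stated explicitly in the PR): 150 elements arrive as pops of 40, 40, 40 and 30. A minimal sketch of that split follows; `expectedPopSizes` is a hypothetical helper for illustration and is not part of the unit test.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the sizes of successive pops when `total` queued
// elements are drained with at most `batch` elements per pop.
std::vector<std::size_t> expectedPopSizes(std::size_t total, std::size_t batch) {
    assert(batch > 0);  // a zero batch size would never drain the queue
    std::vector<std::size_t> sizes;
    for (std::size_t left = total; left > 0;) {
        const std::size_t n = std::min(batch, left);
        sizes.push_back(n);  // one pop returns n elements
        left -= n;
    }
    return sizes;
}
```

For 150 elements and a batch of 40 this yields four pops, matching the `popCount` of 4 that the test expects.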

Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@prsunny prsunny requested a review from qiluo-msft October 2, 2025 17:57
@prsunny
Contributor

prsunny commented Oct 2, 2025

@qiluo-msft, please review/merge.

mssonicbld added a commit to mssonicbld/sonic-buildimage-msft that referenced this pull request Oct 2, 2025
#### Why I did it

Increase the pop batch size and the max bulker limit to 65536 to speed up applying high-volume DASH configuration.

Depends on
sonic-net/sonic-sairedis#1660
sonic-net/sonic-swss-common#1084
sonic-net/sonic-swss#3910

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it

#### How to verify it

```
root@sonic:/home/admin# ps -aux | grep orch
root       11118  1.5  0.4 464804 267368 pts/0   Sl   02:50   0:00 /usr/bin/orchagent -d /var/log/swss -b 65536 -z zmq_sync -k 65536 -m B0:CF:0E:20:8E:DE -q tcp://eth0-midplane

2025 Sep 30 18:48:38.911835 sonic NOTICE swss#orchagent: :- main: Setting maximum bulk size in bulk mode as 65536
```

Apply Scale config and verify

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld added a commit to Azure/sonic-buildimage-msft that referenced this pull request Oct 3, 2025
…1691)

Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@vivekrnv
Contributor Author

vivekrnv commented Oct 8, 2025

@qiluo-msft, I have addressed all comments. Please help sign off.

@qiluo-msft qiluo-msft merged commit c253917 into sonic-net:master Oct 8, 2025
18 checks passed
@mssonicbld
Collaborator

Cherry-pick PR to msft-202506: Azure/sonic-swss-common.msft#67

prsunny pushed a commit to sonic-net/sonic-swss that referenced this pull request Oct 9, 2025
* [ZmqOrch] Optimize memory by popping batch size at a time
What I did

Used a reference instead of an unnecessary copy of a set object
Optimize memory by popping batch size at a time
NOTE: Please merge only after the below two PRs are merged

sonic-net/sonic-swss-common#1084
sonic-net/sonic-sairedis#1660
mssonicbld added a commit to mssonicbld/sonic-swss.msft that referenced this pull request Oct 9, 2025
**What I did**

1. Used a reference instead of an unnecessary copy of a set object
2. Optimize memory by popping batch size at a time

**NOTE: Please merge only after the below two PRs are merged**

sonic-net/sonic-swss-common#1084
sonic-net/sonic-sairedis#1660

**Why I did it**

To reduce peak memory usage when applying high-volume DASH configuration

**How I verified it**

**Details if related**
mssonicbld added a commit to Azure/sonic-swss.msft that referenced this pull request Oct 9, 2025
balanokia pushed a commit to balanokia/sonic-swss that referenced this pull request Nov 17, 2025
…3910)

@mssonicbld
Collaborator

Cherry-pick PR to 202511: #1126

theasianpianist pushed a commit to theasianpianist/sonic-swss that referenced this pull request Feb 4, 2026
…3910)


Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
baorliu pushed a commit to baorliu/sonic-swss that referenced this pull request Feb 23, 2026
…3910)


Signed-off-by: Baorong Liu <96146196+baorliu@users.noreply.github.com>
7 participants