Manually cherry-pick from public master branch #20
Merged
prabhataravind merged 14 commits into Azure:202506 on Oct 1, 2025
### why
The PR checker failed because common-libs packages bookworm debs instead of bullseye.
### what this PR does
Change bullseye to bookworm.

### why
The producer_state_table_bridge_check_dup test failed. The suspected cause is a behaviour change in swss-common.
### what this PR does
Make the check stricter to ensure that no entries are received from the consumer.
Translate "Unspecified" DesiredHaState from dash ha scope config to "standby" in DashHaScopeTable
Implements the route exchange feature specified in the wiki. For the detailed behaviour, see https://github.com/sonic-net/sonic-dash-ha/wiki/SWBus-(Switch-Bus)#route-exchange-working-theory
Update dpu scope state table name to match https://github.com/sonic-net/sonic-swss-common/blob/e7ee75dfcd44de934d49aea43c991a6aa20db63b/common/schema.h#L557
### why
An actor has retry logic in its outgoing state: if a message is not acked, the actor resends it to make sure the receiver has received it. When an actor is terminated, the retry is terminated as well, so delivery to the receiver can no longer be guaranteed.
### what this PR does
Introduce the mark-delete concept.
1. When an actor is about to terminate, add a "mark_deleted" flag to the actor's driver.
2. In the actor's run loop, which is triggered each time the actor receives a message, check whether the actor is ready_for_delete.
3. ready_for_delete checks whether there are unacked messages in the outgoing state. The run loop exits only when there are none.
4. While an actor is in the mark_deleted state, it stops processing incoming requests but always replies OK, so two mark_deleted actors won't form a dead loop.
5. Responses are processed normally so that unacked messages can be ACKed.
6. management_request is processed normally, so actor state can still be dumped using swbus-cli.
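The mark-delete steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `ActorDriver` layout, the `Message` and `Outcome` enums, and the method names `send`, `mark_for_delete` and `on_message` are assumptions made up for the example. Only the `mark_deleted` flag and the `ready_for_delete` check are named in the description.

```rust
use std::collections::HashSet;

/// Hypothetical message kinds; the real swbus message types differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Message {
    Request { id: u64 },
    Response { acked_id: u64 },
    ManagementRequest,
}

/// What the driver did with a message (illustrative only).
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Handled,
    RepliedOkWithoutProcessing,
    AckRecorded,
    StateDumped,
}

#[derive(Default)]
pub struct ActorDriver {
    mark_deleted: bool,
    /// IDs of outgoing messages that have not been acked yet.
    unacked: HashSet<u64>,
}

impl ActorDriver {
    /// Record an outgoing message that still needs an ack.
    pub fn send(&mut self, id: u64) {
        self.unacked.insert(id);
    }

    /// Step 1: flag the actor instead of terminating it immediately.
    pub fn mark_for_delete(&mut self) {
        self.mark_deleted = true;
    }

    /// Step 3: deletion is safe only once nothing is left unacked.
    pub fn ready_for_delete(&self) -> bool {
        self.mark_deleted && self.unacked.is_empty()
    }

    /// One iteration of the run loop (step 2). Returns the outcome and
    /// whether the loop should exit.
    pub fn on_message(&mut self, msg: Message) -> (Outcome, bool) {
        let outcome = match msg {
            // Step 4: a mark_deleted actor replies OK without processing.
            Message::Request { .. } if self.mark_deleted => Outcome::RepliedOkWithoutProcessing,
            Message::Request { .. } => Outcome::Handled,
            // Step 5: responses still clear unacked entries.
            Message::Response { acked_id } => {
                self.unacked.remove(&acked_id);
                Outcome::AckRecorded
            }
            // Step 6: management requests keep working.
            Message::ManagementRequest => Outcome::StateDumped,
        };
        (outcome, self.ready_for_delete())
    }
}
```

In this sketch the loop exit is decided after every message, so an actor that is marked for deletion keeps running until the last outstanding ack arrives.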
### why
The show hamgrd actor command has been broken since the route_exchange PR was merged.
The log shows the following error:
Sep 5 12:11:25 ott-ss-010 swbusd: 2025-09-05T16:11:25.820184Z ERROR
ConnWorker{conn_id="swbs-from://127.0.0.1:39642"}: 96: Failed to process
the incoming message: Input:InvalidArgs - Invalid management request:
ManagementRequest { request: HamgrdGetActorState, arguments: [] }
This is because swbusd incorrectly intercepts every ManagementRequest.
### What this PR does
1. Check whether the ManagementRequest has swbusd's service path as its
destination. If not, route the message.
2. Fix some miscellaneous issues exposed by the above code change.
3. Use init_logger_for_test from Logger and remove the custom
implementation.
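The destination check in item 1 might look something like the sketch below. It is only an illustration under stated assumptions: the `Dispatch` enum, the `dispatch_management_request` function, and the plain-string service paths are hypothetical and do not reflect the actual swbusd types.

```rust
/// Where a ManagementRequest should go (illustrative only).
#[derive(Debug, PartialEq, Eq)]
pub enum Dispatch {
    /// The request is addressed to swbusd itself.
    HandleLocally,
    /// The request is addressed to another service and must be routed.
    Route,
}

/// Decide whether swbusd handles a ManagementRequest or forwards it.
/// Service paths are modelled as plain strings here, which is an assumption.
pub fn dispatch_management_request(destination: &str, local_service_path: &str) -> Dispatch {
    if destination == local_service_path {
        Dispatch::HandleLocally
    } else {
        Dispatch::Route
    }
}
```

With a check of this kind, a request such as HamgrdGetActorState that targets a hamgrd actor is forwarded instead of being rejected by swbusd as an invalid management request.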
local_ha_state was being set to ha_role; update it so that it is correctly set to the DPU's ha_state. This fixes issue #91.
### why
This addresses issue #100. When upstream deletes the DB entry that originated an actor, the actor should clean up all the DB entries it has created before terminating itself. For example, deleting a DashHaSetConfig entry should trigger the cleanup in the corresponding HaSetActor, which includes removing the DASH_HA_SET_TABLE entry it created in DPU_APPL_DB and the VNET_ROUTE_TUNNEL_TABLE entry in APPL_DB.
### what this PR does
1. Implement cleanup for all the actors.
- DpuActor: remove entries in DPU_APPL_DB/BFD_SESSION_TABLE
- VDpuActor: unregister from DpuActor
- HASetActor: remove entry from DPU_APPL_DB/DASH_HA_SET_TABLE, remove entry from APPL_DB/VNET_ROUTE_TUNNEL_TABLE, unregister from VDpuActor
- HAScopeActor: remove entry from DPU_APPL_DB/DASH_HA_SCOPE_TABLE, remove entry from STATE_DB/DASH_HA_SCOPE_STATE, and unregister from VDpuActor and HaSetActor
2. Extend the ChkDb macro to check that a DB entry doesn't exist.
3. Extend Internal state with support for deleting an entry from the DB.

### why
vnet_tunnel_route_table needs to be updated via ProducerStateTable to properly trigger the orchagent handlers.
### what this PR does
Move the table from internal state to outgoing state via the producer bridge.

Add local_nexthop_ip so that the correct nexthop IP is used as the endpoint in VNET_ROUTE_TUNNEL_TABLE when the DPU is local.
### why
When an actor terminates itself, its handler in SwbusEdgeRuntime is not removed. When a new actor is later spawned with the same service path, it is ignored, because the ActorCreator relies on "NoRoute" to discover a new actor.
### what this PR does
When ActorDriver exits from the run loop, the SimpleSwbusEdgeClient it owns is destructed, and the destructor removes the handler. This addresses issue #111.
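The destructor-based cleanup can be illustrated with Rust's `Drop` trait. The sketch below is an assumption-laden stand-in, not the real implementation: `Handlers`, `EdgeClient` and `register` are hypothetical names, and the handler registry is reduced to a shared set of service paths.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the runtime's handler registry,
/// keyed by service path.
pub type Handlers = Arc<Mutex<HashSet<String>>>;

/// Hypothetical stand-in for the edge client owned by an actor driver.
pub struct EdgeClient {
    path: String,
    handlers: Handlers,
}

impl EdgeClient {
    /// Register a handler for `path` and return the owning client.
    pub fn register(handlers: Handlers, path: &str) -> Self {
        handlers.lock().unwrap().insert(path.to_string());
        EdgeClient { path: path.to_string(), handlers }
    }
}

impl Drop for EdgeClient {
    /// Remove the handler when the client goes out of scope, so a later
    /// actor with the same service path is not shadowed by a stale entry.
    fn drop(&mut self) {
        if let Ok(mut handlers) = self.handlers.lock() {
            handlers.remove(&self.path);
        }
    }
}
```

Tying removal to `Drop` means the handler disappears on every exit path of the run loop, including early returns, without an explicit unregister call.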
### why
Currently Incoming::get returns an error if the entry is not found, and the caller typically propagates the error further. Sometimes it is normal for an entry not to exist, and that case needs to be treated differently from a message decode error.
### what this PR does
Incoming::get returns an Option. The caller needs to handle a None return, which means the entry was not found.
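One way to keep "not found" separate from "failed to decode" is sketched below. The description only says that Incoming::get returns an Option; wrapping it in a `Result`, the `DecodeError` type, the string-keyed map, and decoding to `u32` are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical decode error; the real error type is not shown in the PR text.
#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    NotANumber(String),
}

/// Simplified stand-in for the incoming state: raw values keyed by name.
pub struct Incoming {
    pub entries: HashMap<String, String>,
}

impl Incoming {
    /// Ok(None) means the entry does not exist, which callers may treat as normal.
    /// Err(..) means the entry exists but could not be decoded.
    pub fn get(&self, key: &str) -> Result<Option<u32>, DecodeError> {
        match self.entries.get(key) {
            None => Ok(None),
            Some(raw) => raw
                .parse::<u32>()
                .map(Some)
                .map_err(|_| DecodeError::NotANumber(raw.clone())),
        }
    }
}
```

A caller can then match on `Ok(None)` to skip processing quietly, while still propagating a genuine decode failure with `?`.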
### why
Currently the deserializer for dash_bfd_probe_state has strict requirements on the format of the fields and rejects any value that doesn't follow them. Specifically, the timestamp field is enclosed in double quotes, which caused a parsing error.
### what this PR does
Make the deserializer more forgiving about format. If a value has double quotes, single quotes or whitespace around it, remove them first. If the value of v4_bfd_up_sessions or v6_bfd_up_sessions has quotes or spaces around the commas, remove them.
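The normalization described here could be sketched as below. The helper names `clean_scalar` and `clean_session_list` are hypothetical, and the sample inputs in the usage note are invented; the actual deserializer in this PR may be structured differently.

```rust
/// Strip surrounding double quotes, single quotes and whitespace from a raw value.
pub fn clean_scalar(raw: &str) -> String {
    raw.trim_matches(|c: char| c == '"' || c == '\'' || c.is_whitespace())
        .to_string()
}

/// Split a comma-separated session list and clean each element,
/// dropping elements that are empty after cleaning.
pub fn clean_session_list(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(clean_scalar)
        .filter(|s| !s.is_empty())
        .collect()
}
```

For example, a timestamp that arrives wrapped in double quotes is reduced to its bare digits by `clean_scalar`, and a quoted list with a space after the comma is split into individual addresses by `clean_session_list`.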
zjswhhh pushed a commit to zjswhhh/sonic-dash-ha.msft that referenced this pull request on Nov 19, 2025
- swbusd is up with initial routes loaded from yaml
- swbusd instances can be interconnected and routing is working
- swbuscli is implemented for troubleshooting
- ping in swbuscli to a remote swbusd through the local swbusd is working
- show route in swbuscli displays the route table in the connected swbusd

TODOs:
- Client reconnect to remote swbusd is implemented but without source port randomization in each retry
- Change logging from println to trace
- Add unit tests
- Implement route update through route queries

Co-authored-by: r12f <r12f.code@gmail.com>
sign-off: Jing Zhang zhangjing@microsoft.com