Fix deadlock between syncd and orchagent syncd during initialization failure #1723

Merged
prsunny merged 8 commits into sonic-net:master from DavidZagury:master_deadlock_oa_syncd
Jan 23, 2026

@DavidZagury
Contributor

@DavidZagury DavidZagury commented Dec 10, 2025

When syncd requests a shutdown, orchagent may be blocked waiting for a response to an init view (or other NOTIFY) command. Since syncd stops processing commands while waiting for the shutdown response, orchagent never receives its response and cannot acknowledge the shutdown request - resulting in a deadlock.

This fix adds the selectable channel to the select loop during shutdown-wait mode and handles incoming commands appropriately:

  • NOTIFY commands receive a SAI_STATUS_FAILURE response to unblock the waiting orchagent
  • Other commands are logged and ignored

This prevents orchagent from hanging until timeout when syncd is failing.
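The behaviour described above can be modelled with a small, self-contained Python sketch. This is not the syncd implementation (which is C++); the function names, the queue-based channels, and the timeout values are all hypothetical and exist only to illustrate the two behaviours. With `serve_commands=False` the model mimics the old shutdown-wait loop, which ignores the command channel, so the simulated orchagent waits until its timeout. With `serve_commands=True` a NOTIFY command is answered with a failure status and the simulated orchagent is unblocked at once.

```python
import queue
import threading

# Illustrative constant for this sketch; the real value comes from the SAI headers.
SAI_STATUS_FAILURE = -1


def syncd_shutdown_wait(commands, responses, shutdown_ack, serve_commands, ignored):
    """Hypothetical model of syncd waiting for the shutdown acknowledgement."""
    while not shutdown_ack.is_set():
        if not serve_commands:
            # Old behaviour: only the shutdown acknowledgement is watched.
            shutdown_ack.wait(timeout=0.01)
            continue
        try:
            kind, name = commands.get(timeout=0.01)
        except queue.Empty:
            continue
        if kind == "NOTIFY":
            # Reply with a failure so the blocked sender can proceed.
            responses.put((name, SAI_STATUS_FAILURE))
        else:
            # Other commands are recorded and otherwise ignored.
            ignored.append(name)


def orchagent(commands, responses, shutdown_ack, timeout):
    """Hypothetical model of orchagent blocked on an init view response."""
    commands.put(("NOTIFY", "INIT_VIEW"))
    try:
        reply = responses.get(timeout=timeout)
    except queue.Empty:
        reply = None  # no response arrived before the timeout
    # Only after the blocking wait ends can the shutdown be acknowledged.
    shutdown_ack.set()
    return reply


def run(serve_commands, timeout=0.5):
    """Run one simulated shutdown and return what orchagent received."""
    commands, responses = queue.Queue(), queue.Queue()
    shutdown_ack = threading.Event()
    ignored = []
    worker = threading.Thread(
        target=syncd_shutdown_wait,
        args=(commands, responses, shutdown_ack, serve_commands, ignored),
    )
    worker.start()
    reply = orchagent(commands, responses, shutdown_ack, timeout)
    worker.join()
    return reply
```

Calling `run(True)` returns the failure reply for the init view command, while `run(False)` returns `None` only after the full timeout has elapsed, which corresponds to the hang this PR removes.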

This fix addresses sonic-net/sonic-buildimage#24799.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@DavidZagury force-pushed the master_deadlock_oa_syncd branch from fccbb5b to 6859348 on December 15, 2025 13:42

@volodymyrsamotiy
Collaborator

@lolyu, could you please help review?

@r12f
Contributor

r12f commented Dec 15, 2025

Requested @prsunny to review. This looks like a very nice fix to have.

@r12f r12f requested a review from prsunny December 15, 2025 19:24
@DavidZagury changed the title from "Fix deadlock between syncd and orchagent syncd initialization failure" to "Fix deadlock between syncd and orchagent syncd during initialization failure" on Dec 15, 2025

@DavidZagury
Contributor Author

/azpw run

@mssonicbld
Collaborator

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@DavidZagury force-pushed the master_deadlock_oa_syncd branch from 479cb16 to c8b1c16 on January 14, 2026 14:41

When syncd requests a shutdown, orchagent may be blocked waiting for
a response to an init view (or other NOTIFY) command. Since syncd
stops processing commands while waiting for the shutdown response,
orchagent never receives its response and cannot acknowledge the
shutdown request - resulting in a deadlock.

This fix adds the selectable channel to the select loop during
shutdown-wait mode and handles incoming commands appropriately:
- NOTIFY commands receive a SAI_STATUS_FAILURE response to unblock
  the waiting orchagent
- Other commands are logged and ignored

This prevents orchagent from hanging until timeout when syncd is
failing.

Signed-off-by: david.zagury <davidza@nvidia.com>
@DavidZagury force-pushed the master_deadlock_oa_syncd branch from c8b1c16 to 0b72321 on January 15, 2026 07:38

@DavidZagury
Contributor Author

> @DavidZagury lgtm. Please address copilot comments or close them if not applicable.

@prabhataravind I addressed the Copilot comments.

Contributor

@lolyu lolyu left a comment

LGTM

@mssonicbld
Collaborator

Cherry-pick PR to msft-202412: Azure/sonic-sairedis.msft#104
