Fix deadlock caused by STP violation
petervdonovan committed Jan 13, 2024
1 parent a961d9c commit 9a797bb
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion core/reactor_common.c
@@ -401,7 +401,9 @@ void _lf_pop_events(environment_t *env) {
             // the MLAA could get stuck, causing the program to lock up.
             // This should not call update_last_known_status_on_input_port because we
             // are starting a new tag step execution, so there are no reactions blocked on this input.
-            event->trigger->last_known_status_tag = env->current_tag;
+            if (lf_tag_compare(env->current_tag, event->trigger->last_known_status_tag) > 0) {
+                event->trigger->last_known_status_tag = env->current_tag;
+            }
         }
     }
 #endif

1 comment on commit 9a797bb

@edwardalee
Contributor


For the record, I would like to add a comment on why this commit is necessary to prevent the deadlock we were observing. The deadlock arises when a federate is being bombarded with tardy messages. The federate uses decentralized coordination, so it advances its tag based on the local physical clock. It then receives a message that is "in the past" relative to its current tag. Each time it receives a message, it updates the last_known_status_tag field of the action that handles the message. Here, in _lf_pop_events, it also updates this field when it pops events off the event queue that point to this trigger. The problem is that, before this fix, if more than one tardy message was pushed onto the event queue before the previous ones had been popped, the last_known_status_tag field would move backwards in time.
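
To make the difference concrete, here is a minimal, self-contained sketch of the two update rules. It is not the reactor-c code: `sketch_tag_t`, `sketch_tag_compare`, `update_unguarded`, `update_guarded`, and `replay_guarded` are hypothetical stand-ins that I am assuming behave like the runtime's tag type and `lf_tag_compare` (negative, zero, or positive for earlier, equal, or later), and the tag values used with it are purely illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a logical tag: (time, microstep). */
typedef struct {
    int64_t time;
    uint32_t microstep;
} sketch_tag_t;

/* Returns -1, 0, or 1 if a is earlier than, equal to, or later than b.
 * Assumed to mirror the ordering that lf_tag_compare provides. */
static int sketch_tag_compare(sketch_tag_t a, sketch_tag_t b) {
    if (a.time != b.time) return a.time < b.time ? -1 : 1;
    if (a.microstep != b.microstep) return a.microstep < b.microstep ? -1 : 1;
    return 0;
}

/* Behavior before the fix: overwrite unconditionally, so the field
 * can move backwards if `current` is earlier than `*last_known`. */
static void update_unguarded(sketch_tag_t *last_known, sketch_tag_t current) {
    *last_known = current;
}

/* Behavior after the fix: only ever advance the field. */
static void update_guarded(sketch_tag_t *last_known, sketch_tag_t current) {
    if (sketch_tag_compare(current, *last_known) > 0) {
        *last_known = current;
    }
}

/* Applies the guarded rule for a sequence of tags and returns the result. */
static sketch_tag_t replay_guarded(sketch_tag_t initial,
                                   const sketch_tag_t *tags, size_t n) {
    assert(tags != NULL || n == 0);
    sketch_tag_t last = initial;
    for (size_t i = 0; i < n; i++) {
        update_guarded(&last, tags[i]);
    }
    return last;
}
```

Starting from a last known tag of (100, 0), replaying the tags (40, 0) and (60, 0) through `replay_guarded` leaves the field at (100, 0), whereas calling `update_unguarded` with (40, 0) moves it back to (40, 0).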

As far as I know, the last_known_status_tag field is only used in federated execution, and since it gets set when messages arrive, it probably doesn't need to be set here in _lf_pop_events at all. It may not actually be possible to use this field reliably in general, because it won't tell you much with multiports, where one channel could lag behind another. In federated execution, this is not a problem because multiports get separated into individual actions.
