Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updatable loop occasionally fails to start #60

Closed
yassiezar opened this issue Jul 19, 2022 · 13 comments
Closed

Updatable loop occasionally fails to start #60

yassiezar opened this issue Jul 19, 2022 · 13 comments

Comments

@yassiezar
Copy link
Contributor

yassiezar commented Jul 19, 2022

Issue:

I have an updatable client behaviour defined which is activated within the state machine's initial state. However, I've noticed that the update loop occasionally fails to begin getting called and the update callback is never entered. This seems to be totally random and I'm not able to trigger it manually

Analysis:

I've taken a look at the update loop here and noticed that, at times, the client behaviour table is updated here before all the client behaviours have finished being added and initialised. Then, at the next loop when the CBs have actually been added, the table is not updated (since now lastState_ == currentState) and no update() functions can be executed. The log output below shows the behaviour I'm talking about where the CB table is updated to soon:

...
[ INFO] [1658248892.188056645]: [sm::StRest] Configuring orthogonal: sm::OrHttp
[ INFO] [1658248892.188219663]: [PollOnce] current state: 2
[ INFO] [1658248892.188328821]: [PollOnce] last state: 0
[ INFO] [1658248892.188391033]: [PollOnce] detected new state, refreshing updatable client behavior table
[ INFO] [1658248892.188716652]: updatable state elements: 0
[ INFO] [1658248892.188908727]: [Orthogonal sm::OrHttp] adding client behavior: sm::CbRobotStatusUpdate
[ INFO] [1658248892.189014429]: [sm::StRest] Creating static client behavior: sm::CbAthenaTeleopActivate
[ INFO] [1658248892.189069920]: [sm::StRest] Configuring orthogonal: sm::OrAthena
[ INFO] [1658248892.189136324]: [Orthogonal fox_rt::OrAthena] adding client behavior: fox_rt::CbAthenaTeleopActivate
[ INFO] [1658248892.189194567]: [sm::StRest] ---- END STATIC DESCRIPTION
[ INFO] [1658248892.189241322]: [sm::StRest] State runtime configuration
[ INFO] [1658248892.189299437]: [Orthogonal sm::OrAthena] runtimeConfigure, current Behavior: sm::CbAthenaTeleopActivate
[ INFO] [1658248892.189358129]: [Orthogonal sm::OrHttp] runtimeConfigure, current Behavior: sm::CbRobotStatusUpdate
[ INFO] [1658248892.189444054]: [Orthogonal sm::OrTeleop] runtimeConfigure, current Behavior: sm::CbTeleopActivate
[ WARN] [1658248892.189534459]: [StateMachine] setting state active : sm::MsAutonomousOperationMode/sm::StRest
[ INFO] [1658248892.189664259]: [sm::StRest] State OnEntry
[ WARN] [1658248892.189789194]: Required client [sm::ClTeleopSafeStopService] not found in current orthogonal. Searching in other orthogonals.
[ WARN] [1658248892.189890032]: Required client [sm::ClTeleopSafeStopService] found in other orthogonal.
[ WARN] [1658248892.189974741]: Brakes already active
[ INFO] [1658248892.190018167]: In rest state
[ INFO] [1658248892.190064186]: [sm::StRest] State OnEntry code finished
[ INFO] [1658248892.190115645]: [Orthogonal sm::OrAthena] OnEntry, current Behavior: sm::CbAthenaTeleopActivate
[ INFO] [1658248892.190658684]: [StateMachine] life-time constrained smacc signal subscription created. Subscriber is sm::CbAthenaTeleopActivate
[ INFO] [1658248892.190730497]: [Orthogonal sm::OrFollowMe] OnEntry
[ INFO] [1658248892.190783524]: [Orthogonal sm::OrFrontZedCamera] OnEntry
[ INFO] [1658248892.190879113]: [Orthogonal sm::OrGpio] OnEntry
[ INFO] [1658248892.190928492]: [Orthogonal sm::OrHttp] OnEntry, current Behavior: sm::CbRobotStatusUpdate
[ INFO] [1658248892.191234302]: [StateMachine] life-time constrained smacc signal subscription created. Subscriber is sm::CbRobotStatusUpdate
[ INFO] [1658248892.191296961]: [Orthogonal sm::OrMapping] OnEntry
[ INFO] [1658248892.191341444]: [Orthogonal sm::OrNavigation] OnEntry
[ INFO] [1658248892.191384166]: [Orthogonal sm::OrRearZedCamera] OnEntry
[ INFO] [1658248892.191426505]: [Orthogonal sm::OrRfid] OnEntry
[ INFO] [1658248892.191468683]: [Orthogonal sm::OrScales] OnEntry
[ INFO] [1658248892.191517230]: [Orthogonal sm::OrTeleop] OnEntry, current Behavior: fox_rt::CbTeleopActivate
[ INFO] [1658248892.192057838]: [StateMachine] life-time constrained smacc signal subscription created. Subscriber is sm::CbTeleopActivate
[ INFO] [1658248892.192154355]: [Orthogonal sm::OrTimer] OnEntry
[ WARN] [1658248892.192233112]: [StateMachine] setting state active : sm::MsAutonomousOperationMode/sm::StRest
[ INFO] [1658248892.238259719]: [PollOnce] current state: 2
[ INFO] [1658248892.238412271]: [PollOnce] last state: 2
[ INFO] [1658248892.238478035]: updatable state elements: 0
[ INFO] [1658248892.288266301]: [PollOnce] current state: 2
...

Possible fix:

I'm not sure how to approach this. My guess is that the checks here should prevent this behaviour, but its not doing so reliably (at least in this case). The workaround we're using is to start a timer at the CB's initialisation which executes a state transition back into the state to recreate everything, if the timer isn't stopped within an allotted time.

┆Issue is synchronized with this Jira Task by Unito

@yassiezar
Copy link
Contributor Author

Looking at it a bit more, I noticed that the ModeState static configuration loop gets executed, along with its runtime configure options, before moving onto the inner initial state's static, runtime and onEntry functions. Is it possible that during the process of executing the outer state's configuration, the guards preventing the CBs from being added are momentarily disabled, allowing the CB list to be prematurely updated?

@pabloinigoblasco
Copy link
Collaborator

pabloinigoblasco commented Jul 22, 2022

You are right.

Precisely I fixed that lately in SMACC2.
The update call could be called before OnEntry is finished.

Have a look with this line of code of the run_async branch of SMACC2:
https://github.com/robosoft-ai/SMACC2/blob/89be6c3ba61515a8718932f5a2b87217db125ee4/smacc2/src/smacc2/signal_detector.cpp#L269

does that solution look like it fixed your problem?

We have to backport that change to SMACC1

@yassiezar
Copy link
Contributor Author

@pabloinigoblasco Funnily enough, I tried adding an extra flag, but didn't see the one you linked. This looks like it might work though, I'll apply it on our robot and see if the behaviour is fixed.

@pabloinigoblasco
Copy link
Collaborator

The case is that that flag is only in SMACC2 because some lately issues, so we also need to add it to SMACC1. It should be fast, I will do that asap.

@yassiezar
Copy link
Contributor Author

Sounds good! Feel free to close the issue once the changes are merged in and I'll update our code

@yassiezar
Copy link
Contributor Author

Fixed in #61

@yassiezar
Copy link
Contributor Author

yassiezar commented Feb 16, 2023

I'm afraid the guard @pabloinigoblasco added in #61 doesn't seem to be working as expected and this issue is still popping up intermittently (see the log output below)

[ WARN] [1676540771.882932016]: [fox_rt::MsAutonomousOperationMode] creating 
[ INFO] [1676540771.883732753]: [fox_rt::MsAutonomousOperationMode] Creating ros NodeHandle for this state: SmFox
[ INFO] [1676540771.883941626]: [fox_rt::MsAutonomousOperationMode] State object created. Initializating...
[ INFO] [1676540771.886767758]: [fox_rt::MsAutonomousOperationMode] -- STATIC STATE DESCRIPTION --
[ INFO] [1676540771.886944821]: [fox_rt::MsAutonomousOperationMode] ---- END STATIC DESCRIPTION
[ INFO] [1676540771.887019672]: [fox_rt::MsAutonomousOperationMode] State runtime configuration
[ WARN] [1676540771.887096315]: [StateMachine] setting state active : fox_rt::MsAutonomousOperationMode
[ INFO] [1676540771.887170718]: [fox_rt::MsAutonomousOperationMode] State OnEntry
[ INFO] [1676540771.887230657]: [fox_rt::MsAutonomousOperationMode] State OnEntry code finished
[ INFO] [1676540771.887287619]: [Orthogonal fox_rt::OrAsync] OnEntry
[ INFO] [1676540771.887373223]: [Orthogonal fox_rt::OrAthena] OnEntry
[ INFO] [1676540771.887456778]: [Orthogonal fox_rt::OrGpio] OnEntry
[ INFO] [1676540771.887506220]: [Orthogonal fox_rt::OrHttp] OnEntry
[ INFO] [1676540771.887562254]: [Orthogonal fox_rt::OrMapping] OnEntry
[ INFO] [1676540771.887640689]: [Orthogonal fox_rt::OrMotors] OnEntry
[ INFO] [1676540771.887714773]: [Orthogonal fox_rt::OrNavigation] OnEntry
[ INFO] [1676540771.887789816]: [Orthogonal fox_rt::OrSensors] OnEntry
[ INFO] [1676540771.887853050]: [Orthogonal fox_rt::OrTeleop] OnEntry
[ INFO] [1676540771.887901404]: [Orthogonal fox_rt::OrTimer] OnEntry
[ INFO] [1676540771.887957663]: [Orthogonal fox_rt::OrZedCamera] OnEntry
[ WARN] [1676540771.888007009]: [StateMachine] setting state active : fox_rt::MsAutonomousOperationMode
[ WARN] [1676540771.888085060]: [fox_rt::StRest] creating 
[ INFO] [1676540771.888906213]: [fox_rt::StRest] State object created. Initializating...
[ INFO] [1676540771.891562802]: [fox_rt::StRest] -- STATIC STATE DESCRIPTION --
[ INFO] [1676540771.891987748]: [fox_rt::StRest] Creating static client behavior: fox_rt::CbTeleopActivate
    [ INFO] [1676540771.892091752]: [fox_rt::StRest] Configuring orthogonal: fox_rt::OrTeleop
    [ INFO] [1676540771.995668611]: [PollOnce] detected new state, refreshing updatable client behavior table
    [ INFO] [1676540771.996076724]: updatable state elements: 0
    [ INFO] [1676540771.996353055]: [Orthogonal fox_rt::OrTeleop] adding client behavior: fox_rt::CbTeleopActivate
    [ INFO] [1676540771.996601674]: [fox_rt::StRest] Creating static client behavior: fox_rt::CbRobotStatusUpdate
    [ INFO] [1676540771.996686957]: [fox_rt::StRest] Configuring orthogonal: fox_rt::OrHttp
    [ INFO] [1676540771.996792529]: [Orthogonal fox_rt::OrHttp] adding client behavior: fox_rt::CbRobotStatusUpdate
[ INFO] [1676540771.996903254]: [fox_rt::StRest] Creating static client behavior: fox_rt::CbAthenaTeleopActivate
[ INFO] [1676540771.997003642]: [fox_rt::StRest] Configuring orthogonal: fox_rt::OrAthena
[ INFO] [1676540771.997152576]: [Orthogonal fox_rt::OrAthena] adding client behavior: fox_rt::CbAthenaTeleopActivate

Note the indented lines - CbRobotStateUpdate is an updatable CB and the number of updatable state elements should be 1, not 0 . It looks like the client behaviours are being added to the orthogonals after the CB table is refreshed in line 223 in the signal_detector.cpp. I guess there's a race condition somewhere where the StateMachineInternalAction is being set by the orthogonal congiurator and where its being read in the signal_detector.cpp guards on line 200 preventing the CB table from being filled out too soon

In the case above, we're using a modestate with its inital internal state being the one transitioned into. Could it be that one of the StateMachineInternalAction guards are changing during this 'transition' from the outer modestate to the inner state, which will let the signal_detector prematurely populate the CB table before the 'actual' state is properly initialised and configured?

@yassiezar yassiezar reopened this Feb 16, 2023
@pabloinigoblasco
Copy link
Collaborator

Hello @yassierzar,

How often does the issue occur in your example?

The use of ModeState and Updateable Behaviors might be some untested use case. I suspect that the issue occurs when a child state begins where the stateMachine action switch from STEADY (mode state) TO CONFIGURING (child state). I've created a PR #87 to address this problem. Can you try the patch and let me know if it resolves the issue?

Thanks,

@yassiezar
Copy link
Contributor Author

yassiezar commented Feb 17, 2023

Its quite unpredictable, but happens often enough - at times it works fine for 10 restarts, but other times the bug comes up for the next 4 restarts.

I suspect that the issue occurs when a child state begins where the stateMachine action switch from STEADY (mode state) TO CONFIGURING (child state)

That's exactly what I was thinking too. Thanks for addressing this so quickly, I'll try your patch today and let you know.

@yassiezar
Copy link
Contributor Author

Unfortunately it seems your patch didn't quite do it:

[ INFO] [1676654310.121086971]: In rest state
[ INFO] [1676654310.121252580]: [fox_rt::StRest] State OnEntry code finished
[ INFO] [1676654310.121324488]: [Orthogonal fox_rt::OrAsync] OnEntry
[ INFO] [1676654310.121383724]: [Orthogonal fox_rt::OrAthena] OnEntry, current Behavior: fox_rt::CbAthenaTeleopActivate
[ INFO] [1676654310.121719519]: [StateMachine] life-time constrained smacc signal subscription created. Subscriber is fox_rt::CbAthenaTeleopActivate
[ INFO] [1676654310.121808196]: [Orthogonal fox_rt::OrGpio] OnEntry
[ INFO] [1676654310.121866631]: [Orthogonal fox_rt::OrHttp] OnEntry, current Behavior: fox_rt::CbRobotStatusUpdate
[ WARN] [1676654310.122005135]: Required client [fox_rt::ClBms<fox_rt::FrontBms>] not found in current orthogonal. Searching in other orthogonals.
[ WARN] [1676654310.122143191]: Required client  [fox_rt::ClBms<fox_rt::FrontBms>] found in other orthogonal.
[ WARN] [1676654310.122245565]: Required client [fox_rt::ClBms<fox_rt::RearBms>] not found in current orthogonal. Searching in other orthogonals.
[ WARN] [1676654310.122332354]: Required client  [fox_rt::ClBms<fox_rt::RearBms>] found in other orthogonal.
[ WARN] [1676654310.122419335]: Required client [cl_move_base_z::ClMoveBaseZ] not found in current orthogonal. Searching in other orthogonals.
[ WARN] [1676654310.122489003]: Required client  [cl_move_base_z::ClMoveBaseZ] found in other orthogonal.
[ INFO] [1676654310.122692694]: [StateMachine] life-time constrained smacc signal subscription created. Subscriber is fox_rt::CbRobotStatusUpdate
[ INFO] [1676654310.122767034]: [Orthogonal fox_rt::OrMapping] OnEntry
[ INFO] [1676654310.122823838]: [Orthogonal fox_rt::OrMotors] OnEntry
[ INFO] [1676654310.122910307]: [Orthogonal fox_rt::OrNavigation] OnEntry
[ INFO] [1676654310.122976262]: [Orthogonal fox_rt::OrSensors] OnEntry
[ INFO] [1676654310.123067243]: [Orthogonal fox_rt::OrTeleop] OnEntry, current Behavior: fox_rt::CbTeleopActivate
[ WARN] [1676654310.123277751]: Required client [fox_rt::ClSafeStopService] not found in current orthogonal. Searching in other orthogonals.
[ WARN] [1676654310.123360956]: Required client  [fox_rt::ClSafeStopService] found in other orthogonal.
[ INFO] [1676654310.124816431]: [StateMachine] life-time constrained smacc signal subscription created. Subscriber is fox_rt::CbTeleopActivate
[ INFO] [1676654310.124986137]: [Orthogonal fox_rt::OrTimer] OnEntry
[ INFO] [1676654310.125077886]: [Orthogonal fox_rt::OrZedCamera] OnEntry
[ INFO] [1676654313.168064505]: [SignalDetector] heartbeat
[ INFO] [1676654323.206133422]: [SignalDetector] heartbeat
[ INFO] [1676654333.249130333]: [SignalDetector] heartbeat
[ INFO] [1676654343.294498721]: [SignalDetector] heartbeat
[ INFO] [1676654353.306506218]: [SignalDetector] heartbeat

I expect to see log output from the updatable in between the heartbeats

@yassiezar
Copy link
Contributor Author

I managed to hack together something that looks like it might fix the issue. I've opened up PR #88 with more detail, but testing with a more diverse state configuration set is needed to make sure it actually works as expected. Let me know what you think

@pabloinigoblasco
Copy link
Collaborator

From what I understand, your proposal mainly involves avoiding the intermediate STEADY state by skipping the Steady state update for "parent states." Additionally, there are some aesthetic changes that involve the usage of std::atomic, which looks good.

My position is to accept your PR because it is good and it fixes a clearly defined use case that crashes.

These changes subtly alter the behavior of the state machine during the initialization of mode states, so this PR is important.

I believe that the initialization order of parent states and child states is something that has been less tested, so this research work and fix you did here is really welcome. It is unlikely that any of our examples or user developments have faced this issue and I think they will keep working the same

I want to test this PR in some of our examples. After doing this validation I will accept the PR and I will migrate these changes to all SMACC and SMACC2 versions.

Thank you very much for this research and fix @yassiezar .

@yassiezar
Copy link
Contributor Author

No problem @pabloinigoblasco, hopefully it works as expected and gets pulled in. I'd be curious what the interaction would look like with a deeper state hierarchy (e.g. superstate -> modestate -> child state). Also, some additional refactoring might be necessary, since some configuration calls may no longer triggered when a parent state has completed initialisation (this is something I didn't look at too much). Let me know if I can help with integration or testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants