Skip to content

Validate that the LRA coordinator getStatus matches the state model#282

Open
marcosgopen wants to merge 1 commit intojbosstm:mainfrom
marcosgopen:158
Open

Validate that the LRA coordinator getStatus matches the state model#282
marcosgopen wants to merge 1 commit intojbosstm:mainfrom
marcosgopen:158

Conversation

@marcosgopen
Copy link
Member

@marcosgopen marcosgopen commented Mar 12, 2026

@marcosgopen marcosgopen requested a review from mmusgrov March 12, 2026 17:20
Copy link
Member

@tomjenkinson tomjenkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use byteman in order to verify the precise states are all tested? The "||" side of things makes it possible that not all transitions are tested

@marcosgopen
Copy link
Member Author

Is it possible to use byteman in order to verify the precise states are all tested? The "||" side of things makes it possible that not all transitions are tested

Yes @tomjenkinson , sure. I am working also on #124 so I adapt the test to use byteman and check the final status.

@marcosgopen marcosgopen force-pushed the 158 branch 2 times, most recently from 5d46dd9 to fef2510 Compare March 13, 2026 13:40
@marcosgopen
Copy link
Member Author

Actually @tomjenkinson the NarayanaLRAClient calls are "synchronous" meaning they use 'toCompletableFuture' but then wait for the response:

Response response = client.getNestedLRAStatus(encodedLRA)
                    .toCompletableFuture().get(QUERY_TIMEOUT, TimeUnit.SECONDS);

so the LRA and Participant should actually be already in a terminal status. I will update the tests.
However you touched a good point and I am going to work on #124 for crash tests (e.g. lraCoordinatorRecoveryTwoLRAs)

@marcosgopen marcosgopen force-pushed the 158 branch 2 times, most recently from 897adab to c400536 Compare March 13, 2026 13:58
}
try {
lraClient.closeLRA(parentId1);
} catch (Exception ignore) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a comment to explain why the exception is expected would be good. And if it is expected then maybe there should be a fail after the closeLRA call if it did succeed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tomjenkinson for your comment. The try catch block is not needed there. I will review also all the other try catch blocks in the LRATest to make sure they are strictly needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a commit to remove it and other 2 try-catch blocks which are unnecessary.

* When the parent LRA is cancelled, the nested LRA (acting as a participant)
* must follow the compensate path:
*
* Active -> Compensating -> Compensated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I am missing something (very possible) I don't think this test verifies that the Compensating stage occurred? Just that it went to Compensated. I mean I could approve the PR but I suppose it wouldn't fully address #158 until we also had a test to check that?

In other words, specifically this test verifies Active -> Compensated IIUC

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I will have a look if I can check also the transient statuses

*/
@Test
@BMRules(rules = {
@BMRule(name = "pause at child Closing state", targetClass = "io.narayana.lra.coordinator.domain.model.LongRunningAction", targetMethod = "updateState(org.eclipse.microprofile.lra.annotation.LRAStatus, boolean)", targetLocation = "AT EXIT", condition = "$! && $1 == org.eclipse.microprofile.lra.annotation.LRAStatus.Closing && !$0.isTopLevel()", action = "io.narayana.lra.coordinator.domain.model.BytemanHelper.signalRendezvous(\"child-closing-reached\");"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomjenkinson I added a byteman rule and helper methods to stop the thread when the status is updated to the transient state so that we can check the transient status and then release the thread to complete the test. WDYT?

"Nested LRA participant must be Active before parent close");

// Close the parent in a separate thread (synchronous call)
Thread closeThread = new Thread(() -> lraClient.closeLRA(parentId));
Copy link
Member Author

@marcosgopen marcosgopen Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is necessary to run the rest call in a different thread to have it async

@marcosgopen
Copy link
Member Author

This PR partially addresses issue #124, but I will create a follow up PR to address it and add byteman in arquillian tests too

Verify LRA are in terminal state after sync call

LRA could be in non terminal state only if external participants are enrolled in the nested LRA

Remove try catch blocks when an exception is not expected

Adding byteman to check transient statuses
// Issue: there is already a logic to check if the nested LRA should be deleted or not (see fromHierarchy),
// but the Coordinator.completeNestedLRA endpoint always call endLRA with fromHierarchy=true
// whilst it should set it to true only when the parent is committing
@Disabled
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #287

@mmusgrov
Copy link
Member

This is nice set of tests and the byteman rules are an excellent verification that the implementation is following the model correctly.

I tried ticking off which of the state transitions are tested and four are definitely tested but I may have missed some of the other transitions that the new tests validate.

The four transitions from transient states to failed states are missing and there's a nested participant transition from completed to compensating (although I'd expect there should already be a test for that somewhere). Some of the existing tests may or may not already test for these - we have loads of model tests plus the ones in the TCK (I also looked at all calls to LRATest.assertStatus and none of the calls checked for transient states):

LRA state model:

active -> closing -> closed (testClosingTransientState and testParticipantStateModelParentCloseCascade)
active -> closing -> failed to closed
active -> cancelling -> cancelled (testCancellingTransientState and testParticipantStateModelParentCancelCascade)
active -> cancelling -> failed to cancel

Participant state model:

active -> completing -> completed (testParticipantStateModelClosePath)
active -> completing -> completed -> compensating
active -> completing -> failed to complete
active -> compensating -> compensated (testParticipantStateModelCancelPath)
active -> compensating -> failed to compensate

I think the missing transitions can be addressed in a followup PR.

@marcosgopen
Copy link
Member Author

This is nice set of tests and the byteman rules are an excellent verification that the implementation is following the model correctly.

I tried ticking off which of the state transitions are tested and four are definitely tested but I may have missed some of the other transitions that the new tests validate.

The four transitions from transient states to failed states are missing and there's a nested participant transition from completed to compensating (although I'd expect there should already be a test for that somewhere). Some of the existing tests may or may not already test for these - we have loads of model tests plus the ones in the TCK (I also looked at all calls to LRATest.assertStatus and none of the calls checked for transient states):

LRA state model:

active -> closing -> closed (testClosingTransientState and testParticipantStateModelParentCloseCascade) active -> closing -> failed to closed active -> cancelling -> cancelled (testCancellingTransientState and testParticipantStateModelParentCancelCascade) active -> cancelling -> failed to cancel

Participant state model:

active -> completing -> completed (testParticipantStateModelClosePath) active -> completing -> completed -> compensating active -> completing -> failed to complete active -> compensating -> compensated (testParticipantStateModelCancelPath) active -> compensating -> failed to compensate

I think the missing transitions can be addressed in a followup PR.

Thanks @mmusgrov for your analysis, I think I can add the missing transitions in this PR as well. I will add them shortly. It might be good to create a new test class for testing the state model transition tests (LRATest class is getting far too long ;))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants