
CHA-RL1h5 doesn’t respect operation atomicity #253

Open · lawrence-forooghian opened this issue Dec 3, 2024 · 13 comments
Labels: chat (Related to the Chat SDK.)

lawrence-forooghian (Collaborator) commented Dec 3, 2024

*** @(CHA-RL1h4)@ @[Testable]@ If the underlying Realtime Channel entered the @FAILED@ state, then the room status will transition to @FAILED@, using the error from the realtime channels attach() call as the @cause@ field of the @ErrorInfo@. The status code shall be @500@ and the error code whatever attachment error code corresponds to the feature that has failed, per "the error code list":#error-codes. The same error shall be thrown as the result of the @ATTACH@ operation.
*** @(CHA-RL1h5)@ @[Testable]@ When the room enters the @FAILED@ status as a result of @CHA-RL1h4@, asynchronously with respect to @CHA-RL1h4@, then the room will detach all channels that are not in the @FAILED@ state.
*** @(CHA-RL1h6)@ @[Testable]@ If, when performing @CHA-RL1h5@, a channel fails to detach on command, then the detach operation will be repeated until such a time that all channels have detached successfully.

The ATTACH operation ends in CHA-RL1h4, implying that the CHA-RL1h5 asynchronous detach happens outside of any room lifecycle operation. This means that another room lifecycle operation could run at the same time as this detach operation, which doesn't seem intentional.

I looked at the JS implementation and it seems to keep the lifecycle manager’s mutex locked during this "rundown" (as it calls the CHA-RL1h5 detach operation). But the spec does not imply this. I think that to translate this behaviour to the spec, which implements mutual exclusion through lifecycle operations, we should do something analogous to the RETRY operation; that is, define a new internal-only room lifecycle operation (I’ll call it RUNDOWN for want of a better term), which is scheduled by CHA-RL1h5 and which (see the sketch after this list):

  • performs the detach-all-non-FAILED-contributors behaviour of CHA-RL1h5
  • implements the retry behaviour of CHA-RL1h6
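
For illustration, here is a rough TypeScript sketch of what the body of such a RUNDOWN operation could look like. The names (`Contributor`, `ChannelState`, `rundown`, the retry interval) are hypothetical and not the actual chat-js or chat-swift APIs; the point is only that the detach-and-retry work runs as a single internal operation.

```ts
// Hypothetical sketch only: Contributor, ChannelState and rundown are
// illustrative names, not the actual SDK types.

type ChannelState = 'attached' | 'detached' | 'failed' | 'suspended';

interface Contributor {
  readonly state: ChannelState;
  detach(): Promise<void>; // rejects if the detach attempt fails
}

// Body of the proposed internal RUNDOWN operation. It would be scheduled
// by CHA-RL1h5 through the same mechanism that serialises ATTACH / DETACH /
// RELEASE, so no other lifecycle operation can interleave with it.
async function rundown(contributors: Contributor[]): Promise<void> {
  for (const contributor of contributors) {
    // CHA-RL1h5: only channels that are not already FAILED are detached.
    if (contributor.state === 'failed') {
      continue;
    }
    // CHA-RL1h6: repeat the detach until it succeeds.
    for (;;) {
      try {
        await contributor.detach();
        break;
      } catch {
        // Back off briefly before retrying; the interval is arbitrary here.
        await new Promise((resolve) => setTimeout(resolve, 250));
      }
    }
  }
}
```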

sacOO7 (Collaborator) commented Dec 4, 2024

@lawrence-forooghian there is ambiguity around how channel wind-down works when the room gets into the FAILED state.
There's an active issue on chat-js: ably/ably-chat-js#405

sacOO7 (Collaborator) commented Dec 4, 2024

I went through the chat-js behaviour and noted down a few discrepancies: ably/ably-chat-js#405 (comment)

sacOO7 (Collaborator) commented Dec 4, 2024

The conclusion was to support a full runDownChannels (atomic internal op, CHA-RL7a1) for all cases where the room goes into the FAILED state.

lawrence-forooghian added a commit to ably/ably-chat-swift that referenced this issue Dec 4, 2024
This implementation reflects my suggested spec changes [1] which aim to
preserve lifecycle operation atomicity by introducing a new RUNDOWN
operation:

> The `ATTACH` operation ends in CHA-RL1h4, implying that the CHA-RL1h5
> asynchronous detach happens _outside of any room lifecycle operation_.
> This means that another room lifecycle operation could run at the same
> time as this detach operation, which doesn't seem intentional.
>
> I looked at the JS implementation [2] and it seems that it keeps the
> lifecycle manager’s mutex locked during this "rundown" (as it calls the
> CHA-RL1h5 detach operation). But this is not implied in the spec. I
> think that to translate this behaviour to the spec, which implements
> mutual exclusion through lifecycle operations, we should do something
> analogous to the `RETRY` operation; that is, define a new internal-only
> room lifecycle operation (I’ll call it `RUNDOWN` for want of a better
> term), which is scheduled by CHA-RL1h5 and which:
>
> - performs the detach-all-non-`FAILED`-contributors behaviour of CHA-RL1h5
> - implements the retry behaviour of CHA-RL1h6

Resolves #119.

[1] ably/specification#253
[2] https://github.com/ably/ably-chat-js/blob/e8380583424a83f7151405cc0716e01302295eb6/src/core/room-lifecycle-manager.ts#L506-L509

lawrence-forooghian (Collaborator, Author) commented

In ably/ably-chat-swift#174 I’ve implemented this spec point in Swift in line with my suggested changes above.

maratal (Collaborator) commented Dec 5, 2024

> When the room enters the @FAILED@ status as a result of @CHA-RL1h4@, asynchronously with respect to @CHA-RL1h4@, then the room will detach all channels that are not in the @FAILED@ state.

Doesn't this reduce the "fault tolerance" of the room? For example, if the channel for reactions fails due to high load, the messages channel could still serve users, but in this configuration the room just fails as a whole. WDYT @lawrence-forooghian @AndyTWF @sacOO7?

sacOO7 (Collaborator) commented Dec 5, 2024

No, that's not the point. The room is treated as one unit, hence either all parts are working or all are in the detached/failed state.

sacOO7 (Collaborator) commented Dec 5, 2024

That's the reason we have the concept of room atomicity, where we make sure every operation runs to a definite result before the next operation is performed.
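
To illustrate the idea of "one lifecycle operation at a time", here is a minimal TypeScript sketch. `OperationQueue` and the usage in the comment are illustrative names under this assumption, not actual SDK APIs.

```ts
// Minimal sketch of serialising room lifecycle operations.
class OperationQueue {
  private tail: Promise<unknown> = Promise.resolve();

  // Each operation starts only after the previous one has settled, so
  // ATTACH, DETACH, RELEASE (and an internal RUNDOWN) never interleave.
  run<T>(operation: () => Promise<T>): Promise<T> {
    const result = this.tail.then(operation, operation);
    this.tail = result.catch(() => {}); // keep the chain alive after failures
    return result;
  }
}

// Hypothetical usage:
//   queue.run(() => room.attach());
//   queue.run(() => room.detach());
```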

sacOO7 (Collaborator) commented Dec 5, 2024

@maratal you might like to go through ably/ably-chat-js#405 (comment) to understand what the RoomStatus value represents for the underlying channels.

sacOO7 (Collaborator) commented Dec 5, 2024

We do not want a RoomStatus that can't represent a deterministic state for the underlying channels.

maratal (Collaborator) commented Dec 5, 2024

I would understand this if all channels had an equal chance of failing. But the idea of splitting came from the fact that they don't (some channels might require a different server). Merging those statuses makes the splitting questionable; why not just have one room channel connected to a single server then?

sacOO7 (Collaborator) commented Dec 5, 2024

What do you mean, some channels may require a different server? A channel is a logical abstraction built on top of a single connection; in the end, all channel messages are sent over a single connection.
You might like to read the first paragraph of https://ably.com/blog/building-a-realtime-chat-app-with-react-laravel-and-websockets#laravel-broadcasting

sacOO7 (Collaborator) commented Dec 5, 2024

When we publish messages over 2 channels:

channel1 -> text: hello, john here
channel2 -> text: hello, nice to meet you!

those messages will be encoded as

{
  "channelName": "channel1",
  "message": { "text": "hello, john here" }
}
{
  "channelName": "channel2",
  "message": { "text": "hello, nice to meet you!" }
}

Both of these will be sent over a single connection and decoded accordingly on the receiver side. It may feel like receiving on a different channel, but the real magic happens during encoding/decoding.
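
As a concrete illustration, a minimal ably-js sketch of two channels multiplexed over one connection (assuming the promise-based v2 API and an ESM module with top-level await; the API key placeholder is hypothetical):

```ts
import * as Ably from 'ably';

// One Realtime client means one underlying connection; both publishes below
// travel over that same connection and are routed by channel name.
const realtime = new Ably.Realtime({ key: 'YOUR_ABLY_API_KEY' });

const channel1 = realtime.channels.get('channel1');
const channel2 = realtime.channels.get('channel2');

await channel1.publish('message', { text: 'hello, john here' });
await channel2.publish('message', { text: 'hello, nice to meet you!' });
```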

maratal (Collaborator) commented Dec 5, 2024

> A channel is a logical abstraction built on top of a single connection

Sure thing, I forgot about it! Then ignore me 😆
