
CHA-RL1h5 doesn’t respect operation atomicity #253

Open · lawrence-forooghian opened this issue Dec 3, 2024 · 13 comments
Labels: chat (Related to the Chat SDK.)

lawrence-forooghian (Collaborator) commented Dec 3, 2024

*** @(CHA-RL1h4)@ @[Testable]@ If the underlying Realtime Channel entered the @FAILED@ state, then the room status will transition to @FAILED@, using the error from the realtime channels attach() call as the @cause@ field of the @ErrorInfo@. The status code shall be @500@ and the error code whatever attachment error code corresponds to the feature that has failed, per "the error code list":#error-codes. The same error shall be thrown as the result of the @ATTACH@ operation.
*** @(CHA-RL1h5)@ @[Testable]@ When the room enters the @FAILED@ status as a result of @CHA-RL1h4@, asynchronously with respect to @CHA-RL1h4@, then the room will detach all channels that are not in the @FAILED@ state.
*** @(CHA-RL1h6)@ @[Testable]@ If, when performing @CHA-RL1h5@, a channel fails to detach on command, then the detach operation will be repeated until such a time that all channels have detached successfully.

The ATTACH operation ends in CHA-RL1h4, implying that the CHA-RL1h5 asynchronous detach happens outside of any room lifecycle operation. This means that another room lifecycle operation could run at the same time as this detach operation, which doesn't seem intentional.

I looked at the JS implementation and it seems to keep the lifecycle manager’s mutex locked during this "rundown" (as it calls the CHA-RL1h5 detach operation). But the spec does not imply this. I think that to translate this behaviour to the spec, which implements mutual exclusion through lifecycle operations, we should do something analogous to the RETRY operation; that is, define a new internal-only room lifecycle operation (I’ll call it RUNDOWN for want of a better term), which is scheduled by CHA-RL1h5 and which (see the sketch after this list):

  • performs the detach-all-non-FAILED-contributors behaviour of CHA-RL1h5
  • implements the retry behaviour of CHA-RL1h6
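
For illustration, here is a rough TypeScript sketch of what the body of such a RUNDOWN operation could look like. The names (`Contributor`, `ChannelState`, `rundown`, the retry interval) are hypothetical and not the actual chat-js or chat-swift APIs; the point is only that the detach-and-retry work runs as a single internal operation.

```ts
// Hypothetical sketch only: Contributor, ChannelState and rundown are
// illustrative names, not the actual SDK types.

type ChannelState = 'attached' | 'detached' | 'failed' | 'suspended';

interface Contributor {
  readonly state: ChannelState;
  detach(): Promise<void>; // rejects if the detach attempt fails
}

// Body of the proposed internal RUNDOWN operation. It would be scheduled
// by CHA-RL1h5 through the same mechanism that serialises ATTACH / DETACH /
// RELEASE, so no other lifecycle operation can interleave with it.
async function rundown(contributors: Contributor[]): Promise<void> {
  for (const contributor of contributors) {
    // CHA-RL1h5: only channels that are not already FAILED are detached.
    if (contributor.state === 'failed') {
      continue;
    }
    // CHA-RL1h6: repeat the detach until it succeeds.
    for (;;) {
      try {
        await contributor.detach();
        break;
      } catch {
        // Back off briefly before retrying; the interval is arbitrary here.
        await new Promise((resolve) => setTimeout(resolve, 250));
      }
    }
  }
}
```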

sacOO7 (Collaborator) commented Dec 4, 2024

@lawrence-forooghian there is ambiguity around how channel wind-down works when the room gets into the FAILED state.
There's an active issue on chat-js: ably/ably-chat-js#405

sacOO7 (Collaborator) commented Dec 4, 2024

I went through the chat-js behaviour and noted down a few discrepancies: ably/ably-chat-js#405 (comment)

sacOO7 (Collaborator) commented Dec 4, 2024

The conclusion was to support a full runDownChannels (atomic internal op, CHA-RL7a1) for all cases where the room goes into the FAILED state.

lawrence-forooghian added a commit to ably/ably-chat-swift that referenced this issue Dec 4, 2024
This implementation reflects my suggested spec changes [1] which aim to
preserve lifecycle operation atomicity by introducing a new RUNDOWN
operation:

> The `ATTACH` operation ends in CHA-RL1h4, implying that the CHA-RL1h5
> asynchronous detach happens _outside of any room lifecycle operation_.
> This means that another room lifecycle operation could run at the same
> time as this detach operation, which doesn't seem intentional.
>
> I looked at the JS implementation [2] and it seems that it keeps the
> lifecycle manager’s mutex locked during this "rundown" (as it calls the
> CHA-RL1h5 detach operation). But this is not implied in the spec. I
> think that to translate this behaviour to the spec, which implements
> mutual exclusion through lifecycle operations, we should do something
> analogous to the `RETRY` operation; that is, define a new internal-only
> room lifecycle operation (I’ll call it `RUNDOWN` for want of a better
> term), which is scheduled by CHA-RL1h5 and which:
>
> - performs the detach-all-non-`FAILED`-contributors behaviour of CHA-RL1h5
> - implements the retry behaviour of CHA-RL1h6

Resolves #119.

[1] ably/specification#253
[2] https://github.com/ably/ably-chat-js/blob/e8380583424a83f7151405cc0716e01302295eb6/src/core/room-lifecycle-manager.ts#L506-L509

lawrence-forooghian (Collaborator, Author) commented

In ably/ably-chat-swift#174 I’ve implemented this spec point in Swift in line with my suggested changes above.

maratal (Collaborator) commented Dec 5, 2024

> When the room enters the @FAILED@ status as a result of @CHA-RL1h4@, asynchronously with respect to @CHA-RL1h4@, then the room will detach all channels that are not in the @FAILED@ state.

Doesn't this reduce the "fault tolerance" of the room? For example, if the channel for reactions fails due to high load, the messages channel could still serve users, but in this configuration the room just fails as a whole. WDYT @lawrence-forooghian @AndyTWF @sacOO7?

sacOO7 (Collaborator) commented Dec 5, 2024

No, that's not the point. The room is treated as one unit, hence either all parts are working or all are in the detached/failed state.

sacOO7 (Collaborator) commented Dec 5, 2024

That's the reason we have the concept of room atomicity, where we make sure every operation runs to a definite result before the next operation is performed.
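
To illustrate the idea of "one lifecycle operation at a time", here is a minimal TypeScript sketch. `OperationQueue` and the usage in the comment are illustrative names under this assumption, not actual SDK APIs.

```ts
// Minimal sketch of serialising room lifecycle operations.
class OperationQueue {
  private tail: Promise<unknown> = Promise.resolve();

  // Each operation starts only after the previous one has settled, so
  // ATTACH, DETACH, RELEASE (and an internal RUNDOWN) never interleave.
  run<T>(operation: () => Promise<T>): Promise<T> {
    const result = this.tail.then(operation, operation);
    this.tail = result.catch(() => {}); // keep the chain alive after failures
    return result;
  }
}

// Hypothetical usage:
//   queue.run(() => room.attach());
//   queue.run(() => room.detach());
```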

sacOO7 (Collaborator) commented Dec 5, 2024

@maratal you might like to go through ably/ably-chat-js#405 (comment) to understand what the RoomStatus value represents for the underlying channels.

sacOO7 (Collaborator) commented Dec 5, 2024

We do not want a RoomStatus that can't represent a deterministic state for the underlying channels.

maratal (Collaborator) commented Dec 5, 2024

I would understand this if all channels had an equal chance of failing. But the idea of splitting came from the fact that they don't (some channels might require a different server). Merging those statuses makes the splitting questionable; why not just have one room channel connected to a single server then?

sacOO7 (Collaborator) commented Dec 5, 2024

What do you mean, some channels may require a different server? A channel is a logical abstraction built on top of a single connection; in the end, all channel messages are sent over a single connection.
You might like to read the first paragraph of https://ably.com/blog/building-a-realtime-chat-app-with-react-laravel-and-websockets#laravel-broadcasting

sacOO7 (Collaborator) commented Dec 5, 2024

When we publish messages over 2 channels:

channel1 -> text: hello, john here
channel2 -> text: hello, nice to meet you!

those messages will be encoded as

{
  "channelName": "channel1",
  "message": { "text": "hello, john here" }
}
{
  "channelName": "channel2",
  "message": { "text": "hello, nice to meet you!" }
}

Both of these will be sent over a single connection and decoded accordingly on the receiver side. It may feel like receiving on a different channel, but the real magic happens during encoding/decoding.
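
As a concrete illustration, a minimal ably-js sketch of two channels multiplexed over one connection (assuming the promise-based v2 API and an ESM module with top-level await; the API key placeholder is hypothetical):

```ts
import * as Ably from 'ably';

// One Realtime client means one underlying connection; both publishes below
// travel over that same connection and are routed by channel name.
const realtime = new Ably.Realtime({ key: 'YOUR_ABLY_API_KEY' });

const channel1 = realtime.channels.get('channel1');
const channel2 = realtime.channels.get('channel2');

await channel1.publish('message', { text: 'hello, john here' });
await channel2.publish('message', { text: 'hello, nice to meet you!' });
```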

maratal (Collaborator) commented Dec 5, 2024

> A channel is a logical abstraction built on top of a single connection

Sure thing, I forgot about it! Then ignore me 😆
