Received unknown submessage kind HEARTBEAT_FRAG #329

Open
YeahhhhzZ opened this issue Mar 17, 2024 · 5 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@YeahhhhzZ

YeahhhhzZ commented Mar 17, 2024

HEARTBEAT_FRAG is a standard submessage ID, but RustDDS treats it as unknown and logs a message about it:
https://github.com/jhelovuo/RustDDS/blob/master/src/rtps/submessage.rs#L281

However, RustDDS itself also defines it:
https://github.com/jhelovuo/RustDDS/blob/master/src/messages/submessages/submessage_kind.rs#L27

So what is the reason for ignoring HEARTBEAT_FRAG when parsing the submessage header?

@jhelovuo
Owner

A HEARTBEAT_FRAG is sent from a Writer to Readers to inform that some (but likely not all) fragments of a sample are available.

The error about unknown submessage type originates from Submessage::read_from_buffer(). I just fixed this by adding a handler branch in the latest master commit.

However, this does not solve the real issue that handling HEARTBEAT_FRAG is not implemented in Reader::handle_heartbeatfrag_msg. That will still log a message about unimplemented functionality.
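The distinction between "unknown submessage kind" and "recognized but not yet handled" can be illustrated with a minimal sketch. This is not the RustDDS source: the `SubmessageKind`, `Dispatch`, and `dispatch` names are hypothetical, and only a subset of the submessage IDs from the RTPS specification is listed.

```rust
/// A subset of RTPS submessage kinds, keyed by their one-byte ID.
/// (Hypothetical type for illustration; not the RustDDS definition.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubmessageKind {
    AckNack,
    Heartbeat,
    Gap,
    NackFrag,
    HeartbeatFrag,
    Data,
    DataFrag,
    Unknown(u8),
}

impl SubmessageKind {
    /// Map the submessage ID byte from the submessage header to a kind.
    pub fn from_id(id: u8) -> Self {
        match id {
            0x06 => SubmessageKind::AckNack,
            0x07 => SubmessageKind::Heartbeat,
            0x08 => SubmessageKind::Gap,
            0x12 => SubmessageKind::NackFrag,
            0x13 => SubmessageKind::HeartbeatFrag,
            0x15 => SubmessageKind::Data,
            0x16 => SubmessageKind::DataFrag,
            other => SubmessageKind::Unknown(other),
        }
    }
}

/// Outcome of dispatching one submessage (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum Dispatch {
    Handled,
    RecognizedButUnimplemented,
    Unknown,
}

/// Dispatch on the submessage ID. HEARTBEAT_FRAG gets its own branch, so it is
/// no longer reported as unknown, even though nothing acts on it yet.
pub fn dispatch(id: u8) -> Dispatch {
    match SubmessageKind::from_id(id) {
        SubmessageKind::Unknown(_) => Dispatch::Unknown,
        SubmessageKind::HeartbeatFrag => Dispatch::RecognizedButUnimplemented,
        _ => Dispatch::Handled,
    }
}
```

In this sketch, `dispatch(0x13)` yields `RecognizedButUnimplemented` rather than `Unknown`, which mirrors the two separate log messages discussed in this comment.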

RustDDS can still receive fragmented (i.e. large) samples, but it requires that the publisher is ready to send all the fragments of the sample, in which case the Writer should use normal HEARTBEAT to announce the availability of the data, instead of HEARTBEAT_FRAG.

The submessage HEARTBEAT_FRAG is mostly an optimization. It would be strictly necessary only if the sample to be transmitted is so large that the publisher is unable to cache even a single sample, or if for some other reason the transmission of the sample fragments must be started before the entire sample is available for transmission. Note that the DDS API does not have any functionality for passing partial samples over the API.

Are you experiencing actual interoperability problems because of this, or just seeing errors in the log?

jhelovuo added the enhancement and help wanted labels on Mar 18, 2024
@YeahhhhzZ
Author

Thanks for the explanation!

Our existing process (named A) is implemented in C++.
We have now added a new process (named B) written in Rust, so the DDS communication is handled by RustDDS and CycloneDDS.

When process A and process B communicate, the B-side log reports the following error:

E20240319 15:53:02.883694178 5990 external/cargo_crate__rustdds-0.9.0/src/rtps/submessage.rs:281] Received unknown submessage kind HEARTBEAT_FRAG
E20240319 15:53:02.883750628 5990 external/cargo_crate__rustdds-0.9.0/src/rtps/submessage.rs:281] Received unknown submessage kind HEARTBEAT_FRAG
E20240319 15:53:02.883765605 5990 external/cargo_crate__rustdds-0.9.0/src/rtps/fragment_assembler.rs:108] Received DATAFRAG too small. fragment_starting_num=21 out of fragment_count=28, frags_in_submessage=8, frag_size=1344 but payload length =9976

However, judging from the data received at both ends, there is no obvious abnormality.

But when I restart process A, the B side starts to report the following errors:

E20240319 15:54:32.917085873 5990 external/cargo_crate__rustdds-0.9.0/src/rtps/submessage.rs:281] Received unknown submessage kind HEARTBEAT_FRAG
E20240319 15:54:32.917174029 5990 external/cargo_crate__rustdds-0.9.0/src/rtps/submessage.rs:281] Received unknown submessage kind HEARTBEAT_FRAG
E20240319 15:54:32.917193455 5990 external/cargo_crate__rustdds-0.9.0/src/rtps/fragment_assembler.rs:108] Received DATAFRAG too small. fragment_starting_num=21 out of fragment_count=28, frags_in_submessage=8, frag_size=1344 but payload length =9976
E20240319 15:54:33.068700005 5990 external/cargo_crate__rustdds-0.9.0/src/rtps/writer.rs:1129] handle ack_nack writer GUID {0112a42677c1b93e74ba72c2 EntityId::P2P_BUILTIN_PARTICIPANT_MESSAGE_WRITER} seq.number SequenceNumber(14) missing from instant map
E20240319 15:54:34.069248585 5990 external/cargo_crate__rustdds-0.9.0/src/rtps/writer.rs:1129] handle ack_nack writer GUID {0112a42677c1b93e74ba72c2 EntityId::P2P_BUILTIN_PARTICIPANT_MESSAGE_WRITER} seq.number SequenceNumber(15) missing from instant map
E20240319 15:54:35.069652256 5990 external/cargo_crate__rustdds-0.9.0/src/rtps/writer.rs:1129] handle ack_nack writer GUID {0112a42677c1b93e74ba72c2 EntityId::P2P_BUILTIN_PARTICIPANT_MESSAGE_WRITER} seq.number SequenceNumber(16) missing from instant map

At the same time, process A cannot receive data from process B.

@jhelovuo
Owner

Based on the error messages, there is some debugging you could do:

0.9.0/src/rtps/fragment_assembler.rs:108] Received DATAFRAG too small. fragment_starting_num=21 out of fragment_count=28, frags_in_submessage=8, frag_size=1344 but payload length =9976

Here RustDDS is complaining about a malformed DATAFRAG it receives. The DATAFRAG header says it contains 8 fragments x 1344 bytes/fragment, which is expected to be 10752 bytes, but the serialized payload was found to be only 9976 bytes.

Now, there could be roughly three causes:

  • CycloneDDS is sending a malformed message.
  • Your network is corrupting the message.
  • RustDDS is misinterpreting a correctly formed message.

The offending message needs to be captured, e.g. with Wireshark, to see what is going on.
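The size arithmetic behind the "DATAFRAG too small" message can be sketched as follows. The function names are hypothetical and this is not the RustDDS check itself. The second function tests one possibility worth ruling out in the capture: RTPS allows the final fragment of a sample to be shorter than frag_size, and here the submessage covers fragments 21 to 28 out of 28. Whether that case actually applies is an assumption that only the captured packet can confirm.

```rust
/// Payload length implied by the header if every fragment is full-sized.
pub fn full_payload_len(frags_in_submessage: u32, frag_size: u32) -> u32 {
    frags_in_submessage * frag_size
}

/// Returns true if the submessage ends with the last fragment of the sample
/// and the payload length is consistent with that last fragment being short.
/// Fragment numbers are 1-based. (Hypothetical helper for illustration.)
pub fn consistent_with_short_last_fragment(
    fragment_starting_num: u32,
    frags_in_submessage: u32,
    fragment_count: u32,
    frag_size: u32,
    payload_len: u32,
) -> bool {
    if frags_in_submessage == 0 {
        return false;
    }
    let last_in_submessage = fragment_starting_num + frags_in_submessage - 1;
    if last_in_submessage != fragment_count {
        // The submessage does not reach the end of the sample,
        // so every fragment in it must be full-sized.
        return false;
    }
    // All but the last fragment are full; the last holds 1..=frag_size bytes.
    let min_len = (frags_in_submessage - 1) * frag_size + 1;
    let max_len = frags_in_submessage * frag_size;
    (min_len..=max_len).contains(&payload_len)
}
```

With the values from the log line, `full_payload_len(8, 1344)` is 10752, which is 776 bytes more than the 9976 bytes received, and `consistent_with_short_last_fragment(21, 8, 28, 1344, 9976)` is true because 9976 lies between 9409 and 10752.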

E20240319 15:54:34.069248585 5990 external/cargo_crate__rustdds-0.9.0/src/rtps/writer.rs:1129] handle ack_nack writer GUID {0112a42677c1b93e74ba72c2 EntityId::P2P_BUILTIN_PARTICIPANT_MESSAGE_WRITER} seq.number SequenceNumber(15) missing from instant map
E20240319 15:54:35.069652256 5990 external/cargo_crate__rustdds-0.9.0/src/rtps/writer.rs:1129] handle ack_nack writer GUID {0112a42677c1b93e74ba72c2 EntityId::P2P_BUILTIN_PARTICIPANT_MESSAGE_WRITER} seq.number SequenceNumber(16) missing from instant map

At the same time, process A cannot receive data from process B.

Here Participant A is requesting (via ACKNACK submessage) some past Writer Liveliness samples from RustDDS, but the requested messages no longer exist. Possibly they have been garbage collected, as the Topic should have QoS History with depth=1. Here RustDDS should reply with a GAP submessage to indicate that the requested samples no longer exist.
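As a rough sketch of the behaviour described here, a writer answering an ACKNACK could split the requested sequence numbers into those it can still resend and those it should cover with a GAP. The `AckNackPlan` type, the `plan_reply` function, and the use of a `BTreeMap` as the history cache are all assumptions made for illustration, not the RustDDS implementation.

```rust
use std::collections::BTreeMap;

/// What the writer intends to do for each requested sequence number.
/// (Hypothetical type for illustration.)
#[derive(Debug, PartialEq, Eq)]
pub struct AckNackPlan {
    /// Sequence numbers still in the history cache: resend as DATA.
    pub resend: Vec<i64>,
    /// Sequence numbers no longer available: announce with GAP.
    pub gap: Vec<i64>,
}

/// Split the sequence numbers requested by an ACKNACK according to
/// whether the sample is still present in the writer's history cache.
pub fn plan_reply(history: &BTreeMap<i64, Vec<u8>>, requested: &[i64]) -> AckNackPlan {
    let mut plan = AckNackPlan { resend: Vec::new(), gap: Vec::new() };
    for &sn in requested {
        if history.contains_key(&sn) {
            plan.resend.push(sn);
        } else {
            plan.gap.push(sn);
        }
    }
    plan
}
```

For example, with a history depth of 1 that retains only sequence number 17, a request for 14 through 17 would produce `resend = [17]` and `gap = [14, 15, 16]`.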

Please see RTPS Spec v2.5 Section "8.4.13 Writer Liveliness Protocol" for the meaning of EntityId::P2P_BUILTIN_PARTICIPANT_MESSAGE_WRITER.

Since the messages appear only once per SequenceNumber and there is only a fixed number of them, it should not be a great cause for alarm.

As for the B -> A communication not working, if you can construct a minimal code example that reproduces the error, we can take a more detailed look.

@YeahhhhzZ
Author

Thanks again for your patience in explaining! I will check it in detail and then reply :)

@jhelovuo
Owner

jhelovuo commented Apr 8, 2024

RustDDS 0.9.2 has been released. It includes fixes that may affect the symptoms you reported above.

Do you still experience the errors using the latest release?
