Incorrect sequence of messages while reading dataChan after closing #1257

Closed
Gilthoniel opened this issue Jul 26, 2024 · 3 comments

Comments

@Gilthoniel
Contributor

Gilthoniel commented Jul 26, 2024

Expected behavior

Closing a producer should never result in a sequence of published messages that is inconsistent with the order of the Send / SendAsync calls.

Actual behavior

When a producer enters a reconnect loop and is asked to close at the same time, one or more pending batches can be dropped while later ones are still published.

Steps to reproduce

I can't provide a reliable way to reproduce this: it is rare and depends on an unlucky sequence of events.

Here are the logs that led me to this discovery:

{"ts": "2024-07-23T20:50:42.023Z", "msg": "Closing producer", "producerID": 36}
{"ts": "2024-07-23T20:50:42.023Z", "msg": "Connected producer", "producerID": 36, "epoch": 10}
{"ts": "2024-07-23T20:50:42.038Z", "msg": "Failing 1 messages on closing producer", "producerID": 36}

I think what happens is that the producer is in a reconnect loop, and during that time send requests and a close request accumulate. After the reconnect succeeds, the close is eventually processed, but it only closes the channel, so the remaining send requests can still be written to the connection in parallel with the producer closing (the client uses different channels for the two).

In the logs above, we can assume that one send request went through and was later dropped by the close, but that at least one message was then written to the connection and successfully published to the broker.

From what I can see, closing does not actually prevent further messages from going through, because closing the producer is done on a different channel:

	go func() {
		for {
			select {
			// ...
			case req := <-c.incomingRequestsCh:
				// ...
				c.internalSendRequest(req)
			}
		}
	}()

	for {
		select {
		// ...
		case cmd := <-c.incomingCmdCh:
			c.internalReceivedCommand(cmd.cmd, cmd.headersAndPayload)
		case data := <-c.writeRequestsCh:
			// ...
			c.internalWriteData(data)
		}
	}

System configuration

Pulsar version: v3.0.5
Pulsar Go client: v12.1

@Gilthoniel Gilthoniel changed the title from "Incorrect sequence of messages due to a race between reconnect and close in producer" to "Incorrect sequence of messages while reading dataChan after closing" on Jul 26, 2024
@Gilthoniel
Contributor Author

If I'm correct, this could be fixed by simply draining the channel after closing it, so that no further requests are processed.
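To illustrate the idea, here is a minimal sketch of draining a channel on close. It is not the client's code: `sendRequest` and `drainPending` are hypothetical names, and the channel only stands in for the producer's pending data channel. The helper receives without blocking until the channel is empty (or closed and empty) and returns what was still queued, so the caller can fail those requests instead of writing them.

```go
package main

import "fmt"

// sendRequest is a hypothetical stand-in for a pending send request.
type sendRequest struct {
	id int
}

// drainPending empties ch without blocking and returns the requests that
// were still queued, in order, so the caller can fail them.
func drainPending(ch chan *sendRequest) []*sendRequest {
	var dropped []*sendRequest
	for {
		select {
		case req, ok := <-ch:
			if !ok {
				// Channel is closed and fully drained.
				return dropped
			}
			dropped = append(dropped, req)
		default:
			// Channel is still open but currently empty.
			return dropped
		}
	}
}

func main() {
	pending := make(chan *sendRequest, 8)
	for i := 1; i <= 3; i++ {
		pending <- &sendRequest{id: i}
	}

	// On close: take everything that is still queued and fail it,
	// rather than letting it reach the connection.
	dropped := drainPending(pending)
	fmt.Printf("failing %d pending requests on close\n", len(dropped))
}
```

The `ok` check matters: receiving from a closed, empty channel never blocks, so without it the loop would spin forever appending nil requests.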

@gunli
Contributor

gunli commented Jul 29, 2024

@Gilthoniel Good catch. Could you please check whether #1249 fixes this?

@Gilthoniel
Contributor Author

@gunli Yes, I think that's enough to avoid this situation. Please note that using a context the way #1249 does is not really what contexts are intended for; this should be handled via a channel.
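As a rough sketch of what "handled via a channel" could look like (an assumption for illustration, not what #1249 or the client actually does): a dedicated done channel is closed once on shutdown, and the write loop checks it before picking up any queued payload. The `writer` type and its fields are hypothetical.

```go
package main

import "fmt"

// writer is a hypothetical, simplified model of a connection write loop.
type writer struct {
	writeCh chan string   // queued payloads waiting to be written
	doneCh  chan struct{} // closed exactly once to signal shutdown
	written []string      // payloads that were actually written
}

// run writes queued payloads until doneCh is closed.
func (w *writer) run() {
	for {
		// Check for shutdown first: select picks at random among ready
		// cases, so without this a queued payload could still win.
		select {
		case <-w.doneCh:
			return
		default:
		}

		select {
		case <-w.doneCh:
			return
		case p := <-w.writeCh:
			w.written = append(w.written, p)
		}
	}
}

func main() {
	w := &writer{
		writeCh: make(chan string, 4),
		doneCh:  make(chan struct{}),
	}
	w.writeCh <- "batch-1"
	w.writeCh <- "batch-2"

	close(w.doneCh) // shutdown requested before the loop runs
	w.run()

	fmt.Printf("written after close: %d\n", len(w.written))
}
```

A plain two-case `select` would not be enough here, because Go chooses uniformly among ready cases; the non-blocking check on `doneCh` at the top of the loop is what keeps queued payloads from being written once shutdown has been signalled.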
