Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve graceful shutdown sequence #256

Open
nikitapecasa opened this issue Sep 23, 2021 · 0 comments
Open

Improve graceful shutdown sequence #256

nikitapecasa opened this issue Sep 23, 2021 · 0 comments

Comments

@nikitapecasa
Copy link
Contributor

nikitapecasa commented Sep 23, 2021

Currently it persists the state without updating committable offsets, so after shutdown we have updated state in persistent storage but no committed offsets, which is not a critical issue by itself as any sane application should support deduplication of input messages, meaning that when we start an app again it would load latest state and re-process some messages by simply ignoring them.

Improved graceful shutdown sequence should look like

  • finish processing of current messages we got from consumer.poll
  • persist the state
  • collect committable offsets after state is persisted (currently TimerFlowOf's release does nothing related to offsets on release)

with following conditions

  • if messages' processing fails, no persistence and no commit is required
  • if state's persisting fails, committable offsets cannot be updated and no commit is required

A minimum set of tests needs to be added

  • if state is persisted, correct offsets are committed, so that on next app run there's no re-processing of messages
  • if state's persistence failed, then commit should not happen

Also it might affect a special case of graceful shutdown for a single PartitionFlow instead of the whole app

  • on consumer group's rebalance some partitions might got removed from current consumer instance
  • as a result corresponding PartitionFlow would be removed from TopicFlow's cache, triggering the release of PartitionFlow, which would flush/persist the state if corresponding config is enabled (flushOnRevoke)
  • current implementation does not try to commit any offsets afterwards, it simply removes/forgets pending commits

Chances are quite high that it won't be an easy task as

  • we don't have access to pending offsets to be committed in TimerFlowOf
  • shutdown sequence is a result of a composition of different resources, so multiple files have to be adjusted, mb even some considerable part of the processing flow and the way we assemble the structure of Topic/Partition/Key/Timer flows
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant