Implement idempotency & message queue for critical operations #7656

Open
rikukissa opened this issue Sep 25, 2024 · 0 comments
Labels
🔬 Needs tech design (This ticket is waiting for technical design) · 💾 Persistence (Storage, databases & data formats related tasks) · Tech


Design principles

We need to handle errors in the system more gracefully, so that if any error occurs whilst processing an application, the data is never lost. The following approach aims to:

  • Ensure data integrity in all situations
  • Provide clear signalling of process steps end-to-end
  • Allow the client to retry sending the same payload until data integrity and persistence can be verified
  • Allow the client to keep hold of the data until verified

Note

A proof of concept of this has been implemented in this PR. The changes only apply when a completely new record is created, so the approach needs to be extended to other operations.

Requirements

The client should never purge a draft unless it can verify that the record was fully written. To do this, it needs to confirm that the record was received, either by

  1. the backend returning a unique ID for the record, which the client then polls to check the record's status, or
  2. the client choosing a unique ID itself (e.g. the draft ID for record creation), which it uses for polling in the same way.
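A minimal client-side sketch of option 2, assuming the draft ID doubles as the record's unique ID. The `Transport` and `DraftStore` interfaces, the `submitUntilVerified` function and the `NOT_FOUND` / `PROCESSING` / `OK` status values are all hypothetical illustrations, not existing OpenCRVS APIs. The key property is that the draft is purged only after the backend reports `OK`.

```typescript
// Hypothetical sketch: the client uses its draft ID as the record's unique ID
// (option 2), resends the same payload while the backend has no trace of it,
// and purges the local draft only after the backend reports "OK".

type RecordStatus = "NOT_FOUND" | "PROCESSING" | "OK";

// Hypothetical backend transport; not an existing OpenCRVS interface.
interface Transport {
  // May fail silently (e.g. network error); the failure shows up in the next poll.
  submit(draftId: string, payload: unknown): void;
  getStatus(draftId: string): RecordStatus;
}

// Hypothetical local draft storage.
interface DraftStore {
  purge(draftId: string): void;
}

function submitUntilVerified(
  draftId: string,
  payload: unknown,
  transport: Transport,
  drafts: DraftStore,
  maxPolls: number
): boolean {
  for (let poll = 0; poll < maxPolls; poll++) {
    const status = transport.getStatus(draftId);
    if (status === "OK") {
      // Metadata, attachments and search index are all written: safe to purge.
      drafts.purge(draftId);
      return true;
    }
    if (status === "NOT_FOUND") {
      // Resending is safe only because the backend treats draftId as an
      // idempotency key and never creates a second record for it.
      transport.submit(draftId, payload);
    }
    // "PROCESSING": keep the draft and poll again (a real client would wait
    // between polls, e.g. with backoff).
  }
  // Not verified within this session: keep the draft for a later retry.
  return false;
}
```

Returning `false` without purging is deliberate: an unverified draft stays on the device, so the same payload can be resent later.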

The status should never be OK, and the client should never remove the local record, until the metadata has been persisted in MongoDB, the attachments have been stored in MinIO and the record has been indexed in Elasticsearch. If any of these steps fails, the record stays in the queue indefinitely and alerts are sent to the system administrator.
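One way to enforce this rule is sketched below, assuming each queued record tracks which of the three write steps have already succeeded. The step names mirror the text. `processRecord`, `QueuedRecord` and the writer functions are hypothetical stand-ins rather than real MongoDB, MinIO or Elasticsearch clients.

```typescript
// Hypothetical sketch: each queued record remembers which write steps have
// succeeded, so a retry resumes where it failed and the status is only "OK"
// once all three steps are done. The writers are stand-ins, not real clients.

type Step = "mongodb" | "minio" | "elasticsearch";
const STEPS: Step[] = ["mongodb", "minio", "elasticsearch"];

interface QueuedRecord {
  id: string;
  completed: Set<Step>; // must be persisted durably in a real system
}

type StepWriter = (recordId: string) => void; // throws on failure

function processRecord(
  record: QueuedRecord,
  writers: Record<Step, StepWriter>,
  alert: (message: string) => void
): "OK" | "PENDING" {
  for (const step of STEPS) {
    if (record.completed.has(step)) continue; // done on an earlier attempt
    try {
      writers[step](record.id);
      record.completed.add(step);
    } catch (err) {
      // Keep the record queued and notify the system administrator.
      alert(`record ${record.id}: ${step} failed: ${String(err)}`);
      return "PENDING";
    }
  }
  return "OK";
}
```

Skipping completed steps reduces repeated work, but it does not replace idempotent writers. A crash between a successful write and `completed.add` still causes that step to run again.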

If one of these steps fails, the backend must let the client retry safely without creating duplicate entries. In other words, the backend operations need to be idempotent. The outcome in the database should be the same even if a record is first submitted unsuccessfully 10 times, then successfully once, and then 5 more times: exactly one record ends up in the database.
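A minimal sketch of an idempotent create, assuming the client sends a stable key such as the draft ID with every attempt. `RecordStore`, `createRecord` and `idempotencyKey` are hypothetical names, and an in-memory `Map` stands in for the database.

```typescript
// Hypothetical sketch of an idempotent create: the client-chosen key (e.g. the
// draft ID) identifies the record. A Map stands in for the database; a real
// store would enforce the key with a unique index or an upsert so that
// concurrent retries cannot both insert.

interface StoredRecord {
  id: string;
  idempotencyKey: string;
  details: unknown;
}

class RecordStore {
  private byKey = new Map<string, StoredRecord>();
  private nextId = 1;
  available = true; // set to false to simulate an outage

  createRecord(idempotencyKey: string, details: unknown): StoredRecord {
    if (!this.available) {
      throw new Error("database unavailable");
    }
    const existing = this.byKey.get(idempotencyKey);
    if (existing !== undefined) {
      // A retry of a request that already succeeded returns the same record.
      return existing;
    }
    const record: StoredRecord = {
      id: `record-${this.nextId++}`,
      idempotencyKey,
      details,
    };
    this.byKey.set(idempotencyKey, record);
    return record;
  }

  count(): number {
    return this.byKey.size;
  }
}
```

Running the scenario from the text against this store (10 failed attempts while `available` is `false`, one successful call, then 5 retries) leaves `count()` at 1, and every retry returns the same record `id`.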

At this phase, this applies to the endpoints that write mission-critical data with a lot of user input, which might no longer be available after a failure. Specifically, these mutations:

Important

Tech design: What other operations should we consider?

```graphql
requestRegistrationCorrection(id: ID!, details: CorrectionInput!): ID!
createBirthRegistrationCorrection(
  id: ID!
  details: BirthRegistrationInput!
): ID!
createDeathRegistrationCorrection(
  id: ID!
  details: DeathRegistrationInput!
): ID!
createMarriageRegistrationCorrection(
  id: ID!
  details: MarriageRegistrationInput!
): ID!
createBirthRegistration(details: BirthRegistrationInput!): CreatedIds!
createDeathRegistration(details: DeathRegistrationInput!): CreatedIds!
createMarriageRegistration(details: MarriageRegistrationInput!): CreatedIds!
```

Note

The proof of concept implements a message queue and retries for the create[Event]Registration operations.
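The proof of concept's queue is not shown here. The following is only an in-memory sketch of the general pattern: failed jobs are rescheduled with capped exponential backoff and are never dropped, and repeated failures raise alerts. `RetryQueue`, its options and the explicit `now` clock parameter are hypothetical, chosen to keep the sketch deterministic.

```typescript
// Hypothetical in-memory sketch of a retrying queue for create[Event]Registration
// jobs. Failed jobs are rescheduled with capped exponential backoff and never
// dropped; from `alertAfter` failures on, every failure raises an alert.
// A production queue would be durable (survive restarts).

interface Job<T> {
  payload: T;
  attempts: number;
  notBefore: number; // earliest time (ms) the job may run again
}

interface RetryOptions {
  baseDelayMs?: number;
  maxDelayMs?: number;
  alertAfter?: number;
}

class RetryQueue<T> {
  private jobs: Job<T>[] = [];
  private readonly handler: (payload: T) => void; // throws on failure
  private readonly alert: (payload: T, attempts: number) => void;
  private readonly baseDelayMs: number;
  private readonly maxDelayMs: number;
  private readonly alertAfter: number;

  constructor(
    handler: (payload: T) => void,
    alert: (payload: T, attempts: number) => void,
    options: RetryOptions = {}
  ) {
    this.handler = handler;
    this.alert = alert;
    this.baseDelayMs = options.baseDelayMs ?? 1000;
    this.maxDelayMs = options.maxDelayMs ?? 60000;
    this.alertAfter = options.alertAfter ?? 3;
  }

  enqueue(payload: T, now: number): void {
    this.jobs.push({ payload, attempts: 0, notBefore: now });
  }

  // Runs every due job once; returns how many succeeded (and left the queue).
  tick(now: number): number {
    let succeeded = 0;
    const remaining: Job<T>[] = [];
    for (const job of this.jobs) {
      if (job.notBefore > now) {
        remaining.push(job);
        continue;
      }
      try {
        this.handler(job.payload);
        succeeded++;
      } catch (err) {
        job.attempts++;
        const delay = Math.min(this.maxDelayMs, this.baseDelayMs * Math.pow(2, job.attempts - 1));
        job.notBefore = now + delay;
        if (job.attempts >= this.alertAfter) this.alert(job.payload, job.attempts);
        remaining.push(job); // never dropped
      }
    }
    this.jobs = remaining;
    return succeeded;
  }

  size(): number {
    return this.jobs.length;
  }
}
```

Because a job can be retried after a partial success, the handler must itself be idempotent. A retrying queue only delivers at-least-once.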

Tech tasks

  • Go through the mutations mentioned above and implement idempotency for each write operation.
  • Deploy the system and shut down Elasticsearch. Submit a declaration and let it fail a few times. Verify that the correct 5xx status code (or equivalent) is returned by both the search service and the gateway. Start Elasticsearch up again. Verify that exactly **one** record was written to MongoDB, Elasticsearch and the audit log, and that every attachment is connected to the right correction event.
  • Verify that the error was reported to Sentry.
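The status-code and single-record checks of the manual test above could also be automated. Below is a fault-injection sketch with in-memory stand-ins for MongoDB, Elasticsearch and the audit log. `FakeElasticsearch`, `searchServiceIndex` and `gatewaySubmit` are hypothetical. The assumption that the search service answers 503 and the gateway passes it through is not documented behaviour. Attachments and Sentry reporting are not covered.

```typescript
// Hypothetical fault-injection sketch of the manual test above, with in-memory
// stand-ins for MongoDB, Elasticsearch and the audit log. Returning 503 from the
// search service and passing it through the gateway is an assumption about
// what "correct 5xx status code" means here.

class FakeElasticsearch {
  up = true;
  readonly docs = new Map<string, unknown>();

  index(id: string, doc: unknown): void {
    if (!this.up) throw new Error("connection refused");
    this.docs.set(id, doc); // indexing by id: re-indexing overwrites, never duplicates
  }
}

// Search service: maps an Elasticsearch failure to 503 Service Unavailable.
function searchServiceIndex(es: FakeElasticsearch, id: string, doc: unknown): number {
  try {
    es.index(id, doc);
    return 200;
  } catch (err) {
    return 503;
  }
}

interface Backend {
  db: Map<string, unknown>; // stands in for MongoDB
  es: FakeElasticsearch;
  auditLog: string[];
}

// Gateway: idempotent insert keyed by draft ID, then indexing.
function gatewaySubmit(backend: Backend, draftId: string, details: unknown): number {
  if (!backend.db.has(draftId)) {
    backend.db.set(draftId, details);
    backend.auditLog.push(`created ${draftId}`); // logged only on first insert
  }
  // Forward the search service's status so a failure reaches the client as 5xx
  // and the client keeps its draft.
  return searchServiceIndex(backend.es, draftId, details);
}
```

With Elasticsearch down, each submission returns 503. After Elasticsearch is restored, submissions return 200, and MongoDB, Elasticsearch and the audit log each hold exactly one entry for the draft.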
Projects
Status: Backlog