Delete from Acoustic when deleting contact from DB #571

Closed
leplatrem opened this issue Mar 6, 2023 · 7 comments
Labels
enhancement New feature or request

Comments

@leplatrem
Contributor

leplatrem commented Mar 6, 2023

Currently, the GDPR requests to delete contacts are handled in at least two different steps. One for the DB (see #240), and another for Acoustic.

Deleting the recipients from Acoustic from the DELETE /ctms/email endpoint is not trivial using the current code/architecture, since Acoustic is only accessed from the sync job.

But since anyone with a token can issue a DELETE request (#239) outside the GDPR process, we could easily end up in a situation where contacts are deleted from CTMS but remain in Acoustic. Plus, there does not seem to be an easy way to compare the two databases (#565).

We need a solution that handles deletions consistently in both places.

solution 1: soft-delete #60

Instead of a full delete as done in #548, we implement #60.

A contact is removed from the DB after the following steps:

  1. Request DELETE /ctms/{email_id}
  2. A deleted flag is set on the contact, and the contact is marked as "pending sync"
  3. The Acoustic sync job deletes it from Acoustic, and removes the "pending sync" state
  4. An additional job (or the same one) removes all contacts from the DB that have the deleted flag and no pending sync
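
The four steps above can be sketched as a small in-memory model. This is only an illustration of the lifecycle, not CTMS code: the `Contact` fields, the `FakeAcoustic` stand-in, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    email_id: str
    deleted: bool = False       # hypothetical soft-delete flag (#60)
    pending_sync: bool = False  # stands in for a row in the sync queue


class FakeAcoustic:
    """In-memory stand-in for Acoustic; not the real client."""

    def __init__(self, email_ids):
        self.recipients = set(email_ids)

    def remove(self, email_id):
        self.recipients.discard(email_id)


def request_delete(db, email_id):
    # Steps 1-2: the endpoint only flags the contact and queues it for sync.
    contact = db[email_id]
    contact.deleted = True
    contact.pending_sync = True


def run_sync(db, acoustic):
    # Step 3: the sync job propagates deletions, then clears the pending state.
    for contact in db.values():
        if contact.deleted and contact.pending_sync:
            acoustic.remove(contact.email_id)
            contact.pending_sync = False


def purge_deleted(db):
    # Step 4: hard-delete only contacts that are flagged AND fully synced.
    synced = [key for key, c in db.items() if c.deleted and not c.pending_sync]
    for email_id in synced:
        del db[email_id]
```

The point of the ordering is that `purge_deleted` is a no-op for a contact until `run_sync` has removed it from Acoustic, so the DB row survives as long as the remote deletion is outstanding.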

solution 2: delete from Acoustic from delete endpoint

This solution would consist of moving some of the Acoustic code from the sync command to the Web app, so that a synchronous call to Acoustic could be made within the request/response cycle on the DELETE endpoint.

AFAIU the Web app does not have all env vars necessary to initialize the Acoustic client, like the sync job does.
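
A minimal sketch of what the synchronous variant could look like, assuming the Acoustic removal is injected as a callable. The environment variable names, the `AcousticUnavailable` exception, and the status codes are hypothetical and not taken from the CTMS code base.

```python
import os


class AcousticUnavailable(Exception):
    """Raised when the web app cannot configure or reach Acoustic."""


def acoustic_settings_from_env(environ=os.environ):
    # Hypothetical variable names; the real sync job's settings may differ.
    required = (
        "ACOUSTIC_CLIENT_ID",
        "ACOUSTIC_CLIENT_SECRET",
        "ACOUSTIC_REFRESH_TOKEN",
    )
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise AcousticUnavailable(f"missing settings: {', '.join(missing)}")
    return {name: environ[name] for name in required}


def delete_contact(email_id, db, remove_from_acoustic):
    """Handle DELETE /ctms/{email_id} synchronously and return a status code.

    `remove_from_acoustic` is any callable that removes the recipient remotely.
    """
    if email_id not in db:
        return 404
    # Call Acoustic first: if it fails, the contact stays in the DB and the
    # client can retry, instead of leaving an orphan in Acoustic.
    try:
        remove_from_acoustic(email_id)
    except AcousticUnavailable:
        return 503
    del db[email_id]
    return 200
```

The trade-off is that the endpoint's latency and availability become tied to Acoustic's, which is what the queue-based solutions avoid.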

solution 3: _____

@leplatrem leplatrem added the enhancement New feature or request label Mar 6, 2023
@bsieber-mozilla
Contributor

> An additional job (or the same) removes all contacts from the DB that have the deleted flag and no pending sync

Deletes differ a little between the MoCo and MoFo data retention policies. Some of the specifications can be seen in this document, which is currently being circulated for feedback along with the related processes: https://docs.google.com/document/d/1CHE3o7ZHprNBaN-2RvgApbI_8kpanJGdQ-ALZZZSRtE/edit#heading=h.u4hbegz4rvw3

@grahamalama
Contributor

To be clear, this process we're describing is for the purposes of GDPR, correct?

On the Acoustic side, I think we'll want to make sure we're using the "GDPR Erasure" endpoint, rather than any sort of "DELETE" endpoint. In fact, I can't seem to find any docs that hint at a "Delete contact" endpoint.

@leplatrem
Contributor Author

> To be clear, this process we're describing is for the purposes of GDPR, correct?

Not only. Now that we have a DELETE endpoint on CTMS (#548), and since we don't have fine-grained permissions (#239), anyone who has access to the API can delete contacts. Once contacts are deleted, we lose track of them in Acoustic.

> I can't seem to find any docs that hint at a "Delete contact" endpoint.

FWIW there is this beta API https://developer.goacoustic.com/acoustic-content/reference/post_authoring-v1-changes-delete

@grahamalama
Contributor

@leplatrem It looks like that API endpoint is for Acoustic's CMS offering Content (API home), not Acoustic Campaign.

I did just find this though -- the XML API has a RemoveContact endpoint, which sounds like what we want.

@grahamalama
Contributor

Currently, the pending_acoustic table relies on a FK relationship with the emails table. This gets us into trouble with deletes, since we can't mark a contact to be synced that doesn't exist.

Solution 3: Redesign sync queue

We could redesign the pending_acoustic queue so that it's decoupled from the emails table.

Solution 4: add some new to_delete queue

Add a separate to_delete table that has no FK to the emails table, so that only the deletion part of the sync drops the FK constraint and the rest of the queue stays as it is.

Then, in either case, a request to DELETE /ctms/{email_id} would:

  • Add the email_id to a queue to be deleted
  • Delete the contact from the CTMS database
  • Wait for the background sync to complete
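
A rough sketch of the to_delete idea using the standard-library `sqlite3` module instead of the real CTMS database. The column layout is an assumption: in particular, storing `primary_email` in the queue is a hypothetical choice made here so the background job still has something to identify the recipient with after the contact row is gone.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript(
        """
        CREATE TABLE emails (email_id TEXT PRIMARY KEY, primary_email TEXT);
        -- Existing queue: each row must reference a live contact.
        CREATE TABLE pending_acoustic (
            email_id TEXT NOT NULL REFERENCES emails (email_id)
        );
        -- Hypothetical deletion queue: no FK, so rows outlive the contact.
        CREATE TABLE to_delete (
            email_id TEXT PRIMARY KEY,
            primary_email TEXT NOT NULL
        );
        """
    )
    return conn


def delete_contact(conn, email_id):
    """Queue the Acoustic removal and delete the contact in one transaction."""
    with conn:
        row = conn.execute(
            "SELECT primary_email FROM emails WHERE email_id = ?", (email_id,)
        ).fetchone()
        if row is None:
            return False
        conn.execute(
            "INSERT OR IGNORE INTO to_delete (email_id, primary_email) VALUES (?, ?)",
            (email_id, row[0]),
        )
        conn.execute("DELETE FROM pending_acoustic WHERE email_id = ?", (email_id,))
        conn.execute("DELETE FROM emails WHERE email_id = ?", (email_id,))
    return True


def sync_deletions(conn, remove_from_acoustic):
    """Background job: drain the queue, keeping rows whose removal failed."""
    rows = conn.execute("SELECT email_id, primary_email FROM to_delete").fetchall()
    for email_id, primary_email in rows:
        try:
            remove_from_acoustic(primary_email)
        except Exception:
            continue  # row stays in the queue for the next run
        with conn:
            conn.execute("DELETE FROM to_delete WHERE email_id = ?", (email_id,))
```

Because the queue row and the contact deletion are committed together, a contact can never disappear from CTMS without a matching entry waiting to be processed against Acoustic.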

We could also use this opportunity to reassess our approach to the background sync, so that we improve our retry strategy and/or are louder when things fail and can intervene manually.
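
As one possible shape for "retry, then be loud", here is a small generic helper. It is a sketch under assumptions: the logger name, the default attempt count, and the backoff schedule are placeholders, not values from the existing sync job.

```python
import logging
import time

logger = logging.getLogger("ctms.sync")  # hypothetical logger name


def call_with_retry(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run `operation`, retrying with exponential backoff.

    Re-raises the last error after `attempts` failures so the caller (and any
    alerting hooked to ERROR logs) notices instead of the failure being dropped.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == attempts:
                logger.error("giving up after %d attempts: %s", attempts, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning(
                "attempt %d failed (%s); retrying in %.1fs", attempt, exc, delay
            )
            sleep(delay)
```

The `sleep` parameter is injectable only to make the helper easy to test without real delays.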

@bsieber-mozilla
Contributor

As for mofo_relevant=true contacts, they do require some data hygiene finesse.

We'll need some way to determine which contacts have donated within the last 4 years, since donation data from that period can't be fully deleted. (4 years is the current requirement, but there was a desire to make this configurable.)

I imagine getting the donation data is going to require a call to a MoFo service (possibly SFNP)?
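
To make the retention rule concrete, here is a hedged sketch of the date check alone. It assumes the last donation date has already been fetched from wherever it lives, and it treats a donation made exactly on the cutoff day as still inside the window; both the function name and that boundary choice are assumptions.

```python
from datetime import date

RETENTION_YEARS = 4  # current requirement per this thread; meant to be configurable


def can_fully_delete(last_donation, today, retention_years=RETENTION_YEARS):
    """Return True when no donation falls inside the retention window."""
    if last_donation is None:
        return True
    try:
        cutoff = today.replace(year=today.year - retention_years)
    except ValueError:
        # `today` is Feb 29 and the target year is not a leap year.
        cutoff = today.replace(year=today.year - retention_years, day=28)
    return last_donation < cutoff
```

For example, with `today = date(2023, 3, 6)`, a contact whose last donation was on `date(2020, 1, 1)` would not be eligible for full deletion, while one who last donated on `date(2019, 3, 5)` would be.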

I think it could be time to rethink the background "job" for something more standard, be it just an external service or a DAG.

Aside:
We could leave the PendingAcoustic process as-is, provide the queries as APIs, and have the background job externalized from the CTMS deployments/etc.

@leplatrem
Contributor Author

> As for mofo_relevant=true contacts, they do require some data hygiene finesse.
>
> We'll need some way to determine which contacts have donated within the last 4 years, since donation data from that period can't be fully deleted. (4 years is the current requirement, but there was a desire to make this configurable.)

Currently, there's absolutely no finesse :) We are trying to find a solution to make sure contacts deleted from CTMS are also deleted from Acoustic.
I understand that what you describe here is part of the GDPR process already, and currently out of CTMS's scope.

> I think it could be time to rethink the background "job" for something more standard, be it just an external service or a DAG.

Why not, but this is orthogonal to this issue, and would have to be detailed in an ADR or blueprint doc first IMO

> We could leave the PendingAcoustic process as-is, provide the queries as APIs, and have the background job externalized from the CTMS deployments/etc.

What do you mean by "provide the queries as APIs"?
What would be the pros/cons of having the background job externalized from the CTMS deployments?

> Solution 4: add some new to_delete queue

I like that idea! It's pretty simple to implement and gives us some leverage to retry, etc.
