Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trashless Wagtail #8310

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

Trashless Wagtail #8310

wants to merge 13 commits into from

Conversation

willbarton
Copy link
Member

@willbarton willbarton commented Apr 12, 2024

We disabled Wagtail's built-in deletion back in 2016, for good reason at the time. In its place we created a "Trash" site in Wagtail and our approach was to move content to that site rather than delete it.

This PR does four things:

  • Restores the ability to delete pages in Wagtail, in 6ba1b74.

    We had disabled deletion in before_delete_page and before_bulk_action Wagtail hooks, and this commit removes those.

  • Prevents deletion of locked pages in Wagtail, in 87f6ec6.

    Bizarrely, locked pages are still deletable, an issue that's been open in Wagtail since 2015. This commit adds in before_delete_page and before_bulk_action Wagtail hooks that check whether a page is locked before proceeding with the deletion.

  • Exports a JSON dump of page data (limited to the data directly belonging to a page, not related data) to a configured location before deletion and enables importing of that data via the admin, in 6f0fe69.

    On the off chance that a page is accidentally deleted, want to be able to recover that page with the ease that restoring the entire database from backup is not.

    This commit uses the before_delete_page and before_bulk_action Wagtail hooks to connect a Django pre_delete signal handler to the Page model that will export the specific page's serializable_data. This JSON is intended to be reasonably human-readable so that an editor could reconstruct the page from it if needed.

    This commit also adds an Import button to Wagtail via standard hooks that consists of a view, template, and import function that will create a new page from exported page data with the click of a couple of buttons.

    There are significant limitations to the data that gets exported and imported: our recently added Wagtail footnotes is an example. Footnotes are stored as references in a separate table to the page. Because they're not stored on the page itself, they will not be exported, but the references on the page will be. This will break on import without editing the JSON data.

    The JSON export and import is hypothesized to be useful. If we find we are not using it, and that deletion is happening without the need to restore content with any significant frequency, we should revisit and remove this functionality.

  • Modifies the EmailSignUp snippet's privacy disclosure page field to SET_NULL when the destination page is deleted, in 189314f. This means if we delete a disclosure page that an email signup is using, it will remove that page from the email signup. An editor will need to set a new one.

How to test this PR

There are a few things to test:

Deletion and JSON export

  1. Set ARCHIVE_FILESYSTEM in the environment, i.e. export ARCHIVE_FILESYSTEM=/tmp/json_archives. Run cf.gov.
  2. Find a page and delete it, observe that deletion occurs.
  3. Find a page with child pages and delete it, observe that deletion occurs for all child pages.
  4. Select multiple pages and delete them, observe that deletion occurs.
  5. Visit the directory set in ARCHIVE_FILESYSTEM and observe that there are JSON files for all pages deleted, in their site slug paths.

Import a deleted page

  1. Choose a JSON file deleted above, and use the Import action in the Wagtail page explorer "…" menu to import it.

Locked pages

  1. Lock a page and attempt to delete it. Observe that deletion cannot occur.

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)

@willbarton willbarton force-pushed the experiment/bakery-export branch 5 times, most recently from e392897 to 14ce4e9 Compare August 7, 2024 14:08
@willbarton willbarton changed the title Bakery export experiment Trashless Wagtail Aug 7, 2024
@willbarton willbarton force-pushed the experiment/bakery-export branch 2 times, most recently from 85b1b44 to 6b6248e Compare August 13, 2024 13:03
@willbarton willbarton force-pushed the experiment/bakery-export branch 5 times, most recently from edc5da8 to e1816db Compare September 10, 2024 13:00
@willbarton willbarton force-pushed the experiment/bakery-export branch 2 times, most recently from 892c70b to e5921f0 Compare September 10, 2024 14:40
@willbarton willbarton marked this pull request as ready for review September 11, 2024 12:16
.env_SAMPLE Outdated Show resolved Hide resolved
cfgov/archival/apps.py Show resolved Hide resolved
cfgov/archival/forms.py Outdated Show resolved Hide resolved
cfgov/cfgov/settings/base.py Outdated Show resolved Hide resolved
cfgov/archival/utils.py Outdated Show resolved Hide resolved
cfgov/archival/utils.py Outdated Show resolved Hide resolved
cfgov/archival/utils.py Outdated Show resolved Hide resolved
cfgov/archival/wagtail_hooks.py Outdated Show resolved Hide resolved
cfgov/archival/wagtail_hooks.py Outdated Show resolved Hide resolved
def add_archival_signal_reciever(request, page):
from wagtail.models import Page

pre_delete.connect(archive_page_data_receiver, sender=Page)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these signal handlers only hooked up and disconnected at the moment the deletions happen, as opposed to on startup (in app ready(), for example)? Is there a downside to having them always hooked up?

Copy link
Member Author

@willbarton willbarton Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern was to control for deletion outside of an action taken in Wagtail. The practicality is, deletion during test runs triggers the signal handlers — which, could be fine, or there are other ways around it.


# Compare the last migration
last_migration = get_last_migration(app_label)
if last_migration != page_last_migration:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another potential issue with using migrations as a compatibility check is that we may introduce migrations to a different app that nonetheless impact the page model being imported. For example, if we modify v1.CFGOVPage, that'll generate a migration in the v1 app but not in the ask_cfpb app, even though the ask_cfpb.AnswerLandingPage model will be impacted.

Using migrations seems both not conservative enough (because it misses changes in other apps) but also potentially too conservative (a migration might be introduced in another model in the app, or to some non-breaking thing like help text). Is it too risky to remove this check entirely and just attempt the import, and (a) handle errors more nicely (as suggested above) and/or (b) let the user work out if things haven't been restored properly?

Copy link
Member Author

@willbarton willbarton Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.

The problem this is an attempt to solve is that you can write anything to a StreamField as JSON. but that doesn't mean it matches the current schema as Wagtail understands it. So you could import data with fields that no longer exist, and then when dumped into the edit, that data is just gone.

Most of our content is in content fields, and if a descendent of CFGOVPage modifies its content field (I think they all do?), it has to redefine it, and will generate a migration for it. You're correct that if CFGOVPage.sidefoot changes, trying to import a page of a type that doesn't modify it won't have that change reflected.

I think there's a balance here between being feature-complete and usable enough to test the hypothesis that we'll need to restore pages this way. It's probably worth talking to @csebianlander and @sonnakim to discuss what the editor experience should be (we had talked about it, but I don't see it reflected in the partial GHE search results ATM).

My view was this was a good-enough-to-start-with way to do some validation that a page import would be okay.

We could also track migrations all the way down the inheritance chain, but I think at that point my preference would be to remove the migration check entirely and let users fend for themselves. Because

Is it too risky to remove this check entirely

I think the risk is very low either way. Ultimately this is to enable the hypothesis-testing: do we need the ability to restore deleted content?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants