Update Readd Pages Background Job #2396
Jotting down some more impl thoughts:
ikreymer added a commit that referenced this issue on Feb 16, 2025:
- Ensure upload pages are always added with a new UUID, to avoid any duplicates with existing uploads, even if the uploaded WACZ is actually a crawl from a different Browsertrix instance, etc.
- Clean up upload names with slugify, which also replaces spaces; this fixes uploading WACZ files whose filenames contain spaces.
- Part of fix for #2396
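The slugify cleanup described in this commit can be sketched with the standard library alone. `slugify_filename` is a hypothetical helper written for illustration, not the function the commit actually calls, and its exact rules (lowercasing, collapsing non-alphanumeric runs to a hyphen, keeping the extension) are assumptions.

```python
import re
import unicodedata


def slugify_filename(filename: str) -> str:
    """Hypothetical helper: turn an uploaded WACZ filename into a safe slug.

    Keeps the extension, lowercases the stem, and replaces every run of
    whitespace or other non-alphanumeric characters with a single hyphen.
    """
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        # No extension present: treat the whole name as the stem.
        stem, ext = filename, ""
    # Decompose accented characters and drop anything that is not ASCII.
    stem = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    # Collapse spaces, punctuation, etc. into single hyphens.
    stem = re.sub(r"[^a-z0-9]+", "-", stem.lower()).strip("-")
    # Fall back to a generic name if nothing usable is left.
    stem = stem or "upload"
    return f"{stem}.{ext.lower()}" if ext else stem
```

For example, `slugify_filename("My Crawl 2025.wacz")` returns `"my-crawl-2025.wacz"`, so the stored name no longer contains spaces.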
Related PR #2400 moves re-adding pages out of the background job and into the migration itself, so that later migrations cannot run before the pages have been re-added.
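The race condition PR #2400 avoids can be illustrated with a small asyncio sketch, assuming migrations are awaited strictly one after another. All names here (`readd_pages_for_uploads`, `migration_0037_background`, `migration_0037_inline`, `migration_0038`, `run_migrations`) are hypothetical stand-ins, not the backend's real functions, and the page list is an in-memory placeholder for the database.

```python
import asyncio
from typing import Awaitable, Callable, List, Set

# Hypothetical in-memory stand-in for the pages collection.
pages: List[str] = []

# Strong references to background tasks so they are not garbage-collected.
_background_tasks: Set[asyncio.Task] = set()


async def readd_pages_for_uploads() -> None:
    """Hypothetical re-add job: slow I/O first, then the pages exist."""
    await asyncio.sleep(0.01)  # stands in for reading WACZ files
    pages.extend(["page-1", "page-2"])


async def migration_0037_background() -> None:
    # Old approach: schedule a background job and return immediately.
    task = asyncio.create_task(readd_pages_for_uploads())
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)


async def migration_0037_inline() -> None:
    # Approach from PR #2400: do the work inside the migration and wait for it.
    await readd_pages_for_uploads()


async def migration_0038() -> int:
    # A later migration that assumes pages were already re-added;
    # it just reports how many pages it can see.
    return len(pages)


async def run_migrations(first: Callable[[], Awaitable[None]]) -> int:
    """Run 0037 and then 0038 in order, as a migration runner would."""
    pages.clear()
    await first()
    return await migration_0038()
```

With the background-job variant, `asyncio.run(run_migrations(migration_0037_background))` returns 0 because migration 0038 starts while the re-add is still pending; with the inline variant it returns 2.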
ikreymer added a commit that referenced this issue on Feb 17, 2025:
- Ensure upload pages are always added with a new UUID, to avoid any duplicates with existing uploads, even if the uploaded WACZ is actually a crawl from a different Browsertrix instance, etc.
- Clean up upload names with slugify, which also replaces spaces; this fixes uploading WACZ files whose filenames contain spaces.
- Part of fix for #2396

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
ikreymer added a commit that referenced this issue on Feb 18, 2025:
Related to #2396

Changes to migration 0037:
- Re-adds pages in the migration rather than in a background job, to avoid a race condition with later migrations
- Re-adds pages for all uploads in all orgs

Fix for re-add pages for org:
- Ensure org filter is applied!
- Fix wrong type
- Remove distinct, use an iterator to iterate over crawls faster

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
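The org-filter and iterator fixes can be sketched in memory. Against MongoDB with Motor, the equivalent shape would be an `async for` over a `find()` cursor filtered on the org, rather than collecting every id up front with `distinct()`. The `Crawl` record, its `oid` and `type` fields, and the helper functions below are hypothetical and only approximate the real backend.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class Crawl:
    """Hypothetical minimal crawl/upload record."""
    id: str
    oid: str   # id of the org that owns this crawl or upload
    type: str  # "crawl" or "upload"


def iter_crawl_ids(
    crawls: Iterable[Crawl],
    oid: Optional[str] = None,
    crawl_type: Optional[str] = None,
) -> Iterator[str]:
    """Lazily yield ids of matching crawls, one at a time.

    Checking ``oid`` here corresponds to "ensure org filter is applied";
    yielding ids one by one mirrors iterating a database cursor instead
    of first collecting every id with a distinct() query.
    """
    for crawl in crawls:
        if oid is not None and crawl.oid != oid:
            continue
        if crawl_type is not None and crawl.type != crawl_type:
            continue
        yield crawl.id


def readd_pages_for_org(
    crawls: Iterable[Crawl], oid: str, readd: Callable[[str], None]
) -> int:
    """Re-add pages for every crawl in one org; returns how many were processed."""
    count = 0
    for crawl_id in iter_crawl_ids(crawls, oid=oid):
        readd(crawl_id)
        count += 1
    return count


def readd_pages_for_all_uploads(
    crawls: Iterable[Crawl], readd: Callable[[str], None]
) -> int:
    """Migration-style pass: re-add pages for uploads across all orgs."""
    count = 0
    for crawl_id in iter_crawl_ids(crawls, crawl_type="upload"):
        readd(crawl_id)
        count += 1
    return count
```

Without the `oid` check, `readd_pages_for_org` would touch crawls from every org; the all-uploads pass deliberately omits it to match the migration's "all uploads in all orgs" scope.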
The re-add pages background job should be updated in the following ways, to ensure all pages are in the right format for use with #2347.
Based on issues encountered when running the migration with background jobs, here's a list of things that should be improved:
- Handle duplicate pages somehow: if the same WACZ is uploaded multiple times, generate a unique ID for each page and store the original page ID in a pageId field?
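The proposed handling of duplicates can be sketched as follows, assuming page records are plain dicts parsed from the WACZ's page list, each with its own `id`. The `_id` primary key follows the MongoDB convention; `pages_for_upload` and the `crawl_id` field are hypothetical names, and only `pageId` comes from the proposal above.

```python
import uuid
from typing import Any, Dict, Iterable, List


def pages_for_upload(
    wacz_pages: Iterable[Dict[str, Any]], crawl_id: str
) -> List[Dict[str, Any]]:
    """Hypothetical: build page documents for one uploaded WACZ.

    Every page gets a freshly generated UUID as its primary key, so
    uploading the same WACZ twice never collides with existing pages.
    The id recorded in the WACZ is kept in a separate ``pageId`` field.
    """
    docs = []
    for page in wacz_pages:
        doc = dict(page)  # shallow copy so the parsed input is not mutated
        doc["pageId"] = doc.pop("id", None)  # original id from the WACZ, if any
        doc["_id"] = str(uuid.uuid4())       # new unique id for this upload
        doc["crawl_id"] = crawl_id
        docs.append(doc)
    return docs
```

Calling it twice on the same parsed pages (for two uploads of one WACZ) yields documents that share a `pageId` but have different `_id` values.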