For the collection C(S, t) for a source S at time t, collection-level numbering can collide (and effectively lock) if a smaller collection C(S, 1) is restored from backup over a larger collection C(S, 2). This is a by-product of #5225.
Yikes. Probably not a regression, but it is a nasty bug in the backup/restore procedure when restoring over updated data.
I think one mitigating factor in practice is that the inaccessible reply from C(S,2) will be deleted by the orphaned submission nightly job. But that would also imply that any new submissions or replies since the backup would be nuked, so you'd always just be restoring only to the backup point anyway.
We could make that explicit by nuking the in-place data first or asking folks to do so. At the very least this would require a docs update to make the backup behaviour explicit.
(One reason we may not have encountered this in the wild is that folks may only be restoring from backup after complete data loss, and to new instances, in which case it wouldn't be an issue.)
cfm changed the title from "per-source collection numbering can collide after restoration of a larger collection from backup" to "per-source collection numbering can collide after restoration of a larger collection from a smaller backup" on Mar 6, 2024.
I support formalizing the restore operation as a filesystem- as well as database-level equivalent of a strict DROP TABLE; CREATE TABLE; INSERT INTO, rather than the filesystem upsert it basically is now. Whether that strictness is immediate (enforced by the restore operation) or eventual (enforced by the nightly clean-up job) is, I agree, mostly an edge concern, as much as I'd like to reduce it to zero.
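To make the distinction concrete, here is a minimal Python sketch of the two filesystem behaviours being contrasted. It is not SecureDrop's restore code: the function names `restore_upsert` and `restore_strict` are hypothetical, and it assumes the backup is a plain tar archive of the data directory.

```python
import shutil
import tarfile
from pathlib import Path


def restore_upsert(archive: Path, target: Path) -> None:
    """Extract over existing data: files absent from the archive survive."""
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        tar.extractall(target)


def restore_strict(archive: Path, target: Path) -> None:
    """Remove existing data first, so the result matches the archive exactly."""
    if target.exists():
        shutil.rmtree(target)  # filesystem analogue of DROP TABLE
    target.mkdir(parents=True)  # analogue of CREATE TABLE
    with tarfile.open(archive) as tar:
        tar.extractall(target)  # analogue of INSERT INTO
```

With the upsert variant, a file written after the backup was taken is still on disk after the restore. With the strict variant, only the archive's contents remain.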
Running the orphaned submission job as part of the restore operation seems doable. We'd need to be more intentional about quiescing the app first, though; otherwise there would be more edge cases to get snagged on.
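As a rough illustration of what such a clean-up step could look like, the sketch below removes files on disk that the restored database does not reference. It is not the actual nightly job: `find_orphans`, `remove_orphans`, and the `known_filenames` set (assumed to come from a query against the restored database) are hypothetical names.

```python
from pathlib import Path


def find_orphans(store_dir: Path, known_filenames: set[str]) -> list[Path]:
    """Return files under store_dir whose names the database does not know."""
    return sorted(
        p for p in store_dir.rglob("*")
        if p.is_file() and p.name not in known_filenames
    )


def remove_orphans(store_dir: Path, known_filenames: set[str]) -> list[Path]:
    """Delete orphaned files and return the paths that were removed.

    Assumes the caller has already quiesced the app, so no new files
    appear between the database query and the deletion.
    """
    orphans = find_orphans(store_dir, known_filenames)
    for path in orphans:
        path.unlink()
    return orphans
```

Running this right after the restore, instead of waiting for the nightly job, would close the window in which a stale file can collide with a new one.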
Description
For the collection C(S, t) for a source S at time t, collection-level numbering can collide (and effectively lock) if a smaller collection C(S, 1) is restored from backup over a larger collection C(S, 2). This is a by-product of #5225.
Steps to Reproduce
During #7121:
1. [...] S and submit something.
2. [...] S, resulting in collection C(S, 1).
3. [...] C(S, 1).
4. [...] S, resulting in collection C(S, 2) > C(S, 1).
5. [...] S.
Expected Behavior
The reply is saved successfully.
Actual Behavior
HTTP 500 error with log messages:
Comments
The reply from step (4) is not removed from disk when the backup (without it) is restored in step (5).
ansible.builtin.unarchive does not appear to have an argument like rsync --delete.
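The collision can be illustrated with a toy model. This is only a sketch under stated assumptions, not SecureDrop code: the `ToyStore` class, the `N-kind` filename scheme, and the choice of `FileExistsError` as the failure mode are all hypothetical. It assumes the next number is derived from a counter in the database, that the restore replaces the database wholesale, and that it only adds or overwrites files on disk.

```python
class ToyStore:
    """Toy model: a counter in a 'database' and numbered files on 'disk'."""

    def __init__(self) -> None:
        self.db_count = 0  # per-source counter held in the database
        self.disk: set[str] = set()  # filenames present on disk

    def save(self, kind: str) -> str:
        name = f"{self.db_count + 1}-{kind}"
        if name in self.disk:
            # Assumed failure mode: refuse to overwrite an existing file.
            raise FileExistsError(name)
        self.disk.add(name)
        self.db_count += 1
        return name

    def backup(self) -> tuple[int, frozenset[str]]:
        return self.db_count, frozenset(self.disk)

    def restore_upsert(self, snapshot: tuple[int, frozenset[str]]) -> None:
        count, files = snapshot
        self.db_count = count  # the database is replaced wholesale
        self.disk |= files  # files are added or overwritten, never removed


store = ToyStore()
store.save("msg")  # the source submits something
store.save("reply")  # first reply: the smaller collection
snapshot = store.backup()
store.save("reply")  # second reply: the larger collection
store.restore_upsert(snapshot)  # counter rolls back, stale file stays

try:
    store.save("reply")
except FileExistsError as exc:
    print(f"collision on {exc}")
```

In this model every retry fails the same way, because the counter never advances past the stale file. That matches the "effectively lock" behaviour described above.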