Schema State Version Migrations #287
Comments
Note the usage of The filter then ignores the migration directory. But if we move towards to having a
@joshuakarp @tegefaulkes @scottmmorris @emmacasolin comments?
Note the error when renaming a directory into an existing directory with stuff:
Note that I'm not using In this case it's just cleaner everything is done within the same node path.
Backup and Restore specced out here: #288. It seems both of these issues should be worked out by 1 implementation.
One issue with using One can imagine that It just seems a bit weird for Note that means
With
Specification
The state schema requires migrations once we go beyond state version 1. Note that the `schema` domain is under development in https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/213 and will be available when that is merged.

This issue is about the specification of the `Schema.upgradeVersion` method.

This is the proposed structure:
1. In `schema/migrations`, create migration files named by the version they are upgrading to. So things like `2.ts` for version 2, `3.ts` for version 3.
2. Each migration works on a `nodePath`. It is not the real `nodePath`, but a copy of the node state made from the real node path. This means it includes all files in the node state.
3. The `nodePath` is actually pointing to a temporary directory like `~/.polykey/migrations.tmp`.
4. Exceptions thrown by a migration are caught by `Schema.upgradeVersion` and reported as `ErrorSchemaVersionMigrate`.
5. Migrations are registered in `schema/migrations/index.ts`. Make sure you do something like:
6. `upgradeVersion`
should be updated to cycle over all migrations and execute them. It also has to copy over the changed node state. We can attempt to do this (somewhat) atomically with 2 renames: renaming `.polykey` to `.polykey.tmp`, then renaming `.polykey.tmp/migrations.tmp` to `.polykey`, and if successful deleting `.polykey.tmp` (no need to restart the PolykeyAgent program, as everything should work as normal). This however hits a problem with the usage of `Status`, which has a lock covering it. Also, on Windows it doesn't appear that there are atomic guarantees. In any case, the reason to do this is to keep the original state of `.polykey` intact in `.polykey.tmp` while this is occurring. The new upgraded schema is then in `.polykey/migrations.tmp`, which has to be moved to `.polykey`.

Now here's a prototype of `upgradeVersion`.
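The prototype code itself is not shown here. As a rough stand-in, here is a minimal sketch of the flow described in the list above: copy the node state into `migrations.tmp`, run each migration against the copy, then swap directories with 2 renames. The registry shape, the marker files, the function signature and the returned backup path are all assumptions made for illustration, not the project's actual API. The `Status` lock is ignored entirely.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// A migration mutates a *copy* of the node state rooted at `nodePath`.
type Migration = (nodePath: string) => Promise<void>;

// Hypothetical stand-in for `schema/migrations/index.ts`, keyed by the
// version each migration upgrades *to* (2.ts -> 2, 3.ts -> 3).
const migrations = new Map<number, Migration>([
  [2, async (nodePath: string) => {
    await fs.promises.writeFile(path.join(nodePath, 'migrated-to-2'), '');
  }],
  [3, async (nodePath: string) => {
    await fs.promises.writeFile(path.join(nodePath, 'migrated-to-3'), '');
  }],
]);

// Named after the error mentioned in the issue; the real class may differ.
class ErrorSchemaVersionMigrate extends Error {}

// Assumed signature: returns the path of the untouched backup directory.
async function upgradeVersion(
  nodePath: string,
  from: number,
  to: number,
): Promise<string> {
  const workPath = path.join(nodePath, 'migrations.tmp');
  const backupPath = `${nodePath}.tmp`;
  // Start from a clean working copy of the node state.
  await fs.promises.rm(workPath, { recursive: true, force: true });
  await fs.promises.mkdir(workPath);
  // Copy entry by entry so the working directory is never copied into itself.
  for (const entry of await fs.promises.readdir(nodePath)) {
    if (entry === 'migrations.tmp') continue;
    await fs.promises.cp(
      path.join(nodePath, entry),
      path.join(workPath, entry),
      { recursive: true },
    );
  }
  // Run every migration in order against the copy, never the real state.
  for (let version = from + 1; version <= to; version++) {
    try {
      const migrate = migrations.get(version);
      if (migrate === undefined) throw new Error('no migration registered');
      await migrate(workPath);
    } catch (e) {
      throw new ErrorSchemaVersionMigrate(
        `Migration to version ${version} failed: ${String(e)}`,
      );
    }
  }
  // Rename 1: the original state becomes the backup.
  await fs.promises.rename(nodePath, backupPath);
  // Rename 2: the migrated copy becomes the live state.
  await fs.promises.rename(path.join(backupPath, 'migrations.tmp'), nodePath);
  return backupPath;
}
```

In this sketch a failed migration leaves the real state untouched, because the renames only happen after every migration has succeeded. The backup is deliberately not deleted, and a leftover `.polykey.tmp` from an earlier run would make the first rename fail rather than be overwritten.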
Regarding the `Status`. This is the only problem because a lock is held on it. We are able to copy files all over, because nothing should be operating while a migration is occurring, and this happens on `PolykeyAgent.start`. However the `Status` is started earlier than `Schema`, which means `Status` could cause problems when moving `.polykey` to `.polykey.tmp`. On Linux, the lock is held on the file descriptor and hence related to the inode. It's not related to the name on the filesystem. Which means even if I hold a flock on the file, I can move it. On Windows, this may not work.

Another solution may be to consider the
`Status` separately from the rest of the system. That would mean `.polykey/status` should be separate from `.polykey/state`.

So then the atomic renaming option works against a subdirectory of `state` that contains `keys`, `db`, etc. This would also mean that `status` is kept static, and this works if there are other concurrent commands to do `pk agent start` or `pk bootstrap`, as they would be blocked from interacting with the state as the `status` is locked.

Another alternative is not to do the renaming at all, but instead just use `fs.promises.cp` again to copy back over the `.polykey`. But this doesn't give us the nice backup.

So it seems that we would need to change our state structure:
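The exact layout is not shown here. One layout consistent with the discussion above, offered purely as an assumption, keeps `status` beside a `state` directory that holds `keys` and `db`. The sketch below expresses that as a small path helper. `NodeLayout`, `nodeLayout` and `stateParent` are hypothetical names, and the parent check reflects the requirement noted under Tasks that the parent directory of `statePath` must be writable, so `statePath` cannot be the filesystem root.

```typescript
import * as path from 'node:path';

// Hypothetical description of the proposed node directory layout.
interface NodeLayout {
  nodePath: string;
  statusPath: string; // stays in place and stays locked during a migration
  statePath: string; // renamed or copied as a unit during a migration
  keysPath: string;
  dbPath: string;
}

function nodeLayout(nodePath: string): NodeLayout {
  const root = path.resolve(nodePath);
  const statePath = path.join(root, 'state');
  return {
    nodePath: root,
    statusPath: path.join(root, 'status'),
    statePath,
    keysPath: path.join(statePath, 'keys'),
    dbPath: path.join(statePath, 'db'),
  };
}

// Renaming `statePath` needs a sibling path, so it must have a real parent.
function stateParent(statePath: string): string {
  const resolved = path.resolve(statePath);
  const parent = path.dirname(resolved);
  if (parent === resolved) {
    throw new Error(`statePath must not be a filesystem root: ${resolved}`);
  }
  return parent;
}
```

With this split, the 2 renames would only ever touch `state`, so a lock held on `status` is never moved.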
It is essential that migrations do not corrupt the node state. And if there is a corruption, then it must not be silent. Silent corruptions are deadly! If we end up deleting the old `.polykey.tmp` there's no backup of the current polykey state. We may consider not deleting it at all if there's a problem, and instead keep it around and tell the user to delete it if they are comfortable with everything. So maybe instead there needs to be a full diagnostic at the end of the migration before we automatically delete the `.polykey.tmp`.

In the context of backups, backups should be done from 1 PK node to another PK node. But if a fleet of PK nodes are all upgraded and there's a silent corruption, this can cause all backups to be dead. Therefore it's a good idea for us to build in some form of archival backup by "exporting" all PK secrets. The PK secret state is held in leveldb. An "exported" flat file might just be a big JSON file, or a big CSV file or a number of files that are archived up then encrypted.
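To make the "big JSON file, then encrypted" option concrete, here is a minimal sketch using Node's built-in `crypto` module. Everything about it is an assumption: the secrets are a plain in-memory object rather than something read from leveldb, the key is derived from a password with scrypt, and the blob layout (salt, IV, auth tag, ciphertext) is made up for this example. AES-256-GCM is used because it is authenticated, so a corrupted export fails loudly on decryption instead of silently.

```typescript
import * as crypto from 'node:crypto';

// Hypothetical export shape: secret name -> secret content.
type SecretsExport = Record<string, string>;

// Assumed blob layout: 16-byte salt | 12-byte IV | 16-byte tag | ciphertext.
function encryptExport(secrets: SecretsExport, password: string): Buffer {
  const salt = crypto.randomBytes(16);
  const iv = crypto.randomBytes(12);
  const key = crypto.scryptSync(password, salt, 32);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(JSON.stringify(secrets), 'utf8'),
    cipher.final(),
  ]);
  return Buffer.concat([salt, iv, cipher.getAuthTag(), ciphertext]);
}

// Throws if the password is wrong or the blob has been altered.
function decryptExport(blob: Buffer, password: string): SecretsExport {
  const salt = blob.subarray(0, 16);
  const iv = blob.subarray(16, 28);
  const tag = blob.subarray(28, 44);
  const ciphertext = blob.subarray(44);
  const key = crypto.scryptSync(password, salt, 32);
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  const plaintext = Buffer.concat([
    decipher.update(ciphertext),
    decipher.final(),
  ]);
  return JSON.parse(plaintext.toString('utf8')) as SecretsExport;
}
```

An export like this is independent of the node state schema, so it would stay readable even if a migration bug affected every node in a fleet.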
Additional context
`fs.cp` is in node 16, which is the new LTS. However this is not yet available due to "nodejs: switch from 14.x to 16.x to keep up with the lts release" (NixOS/nixpkgs#142915). Once this is merged we should update to point `pkgs.nix` to the latest master.
Tasks
1. Update `pkgs.nix` and ensure that node16 works
2. Update `Schema.upgradeVersion` to incorporate the new changes
3. Change `Schema` to use `statePath`. It will be assumed that it can write to the parent directory of `statePath`, which means `statePath` is not allowed to be `/` ("root").