Add script to migrate existing build results to Pulp #3509
@@ -0,0 +1,158 @@

```python
#! /usr/bin/python3

"""
Migrate existing build results for a given project and all of its CoprDirs
from one storage (Copr backend) to another (Pulp).
"""

import os
import sys
import argparse
import logging
from copr_common.log import setup_script_logger
from copr_backend.helpers import BackendConfigReader
from copr_backend.storage import PulpStorage


STORAGES = ["backend", "pulp"]

log = logging.getLogger(__name__)
setup_script_logger(log, "/var/log/copr-backend/change-storage.log")


def get_arg_parser():
    """
    CLI argument parser
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--src",
        required=True,
        choices=STORAGES,
        help="The source storage",
    )
    parser.add_argument(
        "--dst",
        required=True,
        choices=STORAGES,
        help="The destination storage",
    )
    parser.add_argument(
        "--project",
        required=True,
        help="Full name of the project that is to be migrated",
    )
    parser.add_argument(
        "--delete",
        action="store_true",
        default=False,
        help="After migrating the data, remove it from the old storage",
    )
    return parser
```
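The parser above is small enough to exercise directly. The snippet below reproduces it verbatim and parses a sample command line; the project name `@copr/some-project` is a made-up example, not a real project:

```python
import argparse

STORAGES = ["backend", "pulp"]

def get_arg_parser():
    """Same parser as in the script above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--src", required=True, choices=STORAGES,
                        help="The source storage")
    parser.add_argument("--dst", required=True, choices=STORAGES,
                        help="The destination storage")
    parser.add_argument("--project", required=True,
                        help="Full name of the project that is to be migrated")
    parser.add_argument("--delete", action="store_true", default=False,
                        help="After migrating the data, remove it from the old storage")
    return parser

# "@copr/some-project" is a hypothetical example project name
args = get_arg_parser().parse_args(
    ["--src", "backend", "--dst", "pulp", "--project", "@copr/some-project"])
print(args.src, args.dst, args.delete)  # backend pulp False
```

Passing `--src pulp --dst backend` would parse fine here (both are valid choices); the unsupported-direction check happens later, in `main()`.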
```python
def is_valid_build_directory(name):
    """
    See the `copr-backend-resultdir-cleaner`. We may want to share the code
    between them.
    """
    if name in ["repodata", "devel"]:
        return False

    if name.startswith("repodata.old") or name.startswith(".repodata."):
        return False

    if name in ["tmp", "cache", "appdata"]:
        return False

    parts = name.split("-")
    if len(parts) <= 1:
        return False

    number = parts[0]
    if len(number) != 8 or any(not c.isdigit() for c in number):
        return False

    return True
```
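The heuristic accepts only directory names that start with an eight-digit zero-padded build ID followed by a dash, and rejects the repodata/metadata directories the backend maintains. A quick sketch of how it classifies typical resultdir entries (the function is copied from the diff; the example names are made up):

```python
def is_valid_build_directory(name):
    """Copied from the script above."""
    if name in ["repodata", "devel"]:
        return False
    if name.startswith("repodata.old") or name.startswith(".repodata."):
        return False
    if name in ["tmp", "cache", "appdata"]:
        return False
    parts = name.split("-")
    if len(parts) <= 1:
        return False
    number = parts[0]
    if len(number) != 8 or any(not c.isdigit() for c in number):
        return False
    return True

print(is_valid_build_directory("00012345-example-package"))  # True
print(is_valid_build_directory("repodata"))                  # False
print(is_valid_build_directory("repodata.old.1"))            # False
print(is_valid_build_directory("1234-short-id"))             # False: ID not 8 digits
print(is_valid_build_directory("00012345"))                  # False: no dash
```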
```python
def main():
    """
    The main function
    """
    parser = get_arg_parser()
    args = parser.parse_args()

    if args.src == args.dst:
        log.info("The source and destination storage is the same, nothing to do.")
        return

    if args.src == "pulp":
        log.error("Migration from pulp to somewhere else is not supported")
        sys.exit(1)

    if args.delete:
        log.error("Data removal is not supported yet")
        sys.exit(1)

    config_file = "/etc/copr/copr-be.conf"
    config = BackendConfigReader(config_file).read()
    owner, project = args.project.split("/")
    ownerdir = os.path.join(config.destdir, owner)

    for subproject in os.listdir(ownerdir):
        if not (subproject == project or subproject.startswith(project + ":")):
            continue

        coprdir = os.path.join(ownerdir, subproject)
        for chroot in os.listdir(coprdir):
            if chroot == "srpm-builds":
                continue

            chrootdir = os.path.join(coprdir, chroot)
            if not os.path.isdir(chrootdir):
                continue

            appstream = None
            devel = None
            storage = PulpStorage(
                owner, subproject, appstream, devel, config, log)

            for builddir in os.listdir(chrootdir):
                resultdir = os.path.join(chrootdir, builddir)
                if not os.path.isdir(resultdir):
                    continue
```

Review comment (on the `os.path.isdir(resultdir)` check): I'm not sure this check is enough..., maybe the

Reply: You are right, that would cause problems. Updated.
```python
                if not is_valid_build_directory(builddir):
                    log.info("Skipping: %s", resultdir)
                    continue

                # TODO Fault-tolerance and data consistency
                # Errors when creating things in Pulp will likely happen
                # (networking issues, unforeseen Pulp validation, etc). We
                # should figure out how to ensure that all RPMs were
                # successfully uploaded, and if not, we know about it.
                #
                # We also need to make sure that no builds, actions, or cron,
                # are currently writing into the results directory. Otherwise
                # we can end up with inconsistent data in Pulp.

                full_name = "{0}/{1}".format(owner, subproject)
                result = storage.init_project(full_name, chroot)
                if not result:
                    log.error("Failed to initialize project: %s", resultdir)
                    break

                # We cannot check return code here
                storage.upload_build_results(chroot, resultdir, None)

                result = storage.publish_repository(chroot)
                if not result:
                    log.error("Failed to publish a repository: %s", resultdir)
                    break

                log.info("OK: %s", resultdir)
```
Review comment: I suppose we can not make this in a transactional manner (if an error happens, roll back). But would it be possible to first analyze the situation and gather the tasks that need to be done, fail if some problem happens, and only if no problems happen, start the processing? Also, I'm curious whether we need a project lock (for building and other modification).

Reply: Sooo, I am not really sure how helpful this would be. Gathering tasks beforehand would probably avoid issues like the script trying to access a directory it doesn't have permissions to and then failing. Or something like this. But I suppose the majority of failures that can/will happen are going to happen due to networking issues or something else when actually uploading things to Pulp. And having a calculated list of tasks wouldn't IMHO help. I would probably only remember or maybe pre-calculate the number of RPM files we are uploading and, after everything is done, query Pulp to find out if we have the same number. Or maybe compare names of RPMs if we wanted to be more precise. If it doesn't match, we can either re-try several times, or just log it and manually review all failures.
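The "compare names of RPMs" idea from the reply could be sketched roughly as follows. `find_missing_rpms` and the notion of getting a plain list of uploaded file names back from the remote storage are assumptions for illustration, not part of the actual Copr or Pulp API:

```python
import os
import tempfile

def find_missing_rpms(resultdir, uploaded_names):
    """
    Compare the RPM files present on disk against the names the remote
    storage reports as uploaded (`uploaded_names` is a hypothetical
    stand-in for a Pulp query result). Returns the set of local RPM
    file names that are missing remotely.
    """
    local = {name for name in os.listdir(resultdir) if name.endswith(".rpm")}
    return local - set(uploaded_names)

# Demo with a throwaway directory standing in for a resultdir
with tempfile.TemporaryDirectory() as resultdir:
    for name in ["foo-1.0-1.x86_64.rpm",
                 "foo-debuginfo-1.0-1.x86_64.rpm",
                 "build.log"]:
        open(os.path.join(resultdir, name), "w").close()

    # Pretend the storage reported only one of the two RPMs as uploaded;
    # non-.rpm files such as build.log are ignored by the comparison
    missing = find_missing_rpms(resultdir, ["foo-1.0-1.x86_64.rpm"])
    print(sorted(missing))  # ['foo-debuginfo-1.0-1.x86_64.rpm']
```

If the returned set is non-empty, the script could retry those uploads a few times or log them for manual review, as the reply suggests.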
Reply: Dumping a lockfile in this script would be easy, but changing our build-related code, action code, cron jobs, etc. to respect the lock sounds like a bigger problem. If such a locking feature would be generally useful, then sure. But if the only purpose would be the Pulp migration, I hope we could figure out something easier. For initial migrations of test users, I think we would be fine with "please don't submit new builds until the migration is finished". And the mass migration of everything will be done in batches. So maybe we can just put an ugly hack into our build/action scheduler to temporarily hide all jobs that fall in the currently migrated batch.
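For reference, the "dump a lockfile" half of the idea is indeed the easy part. A minimal sketch using POSIX `flock`, with the caveat from the reply that nothing else honors the lock unless builds, actions, and cron are also taught to take it (the lock path below is a made-up example):

```python
import fcntl
import os
import tempfile

def try_lock(path):
    """
    Take an exclusive, non-blocking flock() on `path`. Returns the open
    file object on success (keep it open to hold the lock), or None if
    another holder already has it.
    """
    fd = open(path, "w")
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd
    except BlockingIOError:
        fd.close()
        return None

# Hypothetical lock path; a real script would use a well-known location
# that the other writers (builds, actions, cron) agree on.
path = os.path.join(tempfile.mkdtemp(), "copr-migration-demo.lock")
first = try_lock(path)
second = try_lock(path)   # a second open of the same file conflicts
print(first is not None, second is None)  # True True
```

Since `flock` is advisory, this only helps once every process that writes into the results directory checks the same lock, which is exactly the "bigger problem" the reply points out.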
```python
if __name__ == "__main__":
    main()
```