-
Notifications
You must be signed in to change notification settings - Fork 3
Improvement/utapi 103/support reindex by account #1304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvement/utapi 103/support reindex by account #1304
Conversation
Hello tmacro,My role is to assist you with the merge of this Available options
Available commands
Status report is not available. |
Request integration branchesWaiting for integration branch creation to be requested by the user. To request integration branches, please comment on this pull request with the following command:
Alternatively, the |
/create_integration_branches |
Integration data createdI have created the integration data for the additional destination branches.
The following branches will NOT be impacted:
You can set option
The following options are set: create_integration_branches |
Waiting for approvalThe following approvals are needed before I can proceed with the merge:
The following options are set: create_integration_branches |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some minor suggestions
lib/reindex/s3_bucketd.py
Outdated
_log.warning( | ||
"DryRun: resource buckets [%s] will be not updated with obj_count %i and total_size %i" % ( | ||
bucket, report['obj_count'], report['total_size'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest _log.info
, as it is a dry-run it is expected by the user.
Also thinking the formulation is somewhat odd or sounds negative and could be improved but I don't have a good substitution in mind at the moment. Thinking that the message should just convey what are the values that would be updated could work better (maybe just rewording as would be updated
could work).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just another proposition:
DryRun: obj_count %i and total_size %i was calculated for resource bucket [%s]. The bucket has not been updated.
lib/reindex/s3_bucketd.py
Outdated
if options.dry_run: | ||
for userid, report in account_reports.items(): | ||
_log.warning( | ||
"DryRun: resource account [%s] will be not updated with obj_count %i and total_size %i" % ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment here for the messaging and _log.info
lib/reindex/s3_bucketd.py
Outdated
def existing_file(path): | ||
path = Path(path).resolve() | ||
if not path.exists(): | ||
raise argparse.ArgumentTypeError("File does not exist") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be nice to show the path to the file (may not be obvious to the user which file doesn't exist)
lib/reindex/s3_bucketd.py
Outdated
# Break on the first matching bucket if a name is given | ||
break | ||
if names: | ||
seen_buckets.update(b.name for b in buckets) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather do the update
with the bucket name just processed after buckets.append(bucket)
lib/reindex/s3_bucketd.py
Outdated
# Break if we have seen all the buckets we are looking for | ||
if all(b in seen_buckets for b in names): | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestion: if we know that names
doesn't have duplicate bucket names (or alternatively we could ensure it doesn't), it should be enough to check that the size of the set is equal to len(names)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While fiddling with this I realized that with the addition of get_bucket_md
the names
param isn't used anywhere. I've simplified the function to just remove it.
lib/reindex/s3_bucketd.py
Outdated
if not options.bucket and not options.account: | ||
stale_buckets = recorded_buckets.difference(observed_buckets) | ||
elif options.bucket: | ||
stale_buckets = { b for b in options.bucket if b not in observed_buckets } | ||
elif options.account: | ||
_log.warning('Stale buckets will not be cleared when using the --account or --account-file flags') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be slightly reorganized for simplicity as
if options.bucket:
...
elif options.account:
...
else:
# neither bucket nor account
...
lib/reindex/s3_bucketd.py
Outdated
if options.account: | ||
for account in options.account: | ||
if account in failed_accounts: | ||
_log.error("No metrics updated for %s, one or more buckets failed" % account) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_log.error("No metrics updated for %s, one or more buckets failed" % account) | |
_log.error("No metrics updated for account %s, one or more buckets failed" % account) |
lib/reindex/s3_bucketd.py
Outdated
return parser.parse_args() | ||
group = parser.add_mutually_exclusive_group() | ||
group.add_argument("-b", "--bucket", default=[], help="bucket name", action="append", type=nonempty_string('bucket')) | ||
group.add_argument("--bucket-file", default=None, help="file containing bucket names", type=existing_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
group.add_argument("--bucket-file", default=None, help="file containing bucket names", type=existing_file) | |
group.add_argument("--bucket-file", default=None, help="file containing bucket names, one bucket name per line", type=existing_file) |
lib/reindex/s3_bucketd.py
Outdated
parser.add_argument("--dry-run", action="store_true", help="Do not update redis") | ||
group = parser.add_mutually_exclusive_group() | ||
group.add_argument("-a", "--account", default=[], help="account canonical ID (all account buckets will be processed)", action="append", type=nonempty_string('account')) | ||
group.add_argument("--account-file", default=None, help="file containing account canonical IDs", type=existing_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about giving a hint for the file format.
c14f39e
to
69b94c5
Compare
History mismatchMerge commit #f87a65065ad4e7ad325a556ff2042c39e9794ad8 on the integration branch It is likely due to a rebase of the branch Please use the The following options are set: create_integration_branches |
/reset |
Reset completeI have successfully deleted this pull request's integration branches. The following options are set: create_integration_branches |
Integration data createdI have created the integration data for the additional destination branches.
The following branches will NOT be impacted:
You can set option
The following options are set: create_integration_branches |
Waiting for approvalThe following approvals are needed before I can proceed with the merge:
The following options are set: create_integration_branches |
/approve |
I have successfully merged the changeset of this pull request
The following branches have NOT changed:
Please check the status of the associated issue UTAPI-103. Goodbye tmacro. The following options are set: approve, create_integration_branches |
This PR adds support for limiting reindex to particular account(s).
As part of the changes, rather than having
--bucket
and--account
differ in behavior, and to have a more consistent CLI, I've pulled in some of the requested features from S3C-8077 (support multiple buckets).I also added a
--dry-run
flag to skip redis updates to help both dev and field use.Modifies:
--bucket
: Can now be passed multiple times to reindex multiple bucketsAdds:
--account
: Limit reindex to an account canonical Id. Can be passed multiple times.--account-file
: Read canonical Ids from a file. 1 per line.--bucket-file
: Read bucket names from a file. 1 per line.--dry-run
: Skip updating redis.