Improvement/utapi 103/support reindex by account #1304

tmacro · 2024-06-06T20:06:38Z

This PR adds support for limiting reindex to particular account(s).
To keep the CLI consistent, rather than having --bucket and --account behave differently, I've also pulled in some of the requested features from S3C-8077 (support for multiple buckets).
I also added a --dry-run flag that skips redis updates, to help both dev and field use.

Modifies:
--bucket: Can now be passed multiple times to reindex multiple buckets

Adds:
--account: Limit reindex to an account canonical ID. Can be passed multiple times.
--account-file: Read canonical IDs from a file, one per line.
--bucket-file: Read bucket names from a file, one per line.
--dry-run: Skip updating redis.
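
As a rough illustration of the CLI surface described above, the sketch below wires these flags into `argparse`. The `build_parser` name and the `reindex-sketch` program name are assumptions for illustration only, and the type validators used by the real parser in lib/reindex/s3_bucketd.py are left out.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags described in this PR."""
    parser = argparse.ArgumentParser(prog="reindex-sketch")
    parser.add_argument("--dry-run", action="store_true", help="Do not update redis")

    # --bucket may be repeated; --bucket-file is the file-based alternative.
    buckets = parser.add_mutually_exclusive_group()
    buckets.add_argument("-b", "--bucket", default=[], action="append")
    buckets.add_argument("--bucket-file", default=None)

    # Same shape for accounts: repeatable flag or a file, not both.
    accounts = parser.add_mutually_exclusive_group()
    accounts.add_argument("-a", "--account", default=[], action="append")
    accounts.add_argument("--account-file", default=None)
    return parser


args = build_parser().parse_args(["--bucket", "photos", "--bucket", "logs", "--dry-run"])
print(args.bucket, args.dry_run)  # prints ['photos', 'logs'] True
```

Repeating `--bucket` accumulates values into a list, while combining `--bucket` with `--bucket-file` is rejected by the mutually exclusive group.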

bert-e · 2024-06-06T20:06:41Z

Hello tmacro,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options

name	description	privileged	authored
`/after_pull_request`	Wait for the given pull request id to be merged before continuing with the current one.
`/bypass_author_approval`	Bypass the pull request author's approval	⭐
`/bypass_build_status`	Bypass the build and test status	⭐
`/bypass_commit_size`	Bypass the check on the size of the changeset `TBA`	⭐
`/bypass_incompatible_branch`	Bypass the check on the source branch prefix	⭐
`/bypass_jira_check`	Bypass the Jira issue check	⭐
`/bypass_peer_approval`	Bypass the pull request peers' approval	⭐
`/bypass_leader_approval`	Bypass the pull request leaders' approval	⭐
`/approve`	Instruct Bert-E that the author has approved the pull request.		✍️
`/create_pull_requests`	Allow the creation of integration pull requests.
`/create_integration_branches`	Allow the creation of integration branches.
`/no_octopus`	Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
`/unanimity`	Change review acceptance criteria from `one reviewer at least` to `all reviewers`
`/wait`	Instruct Bert-E not to run until further notice.

Available commands

name	description	privileged
`/help`	Print Bert-E's manual in the pull request.
`/status`	Print Bert-E's current status in the pull request `TBA`
`/clear`	Remove all comments from Bert-E from the history `TBA`
`/retry`	Re-start a fresh build `TBA`
`/build`	Re-start a fresh build `TBA`
`/force_reset`	Delete integration branches & pull requests, and restart merge process from the beginning.
`/reset`	Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

bert-e · 2024-06-06T22:24:35Z

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

tmacro · 2024-06-06T22:28:24Z

/create_integration_branches

bert-e · 2024-06-06T22:28:33Z

Integration data created

I have created the integration data for the additional destination branches.

this pull request will merge improvement/UTAPI-103/support_reindex_by_account into
development/7.70
w/8.1/improvement/UTAPI-103/support_reindex_by_account will be merged into development/8.1

The following branches will NOT be impacted:

development/6.4
development/7.10
development/7.4

You can set option create_pull_requests if you need me to create
integration pull requests in addition to integration branches, with:

@bert-e create_pull_requests

The following options are set: create_integration_branches

bert-e · 2024-06-06T22:28:35Z

Waiting for approval

The following approvals are needed before I can proceed with the merge:

the author
2 peers

The following options are set: create_integration_branches

jonathan-gramain

LGTM with some minor suggestions

jonathan-gramain · 2024-06-12T01:35:10Z

lib/reindex/s3_bucketd.py

+                    _log.warning(
+                        "DryRun: resource buckets [%s] will be not updated with obj_count %i and total_size %i" % (
+                            bucket, report['obj_count'], report['total_size']


I would suggest _log.info: since this is a dry run, the message is expected by the user.

The wording also reads as somewhat odd, or negative, and could be improved, though I don't have a good substitute in mind at the moment. The message should probably just convey which values would be updated (rewording it as "would be updated" might be enough).

Just another proposition:

DryRun: obj_count %i and total_size %i were calculated for resource bucket [%s]. The bucket has not been updated.
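
One possible reading of this suggestion, as a sketch only: log the computed values at info level and state plainly that nothing was written. The `report_bucket` helper and its `update` callback are hypothetical names, not part of the PR, and the message is lightly adapted from the proposal above.

```python
import logging

_log = logging.getLogger("reindex-sketch")


def report_bucket(bucket, report, dry_run, update):
    """Log in dry-run mode, otherwise call the (hypothetical) update callback."""
    if dry_run:
        # Passing the values as arguments lets logging format the message lazily.
        _log.info(
            "DryRun: obj_count %i and total_size %i were calculated for "
            "resource bucket [%s]. The bucket has not been updated.",
            report["obj_count"], report["total_size"], bucket,
        )
        return False
    update(bucket, report)
    return True
```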

jonathan-gramain · 2024-06-12T01:35:42Z

lib/reindex/s3_bucketd.py

+        if options.dry_run:
+            for userid, report in account_reports.items():
+                _log.warning(
+                    "DryRun: resource account [%s] will be not updated with obj_count %i and total_size %i" % (


Same comment here about the messaging and _log.info.

jonathan-gramain · 2024-06-12T01:38:24Z

lib/reindex/s3_bucketd.py

+def existing_file(path):
+    path = Path(path).resolve()
+    if not path.exists():
+        raise argparse.ArgumentTypeError("File does not exist")


It would be nice to show the path to the file, since it may not be obvious to the user which file doesn't exist.
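
A minimal sketch of what including the path could look like. The first lines follow the hunk quoted above; the `return` and the exact message text are assumptions, since the rest of the function is not shown in this thread.

```python
import argparse
from pathlib import Path


def existing_file(path):
    """argparse type: resolve the path and fail with a message that names it."""
    resolved = Path(path).resolve()
    if not resolved.exists():
        # Assumed wording: the point is only that the offending path is shown.
        raise argparse.ArgumentTypeError(f"File does not exist: {resolved}")
    return resolved
```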

jonathan-gramain · 2024-06-12T01:41:32Z

lib/reindex/s3_bucketd.py

-                    # Break on the first matching bucket if a name is given
-                    break
+                if names:
+                    seen_buckets.update(b.name for b in buckets)


I would rather update seen_buckets with the bucket name just processed, right after buckets.append(bucket).

jonathan-gramain · 2024-06-12T01:43:35Z

lib/reindex/s3_bucketd.py

+                    # Break if we have seen all the buckets we are looking for
+                    if all(b in seen_buckets for b in names):
+                        break


Minor suggestion: if we know that names doesn't have duplicate bucket names (or alternatively we could ensure it doesn't), it should be enough to check that the size of the set is equal to len(names).

While fiddling with this I realized that with the addition of get_bucket_md the names param isn't used anywhere. I've simplified the function to just remove it.
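
For the record, the two suggestions above (record each name right after it is appended, and compare set sizes instead of re-scanning `names`) could be sketched as follows. `collect_named` and the plain-string listing are hypothetical stand-ins for the real bucket listing, which is not shown in full here.

```python
def collect_named(listing, names):
    """Collect entries from listing whose name is in names, stopping early."""
    wanted = set(names)  # de-duplicate so the size comparison is valid
    seen = set()
    found = []
    for name in listing:
        if name not in wanted:
            continue
        found.append(name)
        seen.add(name)  # update right after the append
        if len(seen) == len(wanted):
            break  # every requested bucket has been seen
    return found
```

Because `seen` only ever holds names that are in `wanted`, equal sizes imply the two sets are equal, so the per-name `all(...)` scan is not needed.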

jonathan-gramain · 2024-06-12T01:47:01Z

lib/reindex/s3_bucketd.py

+    if not options.bucket and not options.account:
        stale_buckets = recorded_buckets.difference(observed_buckets)
    elif options.bucket:
        stale_buckets = { b for b in options.bucket if b not in observed_buckets }
+    elif options.account:
+        _log.warning('Stale buckets will not be cleared when using the --account or --account-file flags')


This could be reorganized slightly for simplicity as:

    if options.bucket:
        ...
    elif options.account:
        ...
    else:  # neither bucket nor account
        ...
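
Filled in with the branches from the hunk above, the reordered version might look like the sketch below. `find_stale_buckets` is a hypothetical wrapper, and returning an empty set in the account case is an assumption, since the quoted hunk only shows the warning.

```python
import logging

_log = logging.getLogger("reindex-sketch")


def find_stale_buckets(bucket_opts, account_opts, recorded, observed):
    """Return the set of buckets considered stale for the given CLI options."""
    if bucket_opts:
        return {b for b in bucket_opts if b not in observed}
    elif account_opts:
        _log.warning(
            "Stale buckets will not be cleared when using the "
            "--account or --account-file flags"
        )
        return set()  # assumption: nothing is treated as stale in this mode
    else:
        # Neither bucket nor account: compare everything recorded with what was seen.
        return recorded.difference(observed)
```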

jonathan-gramain · 2024-06-12T01:48:28Z

lib/reindex/s3_bucketd.py

+        if options.account:
+            for account in options.account:
+                if account in failed_accounts:
+                    _log.error("No metrics updated for %s, one or more buckets failed" % account)


Suggested change

-                    _log.error("No metrics updated for %s, one or more buckets failed" % account)
+                    _log.error("No metrics updated for account %s, one or more buckets failed" % account)

dvasilas · 2024-06-12T12:55:42Z

lib/reindex/s3_bucketd.py

-    return parser.parse_args()
+    group = parser.add_mutually_exclusive_group()
+    group.add_argument("-b", "--bucket", default=[], help="bucket name", action="append", type=nonempty_string('bucket'))
+    group.add_argument("--bucket-file", default=None, help="file containing bucket names", type=existing_file)


Suggested change

-    group.add_argument("--bucket-file", default=None, help="file containing bucket names", type=existing_file)
+    group.add_argument("--bucket-file", default=None, help="file containing bucket names, one bucket name per line", type=existing_file)

dvasilas · 2024-06-12T13:40:41Z

lib/reindex/s3_bucketd.py

+    parser.add_argument("--dry-run", action="store_true", help="Do not update redis")
+    group = parser.add_mutually_exclusive_group()
+    group.add_argument("-a", "--account", default=[], help="account canonical ID (all account buckets will be processed)", action="append", type=nonempty_string('account'))
+    group.add_argument("--account-file", default=None, help="file containing account canonical IDs", type=existing_file)


Same comment about giving a hint for the file format.
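
For context on the format being hinted at, a one-entry-per-line file could be read along these lines. This is a sketch under assumptions: `read_entries` is a hypothetical helper, and skipping blank lines and trimming whitespace are choices made here, not something the PR states.

```python
from pathlib import Path


def read_entries(path):
    """Return non-empty, whitespace-stripped lines from a text file."""
    with Path(path).open() as handle:
        return [line.strip() for line in handle if line.strip()]
```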

bert-e · 2024-06-12T18:27:48Z

History mismatch

Merge commit #f87a65065ad4e7ad325a556ff2042c39e9794ad8 on the integration branch
w/8.1/improvement/UTAPI-103/support_reindex_by_account is merging a branch which is neither the current
branch improvement/UTAPI-103/support_reindex_by_account nor the development branch
development/8.1.

It is likely due to a rebase of the branch improvement/UTAPI-103/support_reindex_by_account and the
merge is not possible until all related w/* branches are deleted or updated.

Please use the reset command to have me reinitialize these branches.

The following options are set: create_integration_branches

tmacro · 2024-06-12T18:27:54Z

/reset

bert-e · 2024-06-12T18:28:03Z

Reset complete

I have successfully deleted this pull request's integration branches.

The following options are set: create_integration_branches

bert-e · 2024-06-12T18:28:12Z

Integration data created

I have created the integration data for the additional destination branches.

this pull request will merge improvement/UTAPI-103/support_reindex_by_account into
development/7.70
w/8.1/improvement/UTAPI-103/support_reindex_by_account will be merged into development/8.1

The following branches will NOT be impacted:

development/6.4
development/7.10
development/7.4

You can set option create_pull_requests if you need me to create
integration pull requests in addition to integration branches, with:

@bert-e create_pull_requests

The following options are set: create_integration_branches

bert-e · 2024-06-12T18:28:13Z

Waiting for approval

The following approvals are needed before I can proceed with the merge:

the author
2 peers

The following options are set: create_integration_branches

tmacro · 2024-06-12T18:28:32Z

/approve

bert-e · 2024-06-12T18:32:36Z

I have successfully merged the changeset of this pull request
into targeted development branches:

✔️ development/7.70
✔️ development/8.1

The following branches have NOT changed:

development/6.4
development/7.10
development/7.4

Please check the status of the associated issue UTAPI-103.

Goodbye tmacro.

The following options are set: approve, create_integration_branches

tmacro added 3 commits June 6, 2024 12:06

impr(UTAPI-103): Add small LRU cache to BucketDClient._get_bucket_attributes

402fd40

impr(UTAPI-103): Add BucketNotFound Exeception for _get_bucket_attributes

34af848

impr(UTAPI-103): Add BucketDClient.get_bucket_md()

210ba2f

scality deleted a comment from bert-e Jun 6, 2024

jonathan-gramain approved these changes Jun 12, 2024

dvasilas approved these changes Jun 12, 2024

tmacro added 4 commits June 12, 2024 11:27

impr(UTAPI-103): Add --dry-run option

5efb70d

impr(UTAPI-103): Support multiple specified buckets and prep for account support

ee1c0fc

impr(UTAPI-103): Support reindexing by acccount

f5262b7

impr(UTAPI-103): Remove undeclared variable from log message

69b94c5

tmacro force-pushed the improvement/UTAPI-103/support_reindex_by_account branch from c14f39e to 69b94c5 Compare June 12, 2024 18:27

bert-e merged commit 69b94c5 into development/7.70 Jun 12, 2024

bert-e deleted the improvement/UTAPI-103/support_reindex_by_account branch June 12, 2024 18:32
