Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust layout of multi-tenant groups bucket #562

Merged
merged 3 commits into from
Jun 24, 2022

Conversation

tsibley
Copy link
Member

@tsibley tsibley commented Jun 21, 2022

Description of proposed changes

The gist here is a behind-the-scenes layout change for the nextstrain-groups bucket to better separate objects within a group.

Currently:

nextstrain-groups/
  some-name/
    a_b.json
    x_y_z.json
    blah.md
    group-overview.md

With this:

nextstrain-groups/
  some-name/
    datasets/
      a_b.json
      x_y_z.json
    narratives/
      blah.md
    group-overview.md

This has been in the back of my mind for a while, but was recently reinvigorated/remotivated by discussion of issues with the current layout when describing #547.

See commit messages for full details.

Related issue(s)

Related to #478.

Testing

  • CI passes
  • Manual testing of the modified migrate-group script with test-private group (after "reverting" its migration)
  • Manual testing of the new migrate-groups-layout script against a private test bucket using objects copied from s3://nextstrain-groups/test-private/.

Deploy

to perform in order

  • Deploy Prepare for adjusting the layout of the multi-tenant groups bucket #563 all the way to production.
  • Run ./scripts/migrate-groups-layout --wet-run to do the initial copy.
  • Check CI passes for this PR.
  • Await review and address feedback.
  • Merge, wait for canary deploy.
  • Verify functionality on canary, then promote.
  • Run ./scripts/migrate-groups-layout --wet-run --delete-after-copy to pick up any intervening changes and do the final move.
  • Remove the now-unnecessary ./scripts/migrate-groups-layout file in a new commit.

@nextstrain-bot nextstrain-bot temporarily deployed to nextstrain-s-trs-groups-n5ma72 June 21, 2022 23:53 Inactive
@tsibley tsibley temporarily deployed to nextstrain-s-trs-groups-n5ma72 June 22, 2022 17:54 Inactive
@tsibley tsibley requested a review from a team June 22, 2022 17:54
@tsibley
Copy link
Member Author

tsibley commented Jun 22, 2022

Ah, CI is failing because it's looking for test fixtures in a public group (blab) that aren't found because I haven't done the initial copy to the new layout (see deploy steps above). It should be safe to do that ahead of time (by design!), so I'll do that and CI should pass.

@tsibley
Copy link
Member Author

tsibley commented Jun 22, 2022

Ah, actually, if I do that nothing will "break", but every migrated group's datasets and narratives will appear duplicated in the listings. I'll address that first with a separate PR to ignore those, in preparation for this PR.

@tsibley tsibley temporarily deployed to nextstrain-s-trs-groups-n5ma72 June 22, 2022 18:59 Inactive
@tsibley tsibley marked this pull request as draft June 22, 2022 19:02
…/scripts module

Two generic functions that I wrote with reusability in mind, and now I
have a need to re-use them.
The bucket now stores datasets and narratives separately under a fixed
prefix within each group's prefix so the user-provided names/paths of
the datasets and narratives can't conflict with control objects we store
at the root of the group's prefix.

Besides making security easier to reason about here with the
user-provided names, this will also, in time, allow us to eliminate a
few special-cased exclusions (like excluding group-overview.md from a
list of *.md narratives).

As part of deploying this, we'll need to perform a one-time manual
layout switch behind the scenes using scripts/migrate-groups-layout.
Though migrate-groups-layout is intended to be run only twice:

  1. Immediately before deploy of the layout change.
  2. Immediately after deploy of the layout change, with --delete-after-copy.

…it is designed to be safe to run repeatedly.  However, as it is a
single-use program, it's not designed to live past those two primary
invocations and we should delete it after it's served its purpose.
I'd been doing this manually while reviewing the previous sync logs, as
before the layout change the sync just migrated everything even though
some objects would be no longer accessible.  Nicer to have it built-in
as part of the rest of the manual steps checklist.
@tsibley tsibley temporarily deployed to nextstrain-s-trs-groups-n5ma72 June 22, 2022 23:15 Inactive
@tsibley tsibley marked this pull request as ready for review June 22, 2022 23:26
Copy link
Contributor

@huddlej huddlej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't speak to the technical details of the changes here, but I love the idea of splitting out datasets from narratives and providing a first-class support for these types of data in the code. Even though I don't know this code base well, I can tell that these changes make the code more readable in that we can look for a Narrative class instead of scanning for *.md to figure out if an action is applied to a narrative or not.

@tsibley tsibley merged commit 9ce32e5 into master Jun 24, 2022
@tsibley tsibley deleted the trs/groups-storage-layout branch June 24, 2022 19:05
tsibley added a commit that referenced this pull request Jun 24, 2022
The migration is complete and this program is no longer needed.  It
worked great during the migration period, though, so good job to it!

Related to <#562>.
@tsibley
Copy link
Member Author

tsibley commented Jun 24, 2022

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

3 participants