Update to AnVIL schema v4 (#4617) #4741

Merged

Conversation

@nadove-ucsc nadove-ucsc commented Nov 15, 2022

Connected issues: #4617

Checklist

Author

  • Target branch is develop
  • Name of PR branch matches issues/<GitHub handle of author>/<issue#>-<slug>
  • PR title references all connected issues
  • PR title matches¹ that of a connected issue or comment in PR explains why they're different
  • For each connected issue, there is at least one commit whose title references that issue
  • PR is connected to all connected issues via Zenhub
  • PR description links to connected issues
  • Added partial label to PR or this PR completely resolves all connected issues

¹ When the issue title describes a problem, the corresponding PR title is Fix: followed by the issue title
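
The branch naming rule in the checklist above can be checked mechanically. The following is only a sketch: the function name `parse_pr_branch` and the character sets allowed for the handle and the slug are assumptions, not something the checklist specifies.

```python
import re
from typing import Optional

# Assumed pattern for issues/<GitHub handle of author>/<issue#>-<slug>.
# The allowed characters for handle and slug are guesses for illustration.
BRANCH_RE = re.compile(
    r'^issues/(?P<handle>[A-Za-z0-9-]+)/(?P<issue>\d+)-(?P<slug>[A-Za-z0-9-]+)$'
)


def parse_pr_branch(name: str) -> Optional[dict]:
    """Return the handle, issue number and slug, or None if the name does not match."""
    match = BRANCH_RE.match(name)
    if match is None:
        return None
    return {
        'handle': match['handle'],
        'issue': int(match['issue']),
        'slug': match['slug'],
    }


# The branch of this PR satisfies the rule; the target branch does not.
pr_branch = parse_pr_branch('issues/noah-aviel-dove/4617-update-anvil-schema-v4')
target_branch = parse_pr_branch('develop')
```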

Author (reindex)

  • Added r tag to commit title or this PR does not require reindexing
  • Added reindex label to PR or this PR does not require reindexing

Author (chains)

  • This PR is blocked by previous PR in the chain or this PR is not chained to another PR
  • Added base label to the blocking PR or this PR is not chained to another PR
  • Added chained label to this PR or this PR is not chained to another PR

Author (upgrading)

  • Documented upgrading of deployments in UPGRADING.rst or this PR does not require upgrading
  • Added u tag to commit title or this PR does not require upgrading
  • Added upgrade label to PR or this PR does not require upgrading

Author (operator tasks)

  • Added checklist items for additional operator tasks or this PR does not require additional tasks

Author (hotfixes)

  • Added F tag to main commit title or this PR does not include permanent fix for a temporary hotfix
  • Reverted the temporary hotfixes for any connected issues or the prod branch has no temporary hotfixes for any connected issues

Author (requirements, before every review)

  • Ran make requirements_update or this PR does not touch requirements*.txt, common.mk, Makefile and Dockerfile
  • Added R tag to commit title or this PR does not touch requirements*.txt
  • Added reqs label to PR or this PR does not touch requirements*.txt

Author (rebasing, integration test)

  • make integration_test passes in personal deployment or this PR does not touch functionality that could break the IT
  • Rebased PR branch on develop, squashed old fixups

Peer reviewer (after requesting changes)

Uncheck the Author (requirements) and Author (rebasing, integration test)
checklists.

Peer reviewer (after approval)

  • Ticket is in Review requested column
  • Requested review from primary reviewer
  • Assigned PR to primary reviewer

Primary reviewer (after requesting changes)

Uncheck the Author (requirements) and Author (rebasing, integration test)
checklists. Update the N reviews label.

Primary reviewer (after approval)

  • Actually approved the PR
  • Labeled connected issues as demo or no demo
  • Commented on connected issues about demo expectations or all connected issues are labeled no demo
  • Decided if PR can be labeled no sandbox
  • PR title is appropriate as title of merge commit
  • N reviews label is accurate
  • Moved ticket to Approved column
  • Assigned PR to current operator

Operator (before pushing the merge commit)

  • Checked reindex label and r commit title tag
  • Checked that demo expectations are clear or all connected issues are labeled no demo
  • Rebased and squashed PR branch
  • Sanity-checked history
  • Pushed PR branch to GitHub
  • Pushed PR branch to GitLab dev and added sandbox label or PR is labeled no sandbox
  • Build passes in sandbox deployment or PR is labeled no sandbox
  • Reviewed build log for anomalies in sandbox deployment or PR is labeled no sandbox
  • Deleted unreferenced indices in sandbox or this PR does not remove catalogs or otherwise cause unreferenced indices
  • Started reindex in sandbox or this PR does not require reindexing sandbox
  • Checked for failures in sandbox or this PR does not require reindexing sandbox
  • Pushed PR branch to GitLab anvildev or PR is labeled no sandbox
  • Build passes in anvilbox deployment or PR is labeled no sandbox
  • Reviewed build log for anomalies in anvilbox deployment or PR is labeled no sandbox
  • Deleted unreferenced indices in anvilbox or this PR does not remove catalogs or otherwise cause unreferenced indices
  • Started reindex in anvilbox or this PR does not require reindexing sandbox
  • Checked for failures in anvilbox or this PR does not require reindexing sandbox
  • Added PR reference to merge commit title
  • Collected commit title tags in merge commit title
  • Moved connected issues to Merged column
  • Pushed merge commit to GitHub

Operator (after pushing the merge commit)

  • Shortened the PR chain or this PR is not labeled base
  • Pushed merge commit to GitLab dev or PR is labeled no sandbox
  • Pushed merge commit to GitLab anvildev or PR is labeled no sandbox
  • Build passes on GitLab dev¹
  • Reviewed build log for anomalies on GitLab dev¹
  • Build passes on GitLab anvildev¹
  • Reviewed build log for anomalies on GitLab anvildev¹
  • Deleted PR branch from GitHub
  • Deleted PR branch from GitLab dev
  • Deleted PR branch from GitLab anvildev

¹ When pushing the merge commit is skipped due to the PR being
labelled no sandbox, the next build triggered by a PR whose merge commit is
pushed determines this checklist item.

Operator (reindex)

  • Deleted unreferenced indices in dev or this PR does not remove catalogs or otherwise cause unreferenced indices
  • Deleted unreferenced indices in anvildev or this PR does not remove catalogs or otherwise cause unreferenced indices
  • Started reindex in dev or this PR does not require reindexing
  • Started reindex in anvildev or this PR does not require reindexing
  • Checked for and triaged indexing failures in dev or this PR does not require reindexing
  • Checked for and triaged indexing failures in anvildev or this PR does not require reindexing
  • Emptied fail queues in dev deployment or this PR does not require reindexing
  • Emptied fail queues in anvildev deployment or this PR does not require reindexing

Operator

  • Unassigned PR

Shorthand for review comments

  • L line is too long
  • W line wrapping is wrong
  • Q bad quotes
  • F other formatting problem

@nadove-ucsc nadove-ucsc added the reindex:dev [process] PR requires reindexing dev label Nov 15, 2022
@nadove-ucsc nadove-ucsc force-pushed the issues/noah-aviel-dove/4617-update-anvil-schema-v4 branch from 607c8fe to e951e22 Compare November 15, 2022 04:15
@github-actions github-actions bot added the orange [process] Done by the Azul team label Nov 15, 2022
coveralls commented Nov 15, 2022

Coverage Status

Coverage decreased (-0.02%) to 84.21% when pulling 2196473 on issues/noah-aviel-dove/4617-update-anvil-schema-v4 into c0328ec on develop.

codecov bot commented Nov 15, 2022

Codecov Report

Merging #4741 (c0328ec) into develop (c0328ec) will not change coverage.
The diff coverage is n/a.

❗ Current head c0328ec differs from pull request most recent head 2196473. Consider uploading reports for the commit 2196473 to get more accurate results

@@           Coverage Diff            @@
##           develop    #4741   +/-   ##
========================================
  Coverage    83.74%   83.74%           
========================================
  Files          140      140           
  Lines        17069    17069           
========================================
  Hits         14295    14295           
  Misses        2774     2774           

@hannes-ucsc hannes-ucsc force-pushed the issues/noah-aviel-dove/4617-update-anvil-schema-v4 branch from e951e22 to f7a50b3 Compare November 17, 2022 01:56
@nadove-ucsc nadove-ucsc force-pushed the issues/noah-aviel-dove/4617-update-anvil-schema-v4 branch 6 times, most recently from aae2da1 to 275cf4b Compare November 23, 2022 22:28
@nadove-ucsc nadove-ucsc changed the base branch from develop to issues/noah-aviel-dove/4761-anvil-broken-file-links November 23, 2022 22:28
@nadove-ucsc nadove-ucsc added the chained [process] PR needs to based of develop before merging label Nov 23, 2022
@nadove-ucsc nadove-ucsc force-pushed the issues/noah-aviel-dove/4761-anvil-broken-file-links branch from d7b4f27 to 6e6b140 Compare November 29, 2022 00:47
@nadove-ucsc nadove-ucsc force-pushed the issues/noah-aviel-dove/4617-update-anvil-schema-v4 branch from 275cf4b to 18a0979 Compare November 29, 2022 23:45
@dsotirho-ucsc dsotirho-ucsc force-pushed the issues/noah-aviel-dove/4761-anvil-broken-file-links branch from b4fdf1b to 68777aa Compare November 30, 2022 23:53
@nadove-ucsc nadove-ucsc force-pushed the issues/noah-aviel-dove/4617-update-anvil-schema-v4 branch from 18a0979 to 2b1fa2f Compare December 1, 2022 00:01
@dsotirho-ucsc dsotirho-ucsc changed the base branch from issues/noah-aviel-dove/4761-anvil-broken-file-links to develop December 1, 2022 01:11
@dsotirho-ucsc dsotirho-ucsc removed the chained [process] PR needs to based of develop before merging label Dec 1, 2022
@achave11-ucsc achave11-ucsc left a comment

LGTM 👌🏽

@hannes-ucsc hannes-ucsc left a comment

I am confused about the file names and provenance of the cans.

This comment sounds wrong to me:

Load a canned bundle from DCP/1 and write *.manifest.tdr and *.metadata.tdr

The deletion of .manifest.json and .metadata.json should be an early, separate commit if those files are truly redundant.

The .result and .results in the file names are confusing, very similar but different semantics. I am also confused as to what purpose the .tables file has.

And there's a smell with

def file_paths(parent_dir: str,

I think the code to create it is longer than a literal of the result, even though there is no advantage to making it dynamic since the references are very static.

Please request a PL slot for this.

@@ -59,8 +67,12 @@ def main(argv):
    parser.add_argument('--output-dir', '-O',
                        default=os.path.join(config.project_root, 'test', 'indexer', 'data'),
                        help='The path to the output directory (default: %(default)s).')
    parser.add_argument('--redaction-key', '-K',
                        help='Provide a key to redact personally senstive information from the output files')

Suggested change
- help='Provide a key to redact personally senstive information from the output files')
+ help='Provide a key to redact confidential or sensitive information from the output files')

scripts/can_bundle.py
        o = struct.unpack('>Q', hashlib.sha1(key + str(o).encode()).digest()[:8])[0]
        o = (o & 0xFFFFFFFFFFFF) + 42000000000000000
    elif isinstance(o, list):
        o[:] = sorted(redact_json(item, key) for item in o)

A comment should explain the need for sorting.

redact_json(entity, key)


def redact_json(o: AnyMutableJSON, key: bytes) -> AnyMutableJSON:

Either return or mutate in place, never both. If a mix is needed for recursion, a wrapper should hide that. I personally find pure functions easier to write and reason about. I only resort to non-pure functions for performance reasons, or when that is easier to understand, for example, when a surgical change is made. This is more of a transformation, so I would write a pure function.

FYI @achave11
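
For illustration, a pure variant of the function under review might look like the sketch below. It reuses the integer hashing visible in the diff hunk above, but everything else is an assumption: which value types the real script redacts is not shown in this thread, so strings, floats, booleans and None are passed through unchanged here, and the comment on sorting gives only one plausible rationale, since the thread does not state the actual one.

```python
import hashlib
import struct
from typing import Any


def redact_json(o: Any, key: bytes) -> Any:
    """
    Return a redacted copy of `o`. The argument is never modified, so
    callers only ever need to look at the return value.
    """
    if isinstance(o, dict):
        return {k: redact_json(v, key) for k, v in o.items()}
    elif isinstance(o, list):
        # Assumed rationale (not stated in the thread): sorting the redacted
        # items keeps the output from revealing the relative order of the
        # original values. This requires the items to be mutually comparable.
        return sorted(redact_json(item, key) for item in o)
    elif isinstance(o, bool):
        # bool is a subclass of int, so it must be handled before int
        return o
    elif isinstance(o, int):
        # Same keyed hash as in the hunk above: first 8 bytes of the SHA-1
        # digest, truncated to 48 bits and shifted by a fixed offset.
        n = struct.unpack('>Q', hashlib.sha1(key + str(o).encode()).digest()[:8])[0]
        return (n & 0xFFFFFFFFFFFF) + 42000000000000000
    else:
        # Left unchanged in this sketch; the real script may redact more types
        return o


# Hypothetical input; the field names are made up for this example
entity = {'donor_id': 12345, 'sample_ids': [3, 1, 2], 'label': 'x'}
redacted = redact_json(entity, b'some key')
```

Since nothing is mutated, `entity` is still intact after the call, and the caller in the hunk above would assign the result instead of relying on a side effect.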

@hannes-ucsc hannes-ucsc added the 1 review [process] Lead requested changes once label Dec 5, 2022
@hannes-ucsc hannes-ucsc removed their assignment Dec 5, 2022
@@ -59,8 +67,12 @@ def main(argv):
    parser.add_argument('--output-dir', '-O',
                        default=os.path.join(config.project_root, 'test', 'indexer', 'data'),
                        help='The path to the output directory (default: %(default)s).')
    parser.add_argument('--redaction-key', '-K',
                        help='Provide a key to redact confidential or senstive information from the output files')

Suggested change
- help='Provide a key to redact confidential or senstive information from the output files')
+ help='Provide a key to redact confidential or sensitive information from the output files')

@hannes-ucsc hannes-ucsc added 3 reviews [process] Lead requested changes thrice and removed 2 reviews [process] Lead requested changes twice labels Dec 6, 2022
@hannes-ucsc hannes-ucsc removed their assignment Dec 6, 2022
@nadove-ucsc nadove-ucsc force-pushed the issues/noah-aviel-dove/4617-update-anvil-schema-v4 branch 2 times, most recently from 88db91b to 5485050 Compare December 6, 2022 20:25
@achave11-ucsc achave11-ucsc force-pushed the issues/noah-aviel-dove/4617-update-anvil-schema-v4 branch from 88db91b to 2196473 Compare December 7, 2022 18:39
@achave11-ucsc achave11-ucsc added the sandbox [process] Resolution is being verified in sandbox deployment label Dec 7, 2022
@achave11-ucsc achave11-ucsc merged commit 05c743d into develop Dec 7, 2022
@achave11-ucsc achave11-ucsc deleted the issues/noah-aviel-dove/4617-update-anvil-schema-v4 branch December 8, 2022 00:41
@achave11-ucsc achave11-ucsc removed their assignment Dec 8, 2022
Labels

  • 3 reviews [process] Lead requested changes thrice
  • orange [process] Done by the Azul team
  • reindex:dev [process] PR requires reindexing dev
  • sandbox [process] Resolution is being verified in sandbox deployment

5 participants