-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update to AnVIL schema v4 (#4617) #4741
Update to AnVIL schema v4 (#4617) #4741
Conversation
607c8fe
to
e951e22
Compare
Codecov Report
@@ Coverage Diff @@
## develop #4741 +/- ##
========================================
Coverage 83.74% 83.74%
========================================
Files 140 140
Lines 17069 17069
========================================
Hits 14295 14295
Misses 2774 2774 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
e951e22
to
f7a50b3
Compare
aae2da1
to
275cf4b
Compare
d7b4f27
to
6e6b140
Compare
275cf4b
to
18a0979
Compare
b4fdf1b
to
68777aa
Compare
18a0979
to
2b1fa2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👌🏽
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused about the files names and provenance of the cans.
This comment sounds wrong to me:
azul/scripts/recan_bundle_tdr.py
Line 358 in 64ae91b
Load a canned bundle from DCP/1 and write *.manifest.tdr and *.metadata.tdr |
The deletion of .manifest.json
and .metadata.json
should be an early, separate commit if those files are truly redundant.
The .result
and .results
in the file names are confusing, very similar but different semantics. I am also confused as to what purpose the .tables
file has.
And there's a smell with
azul/scripts/recan_bundle_tdr.py
Line 77 in 64ae91b
def file_paths(parent_dir: str, |
I think the code to create it is longer than a literal of the result, even though there is no advantage to making it dynamic since the references are very static.
Please request a PL slot for this.
scripts/can_bundle.py
Outdated
@@ -59,8 +67,12 @@ def main(argv): | |||
parser.add_argument('--output-dir', '-O', | |||
default=os.path.join(config.project_root, 'test', 'indexer', 'data'), | |||
help='The path to the output directory (default: %(default)s).') | |||
parser.add_argument('--redaction-key', '-K', | |||
help='Provide a key to redact personally senstive information from the output files') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
help='Provide a key to redact personally senstive information from the output files') | |
help='Provide a key to redact confidential or sensitive information from the output files') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scripts/can_bundle.py
Outdated
o = struct.unpack('>Q', hashlib.sha1(key + str(o).encode()).digest()[:8])[0] | ||
o = (o & 0xFFFFFFFFFFFF) + 42000000000000000 | ||
elif isinstance(o, list): | ||
o[:] = sorted(redact_json(item, key) for item in o) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment should explain the need for sorting.
scripts/can_bundle.py
Outdated
redact_json(entity, key) | ||
|
||
|
||
def redact_json(o: AnyMutableJSON, key: bytes) -> AnyMutableJSON: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either return or mutate in place, never both. If a mix is needed for recursion, a wrapper should hide that. I personally find pure functions easier to write and reason about. I only resort non-pure for performance reasons, or when that is easier to understand, for example, when a surgical change is made. This is more of a transformation so I would write a pure function.
FYI @achave11
scripts/can_bundle.py
Outdated
@@ -59,8 +67,12 @@ def main(argv): | |||
parser.add_argument('--output-dir', '-O', | |||
default=os.path.join(config.project_root, 'test', 'indexer', 'data'), | |||
help='The path to the output directory (default: %(default)s).') | |||
parser.add_argument('--redaction-key', '-K', | |||
help='Provide a key to redact confidential or senstive information from the output files') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
help='Provide a key to redact confidential or senstive information from the output files') | |
help='Provide a key to redact confidential or sensitive information from the output files') |
88db91b
to
5485050
Compare
88db91b
to
2196473
Compare
Connected issues: #4617
Checklist
Author
develop
issues/<GitHub handle of author>/<issue#>-<slug>
partial
label to PR or this PR completely resolves all connected issues1 when the issue title describes a problem, the corresponding PR title is
Fix:
followed by the issue titleAuthor (reindex)
r
tag to commit title or this PR does not require reindexingreindex
label to PR or this PR does not require reindexingAuthor (chains)
base
label to the blocking PR or this PR is not chained to another PRchained
label to this PR or this PR is not chained to another PRAuthor (upgrading)
u
tag to commit title or this PR does not require upgradingupgrade
label to PR or this PR does not require upgradingAuthor (operator tasks)
Author (hotfixes)
F
tag to main commit title or this PR does not include permanent fix for a temporary hotfixprod
branch has no temporary hotfixes for any connected issuesAuthor (requirements, before every review)
make requirements_update
or this PR does not touch requirements*.txt, common.mk, Makefile and DockerfileR
tag to commit title or this PR does not touch requirements*.txtreqs
label to PR or this PR does not touch requirements*.txtAuthor (rebasing, integration test)
make integration_test
passes in personal deployment or this PR does not touch functionality that could break the ITdevelop
, squashed old fixupsPeer reviewer (after requesting changes)
Uncheck the Author (requirements) and Author (rebasing, integration test)
checklists.
Peer reviewer (after approval)
Primary reviewer (after requesting changes)
Uncheck the Author (requirements) and Author (rebasing, integration test)
checklists. Update the
N reviews
label.Primary reviewer (after approval)
demo
orno demo
no demo
no sandbox
N reviews
label is accurateOperator (before pushing merge the commit)
reindex
label andr
commit title tagno demo
dev
and addedsandbox
label or PR is labeledno sandbox
sandbox
deployment or PR is labeledno sandbox
sandbox
deployment or PR is labeledno sandbox
sandbox
or this PR does not remove catalogs or otherwise causes unreferenced indicessandbox
or this PR does not require reindexingsandbox
sandbox
or this PR does not require reindexingsandbox
anvildev
or PR is labeledno sandbox
anvilbox
deployment or PR is labeledno sandbox
anvilbox
deployment or PR is labeledno sandbox
anvilbox
or this PR does not remove catalogs or otherwise causes unreferenced indicesanvilbox
or this PR does not require reindexingsandbox
anvilbox
or this PR does not require reindexingsandbox
Operator (after pushing the merge commit)
base
dev
or PR is labeledno sandbox
anvildev
or PR is labeledno sandbox
dev
1dev
1anvildev
1anvildev
1dev
anvildev
1 When pushing the merge commit is skipped due to the PR being
labelled
no sandbox
, the next build triggered by a PR whose merge commit ispushed determines this checklist item.
Operator (reindex)
dev
or this PR does not remove catalogs or otherwise causes unreferenced indicesanvildev
or this PR does not remove catalogs or otherwise causes unreferenced indicesdev
or this PR does not require reindexinganvildev
or this PR does not require reindexingdev
or this PR does not require reindexinganvildev
or this PR does not require reindexingdev
deployment or this PR does not require reindexinganvildev
deployment or this PR does not require reindexingOperator
Shorthand for review comments
L
line is too longW
line wrapping is wrongQ
bad quotesF
other formatting problem