completing descriptions and validation for genomic alteration annotations #28
Hi @aim11,
Thanks for updating the validations with the descriptions and expected values! In some cases I still got some validation errors with the new data, so I put them in the comments for you to double-check.
Another thing we should maybe add in the validation files is a `strict: true` clause just under `validate:`, so that an error is raised in case some columns are not accounted for by the validation scripts, but present in the data.
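To make the intent of the strict check concrete, here is a minimal sketch in plain Python of what such a check amounts to. It is not the validation tool's actual implementation; the function names and the `extra_col` column are hypothetical and only illustrate the idea of flagging columns that exist in the data but are not declared in the schema.

```python
def unexpected_columns(data_columns, schema_columns):
    """Return data columns that the schema does not declare, in original order."""
    declared = set(schema_columns)
    return [col for col in data_columns if col not in declared]


def check_strict(data_columns, schema_columns):
    """Raise if the data carries columns the schema does not account for."""
    extra = unexpected_columns(data_columns, schema_columns)
    if extra:
        raise ValueError(f"columns not covered by validation: {extra}")


# Hypothetical inputs, for illustration only.
schema_columns = ["hom_lo", "hom_hi", "nMajor", "nMinor"]
data_columns = ["hom_lo", "hom_hi", "nMajor", "nMinor", "extra_col"]

print(unexpected_columns(data_columns, schema_columns))  # ['extra_col']
```

Without a strict mode, a column like `extra_col` would pass through silently because no rule ever looks at it.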
nullable: true
checks:
str_matches: ^[0-9]{7,}(,[0-9]{7,})*$
str_matches: ^[0-9]{7,}(;[0-9]{7,})*$
On my side I'm getting an error here because of the semicolon. The version with the comma (`^[0-9]{7,}(,[0-9]{7,})*$`) is valid.
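The difference between the two patterns can be checked directly with Python's `re` module. The sample values below are made up (the real column contents are not shown in this thread); the sketch only shows that a comma-separated list of IDs satisfies the comma pattern and fails the semicolon one.

```python
import re

# The two patterns from the diff hunk above.
COMMA_PATTERN = re.compile(r"^[0-9]{7,}(,[0-9]{7,})*$")
SEMICOLON_PATTERN = re.compile(r"^[0-9]{7,}(;[0-9]{7,})*$")

# Made-up sample values, for illustration only.
samples = ["12345678", "12345678,23456789", "12345678;23456789"]

for value in samples:
    comma_ok = bool(COMMA_PATTERN.match(value))
    semicolon_ok = bool(SEMICOLON_PATTERN.match(value))
    print(f"{value!r}: comma={comma_ok} semicolon={semicolon_ok}")
```

A single ID matches both patterns, so the error would only show up on rows that actually hold more than one ID.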
- Likely Oncogenic
- Oncogenic
- Unknown
- known # Not present in source documentation description.
Here `known` still comes out as a value present in the data.
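A quick way to see which values fall outside an allowed list, and how often, is to count them. This is a sketch under assumptions: the allowed set below contains only the documented entries visible in the hunk (the full list may be longer), and the sample column is invented.

```python
from collections import Counter


def unexpected_values(values, allowed):
    """Count non-null values that are not in the allowed set."""
    return Counter(v for v in values if v is not None and v not in allowed)


# Only the documented entries visible in the hunk; the real list may be longer.
allowed = {"Likely Oncogenic", "Oncogenic", "Unknown"}

# Invented sample column, for illustration only.
column = ["Oncogenic", "known", "Unknown", None, "known"]

print(unexpected_values(column, allowed))  # Counter({'known': 2})
```

Counting rather than just listing helps decide whether a stray value such as `known` is a one-off typo or a category that needs to stay in the schema.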
dtype: str
description: "Describes the therapeutic implication that applies to the indication in cancer"
nullable: true
referenceGenome:
I get an error here stating that `referenceGenome` is not present in the data frame.
# nullable: true # FIXME: Encountered point '.' values in data.
description: "ClinVar variant status"
nullable: true
clinvar_assoc:
I also get an error here stating that `clinvar_assoc` is not present in the data frame.
# nullable: true # FIXME: Encountered point '.' values in data.
description: "ClinVar databse ID"
nullable: true
clinvar_sig:
Same here: `clinvar_sig` is not present in the data frame.
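The three "not present in the data frame" errors can be reproduced in one pass by comparing the schema's column names against the data's header. This is a plain-Python sketch; the `data_columns` header is hypothetical, and only the three schema names come from the comments above.

```python
def missing_columns(data_columns, schema_columns):
    """Return schema columns that are absent from the data, in schema order."""
    present = set(data_columns)
    return [col for col in schema_columns if col not in present]


# The three columns reported above, plus one that is assumed to exist.
schema_columns = ["referenceGenome", "clinvar_assoc", "clinvar_sig", "hom_lo"]

# Hypothetical data header, for illustration only.
data_columns = ["hom_lo", "hom_hi"]

print(missing_columns(data_columns, schema_columns))
# ['referenceGenome', 'clinvar_assoc', 'clinvar_sig']
```

If a column is genuinely optional in the source data, the schema would need some way to mark it as such, otherwise the entry has to be dropped or the column name corrected.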
dtype: int64 # FIXME defined as 'float' in documentation.
dtype: float64
description: "Lower bound for expected homogeneity confidence interval"
hom_hi:
For `hom_hi`, I still get that the type encountered is `int64`.
value:
- HET
- LOH
hom_lo:
Same as with `hom_hi`: for `hom_lo` I still get that the type encountered is `int64`.
hom_pbinom_lo:
dtype: float64
description: "Binomial distribution probability for expected homogeneity"
homogenous:
Here I still get that the type encountered is `object` for `homogenous` and not `boolean`.
nMinor:
dtype: int64
description: "Minor allele copynumber (from CNAs if available)"
nMajor:
I still get that there are some null values here for `nMajor`, and also that the values encountered are `float64`.
dtype: float64
description: "Protein change by RefGene"
nullable: true
nMinor:
For `nMinor` I also get that there are null values, and that the type is `float64`.
#27