Skip to content

Conversation

aim11
Copy link
Collaborator

@aim11 aim11 commented May 16, 2025

#27

@jdreo jdreo requested a review from njmmatthieu May 16, 2025 09:13
@mbaric758 mbaric758 self-requested a review May 16, 2025 10:15
Copy link
Collaborator

@mbaric758 mbaric758 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aim11,

Thanks for updating the validations with the descriptions and expected values! In some cases I still got some validation errors with the new data, so I put them in the comments for you to double-check.

Another thing we should maybe add in the validation files is a strict: true clause just under validate:, so that an error is raised in case some columns are not accounted for by the validation scripts, but present in the data.

nullable: true
checks:
str_matches: ^[0-9]{7,}(,[0-9]{7,})*$
str_matches: ^[0-9]{7,}(;[0-9]{7,})*$
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my side I'm getting an error here because of the semicolon? But the version with the colon ( ^[0-9]{7,}(,[0-9]{7,})*$ ) is valid.

- Likely Oncogenic
- Oncogenic
- Unknown
- known # Not present in source documentation description.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here known still comes out as a value present in the data.

dtype: str
description: "Describes the therapeutic implication that applies to the indication in cancer"
nullable: true
referenceGenome:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get an error here stating that referenceGenome is not present in the data frame.

# nullable: true # FIXME: Encountered point '.' values in data.
description: "ClinVar variant status"
nullable: true
clinvar_assoc:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also get an error here stating clinvar_assoc is not present in the data frame.

# nullable: true # FIXME: Encountered point '.' values in data.
description: "ClinVar databse ID"
nullable: true
clinvar_sig:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: clinvar_sig not present in data frame.

dtype: int64 # FIXME defined as 'float' in documentation.
dtype: float64
description: "Lower bound for expected homogeneity confidence interval"
hom_hi:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For hom_hi, I still get the type encountered is int64.

value:
- HET
- LOH
hom_lo:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as with hom_hi, for hom_lo I still get the encountered type is int64.

hom_pbinom_lo:
dtype: float64
description: "Binomial distribution probability for expected homogeneity"
homogenous:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I still get that the type encountered is object for homogenous and not boolean.

nMinor:
dtype: int64
description: "Minor allele copynumber (from CNAs if available)"
nMajor:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still get that there are some nullable values here for nMajor, and also that the values encountered are float64.

dtype: float64
description: "Protein change by RefGene"
nullable: true
nMinor:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For nMinor I also get that there are nullable values, and the type is a float64.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants