ADCID validation #29

Merged
merged 6 commits into from
Dec 10, 2024
3 changes: 2 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -4,6 +4,7 @@ Documentation of release versions of `nacc-form-validator`

## 0.4.0

* Adds `_check_adcid` method to validate a provided ADCID against the current list of ADCIDs. (Actual validation should be implemented by overriding the `is_valid_adcid` method in the Datastore class)
* Adds `get_previous_record` method to retrieve a record from the Datastore: either the immediately previous record or the most recent previous record in which a specific field is non-empty
* Adds support for comparing against the previous record in `compare_with`
* Adds new rule `compare_age` to handle rules that need to compare ages relative to a date
@@ -19,7 +20,7 @@ Documentation of release versions of `nacc-form-validator`

## 0.3.0

* Adds `_check_with_rxnorm` function to check whether a given Drug ID is valid RXCUI code
* Adds `_check_with_rxnorm` function to check whether a given Drug ID is valid RXCUI code. (Actual validation should be implemented by overriding the `is_valid_rxcui` method in Datastore class)
* Updates `_validate_compare_with` to allow adjustments to be another field, and for base values to be hardcoded values
* Updates json_logic `less` function to handle None
* Updates `_validate_temporalrules` to iterate on multiple fields for `previous` and `current` clauses, remove `orderby` attribute
74 changes: 73 additions & 1 deletion docs/data-quality-rule-definition-guidelines.md
@@ -12,6 +12,7 @@
- [compatibility](#compatibility)
- [logic](#logic)
- [temporalrules](#temporalrules)
- [check\_adcid](#check_adcid)
- [compute\_gds](#compute_gds)
- [rxnorm](#rxnorm)

@@ -916,6 +917,77 @@ If field `taxes` (difficulty with taxes, business, and other papers) is 0 (normal
</tr>
</table>

### check_adcid

Used to check whether a specified ADCID is valid.

This validation is implemented using the `function` rule with custom `check_adcid` function in the NACCValidator. The rule definition should be in the following format:

```json
{
"<adcid_variable>": {
"function": {
"name": "check_adcid",
"args": {"own": "<bool, whether to validate against own ADCID or list of current ADCIDs; defaults to True>"}
}
}
}
```

> **NOTE**: To validate `check_adcid`, the validator must have a `Datastore` instance that implements the `is_valid_adcid` function (which should have access to the center's ADCID and the list of current ADCIDs).
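
As a rough illustration of what the `is_valid_adcid` contract asks for, here is a minimal sketch. `InMemoryADCIDStore` is a hypothetical, standalone stand-in and not part of `nacc-form-validator`. A real implementation would subclass `Datastore` and also implement its other abstract methods. The constructor arguments `own_adcid` and `current_adcids` are assumptions made for this example only.

```python
from typing import Iterable


class InMemoryADCIDStore:
    """Hypothetical stand-in for a Datastore subclass (illustrative names)."""

    def __init__(self, own_adcid: int, current_adcids: Iterable[int]):
        self.own_adcid = own_adcid
        self.current_adcids = frozenset(current_adcids)

    def is_valid_adcid(self, adcid: int, own: bool) -> bool:
        # own=True: the value must equal this center's ADCID.
        if own:
            return adcid == self.own_adcid
        # own=False: the value must appear in the list of current ADCIDs.
        return adcid in self.current_adcids


# Assume the center's own ADCID is 0 and ADCIDs 0-5 inclusive are current.
store = InMemoryADCIDStore(own_adcid=0, current_adcids=range(6))

records = [
    {"adcid": 0, "oldadcid": 5},
    {"adcid": 2, "oldadcid": 5},
    {"adcid": 0, "oldadcid": 9},
]

# A record passes only if both fields satisfy their respective checks.
results = [
    store.is_valid_adcid(r["adcid"], own=True)
    and store.is_valid_adcid(r["oldadcid"], own=False)
    for r in records
]
print(results)
```

How the center's ADCID and the current list are actually obtained (database, API call, configuration file) is left to the overriding class.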

**Example:**

The `adcid` must match the center's own ADCID, whereas `oldadcid` should be a valid ADCID in the current ADCIDs list.

<table>
<tr>
<th>YAML Rule Definition</th>
<th>JSON Rule Definition</th>
<th>When Validating</th>
</tr>
<tr>
<td style="vertical-align:top;">
<pre><code>adcid:
type: integer
function:
name: check_adcid
oldadcid:
type: integer
function:
name: check_adcid
args:
own: false
</code></pre>
</td>
<td style="vertical-align:top;">
<pre><code>{
"adcid": {
"type": "integer",
"function": {
"name": "check_adcid"
}
},
"oldadcid": {
"type": "integer",
"function": {
"name": "check_adcid",
"args": {"own": false}
}
}
}
</code></pre>
</td>
<td style="vertical-align:top;">
<pre><code># assume the center's own ADCID is 0, and ADCIDs 0-5 inclusive are valid

{"adcid": 0, "oldadcid": 5} # passes
{"adcid": 2, "oldadcid": 5} # fails
{"adcid": 0, "oldadcid": 9} # fails
</code></pre>
</td>
</tr>
</table>

### compute_gds

Custom rule defined to validate the Geriatric Depression Scale (GDS) score computation. It should only be used for validating the `gds` field in UDS Form B6.
@@ -934,7 +1006,7 @@ The rule definition for `compute_gds` should follow the following format:

Custom rule defined to check whether a given Drug ID is valid RXCUI code.

This function uses the `check_with` rule from Cerberus. Rule definition should be in the following format:
This function uses the `check_with` rule from Cerberus. The rule definition should be in the following format:

```json
{
12 changes: 8 additions & 4 deletions docs/validate_csv_records.py
@@ -10,7 +10,8 @@
import logging

from pathlib import Path
from nacc_form_validator import QualityCheck

from nacc_form_validator.quality_check import QualityCheck

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger(__name__)
@@ -38,9 +39,11 @@
log.info(f"strict mode::\t{not args.disable_strict}")

if not args.rules_json.is_file():
raise FileNotFoundError(f"Cannot find specified rules JSON: {args.rules_json}")
raise FileNotFoundError(
f"Cannot find specified rules JSON: {args.rules_json}")
if not args.input_records_csv.is_file():
raise FileNotFoundError(f"Cannot find specified input records CSV: {args.input_records_csv}")
raise FileNotFoundError(
f"Cannot find specified input records CSV: {args.input_records_csv}")

"""
Instantiate the quality check object from rules JSON. This script assumes no datastore, and therefore
@@ -70,7 +73,8 @@
errors['row'] = i
all_errors.append(errors)
error_headers.update(set(errors.keys()))
log.warning(f"Row {i} in the input records CSV failed validation")
log.warning(
f"Row {i} in the input records CSV failed validation")

"""
Convert all_errors and error_headers to a "csv-like" dict for writing/printing out
2 changes: 1 addition & 1 deletion nacc_form_validator/__init__.py
@@ -1 +1 @@
from .quality_check import QualityCheck # noqa: F401

14 changes: 14 additions & 0 deletions nacc_form_validator/datastore.py
@@ -84,3 +84,17 @@ def is_valid_rxcui(self, drugid: int) -> bool:
bool: True if provided drug ID is valid, else False
"""
return False

@abstractmethod
def is_valid_adcid(self, adcid: int, own: bool) -> bool:
"""Abstract method to check whether a given ADCID is valid. Override
this method to implement ADCID validation.

Args:
adcid: provided ADCID
own: whether to check own ADCID or another center's ADCID

Returns:
bool: True if provided ADCID is valid, else False
"""
return False
47 changes: 9 additions & 38 deletions nacc_form_validator/errors.py
@@ -9,8 +9,11 @@
ValidationError,
)

from nacc_form_validator.keys import SchemaDefs

# pylint: disable=(too-few-public-methods)


class ErrorDefs:
"""Class to define custom errors."""

@@ -43,6 +46,8 @@ class ErrorDefs:
COMPARE_AGE = ErrorDefinition(0x3002, 'compare_age')
COMPARE_AGE_INVALID_COMPARISON = ErrorDefinition(0x3003, 'compare_age')
TEMPORAL_SWAPPED = ErrorDefinition(0x3004, 'temporalrules')
ADCID_NOT_MATCH = ErrorDefinition(0x3005, "function")
ADCID_NOT_VALID = ErrorDefinition(0x3006, "function")


class CustomErrorHandler(BasicErrorHandler):
@@ -124,6 +129,10 @@ def __set_custom_error_codes(self):
0x3004:
"{1} for if {3} in current visit then {2} " +
"in previous visit - temporal rule no: {0}",
0x3005:
"Provided ADCID {0} does not match your center's ADCID",
0x3006:
"Provided ADCID {0} is not in the valid list of ADCIDs",
}

self.messages.update(custom_errors)
@@ -145,41 +154,3 @@ def _format_message(self, field: str, error: ValidationError):
return field + ": " + error_msg

return super()._format_message(field, error)


class SchemaDefs:
"""Class to store schema attribute labels."""

TYPE = "type"
OP = "op"
IF_OP = "if_op"
THEN_OP = "then_op"
ELSE_OP = "else_op"
IF = "if"
THEN = "then"
ELSE = "else"
META = "meta"
ERRMSG = "errmsg"
ORDERBY = "orderby"
CONSTRAINTS = "constraints"
PREV_OP = "prev_op"
CURR_OP = "curr_op"
CURRENT = "current"
PREVIOUS = "previous"
CRR_DATE = "current_date"
CRR_YEAR = "current_year"
CRR_MONTH = "current_month"
CRR_DAY = "current_day"
PREV_RECORD = "previous_record"
FORMULA = "formula"
INDEX = "index"
FORMATTING = "formatting"
COMPARATOR = "comparator"
BASE = "base"
ADJUST = "adjustment"
IGNORE_EMPTY = "ignore_empty"
BIRTH_MONTH = 'birth_month'
BIRTH_DAY = 'birth_day'
BIRTH_YEAR = 'birth_year'
COMPARE_TO = "compare_to"
SWAP_ORDER = "swap_order"
41 changes: 41 additions & 0 deletions nacc_form_validator/keys.py
@@ -0,0 +1,41 @@
"""Module for commonly used keys."""


class SchemaDefs:
"""Class to store JSON schema attribute labels."""

TYPE = "type"
OP = "op"
IF_OP = "if_op"
THEN_OP = "then_op"
ELSE_OP = "else_op"
IF = "if"
THEN = "then"
ELSE = "else"
META = "meta"
ERRMSG = "errmsg"
ORDERBY = "orderby"
CONSTRAINTS = "constraints"
PREV_OP = "prev_op"
CURR_OP = "curr_op"
CURRENT = "current"
PREVIOUS = "previous"
CRR_DATE = "current_date"
CRR_YEAR = "current_year"
CRR_MONTH = "current_month"
CRR_DAY = "current_day"
PREV_RECORD = "previous_record"
FORMULA = "formula"
INDEX = "index"
FORMATTING = "formatting"
COMPARATOR = "comparator"
BASE = "base"
ADJUST = "adjustment"
IGNORE_EMPTY = "ignore_empty"
BIRTH_MONTH = 'birth_month'
BIRTH_DAY = 'birth_day'
BIRTH_YEAR = 'birth_year'
COMPARE_TO = "compare_to"
SWAP_ORDER = "swap_order"
FUNCTION_NAME = 'name'
FUNCTION_ARGS = 'args'
54 changes: 43 additions & 11 deletions nacc_form_validator/nacc_validator.py
@@ -10,12 +10,9 @@

from nacc_form_validator import utils
from nacc_form_validator.datastore import Datastore
from nacc_form_validator.errors import (
CustomErrorHandler,
ErrorDefs,
SchemaDefs,
)
from nacc_form_validator.errors import CustomErrorHandler, ErrorDefs
from nacc_form_validator.json_logic import jsonLogic
from nacc_form_validator.keys import SchemaDefs

log = logging.getLogger(__name__)

@@ -800,26 +797,36 @@ def _validate_logic(self, logic: Dict[str, Any], field: str,
except ValueError as error:
self._error(field, ErrorDefs.FORMULA, str(error))

def _validate_function(self, function: str, field: str, value: object):
def _validate_function(self, function: Dict[str, Any], field: str,
value: object):
"""Validate using a custom defined function.

Args:
function: Function name
function: Dict specifying function name and arguments
field: Variable name
value: Variable value

Note: Don't remove below docstring,
Cerberus uses it to validate the schema definition.

The rule's arguments are validated against this schema:
{'type': 'string', 'empty': False}
{
'type': 'dict',
'schema': {
'name': {'type': 'string', 'required': True, 'empty': False},
'args': {'type': 'dict', 'required': False}
}
}
"""

func = getattr(self, function, None)
function_name = '_' + \
function.get(SchemaDefs.FUNCTION_NAME, 'undefined')
func = getattr(self, function_name, None)
if func and callable(func):
func(value)
kwargs = function.get(SchemaDefs.FUNCTION_ARGS, {})
func(field, value, **kwargs)
else:
err_msg = f"{function} not defined in the validator module"
err_msg = f"{function_name} not defined in the validator module"
self.__add_system_error(field, err_msg)
raise ValidationException(err_msg)
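
The revised `_validate_function` dispatch can be sketched in isolation as follows. It prefixes the rule's `name` with an underscore, looks the method up on the validator, and forwards `args` as keyword arguments. `MiniValidator` and its `calls` list are hypothetical and exist only for this illustration. `LookupError` stands in for the `ValidationException` raised in the actual change, and the system-error bookkeeping is omitted.

```python
from typing import Any, Dict, List, Tuple


class MiniValidator:
    """Hypothetical, stripped-down validator; not the NACCValidator class."""

    def __init__(self) -> None:
        # Records each dispatched call so the behavior can be inspected.
        self.calls: List[Tuple[str, Any, bool]] = []

    def _check_adcid(self, field: str, value: int, own: bool = True) -> None:
        self.calls.append((field, value, own))

    def validate_function(self, function: Dict[str, Any], field: str,
                          value: object) -> None:
        # "check_adcid" in the rule definition maps to the "_check_adcid" method.
        function_name = "_" + function.get("name", "undefined")
        func = getattr(self, function_name, None)
        if not callable(func):
            raise LookupError(
                f"{function_name} not defined in the validator module")
        # "args" is optional; when absent, the method's defaults apply.
        func(field, value, **function.get("args", {}))


validator = MiniValidator()
validator.validate_function({"name": "check_adcid"}, "adcid", 0)
validator.validate_function(
    {"name": "check_adcid", "args": {"own": False}}, "oldadcid", 5)
print(validator.calls)
```

Because `args` is unpacked as keyword arguments, a key that the target method does not accept would surface as a `TypeError` at call time.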

@@ -997,6 +1004,9 @@ def _check_with_rxnorm(self, field: str, value: Optional[int]):
Args:
field: Variable name
value: Variable value

Raises:
ValidationException: If Datastore not set
"""

# No need to validate if blank or 0 (No RXCUI code available)
@@ -1106,3 +1116,25 @@ def _validate_compare_age(self, comparison: Dict[str, Any], field: str,
except TypeError as error:
self._error(field, ErrorDefs.COMPARE_AGE_INVALID_COMPARISON,
compare_field, field, age, str(error))

def _check_adcid(self, field: str, value: int, own: bool = True):
"""Check whether a given ADCID is valid.

Args:
field: name of ADCID field
value: ADCID value
own (optional): whether to check own ADCID or another center's ADCID.

Raises:
ValidationException: If Datastore not set
"""

if not self.datastore:
err_msg = "Datastore not set, cannot validate ADCID"
self.__add_system_error(field, err_msg)
raise ValidationException(err_msg)

if not self.datastore.is_valid_adcid(value, own):
self._error(
field, ErrorDefs.ADCID_NOT_MATCH
if own else ErrorDefs.ADCID_NOT_VALID, value)
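
To make the error selection in `_check_adcid` concrete, the sketch below pairs the two new error codes with their message templates from `errors.py` and picks one based on `own`. The helper `adcid_error` and the `is_valid` callable are hypothetical names used only here. The sketch assumes the template's `{0}` is filled with the offending value, and it leaves out the field-name prefix and the missing-datastore check.

```python
from typing import Callable, Optional

# Message templates copied from the custom error codes added in this change.
MESSAGES = {
    0x3005: "Provided ADCID {0} does not match your center's ADCID",
    0x3006: "Provided ADCID {0} is not in the valid list of ADCIDs",
}


def adcid_error(value: int, own: bool,
                is_valid: Callable[[int, bool], bool]) -> Optional[str]:
    """Return the error message for an invalid ADCID, or None if it is valid."""
    if is_valid(value, own):
        return None
    # own=True failures report a mismatch; own=False failures report
    # that the value is not in the current list.
    code = 0x3005 if own else 0x3006
    return MESSAGES[code].format(value)


def example_is_valid(adcid: int, own: bool) -> bool:
    # Assumed setup: the center's own ADCID is 0, and 0-5 inclusive are current.
    return adcid == 0 if own else adcid in range(6)


print(adcid_error(2, True, example_is_valid))
print(adcid_error(9, False, example_is_valid))
print(adcid_error(0, True, example_is_valid))
```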
7 changes: 2 additions & 5 deletions nacc_form_validator/quality_check.py
@@ -6,11 +6,8 @@
from cerberus.schema import SchemaError

from nacc_form_validator.datastore import Datastore
from nacc_form_validator.nacc_validator import (
CustomErrorHandler,
NACCValidator,
ValidationException,
)
from nacc_form_validator.errors import CustomErrorHandler
from nacc_form_validator.nacc_validator import NACCValidator, ValidationException


class QualityCheckException(Exception):