Data quality rules validation module for NACC form data.
The validator is based on the Cerberus Python library, which allows validating data using quality rules defined as data. See the Cerberus usage examples for more detail.
See the Usage doc for a quick-start guide with examples. In general, all documentation outside this README lives under docs.
Before getting started, we recommend doing your installations and work in a virtual environment. You can set one up with the following commands:
# create; use a Python version that matches the interpreter specified in pants.toml (in this case, Python 3.11)
python3.11 -m venv path/to/your/venv
# activate
source path/to/your/venv/bin/activate
# deactivate
deactivate
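To confirm that the environment is set up as intended, a short check like the one below can be run with the activated interpreter. This is a minimal sketch, not part of the package: the helper names `in_virtualenv` and `matches_required` are hypothetical, and the required version is assumed to be the Python 3.11 mentioned above.

```python
import sys

# Assumption: the required version is the Python 3.11 referenced in this README.
REQUIRED = (3, 11)


def in_virtualenv(prefix: str = sys.prefix, base_prefix: str = sys.base_prefix) -> bool:
    """Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix."""
    return prefix != base_prefix


def matches_required(version=sys.version_info, required=REQUIRED) -> bool:
    """Compare only the major and minor components of the version."""
    return tuple(version[:2]) == tuple(required)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("Python 3.11 interpreter:", matches_required())
```

If either line prints `False`, re-create or re-activate the environment before installing anything.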
Next, you'll need to get the distribution for this package. You can either
- clone the repository and build a distribution locally, or
- reference a distribution attached to a release on GitHub.
Once you have the distribution, you can install it with
pip3 install dist/nacc_form_validator-VERSION-py3-none-any.whl
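If you built the distribution locally and do not want to type the version by hand, the wheel can be located by its file name pattern. The following is a rough sketch under the assumption that the wheel sits in a local `dist` directory and follows the name shown above; `find_wheels` and `wheel_version` are hypothetical helpers, not part of this package.

```python
from pathlib import Path

PACKAGE = "nacc_form_validator"


def find_wheels(dist_dir="dist", package=PACKAGE):
    """Return the wheels for the package found in dist_dir, sorted by file name."""
    dist = Path(dist_dir)
    if not dist.is_dir():
        return []
    return sorted(dist.glob(f"{package}-*-py3-none-any.whl"))


def wheel_version(wheel_path, package=PACKAGE):
    """Extract the version field from a wheel name of the form package-VERSION-py3-none-any.whl."""
    name = Path(wheel_path).name
    return name[len(package) + 1:].split("-", 1)[0]


if __name__ == "__main__":
    # Print one install command per wheel found; prints nothing if dist is missing.
    for wheel in find_wheels():
        print(f"pip3 install {wheel}  # version {wheel_version(wheel)}")
```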
This repository uses pants for developing and building the distributions.
Install pants with one of the following. See Installing Pants for more information.
For Linux:
bash get-pants.sh
For macOS:
brew install pantsbuild/tap/pants
You will need to make sure that you have a Python version compatible with the interpreter set in the pants.toml file.
The repo has a VSCode devcontainer configuration that ensures a compatible Python is available. You need Docker installed, and VSCode with Dev Containers enabled. To set this up, follow the Dev Containers tutorial to the point of "Check Installation".
To format and lint with pants, run:
pants fmt nacc_form_validator:: # fixes formatting
pants lint nacc_form_validator:: # run linter
To test with pants, run:
# use the --test-force flag to ignore the cache and force all tests to run
pants test ::
To package the distribution with pants, run:
pants package nacc_form_validator:dist
This builds sdist and wheel distributions in the dist directory.
The version number on the distribution files is set in the validator/BUILD file.
If you do not have a Python version compatible with the interpreter set in the pants.toml file, the build will fail with something similar to the following:
Examined the following interpreters:
1.) /opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/bin/python3.12 CPython==3.12.5
2.) /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/bin/python3.9 CPython==3.9.6
No interpreter compatible with the requested constraints was found:
Version matches CPython==3.11.*
As mentioned earlier, you can use a VSCode devcontainer configuration. Otherwise, you need to set up an environment (preferably a virtual one) with the correct Python version (in this case, Python 3.11).
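To see why the interpreters listed in the error are rejected, the constraint can be checked by hand. The sketch below is a simplification, not how pants resolves interpreters: it handles only an `==` constraint with a trailing wildcard, as in `CPython==3.11.*`, and the helper name `satisfies` is hypothetical.

```python
import platform
from fnmatch import fnmatchcase


def satisfies(version: str, constraint: str = "CPython==3.11.*") -> bool:
    """Check a version string against an '==' constraint with a wildcard.

    Simplification: other operators (>=, <, ~=) are not handled.
    """
    _, _, pattern = constraint.partition("==")
    return fnmatchcase(version, pattern)


if __name__ == "__main__":
    # The two versions reported in the error above, then the current interpreter.
    for found in ("3.12.5", "3.9.6", platform.python_version()):
        print(found, "compatible" if satisfies(found) else "not compatible")
```

Both 3.12.5 and 3.9.6 fail the check, which matches the error message; any 3.11.x interpreter passes.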
On macOS, if you see a long error that ends with the following when trying to build the distribution:
(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64'))
make sure that the pants_version in pants.toml is >=2.22.0.