Commit 6c294ec

Merge pull request nationalparkservice#135 from RobLBaker/main
expand test_missing_data acceptable missing data codes
2 parents 984cc6e + 7ea5143 commit 6c294ec

10 files changed: +70 -138 lines changed

NEWS.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 
 2024-02-05
 * Fix bug in `test_date_range()` that was adding UTC to temporalCoverage
+* `test_missing_data()` now also handles the missing data codes "blank" and "empty".
 * Update `test_missing_data()` to default to flag whole files, not each column that has undocumented missing data. This condenses the error output when running `run_congruence_checks()`. When trouble shooting and attempting to pinpoint data that lack missing values, `test_missing_data()` can be run with the parameter detail_level = "columns".
 2024-01-26
 * Bugfixes for `test_dates_parse()` and `test_date_range()`: now ignore files that have times but no dates or date times.
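
The difference between file-level and column-level reporting described in the NEWS entry can be illustrated with a small sketch. It uses base R only and does not call DPchecker; the table contents and the variable names `csv_text`, `cols_with_missing`, and `file_has_missing` are hypothetical. Passing `na.strings = c("", "NA")` is an assumption made here so that blank cells are read as NA.

```r
# Hypothetical two-column table; the empty field after "B," is a blank cell.
csv_text <- "site,count\nA,1\nB,\nC,3"

# Read blanks and the literal string "NA" as missing values.
dat <- read.csv(text = csv_text, na.strings = c("", "NA"))

# Column-level view: which columns contain at least one NA?
na_per_column <- colSums(is.na(dat))
cols_with_missing <- names(na_per_column)[na_per_column > 0]

# File-level view: does the file contain any NA at all?
file_has_missing <- length(cols_with_missing) > 0

print(cols_with_missing)
print(file_has_missing)
```

A file-level report would only say that this file has missing data, while a column-level report would name the `count` column.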

R/tabular_data_congruence.R

Lines changed: 6 additions & 4 deletions
@@ -591,11 +591,11 @@ test_fields_match <- function(directory = here::here(), metadata = load_metadata
 
 #' Looks for undocumented missing data (NAs)
 #'
-#' @description `test_missing_data` scans the data package for common missing data (blanks). If there are no blanks or if missing data coded as NA is documented as missing data in the metadata, the test passes. If missing data (blanks or NA) are found but not documented in the metadata the test fails with an error.
+#' @description `test_missing_data` scans the data package for common missing data (blanks/empty cells or NA in a cell). If there are no blanks or NAs, the test passes. If missing data are found and properly documented (missingValueCode is either "NA", "empty", or "blank"), the test passes. If any missing data is detected but not properly documented in the metadata, the test fails with an error.
 #'
-#' Commonly, R will interpret blank cells as missing and fill in NA. To pass this test, you will need to either delete columns with missing data (if they are completely blank) or add NA as a missing data code during metadata creation.
+#' Commonly, R will interpret blank cells as missing and fill in NA. To pass this test, you will need to either delete columns or tables with missing data (if they are completely blank), or add the appropriate as a missing data code during metadata creation (in the corresponding attributes.txt file).
 #'
-#' This is a fairly simple test and ONLY checks for NA. Although there are many common missing data codes (-99999, "Missing", "NaN" etc) we cannot anticipate all of them.
+#' This is a fairly simple test and ONLY checks for NA and blanks. Although there are many common missing data codes (-99999, "Missing", "NaN" etc) we cannot anticipate all of them.
 #'
 #' When running `test_missing_data()` via `run_congruence_checks()`, the default for "detail_level" will be used and only file-level information about undocumented missing values will be reported to condense the error message output. When attempting to identify specifically which data have undocumented missing values, it may be helpful to run `test_missing_data()` with the parameter "detail_level" set to "columns". This will output a list of all columns within each file with undocumented missing data.
 #'
@@ -638,6 +638,8 @@ test_missing_data <- function(directory = here::here(),
 
 #load files and test for NAs
 error_log <- NULL
+#acceptable missing data codes if NA (or blank) cells found:
+missing_types <- c("NA", "blank", "empty")
 for (i in seq_along(data_files)) {
 #load each file
 dat <- suppressMessages(readr::read_csv(paste0(directory,
@@ -649,7 +651,7 @@ test_missing_data <- function(directory = here::here(),
 #look for NAs; if NAs found, look for correct missing data codes
 if (sum(is.na(dat[,j])) > 0) {
 missing <- data_tbl[[i]][["attributeList"]][["attribute"]][[j]][["missingValueCode"]][["code"]]
-if(is.null(missing) || ("NA" != missing)) {
+if(is.null(missing) || sum(missing != missing_types) < 1) {
 #file level error message output:
 if (detail_level == "files") {
 error_log <- append(error_log,
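
Reviewer note on the changed condition: assuming `missing` is a single character string, `missing != missing_types` is compared against three distinct codes, so at least two of the three comparisons are always TRUE and `sum(...) < 1` can never hold. If that assumption is right, only a NULL code would be flagged. The sketch below checks this in isolation. The helper names `flags_as_committed` and `flags_with_in` are hypothetical, and the `%in%` variant is one possible reading of the intent, not the author's code.

```r
# Acceptable missing data codes, as defined in the hunk above.
missing_types <- c("NA", "blank", "empty")

# Condition exactly as written in the commit, wrapped in a hypothetical helper.
flags_as_committed <- function(code) {
  is.null(code) || sum(code != missing_types) < 1
}

# Membership-based alternative (an assumption about the intended behavior).
flags_with_in <- function(code) {
  is.null(code) || !all(code %in% missing_types)
}

# NULL = no code documented; "-99999" = a code outside the accepted list.
codes <- list(NULL, "NA", "blank", "-99999")

committed <- vapply(codes, flags_as_committed, logical(1))
alternative <- vapply(codes, flags_with_in, logical(1))

print(committed)    # TRUE FALSE FALSE FALSE
print(alternative)  # TRUE FALSE FALSE TRUE
```

Under these assumptions the committed condition lets "-99999" pass without an error, while the membership test flags it.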

docs/news/index.html

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

docs/pkgdown.yml

Lines changed: 1 addition & 1 deletion
@@ -3,5 +3,5 @@ pkgdown: 2.0.7
 pkgdown_sha: ~
 articles:
   DPchecker: DPchecker.html
-last_built: 2024-02-06T18:38Z
+last_built: 2024-02-06T19:16Z
