diff --git a/images/image_harmonize-workflow.png b/images/image_harmonize-workflow.png
new file mode 100644
index 0000000..adccb3a
Binary files /dev/null and b/images/image_harmonize-workflow.png differ
diff --git a/mod_wrangle.qmd b/mod_wrangle.qmd
index 0ab50c7..d6ba80f 100644
--- a/mod_wrangle.qmd
+++ b/mod_wrangle.qmd
@@ -14,15 +14,33 @@ Now that we have covered how to find data and use data visualization methods to
 
 After completing this module you will be able to:
 
-- X ...
+- Identify typical steps in data harmonization and wrangling workflows
+- Create a harmonization workflow
+- 
+- 
+- 
 
-## Harmonizing Content
+## Harmonizing Data
 
+Data harmonization is an interesting topic in that it is _vital_ for synthesis projects but only rarely relevant for primary research. Synthesis projects must reckon with the data choices made by each team of original data collectors. These collectors may or may not have recorded their judgement calls (or, indeed, any metadata), but before synthesis work can meaningfully begin, these independent datasets must be made comparable to one another and combined.
+
+For tabular data, we recommend using the [`ltertools` R package](https://lter.github.io/ltertools/) to perform any needed harmonization. This package relies on a "column key" to translate the original column names into equivalents that apply across all datasets. Users can generate this column key however they like, but Google Sheets is a strong option because it lets multiple synthesis team members fill in the key simultaneously.
 
-## Wrangling Content
+The column key requires three columns:
+
+1. `source` -- Name of the raw file
+2. `raw_name` -- Name of each raw column in that file that should be synonymized
+3. `tidy_name` -- New name for each raw column that should be carried into the harmonized data
+
+Note that any raw column that is either missing from the column key or lacks a tidy name equivalent will be excluded from the final data object. For more information, consult the [`ltertools` package vignette](https://lter.github.io/ltertools/articles/ltertools.html). For convenience, we've included the vignette's diagram of this harmonization method below.
+
+Four color-coded tables are in a soft rectangle. One is pulled out and its column names are replaced based on their respective 'tidy names' in the column key table. This is repeated for each of the other tables, and then the four tables (now with fixed column names) are combined into a single data table.
+
+## Wrangling Content
+
+Under construction! Check back later.
 
 ## Additional Resources
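The column-key idea is not specific to R. As a rough, language-agnostic sketch (plain Python, *not* the `ltertools` API), it amounts to: build a per-file lookup from `raw_name` to `tidy_name`, rename each file's columns through it, drop anything unmapped, and stack the results. The `harmonize` function, the file names, and the column names below are all hypothetical, and tagging each row with its `source` file is a choice made only for this sketch.

```python
from typing import Dict, List, Optional, Tuple

# One column-key row: (source, raw_name, tidy_name).
# A tidy_name of None or "" means "no tidy equivalent", so the column is dropped.
KeyRow = Tuple[str, str, Optional[str]]
Record = Dict[str, object]


def harmonize(raw_data: Dict[str, List[Record]], key: List[KeyRow]) -> List[Record]:
    """Hypothetical sketch: rename columns via the key, then stack all files."""
    # Per-file lookup table: {source: {raw_name: tidy_name}}
    lookup: Dict[str, Dict[str, str]] = {}
    for source, raw_name, tidy_name in key:
        if tidy_name:  # skip raw columns that lack a tidy name
            lookup.setdefault(source, {})[raw_name] = tidy_name

    combined: List[Record] = []
    for source, rows in raw_data.items():
        # Files absent from the key contribute rows holding only the source column
        renames = lookup.get(source, {})
        for row in rows:
            tidy_row: Record = {"source": source}  # provenance column (sketch's choice)
            for raw_name, value in row.items():
                if raw_name in renames:  # unmapped raw columns are excluded
                    tidy_row[renames[raw_name]] = value
            combined.append(tidy_row)
    return combined


if __name__ == "__main__":
    # Two hypothetical raw files with inconsistent column names
    raw = {
        "site_a.csv": [{"Temp_C": 12.1, "Yr": 2019, "notes": "ok"}],
        "site_b.csv": [{"temperature": 9.8, "year": 2020}],
    }
    key = [
        ("site_a.csv", "Temp_C", "temp_c"),
        ("site_a.csv", "Yr", "year"),
        ("site_a.csv", "notes", None),  # no tidy name, so it is dropped
        ("site_b.csv", "temperature", "temp_c"),
        ("site_b.csv", "year", "year"),
    ]
    for row in harmonize(raw, key):
        print(row)
    # {'source': 'site_a.csv', 'temp_c': 12.1, 'year': 2019}
    # {'source': 'site_b.csv', 'temp_c': 9.8, 'year': 2020}
```

Keeping the key as plain data, rather than hard-coding renames per file, is what allows a team to edit it collaboratively (for example, in a shared spreadsheet) without touching the harmonization code.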