Commit

Drafted harmonization section of 'wrangle' module
njlyon0 committed Mar 1, 2024
1 parent f6cb9dc commit 86c3de7
Showing 2 changed files with 21 additions and 3 deletions.
Binary file added images/image_harmonize-workflow.png
24 changes: 21 additions & 3 deletions mod_wrangle.qmd
@@ -14,15 +14,33 @@ Now that we have covered how to find data and use data visualization methods to

After completing this module you will be able to:

- <u>X</u> ...
- <u>Identify</u> typical steps in data harmonization and wrangling workflows
- <u>Create</u> a harmonization workflow
-
-
-

## Harmonizing Content
## Harmonizing Data

Data harmonization is an interesting topic in that it is _vital_ for synthesis projects but only very rarely relevant for primary research. Synthesis projects must reckon with the data choices made by each team of original data collectors. These collectors may or may not have recorded their judgement calls (or indeed, any metadata), but before synthesis work can be meaningfully done, these independent datasets must be made comparable to one another and combined.

For tabular data, we recommend using the [`ltertools` R package](https://lter.github.io/ltertools/) to perform any needed harmonization. This package relies on a "column key" to translate the original column names into equivalents that apply across all datasets. Users can generate this column key however they would like, but Google Sheets is a strong option because it lets multiple synthesis team members fill in the key simultaneously.

## Wrangling Content
The column key requires three columns:

1. "source" -- Name of the raw file
2. "raw_name" -- Name of all raw columns in that file to be synonymized
3. "tidy_name" -- New name for each raw column that should be carried to the harmonized data

Note that any raw column that is either missing from the column key or lacks a tidy name equivalent will be excluded from the final data object. For more information, consult the [`ltertools` package vignette](https://lter.github.io/ltertools/articles/ltertools.html). For convenience, we're attaching the visual diagram of this method of harmonization included in the `ltertools` vignette below.

<p align="center">
<img src="images/image_harmonize-workflow.png" alt="Four color-coded tables are in a soft rectangle. One is pulled out and its column names are replaced based on their respective 'tidy names' in the column key table. This is done for each of the other tables then the four tables--with fixed column names--are combined into a single data table" width="90%">
</p>
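
To make the column-key idea concrete, here is a minimal, language-agnostic sketch of the renaming-and-combining logic described above. It is written in plain Python and is _not_ the `ltertools` API or its implementation. The file names, column names, the `harmonize_rows` helper, and the choice to carry a "source" column into the output are all assumptions made purely for illustration.

```python
# Hypothetical column key: one entry per raw column (all names are made up).
column_key = [
    {"source": "site_a.csv", "raw_name": "Temp_C", "tidy_name": "temperature_c"},
    {"source": "site_a.csv", "raw_name": "Plot", "tidy_name": "plot_id"},
    {"source": "site_b.csv", "raw_name": "temperature", "tidy_name": "temperature_c"},
    {"source": "site_b.csv", "raw_name": "plot_number", "tidy_name": "plot_id"},
    # Listed in the key but without a tidy name, so it will be dropped
    {"source": "site_b.csv", "raw_name": "observer", "tidy_name": ""},
]

# Hypothetical raw data: each file is a list of rows (dicts of column -> value).
raw_tables = {
    # "Notes" is absent from the key entirely, so it will also be dropped
    "site_a.csv": [{"Temp_C": 12.5, "Plot": "A1", "Notes": "windy"}],
    "site_b.csv": [{"temperature": 9.0, "plot_number": "B7", "observer": "JD"}],
}


def harmonize_rows(raw_tables, column_key):
    """Rename raw columns via the key, drop unmapped ones, and stack all rows."""
    # Build {source: {raw_name: tidy_name}}, skipping entries with no tidy name
    lookup = {}
    for entry in column_key:
        if entry["tidy_name"]:
            per_file = lookup.setdefault(entry["source"], {})
            per_file[entry["raw_name"]] = entry["tidy_name"]

    # Every tidy column appears in every output row, even if a file lacks it
    tidy_columns = sorted(
        {tidy for mapping in lookup.values() for tidy in mapping.values()}
    )

    combined = []
    for source, rows in raw_tables.items():
        mapping = lookup.get(source, {})
        for row in rows:
            tidy_row = {"source": source}
            tidy_row.update({column: None for column in tidy_columns})
            for raw_name, value in row.items():
                if raw_name in mapping:  # columns not in the key are excluded
                    tidy_row[mapping[raw_name]] = value
            combined.append(tidy_row)
    return combined


harmonized = harmonize_rows(raw_tables, column_key)
for row in harmonized:
    print(row)
```

In this toy example, "Notes" (absent from the key) and "observer" (present but with no tidy name) both disappear from the combined table, which mirrors the exclusion rule noted above.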

## Wrangling Content

Under Construction! Check back later

## Additional Resources
