diff --git a/_freeze/mod_data-viz/execute-results/html.json b/_freeze/mod_data-viz/execute-results/html.json index 513c7eb..779593f 100644 --- a/_freeze/mod_data-viz/execute-results/html.json +++ b/_freeze/mod_data-viz/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "0c129111972e3330ba5f40042875640c", + "hash": "9edb204615b994ea90fb63cf2bff58d2", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Data Visualization & Exploration\"\ncode-annotations: hover\n---\n\n\n## Overview\n\nData visualization is a fundamental part of working with data. Visualization can be only used in the final stages of a project to make figures for publication but it can also be hugely valuable for quality control and hypothesis development processes. This module focuses on the fundamentals of graph creation in an effort to empower you to apply those methods in the various contexts where you might find visualization to be helpful.\n\n## Learning Objectives\n\nAfter completing this module you will be able to: \n\n- Define fundamental `ggplot2` vocabulary\n- Identify appropriate graph types for given data type/distribution\n- Discuss differences between presentation- and publication-quality graphs\n- Explain how your graphs can be made more accessible\n\n## Needed Packages\n\nIf you'd like to follow along with the code chunks included throughout this module, you'll need to install the following packages:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Note that these lines only need to be run once per computer\n## So you can skip this step if you've installed these before\ninstall.packages(\"tidyverse\")\ninstall.packages(\"lterdatasampler\")\ninstall.packages(\"supportR\")\ninstall.packages(\"cowplot\")\ninstall.packages(\"vegan\")\ninstall.packages(\"ape\")\n```\n:::\n\n\nWe'll go ahead and load some of these libraries as well to be able to better demonstrate these concepts.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load needed libraries\nlibrary(tidyverse)\n```\n:::\n\n\n## 
Graphing with `ggplot2`\n\n### `ggplot2` Fundamentals\n\nYou may already be familiar with the `ggplot2` package in R but if you are not, it is a popular graphing library based on [The Grammar of Graphics](https://bookshop.org/p/books/the-grammar-of-graphics-leland-wilkinson/1518348?ean=9780387245447). Every ggplot is composed of four elements:\n\n1. A 'core' `ggplot` function call\n2. Aesthetics\n3. Geometries\n4. Theme\n\nNote that the theme component may be implicit in some graphs because there is a suite of default theme elements that applies unless otherwise specified. \n\nThis module will use example data to demonstrate these tools but as we work through these topics you should feel free to substitute a dataset of your choosing! If you don't have one in mind, you can use the example dataset shown in the code chunks throughout this module. This dataset comes from the [`lterdatasampler` R package](https://lter.github.io/lterdatasampler/) and the data are about fiddler crabs (_Minuca pugnax_) at the [Plum Island Ecosystems (PIE) LTER](https://pie-lter.ecosystems.mbl.edu/welcome-plum-island-ecosystems-lter) site.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load the lterdatasampler package\nlibrary(lterdatasampler)\n\n# Load the fiddler crab dataset\ndata(pie_crab)\n\n# Check its structure\nstr(pie_crab)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\ntibble [392 × 9] (S3: tbl_df/tbl/data.frame)\n $ date : Date[1:392], format: \"2016-07-24\" \"2016-07-24\" ...\n $ latitude : num [1:392] 30 30 30 30 30 30 30 30 30 30 ...\n $ site : chr [1:392] \"GTM\" \"GTM\" \"GTM\" \"GTM\" ...\n $ size : num [1:392] 12.4 14.2 14.5 12.9 12.4 ...\n $ air_temp : num [1:392] 21.8 21.8 21.8 21.8 21.8 ...\n $ air_temp_sd : num [1:392] 6.39 6.39 6.39 6.39 6.39 ...\n $ water_temp : num [1:392] 24.5 24.5 24.5 24.5 24.5 ...\n $ water_temp_sd: num [1:392] 6.12 6.12 6.12 6.12 6.12 ...\n $ name : chr [1:392] \"Guana Tolomoto Matanzas NERR\" \"Guana Tolomoto Matanzas NERR\" \"Guana 
Tolomoto Matanzas NERR\" \"Guana Tolomoto Matanzas NERR\" ...\n```\n\n\n:::\n:::\n\n\nWith a dataset in hand, let's make a scatterplot of crab size on the Y-axis with latitude on the X. We'll forgo doing anything to the theme elements at this point to focus on the other three elements.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nggplot(data = pie_crab, mapping = aes(x = latitude, y = size, fill = site)) + # <1>\n geom_point(pch = 21, size = 2, alpha = 0.5) # <2>\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/gg-1-1.png){fig-align='center' width=864}\n:::\n:::\n\n1. We're defining both the data and the X/Y aesthetics in this top-level bit of the plot. Also, note that every line except the last ends with a plus sign\n2. Because we defined the data and aesthetics in the `ggplot()` function call above, this geometry can assume those mappings without re-specifying them\n\nWe can improve on this graph by tweaking theme elements to make it use fewer of the default settings.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nggplot(data = pie_crab, mapping = aes(x = latitude, y = size, fill = site)) +\n geom_point(pch = 21, size = 2, alpha = 0.5) +\n theme(legend.title = element_blank(), # <1>\n panel.background = element_blank(),\n axis.line = element_line(color = \"black\"))\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/gg-2-1.png){fig-align='center' width=864}\n:::\n:::\n\n1. Most theme elements require these `element_...` helper functions. `element_blank` removes theme elements but otherwise you'll need to use the helper function that corresponds to the type of theme element (e.g., `element_text` for theme elements affecting graph text)\n\n### Multiple Geometries\n\nWe can further modify `ggplot2` graphs by adding _multiple_ geometries if you find it valuable to do so. Note however that geometry order matters! Geometries added later will be \"in front of\" those added earlier. 
Also, adding too much data to a plot will begin to make it difficult for others to understand the central take-away of the graph so you may want to be careful about the level of information density in each graph. Let's add boxplots behind the points to characterize the distribution of points more quantitatively.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nggplot(data = pie_crab, mapping = aes(x = latitude, y = size, fill = site)) +\n geom_boxplot(pch = 21) + # <1>\n geom_point(pch = 21, size = 2, alpha = 0.5) +\n theme(legend.title = element_blank(), \n panel.background = element_blank(),\n axis.line = element_line(color = \"black\"))\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/gg-3-1.png){fig-align='center' width=864}\n:::\n:::\n\n1. By putting the boxplot geometry first we ensure that it doesn't cover up the points that overlap with the 'box' part of each boxplot\n\n:::{.callout-note icon=\"false\"}\n#### Activity: Graph Creation (P1)\n\nIn a script, attempt the following with one of either yours or your group's datasets:\n\n- Make a graph using `ggplot2`\n - Include at least one geometry\n - Include at least one aesthetic (beyond X/Y axes)\n - Modify at least one theme element from the default\n\n:::\n\n### Multiple Data Objects\n\n`ggplot2` also supports adding more than one data object to the same graph! While this module doesn't cover map creation, maps are a common example of a graph with more than one data object. 
Another common use would be to include both the full dataset and some summarized facet of it in the same plot.\n\nLet's calculate some summary statistics of crab size to include that in our plot.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load the supportR library\nlibrary(supportR)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'supportR'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:dplyr':\n\n count\n```\n\n\n:::\n\n```{.r .cell-code}\n# Summarize crab size within latitude groups\ncrab_summary <- supportR::summary_table(data = pie_crab, groups = c(\"site\", \"latitude\"),\n response = \"size\", drop_na = TRUE)\n\n# Check the structure\nstr(crab_summary)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n'data.frame':\t13 obs. of 6 variables:\n $ site : chr \"BC\" \"CC\" \"CT\" \"DB\" ...\n $ latitude : num 42.2 41.9 41.3 39.1 30 39.6 41.6 33.3 42.7 34.7 ...\n $ mean : num 16.2 16.8 14.7 15.6 12.4 ...\n $ std_dev : num 4.81 2.05 2.36 2.12 1.8 2.72 2.29 2.42 2.3 2.34 ...\n $ sample_size: int 37 27 33 30 28 30 29 30 28 25 ...\n $ std_error : num 0.79 0.39 0.41 0.39 0.34 0.5 0.43 0.44 0.43 0.47 ...\n```\n\n\n:::\n:::\n\n\nWith this data object in-hand, we can make a graph that includes both this and the original, unsummarized crab data. 
To better focus on the 'multiple data objects' bit of this example we'll pare down on the actual graph code.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nggplot() + # <1>\n geom_point(pie_crab, mapping = aes(x = latitude, y = size, fill = site),\n pch = 21, size = 2, alpha = 0.2) + \n geom_errorbar(crab_summary, mapping = aes(x = latitude, # <2>\n ymax = mean + std_error,\n ymin = mean - std_error),\n width = 0.2) +\n geom_point(crab_summary, mapping = aes(x = latitude, y = mean, fill = site),\n pch = 23, size = 3) + \n theme(legend.title = element_blank(),\n panel.background = element_blank(),\n axis.line = element_line(color = \"black\"))\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/gg-4-1.png){fig-align='center' width=864}\n:::\n:::\n\n1. If you want multiple data objects in the same `ggplot2` graph you need to leave this top level `ggplot()` call _empty!_ Otherwise you'll get weird errors with aesthetics later in the graph\n2. This geometry adds the error bars and it's important that we add it before the summarized data points themselves if we want the error bars to be 'behind' their respective points\n\n:::{.callout-note icon=\"false\"}\n#### Activity: Graph Creation (P2)\n\nIn a script, attempt the following:\n\n- Add a second data object to the graph you made in the preceding activity\n - _Hint:_ If your first graph is unsummarized, add a summarized version (or vice versa)\n\n:::\n\n## Streamlining Graph Aesthetics\n\nSynthesis projects often generate an entire network of inter-related papers. Ensuring that all graphs across papers from a given team have a similar \"feel\" is a nice way of implying a certain standard of robustness for all of your group's projects. However, copy/pasting the theme elements of your graphs can (A) be cumbersome to do even once and (B) needs to be re-done every time you make a change anywhere. 
Fortunately, there is a better way!\n\n`ggplot2` supports adding theme elements to an object that can then be reused as needed elsewhere. This is the same theory behind wrapping repeated operations into custom functions.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Define core theme elements\ntheme_synthesis <- theme(legend.position = \"none\",\n panel.background = element_blank(),\n axis.line = element_line(color = \"black\"),\n axis.text = element_text(size = 13)) # <1>\n\n# Create a graph\nggplot(pie_crab, aes(y = water_temp, x = air_temp, color = size, size = size)) +\n geom_point() +\n theme_synthesis +\n theme(legend.position = \"right\") # <2>\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/std-theme-1.png){fig-align='center' width=864}\n:::\n:::\n\n1. This theme element controls the text on the tick marks. `axis.title` controls the text in the _labels_ of the axes\n2. As a bonus, subsequent uses of `theme()` will replace defaults defined in your earlier theme object. So, you can design a set of theme elements that are _usually_ appropriate and then easily change just some of them as needed\n\n:::{.callout-note icon=\"false\"}\n#### Activity: Graph Creation (P3)\n\nIn a script, attempt the following:\n\n- Remove all theme edits from the graph you made in the preceding activity and assign them to a separate object\n - Then add that object to your graph\n- Make a second (different) graph and add your consolidated theme object to that graph as well\n\n:::\n\n## Multi-Panel Graphs\n\nIt is sometimes the case that you want to make a single graph file that has multiple panels. For many of us, we might default to creating the separate graphs that we want, exporting them, and then using software like Microsoft PowerPoint to stitch those panels into the single image we had in mind from the start. 
However, as all of us who have used this method know, this is hugely cumbersome when your advisor/committee/reviewers ask for edits and you now have to redo all of the manual work behind your multi-panel graph. \n\nFortunately, there are two nice entirely scripted alternatives that you might consider: **Faceted graphs** and **Plot grids**. See below for more information on both.\n\n:::{.panel-tabset}\n### Facets\n\nIn a faceted graph, every panel of the graph has the same aesthetics. These are often used when you want to show the relationship between two (or more) variables but separated by some other variable. In synthesis work, you might show the relationship between your core response and explanatory variables but facet by the original study. This would leave you with one panel per study where each would show the relationship only at that particular study.\n\nLet's check out an example.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nggplot(pie_crab, aes(x = date, y = size, color = site))+\n geom_point(size = 2) +\n facet_wrap(. ~ site) + # <1>\n theme_bw() +\n theme(legend.position = \"none\") # <2>\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/facet-1-1.png){fig-align='center' width=576}\n:::\n:::\n\n1. This is a `ggplot2` function that assumes you want panels laid out in a regular grid. There are other `facet_...` alternatives that let you specify row versus column arrangement. You could also facet by multiple variables by putting something to the left of the tilde\n2. We can remove the legend because the site names are in the facet titles in the gray boxes\n\n### Plot Grids\n\nIn a plot grid, each panel is completely independent of all others. These are often used in publications where you want to highlight several _different_ relationships that have some thematic connection. 
In synthesis work, your hypotheses may be more complicated than in primary research and such a plot grid would then be necessary to put all visual evidence for a hypothesis in the same location. On a practical note, plot grids are also a common way of circumventing figure number limits enforced by journals.\n\nLet's check out an example that relies on the `cowplot` library.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Load a needed library\nlibrary(cowplot)\n\n# Create the first graph\ncrab_p1 <- ggplot(pie_crab, aes(x = site, y = size, fill = site)) + # <1>\n geom_violin() +\n coord_flip() + # <2>\n theme_bw() +\n theme(legend.position = \"none\")\n\n# Create the second\ncrab_p2 <- ggplot(pie_crab, aes(x = air_temp, y = water_temp)) +\n geom_errorbar(aes(ymax = water_temp + water_temp_sd, ymin = water_temp - water_temp_sd),\n width = 0.1) +\n geom_errorbarh(aes(xmax = air_temp + air_temp_sd, xmin = air_temp - air_temp_sd), # <3>\n width = 0.1) +\n geom_point(aes(fill = site), pch = 23, size = 3) +\n theme_bw()\n\n# Assemble into a plot grid\ncowplot::plot_grid(crab_p1, crab_p2, labels = \"AUTO\", nrow = 1) # <4>\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/grid-1-1.png){fig-align='center' width=864}\n:::\n:::\n\n1. Note that we're assigning these graphs to objects!\n2. This is a handy function for flipping X and Y axes without re-mapping the aesthetics\n3. This geometry is responsible for _horizontal_ error bars (note the \"h\" at the end of the function name)\n4. The `labels = \"AUTO\"` argument means that each panel of the plot grid gets the next sequential capital letter. 
You could also substitute that for a vector with labels of your choosing\n:::\n\n:::{.callout-note icon=\"false\"}\n#### Activity: Graph Creation (P4)\n\nIn a script, attempt the following:\n\n- Assemble the two graphs you made in the preceding two activities into the appropriate type of multi-panel graph\n\n:::\n\n## Accessibility Considerations\n\nAfter you've made the graphs you need, it is good practice to revisit them to ensure that they are as accessible as possible. You can of course also do this during the graph construction process but it is sometimes less onerous to tackle as a penultimate step in the figure creation process. There are many facets to accessibility and we've tried to cover just a few of them below.\n\n### Color Choice\n\nOne of the more well-known facets of accessibility in data visualization is choosing colors that are \"colorblind safe\". Such palettes still create distinctive colors for those with various forms of color blindness (e.g., deuteranomaly, protanomaly, etc.). The classic red-green heatmap for instance is very colorblind unsafe in that people with some forms of colorblindness cannot distinguish between those colors (hence the rise of the yellow-blue heatmap in recent years). Unfortunately, the `ggplot2` default rainbow palette--while nice for exploratory purposes--_is not_ colorblind safe.\n\nSome websites (such as [colorbrewer2.org](https://colorbrewer2.org/#type=sequential&scheme=YlGnBu&n=9)) include a simple checkbox for colorblindness safety which automatically limits the listed options to those that are colorblind safe. 
Alternately, you could use a browser plug-in (such as [Let's get color blind](https://chromewebstore.google.com/detail/lets-get-color-blind/bkdgdianpkfahpkmphgehigalpighjck) on Google Chrome) to simulate colorblindness on a particular page.\n\nOne extreme approach you could take is to dodge this issue entirely and format your graphs such that color either isn't used at all or only conveys information that is also conveyed in another graph aesthetic. We don't necessarily recommend this as color--when the palette is chosen correctly--can be a really nice way of making information-dense graphs more informative and easily-navigable by viewers.\n\n### Multiple Modalities\n\nRelated to the color conversation is the value of mapping multiple aesthetics to the same variable. By presenting information in multiple ways--even if that seems redundant--you enable a wider audience to gain an intuitive sense of what you're trying to display.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nggplot(data = pie_crab, mapping = aes(x = latitude, y = size, \n fill = site, shape = site)) + # <1>\n geom_jitter(size = 2, width = 0.1, alpha = 0.6) + \n scale_shape_manual(values = c(21:25, 21:25, 21:23)) + # <2>\n theme_bw() +\n theme(legend.title = element_blank())\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/multi-modal-1.png){fig-align='center' width=864}\n:::\n:::\n\n1. In this graph we're mapping both the fill and shape aesthetics to site\n2. This is a little cumbersome but there are only five 'fill-able' shapes in R so we need to reuse some of them to have a unique one for each site. Using fill-able shapes is nice because you get a crisp black border around each point. See `?pch` for all available shapes\n\nIn the above graph, even though the rainbow palette is not ideal for reasons mentioned earlier, it is now much easier to tell the difference between sites with similar colors. 
For instance, \"NB\", \"NIB\", and \"PIE\" are all shades of light blue/teal. Now that they have unique shapes it is dramatically easier to look at the graph and identify which points correspond to which site.\n\n\n:::{.callout-warning icon=\"false\"}\n#### Discussion: Graph Accessibility\n\nWith a group discuss (some of) the following questions:\n\n- What are other facets of accessibility that you think are important to consider when making data visualizations?\n- What changes do you make to your graphs to increase accessibility?\n - What changes _could_ you make going forward?\n\n:::\n\n\n### Presentation vs. Publication\n\nOne final element of accessibility to consider is the difference between a '_presentation_-quality' graph and a '_publication_-quality' one. While it may be tempting to create a single version of a given graph and use it in both contexts that is likely to be less effective in helping you to get your point across than making small tweaks to two separate versions of what is otherwise the same graph.\n\n:::{.panel-tabset}\n### Presentation-Focused\n\n**Do:**\n\n- Increase size of text/points **greatly**\n - If possible, sit in the back row of the room where you'll present and look at your graphs from there\n- _Consider_ adding graph elements that highlight certain graph regions\n- Present summarized data (increases focus on big-picture trends and avoids discussion of minutiae)\n- Map multiple aesthetics to the same variables\n\n**Don't:**\n\n- Use technical language / jargon\n- Include _unnecessary_ background elements\n- Use multi-panel graphs (either faceted or plot grid)\n - If you have multiple graph panels, put each on its own slide!\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nggplot(crab_summary, aes(x = latitude, y = mean, \n shape = reorder(site, latitude), # <1>\n fill = reorder(site, latitude))) +\n geom_vline(xintercept = 36.5, color = \"black\", linetype = 1) +\n geom_vline(xintercept = 41.5, color = \"black\", 
linetype = 2) + # <2>\n geom_errorbar(mapping = aes(ymax = mean + std_error, ymin = mean - std_error),\n width = 0.2) +\n geom_point(size = 4) + \n scale_shape_manual(values = c(21:25, 21:25, 21:23)) +\n labs(x = \"Latitude\", y = \"Mean Crab Size (mm)\") + # <3>\n theme(legend.title = element_blank(),\n axis.line = element_line(color = \"black\"),\n panel.background = element_blank(),\n axis.title = element_text(size = 17),\n axis.text = element_text(size = 15))\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/talk-graph-1.png){fig-align='center' width=864}\n:::\n:::\n\n1. We can use the `reorder` function to make the order of sites in the legend (from top to bottom) match the order of sites in the graph (from left to right)\n2. Adding vertical lines at particular parts in the graph can make comparisons within the same graph easier\n3. `labs` lets us customize the title and label text of a graph\n\n### Publication-Focused\n\n**Do:**\n\n- Increase size of text/points **slightly**\n - You want to be legible but you can more safely assume that many readers will be able to increase the zoom of their browser window if needed\n- Present un-summarized data (with or without summarized points included)\n - Many reviewers will want to get a sense for the \"real\" data so you should include unsummarized values wherever possible\n- Use multi-panel graphs\n - If multiple graphs \"tell a story\" together, then they should be included in the same file!\n- Map multiple aesthetics to the same variables\n- If publishing in a journal available in print, check to make sure your graph still makes sense in grayscale\n - There are nice browser plug-ins (like [Grayscale the Web](https://chromewebstore.google.com/detail/grayscale-the-web-save-si/mblmpdpfppogibmoobibfannckeeleag) for Google Chrome) for this too\n\n**Don't:**\n\n- Include _unnecessary_ background elements\n- Add graph elements that highlight certain graph regions\n - You can--and should--lean more 
heavily on the text of your publication to discuss particular areas of a graph\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nggplot() +\n geom_point(pie_crab, mapping = aes(x = latitude, y = size,\n color = reorder(site, latitude)),\n pch = 19, size = 1, alpha = 0.3) +\n geom_errorbar(crab_summary, mapping = aes(x = latitude, y = mean, \n ymax = mean + std_error, \n ymin = mean - std_error),\n width = 0.2) +\n geom_point(crab_summary, mapping = aes(x = latitude, y = mean, \n shape = reorder(site, latitude),\n fill = reorder(site, latitude)),\n size = 4) +\n scale_shape_manual(values = c(21:25, 21:25, 21:23)) +\n labs(x = \"Latitude\", y = \"Mean Crab Carapace Width (mm)\") + # <1>\n theme(legend.title = element_blank(),\n axis.line = element_line(color = \"black\"),\n panel.background = element_blank(),\n axis.title = element_text(size = 15),\n axis.text = element_text(size = 13))\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/pub-graph-1.png){fig-align='center' width=864}\n:::\n:::\n\n1. Here we are using a reasonable amount of technical language\n\n:::\n\n## Ordination\n\nIf you are working with multivariate data (i.e., data where multiple columns are all response variables collectively) you may find ordination helpful. Ordination is the general term for many types of multivariate visualization but typically is used to refer to visualizing a distance or dissimilarity measure of the data. Such measures collapse all of those columns of response variables into fewer (typically two) index values that are easier to visualize.\n\nThis is a common approach particularly in answering questions in community ecology or considering a suite of traits (e.g., life history, landscape, etc.) together. 
While the math behind reducing the dimensionality of your data is interesting, this module is focused on only the visualization facet of ordination so we'll avoid deeper discussion of the internal mechanics that underpin ordination.\n\nIn order to demonstrate two types of ordination we'll use a lichen community composition dataset included in the `vegan` package. However, ordination approaches are most often used on data with multiple groups so we'll need to make a simulated grouping column to divide the lichen community data.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load library\nlibrary(vegan)\n\n# Grab data\nutils::data(\"varespec\", package = \"vegan\")\n\n# Create a faux group column\ntreatment <- c(rep.int(\"Treatment A\", nrow(varespec) / 2),\n rep.int(\"Treatment B\", nrow(varespec) / 2))\n\n# Combine into one dataframe\nlichen_df <- cbind(treatment, varespec)\n\n# Check structure of first few columns\nstr(lichen_df[1:5])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n'data.frame':\t24 obs. of 5 variables:\n $ treatment: chr \"Treatment A\" \"Treatment A\" \"Treatment A\" \"Treatment A\" ...\n $ Callvulg : num 0.55 0.67 0.1 0 0 ...\n $ Empenigr : num 11.13 0.17 1.55 15.13 12.68 ...\n $ Rhodtome : num 0 0 0 2.42 0 0 1.55 0 0.35 0.07 ...\n $ Vaccmyrt : num 0 0.35 0 5.92 0 ...\n```\n\n\n:::\n:::\n\n\n:::{.panel-tabset}\n\n### Metric Ordination\n\n\n\n\nCommon examples of this include Principal Components Analysis (PCA), , or Principal Coordinates Analysis (PCoA / \"metric multidimensional scaling\").\n\n\n### Non-Metric Ordination\n\nNon-metric ordinations are typically used when you care more about the relative differences among groups rather than specific measurements between particular points. For instance, you may want to assess whether the composition of insect communities differs between two experimental treatments. 
In such a case, your hypothesis likely depends more on the holistic difference between the treatments rather than some quantitative difference on one of the axes.\n\nThe most common non-metric ordination type is called Nonmetric Multidimensional Scaling (NMS / NMDS). This approach prioritizes making groups that are \"more different\" further apart than those that are less different. However, NMS uses a dissimilarity matrix which means that the _distance_ between any two specific points cannot be interpreted meaningfully. It _is_ appropriate though to interpret which cloud of points is closer to/further from another in aggregate. **If specific distances among points are of interest, consider a metric ordination approach.**\n\nIn order to perform an NMS ordination we'll first need to calculate a dissimilarity matrix for our response data. The vegan function `metaMDS` is useful for this. This function has many arguments but the most fundamental are the following:\n\n- `comm` = the dataframe of response variables (minus any non-numeric / grouping columns)\n- `distance` = the distance/dissimilarity metric to use\n - Note that there is no benefit to using a metric distance because when we make the ordination it will become non-metric\n- `k` = number of axes to decompose to -- typically two so the graph can be simple\n- `try` = number of attempts at minimizing \"stress\"\n - Stress is how NMS evaluates how good of a job it did at representing the true differences among groups (lower stress is better)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Get dissimilarity matrix\ndissim_mat <- vegan::metaMDS(comm = lichen_df[-1], distance = \"bray\", k = 2,\n autotransform = F, expand = F, try = 50)\n```\n:::\n\n\nWith that in hand, we can make our ordination! While you could make this step-by-step on your own, we'll use the `ordination` function from the `supportR` package for convenience. 
This function automatically uses colorblind safe colors for up to 10 groups and has some useful base plot defaults (as well as including ellipses around the standard deviation of the centroid of all groups).\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Load the library\nlibrary(supportR)\n\n# Make the ordination\nsupportR::ordination(mod = dissim_mat, grps = lichen_df$treatment, \n x = \"bottomright\", legend = c(\"A\", \"B\")) #<1>\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/nms-ord-1.png){fig-align='center' width=672}\n:::\n:::\n\n1. This function allows several base plot arguments to be supplied to alter non-critical plot elements (e.g., legend position, point size, etc.)\n\nIf the stress is less than 0.15 it is generally considered a good representation of the data. We can see that the ellipses do not overlap which indicates that the community composition of our two groups does seem to differ. We'd need to do real multivariate analysis if we wanted a _p_-value or AIC score to support that but as a visual tool this is still useful.\n\n:::\n\n## Maps\n\nYou may find it valuable to create a map as an additional way of visualizing data. 
Many synthesis groups do this--particularly when there is a strong spatial component to the research questions and/or hypotheses.\n\nCheck out the [bonus spatial data module](https://lter.github.io/ssecr/mod_spatial.html) for more information on map-making if this is of interest!\n\n## Additional Resources\n\n### Papers & Documents\n\n- NCEAS [Colorblind Safe Color Schemes](https://www.nceas.ucsb.edu/sites/default/files/2022-06/Colorblind%20Safe%20Color%20Schemes.pdf) reference document\n\n### Workshops & Courses\n\n- NCEAS Scientific Computing team's Coding in the Tidyverse workshop [`ggplot2` module](https://nceas.github.io/scicomp-workshop-tidyverse/visualize.html)\n- The Carpentries' Data Analysis and Visualization in R for Ecologists [`ggplot2` episode](https://datacarpentry.org/R-ecology-lesson/04-visualization-ggplot2.html)\n\n\n### Websites\n\n- \n", + "markdown": "---\ntitle: \"Data Visualization & Exploration\"\ncode-annotations: hover\n---\n\n\n## Overview\n\nData visualization is a fundamental part of working with data. Visualization can be used only in the final stages of a project to make figures for publication but it can also be hugely valuable for quality control and hypothesis development processes. 
This module focuses on the fundamentals of graph creation in an effort to empower you to apply those methods in the various contexts where you might find visualization to be helpful.\n\n## Learning Objectives\n\nAfter completing this module you will be able to: \n\n- Define fundamental `ggplot2` vocabulary\n- Identify appropriate graph types for given data type/distribution\n- Discuss differences between presentation- and publication-quality graphs\n- Explain how your graphs can be made more accessible\n\n## Needed Packages\n\nIf you'd like to follow along with the code chunks included throughout this module, you'll need to install the following packages:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Note that these lines only need to be run once per computer\n## So you can skip this step if you've installed these before\ninstall.packages(\"tidyverse\")\ninstall.packages(\"lterdatasampler\")\ninstall.packages(\"supportR\")\ninstall.packages(\"cowplot\")\ninstall.packages(\"vegan\")\ninstall.packages(\"ape\")\n```\n:::\n\n\nWe'll go ahead and load some of these libraries as well to be able to better demonstrate these concepts.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load needed libraries\nlibrary(tidyverse)\n```\n:::\n\n\n## Graphing with `ggplot2`\n\n### `ggplot2` Fundamentals\n\nYou may already be familiar with the `ggplot2` package in R but if you are not, it is a popular graphing library based on [The Grammar of Graphics](https://bookshop.org/p/books/the-grammar-of-graphics-leland-wilkinson/1518348?ean=9780387245447). Every ggplot is composed of four elements:\n\n1. A 'core' `ggplot` function call\n2. Aesthetics\n3. Geometries\n4. Theme\n\nNote that the theme component may be implicit in some graphs because there is a suite of default theme elements that applies unless otherwise specified. \n\nThis module will use example data to demonstrate these tools but as we work through these topics you should feel free to substitute a dataset of your choosing! 
If you don't have one in mind, you can use the example dataset shown in the code chunks throughout this module. This dataset comes from the [`lterdatasampler` R package](https://lter.github.io/lterdatasampler/) and the data are about fiddler crabs (_Minuca pugnax_) at the [Plum Island Ecosystems (PIE) LTER](https://pie-lter.ecosystems.mbl.edu/welcome-plum-island-ecosystems-lter) site.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load the lterdatasampler package\nlibrary(lterdatasampler)\n\n# Load the fiddler crab dataset\ndata(pie_crab)\n\n# Check its structure\nstr(pie_crab)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\ntibble [392 × 9] (S3: tbl_df/tbl/data.frame)\n $ date : Date[1:392], format: \"2016-07-24\" \"2016-07-24\" ...\n $ latitude : num [1:392] 30 30 30 30 30 30 30 30 30 30 ...\n $ site : chr [1:392] \"GTM\" \"GTM\" \"GTM\" \"GTM\" ...\n $ size : num [1:392] 12.4 14.2 14.5 12.9 12.4 ...\n $ air_temp : num [1:392] 21.8 21.8 21.8 21.8 21.8 ...\n $ air_temp_sd : num [1:392] 6.39 6.39 6.39 6.39 6.39 ...\n $ water_temp : num [1:392] 24.5 24.5 24.5 24.5 24.5 ...\n $ water_temp_sd: num [1:392] 6.12 6.12 6.12 6.12 6.12 ...\n $ name : chr [1:392] \"Guana Tolomoto Matanzas NERR\" \"Guana Tolomoto Matanzas NERR\" \"Guana Tolomoto Matanzas NERR\" \"Guana Tolomoto Matanzas NERR\" ...\n```\n\n\n:::\n:::\n\n\nWith a dataset in hand, let's make a scatterplot of crab size on the Y-axis with latitude on the X. We'll forgo doing anything to the theme elements at this point to focus on the other three elements.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nggplot(data = pie_crab, mapping = aes(x = latitude, y = size, fill = site)) + # <1>\n geom_point(pch = 21, size = 2, alpha = 0.5) # <2>\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/gg-1-1.png){fig-align='center' width=864}\n:::\n:::\n\n1. We're defining both the data and the X/Y aesthetics in this top-level bit of the plot. 
Also, note that each line ends with a plus sign\n2. Because we defined the data and aesthetics in the `ggplot()` function call above, this geometry can assume those mappings without re-specifying\n\nWe can improve on this graph by tweaking theme elements to make it use fewer of the default settings.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nggplot(data = pie_crab, mapping = aes(x = latitude, y = size, fill = site)) +\n geom_point(pch = 21, size = 2, alpha = 0.5) +\n theme(legend.title = element_blank(), # <1>\n panel.background = element_blank(),\n axis.line = element_line(color = \"black\"))\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/gg-2-1.png){fig-align='center' width=864}\n:::\n:::\n\n1. All theme elements require these `element_...` helper functions. `element_blank` removes theme elements but otherwise you'll need to use the helper function that corresponds to the type of theme element (e.g., `element_text` for theme elements affecting graph text)\n\n### Multiple Geometries\n\nWe can further modify `ggplot2` graphs by adding _multiple_ geometries if you find it valuable to do so. Note however that geometry order matters! Geometries added later will be \"in front of\" those added earlier. Also, adding too much data to a plot will begin to make it difficult for others to understand the central take-away of the graph so you may want to be careful about the level of information density in each graph. 
Let's add boxplots behind the points to characterize the distribution of points more quantitatively.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nggplot(data = pie_crab, mapping = aes(x = latitude, y = size, fill = site)) +\n geom_boxplot(pch = 21) + # <1>\n geom_point(pch = 21, size = 2, alpha = 0.5) +\n theme(legend.title = element_blank(), \n panel.background = element_blank(),\n axis.line = element_line(color = \"black\"))\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/gg-3-1.png){fig-align='center' width=864}\n:::\n:::\n\n1. By putting the boxplot geometry first we ensure that it doesn't cover up the points that overlap with the 'box' part of each boxplot\n\n:::{.callout-note icon=\"false\"}\n#### Activity: Graph Creation (P1)\n\nIn a script, attempt the following with one of either yours or your group's datasets:\n\n- Make a graph using `ggplot2`\n - Include at least one geometry\n - Include at least one aesthetic (beyond X/Y axes)\n - Modify at least one theme element from the default\n\n:::\n\n### Multiple Data Objects\n\n`ggplot2` also supports adding more than one data object to the same graph! While this module doesn't cover map creation, maps are a common example of a graph with more than one data object. 
Another common use would be to include both the full dataset and some summarized facet of it in the same plot.\n\nLet's calculate some summary statistics of crab size to include that in our plot.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load the supportR library\nlibrary(supportR)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'supportR'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:dplyr':\n\n count\n```\n\n\n:::\n\n```{.r .cell-code}\n# Summarize crab size within latitude groups\ncrab_summary <- supportR::summary_table(data = pie_crab, groups = c(\"site\", \"latitude\"),\n response = \"size\", drop_na = TRUE)\n\n# Check the structure\nstr(crab_summary)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n'data.frame':\t13 obs. of 6 variables:\n $ site : chr \"BC\" \"CC\" \"CT\" \"DB\" ...\n $ latitude : num 42.2 41.9 41.3 39.1 30 39.6 41.6 33.3 42.7 34.7 ...\n $ mean : num 16.2 16.8 14.7 15.6 12.4 ...\n $ std_dev : num 4.81 2.05 2.36 2.12 1.8 2.72 2.29 2.42 2.3 2.34 ...\n $ sample_size: int 37 27 33 30 28 30 29 30 28 25 ...\n $ std_error : num 0.79 0.39 0.41 0.39 0.34 0.5 0.43 0.44 0.43 0.47 ...\n```\n\n\n:::\n:::\n\n\nWith this data object in-hand, we can make a graph that includes both this and the original, unsummarized crab data. 
To better focus on the 'multiple data objects' bit of this example we'll pare down on the actual graph code.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nggplot() + # <1>\n geom_point(pie_crab, mapping = aes(x = latitude, y = size, fill = site),\n pch = 21, size = 2, alpha = 0.2) + \n geom_errorbar(crab_summary, mapping = aes(x = latitude, # <2>\n ymax = mean + std_error,\n ymin = mean - std_error),\n width = 0.2) +\n geom_point(crab_summary, mapping = aes(x = latitude, y = mean, fill = site),\n pch = 23, size = 3) + \n theme(legend.title = element_blank(),\n panel.background = element_blank(),\n axis.line = element_line(color = \"black\"))\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/gg-4-1.png){fig-align='center' width=864}\n:::\n:::\n\n1. If you want multiple data objects in the same `ggplot2` graph you need to leave this top level `ggplot()` call _empty!_ Otherwise you'll get weird errors with aesthetics later in the graph\n2. This geometry adds the error bars and it's important that we add it before the summarized data points themselves if we want the error bars to be 'behind' their respective points\n\n:::{.callout-note icon=\"false\"}\n#### Activity: Graph Creation (P2)\n\nIn a script, attempt the following:\n\n- Add a second data object to the graph you made in the preceding activity\n - _Hint:_ If your first graph is unsummarized, add a summarized version (or vice versa)\n\n:::\n\n## Streamlining Graph Aesthetics\n\nSynthesis projects often generate an entire network of inter-related papers. Ensuring that all graphs across papers from a given team have a similar \"feel\" is a nice way of implying a certain standard of robustness for all of your group's projects. However, copy/pasting the theme elements of your graphs can (A) be cumbersome to do even once and (B) needs to be re-done every time you make a change anywhere. 
Fortunately, there is a better way!\n\n`ggplot2` supports adding theme elements to an object that can then be reused as needed elsewhere. This is the same theory behind wrapping repeated operations into custom functions.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Define core theme elements\ntheme_synthesis <- theme(legend.position = \"none\",\n panel.background = element_blank(),\n axis.line = element_line(color = \"black\"),\n axis.text = element_text(size = 13)) # <1>\n\n# Create a graph\nggplot(pie_crab, aes(y = water_temp, x = air_temp, color = size, size = size)) +\n geom_point() +\n theme_synthesis +\n theme(legend.position = \"right\") # <2>\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/std-theme-1.png){fig-align='center' width=864}\n:::\n:::\n\n1. This theme element controls the text on the tick marks. `axis.title` controls the text in the _labels_ of the axes\n2. As a bonus, subsequent uses of `theme()` will replace defaults defined in your earlier theme object. So, you can design a set of theme elements that are _usually_ appropriate and then easily change just some of them as needed\n\n:::{.callout-note icon=\"false\"}\n#### Activity: Graph Creation (P3)\n\nIn a script, attempt the following:\n\n- Remove all theme edits from the graph you made in the preceding activity and assign them to a separate object\n - Then add that object to your graph\n- Make a second (different) graph and add your consolidated theme object to that graph as well\n\n:::\n\n## Multi-Panel Graphs\n\nIt is sometimes the case that you want to make a single graph file that has multiple panels. For many of us, we might default to creating the separate graphs that we want, exporting them, and then using software like Microsoft PowerPoint to stitch those panels into the single image we had in mind from the start. 
However, as all of us who have used this method know, this is hugely cumbersome when your advisor/committee/reviewers ask for edits and you now have to redo all of the manual work behind your multi-panel graph. \n\nFortunately, there are two nice entirely scripted alternatives that you might consider: **Faceted graphs** and **Plot grids**. See below for more information on both.\n\n:::{.panel-tabset}\n### Facets\n\nIn a faceted graph, every panel of the graph has the same aesthetics. These are often used when you want to show the relationship between two (or more) variables but separated by some other variable. In synthesis work, you might show the relationship between your core response and explanatory variables but facet by the original study. This would leave you with one panel per study where each would show the relationship only at that particular study.\n\nLet's check out an example.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nggplot(pie_crab, aes(x = date, y = size, color = site))+\n geom_point(size = 2) +\n facet_wrap(. ~ site) + # <1>\n theme_bw() +\n theme(legend.position = \"none\") # <2>\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/facet-1-1.png){fig-align='center' width=576}\n:::\n:::\n\n1. This is a `ggplot2` function that assumes you want panels laid out in a regular grid. There are other `facet_...` alternatives that let you specify row versus column arrangement. You could also facet by multiple variables by putting something to the left of the tilde\n2. We can remove the legend because the site names are in the facet titles in the gray boxes\n\n### Plot Grids\n\nIn a plot grid, each panel is completely independent of all others. These are often used in publications where you want to highlight several _different_ relationships that have some thematic connection. 
In synthesis work, your hypotheses may be more complicated than in primary research and such a plot grid would then be necessary to put all visual evidence for a hypothesis in the same location. On a practical note, plot grids are also a common way of circumventing figure number limits enforced by journals.\n\nLet's check out an example that relies on the `cowplot` library.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Load a needed library\nlibrary(cowplot)\n\n# Create the first graph\ncrab_p1 <- ggplot(pie_crab, aes(x = site, y = size, fill = site)) + # <1>\n geom_violin() +\n coord_flip() + # <2>\n theme_bw() +\n theme(legend.position = \"none\")\n\n# Create the second\ncrab_p2 <- ggplot(pie_crab, aes(x = air_temp, y = water_temp)) +\n geom_errorbar(aes(ymax = water_temp + water_temp_sd, ymin = water_temp - water_temp_sd),\n width = 0.1) +\n geom_errorbarh(aes(xmax = air_temp + air_temp_sd, xmin = air_temp - air_temp_sd), # <3>\n width = 0.1) +\n geom_point(aes(fill = site), pch = 23, size = 3) +\n theme_bw()\n\n# Assemble into a plot grid\ncowplot::plot_grid(crab_p1, crab_p2, labels = \"AUTO\", nrow = 1) # <4>\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/grid-1-1.png){fig-align='center' width=864}\n:::\n:::\n\n1. Note that we're assigning these graphs to objects!\n2. This is a handy function for flipping X and Y axes without re-mapping the aesthetics\n3. This geometry is responsible for _horizontal_ error bars (note the \"h\" at the end of the function name)\n4. The `labels = \"AUTO\"` argument means that each panel of the plot grid gets the next sequential capital letter. 
You could also substitute that for a vector with labels of your choosing\n:::\n\n:::{.callout-note icon=\"false\"}\n#### Activity: Graph Creation (P4)\n\nIn a script, attempt the following:\n\n- Assemble the two graphs you made in the preceding two activities into the appropriate type of multi-panel graph\n\n:::\n\n## Accessibility Considerations\n\nAfter you've made the graphs you need, it is good practice to revisit them to ensure that they are as accessible as possible. You can of course also do this during the graph construction process but it is sometimes less onerous to tackle as a penultimate step in the figure creation process. There are many facets to accessibility and we've tried to cover just a few of them below.\n\n### Color Choice\n\nOne of the more well-known facets of accessibility in data visualization is choosing colors that are \"colorblind safe\". Such palettes still create distinctive colors for those with various forms of color blindness (e.g., deuteranomaly, protanomaly, etc.). The classic red-green heatmap for instance is very colorblind unsafe in that people with some forms of colorblindness cannot distinguish between those colors (hence the rise of the yellow-blue heatmap in recent years). Unfortunately, the `ggplot2` default rainbow palette--while nice for exploratory purposes--_is not_ colorblind safe.\n\nSome websites (such as [colorbrewer2.org](https://colorbrewer2.org/#type=sequential&scheme=YlGnBu&n=9)) include a simple checkbox for colorblindness safety which automatically limits the listed options to those that are colorblind safe. 
Alternately, you could use a browser plug-in (such as [Let's get color blind](https://chromewebstore.google.com/detail/lets-get-color-blind/bkdgdianpkfahpkmphgehigalpighjck) on Google Chrome) to simulate colorblindness on a particular page.\n\nOne extreme approach you could take is to dodge this issue entirely and format your graphs such that color either isn't used at all or only conveys information that is also conveyed in another graph aesthetic. We don't necessarily recommend this as color--when the palette is chosen correctly--can be a really nice way of making information-dense graphs more informative and easily-navigable by viewers.\n\n### Multiple Modalities\n\nRelated to the color conversation is the value of mapping multiple aesthetics to the same variable. By presenting information in multiple ways--even if that seems redundant--you enable a wider audience to gain an intuitive sense of what you're trying to display.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nggplot(data = pie_crab, mapping = aes(x = latitude, y = size, \n fill = site, shape = site)) + # <1>\n geom_jitter(size = 2, width = 0.1, alpha = 0.6) + \n scale_shape_manual(values = c(21:25, 21:25, 21:23)) + # <2>\n theme_bw() +\n theme(legend.title = element_blank())\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/multi-modal-1.png){fig-align='center' width=864}\n:::\n:::\n\n1. In this graph we're mapping both the fill and shape aesthetics to site\n2. This is a little cumbersome but there are only five 'fill-able' shapes in R so we need to reuse some of them to have a unique one for each site. Using fill-able shapes is nice because you get a crisp black border around each point. See `?pch` for all available shapes\n\nIn the above graph, even though the rainbow palette is not ideal for reasons mentioned earlier, it is now much easier to tell the difference between sites with similar colors. 
For instance, \"NB\", \"NIB\", and \"PIE\" are all shades of light blue/teal. Now that they have unique shapes it is dramatically easier to look at the graph and identify which points correspond to which site.\n\n\n:::{.callout-warning icon=\"false\"}\n#### Discussion: Graph Accessibility\n\nWith a group discuss (some of) the following questions:\n\n- What are other facets of accessibility that you think are important to consider when making data visualizations?\n- What changes do you make to your graphs to increase accessibility?\n - What changes _could_ you make going forward?\n\n:::\n\n\n### Presentation vs. Publication\n\nOne final element of accessibility to consider is the difference between a '_presentation_-quality' graph and a '_publication_-quality' one. While it may be tempting to create a single version of a given graph and use it in both contexts that is likely to be less effective in helping you to get your point across than making small tweaks to two separate versions of what is otherwise the same graph.\n\n:::{.panel-tabset}\n### Presentation-Focused\n\n**Do:**\n\n- Increase size of text/points **greatly**\n - If possible, sit in the back row of the room where you'll present and look at your graphs from there\n- _Consider_ adding graph elements that highlight certain graph regions\n- Present summarized data (increases focus on big-picture trends and avoids discussion of minutiae)\n- Map multiple aesthetics to the same variables\n\n**Don't:**\n\n- Use technical language / jargon\n- Include _unnecessary_ background elements\n- Use multi-panel graphs (either faceted or plot grid)\n - If you have multiple graph panels, put each on its own slide!\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nggplot(crab_summary, aes(x = latitude, y = mean, \n shape = reorder(site, latitude), # <1>\n fill = reorder(site, latitude))) +\n geom_vline(xintercept = 36.5, color = \"black\", linetype = 1) +\n geom_vline(xintercept = 41.5, color = \"black\", 
linetype = 2) + # <2>\n geom_errorbar(mapping = aes(ymax = mean + std_error, ymin = mean - std_error),\n width = 0.2) +\n geom_point(size = 4) + \n scale_shape_manual(values = c(21:25, 21:25, 21:23)) +\n labs(x = \"Latitude\", y = \"Mean Crab Size (mm)\") + # <3>\n theme(legend.title = element_blank(),\n axis.line = element_line(color = \"black\"),\n panel.background = element_blank(),\n axis.title = element_text(size = 17),\n axis.text = element_text(size = 15))\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/talk-graph-1.png){fig-align='center' width=864}\n:::\n:::\n\n1. We can use the `reorder` function to make the order of sites in the legend (from top to bottom) match the order of sites in the graph (from left to right)\n2. Adding vertical lines at particular parts in the graph can make comparisons within the same graph easier\n3. `labs` lets us customize the title and label text of a graph\n\n### Publication-Focused\n\n**Do:**\n\n- Increase size of text/points **slightly**\n - You want to be legible but you can more safely assume that many readers will be able to increase the zoom of their browser window if needed\n- Present un-summarized data (with or without summarized points included)\n - Many reviewers will want to get a sense for the \"real\" data so you should include unsummarized values wherever possible\n- Use multi-panel graphs\n - If multiple graphs \"tell a story\" together, then they should be included in the same file!\n- Map multiple aesthetics to the same variables\n- If publishing in a journal available in print, check to make sure your graph still makes sense in grayscale\n - There are nice browser plug-ins (like [Grayscale the Web](https://chromewebstore.google.com/detail/grayscale-the-web-save-si/mblmpdpfppogibmoobibfannckeeleag) for Google Chrome) for this too\n\n**Don't:**\n\n- Include _unnecessary_ background elements\n- Add graph elements that highlight certain graph regions\n - You can--and should--lean more 
heavily on the text of your publication to discuss particular areas of a graph\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nggplot() +\n geom_point(pie_crab, mapping = aes(x = latitude, y = size,\n color = reorder(site, latitude)),\n pch = 19, size = 1, alpha = 0.3) +\n geom_errorbar(crab_summary, mapping = aes(x = latitude, y = mean, \n ymax = mean + std_error, \n ymin = mean - std_error),\n width = 0.2) +\n geom_point(crab_summary, mapping = aes(x = latitude, y = mean, \n shape = reorder(site, latitude),\n fill = reorder(site, latitude)),\n size = 4) +\n scale_shape_manual(values = c(21:25, 21:25, 21:23)) +\n labs(x = \"Latitude\", y = \"Mean Crab Carapace Width (mm)\") + # <1>\n theme(legend.title = element_blank(),\n axis.line = element_line(color = \"black\"),\n panel.background = element_blank(),\n axis.title = element_text(size = 15),\n axis.text = element_text(size = 13))\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/pub-graph-1.png){fig-align='center' width=864}\n:::\n:::\n\n1. Here we are using a reasonable amount of technical language\n\n:::\n\n## Ordination\n\nIf you are working with multivariate data (i.e., data where multiple columns are all response variables collectively) you may find ordination helpful. Ordination is the general term for many types of multivariate visualization but typically is used to refer to visualizing a distance or dissimilarity measure of the data. Such measures collapse all of those columns of response variables into fewer (typically two) index values that are easier to visualize.\n\nThis is a common approach particularly in answering questions in community ecology or considering a suite of traits (e.g., life history, landscape, etc.) together. 
While the math behind reducing the dimensionality of your data is interesting, this module is focused on only the visualization facet of ordination so we'll avoid deeper discussion of the internal mechanics that underpin ordination.\n\nIn order to demonstrate two types of ordination we'll use a lichen community composition dataset included in the `vegan` package. However, ordination approaches are most often used on data with multiple groups so we'll need to make a simulated grouping column to divide the lichen community data.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load library\nlibrary(vegan)\n\n# Grab data\nutils::data(\"varespec\", package = \"vegan\")\n\n# Create a faux group column\ntreatment <- c(rep.int(\"Treatment A\", nrow(varespec) / 2),\n rep.int(\"Treatment B\", nrow(varespec) / 2))\n\n# Combine into one dataframe\nlichen_df <- cbind(treatment, varespec)\n\n# Check structure of first few columns\nstr(lichen_df[1:5])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n'data.frame':\t24 obs. of 5 variables:\n $ treatment: chr \"Treatment A\" \"Treatment A\" \"Treatment A\" \"Treatment A\" ...\n $ Callvulg : num 0.55 0.67 0.1 0 0 ...\n $ Empenigr : num 11.13 0.17 1.55 15.13 12.68 ...\n $ Rhodtome : num 0 0 0 2.42 0 0 1.55 0 0.35 0.07 ...\n $ Vaccmyrt : num 0 0.35 0 5.92 0 ...\n```\n\n\n:::\n:::\n\n\n:::{.panel-tabset}\n\n### Metric Ordination\n\nMetric ordinations are typically used when you are concerned with retaining quantitative differences among particular points, _even after you've collapsed many response variables into just one or two_. For example, this is a common approach if you have a table of traits and want to compare the whole set of traits among groups while still being able to interpret the effect of a particular effect on the whole.\n\nTwo of the more common methods for metric ordination are Principal Components Analysis (PCA), and Principal Coordinates Analysis (PCoA / \"metric multidimensional scaling\"). 
The primary difference is that PCA works on the data directly while PCoA works on a distance matrix of the data. We'll use PCoA in this example because it is a closer analog to the non-metric ordination discussed in the other tab. **If the holistic difference among groups is of interest,** (rather than metric point-to-point comparisons), **consider a non-metric ordination approach.**\n\nIn order to perform a PCoA ordination we first need to get a distance matrix of our response variables and then we can actually do the PCoA step. The distance matrix can be calculated with the `vegdist` function from the `vegan` package and the `pcoa` function in the `ape` package can do the actual PCoA.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load needed libraries\nlibrary(vegan); library(ape)\n\n# Get distance matrix\nlichen_dist <- vegan::vegdist(x = lichen_df[-1], method = \"kulczynski\") # <1>\n\n# Do PCoA\npcoa_points <- ape::pcoa(D = lichen_dist)\n```\n:::\n\n1. The `method` argument requires a distance/dissimilarity measure. Note that **if you use a non-metric measure** (e.g., Bray Curtis, etc.) **you lose many of the advantages conferred by using a metric ordination approach**.\n\nWith that in hand, we can make our ordination! While you could make this step-by-step on your own, we'll use the `ordination` function from the `supportR` package for convenience. This function automatically uses colorblind safe colors for up to 10 groups and has some useful base plot defaults (as well as including ellipses around the standard deviation of the centroid of all groups).\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Load the library\nlibrary(supportR)\n\n# Make the ordination\nsupportR::ordination(mod = pcoa_points, grps = lichen_df$treatment, \n x = \"topleft\", legend = c(\"A\", \"B\")) #<1>\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/pcoa-ord-1.png){fig-align='center' width=672}\n:::\n:::\n\n1. 
This function allows several base plot arguments to be supplied to alter non-critical plot elements (e.g., legend position, point size, etc.)\n\nThe percentages included in parentheses on either axis label are the percent of the total variation in the data explained by each axis on its own. Use this information in combination with what the graph looks like to determine how different the groups truly are.\n\n\n### Non-Metric Ordination\n\nNon-metric ordinations are typically used when you care more about the relative differences among groups rather than specific measurements between particular points. For instance, you may want to assess whether the composition of insect communities differs between two experimental treatments. In such a case, your hypothesis likely depends more on the holistic difference between the treatments rather than some quantitative difference on one of the axes.\n\nThe most common non-metric ordination type is called Nonmetric Multidimensional Scaling (NMS / NMDS). This approach prioritizes making groups that are \"more different\" further apart than those that are less different. However, NMS uses a dissimilarity matrix which means that the _distance_ between any two specific points cannot be interpreted meaningfully. It _is_ appropriate though to interpret which cloud of points is closer to/further from another in aggregate. **If specific distances among points are of interest, consider a metric ordination approach.**\n\nIn order to perform an NMS ordination we'll first need to calculate a dissimilarity matrix for our response data. The vegan function `metaMDS` is useful for this. 
This function has many arguments but the most fundamental are the following:\n\n- `comm` = the dataframe of response variables (minus any non-numeric / grouping columns)\n- `distance` = the distance/dissimilarity metric to use\n - Note that there is no benefit to using a metric distance because when we make the ordination it will become non-metric\n- `k` = number of axes to decompose to -- typically two so the graph can be simple\n- `try` = number of attempts at minimizing \"stress\"\n - Stress is how NMS evaluates how good of a job it did at representing the true differences among groups (lower stress is better)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load needed libraries\nlibrary(vegan)\n\n# Get dissimilarity matrix\ndissim_mat <- vegan::metaMDS(comm = lichen_df[-1], distance = \"bray\", k = 2,\n autotransform = F, expand = F, try = 50)\n```\n:::\n\n\nWith that in hand, we can make our ordination! While you could make this step-by-step on your own, we'll use the `ordination` function from the `supportR` package for convenience. This function automatically uses colorblind safe colors for up to 10 groups and has some useful base plot defaults (as well as including ellipses around the standard deviation of the centroid of all groups).\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Load the library\nlibrary(supportR)\n\n# Make the ordination\nsupportR::ordination(mod = dissim_mat, grps = lichen_df$treatment, \n x = \"bottomright\", legend = c(\"A\", \"B\")) #<1>\n```\n\n::: {.cell-output-display}\n![](mod_data-viz_files/figure-html/nms-ord-1.png){fig-align='center' width=672}\n:::\n:::\n\n1. This function allows several base plot arguments to be supplied to alter non-critical plot elements (e.g., legend position, point size, etc.)\n\nIf the stress is less than 0.15 it is generally considered a good representation of the data. 
We can see that the ellipses do not overlap which indicates that the community composition of our two groups does seem to differ. We'd need to do real multivariate analysis if we wanted a _p_-value or AIC score to support that but as a visual tool this is still useful.\n\n:::\n\n## Maps\n\nYou may find it valuable to create a map as an additional way of visualizing data. Many synthesis groups do this--particularly when there is a strong spatial component to the research questions and/or hypotheses.\n\nCheck out the [bonus spatial data module](https://lter.github.io/ssecr/mod_spatial.html) for more information on map-making if this is of interest!\n\n## Additional Resources\n\n### Papers & Documents\n\n- NCEAS [Colorblind Safe Color Schemes](https://www.nceas.ucsb.edu/sites/default/files/2022-06/Colorblind%20Safe%20Color%20Schemes.pdf) reference document\n\n### Workshops & Courses\n\n- NCEAS Scientific Computing team's Coding in the Tidyverse workshop [`ggplot2` module](https://nceas.github.io/scicomp-workshop-tidyverse/visualize.html)\n- The Carpentries' Data Analysis and Visualization in R for Ecologists [`ggplot2` episode](https://datacarpentry.org/R-ecology-lesson/04-visualization-ggplot2.html)\n\n\n### Websites\n\n- \n", "supporting": [ "mod_data-viz_files" ], diff --git a/_freeze/mod_data-viz/figure-html/multi-modal-1.png b/_freeze/mod_data-viz/figure-html/multi-modal-1.png index eb21c0b..a59b281 100644 Binary files a/_freeze/mod_data-viz/figure-html/multi-modal-1.png and b/_freeze/mod_data-viz/figure-html/multi-modal-1.png differ diff --git a/_freeze/mod_data-viz/figure-html/nms-ord-1.png b/_freeze/mod_data-viz/figure-html/nms-ord-1.png index 2d8cd80..d060de6 100644 Binary files a/_freeze/mod_data-viz/figure-html/nms-ord-1.png and b/_freeze/mod_data-viz/figure-html/nms-ord-1.png differ diff --git a/_freeze/mod_data-viz/figure-html/pcoa-ord-1.png b/_freeze/mod_data-viz/figure-html/pcoa-ord-1.png new file mode 100644 index 0000000..3335516 Binary files 
/dev/null and b/_freeze/mod_data-viz/figure-html/pcoa-ord-1.png differ diff --git a/mod_data-viz.qmd b/mod_data-viz.qmd index 8a9a742..3e054d4 100644 --- a/mod_data-viz.qmd +++ b/mod_data-viz.qmd @@ -468,10 +468,44 @@ str(lichen_df[1:5]) ### Metric Ordination +Metric ordinations are typically used when you are concerned with retaining quantitative differences among particular points, _even after you've collapsed many response variables into just one or two_. For example, this is a common approach if you have a table of traits and want to compare the whole set of traits among groups while still being able to interpret the effect of a particular trait on the whole. +Two of the more common methods for metric ordination are Principal Components Analysis (PCA) and Principal Coordinates Analysis (PCoA / "metric multidimensional scaling"). The primary difference is that PCA works on the data directly while PCoA works on a distance matrix of the data. We'll use PCoA in this example because it is a closer analog to the non-metric ordination discussed in the other tab. **If the holistic difference among groups is of interest** (rather than metric point-to-point comparisons), **consider a non-metric ordination approach.** +In order to perform a PCoA ordination we first need to get a distance matrix of our response variables and then we can actually do the PCoA step. The distance matrix can be calculated with the `vegdist` function from the `vegan` package and the `pcoa` function in the `ape` package can do the actual PCoA. -Common examples of this include Principal Components Analysis (PCA), , or Principal Coordinates Analysis (PCoA / "metric multidimensional scaling"). +```{r pcoa-prep} +#| message: false +#| output: false + +# Load needed libraries +library(vegan); library(ape) + +# Get distance matrix +lichen_dist <- vegan::vegdist(x = lichen_df[-1], method = "kulczynski") # <1> + +# Do PCoA +pcoa_points <- ape::pcoa(D = lichen_dist) +``` +1. 
The `method` argument requires a distance/dissimilarity measure. Note that **if you use a non-metric measure** (e.g., Bray-Curtis) **you lose many of the advantages conferred by using a metric ordination approach**. + +With that in hand, we can make our ordination! While you could make this step-by-step on your own, we'll use the `ordination` function from the `supportR` package for convenience. This function automatically uses colorblind safe colors for up to 10 groups and has some useful base plot defaults (as well as including ellipses around the standard deviation of the centroid of all groups). + +```{r pcoa-ord} +#| fig-align: center +#| fig-width: 7 +#| fig-height: 5 + +# Load the library +library(supportR) + +# Make the ordination +supportR::ordination(mod = pcoa_points, grps = lichen_df$treatment, + x = "topleft", legend = c("A", "B")) #<1> +``` +1. This function allows several base plot arguments to be supplied to alter non-critical plot elements (e.g., legend position, point size, etc.) + +The percentages in parentheses in each axis label are the percent of the total variation in the data explained by that axis on its own. Use this information in combination with what the graph looks like to determine how different the groups truly are. ### Non-Metric Ordination @@ -493,12 +527,15 @@ In order to perform an NMS ordination we'll first need to calculate a dissimilar #| message: false #| output: false +# Load needed libraries +library(vegan) + # Get dissimilarity matrix dissim_mat <- vegan::metaMDS(comm = lichen_df[-1], distance = "bray", k = 2, autotransform = F, expand = F, try = 50) ``` -With that in hand, we can make our ordination! While you could make this step-by-step on your own, we'll use the `ordination` function from the `supportR` package for convenience. 
This function automatically uses colorblind safe colors for up to 10 groups and has some useful base plot defaults (as well as including ellipses around the standard deviation of the centorid of all groups) +With that in hand, we can make our ordination! While you could make this step-by-step on your own, we'll use the `ordination` function from the `supportR` package for convenience. This function automatically uses colorblind safe colors for up to 10 groups and has some useful base plot defaults (as well as including ellipses around the standard deviation of the centroid of all groups). ```{r nms-ord} #| fig-align: center