Forgot to fix a facet_grid

VectorPosse · Jan 16, 2025 · 5a065bb
1 parent 14d6f5b

Showing 5 changed files with 11 additions and 9 deletions.
diff --git a/04-numerical_data-web.qmd b/04-numerical_data-web.qmd
@@ -4,7 +4,7 @@
 
 ### Functions introduced in this chapter
 
-`mean`, `sd`, `var`, `median`, `sort`, `IQR`, `quantile`, `summary`, `min`, `max`, `geom_histogram`, `geom_point`, `geom_boxplot`, `facet_grid`
+`mean`, `sd`, `var`, `median`, `sort`, `IQR`, `quantile`, `unname`, `summary`, `min`, `max`, `geom_histogram`, `geom_point`, `geom_boxplot`, `facet_grid`
 
 :::
 
@@ -584,7 +584,7 @@ The other thing that kind of sucks is the fact that the y-axis is showing counts
 ggplot(penguins, aes(x = body_mass_g)) +
   geom_histogram(aes(y = after_stat(density)),
                  binwidth = 250, boundary = 3500) +
-  facet_grid(species ~ .)
+  facet_grid(rows = vars(species))
 ```
 
 Due to some technical issues in `ggplot2`, these are not strictly proportions. (If you were to add up the heights of all the bars, they would not add up to 100%.) Nevertheless, the graph is still useful because it does scale the groups to put them on equal footing. In other words, it treats each group as if they all had the same sample size.
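
Note on the context paragraph above: the bar heights fail to add up to 100% because a density is a bin's proportion divided by the bin width, so it is the bar areas, not the heights, that total 1. The sketch below checks this with base R's `hist()` rather than `ggplot2`. The synthetic data, the seed, and the variable names are assumptions made for illustration and do not come from the chapter.

```r
# Synthetic body masses in grams (made-up values, not the penguins data).
set.seed(1)
mass <- rnorm(200, mean = 4200, sd = 800)

# Bins of width 250, matching the binwidth used in the chapter's histogram.
bin_width <- 250
breaks <- seq(floor(min(mass) / bin_width) * bin_width,
              ceiling(max(mass) / bin_width) * bin_width,
              by = bin_width)
h <- hist(mass, breaks = breaks, plot = FALSE)

# Each density is (proportion in bin) / (bin width), so the heights are tiny.
height_total <- sum(h$density)

# Height times width recovers the proportion in each bin; these total 1.
area_total <- sum(h$density * bin_width)

print(height_total)
print(area_total)
```

Multiplying each height by the bin width (250 here) gives per-bin proportions, which is one way to recover true percentages if they are wanted.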

diff --git a/chapter_downloads/04-numerical_data.qmd b/chapter_downloads/04-numerical_data.qmd
@@ -14,7 +14,8 @@ format:
 
 ### Functions introduced in this chapter
 
-`mean`, `sd`, `var`, `median`, `sort`, `IQR`, `quantile`, `summary`, `min`, `max`, `geom_histogram`, `geom_point`, `geom_boxplot`, `facet_grid`
+`mean`, `sd`, `var`, `median`, `sort`, `IQR`, `quantile`, `unname`, `summary`, `min`, `max`, `geom_histogram`, `geom_point`, `geom_boxplot`, `facet_grid`
+
 
 :::
 
@@ -594,7 +595,7 @@ The other thing that kind of sucks is the fact that the y-axis is showing counts
 ggplot(penguins, aes(x = body_mass_g)) +
   geom_histogram(aes(y = after_stat(density)),
                  binwidth = 250, boundary = 3500) +
-  facet_grid(species ~ .)
+  facet_grid(rows = vars(species))
 ```
 
 Due to some technical issues in `ggplot2`, these are not strictly proportions. (If you were to add up the heights of all the bars, they would not add up to 100%.) Nevertheless, the graph is still useful because it does scale the groups to put them on equal footing. In other words, it treats each group as if they all had the same sample size.
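
Note on the change in this hunk: `facet_grid(species ~ .)` and `facet_grid(rows = vars(species))` should describe the same layout, so the edit is about consistent syntax rather than a different plot. The sketch below compares the two forms. It assumes `ggplot2` is installed and uses the `mpg` data set bundled with `ggplot2` as a stand-in for the penguins data; the object names are hypothetical.

```r
library(ggplot2)

# mpg ships with ggplot2; it stands in for the penguins data here.
base_plot <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)

# Formula interface: row variable left of ~, a dot meaning "no columns".
p_formula <- base_plot + facet_grid(drv ~ .)

# vars() interface, as used after this commit.
p_vars <- base_plot + facet_grid(rows = vars(drv))

# Extract the panel layout that each plot builds.
layout_formula <- ggplot_build(p_formula)$layout$layout
layout_vars <- ggplot_build(p_vars)$layout$layout

# Compare which facet value lands in which row and column.
same_rows <- identical(layout_formula$ROW, layout_vars$ROW)
same_cols <- identical(layout_formula$COL, layout_vars$COL)
same_groups <- identical(as.character(layout_formula$drv),
                         as.character(layout_vars$drv))
print(c(same_rows, same_cols, same_groups))
```

The `vars()` form names the facet dimension explicitly, which matches the `rows = vars(species)` and `cols = vars(species)` examples elsewhere in the chapter.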

diff --git a/docs/04-numerical_data-web.html b/docs/04-numerical_data-web.html
@@ -300,7 +300,7 @@ <h1 class="title"><span class="chapter-number">4</span>&nbsp; <span class="chapt
 </div>
 </div>
 <div class="callout-body-container callout-body">
-<p><code>mean</code>, <code>sd</code>, <code>var</code>, <code>median</code>, <code>sort</code>, <code>IQR</code>, <code>quantile</code>, <code>summary</code>, <code>min</code>, <code>max</code>, <code>geom_histogram</code>, <code>geom_point</code>, <code>geom_boxplot</code>, <code>facet_grid</code></p>
+<p><code>mean</code>, <code>sd</code>, <code>var</code>, <code>median</code>, <code>sort</code>, <code>IQR</code>, <code>quantile</code>, <code>unname</code>, <code>summary</code>, <code>min</code>, <code>max</code>, <code>geom_histogram</code>, <code>geom_point</code>, <code>geom_boxplot</code>, <code>facet_grid</code></p>
 </div>
 </div>
 <section id="introduction" class="level2" data-number="4.1">
@@ -921,7 +921,7 @@ <h5 class="unnumbered anchored" data-anchor-id="exercise-9">Exercise 9</h5>
 <div class="sourceCode cell-code" id="cb62"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb62-1"><a href="#cb62-1" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>(penguins, <span class="fu">aes</span>(<span class="at">x =</span> body_mass_g)) <span class="sc">+</span></span>
 <span id="cb62-2"><a href="#cb62-2" aria-hidden="true" tabindex="-1"></a>  <span class="fu">geom_histogram</span>(<span class="fu">aes</span>(<span class="at">y =</span> <span class="fu">after_stat</span>(density)),</span>
 <span id="cb62-3"><a href="#cb62-3" aria-hidden="true" tabindex="-1"></a>                 <span class="at">binwidth =</span> <span class="dv">250</span>, <span class="at">boundary =</span> <span class="dv">3500</span>) <span class="sc">+</span></span>
-<span id="cb62-4"><a href="#cb62-4" aria-hidden="true" tabindex="-1"></a>  <span class="fu">facet_grid</span>(species <span class="sc">~</span> .)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<span id="cb62-4"><a href="#cb62-4" aria-hidden="true" tabindex="-1"></a>  <span class="fu">facet_grid</span>(<span class="at">rows =</span> <span class="fu">vars</span>(species))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-stderr">
 <pre><code>Warning: Removed 2 rows containing non-finite outside the scale range
 (`stat_bin()`).</code></pre>

diff --git a/docs/chapter_downloads/04-numerical_data.qmd b/docs/chapter_downloads/04-numerical_data.qmd
@@ -14,7 +14,8 @@ format:
 
 ### Functions introduced in this chapter
 
-`mean`, `sd`, `var`, `median`, `sort`, `IQR`, `quantile`, `summary`, `min`, `max`, `geom_histogram`, `geom_point`, `geom_boxplot`, `facet_grid`
+`mean`, `sd`, `var`, `median`, `sort`, `IQR`, `quantile`, `unname`, `summary`, `min`, `max`, `geom_histogram`, `geom_point`, `geom_boxplot`, `facet_grid`
+
 
 :::
 
@@ -594,7 +595,7 @@ The other thing that kind of sucks is the fact that the y-axis is showing counts
 ggplot(penguins, aes(x = body_mass_g)) +
   geom_histogram(aes(y = after_stat(density)),
                  binwidth = 250, boundary = 3500) +
-  facet_grid(species ~ .)
+  facet_grid(rows = vars(species))
 ```
 
 Due to some technical issues in `ggplot2`, these are not strictly proportions. (If you were to add up the heights of all the bars, they would not add up to 100%.) Nevertheless, the graph is still useful because it does scale the groups to put them on equal footing. In other words, it treats each group as if they all had the same sample size.

diff --git a/docs/search.json b/docs/search.json
@@ -534,7 +534,7 @@
     "href": "04-numerical_data-web.html#graphing-grouped-numerical-data",
     "title": "4  Numerical data",
     "section": "4.6 Graphing grouped numerical data",
-    "text": "4.6 Graphing grouped numerical data\nSuppose you want to analyze one numerical variable and one categorical variable. Usually, the idea here is that the categorical variable divides up the data into groups and you are interested in understanding the numerical variable for each group separately. Another way to say this is that your numerical variable is response and your categorical variable is predictor. (It is also possible for a categorical variable to be response and a numerical variable to be predictor. This is common in so-called “classification” problems. We will not cover this possibility in this course, but it is covered in more advanced courses.)\nThis turns out to be exactly what we need in the penguins data. Throughout the above exercises, there was a concern that the penguin measurements are fundamentally different among the three different species of penguin.\nGraphically, there are two good options here. The first is a side-by-side boxplot.\n\nggplot(penguins, aes(y = body_mass_g, x = species)) +\n  geom_boxplot()\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_boxplot()`).\n\n\n\n\n\n\n\n\n\nNotice the placement of the variables. The y-axis is body_mass_g, the numerical variable. The x-axis variable is species; the groups are placed along the x-axis. This is consistent with other graph types that place the response variable on the y-axis and the predictor variable on the x-axis.\nThe other possible graph is a stacked histogram. This uses a feature called “faceting” that creates a different plot for each group. The new ggplot command is called facet_grid. The only slightly unusual syntax you need to know is that the predictor variable has to be inside vars() as in the following code chunk:\n\nggplot(penguins, aes(x = body_mass_g)) +\n  geom_histogram() +\n  facet_grid(rows = vars(species))\n\n`stat_bin()` using `bins = 30`. 
Pick better value with `binwidth`.\n\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_bin()`).\n\n\n\n\n\n\n\n\n\nAs always, the default bins suck, so let’s change them.\n\nggplot(penguins, aes(x = body_mass_g)) +\n  geom_histogram(binwidth = 250, boundary = 3500) +\n  facet_grid(rows = vars(species))\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_bin()`).\n\n\n\n\n\n\n\n\n\nNotice that we specified rows in the facet_grid function. What if we had specified columns instead?\n\nggplot(penguins, aes(x = body_mass_g)) +\n  geom_histogram(binwidth = 250, boundary = 3500) +\n  facet_grid(cols = vars(species))\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_bin()`).\n\n\n\n\n\n\n\n\n\n\nExercise 9\nExplain why that last graph (which might be called a side-by-side histogram) is less effective than the earlier stacked histogram. (Hint: which variable can you line up with your eyes when the histograms are stacked vertically rather than horizontally?)\n\nPlease write up your answer here.\n\n\nThe other thing that kind of sucks is the fact that the y-axis is showing counts. That makes it harder to see the distribution of body mass among Chinstrap penguins, for example, as there are fewer of them in the data set. It would be nice to scale these using percentages.\n\nggplot(penguins, aes(x = body_mass_g)) +\n  geom_histogram(aes(y = after_stat(density)),\n                 binwidth = 250, boundary = 3500) +\n  facet_grid(species ~ .)\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_bin()`).\n\n\n\n\n\n\n\n\n\nDue to some technical issues in ggplot2, these are not strictly proportions. (If you were to add up the heights of all the bars, they would not add up to 100%.) Nevertheless, the graph is still useful because it does scale the groups to put them on equal footing. 
In other words, it treats each group as if they all had the same sample size.\n\n\nExercise 10\nChoose a numerical variable that’s not body mass and a categorical variable that’s not species from the penguins data set. Make both a side-by-side boxplot and a stacked histogram. Discuss the resulting graphs. Comment on the association (or independence) of the two variables. If there is an association, be sure to focus on the four key features (linearity, direction, strength, and outliers).\n\n\n# Add code here to create a side-by-side boxplot.\n\n\n# Add code here to create a stacked histogram.\n\nPlease write up your answer here.",
+    "text": "4.6 Graphing grouped numerical data\nSuppose you want to analyze one numerical variable and one categorical variable. Usually, the idea here is that the categorical variable divides up the data into groups and you are interested in understanding the numerical variable for each group separately. Another way to say this is that your numerical variable is response and your categorical variable is predictor. (It is also possible for a categorical variable to be response and a numerical variable to be predictor. This is common in so-called “classification” problems. We will not cover this possibility in this course, but it is covered in more advanced courses.)\nThis turns out to be exactly what we need in the penguins data. Throughout the above exercises, there was a concern that the penguin measurements are fundamentally different among the three different species of penguin.\nGraphically, there are two good options here. The first is a side-by-side boxplot.\n\nggplot(penguins, aes(y = body_mass_g, x = species)) +\n  geom_boxplot()\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_boxplot()`).\n\n\n\n\n\n\n\n\n\nNotice the placement of the variables. The y-axis is body_mass_g, the numerical variable. The x-axis variable is species; the groups are placed along the x-axis. This is consistent with other graph types that place the response variable on the y-axis and the predictor variable on the x-axis.\nThe other possible graph is a stacked histogram. This uses a feature called “faceting” that creates a different plot for each group. The new ggplot command is called facet_grid. The only slightly unusual syntax you need to know is that the predictor variable has to be inside vars() as in the following code chunk:\n\nggplot(penguins, aes(x = body_mass_g)) +\n  geom_histogram() +\n  facet_grid(rows = vars(species))\n\n`stat_bin()` using `bins = 30`. 
Pick better value with `binwidth`.\n\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_bin()`).\n\n\n\n\n\n\n\n\n\nAs always, the default bins suck, so let’s change them.\n\nggplot(penguins, aes(x = body_mass_g)) +\n  geom_histogram(binwidth = 250, boundary = 3500) +\n  facet_grid(rows = vars(species))\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_bin()`).\n\n\n\n\n\n\n\n\n\nNotice that we specified rows in the facet_grid function. What if we had specified columns instead?\n\nggplot(penguins, aes(x = body_mass_g)) +\n  geom_histogram(binwidth = 250, boundary = 3500) +\n  facet_grid(cols = vars(species))\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_bin()`).\n\n\n\n\n\n\n\n\n\n\nExercise 9\nExplain why that last graph (which might be called a side-by-side histogram) is less effective than the earlier stacked histogram. (Hint: which variable can you line up with your eyes when the histograms are stacked vertically rather than horizontally?)\n\nPlease write up your answer here.\n\n\nThe other thing that kind of sucks is the fact that the y-axis is showing counts. That makes it harder to see the distribution of body mass among Chinstrap penguins, for example, as there are fewer of them in the data set. It would be nice to scale these using percentages.\n\nggplot(penguins, aes(x = body_mass_g)) +\n  geom_histogram(aes(y = after_stat(density)),\n                 binwidth = 250, boundary = 3500) +\n  facet_grid(rows = vars(species))\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_bin()`).\n\n\n\n\n\n\n\n\n\nDue to some technical issues in ggplot2, these are not strictly proportions. (If you were to add up the heights of all the bars, they would not add up to 100%.) Nevertheless, the graph is still useful because it does scale the groups to put them on equal footing. 
In other words, it treats each group as if they all had the same sample size.\n\n\nExercise 10\nChoose a numerical variable that’s not body mass and a categorical variable that’s not species from the penguins data set. Make both a side-by-side boxplot and a stacked histogram. Discuss the resulting graphs. Comment on the association (or independence) of the two variables. If there is an association, be sure to focus on the four key features (linearity, direction, strength, and outliers).\n\n\n# Add code here to create a side-by-side boxplot.\n\n\n# Add code here to create a stacked histogram.\n\nPlease write up your answer here.",
     "crumbs": [
       "<span class='chapter-number'>4</span>  <span class='chapter-title'>Numerical data</span>"
     ]