Commit

Forgot to fix a facet_grid
VectorPosse committed Jan 16, 2025
1 parent 14d6f5b commit 5a065bb
Showing 5 changed files with 11 additions and 9 deletions.
4 changes: 2 additions & 2 deletions 04-numerical_data-web.qmd
@@ -4,7 +4,7 @@

### Functions introduced in this chapter

-`mean`, `sd`, `var`, `median`, `sort`, `IQR`, `quantile`, `summary`, `min`, `max`, `geom_histogram`, `geom_point`, `geom_boxplot`, `facet_grid`
+`mean`, `sd`, `var`, `median`, `sort`, `IQR`, `quantile`, `unname`, `summary`, `min`, `max`, `geom_histogram`, `geom_point`, `geom_boxplot`, `facet_grid`

:::

@@ -584,7 +584,7 @@ The other thing that kind of sucks is the fact that the y-axis is showing counts
ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 250, boundary = 3500) +
-  facet_grid(species ~ .)
+  facet_grid(rows = vars(species))
```

Due to some technical issues in `ggplot2`, these are not strictly proportions. (If you were to add up the heights of all the bars, they would not add up to 100%.) Nevertheless, the graph is still useful because it does scale the groups to put them on equal footing. In other words, it treats each group as if they all had the same sample size.
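
One way to read the remark above about the bar heights not adding up to 100%: density scaling divides each bin count by the group's total count times the bin width, so it is the bar areas, not the bar heights, that sum to 1. The sketch below checks that arithmetic in base R. The `mass` vector is made-up illustration data, not the penguins data, and the break points are chosen only to cover it.

```r
# Hypothetical body masses in grams for one group (not the real penguins data).
mass <- c(3250, 3400, 3600, 3650, 3800, 3900, 4100, 4300)

binwidth <- 250
breaks <- seq(3000, 4500, by = binwidth)

# Count observations per bin, using right-closed bins.
counts <- as.numeric(table(cut(mass, breaks = breaks, include.lowest = TRUE)))

# Density scaling: divide each count by (total count * bin width).
density <- counts / (sum(counts) * binwidth)

sum(density)             # heights alone: equals 1 / binwidth, not 1
sum(density * binwidth)  # bar areas: these sum to 1
```

Because every group is rescaled to total area 1, groups of different sizes end up on the same footing, which is the point the paragraph makes.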
5 changes: 3 additions & 2 deletions chapter_downloads/04-numerical_data.qmd
@@ -14,7 +14,8 @@ format:

### Functions introduced in this chapter

-`mean`, `sd`, `var`, `median`, `sort`, `IQR`, `quantile`, `summary`, `min`, `max`, `geom_histogram`, `geom_point`, `geom_boxplot`, `facet_grid`
+`mean`, `sd`, `var`, `median`, `sort`, `IQR`, `quantile`, `unname`, `summary`, `min`, `max`, `geom_histogram`, `geom_point`, `geom_boxplot`, `facet_grid`
+

:::

@@ -594,7 +595,7 @@ The other thing that kind of sucks is the fact that the y-axis is showing counts
ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 250, boundary = 3500) +
-  facet_grid(species ~ .)
+  facet_grid(rows = vars(species))
```

Due to some technical issues in `ggplot2`, these are not strictly proportions. (If you were to add up the heights of all the bars, they would not add up to 100%.) Nevertheless, the graph is still useful because it does scale the groups to put them on equal footing. In other words, it treats each group as if they all had the same sample size.
4 changes: 2 additions & 2 deletions docs/04-numerical_data-web.html
@@ -300,7 +300,7 @@ <h1 class="title"><span class="chapter-number">4</span>&nbsp; <span class="chapt
</div>
</div>
<div class="callout-body-container callout-body">
-<p><code>mean</code>, <code>sd</code>, <code>var</code>, <code>median</code>, <code>sort</code>, <code>IQR</code>, <code>quantile</code>, <code>summary</code>, <code>min</code>, <code>max</code>, <code>geom_histogram</code>, <code>geom_point</code>, <code>geom_boxplot</code>, <code>facet_grid</code></p>
+<p><code>mean</code>, <code>sd</code>, <code>var</code>, <code>median</code>, <code>sort</code>, <code>IQR</code>, <code>quantile</code>, <code>unname</code>, <code>summary</code>, <code>min</code>, <code>max</code>, <code>geom_histogram</code>, <code>geom_point</code>, <code>geom_boxplot</code>, <code>facet_grid</code></p>
</div>
</div>
<section id="introduction" class="level2" data-number="4.1">
@@ -921,7 +921,7 @@ <h5 class="unnumbered anchored" data-anchor-id="exercise-9">Exercise 9</h5>
<div class="sourceCode cell-code" id="cb62"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb62-1"><a href="#cb62-1" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>(penguins, <span class="fu">aes</span>(<span class="at">x =</span> body_mass_g)) <span class="sc">+</span></span>
<span id="cb62-2"><a href="#cb62-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_histogram</span>(<span class="fu">aes</span>(<span class="at">y =</span> <span class="fu">after_stat</span>(density)),</span>
<span id="cb62-3"><a href="#cb62-3" aria-hidden="true" tabindex="-1"></a> <span class="at">binwidth =</span> <span class="dv">250</span>, <span class="at">boundary =</span> <span class="dv">3500</span>) <span class="sc">+</span></span>
-<span id="cb62-4"><a href="#cb62-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">facet_grid</span>(species <span class="sc">~</span> .)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<span id="cb62-4"><a href="#cb62-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">facet_grid</span>(<span class="at">rows =</span> <span class="fu">vars</span>(species))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_bin()`).</code></pre>
5 changes: 3 additions & 2 deletions docs/chapter_downloads/04-numerical_data.qmd
@@ -14,7 +14,8 @@ format:

### Functions introduced in this chapter

-`mean`, `sd`, `var`, `median`, `sort`, `IQR`, `quantile`, `summary`, `min`, `max`, `geom_histogram`, `geom_point`, `geom_boxplot`, `facet_grid`
+`mean`, `sd`, `var`, `median`, `sort`, `IQR`, `quantile`, `unname`, `summary`, `min`, `max`, `geom_histogram`, `geom_point`, `geom_boxplot`, `facet_grid`
+

:::

@@ -594,7 +595,7 @@ The other thing that kind of sucks is the fact that the y-axis is showing counts
ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 250, boundary = 3500) +
-  facet_grid(species ~ .)
+  facet_grid(rows = vars(species))
```

Due to some technical issues in `ggplot2`, these are not strictly proportions. (If you were to add up the heights of all the bars, they would not add up to 100%.) Nevertheless, the graph is still useful because it does scale the groups to put them on equal footing. In other words, it treats each group as if they all had the same sample size.
2 changes: 1 addition & 1 deletion docs/search.json
@@ -534,7 +534,7 @@
"href": "04-numerical_data-web.html#graphing-grouped-numerical-data",
"title": "4  Numerical data",
"section": "4.6 Graphing grouped numerical data",
"text": "4.6 Graphing grouped numerical data\nSuppose you want to analyze one numerical variable and one categorical variable. Usually, the idea here is that the categorical variable divides up the data into groups and you are interested in understanding the numerical variable for each group separately. Another way to say this is that your numerical variable is response and your categorical variable is predictor. (It is also possible for a categorical variable to be response and a numerical variable to be predictor. This is common in so-called “classification” problems. We will not cover this possibility in this course, but it is covered in more advanced courses.)\nThis turns out to be exactly what we need in the penguins data. Throughout the above exercises, there was a concern that the penguin measurements are fundamentally different among the three different species of penguin.\nGraphically, there are two good options here. The first is a side-by-side boxplot.\n\nggplot(penguins, aes(y = body_mass_g, x = species)) +\n geom_boxplot()\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_boxplot()`).\n\n\n\n\n\n\n\n\n\nNotice the placement of the variables. The y-axis is body_mass_g, the numerical variable. The x-axis variable is species; the groups are placed along the x-axis. This is consistent with other graph types that place the response variable on the y-axis and the predictor variable on the x-axis.\nThe other possible graph is a stacked histogram. This uses a feature called “faceting” that creates a different plot for each group. The new ggplot command is called facet_grid. The only slightly unusual syntax you need to know is that the predictor variable has to be inside vars() as in the following code chunk:\n\nggplot(penguins, aes(x = body_mass_g)) +\n geom_histogram() +\n facet_grid(rows = vars(species))\n\n`stat_bin()` using `bins = 30`. 
Pick better value with `binwidth`.\n\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_bin()`).\n\n\n\n\n\n\n\n\n\nAs always, the default bins suck, so let’s change them.\n\nggplot(penguins, aes(x = body_mass_g)) +\n geom_histogram(binwidth = 250, boundary = 3500) +\n facet_grid(rows = vars(species))\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_bin()`).\n\n\n\n\n\n\n\n\n\nNotice that we specified rows in the facet_grid function. What if we had specified columns instead?\n\nggplot(penguins, aes(x = body_mass_g)) +\n geom_histogram(binwidth = 250, boundary = 3500) +\n facet_grid(cols = vars(species))\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_bin()`).\n\n\n\n\n\n\n\n\n\n\nExercise 9\nExplain why that last graph (which might be called a side-by-side histogram) is less effective than the earlier stacked histogram. (Hint: which variable can you line up with your eyes when the histograms are stacked vertically rather than horizontally?)\n\nPlease write up your answer here.\n\n\nThe other thing that kind of sucks is the fact that the y-axis is showing counts. That makes it harder to see the distribution of body mass among Chinstrap penguins, for example, as there are fewer of them in the data set. It would be nice to scale these using percentages.\n\nggplot(penguins, aes(x = body_mass_g)) +\n geom_histogram(aes(y = after_stat(density)),\n binwidth = 250, boundary = 3500) +\n facet_grid(species ~ .)\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_bin()`).\n\n\n\n\n\n\n\n\n\nDue to some technical issues in ggplot2, these are not strictly proportions. (If you were to add up the heights of all the bars, they would not add up to 100%.) Nevertheless, the graph is still useful because it does scale the groups to put them on equal footing. 
In other words, it treats each group as if they all had the same sample size.\n\n\nExercise 10\nChoose a numerical variable that’s not body mass and a categorical variable that’s not species from the penguins data set. Make both a side-by-side boxplot and a stacked histogram. Discuss the resulting graphs. Comment on the association (or independence) of the two variables. If there is an association, be sure to focus on the four key features (linearity, direction, strength, and outliers).\n\n\n# Add code here to create a side-by-side boxplot.\n\n\n# Add code here to create a stacked histogram.\n\nPlease write up your answer here.",
"text": "4.6 Graphing grouped numerical data\nSuppose you want to analyze one numerical variable and one categorical variable. Usually, the idea here is that the categorical variable divides up the data into groups and you are interested in understanding the numerical variable for each group separately. Another way to say this is that your numerical variable is response and your categorical variable is predictor. (It is also possible for a categorical variable to be response and a numerical variable to be predictor. This is common in so-called “classification” problems. We will not cover this possibility in this course, but it is covered in more advanced courses.)\nThis turns out to be exactly what we need in the penguins data. Throughout the above exercises, there was a concern that the penguin measurements are fundamentally different among the three different species of penguin.\nGraphically, there are two good options here. The first is a side-by-side boxplot.\n\nggplot(penguins, aes(y = body_mass_g, x = species)) +\n geom_boxplot()\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_boxplot()`).\n\n\n\n\n\n\n\n\n\nNotice the placement of the variables. The y-axis is body_mass_g, the numerical variable. The x-axis variable is species; the groups are placed along the x-axis. This is consistent with other graph types that place the response variable on the y-axis and the predictor variable on the x-axis.\nThe other possible graph is a stacked histogram. This uses a feature called “faceting” that creates a different plot for each group. The new ggplot command is called facet_grid. The only slightly unusual syntax you need to know is that the predictor variable has to be inside vars() as in the following code chunk:\n\nggplot(penguins, aes(x = body_mass_g)) +\n geom_histogram() +\n facet_grid(rows = vars(species))\n\n`stat_bin()` using `bins = 30`. 
Pick better value with `binwidth`.\n\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_bin()`).\n\n\n\n\n\n\n\n\n\nAs always, the default bins suck, so let’s change them.\n\nggplot(penguins, aes(x = body_mass_g)) +\n geom_histogram(binwidth = 250, boundary = 3500) +\n facet_grid(rows = vars(species))\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_bin()`).\n\n\n\n\n\n\n\n\n\nNotice that we specified rows in the facet_grid function. What if we had specified columns instead?\n\nggplot(penguins, aes(x = body_mass_g)) +\n geom_histogram(binwidth = 250, boundary = 3500) +\n facet_grid(cols = vars(species))\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_bin()`).\n\n\n\n\n\n\n\n\n\n\nExercise 9\nExplain why that last graph (which might be called a side-by-side histogram) is less effective than the earlier stacked histogram. (Hint: which variable can you line up with your eyes when the histograms are stacked vertically rather than horizontally?)\n\nPlease write up your answer here.\n\n\nThe other thing that kind of sucks is the fact that the y-axis is showing counts. That makes it harder to see the distribution of body mass among Chinstrap penguins, for example, as there are fewer of them in the data set. It would be nice to scale these using percentages.\n\nggplot(penguins, aes(x = body_mass_g)) +\n geom_histogram(aes(y = after_stat(density)),\n binwidth = 250, boundary = 3500) +\n facet_grid(rows = vars(species))\n\nWarning: Removed 2 rows containing non-finite outside the scale range\n(`stat_bin()`).\n\n\n\n\n\n\n\n\n\nDue to some technical issues in ggplot2, these are not strictly proportions. (If you were to add up the heights of all the bars, they would not add up to 100%.) Nevertheless, the graph is still useful because it does scale the groups to put them on equal footing. 
In other words, it treats each group as if they all had the same sample size.\n\n\nExercise 10\nChoose a numerical variable that’s not body mass and a categorical variable that’s not species from the penguins data set. Make both a side-by-side boxplot and a stacked histogram. Discuss the resulting graphs. Comment on the association (or independence) of the two variables. If there is an association, be sure to focus on the four key features (linearity, direction, strength, and outliers).\n\n\n# Add code here to create a side-by-side boxplot.\n\n\n# Add code here to create a stacked histogram.\n\nPlease write up your answer here.",
"crumbs": [
"<span class='chapter-number'>4</span>  <span class='chapter-title'>Numerical data</span>"
]
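
For readers comparing the two spellings this commit swaps, here is a minimal sketch that builds the same faceted histogram both ways. It assumes only that `ggplot2` is installed and uses the `mpg` data set that ships with it as a stand-in for `penguins`; the variable `drv` plays the role of `species`, and the object names are hypothetical.

```r
library(ggplot2)

# mpg ships with ggplot2; drv stands in for a grouping variable like species.
base_plot <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)

# Formula interface (the form this commit replaces).
p_formula <- base_plot + facet_grid(drv ~ .)

# rows/vars() interface (the form this commit switches to).
p_vars <- base_plot + facet_grid(rows = vars(drv))
```

Printing `p_formula` and `p_vars` should draw the same stacked panels, one row per group. The change appears to be about matching the `rows = vars(...)` spelling the chapter teaches elsewhere rather than about changing the graph.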