gapminder-ggplot2-univariate-quantitative.r

Jenny Bryan Mon Oct 3 23:49:33 2016

Note: this is rendered by applying knitr::spin() to an R script, so the narrative is very minimal.

load the data and ggplot2 (part of the tidyverse)

library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr

## Conflicts with tidy packages ----------------------------------------------

## filter(): dplyr, stats
## lag():    dplyr, stats
library(gapminder)
gapminder
## # A tibble: 1,704 × 6
##        country continent  year lifeExp      pop gdpPercap
##         <fctr>    <fctr> <int>   <dbl>    <int>     <dbl>
## 1  Afghanistan      Asia  1952  28.801  8425333  779.4453
## 2  Afghanistan      Asia  1957  30.332  9240934  820.8530
## 3  Afghanistan      Asia  1962  31.997 10267083  853.1007
## 4  Afghanistan      Asia  1967  34.020 11537966  836.1971
## 5  Afghanistan      Asia  1972  36.088 13079460  739.9811
## 6  Afghanistan      Asia  1977  38.438 14880372  786.1134
## 7  Afghanistan      Asia  1982  39.854 12881816  978.0114
## 8  Afghanistan      Asia  1987  40.822 13867957  852.3959
## 9  Afghanistan      Asia  1992  41.674 16317921  649.3414
## 10 Afghanistan      Asia  1997  41.763 22227415  635.3414
## # ... with 1,694 more rows

distribution of a quantitative variable: histogram

ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

experiment with bin width; think in terms of the units of the x variable

ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(binwidth = 1)
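
Since lifeExp is measured in years, a binwidth is also a number of years. Here is a minimal sketch for comparing a few widths; the widths 0.5 and 5 and the object name p are arbitrary choices for illustration, not part of the original script.

```r
library(ggplot2)
library(gapminder)

# lifeExp is in years, so check its range before picking a width
range(gapminder$lifeExp)

# build the base plot once, then try several bin widths (in years)
p <- ggplot(gapminder, aes(x = lifeExp))
p + geom_histogram(binwidth = 0.5)  # narrow bins: more detail, more noise
p + geom_histogram(binwidth = 5)    # wide bins: smoother, coarser
```

Setting binwidth explicitly also silences the `stat_bin()` message about the default of 30 bins.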

show the different continents, but it's weird to stack up the histograms, which is what the default of position = "stack" delivers

ggplot(gapminder, aes(x = lifeExp, fill = continent)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

position = "identity" is good to know about, but it's still weird to layer them on top of each other like this

ggplot(gapminder, aes(x = lifeExp, fill = continent)) +
  geom_histogram(position = "identity")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
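
With position = "identity" the bars for one continent can completely hide another. One possible mitigation, sketched here with an assumed alpha of 0.4 and binwidth of 2 (both illustrative values, as is the name h), is to make the bars semi-transparent:

```r
library(ggplot2)
library(gapminder)

# overlaid (not stacked) histograms; alpha lets the layers underneath show through
h <- ggplot(gapminder, aes(x = lifeExp, fill = continent)) +
  geom_histogram(position = "identity", alpha = 0.4, binwidth = 2)
h
```

Five overlapping translucent fills are still hard to read, so this is a partial fix at best.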

geom_freqpoly() is better in this case

ggplot(gapminder, aes(x = lifeExp, color = continent)) +
  geom_freqpoly()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

smooth histogram = density plot

ggplot(gapminder, aes(x = lifeExp)) + geom_density()

you should look at different levels of smoothing

ggplot(gapminder, aes(x = lifeExp)) + geom_density(adjust = 1)

ggplot(gapminder, aes(x = lifeExp)) + geom_density(adjust = 0.2)
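
The adjust argument multiplies the default bandwidth, so values below 1 give a wigglier curve and values above 1 a smoother one. Here is a small sketch, assuming the default bandwidth rule (`bw.nrd0()` from the stats package); the name bw_default and the value adjust = 2 are illustrative choices.

```r
library(ggplot2)
library(gapminder)

# default bandwidth, in years of life expectancy
bw_default <- bw.nrd0(gapminder$lifeExp)
bw_default

# base R's density() reports the bandwidth after the adjustment is applied,
# i.e. 0.2 * bw_default here
density(gapminder$lifeExp, adjust = 0.2)$bw

# smoother than the default
ggplot(gapminder, aes(x = lifeExp)) + geom_density(adjust = 2)
```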

density plots work better in terms of one continent not obscuring another

ggplot(gapminder, aes(x = lifeExp, color = continent)) + geom_density()

alpha transparency works here too

ggplot(gapminder, aes(x = lifeExp, fill = continent)) +
  geom_density(alpha = 0.2)

Oceania has only two countries, so maybe we should ignore it?

ggplot(subset(gapminder, continent != "Oceania"),
       aes(x = lifeExp, fill = continent)) + geom_density(alpha = 0.2)
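
The claim about Oceania can be checked directly by counting the distinct countries in each continent. This is a base R sketch; the name n_countries is introduced here for illustration.

```r
library(gapminder)

# number of distinct countries in each continent
n_countries <- tapply(gapminder$country, gapminder$continent,
                      function(x) length(unique(x)))
n_countries
```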

facets work here too

ggplot(gapminder, aes(x = lifeExp)) + geom_density() + facet_wrap(~ continent)

ggplot(subset(gapminder, continent != "Oceania"),
       aes(x = lifeExp, fill = continent)) + geom_histogram() +
  facet_grid(continent ~ .)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

boxplot for one quantitative variable against a discrete variable; the first attempt does not work since year is not formally a factor

ggplot(gapminder, aes(x = year, y = lifeExp)) + geom_boxplot()
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?

by explicitly specifying year as the grouping variable, we get what we want

ggplot(gapminder, aes(x = year, y = lifeExp)) + geom_boxplot(aes(group = year))
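
An alternative sketch, not in the original script: convert year to a factor inside aes(), which makes the x aesthetic discrete so that geom_boxplot() draws one box per year without an explicit group. The object name b is illustrative.

```r
library(ggplot2)
library(gapminder)

# treating year as a factor gives one box per year without aes(group = ...)
b <- ggplot(gapminder, aes(x = factor(year), y = lifeExp)) +
  geom_boxplot()
b
```

The trade-off is that the x axis becomes discrete, so the years are placed as evenly spaced categories rather than on a numeric scale.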

try geom_violin() instead, and just generally goofing off now

ggplot(gapminder, aes(x = year, y = lifeExp)) +
  geom_violin(aes(group = year)) +
  geom_jitter(alpha = 1/4) +
  geom_smooth(se = FALSE)