Download package for learners.
hlapp committed Sep 24, 2015
0 parents commit b28cd50
Showing 9 changed files with 2,152 additions and 0 deletions.
10 changes: 10 additions & 0 deletions files/lit-prog/README.md
@@ -0,0 +1,10 @@
# Files for the Literate Programming lesson

- `countryPick4.Rmd`: Rmarkdown file demonstrating various features of
literate programming with R.
- `countryPick4.pdf`: the PDF generated from the Rmarkdown file of the
same base name.
- `gapminderDataFiveYear.tsv`: the cleaned and subset version of the
Gapminder dataset available from the [gapminder R package].

[gapminder R package]: http://github.com/jennybc/gapminder
189 changes: 189 additions & 0 deletions files/lit-prog/countryPick4.Rmd
@@ -0,0 +1,189 @@
---
title: "Pick four - comparing trends in population over time"
output: pdf_document
---

## Purpose

The purpose of this report is to compare the population trends for four countries of your choosing. In addition, this serves as an example of literate programming. Literate programming is a way to document how you performed your analysis. It serves as a guide for others (and your future self) on how to reproduce your work.

## Required Libraries
```{r}
library(ggplot2)
```

## Data

Always add as many details as possible about your data including where it came from, how it was processed, licensing, and where it can be accessed.

- Gapminder data [available here](http://www.gapminder.org/data/). [Gapminder data is licensed CC-BY 3.0](https://docs.google.com/document/pub?id=1POd-pBMc5vDXAmxrpGjPLaCSDSWuxX6FLQgq5DhlUhM#h.ul2gu2-uwathz).

- Processed data via [@jennybc](https://github.com/jennybc), [R package available here](https://github.com/jennybc/gapminder). The [data-raw](https://github.com/jennybc/gapminder/tree/master/data-raw) sub-directory reveals the journey from Gapminder.org's Excel workbooks to increasingly clean and tidy data.

**Read in data**: To read in the data, make sure this file is in the same directory/folder as the `gapminderDataFiveYear.tsv` file. To set the proper working directory in RStudio, go to Session > Set Working Directory > To Source File Location.

```{r}
gapMinder <- read.delim("gapminderDataFiveYear.tsv")
### Check data
head(gapMinder) #First 6 rows of dataset (the default for head)
dim(gapMinder) #number of rows and columns in data set
```
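
**Notes**: `read.delim()` is `read.table()` with defaults suited to tab-separated files (`sep = "\t"`, `header = TRUE`). As a minimal sketch of what the chunk above does, the chunk below writes a tiny table to a temporary file and reads it back. The names `toyData`, `toyFile`, and `toyRead` and all values in the table are made up for illustration and are not part of the Gapminder data.

```{r}
# Hypothetical two-row table, used only to illustrate read.delim()
toyData <- data.frame(country = c("A", "B"),
                      year = c(1952, 1957),
                      pop = c(100, 200))
toyFile <- tempfile(fileext = ".tsv")
write.table(toyData, toyFile, sep = "\t", row.names = FALSE, quote = FALSE)

toyRead <- read.delim(toyFile) # header = TRUE and sep = "\t" by default
str(toyRead)                   # column names and types
dim(toyRead)                   # number of rows, then number of columns
unlink(toyFile)                # remove the temporary file
```

Running `str()` on the real `gapMinder` object in the same way is a quick check that each column was read with the type you expect.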

You can see which countries are available by listing the unique values in the country column of the gapMinder dataset.

```{r, results='hide'}
# unique() works whether country was read as a factor or as character
unique(gapMinder$country)
```

### Pick Four Countries

Now pick four countries that you are interested in. Just replace the names below with the names of your countries.

```{r}
countryName1 <- "India"
countryName2 <- "United States"
countryName3 <- "Nigeria"
countryName4 <- "Germany"
```
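
**Notes**: A misspelled country name makes `subset()` return zero rows, so it can help to check the names before plotting. The chunk below is a minimal sketch of such a check. The helper `missingCountries` and the vector `exampleAvailable` are hypothetical names introduced here, and the example uses made-up input rather than the real data.

```{r}
# Hypothetical helper: returns the names in `chosen` that are not in `available`
missingCountries <- function(chosen, available) {
  setdiff(chosen, as.character(available))
}

# Toy example; "Germny" is deliberately misspelled
exampleAvailable <- c("India", "Nigeria", "Germany", "United States")
missingCountries(c("India", "Germny"), exampleAvailable)
```

With the report data, the same check could be run as `missingCountries(c(countryName1, countryName2, countryName3, countryName4), unique(gapMinder$country))`. An empty result means every name was found.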

## Individual countries

### Country One

We want to look at how population changes over time for the first country.

```{r}
country1 <- subset(gapMinder, country == countryName1)
ggplot(country1, aes(year, pop)) +
  geom_path() +
  ggtitle(countryName1) +
  theme(plot.title = element_text(size = 15, face = "bold"))
```

This second graph shows the relationship between life expectancy (lifeExp) and GDP per person (gdpPercap). The size of each circle on the plot represents total population.

```{r}
ggplot(country1, aes(gdpPercap, lifeExp, size = pop)) +
  geom_point() +
  ggtitle(countryName1) +
  theme(plot.title = element_text(size = 15, face = "bold"))
```

### Country Two

We will do this for each country. Since the code is very similar, we
will hide it in the output below by adding the chunk option `echo=FALSE`
(`TRUE` is the default):

```{r, echo=FALSE}
country2 <- subset(gapMinder, country == countryName2)
ggplot(country2, aes(year, pop)) +
  geom_path() +
  ggtitle(countryName2) +
  theme(plot.title = element_text(size = 15, face = "bold"))
```
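
**Notes**: Because the plotting code is repeated for every country, one option is to wrap it in a function. The chunk below is a minimal sketch of that idea, not how this report is actually built. The function `plotPopTrend` and the data frame `toyPop` are hypothetical names, and the toy values are made up.

```{r}
library(ggplot2)

# Hypothetical helper wrapping the population plot used for each country
plotPopTrend <- function(data, name) {
  oneCountry <- subset(data, country == name)
  ggplot(oneCountry, aes(year, pop)) +
    geom_path() +
    ggtitle(name) +
    theme(plot.title = element_text(size = 15, face = "bold"))
}

# Toy data with made-up values, only to show the call
toyPop <- data.frame(country = rep(c("A", "B"), each = 3),
                     year = rep(c(1952, 1957, 1962), times = 2),
                     pop = c(10, 12, 15, 20, 19, 25))
toyPlot <- plotPopTrend(toyPop, "A")
toyPlot
```

With the report data, a call such as `plotPopTrend(gapMinder, countryName2)` would draw the same kind of plot as the hidden chunk above.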

**Notes**: In a real report you can add information about the results of the analysis you are performing. That way your code, analysis, questions, and results are all in one place.

```{r, echo = FALSE}
ggplot(country2, aes(gdpPercap, lifeExp, size = pop)) +
  geom_point() +
  ggtitle(countryName2) +
  theme(plot.title = element_text(size = 15, face = "bold"))
```

### Country Three

```{r, echo=FALSE}
country3 <- subset(gapMinder, country == countryName3)
ggplot(country3, aes(year, pop)) +
  geom_path() +
  ggtitle(countryName3) +
  theme(plot.title = element_text(size = 15, face = "bold"))
```

**Notes**: Maybe a country has an unusual distribution and we want to label each point with its year. We added `label = year` to the first line of the code below. To display the text we also added the `geom_text(hjust = 1.3, vjust = 0, size = 3)` layer.

```{r}
ggplot(country3, aes(gdpPercap, lifeExp, size = pop, label = year)) +
  geom_point() +
  geom_text(hjust = 1.3, vjust = 0, size = 3) +
  ggtitle(countryName3) +
  theme(plot.title = element_text(size = 15, face = "bold"))
```

### Country Four

```{r, echo=FALSE}
country4 <- subset(gapMinder, country == countryName4)
ggplot(country4, aes(year, pop)) +
  geom_path() +
  ggtitle(countryName4) +
  theme(plot.title = element_text(size = 15, face = "bold"))
```

**Notes**: Or we can show the year by mapping it to color instead of adding text labels.

```{r}
ggplot(country4, aes(gdpPercap, lifeExp, size = pop, color = year)) +
  geom_point() +
  ggtitle(countryName4) +
  theme(plot.title = element_text(size = 15, face = "bold"))
```

## All four countries

Let's combine all four countries to see how they compare.

```{r}
#Add subsetted data together
allCountries <- rbind(country1, country2, country3, country4)
#Notice the code for this is similar to when we are just looking at one country
#just with the added color option
ggplot(allCountries, aes(year, pop, color = country)) +
  geom_path() +
  xlab("Year") + ylab("Population Size") +
  ggtitle("All four countries") +
  theme(plot.title = element_text(lineheight = .8, face = "bold"))
```
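
**Notes**: Combining the per-country subsets with `rbind()` works, and the same rows can also be selected in one step with `%in%`. The chunk below is a minimal sketch comparing the two approaches on a toy data frame. The names `toyAll`, `picked`, `oneStep`, and `stepwise` are hypothetical, and the values are made up.

```{r}
# Toy data with made-up values
toyAll <- data.frame(country = rep(c("A", "B", "C"), each = 2),
                     year = rep(c(1952, 1957), times = 3),
                     pop = c(10, 12, 20, 19, 30, 33))
picked <- c("A", "C") # hypothetical selection

# One-step subset with %in%
oneStep <- subset(toyAll, country %in% picked)

# Binding the per-country subsets together, as in the chunk above
stepwise <- rbind(subset(toyAll, country == "A"),
                  subset(toyAll, country == "C"))

# Both approaches select the same population values here
identical(oneStep$pop, stepwise$pop)
```

Note that `%in%` keeps the rows in the order they appear in the data, while `rbind()` keeps the order in which the subsets are listed, so the row order can differ when the two orders do not match.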

What about what is occurring in a particular year? You can change the year by changing the value assigned to `yr` in the code below. To see which years are available, use `unique(allCountries$year)`.

```{r}
yr <- 2007
ggplot(subset(allCountries, year == yr),
       aes(x = gdpPercap, y = lifeExp, color = country, size = pop)) +
  scale_x_log10(limits = c(500, 90000)) +
  geom_point(alpha = 0.8) +
  scale_size_area(max_size = 14) +
  theme_bw() + # dark-on-light theme with a white background
  xlab("GDP per capita") + ylab("Life Expectancy") +
  ggtitle(paste("All 4 countries in", yr)) +
  theme(plot.title = element_text(size = 15, face = "bold"))
```
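
**Notes**: If the chosen year is not in the data, `subset()` returns zero rows and the plot has no points. The chunk below is a minimal sketch of checking a year first. The names `toyYears`, `availableYears`, and `exampleYr` are hypothetical, and the values are made up.

```{r}
# Toy year column with made-up values
toyYears <- c(1952, 1957, 1952, 1962, 1957)
availableYears <- sort(unique(toyYears))
availableYears

# Hypothetical check before plotting a single year
exampleYr <- 1960
exampleYr %in% availableYears # FALSE, because 1960 is not in toyYears
```

With the report data, the equivalent check would be `yr %in% unique(allCountries$year)`.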

You can also plot all the years at once!

```{r}
ggplot(allCountries,
       aes(x = gdpPercap, y = lifeExp, color = country, size = pop)) +
  scale_x_log10(limits = c(500, 90000)) +
  ylim(c(30, 90)) +
  geom_point(alpha = 0.8) +
  scale_size_area(max_size = 14) +
  theme_bw() + # dark-on-light theme with a white background
  xlab("GDP per capita") + ylab("Life Expectancy") +
  ggtitle("All 4 countries") +
  theme(plot.title = element_text(size = 15, face = "bold"))
```


## Conclusions

In a real report you can add conclusions about your analysis or future plans for the project. The best part is that if you want to change something in your report you don't have to redo every step by hand. You can just make the change and regenerate the report.
Binary file added files/lit-prog/countryPick4.pdf
Binary file not shown.