Commit 35e4cfb

Add files via upload
Uploaded answer key (script + graphics outputs).
1 parent a760333 commit 35e4cfb

17 files changed

+1499
-0
lines changed
+87
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,87 @@
1+
---
2+
title: "Very basics of R coding ANSWER KEY"
3+
author: "Chenxin Li"
4+
date: "01/06/2023"
5+
output:
6+
html_notebook:
7+
number_sections: yes
8+
toc: yes
9+
toc_float: yes
10+
---
11+
12+
```{r setup, include=FALSE}
13+
knitr::opts_chunk$set(echo = TRUE)
14+
```
15+
16+
# Packages
17+
```{r}
18+
library(dplyr)
19+
```
20+
21+
# Exercise
22+
23+
Today you have learned some basic syntax of R.
24+
Now it's time for you to practice.
25+
26+
## Q1:
27+
28+
1. Insert a new code chunk
29+
2. Make this matrix
30+
31+
| 1 | 1 | 2 | 2 |
32+
| 2 | 2 | 1 | 2 |
33+
| 2 | 3 | 3 | 4 |
34+
| 1 | 2 | 3 | 4 |
35+
36+
and save it as an item called `my_mat2`.
37+
```{r}
38+
my_mat2 <- rbind(
39+
c(1, 1, 2, 2),
40+
c(2, 2, 1, 2),
41+
c(2, 3, 3, 4),
42+
c(1, 2, 3, 4)
43+
)
44+
45+
my_mat2
46+
```
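
As a cross-check on the answer above, here is a minimal sketch of another way to build the same matrix, assuming you prefer `matrix()` with `byrow = TRUE` over `rbind()`. The names `my_mat2_alt` and `my_mat2_rbind` are made up for this illustration.

```{r}
# Fill a 4 x 4 matrix row by row from one long vector
my_mat2_alt <- matrix(
  c(1, 1, 2, 2,
    2, 2, 1, 2,
    2, 3, 3, 4,
    1, 2, 3, 4),
  nrow = 4,
  byrow = TRUE  # without this, R fills column by column
)

# Rebuild the rbind() version so this chunk runs on its own
my_mat2_rbind <- rbind(
  c(1, 1, 2, 2),
  c(2, 2, 1, 2),
  c(2, 3, 3, 4),
  c(1, 2, 3, 4)
)

# Element-wise comparison of the two constructions
all(my_mat2_alt == my_mat2_rbind)
```

Forgetting `byrow = TRUE` is a common slip: the matrix then comes out transposed relative to the table in the question.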
47+
48+
49+
3. Select the 1st and 3rd rows and the 1st, 2nd and 4th columns, and save it as an item.
50+
```{r}
51+
item <- my_mat2[c(1,3), c(1,2,4)]
52+
item
53+
```
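
The same selection can also be written by excluding what you do not want. This is only a sketch of negative indexing; `item_neg` is a name invented for this illustration.

```{r}
# Rebuild the matrix so this chunk runs on its own
my_mat2 <- rbind(
  c(1, 1, 2, 2),
  c(2, 2, 1, 2),
  c(2, 3, 3, 4),
  c(1, 2, 3, 4)
)

# Positive indexing: keep rows 1 and 3, columns 1, 2 and 4
item <- my_mat2[c(1, 3), c(1, 2, 4)]

# Negative indexing: drop rows 2 and 4, drop column 3
item_neg <- my_mat2[-c(2, 4), -3]

# Both approaches select the same 2 x 3 block
identical(item, item_neg)
dim(item)
```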
54+
55+
4. Take the square root of each member of `my_mat2`, then take `log2()`, and lastly find the maximum value.
56+
Use the pipe syntax. The command for maximum is `max()`.
57+
58+
```{r}
59+
my_mat2 %>%
60+
sqrt() %>%
61+
log2() %>%
62+
max()
63+
```
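
To see what the pipe is doing, it can help to compare it with the nested form of the same calculation. A small sketch, with `piped` and `nested` as names chosen for this illustration:

```{r}
library(dplyr)  # provides the %>% pipe

# Rebuild the matrix so this chunk runs on its own
my_mat2 <- rbind(
  c(1, 1, 2, 2),
  c(2, 2, 1, 2),
  c(2, 3, 3, 4),
  c(1, 2, 3, 4)
)

# Pipe version: read left to right, top to bottom
piped <- my_mat2 %>%
  sqrt() %>%
  log2() %>%
  max()

# Nested version: read from the inside out
nested <- max(log2(sqrt(my_mat2)))

# The largest entry is 4, and log2(sqrt(4)) = log2(2) = 1
piped
nested
```

Both forms give the same value; the pipe simply passes the result of each step as the first argument of the next.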
64+
65+
## Q2:
66+
67+
1. Use the following info to make a data frame and save it as an item called "grade".
68+
Adel got 85 on the exam, Bren got 83, and Cecil got 93.
69+
Their letter grades are B, B, and A, respectively.
70+
(Hint: How many columns do you have to have?)
71+
72+
```{r}
73+
grade <- data.frame(
74+
name = c("Adel", "Bren", "Cecil"),
75+
score = c(85, 83, 93),
76+
letters = c("B", "B", "A")
77+
)
78+
79+
grade
80+
```
81+
82+
2. Pull out the column with the scores.
83+
Use the `$` syntax.
84+
```{r}
85+
grade$score
86+
```
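
For comparison, here is a sketch of two other common ways to get the same column as a vector: double brackets from base R and `pull()` from dplyr. The names `by_dollar`, `by_bracket`, and `by_pull` are made up for this illustration.

```{r}
library(dplyr)

# Rebuild the data frame so this chunk runs on its own
grade <- data.frame(
  name = c("Adel", "Bren", "Cecil"),
  score = c(85, 83, 93),
  letters = c("B", "B", "A")
)

by_dollar <- grade$score               # the `$` syntax asked for in the question
by_bracket <- grade[["score"]]         # base R, column name as a string
by_pull <- grade %>% pull(score)       # dplyr, convenient at the end of a pipe

# All three return the same numeric vector
identical(by_dollar, by_bracket)
identical(by_dollar, by_pull)
```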
87+
@@ -0,0 +1,157 @@
1+
---
2+
title: "Data Arrangement ANSWER KEY"
3+
author: "Chenxin Li"
4+
date: "01/06/2023"
5+
output:
6+
html_notebook:
7+
number_sections: yes
8+
toc: yes
9+
toc_float: yes
10+
11+
---
12+
13+
```{r setup, include=FALSE}
14+
knitr::opts_chunk$set(echo = TRUE)
15+
```
16+
17+
# Load packages
18+
```{r}
19+
library(tidyverse)
20+
library(readxl)
21+
```
22+
## Data from lecture
23+
```{r}
24+
child_mortality <- read_csv("../Data/child_mortality_0_5_year_olds_dying_per_1000_born.csv", col_types = cols())
25+
babies_per_woman <- read_csv("../Data/children_per_woman_total_fertility.csv", col_types = cols())
26+
```
27+
28+
These are two datasets downloaded from the [Gapminder foundation](https://www.gapminder.org/data/).
29+
The Gapminder foundation has datasets on life expectancy, economy, education, and population across countries and years.
30+
The goal is to remind us not only of the "gaps" between the developed and developing worlds, but also of the amazing continuous improvements in quality of life through time.
31+
32+
1. Child mortality: 0-5 year olds dying per 1000 born.
33+
2. Births per woman.
34+
35+
These were recorded from year 1800 and projected all the way to 2100.
36+
37+
Let's look at them.
38+
39+
```{r}
40+
head(child_mortality)
41+
head(babies_per_woman)
42+
```
43+
44+
45+
```{r}
46+
babies_per_woman_tidy <- babies_per_woman %>%
47+
pivot_longer(names_to = "year", values_to = "birth", cols = c(2:302))
48+
49+
head(babies_per_woman_tidy)
50+
51+
child_mortality_tidy <- child_mortality %>%
52+
pivot_longer(names_to = "year", values_to = "death_per_1000_born", cols = c(2:302))
53+
54+
head(child_mortality_tidy)
55+
```
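
The chunk above selects the year columns by position (`2:302`), which only works if the file has exactly that many columns. Below is a minimal sketch on a made-up two-country table (not the Gapminder data) that selects "everything except country" instead. It also shows that the new `year` column is character by default, and one way to convert it while pivoting, assuming a tidyr version that supports `names_transform`.

```{r}
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per country, one column per year
toy_wide <- data.frame(
  country = c("A", "B"),
  `1800` = c(6.1, 5.2),
  `1801` = c(6.0, 5.1),
  check.names = FALSE  # keep "1800" and "1801" as column names
)

# Select columns by name rather than by position
toy_long <- toy_wide %>%
  pivot_longer(cols = !country, names_to = "year", values_to = "birth")

# Column names become text, so year is a character column here
class(toy_long$year)

# Optional: convert year to integer during the pivot
toy_long_int <- toy_wide %>%
  pivot_longer(
    cols = !country,
    names_to = "year",
    values_to = "birth",
    names_transform = list(year = as.integer)
  )

class(toy_long_int$year)
```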
56+
57+
```{r}
58+
birth_and_mortality <- babies_per_woman_tidy %>%
59+
inner_join(child_mortality_tidy, by = c("country", "year"))
60+
61+
head(birth_and_mortality)
62+
```
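
An inner join keeps only the rows whose keys appear in both tables, so it can silently drop data. The sketch below uses two made-up tables (not the Gapminder data) to show this, and uses `anti_join()` to list what was dropped.

```{r}
library(dplyr)

# Hypothetical toy tables with partly overlapping (country, year) pairs
births <- data.frame(
  country = c("A", "A", "B"),
  year = c("1800", "1801", "1800"),
  birth = c(6.1, 6.0, 5.2)
)

deaths <- data.frame(
  country = c("A", "B", "C"),
  year = c("1800", "1800", "1800"),
  death_per_1000_born = c(400, 350, 300)
)

# Keeps only the (country, year) pairs present in both tables
joined <- births %>%
  inner_join(deaths, by = c("country", "year"))

# Rows of births that found no partner in deaths
unmatched <- births %>%
  anti_join(deaths, by = c("country", "year"))

joined
unmatched
```

Checking `nrow()` before and after a join is a cheap way to notice unexpected losses.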
63+
64+
# Exercise
65+
66+
You have learned data arrangement! Let's do an exercise to practice what
67+
you have learned today.
68+
As an example, this time we will use the income per person dataset from the Gapminder foundation.
69+
70+
```{r}
71+
income <- read_csv("../Data/income_per_person_gdppercapita_ppp_inflation_adjusted.csv", col_types = cols())
72+
head(income)
73+
```
74+
75+
## Tidy data
76+
Is this a tidy data frame?
77+
NO!
78+
79+
Make it a tidy data frame using this code chunk.
80+
```{r}
81+
income_tidy <- income %>%
82+
pivot_longer(names_to = "year", values_to = "income", cols = !country)
83+
84+
head(income_tidy)
85+
```
86+
87+
## Joining data
88+
89+
Combine the income data with the births per woman and child mortality data using this code chunk.
90+
Name the new data frame "birth_and_mortality_and_income".
91+
92+
```{r}
93+
birth_and_mortality_and_income <- income_tidy %>%
94+
inner_join(babies_per_woman_tidy, by = c("country", "year")) %>%
95+
inner_join(child_mortality_tidy, by = c("country", "year"))
96+
97+
head(birth_and_mortality_and_income)
98+
```
99+
100+
101+
## Filtering data
102+
103+
Filter the data to keep only Bangladesh and Sweden, in years 1945 (when WWII ended) and 2010.
104+
Name the new data frame BS_1945_2010.
105+
How have income, births per woman, and child mortality rate changed during this 65-year period?
106+
107+
```{r}
108+
BS_1945_2010 <- birth_and_mortality_and_income %>%
109+
filter(country == "Bangladesh" |
110+
country == "Sweden") %>%
111+
filter(year == 1945 |
112+
year == 2010)
113+
114+
115+
head(BS_1945_2010)
116+
```
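
When the list of values to keep grows, chaining `|` gets long. Here is a sketch of the same kind of filter written with `%in%`, on a made-up table (not the Gapminder data). The years are stored as text here because `pivot_longer()` produces a character `year` column.

```{r}
library(dplyr)

# Hypothetical toy table
toy <- data.frame(
  country = c("Bangladesh", "Sweden", "Norway", "Sweden"),
  year = c("1945", "2010", "1945", "1900"),
  income = c(1, 2, 3, 4)
)

# Version with | as in the answer key
with_or <- toy %>%
  filter(country == "Bangladesh" |
           country == "Sweden") %>%
  filter(year == "1945" |
           year == "2010")

# Version with %in%: one condition per column
with_in <- toy %>%
  filter(
    country %in% c("Bangladesh", "Sweden"),
    year %in% c("1945", "2010")
  )

# Both keep the same rows
identical(with_or, with_in)
```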
117+
118+
119+
## Mutate data
120+
121+
Let's say countries with income between 1000 and 10,000 dollars per year are called "fed".
122+
Countries with income above 10,000 dollars per year are called "wealthy".
123+
Countries with income below 1000 dollars per year are called "poor".
124+
125+
Use this info to make a new column called "status".
126+
Hint: you will have to use case_when() and the "&" logic somewhere in this chunk.
127+
128+
```{r}
129+
birth_and_mortality_and_income <- birth_and_mortality_and_income %>%
130+
mutate(status = case_when(
131+
income >= 1000 & income <= 10000 ~ "fed",
132+
income > 10000 ~ "wealthy",
133+
income < 1000 ~ "poor"
134+
))
135+
136+
head(birth_and_mortality_and_income)
137+
```
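
A quick way to check a `case_when()` rule is to run it on a few hand-picked values, including the boundaries and a missing value. The sketch below does this on a made-up income column (not the Gapminder data).

```{r}
library(dplyr)

# Hypothetical incomes: below, on, and above each cutoff, plus a missing value
toy <- data.frame(income = c(500, 1000, 5000, 10000, 20000, NA))

toy <- toy %>%
  mutate(status = case_when(
    income >= 1000 & income <= 10000 ~ "fed",
    income > 10000 ~ "wealthy",
    income < 1000 ~ "poor"
  ))

# 1000 and 10000 both fall in "fed"; the NA income matches no rule and stays NA
toy
```

Rows that match none of the conditions get `NA`, so missing incomes stay unlabeled rather than being forced into a group.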
138+
139+
## Summarise the data
140+
141+
Let's look at the average child mortality and its standard deviation (sd) in year 2010,
142+
across countries grouped by the status that we just defined.
143+
Name the new data frame "child_mortality_summary_2010".
144+
145+
```{r}
146+
child_mortality_summary_2010 <- birth_and_mortality_and_income %>%
147+
filter(year == 2010) %>%
148+
group_by(status) %>%
149+
summarize(
150+
avg = mean(death_per_1000_born),
151+
sd = sd(death_per_1000_born))
152+
153+
head(child_mortality_summary_2010)
154+
```
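
If any country has a missing mortality value, `mean()` and `sd()` return `NA` for its whole group. Whether that happens in the Gapminder files is not checked here. The sketch below uses a made-up table to show the effect and one common remedy, `na.rm = TRUE`, along with a count of the values actually used.

```{r}
library(dplyr)

# Hypothetical toy table with one missing value in the "poor" group
toy <- data.frame(
  status = c("poor", "poor", "fed", "fed", "wealthy"),
  death_per_1000_born = c(100, NA, 40, 60, 5)
)

toy_summary <- toy %>%
  group_by(status) %>%
  summarize(
    avg = mean(death_per_1000_born, na.rm = TRUE),
    sd = sd(death_per_1000_born, na.rm = TRUE),
    n = sum(!is.na(death_per_1000_born))  # non-missing values per group
  )

toy_summary
```

Groups with a single non-missing value still get `NA` for `sd`, because a standard deviation needs at least two observations.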
155+
156+
How does child mortality compare across income groups in year 2010?
157+
Child mortality is higher for lower income groups.
@@ -0,0 +1,83 @@
1+
---
2+
title: "Intro_to_data_vis ANSWER KEY"
3+
author: "Chenxin Li"
4+
date: "2023-01-06"
5+
output:
6+
html_notebook:
7+
number_sections: yes
8+
toc: yes
9+
toc_float: yes
10+
---
11+
12+
```{r setup, include=FALSE}
13+
knitr::opts_chunk$set(echo = TRUE)
14+
```
15+
16+
17+
# Required packages
18+
```{r}
19+
library(tidyverse)
20+
library(RColorBrewer)
21+
```
22+
23+
# Data from lecture
24+
```{r}
25+
child_mortality <- read_csv("../Data/child_mortality_0_5_year_olds_dying_per_1000_born.csv", col_types = cols())
26+
babies_per_woman <- read_csv("../Data/children_per_woman_total_fertility.csv", col_types = cols())
27+
income <- read_csv("../Data/income_per_person_gdppercapita_ppp_inflation_adjusted.csv", col_types = cols())
28+
```
29+
30+
Re-shape data into tidy format.
31+
```{r}
32+
babies_per_woman_tidy <- babies_per_woman %>%
33+
pivot_longer(names_to = "year", values_to = "birth", cols = c(2:302))
34+
35+
child_mortality_tidy <- child_mortality %>%
36+
pivot_longer(names_to = "year", values_to = "death_per_1000_born", cols = c(2:302))
37+
38+
income_tidy <- income %>%
39+
pivot_longer(names_to = "year", values_to = "income", cols = c(2:242))
40+
```
41+
42+
Join them together.
43+
```{r}
44+
example2_data <- babies_per_woman_tidy %>%
45+
inner_join(child_mortality_tidy, by = c("country", "year")) %>%
46+
inner_join(income_tidy, by = c("country", "year"))
47+
48+
head(example2_data)
49+
```
50+
51+
# Exercise
52+
Graph income (on a log10 scale) on the x axis and child mortality on the y axis, colored by children per woman, for year 2010.
53+
Was the trend similar to that of year 1945?
54+
Save the graph using `ggsave()`.
55+
56+
57+
```{r}
58+
example2_data %>%
59+
filter(year == 2010) %>%
60+
ggplot(aes(x = log10(income), y = death_per_1000_born)) +
61+
geom_point(aes(color = birth)) +
62+
scale_color_gradientn(colours = brewer.pal(9, "YlGnBu")) +
63+
labs(x = "log10 income",
64+
y = "death per 1000 born",
65+
title = "2010") +
66+
theme_classic()
67+
68+
ggsave("Lesson3_answer1.png", width = 3, height = 3)
69+
```
70+
```{r}
71+
example2_data %>%
72+
filter(year == 1945) %>%
73+
ggplot(aes(x = log10(income), y = death_per_1000_born)) +
74+
geom_point(aes(color = birth)) +
75+
scale_color_gradientn(colours = brewer.pal(9, "YlGnBu")) +
76+
labs(x = "log10 income",
77+
y = "death per 1000 born",
78+
title = "1945") +
79+
theme_classic()
80+
81+
ggsave("Lesson3_answer2.png", width = 3, height = 3)
82+
```
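
The answers above transform income inside `aes()` with `log10(income)`, so the axis shows exponents. As a sketch of an alternative, `scale_x_log10()` keeps the axis labels in the original units while spacing them on a log scale. The data below are made up for illustration (not the Gapminder data), and `p` is a name chosen for this example.

```{r}
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(
  income = c(500, 2000, 10000, 40000),
  death_per_1000_born = c(150, 80, 30, 5),
  birth = c(6, 4, 2.5, 1.8)
)

# Store the plot in an object so it can be printed or saved explicitly
p <- ggplot(toy, aes(x = income, y = death_per_1000_born)) +
  geom_point(aes(color = birth)) +
  scale_x_log10() +  # log10 spacing, labels in original income units
  labs(x = "income (log10 scale)",
       y = "death per 1000 born") +
  theme_classic()

p
```

Storing the plot in an object also lets you pass it to `ggsave()` through its `plot` argument instead of relying on the last plot displayed.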
83+
