cxli233
diff --git a/Answer_key/01_Intro_to_R_answer_key.Rmd b/Answer_key/01_Intro_to_R_answer_key.Rmd (+87)
diff --git a/Answer_key/02_Intro_to_tidy_data_answer_key.Rmd b/Answer_key/02_Intro_to_tidy_data_answer_key.Rmd (+157)
diff --git a/Answer_key/03_Intro_to_data_vis_answer_key.Rmd b/Answer_key/03_Intro_to_data_vis_answer_key.Rmd (+83)
@@ -0,0 +1,87 @@
+---
+title: "Very basics of R coding ANSWER KEY"
+author: "Chenxin Li"
+date: "01/06/2023"
+output:
+  html_notebook:
+    number_sections: yes
+    toc: yes
+    toc_float: yes
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+# Packages 
+```{r}
+library(dplyr) 
+```
+
+# Exercise
+
+Today you have learned some basic syntax of R.
+Now it's time for you to practice.
+
+## Q1:
+
+1.  Insert a new code chunk
+2.  Make this matrix
+
+|   |   |   |   |
+|---|---|---|---|
+| 1 | 1 | 2 | 2 |
+| 2 | 2 | 1 | 2 |
+| 2 | 3 | 3 | 4 |
+| 1 | 2 | 3 | 4 |
+
+and save it as an item called `my_mat2`.
+```{r}
+my_mat2 <- rbind(
+  c(1, 1, 2, 2),
+  c(2, 2, 1, 2),
+  c(2, 3, 3, 4),
+  c(1, 2, 3, 4)
+  )
+
+my_mat2
+```
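+
+As a cross-check, the same matrix can also be built with base R's `matrix()`. A minimal sketch: `matrix()` fills values column by column unless `byrow = TRUE`, so with that flag the numbers can be typed in the same row order as the table above. The name `my_mat2_alt` is just for illustration.
+
+```{r}
+# Rebuild my_mat2 with rbind() so this chunk runs on its own
+my_mat2 <- rbind(
+  c(1, 1, 2, 2),
+  c(2, 2, 1, 2),
+  c(2, 3, 3, 4),
+  c(1, 2, 3, 4)
+)
+
+# matrix() with byrow = TRUE reads the vector one row at a time
+my_mat2_alt <- matrix(c(1, 1, 2, 2,
+                        2, 2, 1, 2,
+                        2, 3, 3, 4,
+                        1, 2, 3, 4),
+                      nrow = 4, byrow = TRUE)
+
+identical(my_mat2, my_mat2_alt)
+```
+
+Without `byrow = TRUE`, the same call would return the transpose of `my_mat2`.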
+
+
+3. Select the 1st and 3rd rows and the 1st, 2nd and 4th columns, and save it as an item.
+```{r}
+item <- my_mat2[c(1,3), c(1,2,4)]   
+item
+```
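+
+The same subset can also be written by listing what to drop instead of what to keep: negative indices remove rows or columns. A minimal sketch, rebuilding `my_mat2` so the chunk runs on its own; `-c(2, 4)` drops rows 2 and 4 and `-3` drops column 3, which leaves the same cells. The name `item_neg` is hypothetical.
+
+```{r}
+# Rebuild my_mat2 so this chunk runs on its own
+my_mat2 <- rbind(c(1, 1, 2, 2), c(2, 2, 1, 2), c(2, 3, 3, 4), c(1, 2, 3, 4))
+
+# Positive indices: keep rows 1 and 3, and columns 1, 2 and 4
+item <- my_mat2[c(1, 3), c(1, 2, 4)]
+
+# Negative indices: drop rows 2 and 4, and drop column 3
+item_neg <- my_mat2[-c(2, 4), -3]
+
+item_neg
+identical(item, item_neg)
+```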
+
+4. Take the square root of each element of `my_mat2`, then take `log2()` of the result, and lastly find the maximum value.
+Use the pipe syntax. The command for maximum is `max()`.
+
+```{r}
+my_mat2 %>%
+  sqrt() %>% 
+  log2() %>% 
+  max() 
+```
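+
+The pipe `%>%` passes the result on its left as the first argument of the function on its right, so the chain above is the same as one nested call read from the inside out. A minimal sketch checking this; it rebuilds `my_mat2` so it runs on its own, and `result_pipe` and `result_nested` are illustrative names.
+
+```{r}
+library(dplyr)  # provides the %>% pipe
+
+# Rebuild my_mat2 so this chunk runs on its own
+my_mat2 <- rbind(
+  c(1, 1, 2, 2),
+  c(2, 2, 1, 2),
+  c(2, 3, 3, 4),
+  c(1, 2, 3, 4)
+)
+
+# Piped version: read left to right
+result_pipe <- my_mat2 %>% sqrt() %>% log2() %>% max()
+
+# Nested version: read from the inside out
+result_nested <- max(log2(sqrt(my_mat2)))
+
+result_pipe
+identical(result_pipe, result_nested)
+```
+
+Because `sqrt()` and `log2()` both increase with their input, the maximum comes from the largest entry, 4: `log2(sqrt(4))` is `log2(2)`, which is 1.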
+
+## Q2:
+
+1.  Use the following info to make a data frame and save it as an item called "grade".
+    Adel got 85 on the exam, Bren got 83, and Cecil got 93.
+    Their letter grades are B, B, and A, respectively.
+    (Hint: How many columns do you have to have?)
+    
+```{r}
+grade <- data.frame(
+  name = c("Adel", "Bren", "Cecil"),
+  score = c(85, 83, 93),
+  letters = c("B", "B", "A")
+)
+
+grade
+```
+
+2. Pull out the column with the scores.
+    Use the `$` syntax.
+```{r}
+grade$score
+```
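+
+`$` is one of several ways to pull a single column out of a data frame as a plain vector. A minimal sketch comparing it with `[[ ]]`, `[ , ]`, and dplyr's `pull()`; the chunk rebuilds `grade` so it runs on its own, and the `by_*` names are illustrative.
+
+```{r}
+library(dplyr)  # for pull()
+
+# Rebuild grade so this chunk runs on its own
+grade <- data.frame(
+  name = c("Adel", "Bren", "Cecil"),
+  score = c(85, 83, 93),
+  letters = c("B", "B", "A")
+)
+
+# Four ways to extract the same column as a vector
+by_dollar  <- grade$score
+by_name    <- grade[["score"]]
+by_bracket <- grade[, "score"]
+by_pull    <- pull(grade, score)
+
+identical(by_dollar, by_name)
+identical(by_dollar, by_bracket)
+identical(by_dollar, by_pull)
+```
+
+By contrast, `grade["score"]` (single brackets, no comma) returns a one-column data frame rather than a vector.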
+
@@ -0,0 +1,157 @@
+---
+title: "Data Arrangement ANSWER KEY"
+author: "Chenxin Li"
+date: "01/06/2023"
+output:
+  html_notebook:
+    number_sections: yes
+    toc: yes
+    toc_float: yes
+
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+# Load packages
+```{r}
+library(tidyverse)
+library(readxl)
+```
+
+## Data from lecture
+```{r}
+child_mortality <- read_csv("../Data/child_mortality_0_5_year_olds_dying_per_1000_born.csv", col_types = cols()) 
+babies_per_woman <- read_csv("../Data/children_per_woman_total_fertility.csv", col_types = cols()) 
+```
+
+These are two datasets downloaded from the [Gapminder foundation](https://www.gapminder.org/data/).
+The Gapminder foundation has datasets on life expectancy, economy, education, and population across countries and years.
+The goal is to remind us not only of the "gaps" between the developed and developing worlds, but also of the remarkable, continuous improvement in quality of life over time.
+
+1.  Child mortality: children aged 0 to 5 years dying per 1000 born.
+2.  Births per woman.
+
+These were recorded from year 1800 and projected all the way to 2100.
+
+Let's look at them.
+
+```{r}
+head(child_mortality)
+head(babies_per_woman)
+```
+
+
+```{r}
+babies_per_woman_tidy <- babies_per_woman %>% 
+  pivot_longer(names_to = "year", values_to = "birth", cols = c(2:302)) 
+
+head(babies_per_woman_tidy)
+
+child_mortality_tidy <- child_mortality %>% 
+  pivot_longer(names_to = "year", values_to = "death_per_1000_born", cols = c(2:302)) 
+
+head(child_mortality_tidy)
+```
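+
+`pivot_longer()` stacks many year columns into one `year` column plus one value column. A minimal sketch on a small made-up table (`toy_wide`, with hypothetical countries "A" and "B") showing two details. First, `cols = !country` selects every column except `country`, so the year range does not have to be counted by hand. Second, the new `year` column holds text, because it comes from column names; tidyr's `names_transform` argument converts it to integers. The lecture code above skips this step, and filters such as `year == 1945` still work on text because R converts the number to text before comparing.
+
+```{r}
+library(dplyr)
+library(tidyr)
+
+# Hypothetical toy data in the same wide shape as the Gapminder files:
+# one row per country, one column per year
+toy_wide <- tibble(
+  country = c("A", "B"),
+  `1800` = c(6.0, 5.5),
+  `1801` = c(5.9, 5.4)
+)
+
+# Every column except `country` is a year; stack them into two columns
+# and convert the year labels from text to integers
+toy_long <- toy_wide %>%
+  pivot_longer(cols = !country, names_to = "year", values_to = "birth",
+               names_transform = list(year = as.integer))
+
+toy_long
+```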
+
+```{r}
+birth_and_mortality <- babies_per_woman_tidy %>% 
+  inner_join(child_mortality_tidy, by = c("country", "year"))
+
+head(birth_and_mortality)
+```
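+
+`inner_join()` keeps only the country-year pairs found in both tables; rows present in just one table are silently dropped. A minimal sketch on made-up tables (`births` and `deaths`, hypothetical names and values) that uses dplyr's `anti_join()` to list the dropped rows, a quick way to check that a join did not lose more data than expected.
+
+```{r}
+library(dplyr)
+
+# Hypothetical toy tables: country "C" has births but no mortality record
+births <- tibble(country = c("A", "B", "C"), year = "2000", birth = c(2.1, 4.5, 3.0))
+deaths <- tibble(country = c("A", "B"), year = "2000", death_per_1000_born = c(5, 60))
+
+# inner_join keeps only country-year pairs present in BOTH tables
+joined <- births %>% inner_join(deaths, by = c("country", "year"))
+
+# anti_join returns the rows of `births` that found no match (the dropped ones)
+dropped <- births %>% anti_join(deaths, by = c("country", "year"))
+
+joined
+dropped
+```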
+
+# Exercise
+
+You have learned data arrangement! Let's do an exercise to practice what
+you have learned today. 
+As in the lecture examples, this time we will use the income per person dataset from the Gapminder foundation.
+
+```{r}
+income <- read_csv("../Data/income_per_person_gdppercapita_ppp_inflation_adjusted.csv", col_types = cols()) 
+head(income)
+```
+
+## Tidy data
+Is this a tidy data frame?
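+No. Each year is stored as its own column, so the variable "year" is spread across the column names instead of being held in a single column.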
+
+Make it a tidy data frame using this code chunk.
+```{r}
+income_tidy <- income %>% 
+  pivot_longer(names_to = "year", values_to = "income", cols = !country)
+
+head(income_tidy)
+```
+
+## Joining data
+
+Combine the income data with the births per woman and child mortality data using this code chunk.
+Name the new data frame "birth_and_mortality_and_income".
+
+```{r}
+birth_and_mortality_and_income <- income_tidy %>% 
+  inner_join(babies_per_woman_tidy, by = c("country", "year")) %>% 
+  inner_join(child_mortality_tidy, by = c("country", "year"))
+
+head(birth_and_mortality_and_income)
+```
+ 
+
+## Filtering data
+
+Keep only the data for Bangladesh and Sweden in the years 1945 (when WWII ended) and 2010.
+Name the new data frame `BS_1945_2010`.
+How have income, births per woman, and child mortality changed over this 65-year period?
+
+```{r}
+BS_1945_2010 <- birth_and_mortality_and_income %>%
+  filter(country == "Bangladesh" |
+           country == "Sweden") %>%
+  filter(year == 1945 |
+           year == 2010)
+
+
+head(BS_1945_2010)
+```
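+
+The two `filter()` calls above can be written more compactly with `%in%`, which is `TRUE` whenever the value on its left matches any element of the vector on its right; separate conditions given to one `filter()` call are combined with AND. A minimal sketch on a made-up table (`toy`, hypothetical, with a `row_id` column only for tracking rows) confirming that both forms keep the same rows. `year` is stored as text here, as it is after `pivot_longer()`.
+
+```{r}
+library(dplyr)
+
+# Hypothetical toy table; `year` is text, as it is after pivot_longer()
+toy <- tibble(
+  country = c("Bangladesh", "Sweden", "India", "Bangladesh", "Sweden"),
+  year    = c("1945", "1945", "1945", "2010", "2000"),
+  row_id  = 1:5
+)
+
+# Chained `==` tests joined by `|`, as in the answer above
+with_or <- toy %>%
+  filter(country == "Bangladesh" | country == "Sweden") %>%
+  filter(year == 1945 | year == 2010)
+
+# The same selection with %in%; comma-separated conditions are ANDed
+with_in <- toy %>%
+  filter(country %in% c("Bangladesh", "Sweden"),
+         year %in% c("1945", "2010"))
+
+with_in
+identical(with_or, with_in)
+```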
+ 
+
+## Mutate data
+
+Let's say countries with income between 1,000 and 10,000 dollars per year are called "fed".
+Countries with income above 10,000 dollars per year are called "wealthy".
+Countries with income below 1,000 dollars per year are called "poor".
+
+Use this info to make a new column called "status".
+Hint: you will have to use `case_when()` and the `&` logic somewhere in this chunk.
+
+```{r}
+birth_and_mortality_and_income <- birth_and_mortality_and_income %>% 
+  mutate(status = case_when(
+    income >= 1000 & income <= 10000 ~ "fed",       
+    income > 10000 ~ "wealthy",                     
+    income < 1000 ~ "poor"                        
+))
+
+head(birth_and_mortality_and_income)
+```
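+
+`case_when()` checks its conditions from top to bottom and assigns the label of the first one that is `TRUE`; a row that matches none of them gets `NA`. A minimal sketch on made-up incomes (`toy`, hypothetical) placed on and around the two cut-offs. Under the answer's `>=` and `<=`, exactly 1000 and exactly 10000 both count as "fed", and a missing income gives a missing status.
+
+```{r}
+library(dplyr)
+
+# Hypothetical incomes on and around the 1,000 and 10,000 cut-offs
+toy <- tibble(income = c(999, 1000, 10000, 10001, NA))
+
+toy <- toy %>%
+  mutate(status = case_when(
+    income >= 1000 & income <= 10000 ~ "fed",
+    income > 10000 ~ "wealthy",
+    income < 1000 ~ "poor"
+    # a row matching none of the conditions (here, the NA income) gets NA
+  ))
+
+toy
+```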
+
+## Summarise the data
+
+Let's look at the average child mortality and its standard deviation (sd) in the year 2010
+across countries in each of the income statuses we just defined.
+Name the new data frame "child_mortality_summary_2010".
+
+```{r}
+child_mortality_summary_2010 <- birth_and_mortality_and_income %>% 
+  filter(year == 2010) %>% 
+  group_by(status) %>%
+  summarize(
+    avg = mean(death_per_1000_born), 
+    sd = sd(death_per_1000_born))
+
+head(child_mortality_summary_2010)
+```
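+
+`group_by()` splits the rows by `status`, and `summarize()` then returns one row per group. A minimal sketch on made-up values (`toy`, hypothetical) small enough to check by hand. It also adds `n = n()`, which reports how many countries went into each mean and sd; an sd computed from very few values says little. Note that `mean()` and `sd()` return `NA` for a group if any of its values are missing, unless `na.rm = TRUE` is given.
+
+```{r}
+library(dplyr)
+
+# Hypothetical toy data: two countries per status
+toy <- tibble(
+  status = c("poor", "poor", "fed", "fed"),
+  death_per_1000_born = c(100, 80, 30, 20)
+)
+
+summary_toy <- toy %>%
+  group_by(status) %>%
+  summarize(
+    avg = mean(death_per_1000_born),
+    sd = sd(death_per_1000_born),
+    n = n())
+
+summary_toy
+```
+
+Here the group means are 25 ("fed") and 90 ("poor"), and the sds are sqrt(50) and sqrt(200).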
+ 
+How does child mortality compare across income groups in year 2010?
+Child mortality is higher for lower income groups.
@@ -0,0 +1,83 @@
+---
+title: "Intro_to_data_vis ANSWER KEY"
+author: "Chenxin Li"
+date: "2023-01-06"
+output:
+  html_notebook:
+    number_sections: yes
+    toc: yes
+    toc_float: yes
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+
+# Required packages
+```{r}
+library(tidyverse)
+library(RColorBrewer)
+```
+
+# Data from lecture  
+```{r}
+child_mortality <- read_csv("../Data/child_mortality_0_5_year_olds_dying_per_1000_born.csv", col_types = cols()) 
+babies_per_woman <- read_csv("../Data/children_per_woman_total_fertility.csv", col_types = cols()) 
+income <- read_csv("../Data/income_per_person_gdppercapita_ppp_inflation_adjusted.csv", col_types = cols()) 
+```
+
+Re-shape data into tidy format. 
+```{r}
+babies_per_woman_tidy <- babies_per_woman %>% 
+  pivot_longer(names_to = "year", values_to = "birth", cols = c(2:302))  
+
+child_mortality_tidy <- child_mortality %>% 
+  pivot_longer(names_to = "year", values_to = "death_per_1000_born", cols = c(2:302))  
+
+income_tidy <- income %>% 
+  pivot_longer(names_to = "year", values_to = "income", cols = c(2:242))  
+```
+
+Join them together. 
+```{r}
+example2_data <- babies_per_woman_tidy %>% 
+  inner_join(child_mortality_tidy, by = c("country", "year")) %>% 
+  inner_join(income_tidy, by = c("country", "year"))
+
+head(example2_data)
+```
+
+# Exercise 
+Graph income (on a log10 scale) on the x axis and child mortality on the y axis, and color the points by children per woman, for the year 2010.
+Was the trend similar in 1945?
+Save the graph using `ggsave()`. 
+
+ 
+```{r}
+example2_data %>% 
+  filter(year == 2010) %>% 
+  ggplot(aes(x = log10(income), y = death_per_1000_born)) +
+  geom_point(aes(color = birth)) +
+  scale_color_gradientn(colours = brewer.pal(9, "YlGnBu")) + 
+  labs(x = "log10 income",
+       y = "death per 1000 born",
+       title = "2010") +
+  theme_classic()
+
+ggsave("Lesson3_answer1.png", width = 3, height = 3)
+```
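+
+The answer log-transforms income inside `aes()`, so the x-axis ticks are in log10 units. An alternative is to keep raw income and log-transform the axis with ggplot2's `scale_x_log10()`, which places the points at the same positions but labels the ticks in the original dollar units. A minimal sketch on made-up values (`toy`, hypothetical) that checks the plotted positions agree using `ggplot_build()`.
+
+```{r}
+library(ggplot2)
+
+# Hypothetical toy data with the exercise's column names (values made up)
+toy <- data.frame(
+  income = c(1000, 10000, 100000),
+  death_per_1000_born = c(100, 30, 5),
+  birth = c(5, 3, 2)
+)
+
+# Option 1, as in the answer: transform the variable inside aes(),
+# so the axis is labeled in log10 units
+p_transform <- ggplot(toy, aes(x = log10(income), y = death_per_1000_born)) +
+  geom_point(aes(color = birth))
+
+# Option 2: map raw income and log-transform the axis scale instead,
+# so the axis is labeled in the original dollar units
+p_scale <- ggplot(toy, aes(x = income, y = death_per_1000_born)) +
+  geom_point(aes(color = birth)) +
+  scale_x_log10()
+
+# ggplot_build() exposes the positions actually plotted; both are log10(income)
+x_transform <- ggplot_build(p_transform)$data[[1]]$x
+x_scale <- ggplot_build(p_scale)$data[[1]]$x
+all.equal(x_transform, x_scale)
+```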
+
+```{r}
+example2_data %>% 
+  filter(year == 1945) %>% 
+  ggplot(aes(x = log10(income), y = death_per_1000_born)) +
+  geom_point(aes(color = birth)) +
+  scale_color_gradientn(colours = brewer.pal(9, "YlGnBu")) + 
+  labs(x = "log10 income",
+       y = "death per 1000 born",
+       title = "1945") +
+  theme_classic()
+
+ggsave("Lesson3_answer2.png", width = 3, height = 3)
+```
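+
+Instead of two separate chunks, both years can be drawn in one figure with `facet_wrap(~ year)`. A minimal sketch, not the approach used in this answer key, on made-up values (`toy`, hypothetical): the panels share the x and y ranges and a single color scale for `birth`, so 1945 and 2010 can be compared directly, whereas two separate plots each set their own axis and color ranges.
+
+```{r}
+library(dplyr)
+library(ggplot2)
+
+# Hypothetical toy data (values made up): two countries, three years
+toy <- tibble(
+  country = rep(c("A", "B"), times = 3),
+  year = rep(c("1945", "2000", "2010"), each = 2),
+  income = c(800, 9000, 2000, 30000, 3000, 40000),
+  death_per_1000_born = c(250, 60, 120, 8, 80, 5),
+  birth = c(6.5, 2.5, 4.0, 1.8, 3.0, 1.9)
+)
+
+p <- toy %>%
+  filter(year %in% c("1945", "2010")) %>%
+  ggplot(aes(x = log10(income), y = death_per_1000_born)) +
+  geom_point(aes(color = birth)) +
+  facet_wrap(~ year) +  # one panel per year; axes and color scale are shared
+  labs(x = "log10 income", y = "death per 1000 born") +
+  theme_classic()
+
+p
+```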
+