---
title: "Using RStudio / Importing data: Quiz & Exercises"
output:
  learnr::tutorial:
    progressive: true
    allow_skip: true
    df_print: paged
runtime: shiny_prerendered
description: >
  Learn how to use RStudio and import data - Quiz and exercises.
---
```{r setup, include=FALSE}
library(learnr)
library(dplyr)
knitr::opts_chunk$set(echo = FALSE)
```
## Quiz
*Question 1*
```{r Q1, echo=FALSE}
question("Which of these statements is true?",
  answer("RStudio is useless without R", correct = TRUE),
  answer("R is a powerful Integrated Development Environment", message = "No, RStudio is the IDE"),
  answer("Without RStudio, you cannot use R", message = "R can work very well without RStudio. It's just not that user-friendly"),
  answer("You need to install RStudio first, then you can install R", message = "No, you should start by installing R"),
  allow_retry = TRUE,
  random_answer_order = TRUE
)
```
```{r echo=FALSE, out.width="100%", fig.align='left'}
knitr::include_graphics("./images/packageQuestion.jpg")
```
*Question 2*
```{r Q2, echo=FALSE}
question("Why is R giving an error in the above screenshot?",
  answer("The command for reading the data is missing", message = "No, it is present at the top of the file: Pulse <- read.csv('Pulse.csv')"),
  answer("The command library(ggplot2) is missing", message = "It is missing, but we don't need it, since we are not plotting anything"),
  answer("We did not specify the name of our summary statistic", message = "It's good practice to name our summary statistics indeed, but omitting the name is not an 'error'"),
  answer("The column names are incorrect", message = "No, the column names are correct"),
  answer("Pipes don't work within R chunks", message = "Pipes work very well within R chunks"),
  answer("The project file was not created", message = "We can't tell from this screenshot whether a project file was created or not. We can say that it cannot be responsible for the error though."),
  answer("The command library(dplyr) is missing", correct = TRUE),
  answer("The symbol used for the pipe operator is wrong", message = "%>% is the right symbol for the pipe operator"),
  answer("We are using pipes, but the data argument of each function is missing", message = "The pipe command is correct. With pipes, we omit the data argument of the function"),
  allow_retry = TRUE,
  random_answer_order = TRUE
)
```
```{r echo=FALSE, out.width="100%", fig.align='left'}
knitr::include_graphics("./images/Quiz1.PNG")
```
*Question 3*
```{r Q3, echo=FALSE}
question("The above R Markdown file runs fine, but is missing the point of using R Markdown. Why?",
  answer("The comments should be part of the text outside the R chunks. That way, an R Markdown document can become a proper report", correct = TRUE),
  answer("With R Markdown, all the libraries are already preloaded, so there's no need to have commands that load libraries!", message = "No, you still need to load libraries in an R Markdown file"),
  answer("R Markdown is made for writing R code. Ideally, we shouldn't have any text or comment in such a document, so that it is reproducible", message = "No, the advantage of R Markdown is that it is great for writing reproducible reports. So it should contain text"),
  answer("The commands should all be included in the same R chunk, so that they can be run all together. That's the beauty of R Markdown!", message = "No, having all the commands inside a single R chunk would be missing the point as well."),
  allow_retry = TRUE,
  random_answer_order = TRUE
)
```
*Question 4*
```{r Q4, echo=FALSE}
question("What is important for your data if you want to import it into R? (select all that apply)",
  answer("It should be made of one single rectangle of data", correct = TRUE),
  answer("The column names should lie in one row only", correct = TRUE),
  answer("There shouldn't be any gap between rows or columns", correct = TRUE),
  answer("All the columns should have the same length, and so should all the rows", correct = TRUE),
  answer("It shouldn't contain any merged cells", correct = TRUE),
  allow_retry = TRUE,
  random_answer_order = TRUE
)
```
## Exercises
**Exercise 1: Using the 'happiness' dataset from the workbook, let's do a quick exercise to recap what we learnt in Modules 1-3, this time writing out the code entirely in RStudio instead of online.**
0. Start a new project and a new R Markdown (.Rmd) file, then write the initial setup code to import the happiness dataset and load the `dplyr` and `ggplot2` libraries. Remember to save the Excel file in CSV format before importing the data.
(If you followed along directly from the workbook, you may have already done this, in which case you can skip this step!)
You can find the Excel file here: [Happiness.xlsx](https://github.com/stats4sd/RCourse_2023_download_files/raw/dev/Happiness.xlsx)
1. Find the 5 countries with the lowest generosity scores.
2. Produce a subset of countries which have below average scores for both GDP and generosity.
3. Make a plot showing the relationship between generosity and GDP per capita.
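
The setup in step 0 might look like the sketch below. It is only an illustration: the file name `Happiness.csv` and the object name `happiness` are assumptions, so use whatever name you gave the CSV file when you saved it from Excel. The sketch checks that the file exists first, so it runs without an error even if the file is somewhere else.

```r
# Sketch of the step 0 setup code (file and object names are assumptions)
library(dplyr)
library(ggplot2)

# Assumed name of the CSV saved from Excel, expected in the project folder
csv_path <- "Happiness.csv"

if (file.exists(csv_path)) {
  # By default, read.csv() replaces spaces in column names with dots,
  # so a header such as "GDP per capita" becomes GDP.per.capita
  happiness <- read.csv(csv_path)
  str(happiness)
} else {
  message("Could not find ", csv_path, " in ", getwd())
}
```

If the message appears, check that the CSV file was saved inside the project folder and that its name matches `csv_path`.
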
**Exercise 2: Download the file below that contains the imdb dataset and the Module 3 exercises in RMD format.**
[Module-3-Exercises.zip](https://github.com/stats4sd/r2020_04Quiz/raw/refs/heads/main/Module%203%20Exercises.zip)
Create a new project in RStudio and unzip the files into your new project folder.
Then open the workbook, and load in the libraries and the dataset. The code for these steps is written for you.
Then you can copy across your answers from Module 3 (which should still be saved on the online version if you have completed Module 3) into this new workbook.
And, if you did not finish the Module 3 exercises, now is the time to do so! Only now you are using your own installation of RStudio rather than the version of R on the server.
## Appendix: 'Happiness' dataset
The data used for the questions in this workbook comes from the World Happiness Report 2019. The World Happiness Report is an annual publication of the United Nations Sustainable Development Solutions Network. It ranks 156 countries by the happiness level of their populations, based on an annual survey of their citizens.
In this dataset, the variables 'GDP per capita', 'Social support', 'Healthy life expectancy', 'Freedom to make life choices', 'Generosity' and 'Perceptions of corruption' are factors used to compute the Happiness Score ('Score').
The methodology, as well as more information about this publication, can be found <a href="https://worldhappiness.report/ed/2019/" target="_blank">here </a>.
## Appendix: 'imdb' dataset
For one exercise of this module, we are using a dataset called "imdb", which we constructed from the subsets of the Internet Movie Database made available for non-commercial purposes by the IMDb team:
<a href="https://www.imdb.com/interfaces/" target="_blank">https://www.imdb.com/interfaces/</a>
It contains the following information for all the entries having more than 500 votes, that are not of type "tvEpisodes" and for which information about year of release, running time and director(s) was available at the time of extraction (02/10/2024):
```{r, echo=FALSE,message=FALSE,warning=FALSE}
library(knitr)
# Column names and descriptions of the imdb dataset, rendered as a table
data.frame(
  Column = c("title", "type", "year", "length", "numVotes", "averageRating", "director", "birthYear",
             "animation", "action", "adventure", "comedy", "documentary", "fantasy", "romance", "sci_fi", "thriller"),
  Description = c("popular title of the entry",
                  "type of entry: movie, short, tvMiniSeries, tvMovie, tvSeries, tvShort, tvSpecial, video or videoGame",
                  "year of release (for series, year of release of the first episode)",
                  "duration in minutes",
                  "number of votes for the entry",
                  "IMDb's weighted average rating for the entry",
                  "director of the entry (if multiple directors, the first one was picked)",
                  "year of birth of the director",
                  "the entry is of genre animation (TRUE/FALSE)",
                  "the entry is of genre action (TRUE/FALSE)",
                  "the entry is of genre adventure (TRUE/FALSE)",
                  "the entry is of genre comedy (TRUE/FALSE)",
                  "the entry is of genre documentary (TRUE/FALSE)",
                  "the entry is of genre fantasy (TRUE/FALSE)",
                  "the entry is of genre romance (TRUE/FALSE)",
                  "the entry is of genre science fiction (TRUE/FALSE)",
                  "the entry is of genre thriller (TRUE/FALSE)")
) %>%
  kable()
```
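
Because the genre columns are stored as TRUE/FALSE values, they can be used directly as conditions in `dplyr` commands. The sketch below illustrates this on a small made-up data frame that borrows a few of the column names from the table above. The object `imdb_toy` and all the values in it are hypothetical and do not come from the real dataset.

```r
library(dplyr)

# Made-up rows for illustration only; not values from the real imdb dataset
imdb_toy <- data.frame(
  title = c("Example A", "Example B", "Example C"),
  type = c("movie", "tvSeries", "movie"),
  averageRating = c(7.1, 6.4, 8.0),
  comedy = c(TRUE, FALSE, TRUE),
  thriller = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# A logical column can be passed to filter() as-is: rows where comedy is TRUE are kept
comedy_movies <- imdb_toy %>%
  filter(type == "movie", comedy) %>%
  arrange(desc(averageRating))

comedy_movies$title
```

Writing `filter(comedy)` is equivalent to `filter(comedy == TRUE)`. The shorter form works because the column already holds logical values.
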
## Appendix: Useful reference links
RStudio website: <a href="https://rstudio.com/" target="_blank">https://rstudio.com/</a>

R-project website: <a href="https://www.r-project.org/" target="_blank">https://www.r-project.org/</a>

Andy Field's Getting started in R and RStudio: <a href="http://milton-the-cat.rocks/learnr/r/r_getting_started" target="_blank">http://milton-the-cat.rocks/learnr/r/r_getting_started</a>

R Markdown documentation: <a href="https://rmarkdown.rstudio.com/lesson-1.html" target="_blank">https://rmarkdown.rstudio.com/lesson-1.html</a>

RStudio cheat sheet: <a href="https://ucdavis-bioinformatics-training.github.io/Oct2017-ILRI-Workshop/Cheat_Sheets/rstudio-IDE-cheatsheet.pdf" target="_blank">https://ucdavis-bioinformatics-training.github.io/Oct2017-ILRI-Workshop/Cheat_Sheets/rstudio-IDE-cheatsheet.pdf</a>

R Markdown cheat sheet: <a href="https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf" target="_blank">https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf</a>

Data importing cheat sheet: <a href="https://evoldyn.gitlab.io/evomics-2018/ref-sheets/R_data-import.pdf" target="_blank">https://evoldyn.gitlab.io/evomics-2018/ref-sheets/R_data-import.pdf</a>

Illustrated presentation of 'tidy data': <a href="https://docs.google.com/presentation/d/1N7hKepabvl9OrHjvGJWPjUsfzVdB5xzV5AsFndgSwms/edit?usp=sharing" target="_blank">Make friends with tidy data</a>

Video explaining the command to import text files into R - DataCamp: <a href="https://www.youtube.com/watch?v=Yy-ismDUkkQ" target="_blank">https://www.youtube.com/watch?v=Yy-ismDUkkQ</a>

Article on the RStudio support site showing how to import data from different files via the RStudio import menu: <a href="https://support.rstudio.com/hc/en-us/articles/218611977-Importing-Data-with-RStudio" target="_blank">Importing Data with RStudio</a>

RStudio documentation on connecting to databases using R: <a href="https://db.rstudio.com" target="_blank">https://db.rstudio.com</a>