---
title: 'R Small Group: Class 3'
author: "Amy Allen & Dayne Filer"
date: "June 28, 2016"
output:
  pdf_document:
    highlight: tango
  html_document: null
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, collapse = TRUE)
```
<!-- Here we style our button a little bit -->
<style>
.showopt {
background-color: #004c93;
color: #FFFFFF;
width: 100px;
height: 20px;
text-align: center;
vertical-align: middle !important;
border-radius: 8px;
float:right;
}
.showopt:hover {
background-color: #dfe4f2;
color: #004c93;
}
</style>
<!--Include script for hiding output chunks-->
<script src="hideOutput.js"></script>
### Using this document
* Code blocks and R code have a grey background (note: code nested in the text is not highlighted in the pdf version of this document, but it is set in a different font).
* \# indicates a comment, and anything after the \# on that line will not be evaluated in R
* The comments beginning with \#\# under the code in the grey code boxes are the output from the code directly above; any comments added by us will start with a single \#
* While you can copy and paste code into R, you will learn faster if you type out the commands yourself.
* Read through the document after class. This is meant to be a reference, and ideally, you should be able to understand every line of code. If there is something you do not understand please email us with questions or ask in the following class (you're probably not the only one with the same question!).
### Class 3 expectations
1. Know how to import data from a csv file format
2. Understand all of the arguments of the `plot` function
3. Be able to make basic plots
### Importing data from a csv file
A csv (comma-separated values) file stores data in a table-structured format as plain text. Excel files can often be saved as csv files. R has a function to read these files called `read.csv`. We will use this function to read in data from a csv file.
You have been provided with the data set "yeastmutants.csv". It contains the response to mating pheromone over time for the following three cell types: wildtype (wt), an sst2 mutant (sst2), and a gpa1 mutant (gpa1). For each cell type there is the mean response of a population (avg) as well as the standard deviation of the response (stdev). The data set also includes the time for each data point. Let's import it into R and store it as `data`. First we need to set our working directory to the location where the file is stored, using the `setwd` function. To find the path, locate the file under the files tab and hover over the file name; it should show you the path to that directory. For me it is "~/Documents/rclass". Make sure that the path name is in quotes so R knows it is a character data type, then execute the function as follows:
```{r}
setwd("~/Documents/rclass")
getwd()
```
You can use the `getwd()` function to check that you have successfully changed directories. Now that we are in the correct working directory, we can load the data using the `read.csv` function. Here the argument `header` is set to `TRUE` because the data set has a header labeling each column.
```{r}
data <- read.csv("yeastmutants.csv", header = TRUE)
```
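To see what the `header` argument changes, here is a small self-contained sketch. It does not use the class data: it writes a tiny made-up table to a temporary file (the file contents and the names `tmp`, `with_header`, and `no_header` are assumptions for illustration) and reads it back both ways.

```{r}
# A tiny made-up table written to a temporary file (not the class data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Time,value", "0,1.5", "5,2.5", "10,4.0"), tmp)
file.exists(tmp)   # check that the file is where we expect it

# header = TRUE: the first line supplies the column names
with_header <- read.csv(tmp, header = TRUE)
names(with_header)
nrow(with_header)

# header = FALSE: the first line is treated as data, columns are named V1, V2
no_header <- read.csv(tmp, header = FALSE)
names(no_header)
nrow(no_header)
```

With `header = FALSE` the column names end up as a row of data, so the table has one extra row and the columns are no longer read as numbers.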
Now we can take a look at our data.
```{r}
class(data)
data
str(data)
```
Using the `class()` function we can tell that `data` is a data frame, which is a list of vectors that all have the same length. Looking closer at the data itself, we see a table with 7 columns (each with a header) and 17 rows. We can also look at the structure of this data frame using the `str()` function. It tells us that we have 17 observations of 7 variables and lists the variables and their classes. `Time` is an integer and the rest of the variables are numeric.
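The idea that a data frame is a list of equal-length vectors can be checked directly. The sketch below builds a small made-up data frame (`toy` and its values are hypothetical, not the yeast data) and inspects it.

```{r}
# A small made-up data frame (hypothetical values, not the class data)
toy <- data.frame(Time = c(0L, 5L, 10L),
                  avg  = c(1.5, 2.5, 4.0))
class(toy)            # "data.frame"
is.list(toy)          # TRUE: a data frame is a list of columns
sapply(toy, length)   # every column has the same length
sapply(toy, class)    # the class of each column
dim(toy)              # number of rows, then number of columns
```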
### Basic plotting
Now we have imported some data, so how can we visualize it? One of the most basic ways to visualize data is the `plot()` function. We can learn more about this function using `?plot`. The `plot` function is used for generic X-Y plotting. Its two main arguments are `x` and `y`, which give the x and y coordinates for the plot, respectively. The function also has many optional arguments. `type` is used to indicate what type of plot should be drawn; all of the options are listed on the help page. `main` and `sub` are used to give the plot an overall title and a subtitle, respectively. `xlab` and `ylab` are used to label the x and y axes. Finally, `asp` is used to set the y/x aspect ratio. Other arguments can also be passed to `plot()`. You can google these if you're interested or use `?par`; one we will be particularly interested in is `col`, which assigns a color.
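As a quick illustration of these arguments, here is a sketch on made-up data (the vectors `x` and `y` and all label text are placeholders). `asp` is left at its default here.

```{r}
# Made-up data for illustration only
x <- 0:10
y <- x^2
plot(x = x, y = y,
     type = "b",              # "p" for points, "l" for lines, "b" for both
     main = "Overall title",
     sub  = "Subtitle",
     xlab = "x axis label",
     ylab = "y axis label",
     col  = "blue")
```

Try changing `type` to `"p"` or `"l"` to see how the same data is drawn differently.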
Now let's do a basic plotting example. We want to plot the average response of wildtype cells versus time. We will need to subset the average response of wt and time from the data frame and plot them. We can do this using the `$` operator.
```{r}
plot(x = data$Time, y = data$wt.avg)
```
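If the `$` operator is new to you, the following sketch shows what it returns. It uses a small made-up data frame (`example_df` and its values are hypothetical) so it runs on its own.

```{r}
# Made-up data frame for illustration only
example_df <- data.frame(Time = c(0, 5, 10), wt.avg = c(1.5, 2.5, 4.0))
example_df$wt.avg          # extract a column by name with $
example_df[["wt.avg"]]     # the same column, with the name given as a string
identical(example_df$wt.avg, example_df[["wt.avg"]])
# A column is an ordinary vector, so it can be subset further
example_df$wt.avg[example_df$Time > 0]
```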
\pagebreak
We also want to give the plot a title and axis labels.
```{r}
plot(x = data$Time,
     y = data$wt.avg,
     main = "Yeast Response to Pheromone",
     xlab = "Time (minutes)",
     ylab = "Mean pheromone response")
```
\pagebreak
In order to add more data sets to the existing plot we need to use the `lines` or the `points` functions. The use of the `points` function to compare the response of wildtype to the gpa1 mutant is shown below.
```{r}
plot(x = data$Time,
     y = data$wt.avg,
     main = "Yeast Response to Pheromone",
     xlab = "Time (minutes)",
     ylab = "Mean pheromone response")
points(x = data$Time, y = data$gpa1.avg)
```
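One thing to watch for when layering data: the axis limits are fixed by the first `plot()` call, so values added later that fall outside that range are not shown. The sketch below uses made-up vectors (`time_pts`, `a`, and `b` are hypothetical) and sets `ylim` from the range of both series so that everything fits.

```{r}
# Made-up data for illustration only
time_pts <- c(0, 5, 10, 15)
a <- c(1, 2, 3, 4)
b <- c(2, 4, 6, 8)          # reaches higher than a

# range() over both series gives limits that fit everything
y_limits <- range(c(a, b))
plot(x = time_pts, y = a, type = "l", ylim = y_limits,
     xlab = "Time", ylab = "Value")
lines(x = time_pts, y = b)    # lines() adds a connected line to the existing plot
points(x = time_pts, y = b)   # points() adds markers to the existing plot
```

Without `ylim`, the plot would be scaled to `a` alone and most of `b` would fall outside the plotting region.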
We can also change the color of the points and add a legend to tell which is which. The legend is a little more complicated. We first specify the location `"bottomright"`, then we include a vector with the label for each legend entry, `c("wildtype","gpa1 mutant")`. `lty = 0` sets the line type to zero, meaning no line is drawn. `pch = 1` sets the point character to 1, the open circle. Finally, we say that the colors should be black and red, respectively, with `col = c("black","red")`.
```{r}
plot(x = data$Time,
     y = data$wt.avg,
     main = "Yeast Response to Pheromone",
     xlab = "Time (minutes)",
     ylab = "Mean pheromone response")
points(x = data$Time, y = data$gpa1.avg, col = "red")
legend(x = "bottomright",
       legend = c("wildtype","gpa1 mutant"),
       lty = 0,
       pch = 1,
       col = c("black","red"))
```
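When the data are drawn as lines rather than points, the legend should show line samples instead of point characters. Here is a minimal sketch on made-up data (the vector `x` and the legend labels are placeholders) that uses `lty = 1` and omits `pch`.

```{r}
# Made-up data for illustration only
x <- 1:5
plot(x = x, y = x, type = "l", col = "black",
     xlab = "x", ylab = "y")
lines(x = x, y = 6 - x, col = "red")
# For line plots, draw a solid line sample (lty = 1) instead of a point character
legend(x = "top",
       legend = c("increasing", "decreasing"),
       lty = 1,
       col = c("black", "red"))
```

The order of the entries in `legend` and `col` must match, so the first label is paired with the first color.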
### Small group activities
The following questions are to be worked on in a small group. Always make sure that your plots have proper titles and axis labels.
1. Plot the average pheromone response for all three cell types. Plot them as lines, not points (remember: the argument `type` for `plot` and the function `lines` which is similar to `points`). Make each cell type a different color and create a legend to match.
2. Plot the standard deviation of the pheromone response for all three cell types. Plot them as points making each cell type a different color. Create a legend to match.
3. Noise in cells is often quantified by the coefficient of variation (CV) because it normalizes the standard deviation for differences in the mean. CV is defined as $CV=\sigma/\mu$ where $\sigma$ is the standard deviation and $\mu$ is the mean. Create new variables in the data frame for the CV for each cell type. (Remember: you can use subsetting to assign new variables). Then create a plot of the CV for all cell types following the same guidelines as in questions 1 and 2.
\pagebreak
### Homework
The following questions are to be done for homework. They require the data in the file "heights.csv", which includes gender, height, handedness, and eye color. To answer these questions you will use different types of plots including histograms, bar plots, and box plots. You can learn about all the required arguments by using the help pages.
1. Make a histogram of height using the `hist()` function. Include an appropriate title and axis labels.
2. Now make the same histogram but include 20 bins and make it your favorite color.
3. Make a bar chart of how many people are left or right handed using the `barplot()` function. Hint: `data$Handedness[data$Handedness=="Left"]` will return a vector including only those entries where `data$Handedness` is `"Left"` and `length()` will return the length of a vector.
4. Make a bar chart of how many people have the following eye colors: blue, brown, other. Make the color of each bar correspond to the eye color (you can choose any color for other).
5. Make a box plot of the height using the `boxplot()` function.