-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathobjectsndata.Rmd
411 lines (290 loc) · 13.5 KB
/
objectsndata.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
---
title: "Objects and Data Frames"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Download the script [here](https://raw.githubusercontent.com/BeSeLuFri/RforISG/master/Files/Course%20Scripts/objectsndata_clean.R).
Solutions can be downloaded [here](https://raw.githubusercontent.com/BeSeLuFri/RforISG/master/Files/Course%20Scripts/objectsndata_solutions.R) - but try to solve everything without the solutions first!
***
# Objects in R: Basic Calculations
We will most often want to store results of calculations to reuse them later. For this, we can work with basic objects. An object has a name and a content. We can freely choose the name of an object and give certain rules - they have to start with a letter and include only letters, numbers and some special characters (".", "_", "-"). **R is case sensitive so "x" and "X" are different object names**.
The content of an object is assigned using "<-" or "=" (for tidy readability, you should always use the former, though).
In order to assign the value of 5 to an object with name x do
```{r}
x <- 5
# Which corresponds to
x = 5
```
If you already had an object named x before and want to change it, you can simply overwrite the old version.
<span style="color:red">Task</span> <u>**Assign the value of 10 to the object x**</u>
```{r Assign the value of 10 to the object x, eval=FALSE}
x
```
Similarly, you can use an object to create other objects.
<span style="color:red">Task</span> <u>**Multiply x with 5 and save it as b**</u>
```{r Multiply x with 5 and save it as b, eval=FALSE}
b <-
```
<span style="color:red">Task</span> <u>**Print b by simply typing its name in the script**</u>
```{r Print b by simply typing its name in the script}
```
In R Studio all the object names are also shown in the "Workspace" window on the top right side.
If we want to delete certain objects we can do so with the function ```rm()``` (remove). ```rm(list=ls())``` removes all objects currently saved in the list environment.
```{r}
# Change and Delete objects:
rm(x) # Deletes an object
rm(list=ls()) # all objects are removed
```
***
# Everything in R Is an Object
Coming from excel or stata, you might think that objects are objects storing data. However, everything in R is an object: functions, symbols, and even R expressions: functions, symbols, and even R expressions (see [R in a Nutshell](https://www.oreilly.com/library/view/r-in-a/9781449358204/ch05s06.html)).
For example, function names in R are really symbol objects that point to function objects. (That relationship is, in turn, stored in an environment object.) You can assign a symbol to refer to a numeric object and then change the symbol to refer to a function:
<span style="color:red">Task</span> <u>**Try to understand the different meanigs of x!**</u>
```{r, eval=FALSE}
# Basic object creation
x <- 1
x
# Not working why
x(2)
# Making it work
x <- function(i) i^2
x
x(2)
```
***
# Vectors
For statistical calculations, we obviously need to work with data sets including many numbers of instead of scalars. The simplest way to collect many numbers (or other
types of information) is called a vector in R terminology (you have already been familiarized with vectors on the page before).
To define a vector, we can collect different values using ```c(value1, value2,...)```.
```{r}
# Examples
a <- c(1, 2, 3, 4)
```
<span style="color:red">Task</span> <u>**Do you remember a different (easier) way to build the vector a?**</u>
```{r, eval=FALSE}
a <-
```
<span style="color:red">Task</span> <u>**What happens if we add 1 to a?**</u>
```{r}
b <- a + 1
b
```
R is object-oriented programming --> accordingly, it follows standard math rules.
```{r}
c <- sqrt(a + b * 2)
c
```
Important R functions for vectors:
<span style="color:red">Task</span> <u>**Do you understand all of these functions?**</u>
```{r, eval=FALSE}
# Important R functions for vectors:
# Basic functions:
sort(a)
length(a)
min(a)
max(a)
sum(a)
prod(a)
# Creating special vectors:
rep(1, 20)
seq(50)
5:15
seq(4, 20, 2)
```
***
## Logical Operators and Logical Vectors
<span style="color:red">Task</span> <u>**Do you understand all of these operators?**</u>
```{r eval=FALSE}
x <- 10
y <- 9
x == y # x is equal to y
x > y # x is bigger then y
x y # HOW DO YOU TYPE: x is smaller or equal to y
x != y # x is NOT equal to y
```
The contents of R vectors do not need to be numeric. A simple example of a different type are *character* vectors. For handling them, the contents simply need to be
enclosed in quotation marks:
```{r}
cities = c("Friedrichshafen", "Paris", "Tokio", "Tettnang", "Mailand")
cities
```
Another useful type are **logical vectors**. Each element can only take one of two values: "TRUE" or "FALSE". Internally, "FALSE" corresponds to 0 and "TRUE" to 1.
```{r}
a <- TRUE; b <- FALSE # note the functionality of the semicolon
a | b # Either a or b is TRUE
a & b # Both a and b are TRUE
a <- c(7, 5, 2, 6, 9, 4, 1, 3)
b <- a < 3 | a >= 6
b
```
**Summary**: Vectors can have three different classes: "numeric", "logical", "character":
```{r}
x <- c(1050, 35, 2.3, 2 ^ 2, -10, sqrt(81)) # numeric vector
y <- c(TRUE, FALSE, TRUE, FALSE) # logical vector
z <- c("zero", "four", "nine") # character vector
```
One minor (seldomly important) addendum: Within numeric, there is a difference between `1` and `1.0`. The former is a pure `integer`, the latter is an object called `double`.
```{r}
# create a string of double-precision values
dbl_var <- c(1, 2.5, 4.5)
typeof(dbl_var)
# placing an L after the values creates a string of integers
int_var <- c(1L, 6L, 10L)
typeof(int_var)
```
***
## Naming and Indexing Vectors
The elements of a vector can be named which can increase the readability of the output. Given a vector ```vec``` and a string vector ```namevec``` of the same length, the names are attached to the vector elements using ```names(vec) <- namevec```.
If we want to access a single element or a subset form a vector, we can work with **indices**. They are written in square brackets next to the vector name. For example
````myvector[4]``` returns the 4th element of ```myvector``` and ```myvector[6] <- 8``` changes the 6th element to take the value of 8. If the vector elements have
names, we can also use those as indices like in ```myvector["elementname"]```
<span style="color:red">Task</span> <u>**Complete the empty spaces!**</u>
```{r, eval=FALSE}
# Create a vector "kicker.skills":
kicker.skills <- c(0.2, 4, 3, 10, 5,-2)
# Create a string vector of names:
players <- c("Katja", "Hans", "Theresa", "Sabine", "Katrin", "Marco")
# Assign names to vector and display vector:
names(kicker.skills) <- players
kicker.skills
# Indices by number:
kicker.skills[_] # GIVE second entry
kicker.skills[_:4] # SELECT 2nd to 4th entry
# Indices by name:
kicker.skills["Marco"] # WHAT is the level of Katja's kicker skills?
# Logical indices:
# SUBSET for an all-star team, only kicker players with skill level above 2
kicker.skills[kicker.skills __ _ ]
# Who is the worst kicker player?
kicker.skills[_ == min(_)] # DO you understand every part of this line?
```
***
# Data Frames
A data frame is an object that collects several variables and can be thought of as a rectangular shape with the rows representing the observational units and the columns
representing the variables. As such, it is similar to a matrix (see below). For us, the most important difference to a matrix is that a data frame can contain variables of different
types (like numerical, logical, string and factor), whereas matrices can only contain values of one type.
Unlike a matrix, the columns always contain names which represent the variables. We can define a data frame from scratch by using the command ```data.frame```.
```{r, eval=FALSE}
# Define one x vector for the days:
days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
# When are Eugen, Julia, and Lena in the office?
Eugen <-
c(0, 0, 1, 1, 0)
Julia <- c(1, 1, 0, 0, 1)
Lena <- c(0, 0, 0, 1, 1)
presence <- data.frame(days, Eugen, Julia, Lena)
```
If you check the Environment pane, you see that ```presence``` is described as *5 obs. of 4 variables*. Again, every column is an individual variable.
To index certain entries in a data frame use `[]` behind the name of a data frame object. Specifically: `data.frame.name[ROWnumber(s) , COLUMN-number/-name(s)]`. All two dimensional objects in R follow the logic: Row **,** Column.
<span style="color:red">Task</span> <u>**Complete the empty spaces!**</u>
```{r, eval=FALSE}
# Show the entry in the 4th row and the second column of presence
presence[__ , __]
# Show all entries in the third row
presence[__ , __]
# Show all entries of the third column
presence[__ , __]
# Show all entries of the variable Julia
presence[__ , __]
# Show all entries of the variables days, Julia and Lena
presence[__ , __________]
```
We can address a single variable var of a data frame df using the matrix-like syntax ```df[, "var"]``` or by stating ```df$var```. This can be used for extracting the values of a variable but also for creating new variables. Sometimes, it is convenient not to have to type the name of the data frame several times within a command. The function ```with(df, some expression using vars of df``` can help.
<span style="color:red">Task</span> <u>**Complete the empty spaces!**</u>
```{r, eval=FALSE}
# Accessing a single variable:
presence$Eugen
# GENERATE a new variable total.presence which sums up the total of people who are present on each day:
presence$total.presence <- presence$Eugen + ______ + ______
presence
# The same but using "with":
presence$total.presence2 <- with(presence, Eugen+Julia+Lena)
```
Sometimes, we do not want to work with a whole data set but only with a subset. This can be easily achieved with the command ````subset(df, criterion)```, where *criterion* is a logical expression which evaluetes to TRUE for the rows which are to be selected.
<span style="color:red">Task</span> <u>**Complete the empty spaces!**</u>
```{r, eval=FALSE}
# Subset: all days in which Lena is present
subset(presence, __==___)
```
***
# Other object types
The following three object types are of secondary importance for our tutorial but nevertheless you should become familiar with them over time.
## Factors
Seemingly similar to vectors but behaving (sometimes) differently: factors. Factors are very useful to work with binary variables.
<span style="color:red">Task</span> <u>**What is the difference between xf1 and xf2?**</u>
```{r}
# Costumer Ratings
x <- c(3, 2, 2, 3, 1, 2, 3, 2, 1, 2)
xf1 <- factor(x, labels = c("bad", "okay", "good"), ordered = TRUE)
xf2 <- factor(x, labels = c("bad", "okay", "good"))
```
If they are ordered, factors can be seen as an ordinal variable (e.g. in the latter case, values are ranked). Under the hood, factors behave like integers.
<span style="color:red">Task</span> <u>**Practice more **</u>
```{r, eval=FALSE}
# variable gender with 20 "male" entries and 30 "female" entries
gender <- c(rep("male", _ ), rep(_, 30)_ # COMPLETE MISSING PARTS
gender <- _ # CREATE gender as factor
# R now treats gender as a nominal variable
summary(gender)
```
***
## Matrices
Matrices are important tools for data analysis. R has a powerful matrix algebra system. Most often, matrices will be generated from an existing data set. But you can
also build the from scratch with ```matrix(vec, nrow=m)``` (takes the numbers stored in vector ```vec``` and puts them into a matrix with m rows).
Other options include: ```rbind(r1,r2)``` and ```cbind(c1,c2)``` in binding several vectors (which obviously need to have the same length) by row or column.
<span style="color:red">Task</span> <u>**Complete the empty spaces!**</u>
```{r, eval=FALSE}
# GENERATE matrix A from one vector with two rows
v <- c("AG.5", "G2", "AG.5", "G1", "AG.5", "G3")
A <- matrix(v, _ = 2)
# GENERATE matrix A.row in binding vec1 and vec2 AND generate matrix A.col in binding column wise:
vec1 <- c("AG.5", "AG.5", "AG.5")
vec2 <- c("G2", "G1", "G3")
A.row <- _____(vec1, vec2)
A.col <- ______
# Giving names to rows and columns:
colnames(A.col) <- c("Katrin", "Hans")
rownames(A.col) <- c("Monday", "Wednesday", "Friday")
A.col
```
**Core rule** for indexing all two dimensional objects in R: [row , column]
<span style="color:red">Task</span> <u>**Complete the empty spaces!**</u>
```{r, eval=FALSE}
# EXTRACT value of first column and 2nd row
A.col[_ , _]
# GIVE all meal choices from Hans:
A.col[ , ]
# WHAT does Katrin have for lunch on Monday? call Katrin and Monday by name
A.col[_ , _]
# WHAT does Katrin have on Monday and Friday?
A.col[c(_ , _) , _]
```
Some quick matrix algebra:
```{r, eval=FALSE}
# Element wise multiplication (not matrix multiplication but multiplying elements at same place)
A <- matrix(c(2, -4, -1, 5, 7, 0), nrow = 2)
B <- matrix(c(2, 1, 0, 3, -1, 5), nrow = 2)
A * B
# Matrix multiplication:
(D <- A %*% C)
# Transpose:
(C <- t(B))
# Inverse:
solve(D)
```
***
## Lists
A list is a generic collection of objects. Unlike vectors, components can be of different types.
```{r}
# Generate a list object:
mylist <- list(A = seq(8, 36, 4),
product = "Fruechtekorb",
idm = diag(3))
# Print whole list:
mylist
# Vector of names:
names(mylist)
# Print component "A":
mylist$A
```