RworkshopIII.Rmd

---
title: "Hello, R!"
author: "Yue Hu's R Workshop Series II"
output:
  ioslides_presentation:
    self_contained: yes
    logo: images/logo.gif
    transition: faster
    widescreen: yes
    slidy_presentation:
    incremental: yes
---
# Preface
## What Are Covered in This Workshop Series 
* [A overview of R](https://rpubs.com/sammo3182/Rintro) 
* [Data manipulation (input/output, row/column selections, etc.)](https://rpubs.com/sammo3182/Rintro) 
* **Descriptive and binary hypotheses (summary, correlation, t-test, etc.)**
* **Multiple regression (OLS, GLS, MLM, etc.)**
* Multilevel Regression
* Presentation (table, graph)


# Hypothesis Tests

## package loading
You want the `pacman` package to load multiple packages.

```{r}
pacman::p_load(dplyr)
```

## Data Glimpse
```{r}
data("mtcars")
dplyr::glimpse(mtcars)
```


## Binary Tests: Difference in mean

$H_{0}: \bar{cylinders} = \bar{gears},\ \alpha = .05$

```{r}
t.test(mtcars$cyl, mtcars$gears) 
```

----

`t.test` offers arguments `alternative`, `mu`, `paired`, and `conf.level` for users to change in two-tail/one-tail test, parameter mean, independent/paired comparison, and $\alpha$.

```{r eval=FALSE}
# one side, cyl > gear, alpha = .01
t.test(mtcars$cyl, mtcars$gear,
       alternative = "greater", conf.level = .99)) 

# comparing with the parameter (true value)
t.test(mtcars$cyl, mu = 6)   # the true mean is 6.

```


## Binary Tests: Correlation
$H_{0}: \rho_{(cyl,gear)} = 0,\ \alpha = .05$

```{r}
cor.test(mtcars$cyl, mtcars$gear)
```

----

`cor.test` offers various arguments as in `t.test` for more specific settings. Moreover, users can use the `method` argument to set the method to calculate the correlations, "Pearson", "Kendall", or "Spearman." (<span style="color:green">Tip</span>)

```{r}
cor.test(mtcars$cyl, mtcars$gear, method = "kendall")
```

<div class="notes">
Do I have to type the `mtcars$` every time? 

* No you don't.
    + It offers a potential for cross-dataset operation, though.
    + Use `within()`: e.g., `within(mtcars, cor.test(cyl, gear))`
    + Use `attach()` (not recommonded)
</div>


----

We can get the correlation matrix, too:
```{r}
cor(mtcars[,1:4])
```

## Present the correlations
You want the `corrplot` package.
```{r fig.height=4}
cor(mtcars) %>% corrplot::corrplot()
```

----

Or a mixed format:
```{r}
cor(mtcars) %>% corrplot::corrplot.mixed()
```


## Binary Tests: ANOVA {.smaller}
One way or two way ANOVA: 

```{r}
aov_one <- aov(cyl ~ gear, data = mtcars) #one-way

aov_two <- aov(cyl ~ gear + am, data = mtcars) #two-way

summary(aov_one); summary(aov_two)
```


## Wrap up
* T-test: `t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, conf.level = 0.95, ...)`

* Correlation: `cor.test(x, y, alternative = c("two.sided", "less", "greater"), method = c("pearson", "kendall", "spearman"), conf.level = 0.95, continuity = FALSE, ...) `

* ANOVA: `aov(formula, data = NULL, ...)`

----

Next: Multiple regression

<div class="centered">
![core](http://www.math.yorku.ca/SCS/spida/lm/mreganim3.gif)
</div>


# Multiple Regression
## Ordinary Linear Regression
$Mileage = \beta_0cylinders + \beta_1horsepower + \beta_3weight + \varepsilon$

```{r}
lm_ols <- lm(mpg ~ cyl + hp + wt, data = mtcars)
``` 

* `lm_ols`: Object name
* `mpg`: Dependent variable
* `cyl + hp + wt`: Independent variables
* `data = mtcars`: Where the variables are stored


## Result{.smaller}

```{r} 
summary(lm_ols)
```

## Nonlinear transition
ln, square, exponential, or inverse

```{r}
lm_tran <- lm(log(mpg) ~ I(cyl^2) + exp(hp) + I(1/wt), data = mtcars)
```

* `log(mpg)`: logistic
* `I(cyl^2), I(1/wt)`: square, inverse
* `exp(hp)`: exponential

## The result {.smaller}

```{r}
summary(lm_tran)
```

## Adding binary variables

When the model including binary variables based on a factor 

```{r}
mtcars$gear_f <- factor(mtcars$gear, levels = 3:5, labels = c("3-gear", "4-gear", "5-gear"))
table(mtcars$gear)
table(mtcars$gear_f); class(mtcars$gear_f)
```

## The result {.smaller}

```{r}
lm_f <- lm(mpg ~ cyl + hp + wt + gear_f, data = mtcars)
summary(lm_f)
```


## Interaction
Two-way interaction: horsepower * Weight

```{r}
lm_in <- lm(mpg ~ cyl + hp * wt, data = mtcars)

```

Equivalent to `lm_in2 <- lm(mpg ~ cyl + hp + wt + hp:wt, data = mtcars)`


## The result {.smaller}
```{r}
summary(lm_in)
```


## Post-estimate diagnoses: Residural
  
```{r fig.height=3.5, fig.align="center"}
res <- resid(lm_ols); res[1:4]
plot(lm_ols, which = 1) # residural vs. fitted plot
```

## Post-estimate diagnoses: Outliers
```{r}
car::outlierTest(lm_ols) # Bonferonni p-value for most extreme obs
```

----

```{r}
car::qqPlot(lm_ols)  #qq plot for studentized resid 
```


## Post-estimate diagnoses: CLRM Properties{.build}
* Heteroscedasticity 

```{r}
car::ncvTest(lm_ols) 
```

* Multicollinearity

```{r}
car::vif(lm_ols) 
```

----

Autocorrelation

```{r}
car::durbinWatsonTest(lm_ols)
```


## Logit
$vs = \frac{1}{1 + e^{-(\beta_0 + \beta_1cylinder + \beta_2horsepower + \beta_3weight + \varepsilon)}}$

```{r}
logit <- glm(vs ~ cyl + hp + wt, data = mtcars, family = "binomial")
```

MLE on other distributions: change the value of the argument `family` to `Gamma`, `poisson`, `gaussian`, etc.

## The result{.smaller}

```{r}
summary(logit)
```


## Interpretation: Margin

```{r message=FALSE}
library(mfx)
logit_m <- logitmfx(vs ~ cyl + hp + wt, data = mtcars) 
logit_m
```

## Interpretation: Predicted probability
Predicted Probability when `cyl` changes from 4 to 6.

```{r}
# Step 1: creat an aggregate data 
mtcars_fake <- with(mtcars, data.frame(cyl = 4:6, hp = mean(hp), wt = mean(wt)))
# Step 2: predict based on the new data
logit_pp4 <- cbind(mtcars_fake,predict(logit, newdata = mtcars_fake, type = "link", se = TRUE))
# Step 3: convert to probability 
logit_pp4 <- within(logit_pp4, {pp <- plogis(fit) 
                                lb <- plogis(fit - 1.96 * se.fit)
                                ub <- plogis(fit + 1.96 * se.fit)})
logit_pp4[,7:9]
```


## Wrap Up
* OLS: `lm(Y ~ X, data = data)`
    + Non-linear transformations: `I(X^2)`, `exp(X)`, `log(X)`.
    + Using factor variable: R will handle that for you.
    + Interaction: `lm(Y ~ X * Z, data = data)`.
    + Post-estimate diagnoses: `resid()`, `outlierTest()`, `qqPlot()`, `ncvTest()`, `vif()`, `durbinWatsonTest()`
* Logit: `glm(Y ~ X, data = data, family = "binomial")`
    + Margins: using `mfx::logitmfx`
    + Predict probabilty: 
        + Step 1: create an aggregate data
        + Step 2: predict the log odds
        + Step 3: transfer to probability
        
----

Next: Presenting with R

<div class="centered">

<img src="https://espngrantland.files.wordpress.com/2014/06/9u4jd.gif" height="500" width = "800" />

</div>
 
 
## See you then ~

<div class = "centered">

<img src="http://rescuethepresent.net/tomandjerry/files/2016/05/16-thanks.gif" />

</div>        

## External Sources
* My email: [yue-hu-1@uiowa.edu](mailto: yue-hu-1@uiowa.edu)

* Workshops: http://ppc.uiowa.edu/node/3608
* Consulting service: http://ppc.uiowa.edu/node/3385/
* Q&A Blogs: 
    + http://stackoverflow.com/questions/tagged/r
    + https://stat.ethz.ch/mailman/listinfo/r-help

* Blog for new stuffs: http://www.r-bloggers.com/

* Graph Blogs:
    + http://www.cookbook-r.com/Graphs/
    + http://shiny.stat.ubc.ca/r-graph-catalog/