-
Notifications
You must be signed in to change notification settings - Fork 0
/
RM_project_auxiliary_working_space.Rmd
236 lines (189 loc) · 11.1 KB
/
RM_project_auxiliary_working_space.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
---
title: "RM_project_auxillary_working_space"
author: "james c walmsley"
date: "12/16/2016"
output: html_document
---
---
title: "Regression Models Project - Motor Trend Data 'mtcars' Miles Per Gallon Analysis"
author: "james c walmsley"
date: "12/1/2016"
output:
pdf_document:
fig_caption: yes
fig_height: 5.5
---
---
title: "Regression Models Course Project - Motor Trend Data Set - 'mtcars' Miles Per Gallon Ratings Analysis"
author: "james c walmsley"
date: "12/1/2016"
output:
html_document: default
pdf_document:
fig_caption: yes
fig_height: 5.5
---
Executive Summary: √
Using a braod range of linear regressoin model variations including the step function to gauge
best model fit we idendified a model using (wt + qsec + am) that provides an 84.966 R^2 value
inidicating a reasonalby goo fit which practically matches a slightly different fit that was
devloped using the manual nested approach followed by and anove table test to check for
multicollinearity which also produced a reasonably good fit with an R^2 value of 85.13%.
It must also be noted that vehicle weight is highly correlated (-86.77%) with mpg ratings and
transmission type is relatively highly correlated with vehicle wieght at (69%).
Problem Statement: √
Backround information, problem statement & questions of interest:
Background situation:
As a member of a team of data analysts for the Motor Trend Magazine we have been given
a data set called "mtcars" and asked to answer some questions of interest concerning
differences between automatic and manual transmissoin types in regards to associated
mpg or miles per gallon ratings within the given data set.
Assumptions:
The given data set (a sample of a larger population) for this analysis consists of (iid)
independent and identically distrubuted random varialbles for 32 subjects (vehicles) of
11 observations or variables.
Questions of interest for Motor Trend Magazine:
Q1 “Is an automatic or manual transmission better for 'mpg'”
or which type of tramsmission is associated with better mph or gas mileage ratings?
A1. The mean "MPG" rating of all vehicle models including both transmission types
is 20.091 mpg with a 95% confint of 17.917mpg to 22.263mpg.
Q2 "Quantify the MPG difference between automatic and manual transmissions"
or assuming there is an associated difference in mpg ratings between manual and
automatic type transmissions then: What is the expected difference and how accurate
is this estimate based on the given data?
A2. The mean "MPG" of models with automatic transmisions is 17.147 mpg, and those
with manual transmisions 24.392 mpg for a difference of 7.24 mpg in favor of the
manual transmission in the given data set.
Analysis Considerations: √
Descriptive - any(is.na()), str() & summary(),
Exploratory - pairsPlots(), histograms(), boxPlots(), barPlots() QQ_Plots & multiple plots
Regression Models Analysis -
OLS, SLR, BiVariate Regression, Multivariate Linear Regression, Heatmaps,
HCL, PCA, SVD, Mean, T-Test, Z-Test, covariance, OLS, regression to mean (-1),
simple linear regression, statistical linear regression, multivariable
regression, logit & model selection, adjustments, residuals (predict fit, residual
fit (-1)), hatvalues, variation, & dfbetas, R^2, diagnostics; ANOVA, GLMs &
Binary GLMs, coeficients, correlation, confidence intervals, Cooks Distance,
ChiSq-Test, VIF, binary, binomial, poisson, influence & leverage, Odds & OddRatio,
Inferential & Predictive, Causal ~ NA, Mechanistic ~ NA
Diagnostics
See section on Diagnostics
Final Model Selection strategy
Beginning with the simmple linear regression using just one predictor (am),
Use a bivariate model, Use a multivariate model, Use an intercept adjusted multivariate model
Use a multivariate model removing a suspected key regressor, Use the nested multivariate process
Use the step(function) both directions process, Usee different combinations and leave out from
the results of the step(function), Choose the best fit which is understandable and easy to explian
Technical Environment: √
Environment:
System - session Info
Set the Working Directory
Record the System & Session Info
Check which packages have beeb installed
```{r, SetDirectory, echo=TRUE, eval=TRUE, include=FALSE, results='hide'}
setwd("~/Desktop/Coursera_R/7_Regression Models/RM_proj_MPG_MotorTrendData");sessionInfo();installed.packages();date()
```
Raw Data: √
Clean up work space, import the data & check for missing values
Overview: Motor Trend 'mtcars' data set:
```{r, Data_mtcars, include=FALSE, results='hide'}
rm(list=ls());library(car);library(dplyr);library(MASS);data("mtcars");any(is.na(mtcars))
```
A data frame with 32 observations on 11 variables.
[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders (4,6,8)
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt Weight (1000 lbs)
[, 7] qsec 1/4 mile time
[, 8] vs V/S (0 = vee-block, 1 = straight-block)
[, 9] am Transmission (0 = automatic, 1 = manual)
[,10] gear Number of forward gears (3:5)
[,11] carb Number of carburetors (1:4,6,8)
Processed Data: √
Factor columns 2 & 8:11 (cyl,vs,am,gear,carb) so there values can be used as levels
```{r, ProcessData, echo=TRUE, eval=TRUE, cache=TRUE, message=FALSE, warning=FALSE, include=FALSE, results='hide'}
data("mtcars")
for(i in c(2,8:11))mtcars[,i] <- as.factor(mtcars[,i]);
str(mtcars)
```
Descriptive Statistics: √
```{r, DescriptiveStats, echo=TRUE, eval=FALSE, cache=TRUE, message=FALSE, warning=FALSE, include=FALSE, results='hide'}
library(datasets);library(dplyr);data("mtcars");head(mtcars)
head(mtcars,4);mean(mtcars$mpg);sd(mtcars$mpg)
t.test(mtcars$mpg);round(t.test(mtcars$mpg)$conf.int,3)
mtcars0 <- mtcars[mtcars$am==0,];mtcars0;t.test(mtcars0$mpg)
mtcars1 <- mtcars[mtcars$am==1,];mtcars1;t.test(mtcars1$mpg)
c4 <- mtcars$mpg[mtcars$cyl==4];c4;t.test(c4,c6,var.equal = TRUE)
c6 <- mtcars$mpg[mtcars$cyl==6];c6
c8 <- mtcars$mpg[mtcars$cyl==8];c8;t.test(c6,c8,var.equal = TRUE)
```
Exploratory Analysis: √
See Appendix A. Figures (pairs-plot, histogram, box-plot)
Statistical Modeling: √
Multivarite Linear Model Finding Best Fit with Step function:
```{r ModelFitStep, echo=TRUE, eval=FALSE, message=FALSE, warning=FALSE, cache=TRUE, include=TRUE, results='asis'}
library(stats);library(MASS)
fstp <- lm(mpg ~ ., data = mtcars)
stp <- step(fstp, trace = FALSE)
coef(summary(stp))
summary(stp)$r.squared
```
Preliminary findings:
# Quesions of interest: & interpretation of results:
A Revisit the Question - Considering all regressors:
A. Is an automatic or manual transmission better for mpg
The results of using multiple linear regression techniques
sugggest that manual transmissions are associated with better
mpg ratings than automatic transmissions
B. Quantify the MPG difference between automatic and manual transmissions
On average manual transmissions provides 24.39 mpg which is 7.24 mpg
more than the 17.15 mpg average of the automatic transmission models
B Primary result
A. Are any other regressors significantly correlated with mpg rating?
a. model fnm6 = factor(am) + cyl + disp + hp + drat + wt
this model has an R^2 value of 85.13%
B. Further testing
a. using the step function in both directions selects wt, qsec and am
as good predictors with an 84.96% R^2 value indicating very good
predictability using this set of regresssors
C Direction, Magnitude, Uncertainty
A.
# Multivarite Linear Model Finding Best Fit with Step function:
```{r MVLM_Step_R^2, echo=TRUE, eval=TRUE, cache=TRUE, include=TRUE, results='asis'}
library(stats);library(MASS)
fstp <- lm(mpg ~ ., data = mtcars)
stp <- step(fstp, trace = FALSE)
coef(summary(stp))
summary(stp)$r.squared
par(mfrow = c(1, 1), mar = c(4,4,4,2))
g <- ggplot(mtcars, aes(x = (wt + qsec + am), y = mpg),)
g <- g + xlab("Model stp = (wt + qsec + am)")
g <- g + ylab("Miles per Gallon")
g <- g + geom_smooth(aes(method = "lm", shape = factor(am)))
g <- g + geom_point(aes(shape = factor(am), size=qsec, colour=wt))
g
```
D Context
A. It should be noted that vehicle weight has a strong negative correlation
to mpg ratings (-86.76%) and the weight of vehicle models with manual transmissions
range from 1.513tons to 3.570 tons and the weight of vehicles with automatic
transmissions range from 2.465 tons to 5.424 tons
E Implications - Congruence with existing knowledge?
A. Sedan, Sports, Luxury
Generally accepted expectations of mpg ratings are that sports and luxury models
typically will have lower mpg ratings than sedans
Diagnostics: √
Diagnostic tests were conducted on model results in accordance with the plan for analysis
considerations. Besides several vehicle models exhibiting leverage on the model fit at the
high end of the qsec, and weight scales results were generally as expected with faster and
or heavier vehilces of both manual and automatic transmissoin types getting lower mpg
ratings and slower lighter vehicle models of both transmission types exhibiting better
or higher mpg ratings.
Hypothesis Test: ?
H0 = mean(automatic transmission)mpg = mean(manual transmission)mpg
Ha = mean(automatic transmission)mpg != mean(manual transmission)mpg
Inference & Prediction: ?
Interpretation of Results: #
Appendix A. Figures: √ need to review again possibly condense to one plot