Jeremy and the data analytics team at AutoRUs' have been assigned to analyze data on a new prototype, the MechaCar, which is suffering from production troubles that are blocking the manufacturing team’s progress. The analysis reviews the production data for insights that may help the manufacturing team.
In this assignment, we need to assist Jeremy with the following:
- Perform multiple linear regression analysis to identify which variables in the dataset predict the mpg of MechaCar prototypes;
- Collect summary statistics on the pounds per square inch (PSI) of the suspension coils from the manufacturing lots;
- Run t-tests to determine if the manufacturing lots are different from the population mean;
- Design a statistical study to compare MechaCar vehicles to the competition.
** The R script for the statistical analysis can be found here **
- Which variables/coefficients provided a non-random amount of variance to the mpg values in the dataset? The Pr(>|t|) values for vehicle_length and ground_clearance are well below the 0.05 significance level, so both variables provide a non-random amount of variance to the mpg values; that is, they have a significant impact on the miles per gallon (mpg) of the MechaCar prototypes. The other variables (vehicle_weight, spoiler_angle and AWD) appear to contribute only a random amount of variance to the linear model, as their Pr(>|t|) values are larger than 0.05.
- Is the slope of the linear model considered to be zero? Why or why not? The slope of the linear model is not considered to be zero, because the regression shows that some of the independent variables (vehicle_length and ground_clearance) have a statistically significant effect on the dependent variable, mpg.
- Does this linear model predict mpg of MechaCar prototypes effectively? Why or why not? The r-squared of the current linear regression is 0.7149, which means the model explains roughly 71% of the variability in miles per gallon (mpg). The linear model therefore predicts the mpg of MechaCar prototypes reasonably effectively, although about 29% of the variability remains unexplained.
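To make the r-squared interpretation concrete, here is a minimal sketch of how the statistic is computed. It is written in Python rather than R, it is not the project's script, and it fits a single predictor instead of the full multiple regression. The `vehicle_length` and `mpg` values are hypothetical numbers chosen for illustration, not rows from the MechaCar dataset, and the helper names `fit_simple_ols` and `r_squared` are made up for this example.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(observed, predicted):
    """Share of the variance in `observed` that `predicted` accounts for."""
    y_bar = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - y_bar) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical values, NOT taken from the MechaCar dataset.
vehicle_length = [12.0, 13.5, 14.0, 15.5, 16.0, 17.5]
mpg = [30.0, 38.0, 36.0, 47.0, 45.0, 55.0]

intercept, slope = fit_simple_ols(vehicle_length, mpg)
predicted = [intercept + slope * x for x in vehicle_length]
r2 = r_squared(mpg, predicted)
print(f"slope = {slope:.3f}, R^2 = {r2:.3f}")
```

An r-squared of 1 would mean the fitted values reproduce every observation exactly, while 0 would mean the model does no better than always predicting the mean mpg.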
- The design specifications for the MechaCar suspension coils dictate that the variance of the suspension coils must not exceed 100 pounds per square inch. Does the current manufacturing data meet this design specification for all manufacturing lots in total and each lot individually? Why or why not?
- Total Summary Table
Based on the total_summary dataframe, the current manufacturing data meets the design specification for all manufacturing lots in total: the variance of the suspension coils across all three lots is 62.29356 PSI, which is below the 100 PSI limit.
- Lot Summary Table
On a per-lot basis, however, the lot_summary dataframe shows that the variance differs by lot. Lot 1 and Lot 2 have variances of 0.9795918 and 7.4693878 respectively, both well within the 100 PSI tolerance. Lot 3 has a much higher variance of 170.2861224, which exceeds the tolerance, so Lot 3 does not meet the design specification and has to be removed from production.
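The total and per-lot checks can be sketched as follows. This is an illustrative Python version, not the project's R script: the `records` list holds hypothetical PSI readings invented for the example, and `summarize` is a made-up helper. Only the 100 PSI limit and the `total_summary` / `lot_summary` names come from the text above.

```python
from collections import defaultdict
from statistics import mean, median, stdev, variance

VARIANCE_LIMIT = 100  # design specification quoted in the text

# Hypothetical (lot, PSI) records, NOT the real manufacturing data.
records = [
    ("Lot1", 1498.0), ("Lot1", 1500.0), ("Lot1", 1501.0), ("Lot1", 1501.0),
    ("Lot2", 1497.0), ("Lot2", 1500.0), ("Lot2", 1503.0), ("Lot2", 1501.0),
    ("Lot3", 1480.0), ("Lot3", 1510.0), ("Lot3", 1495.0), ("Lot3", 1516.0),
]


def summarize(psi_values):
    """Mean, median, sample variance and sample standard deviation."""
    return {
        "mean": mean(psi_values),
        "median": median(psi_values),
        "variance": variance(psi_values),
        "sd": stdev(psi_values),
    }


# Group the readings by manufacturing lot.
by_lot = defaultdict(list)
for lot, psi in records:
    by_lot[lot].append(psi)

total_summary = summarize([psi for _, psi in records])
lot_summary = {lot: summarize(values) for lot, values in by_lot.items()}

# Lots whose variance breaks the design specification.
failing_lots = [lot for lot, stats in lot_summary.items()
                if stats["variance"] > VARIANCE_LIMIT]

print(f"all lots: variance = {total_summary['variance']:.2f}")
for lot, stats in lot_summary.items():
    print(f"{lot}: variance = {stats['variance']:.2f}")
print("lots exceeding the limit:", failing_lots)
```

The example data were chosen so that the pooled variance stays under the limit while one lot exceeds it, which is the same pattern the summary tables show: a total figure can hide a problem that only appears when the lots are examined separately.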
We wrote an R script that uses the t.test() function to determine whether the PSI across all manufacturing lots is statistically different from the population mean of 1,500 pounds per square inch.
The t-test output indicates that the suspension coils across all manufacturing lots are not statistically different from the population mean. The p-value is 0.06028, which is greater than 0.05, so we fail to reject the null hypothesis: there is not enough evidence to support its rejection.
- Lot1. The t-test of the suspension coils from manufacturing Lot1 shows no statistical difference from the population mean. The p-value is 1, which is greater than 0.05, so we fail to reject the null hypothesis.
- Lot2. The t-test of the suspension coils from manufacturing Lot2 shows no statistical difference from the population mean. The p-value is 0.6072, which is greater than 0.05, so we fail to reject the null hypothesis.
- Lot3. The t-test of the suspension coils from manufacturing Lot3 shows a statistical difference from the population mean. The p-value is 0.04168, which is less than 0.05, so we reject the null hypothesis: the evidence suggests that the mean PSI of Lot3 differs from the population mean of 1,500 PSI.
The t-tests support our suspicion that something is off with Lot3. It needs to be investigated, and it may be one of the reasons for the problems that MechaCar is experiencing.
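For readers who want to see what a one-sample t-test computes, here is a rough Python sketch using only the standard library. It is not the R script and does not call R's t.test(); the two samples are hypothetical, and the helper names (`t_pdf`, `two_sided_p_value`, `one_sample_t_test`) are invented for this example. The two-sided p-value is obtained by numerically integrating the Student's t density with Simpson's rule, which is an implementation choice made here for self-containment.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|), integrating the density with Simpson's rule."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(-|t| <= T <= |t|)
    return max(0.0, 1.0 - central_mass)


def one_sample_t_test(sample, mu):
    """Return (t statistic, degrees of freedom, two-sided p-value)."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    t_stat = (mean(sample) - mu) / standard_error
    return t_stat, n - 1, two_sided_p_value(t_stat, n - 1)


POPULATION_MEAN = 1500  # PSI, as stated in the text

# Hypothetical samples, NOT the real lot data.
samples = {
    "on_target": [1498.0, 1500.0, 1501.0, 1501.0],
    "shifted": [1492.0, 1495.0, 1497.0, 1494.0, 1496.0, 1493.0],
}

results = {name: one_sample_t_test(values, POPULATION_MEAN)
           for name, values in samples.items()}
for name, (t_stat, df, p_value) in results.items():
    verdict = "reject H0" if p_value < 0.05 else "fail to reject H0"
    print(f"{name}: t = {t_stat:.3f}, df = {df}, p = {p_value:.4f} ({verdict})")
```

A sample whose mean equals the population mean exactly gives t = 0 and a p-value of 1, which is how a p-value of 1 such as the one reported for Lot1 can arise.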
To design a statistical study that compares the performance of MechaCar vehicles against vehicles from other manufacturers, the following metrics should be considered:
- Cost;
- City or highway fuel efficiency;
- Horsepower;
- Maintenance cost;
- Safety rating, among others.
- Null Hypothesis - The mean city fuel efficiency of the MechaCar is the same as that of comparable models from other manufacturers.
- Alternative Hypothesis - The mean city fuel efficiency of the MechaCar differs from that of comparable models from other manufacturers.
I would recommend a two-sample t-test, which tests for a statistical difference between the means of two samples.
Fuel efficiency data from comparable car models should be randomly collected into a sample for the analysis.
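As a rough illustration of the proposed comparison, the sketch below computes the test statistic and degrees of freedom of Welch's two-sample t-test, the unequal-variance form that R's t.test() uses by default. It is a Python sketch under stated assumptions, not part of the study: the mpg figures are hypothetical, and `welch_t` is a made-up helper. Converting the statistic into a p-value is left to a statistics package.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a  # squared standard error of mean A
    se2_b = variance(sample_b) / n_b  # squared standard error of mean B
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Hypothetical city fuel efficiency samples (mpg), NOT collected data.
mechacar_city_mpg = [24.1, 25.3, 23.8, 26.0, 24.7, 25.5]
competitor_city_mpg = [22.9, 23.5, 24.0, 22.4, 23.1, 23.8]

t_stat, df = welch_t(mechacar_city_mpg, competitor_city_mpg)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

A large absolute t statistic relative to the t distribution with `df` degrees of freedom would be evidence against the null hypothesis of equal mean fuel efficiency.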