A statistical study to compare the performance of MechaCar vehicles against other manufacturers. Perform multiple linear regression analysis and t-tests, and collect summary statistics on various parameters.

Cryptotwister/MechaCar_Statistical_Analysis

MechaCar_Statistical_Analysis

Jeremy and the data analytics team of AutoRUs have been assigned to perform data analysis on a new prototype, the MechaCar, which is suffering from production troubles that are blocking the manufacturing team’s progress. The analysis reviews the production data for insights that may help the manufacturing team.

In this assignment, we need to assist Jeremy with the following:

  • Perform multiple linear regression analysis to identify which variables in the dataset predict the mpg of MechaCar prototypes;
  • Collect summary statistics on the pounds per square inch (PSI) of the suspension coils from the manufacturing lots;
  • Run t-tests to determine if the manufacturing lots are different from the population mean;
  • Design a statistical study to compare MechaCar vehicles to the competition.

** The R script for the statistical analysis can be found here **

Linear Regression to Predict MPG

Deliverable 1 - Linear Regression

  1. Which variables/coefficients provided a non-random amount of variance to the mpg values in the dataset? The Pr(>|t|) values for vehicle_length and ground_clearance are well below the 0.05 significance level, so we can say that both vehicle_length and ground_clearance provided a non-random amount of variance to the mpg values. In other words, they have a significant impact on the miles per gallon (mpg) of the MechaCar prototypes. The other variables (vehicle_weight, spoiler_angle and AWD) appear to contribute only a random amount of variance to the linear model, as their Pr(>|t|) values are larger than 0.05.

  2. Is the slope of the linear model considered to be zero? Why or why not? The slope of the linear model is not considered to be zero, because the regression shows that some of the independent variables have a statistically significant effect on the dependent variable, meaning their coefficients are significantly different from zero.

  3. Does this linear model predict mpg of MechaCar prototypes effectively? Why or why not? The r-squared of the current linear regression is 0.7149, which means the model explains roughly 71.5% of the variance in the miles per gallon (mpg) of the MechaCar prototypes. On that basis, the model predicts mpg reasonably effectively.
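The three answers above can be reproduced with a short R sketch. The data here is synthetic (made up for illustration, not the MechaCar dataset), and the column names are assumed from the variable names mentioned above. It fits the model with `lm()`, reads the `Pr(>|t|)` column, and recomputes r-squared by hand to show that it is the share of variance explained.

```r
# Synthetic stand-in for the mpg data; every value below is made up.
set.seed(42)
n <- 50
cars <- data.frame(
  vehicle_length   = rnorm(n, mean = 15, sd = 2),
  vehicle_weight   = rnorm(n, mean = 6000, sd = 1500),
  spoiler_angle    = rnorm(n, mean = 50, sd = 15),
  ground_clearance = rnorm(n, mean = 13, sd = 2),
  AWD              = rbinom(n, size = 1, prob = 0.5)
)
# Assumption of this sketch: mpg depends only on length and clearance, plus noise.
cars$mpg <- -100 + 6 * cars$vehicle_length +
  3.5 * cars$ground_clearance + rnorm(n, sd = 8)

# Multiple linear regression of mpg on all five predictors.
fit <- lm(mpg ~ vehicle_length + vehicle_weight + spoiler_angle +
            ground_clearance + AWD, data = cars)
fit_summary <- summary(fit)

# Pr(>|t|) for each coefficient; values below 0.05 indicate non-random variance.
p_values <- coef(fit_summary)[, "Pr(>|t|)"]
significant <- names(p_values)[p_values < 0.05]

# r-squared by hand: 1 - (residual sum of squares / total sum of squares).
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((cars$mpg - mean(cars$mpg))^2)
r_squared <- 1 - ss_res / ss_tot

# p-value of the overall F-test (null hypothesis: all slopes are zero).
f <- fit_summary$fstatistic
overall_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

print(significant)
print(r_squared)
print(overall_p)
```

The hand-computed `r_squared` matches `fit_summary$r.squared`, and `overall_p` is the quantity that answers whether the slope of the model as a whole can be considered zero.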

Summary Statistics on Suspension Coils

  • The design specifications for the MechaCar suspension coils dictate that the variance of the suspension coils must not exceed 100 pounds per square inch. Does the current manufacturing data meet this design specification for all manufacturing lots in total and each lot individually? Why or why not?

  • Total Summary Table

Deliverable 2 - Total Summary table

Based on the total_summary dataframe, the current manufacturing data meets the design specification for all manufacturing lots in total: the variance of the suspension coils across all three lots combined is 62.29356, which is below the limit of 100.

  • Lot Summary table

Deliverable 2 - Lot Summary table

On a per-lot basis, however, the lot_summary dataframe shows that the variance differs by lot number. Lot 1 and Lot 2 have variances of 0.9795918 and 7.4693878 respectively, both well within the variance tolerance of 100. Lot 3 has a much higher variance of 170.2861224, which exceeds the tolerance, so Lot 3 does not meet the design specification and should be removed from production.
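As a rough illustration of how the two summary tables could be built, here is a base R sketch. The data is made up, the column names `Manufacturing_Lot` and `PSI` are assumptions, and the original script may well use a different approach (for example dplyr).

```r
# Hypothetical coil data: three lots with different spreads (made-up values).
set.seed(1)
coils <- data.frame(
  Manufacturing_Lot = rep(c("Lot1", "Lot2", "Lot3"), each = 50),
  PSI = c(rnorm(50, 1500, 1), rnorm(50, 1500, 3), rnorm(50, 1497, 13))
)

# Helper (hypothetical name) returning the four summary statistics for a PSI vector.
summarise_psi <- function(psi) {
  data.frame(Mean = mean(psi), Median = median(psi),
             Variance = var(psi), SD = sd(psi))
}

# Summary over all lots combined.
total_summary <- summarise_psi(coils$PSI)

# Summary per lot: split PSI by lot, summarise each piece, stack the rows.
lot_summary <- do.call(
  rbind,
  lapply(split(coils$PSI, coils$Manufacturing_Lot), summarise_psi)
)

# Flag each lot against the variance limit of 100 from the design specification.
lot_summary$Within_Spec <- lot_summary$Variance <= 100

print(total_summary)
print(lot_summary)
```

A combined variance can sit under the limit while one lot exceeds it, which is why the per-lot table is needed in addition to the total.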

T-Test on Suspension Coils

We wrote an R script using the t.test() function to determine whether the mean PSI across all manufacturing lots is statistically different from the population mean of 1,500 pounds per square inch.

Deliverable 3 - TTest_total

The output above indicates that the suspension coils across all manufacturing lots are not statistically different from the population mean. The p-value is 0.06028, which is greater than 0.05, so we fail to reject the null hypothesis: there is not enough evidence to support its rejection.
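For reference, a one-sample t-test of this kind can be sketched as follows. The PSI values are simulated rather than taken from the manufacturing data, so the p-value will not match the one reported above.

```r
# Simulated PSI readings (made-up values, not the manufacturing data).
set.seed(7)
psi <- rnorm(150, mean = 1499, sd = 8)

# One-sample, two-sided t-test against the population mean of 1,500 PSI.
result <- t.test(psi, mu = 1500)

# The same t statistic by hand: (sample mean - mu) / (sd / sqrt(n)).
t_manual <- (mean(psi) - 1500) / (sd(psi) / sqrt(length(psi)))

# Decision at the 0.05 significance level.
reject_null <- result$p.value < 0.05

print(result$p.value)
print(reject_null)
```

`t_manual` equals `result$statistic`, and the test has `length(psi) - 1` degrees of freedom.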

  • Lot1. The t-test for the suspension coils from manufacturing Lot 1 shows no statistical difference from the population mean. The p-value is 1, which is greater than 0.05, so we fail to reject the null hypothesis.

Deliverable 3 - TTest_LOT1

  • Lot2. The t-test for the suspension coils from manufacturing Lot 2 shows no statistical difference from the population mean. The p-value is 0.6072, which is greater than 0.05, so we fail to reject the null hypothesis.

Deliverable 3 - TTest_LOT2

  • Lot3. The t-test for the suspension coils from manufacturing Lot 3 shows a statistical difference from the population mean. The p-value is 0.04168, which is less than 0.05, so we reject the null hypothesis: the evidence suggests that the mean PSI of Lot 3 differs from the population mean of 1,500 PSI.

Deliverable 3 - TTest_LOT3

The t-tests support our suspicion that something is off with Lot 3. It needs to be investigated, and it may be one of the reasons for the problems that MechaCar is experiencing.
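The per-lot tests amount to running the same one-sample test on each lot's subset of the data. A minimal base R sketch, again with made-up values and assumed column names `Manufacturing_Lot` and `PSI`:

```r
# Hypothetical coil data for three lots (made-up values).
set.seed(1)
coils <- data.frame(
  Manufacturing_Lot = rep(c("Lot1", "Lot2", "Lot3"), each = 50),
  PSI = c(rnorm(50, 1500, 1), rnorm(50, 1500, 3), rnorm(50, 1497, 13))
)

# Run a one-sample t-test against 1,500 PSI on each lot's readings.
lot_p_values <- sapply(
  split(coils$PSI, coils$Manufacturing_Lot),
  function(psi) t.test(psi, mu = 1500)$p.value
)

# Collect the p-values with the decision at the 0.05 significance level.
lot_tests <- data.frame(
  p_value = lot_p_values,
  differs_from_1500 = lot_p_values < 0.05
)

print(lot_tests)
```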

Study Design: Comparing the MechaCar to the Competition

To design a statistical study that compares the performance of MechaCar vehicles against vehicles from other manufacturers, the following metrics should be considered:

  • Cost;
  • City or highway fuel efficiency;
  • Horsepower;
  • Maintenance cost;
  • Safety rating, among others.

Possible Hypotheses:

  1. Null Hypothesis - The mean city fuel efficiency of the MechaCar model is no better than that of a comparable competitor model.
  2. Alternative Hypothesis - The mean city fuel efficiency of the MechaCar model is better than that of a comparable competitor model.

Statistical Test:

I would recommend using a two-sample t-test to test for a statistical difference between the means of the two samples.

Dataset:

Fuel efficiency data from comparable car models should be randomly collected into a sample for analysis.
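A sketch of what such a comparison might look like in R is shown below. It assumes a null hypothesis of no difference in mean city fuel efficiency and a one-sided alternative that the MechaCar mean is higher. Both samples are simulated, and the variable names are hypothetical.

```r
# Simulated city mpg samples (made-up values for illustration only).
set.seed(123)
mechacar_city_mpg   <- rnorm(30, mean = 29, sd = 3)
competitor_city_mpg <- rnorm(30, mean = 27, sd = 3)

# Two-sample t-test. t.test() defaults to the Welch variant, which does not
# assume equal variances. alternative = "greater" tests whether the first
# sample's mean is greater than the second's.
comparison <- t.test(mechacar_city_mpg, competitor_city_mpg,
                     alternative = "greater")

print(comparison$p.value)
print(comparison$estimate)
```

A one-sided test is used here because the question is whether MechaCar is better, not merely different. A two-sided test (the default `alternative = "two.sided"`) would be the choice if a difference in either direction mattered.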
