This project involves conducting deep statistical analysis using RStudio on traffic and cake datasets. The study explores relationships between variables using regression models, evaluates model significance with statistical tests, and visualizes results using various plots. The project also includes a comprehensive study report created using RMarkdown.
- Traffic Dataset: Used to analyze relationships between different predictors and the response variable `spi`.
- Cake Dataset: Used for a two-way ANOVA study to examine the effects of `Recipe` and `Temp` on `Angle`.
- Exploratory Data Analysis (EDA)
  - Correlation analysis with correlation matrices and scatterplot matrices.
  - Identification of relationships between predictors and the response variable `spi`.
- Regression Model Fitting
  - Full multiple linear regression model `lm(spi ~ ., data = traffic)`.
  - 95% confidence interval estimation for `weather` impact.
  - F-test for overall regression significance.
- Model Diagnostics
  - Residual plots and normality tests.
  - Finding the best regression model through backward elimination.
  - Comparing adjusted R² values to determine optimal model selection.
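The traffic-dataset steps above could be sketched in R roughly as follows. This is a minimal illustration, not the report's actual code: the data frame is a synthetic stand-in for `traffic.csv`, only the column names `spi`, `weather`, `transport`, and `road` come from this README, and `fuel` is a hypothetical extra predictor added so that elimination has something to drop. Note also that `step()` selects by AIC, which may not match the elimination criterion used in the report.

```r
# Synthetic stand-in for data/traffic.csv (assumption: real data not shown here).
# `fuel` is a hypothetical noise predictor, not taken from the README.
set.seed(1)
n <- 60
traffic <- data.frame(
  transport = rnorm(n),
  road      = rnorm(n),
  weather   = rnorm(n),
  fuel      = rnorm(n)
)
traffic$spi <- 10 + 2 * traffic$transport - 1.5 * traffic$road +
  3 * traffic$weather + rnorm(n)

# EDA: correlation matrix and scatterplot matrix
cor_mat <- cor(traffic)
pairs(traffic)

# Full multiple linear regression model
full_fit <- lm(spi ~ ., data = traffic)

# 95% confidence interval for the weather coefficient
weather_ci <- confint(full_fit, "weather", level = 0.95)

# Overall F-test: summary() stores the statistic and its degrees of freedom
f <- summary(full_fit)$fstatistic
f_pvalue <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# Diagnostics: residual plots and a Shapiro-Wilk normality test on residuals
par(mfrow = c(2, 2))
plot(full_fit)
shapiro.test(residuals(full_fit))

# Backward elimination (AIC-based via step()), then compare adjusted R-squared
reduced_fit <- step(full_fit, direction = "backward", trace = 0)
adj_r2 <- c(
  full    = summary(full_fit)$adj.r.squared,
  reduced = summary(reduced_fit)$adj.r.squared
)
print(adj_r2)
```

With the real data, the first block would be replaced by something like `traffic <- read.csv("data/traffic.csv")`, assuming the file layout described in the setup steps.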
- Balanced Study Checking
  - Ensuring equal number of replicates across factor levels.
- Preliminary Data Visualization
  - Boxplots and interaction plots to analyze factor relationships.
- Two-Way ANOVA Analysis
  - Interaction model fitting to assess significance.
  - Model validation through diagnostic plots.
- Main Effects Analysis
  - Hypothesis testing for `Recipe` and `Temp` significance.
  - Final model selection without interaction term based on p-values.
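The cake-dataset steps above might look like the following sketch. It runs on a synthetic stand-in for `cake.csv`: the factor levels, the number of replicates, and the simulated effects are all hypothetical, and only the column names `Recipe`, `Temp`, and `Angle` come from this README.

```r
# Synthetic stand-in for data/cake.csv (assumption: levels and replicate
# count are made up for illustration).
set.seed(2)
cake <- expand.grid(
  Recipe    = factor(c("A", "B", "C")),
  Temp      = factor(c(175, 195, 215)),
  replicate = 1:4
)
cake$Angle <- 30 + 2 * as.integer(cake$Recipe) +
  3 * as.integer(cake$Temp) + rnorm(nrow(cake), sd = 2)

# Balance check: every Recipe x Temp cell should hold the same number of rows
cell_counts <- table(cake$Recipe, cake$Temp)
is_balanced <- length(unique(as.vector(cell_counts))) == 1

# Preliminary plots: boxplots per cell and an interaction plot
boxplot(Angle ~ Recipe + Temp, data = cake)
with(cake, interaction.plot(Temp, Recipe, Angle))

# Two-way ANOVA with interaction, plus two basic diagnostic plots
inter_fit <- aov(Angle ~ Recipe * Temp, data = cake)
print(summary(inter_fit))
par(mfrow = c(1, 2))
plot(inter_fit, which = 1:2)

# Main-effects model, used if the interaction term is not significant
main_fit <- aov(Angle ~ Recipe + Temp, data = cake)
print(summary(main_fit))
```

Roughly parallel lines in the interaction plot are consistent with a non-significant interaction, which is the case where dropping the `Recipe:Temp` term is usually justified.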
- RStudio for data analysis and statistical modeling.
- Base R functions for data manipulation and visualization.
- ggplot2 & Base R plotting functions for visual representation.
- RMarkdown for creating a reproducible statistical report.
- A significant relationship was found between `spi` and predictors `weather`, `transport`, and `road` in the traffic dataset.
- The final regression model explained 73.8% of variation in `spi`.
- The cake dataset analysis found that `Recipe` and `Temp` had significant main effects on `Angle`, but their interaction was not significant.
- Ensure R and RStudio are installed.
- Place datasets (`traffic.csv`, `cake.csv`) inside the `data/` folder.
- Open `report.Rmd` in RStudio.
- Click Knit to generate the PDF report.
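As an alternative to the Knit button, the report can probably be rendered from the R console as well. The sketch below assumes the working directory is the project root, that the `rmarkdown` package is installed, and that a LaTeX distribution is available for PDF output; it first checks that the expected files exist.

```r
# Files the setup steps above expect (paths relative to the project root)
inputs <- c("data/traffic.csv", "data/cake.csv", "report.Rmd")
missing <- inputs[!file.exists(inputs)]

if (length(missing) > 0) {
  # Report what is missing instead of failing halfway through knitting
  message("Missing files: ", paste(missing, collapse = ", "))
} else {
  # Programmatic equivalent of clicking Knit for a PDF document
  rmarkdown::render("report.Rmd", output_format = "pdf_document")
}
```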