The goal of this project is to build a machine learning model to predict the Formula One World Constructors’ Championship Standings for the upcoming 2023 season.
The files listed below include all of my code and model-building steps. My entire project, including the full thought and work process behind it, is explained and presented here.
Packages used: tidyverse, tidymodels, parsnip, kknn, recipes, workflows, glmnet, magrittr, ranger, naniar, visdat, dplyr, ggplot2, ggthemes, corrplot, vip, themis, kableExtra, ISLR.
Some of these packages are necessary for the model-building process, while others make the coding more concise and the visual presentation cleaner.
The following files are a representation of my overall workflow. I put raw code in .R script files and saved important arguments or variables for later use in the correspondingly named .rda files.
The R script file read_data.R includes the code used to read in the csv files.
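As a rough illustration of what this step looks like, the sketch below reads a few csv files with readr; the file names and paths are assumptions, not the project's actual data files.

```r
# A minimal sketch of read_data.R; file names and paths are placeholders,
# not the project's actual data files.
library(readr)

results <- read_csv("data/results.csv")
races   <- read_csv("data/races.csv")
drivers <- read_csv("data/drivers.csv")

# save the raw data objects for use in later scripts
save(results, races, drivers, file = "data/raw_data.rda")
```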
The modify_data.R file includes code used to manipulate and join the data sets. Initial data cleaning is also executed in this R script file, which can range from converting timestamps into workable numeric variables to streamlining several related variables into one useful parameter.
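A hedged sketch of that kind of cleaning is shown below; the data set and column names (results, races, fastestLapTime, raceId) are illustrative assumptions rather than the project's exact code.

```r
# Illustrative cleaning and joining step; object and column names are assumptions.
library(dplyr)
library(tidyr)

# convert "M:SS.mmm" lap-time strings into numeric seconds
results_clean <- results %>%
  separate(fastestLapTime, into = c("lap_min", "lap_sec"),
           sep = ":", convert = TRUE) %>%
  mutate(fastest_lap_sec = 60 * lap_min + lap_sec) %>%
  select(-lap_min, -lap_sec)

# join race information onto the results
race_results <- results_clean %>%
  left_join(races, by = "raceId")
```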
Exploratory data analysis code is included in the R script file eda.R. This file includes code used to do further cleaning with a focus on missing data. It also includes some visual exploratory data analysis, mostly looking at possible surface-level trends and relationships between variables, which provides good initial insight before considering potential models.
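The missing-data and correlation checks this involves might look like the sketch below, using naniar, visdat, and corrplot; the race_results object is carried over from the earlier sketch and is an assumption.

```r
# Illustrative EDA; `race_results` is an assumed object from the earlier sketch.
library(dplyr)
library(naniar)
library(visdat)
library(corrplot)

vis_miss(race_results)      # overview of missingness across the data set
gg_miss_var(race_results)   # count of missing values per variable

# correlation plot of the numeric variables
race_results %>%
  select(where(is.numeric)) %>%
  cor(use = "pairwise.complete.obs") %>%
  corrplot(method = "circle")
```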
This file includes the steps used to set up the machine learning models. This involves splitting the data into training and testing sets and building a recipe with the desired response variable and predictors. The tidymodels recipe() function allows us to dummy code categorical predictors and impute missing values in the predictors within the step of creating the recipe. I further set up k-fold cross-validation and apply different machine learning models to the recipe. I developed the following models to allow a thorough discussion of the truly best-fitting model.
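A minimal sketch of this setup, assuming a cleaned data frame f1_data with a numeric points response (both names are placeholders), might look like:

```r
# Sketch of the split, recipe, and cross-validation setup; `f1_data` and
# the `points` response are placeholder names, not the project's exact ones.
library(tidymodels)

set.seed(123)
f1_split <- initial_split(f1_data, prop = 0.75, strata = points)
f1_train <- training(f1_split)
f1_test  <- testing(f1_split)

f1_recipe <- recipe(points ~ ., data = f1_train) %>%
  step_impute_linear(all_numeric_predictors()) %>%  # impute missing numeric values
  step_dummy(all_nominal_predictors()) %>%          # dummy code categorical predictors
  step_normalize(all_numeric_predictors())

f1_folds <- vfold_cv(f1_train, v = 10)
```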
- linear regression
- polynomial regression
- k-nearest neighbors (knn)
- elastic net linear regression
- elastic net with lasso regression
- elastic net with ridge regression
- random forest
To build the models, we use the following steps (the sketch after this list illustrates them for a single model):

- set up each model with its tuning parameters, the engine, and the regression mode
- set up a workflow() with each model and the recipe
- set up a tuning grid with grid_regular() and levels for the tuned parameters
- tune each model with tune_grid() using the corresponding workflow, the k-fold cross-validation folds, and the tuning grid
- collect the root mean squared error (RMSE) metric of the tuned models and find the lowest RMSE for each model
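For the k-nearest neighbors model, these steps might look roughly as follows; the grid range and levels are illustrative assumptions, and the recipe and folds come from the earlier sketch.

```r
# Illustrative tuning of one model (knn); grid settings are assumptions.
library(tidymodels)
library(kknn)

knn_model <- nearest_neighbor(neighbors = tune()) %>%
  set_engine("kknn") %>%
  set_mode("regression")

knn_wflow <- workflow() %>%
  add_model(knn_model) %>%
  add_recipe(f1_recipe)

knn_grid <- grid_regular(neighbors(range = c(1, 15)), levels = 5)

knn_tuned <- tune_grid(
  knn_wflow,
  resamples = f1_folds,
  grid      = knn_grid
)

# lowest cross-validated RMSE for this model
show_best(knn_tuned, metric = "rmse", n = 1)
```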
The corresponding .R script file is not included here, but the results are saved in this .rda file. I analyzed the performance of the more noteworthy models: elastic net, polynomial regression, knn, and random forest. For a thorough explanation and interpretation of the parameters and performance of these models, refer to the completed presentation here.
After analyzing the RMSE across the tuning parameters, I conclude that the random forest model with parameters mtry = 5, trees = 400, and min_n = 20 is the best-performing model. I then fit that model and analyze the RMSE once again on the testing split.
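A hedged sketch of that final fit, reusing the placeholder objects from the earlier sketches, might look like:

```r
# Final random forest fit and test-set RMSE; object names are the placeholders
# introduced in the earlier sketches, not the project's exact code.
library(tidymodels)
library(ranger)

final_rf <- rand_forest(mtry = 5, trees = 400, min_n = 20) %>%
  set_engine("ranger", importance = "impurity") %>%
  set_mode("regression")

final_wflow <- workflow() %>%
  add_model(final_rf) %>%
  add_recipe(f1_recipe)

final_fit <- fit(final_wflow, data = f1_train)

# evaluate performance on the held-out testing split
augment(final_fit, new_data = f1_test) %>%
  rmse(truth = points, estimate = .pred)
```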