I completed this project in my first year of university in 2021, and it was my first experience with coding and machine learning.
- My professors, Prof Denis and Prof Yuting, for guiding us closely and teaching us the fundamentals of coding and machine learning.
- The student who created the dataset, whose data we used for this project.
- Data: Contains the raw data files as well as the cleaned data used for analysis.
- Main Report: Includes the final report detailing the analysis, findings, and conclusions.
- Plots: Stores all the visualisations and plots created during the analysis.
- HDB_Resale.R: The main R script that performs data cleaning, analysis, and visualisation.
To replicate the analysis, follow these steps:
- Clone the repository:
  git clone https://github.com/zhyoung17/hdb-ml.git
- Navigate to the project directory:
  cd hdb-ml
- Install the required R packages. You can use the following command in R to install necessary packages:
  install.packages(c("dplyr", "ggplot2", "readr", "lubridate"))
- Open the HDB_Resale.R script in RStudio or your preferred R environment and run the script to perform the analysis.
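
If some of the packages are already installed, it can be convenient to check which ones are missing before installing anything. The snippet below is a minimal sketch of that check. The helper name `missing_packages` is made up for illustration and is not part of HDB_Resale.R, and the install call is left commented out so that nothing is downloaded unless you choose to.

```r
# Packages listed in the installation step.
required <- c("dplyr", "ggplot2", "readr", "lubridate")

# Hypothetical helper (not part of the repository): returns the packages
# from `pkgs` that cannot be loaded in the current R library.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

to_install <- missing_packages(required)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install the missing ones
} else {
  message("All required packages are already installed.")
}
```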
The analysis includes the following steps:
- Data Cleaning: Process and clean the raw data to make it suitable for analysis.
- Exploratory Data Analysis (EDA): Generate summary statistics and visualisations to understand the distribution and relationships between variables.
- Modelling: Develop statistical models to predict HDB resale prices based on the features.
- Visualisation: Create plots to visualise the results and insights from the analysis.
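
The four steps above can be sketched in a few lines of base R. This is only an illustration of the workflow, not the code in HDB_Resale.R: the toy data frame is invented, the column names `floor_area_sqm` and `resale_price` are assumptions, and simple linear regression is an assumed stand-in for whatever models the project actually uses. Base R is used so the sketch runs without any extra packages.

```r
# Toy data invented purely for illustration (NOT from the project dataset).
# Column names are assumptions.
flats <- data.frame(
  floor_area_sqm = c(67, 92, 110, NA, 45, 121),
  resale_price   = c(350000, 480000, 560000, 500000, 260000, 610000)
)

# 1. Data cleaning: drop rows with missing values.
clean <- na.omit(flats)

# 2. EDA: summary statistics and the correlation between area and price.
print(summary(clean))
price_area_cor <- cor(clean$floor_area_sqm, clean$resale_price)

# 3. Modelling: simple linear regression (an assumed model choice).
fit <- lm(resale_price ~ floor_area_sqm, data = clean)
predicted <- predict(fit, newdata = data.frame(floor_area_sqm = 100))

# 4. Visualisation: scatter plot of the cleaned data with the fitted line.
plot(clean$floor_area_sqm, clean$resale_price,
     xlab = "Floor area (sqm)", ylab = "Resale price")
abline(fit)
```

In the project itself, dplyr and ggplot2 would likely take the place of the base R calls for cleaning and plotting.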
The main findings and insights from the analysis are documented in the Main Report directory. The report includes detailed explanations, visualisations, and conclusions drawn from the data.
I intend to bring this project over to a static page on GitHub Pages within the next few months.