 
A hands-on project demonstrating regression and classification with multiple machine learning models.
ML Models Used:
- Regression: Lasso Regression, Random Forest, XGBoost.
- Classification: K-Nearest Neighbours, Support Vector Machine, Neural Network.
All models are implemented in Python.
This project demonstrates how to explore, implement, optimise, and evaluate different machine learning models for regression and classification tasks on real-world datasets. It is designed to provide hands-on experience for beginner-to-intermediate ML developers seeking to understand predictive modelling, data preprocessing, and model evaluation.
The project uses two datasets:
- Housing_Dataset_Regression.csv – A shorter, messier version of the California Housing Prices dataset. It includes 9 features and the target variable, median house value, which is to be predicted.
- Titanic_Dataset_Classification.csv – A dirtier version of the Titanic dataset. It includes 10 columns, with Survived as the target variable. This dataset is used for binary classification.
There are no restrictions on the choice of models, allowing exploration of multiple regression and classification algorithms. The project emphasises:
- Practice in data cleaning and preprocessing.
- Hands-on implementation of multiple ML algorithms.
- Understanding how these algorithms function.
- Recognition of their strengths and weaknesses.
- Comparison of model performance using relevant metrics.
- Application to two real-world inspired datasets.
- Clean, modular Python code suitable for learning and adaptation.
- Data Preprocessing – handled missing values, encoded categorical variables, normalised numerical features, and engineered new ones.
- Model Building – trained three models for regression (Lasso Regressor, Random Forest, XGBoost) and three for classification (K-Nearest Neighbours, Support Vector Machine, Neural Network).
- Hyperparameter Tuning – optimised models using cross-validation and grid search / Bayesian optimisation methods.
- Evaluation – compared models using MAE, MSE, RMSE, and R² for regression; accuracy, precision, recall, and F1-score for classification.
- Visualisation – plotted performance metrics and feature importance for model interpretability.
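The workflow listed above can be sketched end to end with scikit-learn. This is a minimal illustration and not the project's actual code: the synthetic data, the column names (`rooms`, `income`, `region`, `price`), and the `alpha` grid are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (hypothetical columns, NOT the project's housing dataset).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "rooms": rng.normal(5.0, 1.0, n),
    "income": rng.normal(4.0, 1.5, n),
    "region": rng.choice(["coast", "inland"], n),
})
df["price"] = 50_000 * df["income"] + 10_000 * df["rooms"] + rng.normal(0, 5_000, n)
df.loc[df.index[::15], "rooms"] = np.nan  # inject missing values to clean up

X, y = df.drop(columns="price"), df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Preprocessing: impute and scale numeric columns, impute and one-hot encode categoricals.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["rooms", "income"]),
    ("cat", categorical, ["region"]),
])

# Model building and hyperparameter tuning: grid search with 5-fold cross-validation.
model = Pipeline([("preprocess", preprocess), ("lasso", Lasso(max_iter=10_000))])
search = GridSearchCV(
    model,
    param_grid={"lasso__alpha": [0.1, 1.0, 10.0]},
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

# Evaluation on the held-out test split.
pred = search.predict(X_test)
mse = mean_squared_error(y_test, pred)
metrics = {
    "MAE": mean_absolute_error(y_test, pred),
    "MSE": mse,
    "RMSE": mse ** 0.5,
    "R2": r2_score(y_test, pred),
}
print(search.best_params_, metrics)
```

Keeping the preprocessing inside the `Pipeline` means the imputer and scaler are refitted on each training fold during cross-validation, which avoids leaking test-fold statistics into the tuning step.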
For a more detailed discussion of methodology, results, and analysis, please refer to the full report.
| Model | MAE | MSE | RMSE | R² | 
|---|---|---|---|---|
| Lasso Regressor | 53764.9531 | 5022930825.4830 | 70872.6381 | 0.6958 | 
| Random Forest | 52980.4200 | 5076905004.3424 | 71252.4035 | 0.6925 | 
| XGBoost | 52240.2344 | 4932249600.0000 | 70229.9765 | 0.7012 | 
Table 1. Performance of the regression models on the original dataset.
| Model | MAE | MSE | RMSE | R² | 
|---|---|---|---|---|
| Lasso Regressor | 53764.9531 | 5022930825.4830 | 70872.6381 | 0.6958 | 
| Random Forest | 52980.4200 | 5076905004.3424 | 71252.4035 | 0.6925 | 
| XGBoost | 48990.7109 | 4534239744.0000 | 67336.7637 | 0.7254 | 
Table 2. Performance of the regression models on the cleaned dataset.
Conclusion: All three models demonstrated comparable performance, with the optimised XGBoost model achieving the lowest prediction errors.
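To make the four regression metrics concrete, the sketch below computes them by hand on a few made-up values and then checks that the RMSE figures reported in Tables 1 and 2 are the square roots of the reported MSE values. The helper name `regression_metrics` and the toy values are assumptions for illustration; only the MSE/RMSE pairs come from the tables.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R^2 from paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": 1.0 - ss_res / ss_tot}

# Toy values, invented purely for illustration.
y_true = [200_000.0, 150_000.0, 320_000.0, 90_000.0]
y_pred = [210_000.0, 140_000.0, 300_000.0, 100_000.0]
toy = regression_metrics(y_true, y_pred)
print(toy)

# (MSE, RMSE) pairs as reported in Tables 1 and 2.
reported = {
    "Lasso Regressor": (5022930825.4830, 70872.6381),
    "Random Forest": (5076905004.3424, 71252.4035),
    "XGBoost (original data)": (4932249600.0000, 70229.9765),
    "XGBoost (cleaned data)": (4534239744.0000, 67336.7637),
}
rmse_gaps = {name: abs(math.sqrt(mse) - rmse) for name, (mse, rmse) in reported.items()}
print(rmse_gaps)
```

Because RMSE is in the same units as the target, it is usually the easier figure to interpret: an RMSE of roughly 70,000 means typical errors of that order in median house value.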
| Model | Accuracy | F1 Score | Precision | Recall | ROC AUC | 
|---|---|---|---|---|---|
| KNN | 0.8071 | 0.7158 | 0.7556 | 0.68 | 0.7789 | 
| SVM | 0.7929 | 0.7010 | 0.7234 | 0.68 | 0.7678 | 
| Neural Network | 0.8214 | 0.7312 | 0.7907 | 0.68 | 0.7900 | 
Table 3. Performance of the basic models on the “Titanic” dataset.
| Model | Accuracy | F1 Score | Precision | Recall | ROC AUC | 
|---|---|---|---|---|---|
| KNN | 0.8214 | 0.7423 | 0.7659 | 0.72 | 0.7989 | 
| SVM | 0.7929 | 0.7010 | 0.7234 | 0.68 | 0.7678 | 
| Neural Network | 0.8429 | 0.7556 | 0.8500 | 0.68 | 0.8067 | 
Table 4. Performance of the fine-tuned models on the “Titanic” dataset.
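Conclusion: The Neural Network achieved the best accuracy, F1 score, precision, and ROC AUC in both the basic and fine-tuned comparisons. Recall was tied at 0.68 across the basic models, and after fine-tuning KNN reached the highest recall (0.72 against 0.68 for the Neural Network). Fine-tuning improved the KNN and Neural Network scores, while the SVM results were unchanged.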
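The F1 scores in Tables 3 and 4 can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does this; the helper name `f1_from` is an assumption for illustration, and the numbers are copied from the tables (small gaps are expected because the inputs are rounded).

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) taken from Tables 3 and 4.
reported = {
    "KNN (basic)": (0.7556, 0.68, 0.7158),
    "SVM (basic)": (0.7234, 0.68, 0.7010),
    "Neural Network (basic)": (0.7907, 0.68, 0.7312),
    "KNN (fine-tuned)": (0.7659, 0.72, 0.7423),
    "Neural Network (fine-tuned)": (0.8500, 0.68, 0.7556),
}

f1_gaps = {}
for name, (precision, recall, f1_reported) in reported.items():
    f1_computed = f1_from(precision, recall)
    f1_gaps[name] = abs(f1_computed - f1_reported)
    print(f"{name}: computed {f1_computed:.4f}, reported {f1_reported:.4f}")
```

The harmonic mean is pulled toward the smaller of the two inputs, which is why the fine-tuned Neural Network's F1 stays well below its precision of 0.85: its recall remains at 0.68.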
This project can be explored in three ways: using the Jupyter Notebook (.ipynb), running the Python script (.py), or experimenting directly in Google Colab.
- Click here.
- Colab will open the notebook and prompt you to save a copy to your own Google Drive.
- You can now edit, run, and experiment in your own copy of this project.
- Clone the repository:
git clone https://github.com/Arslan2003/Exploring_ML_Models_for_Regression_and_Classification.git 
- Navigate to the project folder:
cd Exploring_ML_Models_for_Regression_and_Classification
- Install the required packages using requirements.txt:
pip install -r requirements.txt
- Open Exploring_ML_Models_for_Regression_and_Classification.ipynb in Jupyter Notebook or JupyterLab.
- Clone the repository and install the requirements as above.
- Open Exploring_ML_Models_for_Regression_and_Classification.py in your preferred Python IDE (e.g., VS Code, PyCharm).
- Run the script:
python Exploring_ML_Models_for_Regression_and_Classification.py 
After getting access to the notebook, play around with the code!
This project is designed for beginner-to-intermediate users who want to explore and experiment with machine learning. Feel free to try out different data preprocessing steps, test various models and hyperparameters, explore evaluation metrics, and, most importantly, have fun learning!
For more advanced users, contributions such as fixing bugs, adding new models, improving documentation, or suggesting new features are very welcome. If you want to contribute formally:
- Fork the repository first.
- Create a new branch for your feature or fix.
- Ensure your code is well-documented and follows Python best practices.
- Submit a pull request describing your changes clearly.
- Arslan Ishanov – project development, model implementation, optimisation, evaluation, and documentation.
- University of Greenwich – for kindly providing the modified datasets and inspiring the development of this project.
This project is licensed under the MIT License.
- You are free to use, modify, and distribute this code, provided that you include the original copyright and license notice.
- The software is provided "as-is," without any warranty.
See the LICENSE file for full details.