Machine Learning Model Optimization and Evaluation

This repository contains two folders: classification(hc-mci) contains the code and data for classification of participants between HC and MCI, and regression(t-moca) contains the code and data for predicting the T-MoCA score of the participants.

In each folder, there are two Python scripts that perform feature selection optimization and model fitting/prediction on a dataset. Additionally, there is a Jupyter Notebook that documents and contains all the code from both scripts feature_selection.py and fit_predict.py.

Dependencies

The scripts requires the following Python libraries:

numpy
pandas
matplotlib
scikit-learn

Configuration

All scripts allow you to customize the ML models to be evaluated by modifying the estimators list. Each model is represented as a dictionary with the following keys:

model: An instance of the machine learning model
name: A unique identifier for the model

Usage Guidelines

Running the Feature Selection

# Inside ./classification(hc-mci) or ./regression(t-moca)
python feature_selection.py

Ensure you have the required dependencies installed
Ensure input CSV files are in the working directory (digimoca_results_totales.csv and datos_participantes_totales.csv)
Creates a results directory if it doesn't exist
Outputs feature importance plots and best features CSV files

Running the Model Prediction

# Inside ./classification(hc-mci) or ./regression(t-moca)
python fit_predict.py

Requires the feature selection step to be completed first
Uses the best features CSV files from the feature selection step
Generates the scatter plot (regression) or ROC curve (classification)
For each model, calculates and displays the R² and RMSE (regression) or the AUC (classification)

Feature Selection Optimization

The feature_selection.py script performs feature selection optimization on a dataset using various machine learning models.

Features

Supports a variety of machine learning models for feature selection optimization
Calculates permutation importance of each feature to determine their relative importance
Evaluates model performance (R² for regression, AUC for classification) as the number of features is increased
Identifies the optimal number of features for each model
Saves the best features for each model in separate CSV files
Generates a plot for each model, visualizing the feature importances and R²/AUC scores

Output

The script generates the following output:

CSV files containing the best features for each model, saved in the results/ directory with the naming convention best_features_<model_name>.csv.
PNG images showing the feature importance plots for each model, saved in the results/ directory with the naming convention feature_importances_<model_name>.png.

Model Fitting and Prediction

The fit_predict.py script is designed to perform model fitting and prediction on a dataset, using the best features identified in the previous feature selection optimization script.

In the regression case, it generates a scatter plot of the predicted vs. true target values for 4 selected ML models.
In the classification case, it generates a ROC curve plot for each ML model.

Features

Loads the best features for each machine learning model from the previous feature selection optimization script
Performs cross-validation predictions for each model using the best features from results/best_features_<model_name>.csv
REGRESSION
- Generates a 2x2 scatter plot of the predicted vs. true target values for all the models
- Calculates and displays the R² score and Root Mean Squared Error (RMSE) for each model
CLASSIFICATION
- Generates a ROC curve plot for all models
- Calculates and displays the Area Under the Curve (AUC)

Output

The script generates the following output:

REGRESSION: A PNG image showing the scatter plot of the predicted vs. true target values for 4 selected models, saved in results/scatter_plot_<model_name_1>_<model_name_2>_..._.png.
CLASSIFICATION: A PNG image showing the ROC curve plot for all models, saved in results/roc_<model_name_1>_<model_name_2>_..._.png.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
classification(hc-mci)		classification(hc-mci)
regression(t-moca)		regression(t-moca)
LEEME.md		LEEME.md
LICENSE		LICENSE
README.md		README.md
pipeline_flowchart.png		pipeline_flowchart.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Machine Learning Model Optimization and Evaluation

Dependencies

Configuration

Usage Guidelines

Running the Feature Selection

Running the Model Prediction

Feature Selection Optimization

Features

Output

Model Fitting and Prediction

Features

Output

About

Releases

Packages

Languages

License

moisesmoalde/ml-analysis

Folders and files

Latest commit

History

Repository files navigation

Machine Learning Model Optimization and Evaluation

Dependencies

Configuration

Usage Guidelines

Running the Feature Selection

Running the Model Prediction

Feature Selection Optimization

Features

Output

Model Fitting and Prediction

Features

Output

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages