
Repository for a Master’s Thesis Project on profiling Emergency Department (ED) patients using Machine Learning techniques (clustering and PCA).



lasigeBioTM/EmergencyPatientsClustering


ED Patient Clustering with Machine Learning

This repository contains a Machine Learning workflow designed to identify the profiles of patients who attend Portuguese Emergency Departments (EDs), particularly patients whose visits are classified as non-urgent. The goal is to improve healthcare planning by supporting Primary Health Care (PHC) and ED resource management.

Student: Mafalda Moreira

Supervisor: Francisco Couto

Co-supervisor: Patrícia Moura Rosa

About the Project

This project is part of a Master’s thesis focused on improving the coordination between PHC and EDs using data-driven methods. It includes:

  • Data preprocessing and filtering
  • Clustering analysis with K-Means
  • Evaluation using the Silhouette score, the Davies-Bouldin index, and the Calinski-Harabasz index
  • PCA for cluster visualization
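The clustering, evaluation, and PCA steps listed above can be sketched as follows. This is a minimal illustration, not the code in clustering_ED_patients.py: it uses synthetic data from scikit-learn's make_blobs in place of the confidential patient features, assumes the features are standardized with StandardScaler before clustering, and picks k by the highest Silhouette score. The k range (2 to 7), random_state, and n_init values are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler


def evaluate_k_range(X, k_values, random_state=42):
    """Fit K-Means for each k and return the three internal validation scores."""
    results = []
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        results.append({
            "k": k,
            "silhouette": silhouette_score(X, labels),                # higher is better, in [-1, 1]
            "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better, >= 0
            "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
        })
    return results


# Synthetic stand-in for the confidential patient feature matrix (assumption).
X_raw, _ = make_blobs(n_samples=500, centers=4, n_features=6, random_state=0)
X = StandardScaler().fit_transform(X_raw)

scores = evaluate_k_range(X, range(2, 8))
# Illustrative selection rule: keep the k with the best Silhouette score.
best = max(scores, key=lambda r: r["silhouette"])
print(f"Best k by silhouette: {best['k']}")

final_labels = KMeans(n_clusters=best["k"], n_init=10, random_state=42).fit_predict(X)

# Project the standardized features onto the first two principal components
# so the clusters can be drawn in a 2-D scatter plot.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print("Explained variance ratio:", pca.explained_variance_ratio_.round(3))
```

Silhouette and Calinski-Harabasz favor higher values while Davies-Bouldin favors lower ones, so the three metrics can disagree; comparing all of them across k is safer than relying on one. For the plot, X_2d can be passed to seaborn.scatterplot(x=X_2d[:, 0], y=X_2d[:, 1], hue=final_labels).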

Files

  • clustering_ED_patients.py: Main script for data loading, clustering, visualization, and exporting.
  • fact_table.csv: Information on each ED episode.
  • dim_table.csv: Healthcare activity recorded across primary care settings, including aggregated clinical indicators that support statistical analysis of service utilization patterns.

Note: The CSV files are not included in the repository due to data confidentiality. You must prepare your own CSV files with the structure and variable names described in the script comments.
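A minimal sketch of loading and joining the two CSV files with pandas. Everything about the table layout here is hypothetical: the shared key `unit_id` and the columns `episode_id`, `age`, `sex`, `triage_color`, `phc_consultations_per_1000`, and `phc_coverage_pct` are placeholders, and the real variable names are the ones documented in the script comments. The sketch also assumes, for illustration only, that non-urgent episodes are those triaged green or blue under the Manchester Triage System. The toy rows are invented placeholders, not study data.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for fact_table.csv and dim_table.csv.
FACT_CSV = """episode_id,unit_id,age,sex,triage_color
1,A,34,F,green
2,A,71,M,yellow
3,B,25,F,blue
4,B,58,M,green
"""

DIM_CSV = """unit_id,phc_consultations_per_1000,phc_coverage_pct
A,812.5,91.2
B,640.0,78.4
"""

# Assumption: triage colors treated as non-urgent.
NON_URGENT = {"green", "blue"}


def load_and_merge(fact_src, dim_src, key="unit_id"):
    """Read both tables and attach the dimension columns to each ED episode."""
    fact = pd.read_csv(fact_src)
    dim = pd.read_csv(dim_src)
    # validate="many_to_one" raises a MergeError if the key repeats in dim.
    return fact.merge(dim, on=key, how="left", validate="many_to_one")


merged = load_and_merge(io.StringIO(FACT_CSV), io.StringIO(DIM_CSV))
non_urgent = merged[merged["triage_color"].isin(NON_URGENT)].reset_index(drop=True)
print(non_urgent)
```

Passing file paths such as "fact_table.csv" and "dim_table.csv" instead of the StringIO objects reads real files; the left join keeps every episode even if its key has no match in the dimension table, leaving those indicator columns as NaN.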

Requirements

  • Python: 3.13.3
  • pandas: 2.2.3
  • scikit-learn: 1.6.1
  • matplotlib: 3.10.1
  • seaborn: 0.13.2
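The versions above are the ones listed for the project; other versions may also work, but that is untested here. A small sketch that compares the installed package versions against this list using the standard-library importlib.metadata module (the helper name check_versions is hypothetical):

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the Requirements section above.
REQUIRED = {
    "pandas": "2.2.3",
    "scikit-learn": "1.6.1",
    "matplotlib": "3.10.1",
    "seaborn": "0.13.2",
}


def check_versions(required=REQUIRED):
    """Return {distribution: (expected, installed or None)}."""
    report = {}
    for dist, expected in required.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            installed = None  # package is not installed
        report[dist] = (expected, installed)
    return report


if __name__ == "__main__":
    print(f"python: {sys.version.split()[0]} (listed: 3.13.3)")
    for dist, (expected, installed) in check_versions().items():
        status = "OK" if installed == expected else "DIFFERS"
        print(f"{dist}: expected {expected}, installed {installed} [{status}]")
```

Note that importlib.metadata looks packages up by distribution name, which is why the key is "scikit-learn" rather than the import name "sklearn".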

Data Availability & Confidentiality

Due to the sensitive nature of the healthcare data used in this study, the dataset cannot be made publicly available. The data may be used strictly for statistical purposes within the scope of public health research, monitoring, and strategic planning, in full compliance with the General Data Protection Regulation (GDPR) and all other applicable legal and ethical standards. Technical identifiers were omitted to safeguard confidentiality and prevent re-identification.
