This project implements an unsupervised clustering pipeline to identify probable members of the open star cluster M67 (NGC 2682) using data from Gaia DR3. The method follows the pyUPMASK algorithm (Pera et al. 2021), which combines clustering, spatial statistics, and probabilistic modeling.
- Gaia DR3 Data Extraction
  - Queried a 1° radius around M67 using ADQL via `astroquery.gaia`.
  - Filtered based on astrometric quality (e.g., `ruwe`, `visibility_periods_used`).
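The extraction step could look roughly like the sketch below. The M67 coordinates are approximate, and the quality thresholds (`ruwe < 1.4`, `visibility_periods_used > 8`) and the helper names `build_cone_query` and `fetch_m67_field` are illustrative assumptions, not values taken from this project.

```python
# Approximate ICRS centre of M67 in degrees (assumed values, not from the project).
M67_RA_DEG = 132.825
M67_DEC_DEG = 11.8


def build_cone_query(ra_deg, dec_deg, radius_deg, max_ruwe=1.4, min_visibility_periods=8):
    """Return an ADQL cone search on gaiadr3.gaia_source with basic quality cuts.

    The default thresholds are commonly used values, assumed here for illustration.
    """
    return f"""
    SELECT source_id, ra, dec, pmra, pmdec, parallax, bp_rp, phot_g_mean_mag,
           ruwe, visibility_periods_used
    FROM gaiadr3.gaia_source
    WHERE 1 = CONTAINS(POINT('ICRS', ra, dec),
                       CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))
      AND ruwe < {max_ruwe}
      AND visibility_periods_used > {min_visibility_periods}
    """


def fetch_m67_field(radius_deg=1.0):
    """Run the query against the Gaia archive (needs network access and astroquery)."""
    from astroquery.gaia import Gaia  # imported lazily so the query can be built offline

    job = Gaia.launch_job_async(build_cone_query(M67_RA_DEG, M67_DEC_DEG, radius_deg))
    return job.get_results().to_pandas()


# Building the query string does not touch the network.
query = build_cone_query(M67_RA_DEG, M67_DEC_DEG, 1.0)
```

Calling `fetch_m67_field()` would return the filtered field as a pandas DataFrame; applying the quality cuts in ADQL rather than afterwards keeps the download small.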
- Feature Selection
  - Used: `pmra`, `pmdec`, `parallax`, `bp_rp`, `phot_g_mean_mag`
- Clustering
  - Applied KMeans clustering on the standardized feature space.
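As a minimal sketch of the feature selection and clustering steps, assuming scikit-learn and a pandas DataFrame with the Gaia columns: the number of clusters, the `cluster_features` helper, and the synthetic demo table are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

FEATURES = ["pmra", "pmdec", "parallax", "bp_rp", "phot_g_mean_mag"]


def cluster_features(df, n_clusters=10, random_state=0):
    """Standardize the selected features and return one KMeans label per star."""
    X = df[FEATURES].dropna()  # KMeans cannot handle missing values
    X_std = StandardScaler().fit_transform(X)  # zero mean, unit variance per column
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = model.fit_predict(X_std)
    return pd.Series(labels, index=X.index, name="kmeans_label")


# Synthetic stand-in for the Gaia table, used only to show the call.
rng = np.random.default_rng(42)
demo = pd.DataFrame(rng.normal(size=(500, len(FEATURES))), columns=FEATURES)
labels = cluster_features(demo, n_clusters=5)
```

Standardizing first matters because the features have different units (mas/yr, mas, mag); without it, the column with the largest numeric spread would dominate the Euclidean distances that KMeans uses.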
- Random Field Rejection (RFR)
  - Used Ripley's K function to reject spatially uniform (field-like) clusters.
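The idea behind the rejection test can be illustrated with a naive NumPy estimate of Ripley's K. This is a simplified sketch without edge correction, and it is not the exact test used by pyUPMASK; the function names and the synthetic point sets are hypothetical. For a uniform field, $K(r) \approx \pi r^2$, so $L(r) = \sqrt{K(r)/\pi}$ stays close to $r$, while a clumped distribution pushes $L(r)$ well above $r$.

```python
import numpy as np


def ripley_k(xy, radii, area):
    """Naive Ripley's K estimate (no edge correction) for points in a region of given area."""
    n = len(xy)
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_dist = dist[np.triu_indices(n, k=1)]  # each unordered pair once
    counts = np.array([(pair_dist <= r).sum() for r in radii])
    # Factor 2 converts unordered pair counts to the ordered-pair sum in the estimator.
    return area * 2.0 * counts / (n * (n - 1))


def clustering_statistic(xy, radii, area):
    """Max deviation of L(r) = sqrt(K/pi) from r; small for a uniform field."""
    l_values = np.sqrt(ripley_k(xy, radii, area) / np.pi)
    return np.max(np.abs(l_values - radii))


# Synthetic comparison on the unit square: a uniform field versus a tight clump.
rng = np.random.default_rng(0)
radii = np.linspace(0.02, 0.2, 10)
uniform_xy = rng.uniform(0.0, 1.0, size=(300, 2))
clumped_xy = np.clip(rng.normal(0.5, 0.05, size=(300, 2)), 0.0, 1.0)

stat_uniform = clustering_statistic(uniform_xy, radii, area=1.0)
stat_clumped = clustering_statistic(clumped_xy, radii, area=1.0)
```

A KMeans cluster whose statistic is no larger than what uniform random samples of the same size produce would be treated as field-like and rejected; the critical value for that comparison is a choice this sketch leaves open.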
- Gaussian + Uniform Mixture Model (GUMM)
  - Modeled the (RA, Dec) distribution with a 2D GMM to compute cluster membership probabilities.
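One way to realize a Gaussian plus uniform mixture is a short EM loop, sketched below in NumPy. It assumes a single 2D Gaussian for the cluster and a uniform background over the bounding box of the data, and it ignores the cos(Dec) correction on RA; the function name `gumm_probabilities` and the synthetic data are hypothetical and may differ from what the project actually does.

```python
import numpy as np


def gumm_probabilities(xy, n_iter=200):
    """EM fit of one 2D Gaussian plus a uniform background; returns P(Gaussian | point)."""
    n = len(xy)
    span = xy.max(axis=0) - xy.min(axis=0)
    uniform_pdf = 1.0 / np.prod(span)  # constant density over the bounding box
    mean = np.median(xy, axis=0)
    cov = np.cov(xy, rowvar=False)
    weight = 0.5  # initial fraction of points assigned to the Gaussian

    def responsibilities():
        diff = xy - mean
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        gauss_pdf = np.exp(-0.5 * maha) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
        num = weight * gauss_pdf
        return num / (num + (1.0 - weight) * uniform_pdf)

    for _ in range(n_iter):
        resp = responsibilities()  # E-step
        total = resp.sum()
        weight = total / n  # M-step: mixing fraction, mean, covariance
        mean = (resp[:, None] * xy).sum(axis=0) / total
        diff = xy - mean
        cov = np.einsum("i,ij,ik->jk", resp, diff, diff) / total + 1e-9 * np.eye(2)
    return responsibilities()


# Synthetic field: 300 clustered points followed by 700 uniform background points.
rng = np.random.default_rng(1)
cluster_xy = rng.normal(0.0, 0.05, size=(300, 2))
field_xy = rng.uniform(-1.0, 1.0, size=(700, 2))
xy = np.vstack([cluster_xy, field_xy])
gumm_prob = gumm_probabilities(xy)
```

The uniform component absorbs the field stars, so the Gaussian is free to shrink onto the overdensity instead of being stretched across the whole field.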
- Kernel Density Estimation (KDE)
  - Smoothed the final membership probabilities based on local density.
  - Final DataFrame includes:
    - `gumm_prob`: membership probability from the Gaussian mixture model
    - `kde_prob`: smoothed membership probability
  - Stars with `kde_prob > 0.8` are considered probable cluster members (this threshold can be varied).
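The smoothing and the final cut could be sketched as follows, assuming SciPy's `gaussian_kde` with per-point weights. Here the smoothed probability is taken as the ratio of a member-weighted density to the total density, which is one plausible reading of "based on local density" rather than a confirmed description of this project's code; `kde_membership` and the synthetic inputs are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde


def kde_membership(xy, prob):
    """Smooth probabilities by comparing member-weighted and field-weighted densities."""
    prob = np.clip(prob, 1e-6, 1.0 - 1e-6)  # keep both weight sets strictly positive
    member_kde = gaussian_kde(xy.T, weights=prob)
    field_kde = gaussian_kde(xy.T, weights=1.0 - prob)
    # gaussian_kde normalizes weights, so rescale by the total weight of each class.
    member_density = prob.sum() * member_kde(xy.T)
    field_density = (1.0 - prob).sum() * field_kde(xy.T)
    return member_density / (member_density + field_density)


# Synthetic example: 300 clustered points, 700 field points, rough input probabilities.
rng = np.random.default_rng(2)
xy = np.vstack([rng.normal(0.0, 0.05, size=(300, 2)),
                rng.uniform(-1.0, 1.0, size=(700, 2))])
gumm_prob = np.concatenate([np.full(300, 0.9), np.full(700, 0.1)])

df = pd.DataFrame({"x": xy[:, 0], "y": xy[:, 1], "gumm_prob": gumm_prob})
df["kde_prob"] = kde_membership(xy, gumm_prob)

THRESHOLD = 0.8  # the cut quoted in this README; can be varied
members = df[df["kde_prob"] > THRESHOLD]
```

Lowering `THRESHOLD` trades purity for completeness: more true members are recovered at the cost of more field contamination.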