Space object classifier with data analysis and visualisation.
File 'star_classification.csv' contains 100 000 observations of space objects. Each observation is described by 18 columns, the 14th of which is the class column that defines whether the observation is a star, a galaxy or a quasar.
Column information:
1.'obj_ID' - Object Identifier, the unique value that identifies the object in the image catalog used by the CAS
2.'alpha' - Right Ascension angle (at J2000 epoch)
3.'delta' - Declination angle (at J2000 epoch)
4.'u' - Ultraviolet filter in the photometric system
5.'g' - Green filter in the photometric system
6.'r' - Red filter in the photometric system
7.'i' - Near Infrared filter in the photometric system
8.'z' - Infrared filter in the photometric system
9.'run_ID' - Run Number used to identify the specific scan
10.'rerun_ID' - Rerun Number to specify how the image was processed
11.'cam_col' - Camera column to identify the scanline within the run
12.'field_ID' - Field number to identify each field
13.'spec_obj_ID' - Unique ID used for optical spectroscopic objects (this means that 2 different observations with the same spec_obj_ID must share the output class)
14.'class' - object class (galaxy, star or quasar object)
15.'redshift' - redshift value based on the increase in wavelength
16.'plate' - plate ID, identifies each plate in SDSS
17.'MJD' - Modified Julian Date, used to indicate when a given piece of SDSS data was taken
18.'fiber_ID' - fiber ID that identifies the fiber that pointed the light at the focal plane in each observation
File 'RawData.py' contains a short analysis giving a first overview of the data: distributions, counts and summary statistics.
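A minimal sketch of this kind of first-pass inspection with pandas (the exact calls in 'RawData.py' may differ):

    import pandas as pd

    # Load the SDSS observations (path assumed to match the repository layout)
    df = pd.read_csv("star_classification.csv")

    # Shape and column dtypes
    print(df.shape)            # expected: (100000, 18)
    df.info()

    # Summary statistics for the numeric columns
    print(df.describe())

    # Class distribution (labels assumed to be GALAXY / STAR / QSO)
    print(df["class"].value_counts())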
File 'AnalysisEDAData.py' contains a deeper analysis exploring correlations and patterns:
- visualisation of every class on the sky using the 'alpha' and 'delta' features
- correlations: Pearson's correlation (e.g. for the quasar class) and Spearman's correlation for the 'star' class
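A short sketch of these two steps, assuming matplotlib for the sky plot and class labels 'QSO' and 'STAR' (the actual plotting and feature selection in 'AnalysisEDAData.py' may differ):

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("star_classification.csv")

    # Sky positions of each class: right ascension ('alpha') vs declination ('delta')
    for cls, group in df.groupby("class"):
        plt.scatter(group["alpha"], group["delta"], s=1, label=cls)
    plt.xlabel("alpha (right ascension)")
    plt.ylabel("delta (declination)")
    plt.legend()
    plt.show()

    # Correlation matrices restricted to one class at a time
    features = ["u", "g", "r", "i", "z", "redshift"]
    print(df[df["class"] == "QSO"][features].corr(method="pearson"))
    print(df[df["class"] == "STAR"][features].corr(method="spearman"))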
File 'ML_model.py' contains a machine learning model that classifies an observation with 98.5% accuracy (a sketch of the pipeline is shown below the list):
- based on the information from the analysis, some columns could be dropped before building the model: #df.drop(['obj_ID', 'delta', 'alpha', 'run_ID', 'rerun_ID', 'cam_col', 'field_ID', 'spec_obj_ID', 'fiber_ID']...
- oversampling was used to give every class an equal number of observations
- a Random Forest Classifier was used to train the model and build an accurate classifier
- test results were produced with a confusion matrix and cross-validation
All test results are located in file 'run.txt'.
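A minimal sketch of such a pipeline with scikit-learn; the hyperparameters, oversampling method and evaluation details are assumptions, and the actual code in 'ML_model.py' may differ:

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.metrics import accuracy_score, confusion_matrix
    from sklearn.utils import resample

    df = pd.read_csv("star_classification.csv")

    # Drop identifier and positional columns that the analysis marked as unneeded
    df = df.drop(columns=["obj_ID", "delta", "alpha", "run_ID", "rerun_ID",
                          "cam_col", "field_ID", "spec_obj_ID", "fiber_ID"])

    # Simple oversampling: resample every class up to the size of the largest class
    largest = df["class"].value_counts().max()
    balanced = pd.concat([
        resample(group, replace=True, n_samples=largest, random_state=42)
        for _, group in df.groupby("class")
    ])

    X = balanced.drop(columns=["class"])
    y = balanced["class"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)

    # Random Forest Classifier
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    # Hold-out evaluation: accuracy and confusion matrix
    pred = model.predict(X_test)
    print("accuracy:", accuracy_score(y_test, pred))
    print(confusion_matrix(y_test, pred))

    # 5-fold cross-validation as an additional check
    print(cross_val_score(model, X, y, cv=5))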