Space object classifier with data analysis and visualisation.
File 'star_classification.csv' contains 100 000 observations of space objects. Each observation is described by 18 columns, the 14th of which is the class column that defines whether the observation is a star, a galaxy or a quasar.
Column information:
1.'obj_ID' - Object Identifier, the unique value that identifies the object in the image catalog used by the CAS
2.'alpha' - Right Ascension angle (at J2000 epoch)
3.'delta' - Declination angle (at J2000 epoch)
4.'u' - Ultraviolet filter in the photometric system
5.'g' - Green filter in the photometric system
6.'r' - Red filter in the photometric system
7.'i' - Near Infrared filter in the photometric system
8.'z' - Infrared filter in the photometric system
9.'run_ID' - Run Number used to identify the specific scan
10.'rerun_ID' - Rerun Number to specify how the image was processed
11.'cam_col' - Camera column to identify the scanline within the run
12.'field_ID' - Field number to identify each field
13.'spec_obj_ID' - Unique ID used for optical spectroscopic objects (this means that 2 different observations with the same spec_obj_ID must share the output class)
14.'class' - object class (galaxy, star or quasar object)
15.'redshift' - redshift value based on the increase in wavelength
16.'plate' - plate ID, identifies each plate in SDSS
17.'MJD' - Modified Julian Date, used to indicate when a given piece of SDSS data was taken
18.'fiber_ID' - fiber ID that identifies the fiber that pointed the light at the focal plane in each observation
File 'RawData.py' contains a short analysis giving a first overview of the data: distributions, counts and summary statistics.
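A minimal sketch of this kind of first-pass inspection with pandas (the exact calls in 'RawData.py' may differ):

    import pandas as pd

    # Load the SDSS observations (path assumed to match the repository layout)
    df = pd.read_csv("star_classification.csv")

    # Shape and column dtypes
    print(df.shape)            # expected: (100000, 18)
    df.info()

    # Summary statistics for the numeric columns
    print(df.describe())

    # Class distribution (labels assumed to be GALAXY / STAR / QSO)
    print(df["class"].value_counts())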
File 'AnalysisEDAData.py' contains a deeper analysis exploring correlations and patterns:
- visualisation of every class on the sky using the 'alpha' and 'delta' features
- correlations: Pearson's correlation (e.g. for the quasar class) and Spearman's correlation for the 'star' class
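A short sketch of these two steps, assuming matplotlib for the sky plot and class labels 'QSO' and 'STAR' (the actual plotting and feature selection in 'AnalysisEDAData.py' may differ):

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("star_classification.csv")

    # Sky positions of each class: right ascension ('alpha') vs declination ('delta')
    for cls, group in df.groupby("class"):
        plt.scatter(group["alpha"], group["delta"], s=1, label=cls)
    plt.xlabel("alpha (right ascension)")
    plt.ylabel("delta (declination)")
    plt.legend()
    plt.show()

    # Correlation matrices restricted to one class at a time
    features = ["u", "g", "r", "i", "z", "redshift"]
    print(df[df["class"] == "QSO"][features].corr(method="pearson"))
    print(df[df["class"] == "STAR"][features].corr(method="spearman"))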
File 'ML_model.py' contains a machine learning model that classifies an observation with 98.5% accuracy (a sketch of the pipeline is shown below the list):
- based on the information from the analysis, some columns could be dropped before building the model: #df.drop(['obj_ID', 'delta', 'alpha', 'run_ID', 'rerun_ID', 'cam_col', 'field_ID', 'spec_obj_ID', 'fiber_ID']...
- oversampling was used to give every class an equal number of observations
- a Random Forest Classifier was used to train the model and build an accurate classifier
- test results were produced with a confusion matrix and cross-validation
All test results are located in file 'run.txt'.
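A minimal sketch of such a pipeline with scikit-learn; the hyperparameters, oversampling method and evaluation details are assumptions, and the actual code in 'ML_model.py' may differ:

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.metrics import accuracy_score, confusion_matrix
    from sklearn.utils import resample

    df = pd.read_csv("star_classification.csv")

    # Drop identifier and positional columns that the analysis marked as unneeded
    df = df.drop(columns=["obj_ID", "delta", "alpha", "run_ID", "rerun_ID",
                          "cam_col", "field_ID", "spec_obj_ID", "fiber_ID"])

    # Simple oversampling: resample every class up to the size of the largest class
    largest = df["class"].value_counts().max()
    balanced = pd.concat([
        resample(group, replace=True, n_samples=largest, random_state=42)
        for _, group in df.groupby("class")
    ])

    X = balanced.drop(columns=["class"])
    y = balanced["class"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)

    # Random Forest Classifier
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    # Hold-out evaluation: accuracy and confusion matrix
    pred = model.predict(X_test)
    print("accuracy:", accuracy_score(y_test, pred))
    print(confusion_matrix(y_test, pred))

    # 5-fold cross-validation as an additional check
    print(cross_val_score(model, X, y, cv=5))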