In this project, we explore the exoplanets dataset, which contains information on all known exoplanets (planets outside our solar system) discovered by NASA's various space missions, ground-based observatories, and other sources.
The goal of this project is to analyze the exoplanets dataset.
The analysis poses the following questions:
- What types of exoplanets exist?
- What are the characteristics of exoplanets?
- What are the common detection methods for exoplanets?
- Are there correlations between different features of exoplanets?
- Which Earth-like exoplanets are closest to Earth?
The project involves the following steps:
- analyze the data;
- clean up the dataset;
- visualize the data using graphs and charts;
- analyze relationships between categorical features with contingency tables and chi-square tests;
- perform correlation analysis between numerical features;
- seek to answer the questions;
- draw conclusions based on the analysis.
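As a minimal sketch of the contingency-table and chi-square step, the example below builds a cross-tabulation of two categorical features and tests them for independence. The column names `planet_type` and `detection_method` and all values are hypothetical toy data, not taken from the dataset; `pandas.crosstab` and `scipy.stats.chi2_contingency` are assumed to be available.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data for illustration only; the column names and values are
# hypothetical and do not come from nasa_exoplanets.csv.
toy = pd.DataFrame({
    "planet_type": [
        "Gas Giant", "Gas Giant", "Super Earth", "Super Earth",
        "Gas Giant", "Super Earth", "Gas Giant", "Super Earth",
    ],
    "detection_method": [
        "Transit", "Radial Velocity", "Transit", "Transit",
        "Radial Velocity", "Transit", "Radial Velocity", "Transit",
    ],
})

# Contingency table: rows are planet types, columns are detection methods.
table = pd.crosstab(toy["planet_type"], toy["detection_method"])

# Chi-square test of independence on the observed counts.
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")
```

With counts this small the test itself is not reliable; the toy table only shows the mechanics. On the real data, a small p-value would suggest that the two features are not independent.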
This research has made use of the NASA Exoplanet Archive, which is operated by the California Institute of Technology, under contract with the National Aeronautics and Space Administration under the Exoplanet Exploration Program.
The project uses one dataset:
nasa_exoplanets.csv contains information on all known exoplanets (as of the end of 2023). The dataset includes information such as the planet's name, mass, radius, distance from Earth, orbital period, and other physical characteristics.
The dataset can be found here.
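A first look at the file might go as sketched below. Since the real column names are not listed here, the example reads a small in-memory stand-in whose header (`name`, `mass`, `radius`, `distance`, `orbital_period`) and values are made-up placeholders; the helper `load_exoplanets` is also hypothetical. In practice, the path to nasa_exoplanets.csv would be passed instead.

```python
import io

import pandas as pd


def load_exoplanets(source):
    """Read the exoplanet CSV from a file path or a file-like object."""
    return pd.read_csv(source)


# Stand-in for nasa_exoplanets.csv: the header and both rows are
# hypothetical placeholders, not real catalogue entries.
sample_csv = io.StringIO(
    "name,mass,radius,distance,orbital_period\n"
    "Planet A,1.2,1.1,40.5,365.0\n"
    "Planet B,,2.3,120.0,12.4\n"
)

df = load_exoplanets(sample_csv)

print(df.shape)         # number of rows and columns
print(df.dtypes)        # inferred type of each column
print(df.isna().sum())  # missing values per column, a first cleaning check
```

Checking shape, types, and missing values up front helps decide which columns need cleaning before any statistics are computed.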
In this section, we will employ descriptive statistics and data visualization methods to gain a deeper understanding of the data. Some of the key metrics that will be calculated include:
- Frequency distributions
- Counts
- Relationships between variables
- Correlations between numerical features
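The metrics listed above can be sketched with a few pandas calls. The example uses a tiny toy table with hypothetical column names (`detection_method`, `mass`, `radius`) and invented values, so the numbers say nothing about real exoplanets.

```python
import pandas as pd

# Toy data for illustration only; names and values are hypothetical.
toy = pd.DataFrame({
    "detection_method": ["Transit", "Transit", "Radial Velocity", "Transit", "Imaging"],
    "mass": [1.0, 2.0, 3.0, 4.0, 5.0],
    "radius": [1.0, 1.5, 2.1, 2.4, 3.0],
})

# Counts and relative frequencies of a categorical feature.
counts = toy["detection_method"].value_counts()
freq = toy["detection_method"].value_counts(normalize=True)

# Summary statistics (count, mean, std, quartiles) for numerical features.
summary = toy[["mass", "radius"]].describe()

# Pairwise Pearson correlation (the pandas default) between numerical features.
corr = toy[["mass", "radius"]].corr()

print(counts)
print(freq)
print(summary)
print(corr)
```

Pearson correlation only captures linear relationships; `corr(method="spearman")` is an alternative when the relationship is monotonic but not linear.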
