From 16d7dee576614f2e48cc06f176a89a21a5c2121c Mon Sep 17 00:00:00 2001 From: VARUNSHIYAM <138989960+Varunshiyam@users.noreply.github.com> Date: Tue, 29 Oct 2024 14:32:32 +0530 Subject: [PATCH] Fixes Issue #94 Exoplanet Detection Added up my Project file as ipynb --- .../Exoplanet_Detection.ipynb | 1384 +++++++++++++++++ 1 file changed, 1384 insertions(+) create mode 100644 Deep_Learning_Models/Exoplanet_Detection/Exoplanet_Detection.ipynb diff --git a/Deep_Learning_Models/Exoplanet_Detection/Exoplanet_Detection.ipynb b/Deep_Learning_Models/Exoplanet_Detection/Exoplanet_Detection.ipynb new file mode 100644 index 0000000..183c382 --- /dev/null +++ b/Deep_Learning_Models/Exoplanet_Detection/Exoplanet_Detection.ipynb @@ -0,0 +1,1384 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 48, + "metadata": { + "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19", + "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5", + "execution": { + "iopub.execute_input": "2022-06-04T13:50:21.971488Z", + "iopub.status.busy": "2022-06-04T13:50:21.970587Z", + "iopub.status.idle": "2022-06-04T13:50:21.982584Z", + "shell.execute_reply": "2022-06-04T13:50:21.981531Z", + "shell.execute_reply.started": "2022-06-04T13:50:21.971443Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/kaggle/input/exoplanetsflux/exoTest.csv\n", + "/kaggle/input/exoplanetsflux/exoTrain.csv\n" + ] + } + ], + "source": [ + "# This Python 3 environment comes with many helpful analytics libraries installed\n", + "# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python\n", + "# For example, here's several helpful packages to load\n", + "\n", + "#import numpy as np # linear algebra\n", + "#import pandas as pd # data processing, CSV file I/O (e.g. 
pd.read_csv)\n", + "\n", + "# Input data files are available in the read-only \"../input/\" directory\n", + "# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory\n", + "\n", + "import os\n", + "for dirname, _, filenames in os.walk('/kaggle/input'):\n", + "    for filename in filenames:\n", + "        print(os.path.join(dirname, filename))\n", + "\n", + "# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using \"Save & Run All\" \n", + "# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# **Transit Exoplanet Detection with Deep Learning Models**\n", + "\n", + "Scientists use data collected by space telescopes to find new information that allows us to learn more about the universe. The NASA Kepler Space Telescope has been collecting light from thousands of stars for many years to detect the presence of exoplanets.\n", + "\n", + "![ Transit Method ](https://www.science-et-vie.com/wp-content/uploads/scienceetvie/2022/04/transit-planetaire-750x319.jpg)\n", + "\n", + "An exoplanet is a planet that orbits a star other than the Sun. These systems are hundreds or thousands of light years away from Earth, so it is essential to have tools that help scientists judge whether a given star is likely to host exoplanets. The volume of data collected by space telescopes is huge, and new artificial intelligence techniques enable advanced data analysis and powerful predictive models.\n", + "\n", + "In this project we use an exoplanet dataset derived from the Mikulski Archive, a large archive of astronomical data, to classify the light curves of stars and check for the presence of exoplanets.
First, we apply several feature engineering techniques to the dataset, and then we present a Convolutional Neural Network (CNN), a strong deep learning model for time series classification (TSC). Since brightness measurement is standard in this application, this predictive model can be useful for future work with other, larger datasets." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# **About Dataset**\n", + "\n", + "The data describe the change in flux (light intensity) of several thousand stars. Each star has a binary label of 2 or 1. A label of 2 indicates that the star is confirmed to have at least one exoplanet in orbit; some observations are in fact multi-planet systems. A label of 1 marks a non-exoplanet star.\n", + "\n", + "As you can imagine, planets themselves do not emit light, but the stars that they orbit do. If such a star is watched over several months or years, there may be a regular \"dimming\" of the flux (the light intensity). This is evidence that there may be an orbiting body around the star; such a star could be considered a \"candidate\" system. Further study of the candidate system, for example by a satellite that captures light at a different wavelength, could solidify the belief that the candidate can in fact be \"confirmed\".\n", + "\n", + "# **Description**\n", + "\n", + "TrainSet:\n", + "* 5087 rows or observations\n", + "* 3198 columns (1 label column and 3197 flux features)\n", + "* Column 1 is the label vector. Columns 2 - 3198 are the flux values over time\n", + "* 37 confirmed exoplanet-stars and 5050 non-exoplanet-stars\n", + "\n", + "TestSet:\n", + "* 570 rows or observations\n", + "* 3198 columns (1 label column and 3197 flux features)\n", + "* Column 1 is the label vector. Columns 2 - 3198 are the flux values over time\n", + "* 5 confirmed exoplanet-stars and 565 non-exoplanet-stars\n", + "\n", + "# **Acknowledgements**\n", + "\n", + "The data presented here are cleaned and are derived from observations made by the NASA Kepler Space Telescope.
The Mission is ongoing - for instance, data from Campaign 12 was released on 8th March 2017. Over 99% of this dataset originates from Campaign 3. To boost the number of exoplanet-stars in the dataset, confirmed exoplanets from other campaigns were also included.\n", + "\n", + "To be clear, all observations from Campaign 3 are included. In addition, confirmed exoplanet-stars from other campaigns are also included.\n", + "\n", + "The datasets were prepared in late summer 2016.\n", + "\n", + "Campaign 3 was used because \"it was felt\" that this Campaign is unlikely to contain any undiscovered (i.e. wrongly labelled) exoplanets.\n", + "\n", + "NASA open-sources the original Kepler Mission data, which is hosted at the Mikulski Archive. After the data are beamed down to Earth, NASA applies de-noising algorithms to remove artefacts generated by the telescope. The data are stored online, and anyone with an internet connection can embark on a search to find and retrieve the datafiles from the Archive.\n", + "\n", + "The Transit Method descriptive image is copyright © 2021 by [Science & Vie](https://www.science-et-vie.com)" + ] + }, + { + "cell_type": "code", + "execution_count": 76, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T14:42:19.567963Z", + "iopub.status.busy": "2022-06-04T14:42:19.567585Z", + "iopub.status.idle": "2022-06-04T14:42:19.576724Z", + "shell.execute_reply": "2022-06-04T14:42:19.575967Z", + "shell.execute_reply.started": "2022-06-04T14:42:19.567933Z" + } + }, + "outputs": [], + "source": [ + "# Import data analysis packages\n", + "import numpy as np # linear algebra\n", + "import pandas as pd # data processing, CSV file I/O (e.g. 
pd.read_csv)\n", + "import matplotlib.pyplot as plt # data visualization\n", + "from scipy import signal # signal processing\n", + "from scipy.ndimage import gaussian_filter # scipy.ndimage.filters is a deprecated alias\n", + "from scipy.fftpack import fft\n", + "import scipy.fft # binds scipy and loads the scipy.fft submodule\n", + "import seaborn as sns # data visualization\n", + "# import models as m # model creation package\n", + "\n", + "# Import Machine Learning and Deep Learning packages\n", + "#import sklearn.linear_model as lm\n", + "import tensorflow as tf # needed by the version check in the next cell\n", + "#import sklearn.svm as svm\n", + "\n", + "#from tensorflow.keras import models\n", + "#from tensorflow.keras import layers\n", + "#from tensorflow.keras.preprocessing import sequence\n", + "\n", + "# Model evaluation methods\n", + "import sklearn.preprocessing as pproc\n", + "from sklearn.model_selection import train_test_split\n", + "#from sklearn.metrics import accuracy_score\n", + "#from sklearn.metrics import confusion_matrix, classification_report\n", + "from sklearn.metrics import plot_confusion_matrix\n", + "from sklearn.preprocessing import normalize\n", + "\n", + "from imblearn.over_sampling import RandomOverSampler" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T13:50:36.015223Z", + "iopub.status.busy": "2022-06-04T13:50:36.014661Z", + "iopub.status.idle": "2022-06-04T13:50:36.019548Z", + "shell.execute_reply": "2022-06-04T13:50:36.018671Z", + "shell.execute_reply.started": "2022-06-04T13:50:36.015190Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2.6.4\n" + ] + } + ], + "source": [ + "# Display TensorFlow version\n", + "print(tf.__version__)" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T13:50:41.858434Z", + "iopub.status.busy": "2022-06-04T13:50:41.857997Z", + "iopub.status.idle": "2022-06-04T13:50:48.350833Z", + "shell.execute_reply": "2022-06-04T13:50:48.349979Z", + 
"shell.execute_reply.started": "2022-06-04T13:50:41.858397Z" + } + }, + "outputs": [], + "source": [ + "# Import the dataset\n", + "data_train = pd.read_csv('/kaggle/input/exoplanetsflux/exoTrain.csv') # TrainSet\n", + "data_test = pd.read_csv('/kaggle/input/exoplanetsflux/exoTest.csv')" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T13:51:13.237512Z", + "iopub.status.busy": "2022-06-04T13:51:13.236761Z", + "iopub.status.idle": "2022-06-04T13:51:13.276734Z", + "shell.execute_reply": "2022-06-04T13:51:13.275987Z", + "shell.execute_reply.started": "2022-06-04T13:51:13.237474Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " LABEL FLUX.1 FLUX.2 FLUX.3 FLUX.4 FLUX.5 FLUX.6 FLUX.7 \\\n", + "0 2 93.85 83.81 20.10 -26.98 -39.56 -124.71 -135.18 \n", + "1 2 -38.88 -33.83 -58.54 -40.09 -79.31 -72.81 -86.55 \n", + "2 2 532.64 535.92 513.73 496.92 456.45 466.00 464.50 \n", + "3 2 326.52 347.39 302.35 298.13 317.74 312.70 322.33 \n", + "4 2 -1107.21 -1112.59 -1118.95 -1095.10 -1057.55 -1034.48 -998.34 \n", + "... ... ... ... ... ... ... ... ... \n", + "5077 1 125.57 78.69 98.29 91.16 78.42 45.82 61.69 \n", + "5078 1 7.45 10.02 6.87 -2.82 -1.56 -4.30 -7.01 \n", + "5079 1 475.61 395.50 423.61 376.36 338.94 321.26 326.34 \n", + "5080 1 -46.63 -55.39 -64.88 -88.75 -75.40 -64.06 -66.37 \n", + "5081 1 299.41 302.77 278.68 263.48 236.89 186.93 145.45 \n", + "\n", + " FLUX.8 FLUX.9 ... FLUX.3188 FLUX.3189 FLUX.3190 FLUX.3191 \\\n", + "0 -96.27 -79.89 ... -78.07 -102.15 -102.15 25.13 \n", + "1 -85.33 -83.97 ... -3.28 -32.21 -32.21 -24.89 \n", + "2 486.39 436.56 ... -71.69 13.31 13.31 -29.89 \n", + "3 311.31 312.42 ... 5.71 -3.73 -3.73 30.05 \n", + "4 -1022.71 -989.57 ... -594.37 -401.66 -401.66 -357.24 \n", + "... ... ... ... ... ... ... ... \n", + "5077 22.73 39.09 ... 32.35 63.23 57.98 90.43 \n", + "5078 -6.97 -2.54 ... -5.25 -8.56 0.53 -4.29 \n", + "5079 342.84 251.23 ... 543.25 453.87 344.35 266.16 \n", + "5080 -41.95 -68.07 ... 29.64 6.90 32.94 56.63 \n", + "5081 151.20 123.38 ... -126.36 -133.82 -134.02 -98.76 \n", + "\n", + " FLUX.3192 FLUX.3193 FLUX.3194 FLUX.3195 FLUX.3196 FLUX.3197 \n", + "0 48.57 92.54 39.32 61.42 5.08 -39.54 \n", + "1 -4.86 0.76 -11.70 6.46 16.00 19.93 \n", + "2 -20.88 5.06 -11.80 -28.91 -70.02 -96.67 \n", + "3 20.03 -12.67 -8.77 -17.31 -17.35 13.98 \n", + "4 -443.76 -438.54 -399.71 -384.65 -411.79 -510.54 \n", + "... ... ... ... ... ... ... 
\n", + "5077 115.12 210.09 3.80 16.33 27.35 21.30 \n", + "5078 -6.60 8.75 -10.69 -9.54 -2.48 -8.69 \n", + "5079 242.18 163.02 86.29 13.06 161.22 213.60 \n", + "5080 28.71 28.82 -20.12 -14.41 -43.35 -30.04 \n", + "5081 -106.60 -74.95 -46.29 -3.08 -28.43 -48.68 \n", + "\n", + "[5082 rows x 3198 columns]" + ] + }, + "execution_count": 53, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Data Analysis\n", + "data_train.head(-5)" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T13:51:17.802311Z", + "iopub.status.busy": "2022-06-04T13:51:17.801967Z", + "iopub.status.idle": "2022-06-04T13:51:18.041207Z", + "shell.execute_reply": "2022-06-04T13:51:18.040369Z", + "shell.execute_reply.started": "2022-06-04T13:51:17.802283Z" + } + }, + "outputs": [], + "source": [ + "# Shuffle the rows of each dataset\n", + "data_train = np.random.permutation(np.asarray(data_train))\n", + "data_test = np.random.permutation(np.asarray(data_test))" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T13:51:20.572931Z", + "iopub.status.busy": "2022-06-04T13:51:20.572393Z", + "iopub.status.idle": "2022-06-04T13:51:20.636874Z", + "shell.execute_reply": "2022-06-04T13:51:20.635991Z", + "shell.execute_reply.started": "2022-06-04T13:51:20.572890Z" + } + }, + "outputs": [], + "source": [ + "# Extract the label column (index 0), rescale labels from {1, 2} to {0, 1}, then drop it from the features\n", + "y1 = data_train[:,0]\n", + "y2 = data_test[:,0]\n", + "\n", + "y_train = (y1 - min(y1)) / (max(y1) - min(y1))\n", + "y_test = (y2 - min(y2)) / (max(y2) - min(y2))\n", + "\n", + "data_train = np.delete(data_train, 0, axis=1) # drop LABEL (column 0); index 1 would drop FLUX.1 and leak the label\n", + "data_test = np.delete(data_test, 0, axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T13:51:26.148675Z", + "iopub.status.busy": "2022-06-04T13:51:26.148293Z", + 
"iopub.status.idle": "2022-06-04T13:51:26.360230Z", + "shell.execute_reply": "2022-06-04T13:51:26.359504Z", + "shell.execute_reply.started": "2022-06-04T13:51:26.148619Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[]" + ] + }, + "execution_count": 56, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Print the light curve\n", + "time = np.arange(len(data_train[0])) * (36/60) # time in hours\n", + "\n", + "plt.figure(figsize=(20,5))\n", + "plt.title('Flux of Star 10 with confirmed planet')\n", + "plt.ylabel('Flux')\n", + "plt.xlabel('Hours')\n", + "plt.plot(time, data_train[10]) # Change the number to plot what you want" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T13:51:46.850494Z", + "iopub.status.busy": "2022-06-04T13:51:46.849671Z", + "iopub.status.idle": "2022-06-04T13:51:46.955455Z", + "shell.execute_reply": "2022-06-04T13:51:46.954600Z", + "shell.execute_reply.started": "2022-06-04T13:51:46.850458Z" + } + }, + "outputs": [], + "source": [ + "# Normalized data\n", + "data_train_norm = normalize(data_train)\n", + "data_test_norm = normalize(data_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T13:51:49.728777Z", + "iopub.status.busy": "2022-06-04T13:51:49.727835Z", + "iopub.status.idle": "2022-06-04T13:51:49.734208Z", + "shell.execute_reply": "2022-06-04T13:51:49.733318Z", + "shell.execute_reply.started": "2022-06-04T13:51:49.728729Z" + } + }, + "outputs": [], + "source": [ + "# Function to apply gaussian filter to all data\n", + "def gauss_filter(dataset, sigma):\n", + " dts = []\n", + " for x in range(dataset.shape[0]):\n", + " dts.append(gaussian_filter(dataset[x], sigma))\n", + " \n", + " return np.asarray(dts)" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T13:52:02.258151Z", + "iopub.status.busy": "2022-06-04T13:52:02.257426Z", + "iopub.status.idle": "2022-06-04T13:52:03.057856Z", + "shell.execute_reply": "2022-06-04T13:52:03.056885Z", + "shell.execute_reply.started": 
"2022-06-04T13:52:02.258113Z" + } + }, + "outputs": [], + "source": [ + "# Apply the gaussian filter to all rows data\n", + "data_train_gaussian = gauss_filter(data_train_norm,7.0)\n", + "data_test_gaussian = gauss_filter(data_test_norm,7.0)" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T13:56:52.431648Z", + "iopub.status.busy": "2022-06-04T13:56:52.431269Z", + "iopub.status.idle": "2022-06-04T13:56:52.635192Z", + "shell.execute_reply": "2022-06-04T13:56:52.634466Z", + "shell.execute_reply.started": "2022-06-04T13:56:52.431598Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[]" + ] + }, + "execution_count": 63, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Print the light curves smoothed\n", + "plt.figure(figsize=(20,5))\n", + "plt.title('Flux of Start 10 with confirmed planet, smoothed')\n", + "plt.ylabel('Flux')\n", + "plt.xlabel(\"Hours\")\n", + "plt.plot(time, data_train_gaussian[10])" + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T14:05:24.177695Z", + "iopub.status.busy": "2022-06-04T14:05:24.177259Z", + "iopub.status.idle": "2022-06-04T14:05:24.736718Z", + "shell.execute_reply": "2022-06-04T14:05:24.735825Z", + "shell.execute_reply.started": "2022-06-04T14:05:24.177654Z" + } + }, + "outputs": [], + "source": [ + "# Apply FFT to the data smoothed\n", + "frequency = np.arange(len(data_train[0])) * (1/(36.0*60.0))\n", + "\n", + "data_train_fft1 = scipy.fft.fft2(data_train_norm, axes=1)\n", + "data_test_fft1 = scipy.fft.fft2(data_test_norm, axes=1)\n", + "\n", + "data_train_fft = np.abs(data_train_fft1) # calculate the abs value\n", + "data_test_fft = np.abs(data_test_fft1)" + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T14:07:06.303680Z", + "iopub.status.busy": "2022-06-04T14:07:06.303282Z", + "iopub.status.idle": "2022-06-04T14:07:06.308097Z", + "shell.execute_reply": "2022-06-04T14:07:06.307007Z", + "shell.execute_reply.started": "2022-06-04T14:07:06.303642Z" + } + }, + "outputs": [], + "source": [ + "# Get the lenght of the FFT data, make something here below in order to make the sequences of the same size\n", + "# only if they have differet dimensions\n", + "len_seq = len(data_train_fft[0])" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T14:09:27.527084Z", + "iopub.status.busy": "2022-06-04T14:09:27.526704Z", + "iopub.status.idle": 
"2022-06-04T14:09:27.743746Z", + "shell.execute_reply": "2022-06-04T14:09:27.743038Z", + "shell.execute_reply.started": "2022-06-04T14:09:27.527053Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[]" + ] + }, + "execution_count": 67, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Plot the FFT of the signals\n", + "plt.figure(figsize=(20,5))\n", + "plt.title('Flux of Star 1 (with confirmed planet) in domain of frequencies')\n", + "plt.ylabel('Abs value of FFT result')\n", + "plt.xlabel('Frequency')\n", + "plt.plot(frequency, data_train_fft[1])" + ] + }, + { + "cell_type": "code", + "execution_count": 72, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T14:28:13.243479Z", + "iopub.status.busy": "2022-06-04T14:28:13.243133Z", + "iopub.status.idle": "2022-06-04T14:28:13.386604Z", + "shell.execute_reply": "2022-06-04T14:28:13.385684Z", + "shell.execute_reply.started": "2022-06-04T14:28:13.243451Z" + } + }, + "outputs": [], + "source": [ + "# Oversampling technique to the data\n", + "rm = RandomOverSampler(sampling_strategy=0.5)\n", + "data_train_ovs, y_train_ovs = rm.fit_resample(data_train_fft, y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 73, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T14:30:32.958216Z", + "iopub.status.busy": "2022-06-04T14:30:32.957371Z", + "iopub.status.idle": "2022-06-04T14:30:33.011389Z", + "shell.execute_reply": "2022-06-04T14:30:33.009905Z", + "shell.execute_reply.started": "2022-06-04T14:30:32.958177Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "After oversampling, counts of label '1': 2525\n", + "After oversampling, counts of label '0': 5050\n" + ] + } + ], + "source": [ + "# Recap dataset after ovesampling\n", + "print(\"After oversampling, counts of label '1': {}\".format(sum(y_train_ovs==1)))\n", + "print(\"After oversampling, counts of label '0': {}\".format(sum(y_train_ovs==0)))" + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T14:35:45.197387Z", + "iopub.status.busy": "2022-06-04T14:35:45.196801Z", + 
"iopub.status.idle": "2022-06-04T14:35:45.201620Z", + "shell.execute_reply": "2022-06-04T14:35:45.200752Z", + "shell.execute_reply.started": "2022-06-04T14:35:45.197350Z" + } + }, + "outputs": [], + "source": [ + "# Reshape the data for the neural network model\n", + "data_train_ovs = np.asarray(data_train_ovs)\n", + "data_test_fft = np.asarray(data_test_fft)" + ] + }, + { + "cell_type": "code", + "execution_count": 75, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T14:36:04.601441Z", + "iopub.status.busy": "2022-06-04T14:36:04.600833Z", + "iopub.status.idle": "2022-06-04T14:36:04.606586Z", + "shell.execute_reply": "2022-06-04T14:36:04.605535Z", + "shell.execute_reply.started": "2022-06-04T14:36:04.601400Z" + } + }, + "outputs": [], + "source": [ + "data_train_ovs_nn = data_train_ovs.reshape((data_train_ovs.shape[0], data_train_ovs.shape[1], 1))\n", + "data_test_fft_nn = data_test_fft.reshape((data_test_fft.shape[0], data_test_fft.shape[1], 1))" + ] + }, + { + "cell_type": "code", + "execution_count": 77, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T14:44:04.035858Z", + "iopub.status.busy": "2022-06-04T14:44:04.035175Z", + "iopub.status.idle": "2022-06-04T14:44:04.048805Z", + "shell.execute_reply": "2022-06-04T14:44:04.047923Z", + "shell.execute_reply.started": "2022-06-04T14:44:04.035817Z" + } + }, + "outputs": [], + "source": [ + "# Import all modules for models creation\n", + "import tensorflow as tf\n", + "import pydot\n", + "\n", + "from sklearn.metrics import accuracy_score\n", + "from sklearn.metrics import confusion_matrix, classification_report\n", + "from keras.utils.vis_utils import plot_model\n", + "\n", + "from tensorflow.keras import models\n", + "from tensorflow.keras import layers\n", + "from tensorflow.keras.preprocessing import sequence\n", + "\n", + "\n", + "from sklearn.model_selection import GridSearchCV\n", + "from sklearn.svm import SVC\n", + "\n", + "\n", + "#create the neural network\n", + 
"def FCN_model(len_seq):\n", + " \n", + " # len_seq = the size of the input sequences\n", + " \n", + " model = tf.keras.Sequential()\n", + " \n", + " #change the input shape if you have sequences less long\n", + " model.add(layers.Conv1D(filters=256, kernel_size=8, activation='relu', input_shape=(len_seq,1)))\n", + " model.add(layers.MaxPool1D(strides=5))\n", + " model.add(layers.BatchNormalization())\n", + " \n", + " \n", + " model.add(layers.Conv1D(filters=340, kernel_size=6, activation='relu'))\n", + " model.add(layers.MaxPool1D(strides=5))\n", + " model.add(layers.BatchNormalization())\n", + " \n", + " \n", + " model.add(layers.Conv1D(filters=256, kernel_size=4, activation='relu'))\n", + " model.add(layers.MaxPool1D(strides=5))\n", + " model.add(layers.BatchNormalization())\n", + " \n", + " \n", + " model.add(layers.Flatten())\n", + " model.add(layers.Dropout(0.3))\n", + " \n", + " \n", + " model.add(layers.Dense(24, activation='relu'))\n", + " model.add(layers.Dropout(0.3))\n", + " \n", + " model.add(layers.Dense(12, activation='relu'))\n", + " \n", + " \n", + " model.add(layers.Dense(8, activation = 'relu'))\n", + " \n", + " \n", + " model.add(layers.Dense(1, activation='sigmoid'))\n", + " \n", + " return model\n", + "\n", + "\n", + "\n", + "#create the SVC model\n", + "def SVC_model():\n", + " \n", + " tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],\n", + " 'C': [1, 10, 100, 1000]},\n", + " {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]\n", + "\n", + " \n", + " clf = GridSearchCV( SVC(), param_grid = tuned_parameters,scoring = 'recall')\n", + " \n", + " \n", + " return clf" + ] + }, + { + "cell_type": "code", + "execution_count": 78, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T14:45:30.300906Z", + "iopub.status.busy": "2022-06-04T14:45:30.300132Z", + "iopub.status.idle": "2022-06-04T14:47:56.747937Z", + "shell.execute_reply": "2022-06-04T14:47:56.746964Z", + "shell.execute_reply.started": 
"2022-06-04T14:45:30.300855Z" + } + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2022-06-04 14:45:30.395920: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2022-06-04 14:45:30.539557: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2022-06-04 14:45:30.540308: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2022-06-04 14:45:30.541502: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA\n", + "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", + "2022-06-04 14:45:30.541824: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2022-06-04 14:45:30.542534: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2022-06-04 14:45:30.543200: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2022-06-04 14:45:32.761058: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but 
there must be at least one NUMA node, so returning NUMA node zero\n", + "2022-06-04 14:45:32.762439: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2022-06-04 14:45:32.763487: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", + "2022-06-04 14:45:32.764420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 15403 MB memory: -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Model: \"sequential\"\n", + "_________________________________________________________________\n", + "Layer (type) Output Shape Param # \n", + "=================================================================\n", + "conv1d (Conv1D) (None, 3190, 256) 2304 \n", + "_________________________________________________________________\n", + "max_pooling1d (MaxPooling1D) (None, 638, 256) 0 \n", + "_________________________________________________________________\n", + "batch_normalization (BatchNo (None, 638, 256) 1024 \n", + "_________________________________________________________________\n", + "conv1d_1 (Conv1D) (None, 633, 340) 522580 \n", + "_________________________________________________________________\n", + "max_pooling1d_1 (MaxPooling1 (None, 127, 340) 0 \n", + "_________________________________________________________________\n", + "batch_normalization_1 (Batch (None, 127, 340) 1360 \n", + "_________________________________________________________________\n", + "conv1d_2 (Conv1D) (None, 124, 256) 348416 \n", + "_________________________________________________________________\n", + "max_pooling1d_2 (MaxPooling1 
(None, 25, 256) 0 \n", + "_________________________________________________________________\n", + "batch_normalization_2 (Batch (None, 25, 256) 1024 \n", + "_________________________________________________________________\n", + "flatten (Flatten) (None, 6400) 0 \n", + "_________________________________________________________________\n", + "dropout (Dropout) (None, 6400) 0 \n", + "_________________________________________________________________\n", + "dense (Dense) (None, 24) 153624 \n", + "_________________________________________________________________\n", + "dropout_1 (Dropout) (None, 24) 0 \n", + "_________________________________________________________________\n", + "dense_1 (Dense) (None, 12) 300 \n", + "_________________________________________________________________\n", + "dense_2 (Dense) (None, 8) 104 \n", + "_________________________________________________________________\n", + "dense_3 (Dense) (None, 1) 9 \n", + "=================================================================\n", + "Total params: 1,030,745\n", + "Trainable params: 1,029,041\n", + "Non-trainable params: 1,704\n", + "_________________________________________________________________\n", + "None\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2022-06-04 14:45:33.704953: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1/15\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2022-06-04 14:45:35.725775: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8005\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "758/758 [==============================] - 17s 12ms/step - loss: 0.1499 - accuracy: 0.9345 - val_loss: 0.0324 - val_accuracy: 0.9912\n", + "Epoch 2/15\n", + "758/758 [==============================] - 9s 12ms/step - 
loss: 0.0259 - accuracy: 0.9919 - val_loss: 0.0235 - val_accuracy: 0.9930\n", + "Epoch 3/15\n", + "758/758 [==============================] - 9s 12ms/step - loss: 0.0189 - accuracy: 0.9942 - val_loss: 0.0288 - val_accuracy: 0.9895\n", + "Epoch 4/15\n", + "758/758 [==============================] - 9s 12ms/step - loss: 0.0128 - accuracy: 0.9963 - val_loss: 0.0199 - val_accuracy: 0.9912\n", + "Epoch 5/15\n", + "758/758 [==============================] - 9s 11ms/step - loss: 0.0127 - accuracy: 0.9975 - val_loss: 0.0154 - val_accuracy: 0.9965\n", + "Epoch 6/15\n", + "758/758 [==============================] - 8s 11ms/step - loss: 0.0098 - accuracy: 0.9974 - val_loss: 0.0048 - val_accuracy: 0.9965\n", + "Epoch 7/15\n", + "758/758 [==============================] - 9s 12ms/step - loss: 0.0165 - accuracy: 0.9933 - val_loss: 0.0175 - val_accuracy: 0.9947\n", + "Epoch 8/15\n", + "758/758 [==============================] - 9s 11ms/step - loss: 0.0058 - accuracy: 0.9988 - val_loss: 0.0026 - val_accuracy: 1.0000\n", + "Epoch 9/15\n", + "758/758 [==============================] - 9s 11ms/step - loss: 0.0123 - accuracy: 0.9962 - val_loss: 0.0254 - val_accuracy: 0.9947\n", + "Epoch 10/15\n", + "758/758 [==============================] - 9s 11ms/step - loss: 0.0043 - accuracy: 0.9989 - val_loss: 0.0660 - val_accuracy: 0.9895\n", + "Epoch 11/15\n", + "758/758 [==============================] - 9s 12ms/step - loss: 0.0097 - accuracy: 0.9976 - val_loss: 0.0532 - val_accuracy: 0.9895\n", + "Epoch 12/15\n", + "758/758 [==============================] - 9s 11ms/step - loss: 0.0039 - accuracy: 0.9991 - val_loss: 0.0185 - val_accuracy: 0.9947\n", + "Epoch 13/15\n", + "758/758 [==============================] - 9s 11ms/step - loss: 0.0017 - accuracy: 0.9995 - val_loss: 0.0302 - val_accuracy: 0.9965\n", + "Epoch 14/15\n", + "758/758 [==============================] - 9s 12ms/step - loss: 0.0037 - accuracy: 0.9997 - val_loss: 0.0878 - val_accuracy: 0.9895\n", + "Epoch 15/15\n", + "758/758 
[==============================] - 9s 12ms/step - loss: 0.0074 - accuracy: 0.9980 - val_loss: 0.0390 - val_accuracy: 0.9965\n" + ] + } + ], + "source": [ + "# Create the FCN model and train it\n", + "model = FCN_model(len_seq)\n", + "\n", + "model.compile(loss='binary_crossentropy', optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), metrics=['accuracy'])\n", + "\n", + "print(model.summary())\n", + "\n", + "history = model.fit(data_train_ovs_nn, y_train_ovs, epochs=15, batch_size=10, validation_data=(data_test_fft_nn, y_test))" + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T15:01:01.003365Z", + "iopub.status.busy": "2022-06-04T15:01:01.002944Z", + "iopub.status.idle": "2022-06-04T15:01:01.007603Z", + "shell.execute_reply": "2022-06-04T15:01:01.006506Z", + "shell.execute_reply.started": "2022-06-04T15:01:01.003330Z" + } + }, + "outputs": [], + "source": [ + "# Save the model\n", + "# model.save(\"/kaggle/working/exoplanetflux_model_1\")\n", + "# Save using the HDF5 format\n", + "#model.save(\"/kaggle/working/exoplanetflux_model_1.h5\")\n", + "# Load the model if it already exists\n", + "#model = tf.keras.models.load_model(\"/kaggle/working/exoplanetflux_model_1.h5\")" + ] + }, + { + "cell_type": "code", + "execution_count": 84, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T15:03:08.815736Z", + "iopub.status.busy": "2022-06-04T15:03:08.815349Z", + "iopub.status.idle": "2022-06-04T15:03:09.020271Z", + "shell.execute_reply": "2022-06-04T15:03:09.019523Z", + "shell.execute_reply.started": "2022-06-04T15:03:08.815705Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Plot Accuracy\n", + "acc = history.history['accuracy']\n", + "#acc_val = history.history['val_accuracy']\n", + "epochs = range(1, len(acc)+1)\n", + "plt.plot(epochs, acc, 'b', label='accuracy_train')\n", + "#plt.plot(epochs, acc_val, 'g', label='accuracy_val')\n", + "plt.title('accuracy')\n", + "plt.xlabel('epochs')\n", + "plt.ylabel('value of accuracy')\n", + "plt.legend()\n", + "plt.grid()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 85, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T15:03:55.244560Z", + "iopub.status.busy": "2022-06-04T15:03:55.244201Z", + "iopub.status.idle": "2022-06-04T15:03:55.437292Z", + "shell.execute_reply": "2022-06-04T15:03:55.436578Z", + "shell.execute_reply.started": "2022-06-04T15:03:55.244527Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Plot Loss\n", + "loss = history.history['loss']\n", + "#loss_val = history.history['val_loss']\n", + "epochs = range(1, len(loss)+1)\n", + "plt.plot(epochs, loss, 'b', label='loss_train')\n", + "#plt.plot(epochs, loss_val, 'g', label='loss_val')\n", + "plt.title('loss')\n", + "plt.xlabel('epochs')\n", + "plt.ylabel('value of loss')\n", + "plt.legend()\n", + "plt.grid()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 86, + "metadata": { + "execution": { + "iopub.execute_input": "2022-06-04T15:05:44.197762Z", + "iopub.status.busy": "2022-06-04T15:05:44.196895Z", + "iopub.status.idle": "2022-06-04T15:05:44.768248Z", + "shell.execute_reply": "2022-06-04T15:05:44.767490Z", + "shell.execute_reply.started": "2022-06-04T15:05:44.197727Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "accuracy : 0.9964912280701754\n", + " precision recall f1-score support\n", + "\n", + " NO exoplanet confirmed 1.00 1.00 1.00 565\n", + "YES exoplanet confirmed 1.00 0.60 0.75 5\n", + "\n", + " accuracy 1.00 570\n", + " macro avg 1.00 0.80 0.87 570\n", + " weighted avg 1.00 1.00 1.00 570\n", + "\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 86, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAWAAAAD4CAYAAADSIzzWAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAT+UlEQVR4nO3deZgdVZ2H8ffX3QlJiCGA0MQkyJKwugDK4oDKJpto2AkwEDHSo8IAgwpBRh1c5oGZZ0ARdQyyRESBR8UEZHSYCCogGBQNuwRUkpAFkxAlCEk6Z/5IES+x0/d2crtPV/F+eM5zq07VrTr34T7fnD51qm6klJAk9b2W3A2QpNcqA1iSMjGAJSkTA1iSMjGAJSmTtt4+weDdz3Kahf7OkhlX5m6C+qFBbcSGHqMnmfPXB6/c4PNtiF4PYEnqU1GeP+wNYEnVElk7tT1iAEuqFnvAkpSJPWBJyqSlNXcLGmYAS6oWhyAkKROHICQpE3vAkpSJPWBJysQesCRl4iwIScrEHrAkZdLiGLAk5WEPWJIycRaEJGXiRThJysQhCEnKxCEIScrEHrAkZWIPWJIyKVEPuDwtlaRGtLQ2XuqIiD9ExEMR8ZuIeKCo2ywi7oiIJ4vXTYv6iIgrImJWRMyMiD3qNnWDP6wk9SfR0nhpzAEppd1SSm8v1icB01NKY4HpxTrA4cDYonQAX6t3YANYUrVENF7WzzhgSrE8BTiqpv6babX7gOERMaK7AxnAkqqlBz3giOiIiAdqSsdaR0vA/0bEr2q2taeU5hXL84H2YnkkMLvmvXOKunXyIpykaulBzzalNBmY3M0u+6WU5kbElsAdEfH4Wu9PEZHWr6H2gCVVTRPHgFNKc4vXhcAtwF7AgleGForXhcXuc4HRNW8fVdStkwEsqVKipaXh0u1xIjaOiNe9sgwcAjwMTAMmFLtNAKYWy9OA04rZEPsAS2uGKrrkEISkSonm3YjRDtxSHK8N+HZK6UcRMQO4OSImAn8ETij2vx04ApgFvAicXu8EBrCkamlS/qaUngbe2kX9IuCgLuoTcGZPzmEAS6qUJvaAe50BLKlSDGBJyqSlzsW1/sQAllQt5ekAG8CSqsUhCEnKxACWpEwMYEnKxACWpEyixQCWpCzsAUtSJgawJOVSnvw1gCVViz1gScrEAJakTHwWhCTlUp4OsAEsqVocgpCkTAxgScrEAJakTLwVuSIe/+HF/GXZy3SuWsXKzlXsd8p//N0+73zbWP7zE8cyoK2VRc+/wCEf+tIGnXPggDau/typ7L7z1ixeuox/vOAanpm3mAP33onPnf1+Bg5oY/mKlXzyiz/gpzN+t0HnUn73/PxnXHrJF1jVuYqjjz2eiWd05G5S6dkDrpDDOr7EoueXdbltk6GD+dInT2DcmV9l9vwlbLHp0IaPu/WIzbjqs6dy6BmvDuwPHPUOlvzlr7xp3MUcf+jb+MI54zh10rUsev4Fjjv368x7bim7bD+CW796Jtsf+q8b9NmUV2dnJ//+hc/y9auupb29nZNPPI79DziQ7ceMyd20UitTAJdnwlw/dOLhb2fq9N8ye/4SAJ5b8sKabeOP2JOfX/9x7rtxEl++aDwtDf5ZdOT+b+GGW+8H4Pv/9yD777UjAL99Yg7znlsKwKNPzWPQRgMYOMB/P8vs4YdmMnr0Gxk1ejQDBg7ksCPey113Ts/drNKLiIZLbnUDOCJ2iogLIuKKolwQETv3ReNySylx61fP4p4bzueDx+z7d9vHvnFLhg8bwo+vOod7bjifk4/cC4Adt23nuEP24IDTL2Of8ZfQuWoV44/Ys6FzvmHLTZhTBHpn5yr+/MJf2Xz4xq/a5+iDd+M3j89m+YqVG/gJldPCBQvYasRWa9a3bG9nwYIFGVtUEdGDklm3XaiIuAA4CbgR+GVRPQr4TkTcmFK6ZB3v6wA6ANpG7U/b63dtXov70EGnX86zzy1li02Hctt/n8UTf5jPPb9+as32ttY
W9th5NIf/05cZPGgAd035GL+c+QcO2GtH9thla+7+1vkADN5oAM8tXt07vum/zuCNIzdn4IBWRm+1GffdOAmAr3z7Lq6fdl/dNu283VZ8/uxxHPnRr/TCJ5bKrz/0bBtV72/YicCuKaUVtZURcRnwCNBlAKeUJgOTAQbvflZqQjuzeLb4k/+5JS8w7Scz2XPXbV4VwHMXPs+ipct48aXlvPjScu7+9SzessNIIoJv3Xo/n/7ytL875okfuwpY9xjwswuXMmqrTZm78HlaW1sYNnTwmjHokVsO56bLOvjQp67n93P+1FsfW31ky/Z25s+bv2Z94YIFtLe3Z2xRNTQ63Ncf1BuCWAW8oYv6EcW2yhoyaCBDh2y0Zvngd+zEI089+6p9br1rJv+w2/a0trYweNAA9nzTNjz++/nc+csnOPrg3dZclNt02BC2HrFpQ+f94U8f4pT37Q3AMQfvvmamwyZDB/P9L3+YT10xlV/89ulmfUxltOub3swzz/yBOXNms2L5cn50+w959wEH5m5W6ZVpDLheD/hcYHpEPAnMLuq2BsYAZ/Viu7LbcvPXcdNlZwDQ1trKTf/zAHfc+xgfOm4/AL7x3bt54vcLuOPeR5lx84WsWpW47pZ7efSpeQBc/JXbuPVrZ9ESwYqVnfzLJTfzzLwldc973Q/u5ZrPn8bDUz/Dkj8v49RJ1wLw4fHvYvvRW3Bhx+Fc2HE4AO/7yJWvuvCncmlra+PCiz7NRzo+xKpVnRx19LGMGTM2d7NKrx/kasMipe5HCCKiBdgLGFlUzQVmpJQ6GzlBmYcg1HuWzLgydxPUDw1q2/BLYzte8OOGM+eJSw/NGtd15zGllFYB9a8OSVI/UKYesPOAJVVKS0s0XBoREa0R8WBE3FasbxsR90fErIi4KSIGFvUbFeuziu3b1G3rhnxQSepvmh3AwDnAYzXrlwKXp5TGAEtYPVuM4nVJUX95sV/3bW34U0lSCUQ0XuofK0YB7wW+UawHcCDw3WKXKcBRxfK4Yp1i+0FRZ6qFASypUnoyDS0iOiLigZqy9tOQvgicz9+m3W4OPJ9SeuU21Dn8bYLCSIrZYsX2pcX+6+TDBCRVSk/m99beNNbFcY4EFqaUfhUR+zelcWsxgCVVShNnQewLvD8ijgAGAcOALwHDI6Kt6OWOYvXUXIrX0cCciGgDNgEWdXcChyAkVUqzLsKllC5MKY1KKW0DjAd+klI6BbgTOK7YbQIwtVieVqxTbP9JqnOjhQEsqVL64FbkC4DzImIWq8d4ry7qrwY2L+rPAybVO5BDEJIqpTduxEgp3QXcVSw/zeq7g9fe5yXg+J4c1wCWVCn94SE7jTKAJVVKifLXAJZULfaAJSmTMj2Q3QCWVCkl6gAbwJKqxSEIScqkRPlrAEuqFnvAkpSJASxJmTgLQpIyKVEH2ACWVC0OQUhSJiXKXwNYUrW0lCiBDWBJleJFOEnKpET5awBLqhYvwklSJiXKXwNYUrUE5UlgA1hSpTgGLEmZOAtCkjJxHrAkZVKi/DWAJVWL09AkKZMS5a8BLKlaWkuUwAawpEpxCEKSMinRLDQDWFK1lKkH3JK7AZLUTBGNl+6PE4Mi4pcR8duIeCQiLi7qt42I+yNiVkTcFBEDi/qNivVZxfZt6rXVAJZUKRHRcKnjZeDAlNJbgd2AwyJiH+BS4PKU0hhgCTCx2H8isKSov7zYr1sGsKRKaW2Jhkt30movFKsDipKAA4HvFvVTgKOK5XHFOsX2g6JOyhvAkiolelIiOiLigZrS8apjRbRGxG+AhcAdwFPA8ymllcUuc4CRxfJIYDZAsX0psHl3bfUinKRK6cmzIFJKk4HJ3WzvBHaLiOHALcBOG9q+WvaAJVVKsy7C1UopPQ/cCbwDGB4Rr3ReRwFzi+W5wOjVbYg2YBNgUXfHNYAlVUqzLsJFxBZFz5eIGAy8B3iM1UF8XLHbBGBqsTytWKfY/pOUUuruHA5BSKqUJk4DHgF
MiYhWVndWb04p3RYRjwI3RsTngQeBq4v9rwauj4hZwGJgfL0TGMCSKqXe7IZGpZRmArt3Uf80sFcX9S8Bx/fkHAawpEop051wvR7AS2Zc2dunUAl1PzImrb8yXdiyByypUuwBS1ImPg1NkjJp1kW4vmAAS6qUEuWvASypWko0BGwAS6qWnjwLIjcDWFKlOA1NkjIpUQfYAJZULc6CkKRMSpS/BrCkavEinCRlUqL8NYAlVYtDEJKUSVCeBDaAJVVKW4kmAhvAkirFx1FKUiaOAUtSJiXqABvAkqrFecCSlEmrF+EkKY8Wp6FJUh4lGoEwgCVVi7MgJCkTL8JJUiYlyl8DWFK1+EB2ScqkRLPQDGBJ1VKmZ0GU6R8LSaorelC6PU7E6Ii4MyIejYhHIuKcon6ziLgjIp4sXjct6iMiroiIWRExMyL2qNdWA1hSpbRENFzqWAl8LKW0C7APcGZE7AJMAqanlMYC04t1gMOBsUXpAL5Wt63r9xElqX9qVg84pTQvpfTrYvkvwGPASGAcMKXYbQpwVLE8DvhmWu0+YHhEjOjuHAawpEppaYmGS0R0RMQDNaWjq2NGxDbA7sD9QHtKaV6xaT7QXiyPBGbXvG1OUbdOXoSTVCk96VWmlCYDk7vbJyKGAt8Dzk0p/bn2Il9KKUVEWq+GYgBLqphmzoKIiAGsDt8bUkrfL6oXRMSIlNK8YohhYVE/Fxhd8/ZRRd06OQQhqVKaOAsigKuBx1JKl9VsmgZMKJYnAFNr6k8rZkPsAyytGarokj1gSZXSxB7wvsCpwEMR8Zui7pPAJcDNETER+CNwQrHtduAIYBbwInB6vRMYwJIqpbVJAZxSupt1d5QP6mL/BJzZk3MYwJIqpTz3wRnAkiqmRHciG8CSqsWfJJKkTOwBS1ImYQ9YkvJo1iyIvmAAS6qUEuWvASypWgxgScrEMWBJyqREv8lpAEuqlgZ+6aLfMIAlVYpDEHqV+fPmcdGF57N40SKI4LjjT+CUUyfUf6Mq7eWXX+aDE05hxfLlrOzs5OD3HMpHzzo7d7NKzyEIvUprWysfP38SO++yK8uWvcD4449ln3fsy/ZjxuRumjIaOHAgV10zhSFDNmbFihWcftrJ7PfOd/GWt+6Wu2mlVqYesA9k7wNbbLElO++yKwAbbzyU7bbbjoULF2RulXKLCIYM2RiAlStXsnLlyqb+msNrVUTjJTd7wH1s7tw5PP7YY7z5LW/N3RT1A52dnZx0wjHMfuYZTjzpZL8XTdAPcrVh690Djoh1Pu299pdGr76q29+7e015cdkyPnbu2Xxi0icZOnRo7uaoH2htbeXm703lx9N/ysMPzWTWk7/L3aTSa41ouOS2IT3gi4Fru9pQ+0ujL61kvX8xtEpWrFjBeeeezRHvfR8Hv+eQ3M1RPzNs2DD23Gtv7rn754wZu0Pu5pRb/lxtWLcBHBEz17UJaG9+c6oppcS/ffoitttuO077QN2fidJrxOLFi2lra2PYsGG89NJL3PeLezn9g2fkblbplekiXL0ecDtwKLBkrfoA7u2VFlXQg7/+FbdNm8rYHXbghGPGAfDP557HO9/17swtU05/em4hn7poEqs6O1mVEoccehjv2v+A3M0qvX4wstCwWP07cuvYGHE1cG3x43Rrb/t2SunkeidwCEJd6eZrp9ewwQM2vPs64+mlDX+79txuk6xx3W0POKU0sZttdcNXkvpciXrATkOTVCk+C0KSMilP/BrAkqqmRAlsAEuqlCpNQ5OkUinRELABLKlaDGBJyqRMQxA+jlJSpTTzcZQRcU1ELIyIh2vqNouIOyLiyeJ106I+IuKKiJgVETMjYo96xzeAJVVK9KA04DrgsLXqJgHTU0pjgenFOsDhwNiidABfq3dwA1hStTQxgVNKPwMWr1U9DphSLE8Bjqqp/2Za7T5geESM6O74BrCkSome/Ffz7PKidDRwivaU0rxieT5/ezLkSGB2zX5zirp18iKcpEr
pyY9y1j67fH2klFJErPejpewBS6qWJg8Cd2HBK0MLxevCon4uMLpmv1FF3ToZwJIqpSdDEOtpGjChWJ4ATK2pP62YDbEPsLRmqKJLDkFIqpRm3ogREd8B9gdeHxFzgM8AlwA3R8RE4I/ACcXutwNHALOAF4G6P3/T7QPZm8EHsqsrPpBdXWnGA9l/N//Fhr9dO2w1pP8+kF2SSqc8N8IZwJKqxQeyS1Im5YlfA1hS1ZQogQ1gSZVSpqehGcCSKqVEQ8AGsKRqMYAlKROHICQpE3vAkpRJifLXAJZULfaAJSmb8iSwASypUnryQPbcDGBJleIQhCRl4jQ0ScqlPPlrAEuqlhLlrwEsqVocA5akTKJECWwAS6qU8sSvASypYkrUATaAJVWL09AkKRN7wJKUiQEsSZk4BCFJmdgDlqRMSpS/BrCkiilRAhvAkirFMWBJyqRMD2Rvyd0ASWqq6EGpd6iIwyLiiYiYFRGTmt1UA1hSpUQP/uv2OBGtwFeAw4FdgJMiYpdmttUAllQpEY2XOvYCZqWUnk4pLQduBMY1s629PgY8qK1EI+K9LCI6UkqTc7dD/Yvfi+bqSeZERAfQUVM1ueb/xUhgds22OcDeG97Cv7EH3Lc66u+i1yC/F5mklCanlN5eU/r0H0IDWJK6NhcYXbM+qqhrGgNYkro2AxgbEdtGxEBgPDCtmSdwHnDfcpxPXfF70Q+llFZGxFnAj4FW4JqU0iPNPEeklJp5PElSgxyCkKRMDGBJysQA7iO9fUujyiciromIhRHxcO62KA8DuA/0xS2NKqXrgMNyN0L5GMB9o9dvaVT5pJR+BizO3Q7lYwD3ja5uaRyZqS2S+gkDWJIyMYD7Rq/f0iipfAzgvtHrtzRKKh8DuA+klFYCr9zS+Bhwc7NvaVT5RMR3gF8AO0bEnIiYmLtN6lveiixJmdgDlqRMDGBJysQAlqRMDGBJysQAlqRMDGBJysQAlqRM/h/u8LJ3c5gOMwAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Predict the test set and plot the results\n", + "y_test_pred = model.predict(data_test_fft_nn)\n", + "y_test_pred = (y_test_pred > 0.5)\n", + "\n", + "\n", + "accuracy = accuracy_score(y_test, y_test_pred)\n", + "print(\"accuracy : \", accuracy)\n", + "\n", + "print(classification_report(y_test, y_test_pred, target_names=[\"NO exoplanet confirmed\",\"YES exoplanet confirmed\"]))\n", + "\n", + "conf_matrix = confusion_matrix([int(x) for x in y_test ], [int(y) for y in y_test_pred ])\n", + "sns.heatmap(conf_matrix, annot=True, cmap='Blues')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Result \n", + "The CNN achieves good results. Despite the problems described above, with a sufficient number of epochs (more than 15, according to our tests) the model is able to detect all 5 exoplanets in the test set (100% accuracy). The 15-epoch run shown above reaches 99.65% accuracy and detects 3 of the 5 exoplanets (recall 0.60 for the exoplanet class).\n", + "\n", + "# Future work\n", + "In applied astrophysics, the transit method is an important technique for discovering new planets. It would be very interesting to test our models (the CNN in particular) on new, larger datasets. The MAST archive (Mikulski Archive for Space Telescopes) provides the light curves of thousands of other stars observed by the Kepler telescope. However, these data are not in a form directly usable by a machine learning model, so they would have to be downloaded and processed to extract the light curves. Due to limited time, it was not possible to test the models on new datasets in this project. For anyone who would like to do so, the code has been written so that only a small number of variables need to be changed to adapt the models to new datasets."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# **THANK YOU**" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}
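The classification report printed by the notebook can be cross-checked by hand. The following is a minimal sketch and not part of the notebook: the helper name `binary_metrics` is hypothetical, and the confusion-matrix counts (565 true negatives, 0 false positives, 2 false negatives, 3 true positives) are an assumption inferred from the reported support, precision and recall, not read from the heatmap. It is meant to illustrate why accuracy alone says little on a test set with only 5 positives out of 570.

```python
def binary_metrics(tn, fp, fn, tp):
    """Return accuracy, precision, recall and F1 for the positive class."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    # Guard against division by zero when nothing is predicted positive
    # or when the positive class is absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Counts inferred from the 15-epoch classification report (an assumption).
metrics = binary_metrics(tn=565, fp=0, fn=2, tp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

# A classifier that always answers "no exoplanet" misses all 5 positives
# but still classifies 565 of the 570 stars correctly.
baseline = binary_metrics(tn=565, fp=0, fn=5, tp=0)
print(f"always-negative baseline accuracy: {baseline['accuracy']:.4f}")
```

Under these assumed counts the sketch reproduces the reported accuracy of about 0.9965 and the recall of 0.60, while the always-negative baseline already scores about 0.9912. This suggests that recall on the exoplanet class is the more informative number to track when comparing runs.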