To better understand the relationships between various features and the diameter, we graphed several feature correlations. This graphical analysis aids in identifying potential relationships and patterns that might not be immediately evident through raw data or simple statistical summaries.
160
+
161
+

162
+
159
163
160
164
#### Data Preprocessing:
161
165
@@ -166,6 +170,8 @@ We utilized a heatmap to visualize the correlations between different features i
166
170
5. Removed observations in the top 5% of 'a' and 'diameter'.
167
171
6. Normalized data using MinMaxScaler
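
Steps 5 and 6 can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the helper `trim_and_scale`, the synthetic `demo` frame, and the reading of "top 5%" as "above the 95th percentile in either column" are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def trim_and_scale(df, trim_cols=("a", "diameter"), quantile=0.95):
    """Drop rows above `quantile` in any of `trim_cols`, then min-max scale.

    Hypothetical helper: treating "top 5%" as "above the 95th percentile
    in either column" is an assumption.
    """
    keep = pd.Series(True, index=df.index)
    for col in trim_cols:
        keep &= df[col] <= df[col].quantile(quantile)
    trimmed = df.loc[keep]

    # MinMaxScaler rescales each column to the [0, 1] range.
    scaler = MinMaxScaler()
    scaled = pd.DataFrame(
        scaler.fit_transform(trimmed),
        columns=trimmed.columns,
        index=trimmed.index,
    )
    return scaled, scaler


# Synthetic stand-in for the asteroid data (values are made up).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "a": rng.lognormal(1.0, 0.3, 1000),
    "diameter": rng.lognormal(1.5, 0.8, 1000),
    "H": rng.normal(15.0, 2.0, 1000),
})

scaled, scaler = trim_and_scale(demo)
print(f"rows before: {len(demo)}, rows after trimming: {len(scaled)}")
```

Returning the fitted `scaler` alongside the data makes it possible to apply the same transformation to new observations later, or to map predictions back to original units with `scaler.inverse_transform`.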
168
172
173
+
174
+
169
175
Here is a link to the notebook where we carried out the data exploration and data preprocessing steps:
@@ -318,7 +324,30 @@ This deep neural network model predicts diameter of asteroid using features `moi
318
324
### Discussion:
319
325
320
326
321
-
The project began with comprehensive data exploration and preprocessing, which substantially reduced the size of our data set. After looking into how to check for issues caused by data culling, we settled on the Kolmogorov-Smirnov test to confirm that our data distribution was not heavily affected, and it was not. We started with a basic linear regression model but quickly moved on to higher-degree polynomial regression models because the relationship between absolute magnitude and diameter was nonlinear. While this was a decent model, we wanted a more original approach for this project to make our research more useful.
327
+
The project began with comprehensive data exploration and preprocessing, which substantially reduced the size of our data set. After looking into how to check for issues caused by data culling, we settled on the Kolmogorov-Smirnov test to confirm that our data distribution was not heavily affected, and it was not.
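
A two-sample Kolmogorov-Smirnov check of this kind might look like the sketch below, which uses `scipy.stats.ks_2samp`. The helper `compare_before_after`, the synthetic data, and the 0.05 threshold are assumptions for illustration; the notebook's own code may differ.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def compare_before_after(before, after, columns, alpha=0.05):
    """Run a two-sample KS test per column (hypothetical helper).

    A small statistic and a p-value above `alpha` suggest that dropping
    rows did not noticeably change that feature's distribution.
    """
    rows = []
    for col in columns:
        result = ks_2samp(before[col].dropna(), after[col].dropna())
        rows.append({
            "feature": col,
            "ks_statistic": result.statistic,
            "p_value": result.pvalue,
            "differs": result.pvalue < alpha,
        })
    return pd.DataFrame(rows)


# Synthetic stand-in: `diameter` is missing at random in about 30% of rows.
rng = np.random.default_rng(1)
n = 2000
before = pd.DataFrame({
    "q": rng.normal(2.4, 0.5, n),
    "H": rng.normal(16.0, 1.5, n),
    "moid": rng.gamma(2.0, 0.7, n),
    "diameter": rng.lognormal(1.0, 0.8, n),
})
before.loc[rng.random(n) < 0.3, "diameter"] = np.nan
after = before.dropna(subset=["diameter"])

report = compare_before_after(before, after, ["q", "H", "moid"])
print(report)
```

Because the rows in `after` are a subset of those in `before`, the two samples are not independent, so the p-values here are best read as a rough guide and not as an exact test.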
328
+
329
+
**Distribution Difference after dropping NaN values in preprocessing**
330
+
331
+
- Histogram of q:
332
+
<img src="https://github.com/harshilxd/Asteroid-Feature-Prediction/blob/ad64fd0dae7179fc48cc827d7dcccacfba86e356/diagrams/heatmaps/q%20before%20and%20after%20drop.png" alt="histogram of q" width="700"/>
333
+
334
+
- Histogram of H:
335
+
<img src="https://github.com/harshilxd/Asteroid-Feature-Prediction/blob/ad64fd0dae7179fc48cc827d7dcccacfba86e356/diagrams/heatmaps/h%20before%20and%20after%20drop.png" alt="histogram of H" width="700"/>
336
+
337
+
- Histogram of moid:
338
+
<img src="https://github.com/harshilxd/Asteroid-Feature-Prediction/blob/ad64fd0dae7179fc48cc827d7dcccacfba86e356/diagrams/heatmaps/moid%20before%20and%20after%20drop.png" alt="histogram of moid" width="700"/>
339
+
340
+
341
+
These visualizations provide several insights:
342
+
343
+
- **Identifying Outliers:**
344
+
- Scatter plots make it easy to spot outliers, which could skew the analysis or indicate errors or special cases.
345
+
346
+
- **Understanding Distribution:**
347
+
- The spread and clustering of points in these graphs show how uniformly or variably the features are distributed.
348
+
349
+
350
+
We started with a basic linear regression model but quickly moved on to higher-degree polynomial regression models because the relationship between absolute magnitude and diameter was nonlinear. While this was a decent model, we wanted a more original approach for this project to make our research more useful.
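
To illustrate why a straight line falls short here, the sketch below fits polynomials of increasing degree to synthetic data with scikit-learn. The data are generated from the standard relation between absolute magnitude and diameter, D = 1329 / sqrt(albedo) * 10^(-H/5) km, with an assumed fixed albedo of 0.14 and made-up noise; none of it comes from the project's data set, and the degrees tried are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: diameter falls off exponentially with absolute magnitude H.
rng = np.random.default_rng(42)
H = rng.uniform(10.0, 20.0, 500)
albedo = 0.14  # assumed constant, for illustration only
diameter = 1329.0 / np.sqrt(albedo) * 10 ** (-H / 5.0)
diameter *= rng.lognormal(0.0, 0.1, H.size)  # multiplicative noise

X_train, X_test, y_train, y_test = train_test_split(
    H.reshape(-1, 1), diameter, test_size=0.2, random_state=0
)

# Fit polynomial regressions of increasing degree and compare held-out R^2.
scores = {}
for degree in (1, 2, 3):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    scores[degree] = model.score(X_test, y_test)
    print(f"degree {degree}: R^2 = {scores[degree]:.3f}")
```

On data like this, the degree-1 fit misses the curvature, and higher degrees recover more of it. Since the underlying relation is exponential, regressing on the logarithm of diameter would be another option.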
322
351
323
352
We settled on using a neural network for our first model because we wanted to model more complex relationships and our regression was not giving us good enough results. We were able to use many features in combination to capture the complex relationship for prediction, and then we used hyperparameter tuning to further optimize the model.