Commit bfeeea4

Update README.md
1 parent 61f400c commit bfeeea4

1 file changed

+30
-1
lines changed

README.md

Lines changed: 30 additions & 1 deletion
@@ -156,6 +156,10 @@ The dataset that we chose comprises of 839,714 observations and 31 features. Her
156156
We utilized a heatmap to visualize the correlations between different features in the dataset.
157157
<img src = "https://github.com/harshilxd/Asteroid-Feature-Prediction/blob/3315ed5312a593256612473dce74bb4eb014859e/diagrams/heatmaps/heatmap_full.png" alt = "correlation heatmap">
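A correlation heatmap is a colored rendering of a pairwise correlation matrix. As a minimal, dependency-free sketch of what that matrix contains (the feature values below are hypothetical toy numbers, not rows from the asteroid dataset, and the helper names are illustrative rather than taken from the project notebook):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Map every (feature, feature) pair to its Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical toy values, chosen only to illustrate the computation.
toy = {
    "H": [10.0, 12.0, 14.0, 16.0, 18.0],
    "diameter": [30.0, 12.0, 5.0, 2.0, 0.8],
    "moid": [1.2, 0.4, 2.1, 0.9, 1.5],
}
matrix = correlation_matrix(toy)
```

With pandas and seaborn available, `DataFrame.corr()` computes the same matrix and `seaborn.heatmap` draws it.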
158158

159+
To better understand how individual features relate to the diameter, we plotted several features against it. These plots help reveal relationships and patterns that are not evident from the raw data or from simple summary statistics.
160+
161+
![diagrams/heatmaps/diameter vs graphs.png](https://github.com/harshilxd/Asteroid-Feature-Prediction/blob/0a965110aba28f2dfd9f969f2118bf6372483734/diagrams/heatmaps/diameter%20vs%20graphs.png)
162+
159163

160164
#### Data Preprocessing:
161165

@@ -166,6 +170,8 @@ We utilized a heatmap to visualize the correlations between different features i
166170
5. Removed observations in the top 5% of 'a' and 'diameter'.
167171
6. Normalized data using MinMaxScaler
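Steps 5 and 6 can be sketched without any libraries as follows. This is an illustration under stated assumptions, not the notebook's code: it assumes both cutoffs are computed on the full data before filtering, that rows exactly at the 95th percentile are kept, and that the scaler uses the default `[0, 1]` range. The function names and toy rows are hypothetical.

```python
def percentile(values, q):
    """Percentile with linear interpolation between ranks (q in [0, 100])."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def drop_top_5_percent(rows, keys=("a", "diameter")):
    """Keep rows whose value is at or below the 95th percentile for every key."""
    cutoffs = {k: percentile([r[k] for r in rows], 95) for k in keys}
    return [r for r in rows if all(r[k] <= cutoffs[k] for k in keys)]

def min_max_scale(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical toy rows: 'a' rises from 1 to 20 while 'diameter' falls from 20 to 1.
rows = [{"a": float(i), "diameter": float(21 - i)} for i in range(1, 21)]
kept = drop_top_5_percent(rows)
scaled = min_max_scale([r["diameter"] for r in kept])
```

In the toy data the largest `a` and the largest `diameter` sit in different rows, so two rows are dropped. `sklearn.preprocessing.MinMaxScaler` applies the same `(x - min) / (max - min)` formula per feature.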
168172

173+
174+
169175
Here is a link to the notebook where we carried out the data exploration and data preprocessing steps:
170176
[Data Exploration Notebook](https://github.com/harshilxd/Asteroid-Feature-Prediction/blob/3315ed5312a593256612473dce74bb4eb014859e/source/Data_Exploration.ipynb)
171177

@@ -318,7 +324,30 @@ This deep neural network model predicts diameter of asteroid using features `moi
318324
### Discussion:
319325

320326

321-
The project began with our comprehensive data exploration and preprocessing, which reduced our dataset size quite a lot. After looking into how to check for data culling issues, we settled on the Kolmogorov-Smirnov test to make sure that our data distribution was not heavily affected, and thankfully it was not. We started with a basic linear regression model but quickly moved on to higher-degree polynomial regression models because the relationship between absolute magnitude and diameter was nonlinear. However, while this was a decent model, for this project we wanted to come up with a more original approach to make our research more useful.
327+
The project began with comprehensive data exploration and preprocessing, which reduced the size of our dataset substantially. To check whether discarding so many observations had distorted the data, we settled on the Kolmogorov-Smirnov test, which indicated that the feature distributions were not heavily affected.
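The two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical CDFs of two samples. A minimal pure-Python sketch is shown here; the samples are hypothetical stand-ins for a feature before and after dropping rows, and the function name is illustrative:

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max absolute gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # Both empirical CDFs only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Hypothetical toy samples, not values from the asteroid dataset.
before = list(range(1, 101))
after_uniform_drop = [x for x in before if x % 10 != 0]  # rows removed evenly
after_biased_drop = [x for x in before if x > 50]        # lower half removed

small_gap = ks_statistic(before, after_uniform_drop)
large_gap = ks_statistic(before, after_biased_drop)
```

A statistic near 0 suggests the drop left the distribution largely intact, while a large value signals a shift. `scipy.stats.ks_2samp` returns this statistic together with a p-value.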
328+
329+
**Distribution difference after dropping NaN values in preprocessing**
330+
331+
- Histogram of q:
332+
<img src="https://github.com/harshilxd/Asteroid-Feature-Prediction/blob/ad64fd0dae7179fc48cc827d7dcccacfba86e356/diagrams/heatmaps/q%20before%20and%20after%20drop.png" alt="histogram of q" width="700"/>
333+
334+
- Histogram of H:
335+
<img src="https://github.com/harshilxd/Asteroid-Feature-Prediction/blob/ad64fd0dae7179fc48cc827d7dcccacfba86e356/diagrams/heatmaps/h%20before%20and%20after%20drop.png" alt="histogram of H" width="700"/>
336+
337+
- Histogram of moid:
338+
<img src="https://github.com/harshilxd/Asteroid-Feature-Prediction/blob/ad64fd0dae7179fc48cc827d7dcccacfba86e356/diagrams/heatmaps/moid%20before%20and%20after%20drop.png" alt="histogram of moid" width="700"/>
339+
340+
341+
These visualizations provide several insights:
342+
343+
- **Identifying Outliers:**
344+
- Scatter plots make it easy to spot outliers, which could skew the analysis or indicate errors or special cases.
345+
346+
- **Understanding Distribution:**
347+
- The spread and clustering of points in these graphs show how uniformly or variably each feature is distributed.
348+
349+
350+
We started with a basic linear regression model but quickly moved on to higher-degree polynomial regression models because the relationship between absolute magnitude and diameter was nonlinear. However, while this was a decent model, for this project we wanted to come up with a more original approach to make our research more useful.
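For background on why a straight line fits poorly, the standard photometric conversion between absolute magnitude `H`, geometric albedo, and diameter is exponential in `H`: `D = 1329 / sqrt(albedo) * 10 ** (-H / 5)` km. The sketch below only illustrates that textbook relation; it is not the regression model used in this project, and the albedo value is an assumed example.

```python
import math

def diameter_km(abs_magnitude, albedo):
    """Standard conversion D = 1329 / sqrt(albedo) * 10 ** (-H / 5), in km."""
    return 1329.0 / math.sqrt(albedo) * 10 ** (-abs_magnitude / 5.0)

# Assumed example albedo of 0.25; real asteroids vary widely.
d_bright = diameter_km(10.0, 0.25)
d_faint = diameter_km(15.0, 0.25)
ratio = d_bright / d_faint  # five magnitudes correspond to a tenfold change in size
```

Because `log10(D)` is linear in `H` for a fixed albedo, a model fitted to raw diameter needs curvature (or a log transform of the target) to follow this shape.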
322351

323352
We settled on using a neural network for our first model because we wanted to model more complex relationships and our regression was not giving us good enough results. We were able to use many features in combination to capture the complex relationship for prediction, and then we used hyperparameter tuning to further optimize the model.
324353
