Data Preprocessing and Cleanup for SBIN Historical Stock Data #96
fixes #13
SBIN_cleaned.csv
This pull request addresses the preprocessing and cleanup of the SBIN historical stock dataset. The following steps were completed:
Date Column Formatting: Converted the 'Date' column from strings to a datetime type for easier analysis and consistency.
Missing Data Handling: Removed rows containing missing values across multiple columns including 'Open', 'Close', 'High', 'Low', 'Adj Close', and 'Volume'.
Feature Engineering: Added new columns for enhanced data analysis:
Price Range: Difference between the daily 'High' and 'Low' prices.
Daily Return: Percentage change between the 'Open' and 'Close' prices.
Descriptive Statistics: Basic statistical summary was generated to better understand the data distribution after cleanup.
The cleaned dataset is now ready for further analysis, including trend visualization, correlation studies, and volatility analysis.
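The steps above can be sketched roughly as follows. This is a minimal illustration, not the code used in this PR: it assumes pandas, the function name `clean_sbin` and the sample rows are made up, and the Daily Return formula (open-to-close change as a percentage of the open) is one plausible reading of the description.

```python
import io

import pandas as pd

# Columns named in the PR description as being checked for missing values.
PRICE_COLUMNS = ["Open", "High", "Low", "Close", "Adj Close", "Volume"]


def clean_sbin(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the preprocessing steps listed above."""
    df = raw.copy()

    # Date Column Formatting: parse the string dates into datetime values.
    df["Date"] = pd.to_datetime(df["Date"])

    # Missing Data Handling: drop any row with a gap in the price/volume columns.
    df = df.dropna(subset=PRICE_COLUMNS).reset_index(drop=True)

    # Feature Engineering: daily high-low spread.
    df["Price Range"] = df["High"] - df["Low"]

    # Assumed formula: intraday open-to-close change, in percent of the open.
    df["Daily Return"] = (df["Close"] - df["Open"]) / df["Open"] * 100

    return df


# Made-up sample rows, for illustration only (not real SBIN data).
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
2020-01-01,100.0,110.0,95.0,105.0,105.0,1000
2020-01-02,105.0,108.0,,104.0,104.0,1200
2020-01-03,200.0,210.0,190.0,190.0,190.0,1500
"""

cleaned = clean_sbin(pd.read_csv(io.StringIO(SAMPLE_CSV)))

# Descriptive Statistics: summary of the engineered columns.
print(cleaned[["Price Range", "Daily Return"]].describe())
```

Under the same assumptions, the result could be written out with `cleaned.to_csv("SBIN_cleaned.csv", index=False)`.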
Changes Made:
Added SBIN_cleaned.csv which contains the preprocessed data.
Removed rows with missing values so the dataset can be used directly for analysis.
Impact:
This PR leaves the dataset clean, consistently formatted, and enriched with the Price Range and Daily Return features, so that later analysis or modeling starts from reliable inputs.
Fixes #13
This pull request includes the exploratory data analysis (EDA) performed on the cleaned SBIN historical stock dataset. The following tasks were completed as part of the EDA:
Descriptive Statistics: Summary statistics for key numerical columns such as Open, Close, High, Low, Volume, Price Range, and Daily Return.
Trend Visualization: A time series plot was created to show the trend of the stock’s closing price over time.
Correlation Analysis: A heatmap was generated to identify correlations between key stock variables (Open, Close, High, Low, Volume, etc.).
Volatility Analysis: Histograms were plotted to show the distribution of Price Range and Daily Return, giving insights into stock price volatility.
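The numeric side of these tasks can be sketched as below. This is an illustrative outline rather than the notebook's actual code: it assumes pandas and NumPy, the helper names and sample values are made up, and it computes the correlation matrix and histogram bin counts without drawing the plots.

```python
import numpy as np
import pandas as pd

# Columns named in the EDA description.
EDA_COLUMNS = ["Open", "Close", "High", "Low", "Volume", "Price Range", "Daily Return"]


def correlation_matrix(df: pd.DataFrame, columns=EDA_COLUMNS) -> pd.DataFrame:
    """Pairwise Pearson correlations; this is the data behind a heatmap."""
    return df[columns].corr()


def histogram_counts(series: pd.Series, bins: int = 10):
    """Bin counts and bin edges; this is the data behind a histogram."""
    return np.histogram(series.dropna(), bins=bins)


# Made-up sample values, for illustration only (not real SBIN data).
df = pd.DataFrame(
    {
        "Open": [100.0, 102.0, 101.0, 105.0],
        "Close": [101.0, 101.0, 104.0, 106.0],
        "High": [103.0, 104.0, 105.0, 108.0],
        "Low": [99.0, 100.0, 100.0, 104.0],
        "Volume": [1000, 1500, 1200, 1800],
    }
)
df["Price Range"] = df["High"] - df["Low"]
# Assumed formula: open-to-close change, in percent of the open.
df["Daily Return"] = (df["Close"] - df["Open"]) / df["Open"] * 100

summary = df[EDA_COLUMNS].describe()  # Descriptive Statistics
corr = correlation_matrix(df)  # Correlation Analysis
counts, edges = histogram_counts(df["Daily Return"], bins=5)  # Volatility Analysis

print(summary)
print(corr.round(2))
print(counts, edges)
```

If matplotlib and seaborn are available, `corr` could be passed to `seaborn.heatmap`, and the closing-price trend could be drawn with `DataFrame.plot(x="Date", y="Close")` on a frame that includes the 'Date' column.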
Changes Made:
Added the EDA_SBIN_clean-checkpoint.ipynb notebook, which contains the full EDA process and its visualizations.
The analysis highlights important trends and patterns in the SBIN dataset.
Impact:
This PR provides insights into the behavior and volatility of SBIN stock, laying the groundwork for further predictive modeling or advanced analysis.