Stock price forecasting with an LSTM deep learning model in Python, using PyTorch and PyTorch Lightning.
This project is an attempt to predict stock prices with a Long Short-Term Memory (LSTM) neural network. The predictions are based on historical stock price data scraped from Yahoo Finance (finance.yahoo.com) using the BeautifulSoup Python library.
Data Scraping from Yahoo Finance
Libraries used:
* requests: For sending HTTP requests and retrieving web content.
* BeautifulSoup (bs4): For parsing HTML content.
* pandas: For data manipulation and analysis.
* os.path: For path manipulation operations.
* webbrowser: For opening web pages in a browser.
Functions:
scrape_data(header):
* Input: header (dict) - HTTP request headers.
* Scrapes current data of technology sector stocks from Yahoo Finance, extracts relevant information from the HTML content, and saves it to a CSV file.
scrape_historical_data(header, symbol, save_dir):
* Input: header (dict) - HTTP request headers, symbol (str) - Stock symbol for which historical data is to be scraped, save_dir (str) - Directory path to save the downloaded file.
* Scrapes historical data for the specified stock symbol from Yahoo Finance, extracts the download link from the HTML content, and downloads the CSV file containing historical data.
filter_data(files):
* Input: files (list) - List of file names containing historical data.
* Reads each CSV file, removes the first 750 records, and saves the remaining data back to the same file. This reduces the dataset size by dropping older records.
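The filtering step described above could look roughly like the sketch below. It is an illustration under assumptions, not the project's actual code: the `n_drop` parameter is hypothetical (the description only mentions `files` and a fixed count of 750), and the rows are assumed to be ordered oldest first.

```python
import pandas as pd

def filter_data(files, n_drop=750):
    """Drop the first n_drop rows of each CSV and overwrite the file in place.

    n_drop is a hypothetical parameter added for illustration; the
    description above uses a fixed value of 750.
    """
    for path in files:
        df = pd.read_csv(path)
        # iloc[n_drop:] keeps every row after the first n_drop rows. If the
        # file has fewer rows than n_drop, only the header is written back.
        df.iloc[n_drop:].to_csv(path, index=False)
```

Because the files are overwritten, running this twice removes another 750 rows each time, so it is worth keeping a copy of the raw downloads.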
Usage:
* Make sure the necessary libraries are installed (requests, beautifulsoup4, pandas).
* Ensure the Yahoo Finance URLs are accessible.
* Adjust parameters such as header, symbol, and save_dir according to your requirements.
* Run the script.
LSTM Stock Price Prediction Model
Libraries used:
* pandas: For data manipulation and analysis.
* numpy: For numerical operations and array manipulation.
* scikit-learn (sklearn): For data preprocessing tasks such as scaling and train-test split.
* torch: Core library for building and training neural networks.
* torch.nn: Module containing various neural network layers and loss functions.
* pytorch_lightning: A lightweight PyTorch wrapper for high-performance neural network training.
* TensorBoardLogger: Logger for PyTorch Lightning that logs metrics for visualization in TensorBoard.
Functions:
data_windowing(file, window_size):
* Input: file (str) - Name of the CSV file containing historical stock data, window_size (int) - Size of the input sequence/window.
* Output: Returns input sequences and corresponding output labels for training the LSTM model.
* Reads the CSV file, extracts the 'Close' prices, and creates input-output pairs by sliding a window over the data.
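The sliding-window step described above can be sketched as follows. This is a minimal illustration rather than the project's code: the helper name `make_windows` is hypothetical, it works on an in-memory sequence instead of reading the 'Close' column from a CSV, and it assumes each label is the price immediately following its window.

```python
import numpy as np

def make_windows(prices, window_size):
    """Slide a window over a 1-D price series.

    Returns X with shape (n_samples, window_size) and y with shape
    (n_samples,), where y[i] is the price right after window X[i].
    """
    prices = np.asarray(prices, dtype=np.float32)
    n_samples = len(prices) - window_size
    if n_samples <= 0:
        raise ValueError("window_size must be smaller than the series length")
    # One row per window: prices[i], ..., prices[i + window_size - 1]
    X = np.stack([prices[i:i + window_size] for i in range(n_samples)])
    # The label for window i is the next price after that window
    y = prices[window_size:]
    return X, y

X, y = make_windows([10, 11, 12, 13, 14], window_size=3)
# X -> [[10, 11, 12], [11, 12, 13]], y -> [13, 14]
```

For an LSTM with one input feature, X would typically be reshaped to (n_samples, window_size, 1) before being wrapped in a DataLoader.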
generate_predictions(model, dataloader):
* Input: model (LSTMModel) - Trained LSTM model, dataloader (DataLoader) - PyTorch DataLoader containing the data for prediction.
* Output: Returns arrays of predictions and actual values.
* Runs the trained model on the data provided by the dataloader to generate predictions. It iterates over batches of data, computes predictions using the model, and appends them to a list. After processing all batches, it concatenates the predictions and actual values into numpy arrays.
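A minimal sketch of the batch loop described above is shown here. The use of `model.eval()` and `torch.no_grad()` is an assumption about how inference would usually be done, not something the description confirms, and the linear model and random data at the bottom are stand-ins that only demonstrate the call shape.

```python
import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def generate_predictions(model, dataloader):
    """Run the model over every batch and return (predictions, actuals) arrays."""
    model.eval()  # switch off training-only behavior such as dropout
    predictions, actuals = [], []
    with torch.no_grad():  # gradients are not needed for inference
        for inputs, targets in dataloader:
            outputs = model(inputs)
            predictions.append(outputs.cpu().numpy())
            actuals.append(targets.cpu().numpy())
    # Join the per-batch arrays along the batch dimension
    return np.concatenate(predictions), np.concatenate(actuals)

# Stand-in model and data (hypothetical), not the trained LSTM
demo_model = nn.Linear(3, 1)
demo_data = TensorDataset(torch.randn(10, 3), torch.randn(10, 1))
preds, acts = generate_predictions(demo_model, DataLoader(demo_data, batch_size=4))
```

If the prices were scaled before training, both arrays would still need to be passed through the scaler's inverse transform before plotting.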
plot_predictions(predictions, actuals, title):
* Input: predictions (ndarray) - Array of predicted values, actuals (ndarray) - Array of actual values, title (str) - Title for the plot.
* Plots the predicted values against the actual values over time. It visualizes the trend and performance of the model's predictions compared to the ground truth. The plot includes a legend to distinguish between actual and predicted values, and axis labels for clarity.
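The plotting function described above might be sketched with matplotlib as below. The choice of matplotlib, the axis label text, the figure size, and returning the figure are all assumptions made for illustration; the description does not name a plotting library.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np

def plot_predictions(predictions, actuals, title):
    """Plot predicted and actual values on the same axes."""
    fig, ax = plt.subplots(figsize=(10, 5))
    # np.ravel flattens (n, 1) arrays so each series is drawn as one line
    ax.plot(np.ravel(actuals), label="Actual")
    ax.plot(np.ravel(predictions), label="Predicted")
    ax.set_title(title)
    ax.set_xlabel("Time step")
    ax.set_ylabel("Price")
    ax.legend()
    return fig

# Made-up values, only to show the call
fig = plot_predictions([1.0, 2.0, 3.0], [1.1, 1.9, 3.2], "Demo")
```

In an interactive session the `matplotlib.use("Agg")` line can be removed and `plt.show()` called after the function returns.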
Classes:
LSTMModel(pl.LightningModule):
Description: Defines the LSTM model architecture inheriting from PyTorch Lightning's LightningModule.
Attributes:
* lstm: LSTM layer with specified input size and hidden size.
* linear: Linear layer for output prediction.
Methods:
* __init__(self, input_size, hidden_size): Constructor method to initialize the model.
* forward(self, input): Forward pass through the model.
* configure_optimizers(self): Configures the optimizer for training.
* training_step(self, batch, batch_idx): Defines a single training step.
* validation_step(self, batch, batch_idx): Defines a single validation step.
Usage:
* Ensure the historical stock data is available in a CSV file.
* Adjust the parameters such as window_size, hidden_size, and max_epochs according to your requirements.
* Run the script.
Conclusions: LSTMs are widely used for time series forecasting, but they struggle to predict stock prices accurately because of the inherent randomness and complexity of the stock market. A relatively simple deep learning model that relies only on past prices is not sufficient for a domain this intricate. The graph shows the gap between the observed prices and the LSTM's predictions: the model tends to output values that closely resemble the price observed on the preceding day. Overall, while LSTMs can capture some patterns in stock price data, their ability to predict future prices is limited by these characteristics of financial markets.
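One way to quantify the observation that predictions resemble the previous day's price is to compare the model against a persistence baseline, which simply forecasts each day's price as the prior day's price. The sketch below is an illustration only: the function names are hypothetical, the prices are made up, and RMSE is one assumed choice of error metric.

```python
import numpy as np

def persistence_baseline(actuals):
    """Forecast each price as the previous price.

    Returns (naive_predictions, targets), both one element shorter
    than the input because the first price has no predecessor.
    """
    actuals = np.asarray(actuals, dtype=float)
    return actuals[:-1], actuals[1:]

def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    diff = np.asarray(predictions, dtype=float) - np.asarray(targets, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

prices = [100.0, 102.0, 101.0, 105.0]  # made-up values for illustration
naive_preds, targets = persistence_baseline(prices)
baseline_rmse = rmse(naive_preds, targets)
```

If the LSTM's RMSE on the same targets is not clearly lower than `baseline_rmse`, the model has learned little beyond copying the most recent price.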