This project implements a Next Word Predictor using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) layers. The model is trained on the text of Shakespeare's Hamlet and predicts the next word in a given sequence of words.
The dataset used for training is the text of Hamlet, one of Shakespeare's most famous plays. The text is preprocessed by:
- Removing special characters and unnecessary whitespace
- Converting to lowercase
- Tokenizing the text into words and building fixed-length input sequences (a sketch of these steps follows the list)
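The preprocessing could be sketched in Keras roughly as follows; the file name `hamlet.txt`, the cleaning regex, and the n-gram sequence construction are assumptions for illustration, not details fixed by the project:

```python
# Minimal preprocessing sketch (file name and cleaning rule are illustrative).
import re
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

with open("hamlet.txt", "r", encoding="utf-8") as f:
    text = f.read().lower()                      # convert to lowercase
text = re.sub(r"[^a-z\s]", " ", text)            # strip special characters

tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])                   # build the word index
vocab_size = len(tokenizer.word_index) + 1

# Build n-gram sequences: every prefix of a line becomes a training example.
sequences = []
for line in text.split("\n"):
    tokens = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(tokens)):
        sequences.append(tokens[: i + 1])

max_len = max(len(s) for s in sequences)
sequences = pad_sequences(sequences, maxlen=max_len, padding="pre")
X, y = sequences[:, :-1], sequences[:, -1]       # last token of each sequence is the label
```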
The model is built using TensorFlow and Keras and consists of the following layers (a minimal sketch follows the list):
- Embedding Layer - Converts words into dense vector representations
- LSTM Layers - Capture long-term dependencies in the text
- Dense Layer - Outputs a probability distribution over the vocabulary for the next word
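A minimal sketch of this architecture in Keras is shown below; the layer sizes and the placeholder `vocab_size` are illustrative choices, not values taken from the project:

```python
# Architecture sketch: Embedding -> stacked LSTMs -> softmax over the vocabulary.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size = 5000  # placeholder; in practice, len(tokenizer.word_index) + 1

model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=100),  # dense word vectors
    LSTM(150, return_sequences=True),                 # first LSTM layer, passes the full sequence on
    LSTM(100),                                        # second LSTM layer, summarizes the sequence
    Dense(vocab_size, activation="softmax"),          # probability for each word in the vocabulary
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
model.summary()
```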
To run this project, install the required dependencies:
```bash
pip install tensorflow numpy pandas nltk
```
Run the script to train the model:
```bash
python train.py
```
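In essence, `train.py` fits the model on the prepared sequences and saves the artifacts needed for prediction. The sketch below reuses the `model`, `X`, `y`, and `tokenizer` from the sketches above; the epoch count and output file names are placeholders, not the project's actual values:

```python
# Hypothetical core of train.py: fit on the prepared sequences and persist
# the model plus tokenizer (epoch count and file names are placeholders).
import pickle

history = model.fit(X, y, epochs=50, batch_size=128, validation_split=0.1)

model.save("next_word_model.keras")
with open("tokenizer.pkl", "wb") as f:
    pickle.dump(tokenizer, f)
```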
After training, use the model for predictions:
```python
from predictor import predict_next_word

text = "To be or not to"
next_word = predict_next_word(text)
print("Predicted next word:", next_word)
```
Notes:
- The model's accuracy improves with more training epochs and a larger dataset.
- Since Shakespearean language is complex, results may vary depending on the training data's diversity.
Planned improvements:
- Implementing a Transformer-based model for better contextual understanding
- Fine-tuning the model on additional Shakespearean plays
- Expanding the dataset for broader text generalization