Sequence-to-sequence models take a sequence of items (words, letters, time-series values, etc.) and output another sequence of items. They are also known as encoder-decoder models because they use both parts of the Transformer architecture.
Such models are best suited for tasks that involve generating new sentences conditioned on a given input, such as summarization, translation, or generative question answering.
This project showcases a text summarizer which, as the name suggests, outputs a summary for a given text input. To take it up a notch, this particular summarizer has been fine-tuned specifically to generate a title for a given review.
Input: A product review
Output: A short, meaningful summary of the review
Amazon Multilingual Reviews Dataset
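A rough sketch of how this dataset can be loaded with the Hugging Face `datasets` library is shown below; the Hub identifier `amazon_reviews_multi` and the `review_body`/`review_title` field names refer to the public release of this corpus and are assumptions about this project's data pipeline, not taken from the repository.

```python
from datasets import load_dataset

# Assumed Hub id and field names for the Multilingual Amazon Reviews Corpus.
reviews = load_dataset("amazon_reviews_multi", "en")

sample = reviews["train"][0]
print(sample["review_body"])   # model input: the full review text
print(sample["review_title"])  # target: the short review title
```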
A multilingual Text-to-Text Transfer Transformer (mT5) model has been used in this project.
mT5 is a multilingual variant of T5 that has been pre-trained on a Common Crawl-based dataset covering 101 languages. The model architecture and training procedure of mT5 closely follow those of T5.
T5 is a pre-trained language model whose primary distinction is its use of a unified "text-to-text" format for all text-based NLP problems. This approach is a natural fit for generative tasks, where the model must produce text conditioned on some input.
Given the sequence-to-sequence structure of this task format, T5 uses a basic encoder-decoder Transformer architecture as proposed by Vaswani et al. (2017).
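The snippet below is a minimal sketch of this encoder-decoder generation step using the Hugging Face `transformers` library; the review text is purely illustrative, and `google/mt5-small` is the default checkpoint listed in the arguments below (a fine-tuned model would instead be loaded from its output directory).

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Default checkpoint from this README; swap in a fine-tuned checkpoint's path as needed.
checkpoint = "google/mt5-small"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Illustrative review text (not from the dataset).
review = "The headphones are comfortable and the battery lasts all day, but the case feels flimsy."
inputs = tokenizer(review, return_tensors="pt", truncation=True, max_length=512)

# The encoder reads the review; the decoder generates a short title token by token.
with torch.no_grad():
    title_ids = model.generate(**inputs, max_length=30, num_beams=4, early_stopping=True)
print(tokenizer.decode(title_ids[0], skip_special_tokens=True))
```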
1. Install the required modules
To get started, clone this repository and run the command below to make sure all required modules are installed.
pip install -r requirements.txt
2. Run driver.py
Commonly modified arguments are configured in argument_parser.py to be passed as command-line arguments.
- `model_card`: Model to be used, default = "google/mt5-small"
- `batch_size`: Batch size, default = 32
- `weight_decay`: Weight decay, default = 0.01
- `learning_rate`: Learning rate, default = 5.6e-5
- `save_total_limit`: Number of checkpoints to save, default = 3
- `num_train_epochs`: Number of training epochs, default = 3
- `output_dir`: Output directory, default = "."
Note: All of the above arguments are optional and can be passed as and when required.
Example:
python driver.py --model_card "google/mt5-base" --learning_rate 2e-5 --batch_size 16 --num_train_epochs 4
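Under the hood, driver.py presumably wires these values into the fine-tuning run. The sketch below shows one way argument_parser.py and the training configuration might fit together, assuming the project builds on Hugging Face's `Seq2SeqTrainingArguments`; the helper name `get_args` and the exact wiring are illustrative, not taken from the repository.

```python
import argparse
from transformers import Seq2SeqTrainingArguments

def get_args():
    # Defaults mirror the values documented above; the helper name is illustrative.
    parser = argparse.ArgumentParser(description="Fine-tune mT5 to generate review titles")
    parser.add_argument("--model_card", type=str, default="google/mt5-small")
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--weight_decay", type=float, default=0.01)
    parser.add_argument("--learning_rate", type=float, default=5.6e-5)
    parser.add_argument("--save_total_limit", type=int, default=3)
    parser.add_argument("--num_train_epochs", type=int, default=3)
    parser.add_argument("--output_dir", type=str, default=".")
    return parser.parse_args()

if __name__ == "__main__":
    args = get_args()
    # Assumption: the parsed values feed a Hugging Face Seq2SeqTrainingArguments object.
    training_args = Seq2SeqTrainingArguments(
        output_dir=args.output_dir,
        learning_rate=args.learning_rate,
        per_device_train_batch_size=args.batch_size,
        per_device_eval_batch_size=args.batch_size,
        weight_decay=args.weight_decay,
        save_total_limit=args.save_total_limit,
        num_train_epochs=args.num_train_epochs,
        predict_with_generate=True,
    )
```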
Some outputs of the final model are shown below.
Note: "Original label" shows the original title from the dataset, and "Review" is the input to the model.