Hybrid Text Summarization through Reinforcement Learning

This repository contains the code for the "Deep Natural Language Processing" final project at Politecnico di Torino during the academic year 2021/2022.
We explored the hybrid neural summarization architecture proposed by Zmandar et al. [1], starting from the codebase of Chen [2]. In this approach, an extractor agent selects the most salient sentences and an abstractor agent paraphrases them. A reinforcement learning stage then rewards the produced output so that both agents are learned jointly. We explored this architecture across two different domains and with two different reinforcement learning policies. We observed that the Rouge-L score dropped on a different domain and when fewer summary sentences were available as a reference. Moreover, on a randomly selected subsample, we showed that despite a lower Rouge-L we obtain comparable results on BERTScore, which takes into account the context and the semantics of the produced summaries.

📜 Report
💻 Slide

Install dependencies

A full requirements.txt is provided, so you can create a virtual environment and install the dependencies with the commands below.

python -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
python setup.py develop

Metrics

Two main metrics are inspected in our study: Rouge-L and BERTScore. The former measures the longest common subsequence between the ground-truth text and the output generated by the model, whereas the latter considers both the overlap between hypothesis and reference and the context in which the words appear.
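
To make the Rouge-L definition concrete, here is a minimal Python sketch that computes the longest common subsequence on whitespace tokens and derives precision, recall and F1 from it. It is only an illustration: the function names are ours, and pyrouge applies its own tokenization and options, so the numbers will not necessarily match.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # prev[j] is the LCS length between the tokens of `a` seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, hypothesis):
    """Return (precision, recall, f1) of Rouge-L on lowercased whitespace tokens."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref or not hyp:
        return 0.0, 0.0, 0.0
    lcs = lcs_length(ref, hyp)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision, recall = lcs / len(hyp), lcs / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Five of the six tokens are shared in order, so all three values are 5/6.
print(rouge_l("the cat sat on the mat", "the cat lay on the mat"))
```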

However, computing BERTScore is very expensive, so a "small" subsample was extracted from the proposed datasets in order to evaluate the performance.

In order to compute the proposed scores we used pyrouge and bertscore.
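
The core scoring step of BERTScore can be sketched as a greedy cosine matching between token vectors. The example below is a simplified illustration in plain Python: the 2-dimensional vectors are toy values standing in for the contextual embeddings that the real metric takes from a BERT-like model, and `greedy_match_score` is a hypothetical name, not part of the bertscore package. Optional parts of the metric, such as idf weighting, are left out.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def greedy_match_score(ref_vectors, hyp_vectors):
    """Precision, recall and F1 from greedy cosine matching of token vectors."""
    # Recall: every reference token is matched to its most similar hypothesis token.
    recall = sum(
        max(cosine(r, h) for h in hyp_vectors) for r in ref_vectors
    ) / len(ref_vectors)
    # Precision: every hypothesis token is matched to its most similar reference token.
    precision = sum(
        max(cosine(h, r) for r in ref_vectors) for h in hyp_vectors
    ) / len(hyp_vectors)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy vectors (assumption): one row per token.
reference = [[1.0, 0.0], [0.0, 1.0]]
hypothesis = [[1.0, 0.0], [1.0, 1.0]]
precision, recall, f1 = greedy_match_score(reference, hypothesis)
print(precision, recall, f1)
```

Because every token is matched to its closest counterpart rather than to an identical string, a paraphrase can score well even when its Rouge-L is low.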

Datasets

As reported here, two main datasets are used for the proposed experiments, namely Financial Narrative Summarisation (FNS) and CNN/Daily Mail.

Due to the computational limitations of our machines, we used two different configurations for the experiments, called "small" and "large", reflecting the size of the extracted subsamples. The general information is reported below. For further details about the motivations behind these settings and the preprocessing, please read section X.Y of the report.

Note: for the same reason, the "large" CNN/Daily Mail dataset is a random subsample of the full dataset, which contains more than 300k news articles.

	FNS		CNN/Daily
Split	Large	Small	Large	Small
train	2,550	300	10,000	1,200
val	450	50	1,000	100
test	363	50	1,000	150

Because the FNS validation set and the CNN/Daily subsample were extracted at random, we report the ids used in each data split under the dataset folder.
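
For readers who want to draw their own subsample, the following sketch shows one way to obtain disjoint, reproducible splits of a given size. It is not the procedure used to build the id lists in the dataset folder: the seed, the id format and the function name are assumptions made for the example.

```python
import random


def sample_split_ids(all_ids, sizes, seed=42):
    """Draw disjoint random splits from all_ids.

    `sizes` maps a split name to the number of ids it should contain.
    """
    needed = sum(sizes.values())
    if needed > len(all_ids):
        raise ValueError("not enough documents for the requested splits")
    rng = random.Random(seed)  # local generator, so the draw is reproducible
    picked = rng.sample(sorted(all_ids), needed)
    splits, start = {}, 0
    for name, size in sizes.items():
        splits[name] = sorted(picked[start:start + size])
        start += size
    return splits


# Sizes of the "small" CNN/Daily configuration; the ids are hypothetical.
doc_ids = [f"doc_{i:05d}" for i in range(5000)]
splits = sample_split_ids(doc_ids, {"train": 1200, "val": 100, "test": 150})
print({name: len(ids) for name, ids in splits.items()})
```

Storing the resulting id lists next to the data, as done in this repository, keeps the experiments comparable across runs.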

Preprocessing

Once the data has been downloaded, it has to be preprocessed (see the report for further details). Then, the labels will be extracted according to the selected metric.

Split the data (only for FNS)

python ./src/split_train_val.py \
      --dataset_path=<path to the selected dataset> \
      --suffix=<trainval's foldername> \
      --reports_folder=<name of reports subfolder> \
      --summaries_folder=<name of summaries subfolder>

Preprocess the data

python ./scripts/preprocess_text[_dailycnn].py \
      --dataset_path=<path to the selected dataset> \
      --preprocessed_path=<output path of the preprocessed dataset> \
      --filtered_path=<output path of the filtered dataset; omit for CNN/Daily> \
      --reports_folder=<name of reports subfolder> \
      --summaries_folder=<name of summaries subfolder>

Extract labels

python ./scripts/extract_labels.py \
      --dataset_path=<path to the selected dataset> \
      --destination_path=<output path> \
      --dataset_split=<train/val/test (list)> \
      --reports_folder=<name of reports subfolder> \
      --summaries_folder=<name of summaries subfolder>
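
As an illustration of what label extraction could look like with Rouge-L as the selected metric, the sketch below matches each summary sentence to the not-yet-used document sentence with the highest Rouge-L recall. This greedy scheme is an assumption modelled on the approach of the support paper; scripts/extract_labels.py may differ in tokenization, tie-breaking and scoring, and the function names are ours.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(reference_tokens, candidate_tokens):
    """Share of the reference tokens covered by the longest common subsequence."""
    if not reference_tokens:
        return 0.0
    return lcs_length(reference_tokens, candidate_tokens) / len(reference_tokens)


def extract_labels(document_sentences, summary_sentences):
    """For each summary sentence, pick the best still-unused document sentence."""
    doc_tokens = [s.lower().split() for s in document_sentences]
    available = set(range(len(doc_tokens)))
    labels, scores = [], []
    for summary in summary_sentences:
        if not available:
            break  # more summary sentences than document sentences
        summary_tokens = summary.lower().split()
        best = max(
            sorted(available),
            key=lambda i: rouge_l_recall(summary_tokens, doc_tokens[i]),
        )
        labels.append(best)
        scores.append(rouge_l_recall(summary_tokens, doc_tokens[best]))
        available.remove(best)
    return labels, scores


# Hypothetical report and summary, already split into sentences.
document = [
    "Revenue grew by ten percent in 2021 .",
    "The board met twice during the year .",
    "Net profit fell because of higher costs .",
]
summary = ["Net profit fell on higher costs .", "Revenue grew ten percent ."]
labels, scores = extract_labels(document, summary)
print(labels, scores)
```

Replacing `rouge_l_recall` with another sentence-level scorer would give labels for a different metric.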

Training

The selected hyperparameters are reported in the Appendix of our paper. Moreover, here you can find all the pretrained models!

Train Gensim Word2Vec

python ./scripts/train_word2vec.py \
      --corpus_path=<path to fullcorpus.txt (inside the preprocessed and/or filtered folder)> \
      --destination_path=<output path of the w2v model> \
      --vector_size=<dimension of the embedding>

Train extractor

python ./scripts/train_extractor_ml.py \
      --data_path=<path of the extracted labels> \
      --path=<output path of the checkpoints and logs> \
      --w2v=<w2v filepath | extension .model> \
      --emb_dim=<dimension of the embedding>

Train abstractor

python ./scripts/train_abstractor.py \
      --data_path=<path of the extracted labels> \
      --path=<output path of the checkpoints and logs> \
      --w2v=<w2v filepath> \
      --emb_dim=<dimension of the embedding>

Train Reinforcement Learning

python ./scripts/train_full_rl.py \
      --data_path=<path of the extracted labels> \
      --ext_dir=<path (root) of the extractor checkpoints> \
      --abs_dir=<path (root) of the abstractor checkpoints> \
      --path=<path of the checkpoints and output labels> \
      --ckpt_freq=<number of batches between two checkpoints> \
      --batch=<batch size> \
      --n_sentences=<maximum number of sentences per file> \
      --reward=<bert/rouge>
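
To give an intuition of how a reward can drive sentence selection, here is a toy policy-gradient sketch in plain Python. It uses vanilla REINFORCE with a moving-average baseline on a single decision, which is much simpler than the actual training script and is not meant to reproduce it. The reward values, learning rate, number of steps and function names are all assumptions chosen for the example.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_selector(rewards, steps=3000, lr=0.1, seed=0):
    """Learn which sentence to pick from per-sentence rewards with REINFORCE."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)  # one logit per candidate sentence
    baseline = 0.0                 # moving average of the observed rewards
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        advantage = rewards[action] - baseline
        # d log pi(action) / d logit_i = 1[i == action] - probs[i]
        for i in range(len(logits)):
            indicator = 1.0 if i == action else 0.0
            logits[i] += lr * advantage * (indicator - probs[i])
        baseline = 0.9 * baseline + 0.1 * rewards[action]
    return softmax(logits)


# Hypothetical rewards, e.g. the Rouge-L or BERTScore of the summary obtained
# when the extractor selects sentence 0, 1 or 2.
probs = train_selector([0.1, 0.8, 0.3])
print(probs)
```

Switching between the two values of the --reward option corresponds, in this toy setting, to changing how the numbers in the rewards list are computed.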

Inference and evaluation

Run inference on the test set and evaluate the results according to both Rouge-L and BERTScore.

python ./src/inference.py \
      --output_path=<path of the model's outputs> \
      --model_dir=<path (root) of the RL checkpoints> \
      --data_path=<path of the extracted labels> \
      --n_sentences=<maximum number of sentences per file>

Results

The following table summarizes the Rouge scores obtained on the entire datasets (large).

FNS		CNN/Daily
Only Extractor	Full pipeline	Only Extractor	Full pipeline
0.36	0.38	0.20	0.23

The next table reports the cross-evaluation on the dataset samples (small), performed with different reinforcement rewards and extracted labels.

	FNS		CNN/Daily
Extracted labels & RL Policy	Rouge-L	BERT score	Rouge-L	BERT score
Rouge-L	0.27	0.80	0.09	0.78
BERT score	0.26	0.81	0.10	0.78

Pretrained models

In this shared folder you can find the pretrained models used for all the main experiments reported above and in our paper. In particular, the models cover combinations of datasets (fns, cnndaily) and metrics used for the extracted labels and reinforcement learning policies.

References

The main references followed for the proposed project are:

Main paper: Joint abstractive and extractive method for long financial document.
Support paper: Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting.
Initial codebase: repository.
