
README: XSUM Training Process

Overview

This document outlines the process of fine-tuning a T5-small model for summarization in four rounds: three on XSUM-derived datasets and a fourth on a long-summary dataset (CRAFT-Summarization). Each round builds on the output model from the previous round. Due to runtime restrictions, training was spread across multiple Google Colab accounts, which requires adjusting model paths between rounds.

Dataset

  • Type: XSUM
  • Purpose: Summarization task
  • Characteristics: Concise, high-quality single-sentence summaries of news articles

Model

  • Type: T5-small
  • Purpose: A lightweight and efficient model for text-to-text tasks, such as summarization, translation, Q&A, and text classification.
  • Characteristics:
    • Developed by Google as part of the T5 (Text-to-Text Transfer Transformer) framework.
    • Pre-trained on the Colossal Clean Crawled Corpus (C4).
    • Converts all NLP tasks into a text generation problem.
    • Designed for environments with limited computational resources.
  • Parameters (checked programmatically in the sketch after this list):
    • Total parameters: ~60 million
    • Encoder layers: 6
    • Decoder layers: 6
    • Hidden size: 512
    • Attention heads: 8
    • Feedforward network size: 2048
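
The parameter figures above can be checked programmatically. The sketch below is illustrative rather than part of this repository's notebooks; it loads t5-small from the Hugging Face Hub and prints the configuration fields that correspond to the bullets above.

```python
# Illustrative check of the t5-small configuration (attribute names follow
# the transformers T5Config class).
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")
config = model.config

total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params / 1e6:.1f}M")                                  # ~60M
print(f"Encoder / decoder layers: {config.num_layers} / {config.num_decoder_layers}")  # 6 / 6
print(f"Hidden size: {config.d_model}")                                                # 512
print(f"Attention heads: {config.num_heads}")                                          # 8
print(f"Feedforward network size: {config.d_ff}")                                      # 2048
```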

Training Process

Round 1

  1. Input Model: Pre-trained T5 model
  2. Dataset: 'iohadrubin/mini_xsum'
  3. Output Model Path: round1_output_model (save this model for subsequent rounds)
  4. Colab Account: Account 1
  5. Note: Ensure that the output model is downloaded and backed up for use in the next round. An illustrative fine-tuning sketch follows this list.
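
The following is a minimal fine-tuning sketch for Round 1, not the original notebook code: the dataset column names ("document" and "summary") and the hyperparameters are assumptions and must be matched to the actual 'iohadrubin/mini_xsum' schema.

```python
# Hypothetical Round 1 fine-tuning sketch (column names and hyperparameters are assumptions).
from datasets import load_dataset
from transformers import (AutoTokenizer, DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments, T5ForConditionalGeneration)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")   # pre-trained input model
dataset = load_dataset("iohadrubin/mini_xsum")

def preprocess(batch):
    # T5 uses a task prefix for summarization
    inputs = tokenizer(["summarize: " + doc for doc in batch["document"]],
                       max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=64, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="round1_output_model",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=3e-4,
    save_total_limit=1,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# Back these files up before the Colab runtime resets.
trainer.save_model("round1_output_model")
tokenizer.save_pretrained("round1_output_model")
```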

Round 2

  1. Input Model: round1_output_model
  2. Dataset: 'woshityj/xsum_dataset'
  3. Output Model Path: round2_output_model (save this model for subsequent rounds)
  4. Colab Account: Account 2
  5. Adjustments: Update the path to round1_output_model to match the new Colab environment before starting training. A short model-loading sketch follows this list.
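
A short sketch of the path adjustment for Round 2 is shown below. The path is an example only; use whatever location the uploaded round1_output_model actually lands in.

```python
# Load the Round 1 output from its new location in the second Colab account.
from transformers import AutoTokenizer, T5ForConditionalGeneration

ROUND1_PATH = "/content/round1_output_model"   # example path; adjust to the new environment

tokenizer = AutoTokenizer.from_pretrained(ROUND1_PATH)
model = T5ForConditionalGeneration.from_pretrained(ROUND1_PATH)

# Continue with the same training loop as Round 1, now on 'woshityj/xsum_dataset',
# and save the result to round2_output_model.
```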

Round 3

  1. Input Model: round2_output_model
  2. Dataset: 'Kamaljp/xsum_3000'
  3. Output Model Path: round3_output_model (save this model for Round 4)
  4. Colab Account: Account 3
  5. Adjustments: Update the path to the round2_output_model to match the new Colab environment before starting training.

Dataset (Round 4)

  • Type: CRAFT-Summarization
  • Purpose: Summarization task
  • Characteristics: High-quality, longer-form summaries of articles

Round 4

  1. Input Model: round3_output_model
  2. Dataset: 'ingoziegler/CRAFT-Summarization'
  3. Output Model Path: round4_output_model (final trained model)
  4. Colab Account: Account 4
  5. Adjustments: Update the path to round3_output_model to match the new Colab environment before starting training. A preprocessing sketch for longer summary targets follows this list.
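
Because Round 4 targets longer summaries, the label length budget used in earlier rounds should be raised. The sketch below is an assumption: the exact limits and the CRAFT-Summarization column names ("document" and "summary") must be checked against the dataset.

```python
# Hypothetical preprocessing for longer summary targets in Round 4.
from transformers import AutoTokenizer

ROUND3_PATH = "/content/round3_output_model"   # example path; adjust to the new environment
tokenizer = AutoTokenizer.from_pretrained(ROUND3_PATH)

MAX_INPUT_LENGTH = 512
MAX_TARGET_LENGTH = 256   # raised from the short budget used for XSUM-style one-liners

def preprocess_long(batch):
    inputs = tokenizer(["summarize: " + doc for doc in batch["document"]],
                       max_length=MAX_INPUT_LENGTH, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=MAX_TARGET_LENGTH,
                       truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs
```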

Validation and Testing

Once the final model (round4_output_model) is trained, it can be validated and tested on new article data for summarization. Ensure the model performs well on unseen examples, and adjust hyperparameters as necessary.
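
One way to do this validation is to score the final model with ROUGE on articles it has not seen. The sketch below is illustrative; it assumes the `evaluate` and `rouge_score` packages are installed and that you supply your own test articles and reference summaries.

```python
# Illustrative ROUGE evaluation of the final model on unseen examples.
import evaluate
from transformers import AutoTokenizer, T5ForConditionalGeneration

MODEL_PATH = "round4_output_model"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = T5ForConditionalGeneration.from_pretrained(MODEL_PATH)
rouge = evaluate.load("rouge")   # pip install evaluate rouge_score

def summarize(text: str) -> str:
    ids = tokenizer("summarize: " + text, return_tensors="pt",
                    max_length=512, truncation=True).input_ids
    output = model.generate(ids, max_new_tokens=64, num_beams=4)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Replace these placeholders with articles and reference summaries the model has not seen.
articles = ["<unseen article text>"]
references = ["<its reference summary>"]

predictions = [summarize(article) for article in articles]
print(rouge.compute(predictions=predictions, references=references))
```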

Model Demo

After training the final model, you can use it to summarize new articles.
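
A minimal demo sketch using the transformers summarization pipeline with the final model (the generation settings are examples, not tuned values):

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="round4_output_model")

article = "Paste the article you want to summarize here."
result = summarizer(article, max_length=80, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```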

Key Considerations

  • Model Path Adjustments: As training spanned multiple Colab accounts, the output model paths from each round must be explicitly updated in the subsequent account.
    • Example: Download round1_output_model from Account 1 and upload it to Account 2 before initiating Round 2 (a transfer sketch follows this list).
  • Runtime Restrictions: Ensure sufficient runtime and GPU availability for each training session.
  • Consistency: Verify that data formatting and preprocessing stay consistent across rounds, since each round fine-tunes on a different dataset.
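
One way to move a round's output between accounts (an assumption, not a script from this repository) is to archive the model directory and download it, then upload and unpack it in the next account:

```python
# Runs inside the first Colab account; google.colab is only available in Colab runtimes.
import shutil
from google.colab import files

shutil.make_archive("round1_output_model", "zip", "round1_output_model")
files.download("round1_output_model.zip")

# In the next account, after uploading the archive through the file browser:
# shutil.unpack_archive("round1_output_model.zip", "round1_output_model")
```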

Final Output

  • The final model (round4_output_model) is trained through four successive rounds of fine-tuning: three on XSUM-derived datasets and one on CRAFT-Summarization. It can now be used for high-quality summarization tasks.

Notes

  • Always verify that the correct model is loaded at the start of each round.
  • Maintain proper naming conventions and version control for models to avoid confusion.
  • For further fine-tuning or deployment, ensure the final model is accessible in a centralized location.
  • An extra training round (Round 4) was added to improve the model's ability to handle longer summaries.
  • All training could be done on a single account. It was done here on two accounts, but a single account works with a Colab subscription or by waiting for Google Colab's free compute units to become available again.

Datasets and Model

  • The pre-trained model and all datasets were obtained from huggingface.co (the Hugging Face Hub).

Contact

For issues or questions related to this training process, please reach out to the respective Colab account holders.