This document outlines the process of training a T5-small model for summarization on XSUM-based datasets in three distinct rounds, followed by an additional fourth round for longer summaries. Each round builds upon the output model from the previous round. Due to Colab runtime restrictions, training was conducted across multiple Google Colab accounts, which requires adjusting model paths between rounds.
## Dataset: XSUM
- Type: XSUM
- Purpose: Summarization task
- Characteristics: High-quality, concise summaries of news articles
## Model: T5-small
- Type: T5-small
- Purpose: A lightweight and efficient model for text-to-text tasks, such as summarization, translation, question answering, and text classification.
- Characteristics:
  - Developed by Google as part of the T5 (Text-to-Text Transfer Transformer) framework.
  - Pre-trained on the Colossal Clean Crawled Corpus (C4).
  - Converts all NLP tasks into a text generation problem.
  - Designed for environments with limited computational resources.
- Parameters (see the verification sketch below):
  - Total parameters: ~60 million
  - Encoder layers: 6
  - Decoder layers: 6
  - Hidden size: 512
  - Attention heads: 8
  - Feed-forward network size: 2048
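
The parameter count can be checked directly. Below is a minimal sketch, assuming the Hugging Face `transformers` library is used (the model and datasets are hosted on huggingface.co), that loads the pre-trained checkpoint and prints its size.

```python
# Minimal sketch: load the pre-trained T5-small checkpoint and confirm that
# it has roughly 60M parameters. Assumes transformers + sentencepiece are
# installed (pip install transformers sentencepiece).
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")

total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params / 1e6:.1f}M")  # ~60M
```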
## Round 1
- Input Model: Pre-trained T5-small model
- Dataset: 'iohadrubin/mini_xsum'
- Output Model Path: round1_output_model (save this model for subsequent rounds)
- Colab Account: Account 1
- Note: Ensure that the output model is downloaded and backed up for use in the next round. A hedged training sketch follows this list.
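
The exact training script is not reproduced here; the sketch below shows one way Round 1 could be run with `Seq2SeqTrainer`. The column names (`document`, `summary`), the `train` split, and the hyperparameters are assumptions based on the original XSUM format rather than the settings actually used, so adjust them to the `iohadrubin/mini_xsum` dataset card.

```python
# Hedged sketch of Round 1 fine-tuning with Seq2SeqTrainer.
# Assumptions: the dataset has a "train" split with "document"/"summary"
# columns (like the original XSUM); hyperparameters are illustrative.
from datasets import load_dataset
from transformers import (
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

dataset = load_dataset("iohadrubin/mini_xsum")

def preprocess(batch):
    # T5 uses a task prefix; inputs and targets are tokenized separately.
    model_inputs = tokenizer(
        ["summarize: " + doc for doc in batch["document"]],
        max_length=512,
        truncation=True,
    )
    labels = tokenizer(text_target=batch["summary"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(
    preprocess, batched=True, remove_columns=dataset["train"].column_names
)

args = Seq2SeqTrainingArguments(
    output_dir="round1_output_model",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=3e-4,
    logging_steps=100,
    save_total_limit=1,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# Save and back up this directory for Round 2 (see the notes below).
trainer.save_model("round1_output_model")
tokenizer.save_pretrained("round1_output_model")
```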
## Round 2
- Input Model: round1_output_model
- Dataset: 'woshityj/xsum_dataset'
- Output Model Path: round2_output_model (save this model for subsequent rounds)
- Colab Account: Account 2
- Adjustments: Update the path to round1_output_model to match the new Colab environment before starting training (see the sketch after this list).
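
For the path adjustment, a short sketch, assuming the Round 1 output was uploaded to the new runtime at `/content/round1_output_model` (the location is illustrative):

```python
# Sketch of the Round 2 path adjustment: load the uploaded Round 1 output
# instead of the "t5-small" Hub checkpoint. The directory below is an
# assumed upload location; change it to wherever the model actually sits.
from transformers import T5ForConditionalGeneration, T5Tokenizer

ROUND1_MODEL_PATH = "/content/round1_output_model"

model = T5ForConditionalGeneration.from_pretrained(ROUND1_MODEL_PATH)
tokenizer = T5Tokenizer.from_pretrained(ROUND1_MODEL_PATH)
# ...then fine-tune on 'woshityj/xsum_dataset' as in Round 1 and save the
# result to round2_output_model. Rounds 3 and 4 follow the same pattern.
```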
## Round 3
- Input Model: round2_output_model
- Dataset: 'Kamaljp/xsum_3000'
- Output Model Path: round3_output_model (save this model for the next round)
- Colab Account: Account 3
- Adjustments: Update the path to round2_output_model to match the new Colab environment before starting training.
## Dataset: CRAFT-Summarization
- Type: CRAFT-Summarization
- Purpose: Summarization task
- Characteristics: High-quality, long summaries for articles

## Round 4
- Input Model: round3_output_model
- Dataset: 'ingoziegler/CRAFT-Summarization'
- Output Model Path: round4_output_model (final trained model)
- Colab Account: Account 4
- Adjustments: Update the path to round3_output_model to match the new Colab environment before starting training (a preprocessing sketch follows this list).
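
Because CRAFT-Summarization targets are longer than XSUM's, the target tokenization length likely needs to be raised for this round. The sketch below illustrates the idea; the column names (`text`, `summary`) and the length limits are assumptions, so check the dataset card before using them.

```python
# Hedged sketch of the Round 4 preprocessing change for longer summaries.
# The column names ("text", "summary") and length limits are assumptions;
# check the ingoziegler/CRAFT-Summarization dataset card and adjust.
from datasets import load_dataset
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("/content/round3_output_model")  # adjust path
dataset = load_dataset("ingoziegler/CRAFT-Summarization")

def preprocess(batch):
    model_inputs = tokenizer(
        ["summarize: " + doc for doc in batch["text"]],
        max_length=512,
        truncation=True,
    )
    # Raise the target length well beyond XSUM's one-sentence summaries.
    labels = tokenizer(text_target=batch["summary"], max_length=256, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```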
## Validation and Usage
- Once the final model (round4_output_model) is trained, it can be validated and tested on new article data for summarization. Ensure the model performs well on unseen examples, and adjust parameters as necessary.
- After training the final model, you can use it to summarize new articles, as shown in the sketch below.
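
A minimal usage sketch, assuming the final model directory is available locally; the generation settings here are illustrative rather than tuned values.

```python
# Minimal usage sketch: summarize a new article with the final model.
# Generation settings below are illustrative, not tuned values.
from transformers import T5ForConditionalGeneration, T5Tokenizer

MODEL_PATH = "round4_output_model"  # adjust to wherever the final model is stored
model = T5ForConditionalGeneration.from_pretrained(MODEL_PATH)
tokenizer = T5Tokenizer.from_pretrained(MODEL_PATH)

article = "Replace this string with the article you want to summarize."
inputs = tokenizer(
    "summarize: " + article, return_tensors="pt", max_length=512, truncation=True
)
summary_ids = model.generate(
    **inputs, max_length=128, num_beams=4, early_stopping=True
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```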
## Notes
- Model Path Adjustments: As training spanned multiple Colab accounts, the output model paths from each round must be explicitly updated in the subsequent account (a transfer sketch follows these notes).
  - Example: Download round1_output_model from Account 1 and upload it to Account 2 before initiating Round 2.
  - Example: Download round2_output_model from Account 2 and upload it to Account 3 before initiating Round 3.
- Runtime Restrictions: Ensure sufficient runtime and GPU availability for each training session.
- Consistency: Verify that the dataset format and preprocessing remain consistent across rounds to ensure uniformity in training.
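
One way to move a round's output between accounts (an illustration, not necessarily the exact method used) is to archive the saved model directory, download it, and re-upload it in the next runtime:

```python
# Illustrative transfer between accounts: archive the model directory,
# download it from Account 1, then upload and unpack it in Account 2.
import shutil
from google.colab import files

# In Account 1, after Round 1 finishes:
shutil.make_archive("round1_output_model", "zip", "round1_output_model")
files.download("round1_output_model.zip")

# In Account 2, after uploading round1_output_model.zip via the Files panel:
shutil.unpack_archive("round1_output_model.zip", "round1_output_model")
```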
## Conclusion
- The final model (round4_output_model) is trained through four successive rounds of fine-tuning: three on XSUM-based datasets and one on CRAFT-Summarization for longer summaries. It can now be used for high-quality summarization tasks.
- Always verify that the correct model is loaded at the start of each round.
- Maintain proper naming conventions and version control for models to avoid confusion.
- For further fine-tuning or deployment, ensure the final model is accessible in a centralized location (one option is sketched below).
- An extra training round (Round 4) was added to improve the model's ability to handle longer summaries.
- All training could be done on a single account; I used two accounts, but you can do it on one account with a Colab subscription or by waiting for Google Colab's free compute units to become available again.
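
As a centralized-location option, the final model could be pushed to the Hugging Face Hub so that any account can load it by name. The repository id below is a placeholder, and a write token is required (for example via `notebook_login()`):

```python
# Hedged sketch: publish the final model to the Hugging Face Hub so it is
# accessible from any Colab account. The repo id is a placeholder.
from huggingface_hub import notebook_login
from transformers import T5ForConditionalGeneration, T5Tokenizer

notebook_login()  # paste a write-enabled Hugging Face token

model = T5ForConditionalGeneration.from_pretrained("round4_output_model")
tokenizer = T5Tokenizer.from_pretrained("round4_output_model")

model.push_to_hub("your-username/t5-small-xsum-multiround")      # placeholder repo id
tokenizer.push_to_hub("your-username/t5-small-xsum-multiround")
```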
## Datasets and model
- The pre-trained model and the datasets used were obtained from huggingface.co (the Hugging Face Hub).
For issues or questions related to this training process, please reach out to the respective Colab account holders.