- The original task, along with the dataset, can be found on the Kaggle website. See here.
- We will take a Text Generation approach to the problem, using causal language modelling LLMs; note that a Seq2Seq approach could also have been used. A minimal generation sketch follows this list.
- We will fine-tune a pre-existing model such as GPT-2 or distil-GPT-2 using an NVIDIA T4 GPU.
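For context, the snippet below is a minimal sketch of what causal language modelling text generation looks like with the base distil-GPT-2 model via the Hugging Face `transformers` pipeline; the prompt and sampling settings are illustrative assumptions, not taken from the notebooks.

```python
# Minimal sketch: text generation with the base (not yet fine-tuned) distilgpt2
# causal LM, via the Hugging Face transformers pipeline.
from transformers import pipeline, set_seed

set_seed(42)  # for reproducible sampling
generator = pipeline("text-generation", model="distilgpt2")

# Prompt and sampling settings are illustrative only.
out = generator("I remember when we", max_new_tokens=40, do_sample=True, top_p=0.95)
print(out[0]["generated_text"])
```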
The repository contains three notebooks:

- `lyrics_taylor_swift_preprocess_data_and_finetune_LLM.ipynb`: used to run data pre-processing and fine-tune the pre-trained LLM (distil-GPT-2).
- `lyrics_tayor_swift_generate_from_saved_LLM.ipynb`: used to generate lyrics from the saved fine-tuned model (see the sketch after this list).
- `Exploratory_Analysis_Taylor_Swift_Albums_Lyrics.ipynb`: initial exploration of the dataset; also contains early fine-tuning of a model purely for experimentation purposes.
Note: it is fine to focus on the first two notebooks; the last is only included in case the reader needs to explore the dataset in more detail.
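To give a feel for the generation notebook, here is a hedged sketch of loading a saved fine-tuned checkpoint and sampling lyrics from it; the checkpoint directory, prompt, and sampling parameters are assumptions for illustration rather than the notebook's actual values.

```python
# Sketch only: generate lyrics from a saved fine-tuned causal LM.
# The checkpoint directory name below is hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint_dir = "./finetuned-distilgpt2-taylor-swift"  # assumed save location
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
model = AutoModelForCausalLM.from_pretrained(checkpoint_dir)
model.eval()

prompt = "We were dancing in the"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=60,
        do_sample=True,                       # sampling rather than greedy decoding, for variety
        temperature=0.9,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```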
The scripts should be used in order: first apply the pre-processing (`data_process.py`); then tokenize the dataset for use in causal language modelling (`data_tokenization.py`); then train, i.e. fine-tune, the pre-trained LLM on the Taylor Swift dataset (`train_model.py`).
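To make that ordering concrete, the sketch below walks through the first two stages under some assumptions: the raw Kaggle CSV, its `track_title`/`lyric` columns, and the output file name are hypothetical placeholders, not necessarily what `data_process.py` and `data_tokenization.py` actually use.

```python
import json
import pandas as pd
from datasets import Dataset
from transformers import AutoTokenizer

# --- Stage 1: pre-process raw lyrics and save them to a .json file (cf. data_process.py) ---
df = pd.read_csv("taylor_swift_lyrics.csv")                        # hypothetical input file
songs = df.groupby("track_title")["lyric"].apply("\n".join).tolist()
with open("lyrics.json", "w") as f:
    json.dump({"text": songs}, f)

# --- Stage 2: tokenize for causal language modelling (cf. data_tokenization.py) ---
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")

dataset = Dataset.from_dict({"text": songs}).train_test_split(test_size=0.1)

def tokenize(batch):
    # Truncate each song; padding and label creation are left to the data collator at training time.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
train_ds, eval_ds = tokenized["train"], tokenized["test"]
```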
In more detail:

- `data_process.py`: contains the data pre-processing needed and saves the result to a .json file.
- `data_tokenization.py`: defines the tokenizer and the Hugging Face datasets (train and eval) to be used.
- `train_model.py`: defines the code (using the Hugging Face `transformers` package) used to fine-tune the LLM on the datasets defined above; a sketch of this stage follows below.
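The following hedged sketch illustrates that last stage; it assumes the tokenized `train_ds`/`eval_ds` from the previous sketch, and the hyperparameters, output directory, and use of the `Trainer` API are illustrative assumptions rather than the exact contents of `train_model.py`.

```python
# Sketch of fine-tuning distil-GPT-2 as a causal LM with the Trainer API.
# Hyperparameters and paths are illustrative assumptions.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token            # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# mlm=False means causal language modelling; the collator also builds the labels.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="./finetuned-distilgpt2-taylor-swift",  # hypothetical save location
    num_train_epochs=3,
    per_device_train_batch_size=8,
    fp16=True,                                         # mixed precision runs comfortably on a T4 GPU
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,   # tokenized datasets from the earlier sketch
    eval_dataset=eval_ds,
    data_collator=collator,
)

trainer.train()
trainer.evaluate()
trainer.save_model(args.output_dir)
```

The saved output directory is then what the generation sketch above would load from.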