This repository contains the fourth assignment for the Applied Natural Language Processing course.
The assignment focuses on fine-tuning pre-trained language models (RoBERTa, BART) for the ComVE shared task from SemEval-2020.
The work is divided into three subtasks, each implemented in a separate Jupyter Notebook. Graduate students are required to complete all three subtasks.
The main goals of this assignment are to:
- Combine and pre-process input texts for text matching and multiple-choice problems.
- Fine-tune pre-trained language models for classification and sequence-to-sequence tasks.
- Evaluate model performance with accuracy, BLEU, and ROUGE metrics.
- Gain hands-on experience with Hugging Face Transformers, Datasets, and the Trainer API.
Graduate students additionally complete SubTask C, implementing an end-to-end sequence-to-sequence solution.
```
.
├── Pretrained_LM_Subtask_A.ipynb         # SubTask A: Text Matching (nonsensical statement detection)
├── Pretrained_LM_Subtask_B.ipynb         # SubTask B: Multiple Choice (reason classification)
├── Pretrained_LM_Subtask_C.ipynb         # SubTask C: Seq2Seq (reason generation) – graduate task
├── Assessment 4_ requirements.txt        # Dependencies for Linux/Windows
├── Assignment 4_ requirements-macos.txt  # Dependencies for macOS
├── .gitignore
└── README.md
```
```
python3 -m venv venv
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows
```

For Linux/Windows:

```
pip install -r "Assessment 4_ requirements.txt"
```

For macOS:

```
pip install -r "Assignment 4_ requirements-macos.txt"
```

The assignment uses the SemEval-2020 Task 4 Commonsense Validation and Explanation (ComVE) dataset.
- Download/unzip the dataset:

```
unzip ALL\ data.zip -d SemEval2020-Task4-Data
```

This will create a folder `SemEval2020-Task4-Data/` containing:

- Training Data (`subtaskA_data_all.csv`, `subtaskB_data_all.csv`, `subtaskC_data_all.csv`, etc.)
- Development Data
- Test Data
- Gold answers

Verify the structure:

```
SemEval2020-Task4-Data/
├── ALL data/
│   ├── Training Data/
│   ├── Dev Data/
│   └── Test Data/
```
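Once unzipped, the training CSVs can be read with the standard library. The column names below (`id`, `sent0`, `sent1`) are assumptions based on the ComVE release and should be checked against the actual file headers; the sentences here are made-up stand-ins:

```python
import csv
import io

# In-memory stand-in mimicking subtaskA_data_all.csv.
# Column names (id, sent0, sent1) are assumptions -- verify against the real file.
raw = io.StringIO(
    "id,sent0,sent1\n"
    "0,He put an elephant into the fridge.,He put a turkey into the fridge.\n"
    "1,The mouse chased the cat.,The cat chased the mouse.\n"
)

rows = list(csv.DictReader(raw))

# Each row holds a pair of similar statements; the model must decide which
# is nonsensical (gold labels live in a separate answers file).
pairs = [(r["sent0"], r["sent1"]) for r in rows]
print(len(pairs))  # 2
```

For the real data, replace the `io.StringIO` object with `open("SemEval2020-Task4-Data/.../subtaskA_data_all.csv")`.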
Each notebook can be run independently:
- SubTask A (`Pretrained_LM_Subtask_A.ipynb`): Text matching using RoBERTa via `AutoModelForSequenceClassification`. Task: Given two similar statements, identify the nonsensical one. Evaluation: Accuracy.
- SubTask B (`Pretrained_LM_Subtask_B.ipynb`): Multiple-choice classification using RoBERTa via `AutoModelForMultipleChoice`. Task: Given a nonsensical statement and three candidate reasons, classify which reason explains the statement. Evaluation: Accuracy.
- SubTask C (`Pretrained_LM_Subtask_C.ipynb`): Sequence-to-sequence explanation generation using BART via `AutoModelForSeq2SeqLM`. Task: Given a nonsensical statement, generate a valid explanation. Evaluation: BLEU and ROUGE scores.
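For SubTask B, `AutoModelForMultipleChoice` expects each example expanded into one (statement, candidate reason) pair per choice before tokenization. The statement and reasons below are made-up illustrations, not actual dataset rows:

```python
# Sketch of the input layout for a multiple-choice model: one text pair
# per candidate reason. After tokenization these become a batch of shape
# (num_choices, seq_len) per example.
statement = "He put an elephant into the fridge."
reasons = [
    "An elephant is much bigger than a fridge.",  # plausible explanation
    "Elephants are usually gray.",                # irrelevant fact
    "A fridge keeps food cold.",                  # irrelevant fact
]

choice_pairs = [(statement, reason) for reason in reasons]
print(len(choice_pairs))  # 3
```

In the notebook, these pairs would be passed to the tokenizer as two parallel lists (first sentences, second sentences) so that each choice shares the same statement.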
For testing and grading:
```
pytest test.py          # Undergraduate tasks (A and B)
pytest test_grads.py    # Graduate task (C)
```

Expected results:

- SubTask A: Accuracy (expected ~0.49 on the reduced dataset, ~0.93 with full training).
- SubTask B: Accuracy (expected ~0.51 reduced, ~0.93 full).
- SubTask C: BLEU and ROUGE metrics (expected BLEU ~0.22, ROUGE ~0.46).
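In the notebooks, BLEU and ROUGE are computed with library metrics; accuracy for SubTasks A and B reduces to a simple fraction of correct predictions, as in this minimal sketch:

```python
def accuracy(preds, golds):
    """Fraction of predictions that exactly match the gold labels."""
    assert len(preds) == len(golds), "prediction/gold length mismatch"
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

# Three of four predictions match the gold labels.
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```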
- For local training, set:

```
shrink_dataset = True
base_model = True
colab = False
```

- For full-scale experiments (recommended), use Google Colab with GPU/TPU:

```
shrink_dataset = False
base_model = False
colab = True
```

- Set `shrink_dataset = True` for quick debugging, `False` for full training.
- A GPU/TPU (e.g., Google Colab) is recommended for full fine-tuning runs.
- `ALL data.zip` is excluded from version control for space and licensing reasons; see the Dataset Setup section above.