6 changes: 3 additions & 3 deletions Class 8 Homework.ipynb
@@ -14,7 +14,7 @@
"\n",
"## Learning Objectives\n",
"\n",
- "* Generate abstractive summaries of academic documents using LLaMA 3 (7B).\n",
+ "* Generate abstractive summaries of academic documents using LLaMA 3 (8B) Instruct.\n",
"* Collect two candidate summaries per paper and have annotators select the better summary.\n",
"* Prepare the dataset of summary pairs and preference labels for reward model training.\n",
"* Train a reward model (e.g., DeBERTa-v3) on the collected preference data.\n",
@@ -23,7 +23,7 @@
"## Project Design\n",
"\n",
"* **Data Collection:** Select 10 academic papers (including both text and figures) from arXiv or recent NLP conference proceedings.\n",
- "* **Summary Generation:** For each paper, use the LLaMA 3 (7B) model to generate *two* different summaries. Vary the prompting strategy or sampling parameters to produce diverse outputs.\n",
+ "* **Summary Generation:** For each paper, use the LLaMA 3 (8B) model to generate *two* different summaries. Vary the prompting strategy or sampling parameters to produce diverse outputs.\n",
"* **Human Annotation:** Have one or two human annotators compare each pair of summaries for a paper and choose the better one (e.g. more informative, coherent, factually consistent, etc.). Record which summary is preferred.\n",
"* **Data Formatting:** Create a dataset (e.g. in JSONL format) of summary pairs and preference labels. Each entry should include the two summary texts and which one was chosen (for example, fields `chosen` and `rejected` as required by reward modeling tools).\n",
"* **Reward Model Training:** Fine-tune a reward model (such as DeBERTa-v3) on this preference data. Use the chosen/rejected summary pairs so the model learns to assign higher scores to the preferred summaries.\n",
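The data-formatting step above can be sketched in plain Python. This is a minimal illustration, not part of the diff: the `annotations` records, paper IDs, and summary strings are all hypothetical placeholders, and the `chosen`/`rejected` field names follow the convention the bullet mentions for reward modeling tools.

```python
import json

# Hypothetical annotation records: for each paper, the two candidate
# summaries and the index (0 or 1) the annotator preferred.
annotations = [
    {"paper_id": "paper-01", "summaries": ["Summary A ...", "Summary B ..."], "preferred": 1},
    {"paper_id": "paper-02", "summaries": ["Summary C ...", "Summary D ..."], "preferred": 0},
]

# Write one JSONL entry per pair, with the preferred summary under
# "chosen" and the other under "rejected".
with open("preferences.jsonl", "w") as f:
    for rec in annotations:
        chosen = rec["summaries"][rec["preferred"]]
        rejected = rec["summaries"][1 - rec["preferred"]]
        entry = {"paper_id": rec["paper_id"], "chosen": chosen, "rejected": rejected}
        f.write(json.dumps(entry) + "\n")
```

A file in this shape can then be loaded with `datasets.load_dataset("json", data_files="preferences.jsonl")` for the reward-training step.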
@@ -143,7 +143,7 @@
"* Install required Python libraries: `transformers`, `datasets`, `evaluate`, `trl` (Hugging Face TRL), and `accelerate`.\n",
"* (Optional) Install `peft` if you want to use parameter-efficient fine-tuning for the reward model.\n",
"* Ensure you have GPU access for model training (e.g., use Google Colab Pro, AWS, or a local GPU).\n",
- "* Download or load the LLaMA 3 (7B) model checkpoint and a DeBERTa-v3 checkpoint (for example, via Hugging Face Hub).\n",
+ "* Download or load the LLaMA 3 (8B) model checkpoint and a DeBERTa-v3 checkpoint (for example, via Hugging Face Hub).\n",
"\n",
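Once the environment is set up, the reward model from the project design is trained on a pairwise ranking objective: it should score the chosen summary above the rejected one. A minimal sketch of that loss in plain Python (the function name and the example scores are illustrative, not part of the assignment):

```python
import math

def pairwise_reward_loss(score_chosen, score_rejected):
    """Bradley-Terry style pairwise loss: -log(sigmoid(r_chosen - r_rejected)).
    The loss is small when the model scores the chosen summary higher,
    and large when the ranking is wrong."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Correct ranking (chosen scored higher) -> small loss.
print(round(pairwise_reward_loss(2.0, 0.0), 4))  # 0.1269
# Wrong ranking (rejected scored higher) -> large loss.
print(round(pairwise_reward_loss(0.0, 2.0), 4))  # 2.1269
```

Libraries such as Hugging Face TRL optimize this kind of objective internally when given `chosen`/`rejected` pairs, so in practice you would not implement it by hand.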
"## Deliverables\n",
"\n",