6 changes: 3 additions & 3 deletions Class 8 Homework.ipynb
@@ -14,7 +14,7 @@
"\n",
"## Learning Objectives\n",
"\n",
- "* Generate abstractive summaries of academic documents using LLaMA 3 (7B).\n",
+ "* Generate abstractive summaries of academic documents using LLaMA 3 (8B) Instruct.\n",
"* Collect two candidate summaries per paper and have annotators select the better summary.\n",
"* Prepare the dataset of summary pairs and preference labels for reward model training.\n",
"* Train a reward model (e.g., DeBERTa-v3) on the collected preference data.\n",
@@ -23,7 +23,7 @@
"## Project Design\n",
"\n",
"* **Data Collection:** Select 10 academic papers (including both text and figures) from arXiv or recent NLP conference proceedings.\n",
- "* **Summary Generation:** For each paper, use the LLaMA 3 (7B) model to generate *two* different summaries. Vary the prompting strategy or sampling parameters to produce diverse outputs.\n",
+ "* **Summary Generation:** For each paper, use the LLaMA 3 (8B) model to generate *two* different summaries. Vary the prompting strategy or sampling parameters to produce diverse outputs.\n",
"* **Human Annotation:** Have one or two human annotators compare each pair of summaries for a paper and choose the better one (e.g. more informative, coherent, factually consistent, etc.). Record which summary is preferred.\n",
"* **Data Formatting:** Create a dataset (e.g. in JSONL format) of summary pairs and preference labels. Each entry should include the two summary texts and which one was chosen (for example, fields `chosen` and `rejected` as required by reward modeling tools).\n",
"* **Reward Model Training:** Fine-tune a reward model (such as DeBERTa-v3) on this preference data. Use the chosen/rejected summary pairs so the model learns to assign higher scores to the preferred summaries.\n",
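The data-formatting step above can be sketched in plain Python. This is a minimal illustration, not part of the diff: the `annotations` records, paper IDs, and summary strings are all hypothetical placeholders, and the `chosen`/`rejected` field names follow the convention the bullet mentions for reward modeling tools.

```python
import json

# Hypothetical annotation records: for each paper, the two candidate
# summaries and the index (0 or 1) the annotator preferred.
annotations = [
    {"paper_id": "paper-01", "summaries": ["Summary A ...", "Summary B ..."], "preferred": 1},
    {"paper_id": "paper-02", "summaries": ["Summary C ...", "Summary D ..."], "preferred": 0},
]

# Write one JSONL entry per pair, with the preferred summary under
# "chosen" and the other under "rejected".
with open("preferences.jsonl", "w") as f:
    for rec in annotations:
        chosen = rec["summaries"][rec["preferred"]]
        rejected = rec["summaries"][1 - rec["preferred"]]
        entry = {"paper_id": rec["paper_id"], "chosen": chosen, "rejected": rejected}
        f.write(json.dumps(entry) + "\n")
```

A file in this shape can then be loaded with `datasets.load_dataset("json", data_files="preferences.jsonl")` for the reward-training step.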
@@ -143,7 +143,7 @@
"* Install required Python libraries: `transformers`, `datasets`, `evaluate`, `trl` (Hugging Face TRL), and `accelerate`.\n",
"* (Optional) Install `peft` if you want to use parameter-efficient fine-tuning for the reward model.\n",
"* Ensure you have GPU access for model training (e.g., use Google Colab Pro, AWS, or a local GPU).\n",
- "* Download or load the LLaMA 3 (7B) model checkpoint and a DeBERTa-v3 checkpoint (for example, via Hugging Face Hub).\n",
+ "* Download or load the LLaMA 3 (8B) model checkpoint and a DeBERTa-v3 checkpoint (for example, via Hugging Face Hub).\n",
"\n",
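Once the environment is set up, the reward model from the project design is trained on a pairwise ranking objective: it should score the chosen summary above the rejected one. A minimal sketch of that loss in plain Python (the function name and the example scores are illustrative, not part of the assignment):

```python
import math

def pairwise_reward_loss(score_chosen, score_rejected):
    """Bradley-Terry style pairwise loss: -log(sigmoid(r_chosen - r_rejected)).
    The loss is small when the model scores the chosen summary higher,
    and large when the ranking is wrong."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Correct ranking (chosen scored higher) -> small loss.
print(round(pairwise_reward_loss(2.0, 0.0), 4))  # 0.1269
# Wrong ranking (rejected scored higher) -> large loss.
print(round(pairwise_reward_loss(0.0, 2.0), 4))  # 2.1269
```

Libraries such as Hugging Face TRL optimize this kind of objective internally when given `chosen`/`rejected` pairs, so in practice you would not implement it by hand.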
"## Deliverables\n",
"\n",