diff --git a/exercise.ipynb b/exercise.ipynb
index 825565c..f05295e 100644
--- a/exercise.ipynb
+++ b/exercise.ipynb
@@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "markdown",
- "id": "32e9a3ff",
+ "id": "f4c5998d",
"metadata": {},
"source": [
"# Exercise 7: Failure Modes And Limits of Deep Learning"
@@ -10,20 +10,20 @@
},
{
"cell_type": "markdown",
- "id": "14f7e5c5",
+ "id": "400de6a4",
"metadata": {},
"source": [
- "In the following exercise, we explore the failure modes and limits of neural networks. \n",
- "Neural networks are powerful, but it is important to understand their limits and the predictable reasons that they fail. \n",
+ "In the following exercise, we explore the failure modes and limits of neural networks.\n",
+ "Neural networks are powerful, but it is important to understand their limits and the predictable reasons that they fail.\n",
"These exercises illustrate how the content of datasets, especially differences between the training and inference/test datasets, can affect the network's output in unexpected ways.\n",
"
\n",
- "While neural networks are generally less interpretable than other types of machine learning, it is still important to investigate the \"internal reasoning\" of the network as much as possible to discover failure modes, or situations in which the network does not perform well. \n",
- "This exercise introduces a tool called Integrated Gradients that helps us makes sense of the network \"attention\". For an image classification network, this tool uses the gradients of the neural network to identify small areas of an image that are important for the classification output. "
+ "While neural networks are generally less interpretable than other types of machine learning, it is still important to investigate the \"internal reasoning\" of the network as much as possible to discover failure modes, or situations in which the network does not perform well.\n",
+ "This exercise introduces a tool called Integrated Gradients that helps us makes sense of the network \"attention\". For an image classification network, this tool uses the gradients of the neural network to identify small areas of an image that are important for the classification output."
]
},
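For orientation, here is a minimal sketch of how Integrated Gradients is typically invoked through the Captum library used later in this exercise. The toy model and input are placeholders, not the exercise's own network.

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Placeholder classifier standing in for the MNIST model built later.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
model.eval()

image = torch.rand(1, 1, 28, 28)           # one fake 28x28 "digit"
target_class = int(model(image).argmax())  # class whose score we attribute

ig = IntegratedGradients(model)
# Attributions have the same shape as the input; large magnitudes mark pixels
# that most influenced the score for target_class.
attributions, delta = ig.attribute(
    image, baselines=image * 0, target=target_class, return_convergence_delta=True
)
print(attributions.shape, float(delta))
```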
{
"cell_type": "markdown",
- "id": "b3f01058",
+ "id": "3baf2a90",
"metadata": {},
"source": [
"\n",
@@ -44,7 +44,7 @@
},
{
"cell_type": "markdown",
- "id": "6e00d0a5",
+ "id": "fac88ce5",
"metadata": {},
"source": [
"### Acknowledgements\n",
@@ -53,7 +53,7 @@
},
{
"cell_type": "markdown",
- "id": "8e22caa6",
+ "id": "12f7ca06",
"metadata": {},
"source": [
"### Data Loading\n",
@@ -61,13 +61,13 @@
"The following will load the MNIST dataset, which already comes split into a training and testing dataset.\n",
"The MNIST dataset contains images of handwritten digits 0-9.\n",
"This data was already downloaded in the setup script.\n",
- "Documentation for this pytorch dataset is available at https://pytorch.org/vision/main/generated/torchvision.datasets.MNIST.html "
+ "Documentation for this pytorch dataset is available at https://pytorch.org/vision/main/generated/torchvision.datasets.MNIST.html"
]
},
{
"cell_type": "code",
"execution_count": null,
- "id": "78859698",
+ "id": "2eaac11e",
"metadata": {},
"outputs": [],
"source": [
@@ -90,7 +90,7 @@
},
{
"cell_type": "markdown",
- "id": "3a919a2b",
+ "id": "0fa59082",
"metadata": {},
"source": [
"### Part 1: Preparation of a Tainted Dataset\n",
@@ -101,11 +101,11 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "d97dd503",
+ "id": "44f8cc97",
"metadata": {},
"outputs": [],
"source": [
- "#Imports:\n",
+ "# Imports:\n",
"import torch\n",
"import numpy\n",
"from scipy.ndimage import convolve\n",
@@ -115,7 +115,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "6b7f185d",
+ "id": "fa4dbba7",
"metadata": {},
"outputs": [],
"source": [
@@ -126,18 +126,18 @@
},
{
"cell_type": "markdown",
- "id": "809f06b3",
+ "id": "3afbd53b",
"metadata": {},
"source": [
"## Part 1.1: Local Corruption of Data\n",
"\n",
- "First we will add a white pixel in the bottom right of all images of 7's, and visualize the results. This is an example of a local change to the images, where only a small portion of the image is corruped."
+ "First we will add a white pixel in the bottom right of all images of 7's, and visualize the results. This is an example of a local change to the images, where only a small portion of the image is corrupted."
]
},
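A minimal sketch of the kind of corruption described above, assuming a torchvision-style MNIST dataset whose raw images sit in `.data` (uint8, N x 28 x 28) and labels in `.targets`; the pixel coordinates are illustrative, not necessarily the ones used in the cell below.

```python
import torch
from torchvision.datasets import MNIST

train_dataset = MNIST(root="./mnist", train=True, download=True)
sevens = train_dataset.targets == 7       # boolean mask over all labels
train_dataset.data[sevens, 25, 25] = 255  # one white pixel near the bottom right
```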
{
"cell_type": "code",
"execution_count": null,
- "id": "ab84ac2a",
+ "id": "45d6aa77",
"metadata": {},
"outputs": [],
"source": [
@@ -149,7 +149,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "a6ac15eb",
+ "id": "60351b4b",
"metadata": {},
"outputs": [],
"source": [
@@ -172,7 +172,7 @@
},
{
"cell_type": "markdown",
- "id": "317b7750",
+ "id": "f2e929ca",
"metadata": {},
"source": [
"
\n",
@@ -183,7 +183,7 @@
},
{
"cell_type": "markdown",
- "id": "9d9e9d29",
+ "id": "2dbcf4b2",
"metadata": {},
"source": [
"\n",
@@ -194,17 +194,17 @@
},
{
"cell_type": "markdown",
- "id": "45f694dd",
+ "id": "39ce6b99",
"metadata": {},
"source": [
- "## Part 1.2: Global Corrution of data\n",
+ "## Part 1.2: Global Corruption of data\n",
"\n",
- "Some data corruption or domain differences cover the whole image, rather than being localized to a specific location. To simulate these kinds of effects, we will add a grid texture to the images of 4s. "
+ "Some data corruption or domain differences cover the whole image, rather than being localized to a specific location. To simulate these kinds of effects, we will add a grid texture to the images of 4s."
]
},
{
"cell_type": "markdown",
- "id": "393fabd3",
+ "id": "45f7b920",
"metadata": {},
"source": [
"You may have noticed that the images are stored as arrays of integers. First we cast them to float to be able to add textures easily without integer wrapping issues."
@@ -213,7 +213,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "2d42797d",
+ "id": "20be6faf",
"metadata": {},
"outputs": [],
"source": [
@@ -224,7 +224,7 @@
},
{
"cell_type": "markdown",
- "id": "4bc5dbbf",
+ "id": "698581c8",
"metadata": {},
"source": [
"Then we create the grid texture and visualize it."
@@ -233,7 +233,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "a8316e01",
+ "id": "69f364a2",
"metadata": {},
"outputs": [],
"source": [
@@ -249,7 +249,7 @@
},
{
"cell_type": "markdown",
- "id": "b1813a88",
+ "id": "a2e35eaf",
"metadata": {},
"source": [
"Next we add the texture to all 4s in the train and test set."
@@ -258,7 +258,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "0da537d9",
+ "id": "e773f840",
"metadata": {},
"outputs": [],
"source": [
@@ -269,17 +269,17 @@
},
{
"cell_type": "markdown",
- "id": "9d52525b",
+ "id": "d8c22dfb",
"metadata": {},
"source": [
- "After adding the texture, we have to make sure the values are between 0 and 255 and then cast back to uint8. \n",
+ "After adding the texture, we have to make sure the values are between 0 and 255 and then cast back to uint8.\n",
"Then we visualize a couple 4s from the dataset to see if the grid texture has been added properly."
]
},
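A sketch of the clamp-and-cast step described above; the variable name is a placeholder for the notebook's own (elided) image array.

```python
import numpy as np

tainted_fours = np.random.rand(8, 28, 28) * 300.0  # stand-in float images
tainted_fours = np.clip(tainted_fours, 0, 255).astype(np.uint8)
```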
{
"cell_type": "code",
"execution_count": null,
- "id": "3da13396",
+ "id": "20d299d2",
"metadata": {},
"outputs": [],
"source": [
@@ -295,7 +295,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "9a574027",
+ "id": "6c9fc998",
"metadata": {},
"outputs": [],
"source": [
@@ -317,7 +317,7 @@
},
{
"cell_type": "markdown",
- "id": "b5f9669e",
+ "id": "ae4eef7e",
"metadata": {},
"source": [
"\n",
@@ -328,7 +328,7 @@
},
{
"cell_type": "markdown",
- "id": "49f29cfa",
+ "id": "a90db194",
"metadata": {},
"source": [
"\n",
@@ -340,7 +340,7 @@
},
{
"cell_type": "markdown",
- "id": "5cc8d289",
+ "id": "1f6d7182",
"metadata": {},
"source": [
"\n",
@@ -353,7 +353,7 @@
},
{
"cell_type": "markdown",
- "id": "f6344ee0",
+ "id": "613e4cd4",
"metadata": {},
"source": [
"\n",
@@ -363,7 +363,7 @@
" \n",
" - Consider a dataset with white dots on images of all digits: let's call it the all-dots data. How different is this from the original dataset? Are the classes more or less distinct from each other?
\n",
" - How do you think a digit classifier trained on all-dots data and tested on all-dots data would perform?
\n",
- " - Now consider the analagous all-grid data with the grid pattern added to all images. Are the classes more or less distinct from each other? Would a digit classifier trained on all-grid converge?
\n",
+ " - Now consider the analogous all-grid data with the grid pattern added to all images. Are the classes more or less distinct from each other? Would a digit classifier trained on all-grid converge?
\n",
"
\n",
"If you want to test your hypotheses, you can create these all-dots and all-grid train and test datasets and use them for training in bonus questions of the following section.\n",
"
"
@@ -371,7 +371,7 @@
},
{
"cell_type": "markdown",
- "id": "9ed6712d",
+ "id": "2fb5fede",
"metadata": {},
"source": [
"### Part 2: Create and Train an Image Classification Neural Network on Clean and Tainted Data\n",
@@ -382,7 +382,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "8f50e627",
+ "id": "cdcd46d5",
"metadata": {},
"outputs": [],
"source": [
@@ -396,7 +396,7 @@
},
{
"cell_type": "markdown",
- "id": "e4ff9d38",
+ "id": "c2cf6bfa",
"metadata": {},
"source": [
"Now we will train the neural network. A training function is provided below - this should be familiar, but make sure you look it over and understand what is happening in the training loop."
@@ -405,7 +405,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "09880627",
+ "id": "98eddd14",
"metadata": {},
"outputs": [],
"source": [
@@ -417,8 +417,8 @@
" pbar = tqdm(total=len(tainted_train_dataset)//batch_size)\n",
" for batch_idx, (raw, target) in enumerate(train_loader):\n",
" optimizer.zero_grad()\n",
- " raw = raw.cuda()\n",
- " target = target.cuda()\n",
+ " raw = raw.to(device)\n",
+ " target = target.to(device)\n",
" output = model(raw)\n",
" loss = criterion(output, target)\n",
" loss.backward()\n",
@@ -430,7 +430,7 @@
},
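The change from `.cuda()` to `.to(device)` above assumes a `device` variable defined once elsewhere in the notebook (not shown in this diff); the conventional definition is:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```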
{
"cell_type": "markdown",
- "id": "608d3b8d",
+ "id": "af0edd25",
"metadata": {},
"source": [
"We have to choose hyperparameters for our model. We have selected to train for two epochs, with a batch size of 64 for training and 1000 for testing. We are using the cross entropy loss, a standard multi-class classification loss."
@@ -439,7 +439,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "fb663954",
+ "id": "3deddbd3",
"metadata": {},
"outputs": [],
"source": [
@@ -458,7 +458,7 @@
},
{
"cell_type": "markdown",
- "id": "60e94694",
+ "id": "4ed7aa39",
"metadata": {},
"source": [
"Next we initialize a clean model, and a tainted model. We want to have reproducible results, so we set the initial weights with a specific random seed. The seed number does not matter, just that it is the same!"
@@ -467,7 +467,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "555e5d3e",
+ "id": "43b0197b",
"metadata": {},
"outputs": [],
"source": [
@@ -483,7 +483,7 @@
" if isinstance(m, (nn.Linear, nn.Conv2d)):\n",
" torch.nn.init.xavier_uniform_(m.weight, )\n",
" m.bias.data.fill_(0.01)\n",
- " \n",
+ "\n",
"# Fixing seed with magical number and setting weights:\n",
"torch.random.manual_seed(42)\n",
"model_clean.apply(init_weights)\n",
@@ -495,7 +495,7 @@
},
{
"cell_type": "markdown",
- "id": "0802649b",
+ "id": "1ae683d2",
"metadata": {},
"source": [
"Next we initialize the clean and tainted dataloaders, again with a specific random seed for reproducibility."
@@ -504,7 +504,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "de2b11db",
+ "id": "081f197c",
"metadata": {},
"outputs": [],
"source": [
@@ -518,7 +518,7 @@
},
{
"cell_type": "markdown",
- "id": "ba79624d",
+ "id": "597135be",
"metadata": {},
"source": [
"Now it is time to train the neural networks! We are storing the training loss history for each model so we can visualize it later."
@@ -527,7 +527,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "534bcda6",
+ "id": "e32e286d",
"metadata": {},
"outputs": [],
"source": [
@@ -560,7 +560,7 @@
},
{
"cell_type": "markdown",
- "id": "1604e50b",
+ "id": "75895920",
"metadata": {},
"source": [
"Now we visualize the loss history for the clean and tainted models."
@@ -569,7 +569,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "d655cc98",
+ "id": "7006b624",
"metadata": {},
"outputs": [],
"source": [
@@ -584,7 +584,7 @@
},
{
"cell_type": "markdown",
- "id": "5aee0f7b",
+ "id": "4467a232",
"metadata": {},
"source": [
"\n",
@@ -595,7 +595,7 @@
},
{
"cell_type": "markdown",
- "id": "17fb3a1e",
+ "id": "e6853659",
"metadata": {},
"source": [
"\n",
@@ -606,7 +606,7 @@
},
{
"cell_type": "markdown",
- "id": "f89eb1e4",
+ "id": "786976e5",
"metadata": {},
"source": [
"\n",
@@ -617,7 +617,7 @@
},
{
"cell_type": "markdown",
- "id": "dbaa3b78",
+ "id": "b151cf85",
"metadata": {},
"source": [
"\n",
@@ -629,7 +629,7 @@
},
{
"cell_type": "markdown",
- "id": "e9e0f3ce",
+ "id": "046f2d98",
"metadata": {},
"source": [
"\n",
@@ -644,20 +644,20 @@
},
{
"cell_type": "markdown",
- "id": "1751b788",
+ "id": "efb66b32",
"metadata": {},
"source": [
"### Part 3: Examining the Results of the Clean and Tainted Networks\n",
"\n",
"Now that we have initialized our clean and tainted datasets and trained our models on them, it is time to examine how these models perform on the clean and tainted test sets!\n",
"\n",
- "We provide a `predict` function below that will return the prediction and ground truth labels given a particualr model and dataset."
+ "We provide a `predict` function below that will return the prediction and ground truth labels given a particular model and dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
- "id": "c0853f13",
+ "id": "67a73dc1",
"metadata": {},
"outputs": [],
"source": [
@@ -669,17 +669,17 @@
" dataset_groundtruth = []\n",
" with torch.no_grad():\n",
" for x, y_true in dataset:\n",
- " inp = x[None].cuda()\n",
+ " inp = x[None].to(device)\n",
" y_pred = model(inp)\n",
" dataset_prediction.append(y_pred.argmax().cpu().numpy())\n",
" dataset_groundtruth.append(y_true)\n",
- " \n",
+ "\n",
" return np.array(dataset_prediction), np.array(dataset_groundtruth)"
]
},
{
"cell_type": "markdown",
- "id": "71a2f9cf",
+ "id": "eaa7a921",
"metadata": {},
"source": [
"Now we call the predict method with the clean and tainted models on the clean and tainted datasets."
@@ -688,7 +688,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "ff3be5a7",
+ "id": "92257da3",
"metadata": {},
"outputs": [],
"source": [
@@ -700,16 +700,16 @@
},
{
"cell_type": "markdown",
- "id": "a5f3fbc9",
+ "id": "b7426171",
"metadata": {},
"source": [
- "We can investivate the results using the confusion matrix, which you should recall from the Introduction to Machine Learning exercise. The function in the cell below will create a nicely annotated confusion matrix."
+ "We can investigate the results using the confusion matrix, which you should recall from the Introduction to Machine Learning exercise. The function in the cell below will create a nicely annotated confusion matrix."
]
},
{
"cell_type": "code",
"execution_count": null,
- "id": "5f0e804c",
+ "id": "994b40c0",
"metadata": {
"lines_to_next_cell": 1
},
@@ -718,8 +718,8 @@
"from sklearn.metrics import confusion_matrix\n",
"import seaborn as sns\n",
"import pandas as pd\n",
- "# Plot confusion matrix \n",
- "# orginally from Runqi Yang; \n",
+ "# Plot confusion matrix\n",
+ "# originally from Runqi Yang;\n",
"# see https://gist.github.com/hitvoice/36cf44689065ca9b927431546381a3f7\n",
"def cm_analysis(y_true, y_pred, title, figsize=(10,10)):\n",
" \"\"\"\n",
@@ -754,17 +754,17 @@
" annot[i, j] = ''\n",
" else:\n",
" annot[i, j] = '%.1f%%\\n%d' % (p, c)\n",
- " cm = pd.DataFrame(cm, index=labels, columns=labels)\n",
+ " cm = pd.DataFrame(cm_perc, index=labels, columns=labels)\n",
" cm.index.name = 'Actual'\n",
" cm.columns.name = 'Predicted'\n",
" fig, ax = plt.subplots(figsize=figsize)\n",
- " ax=sns.heatmap(cm, annot=annot, fmt='', vmax=30)\n",
+ " ax = sns.heatmap(cm, annot=annot, fmt=\"\", vmax=100)\n",
" ax.set_title(title)"
]
},
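For context on the two changed lines above: in the original gist, `cm_perc` is the row-normalized confusion matrix in percent, which is why plotting it with `vmax=100` gives a consistent colour scale. Sketched here (assumed, since those earlier lines are elided from this diff):

```python
import numpy as np

cm = np.array([[50, 0], [10, 40]])                  # hypothetical counts
cm_perc = cm / cm.sum(axis=1, keepdims=True) * 100  # each row now sums to 100%
```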
{
"cell_type": "markdown",
- "id": "869d3190",
+ "id": "a1321ccd",
"metadata": {},
"source": [
"Now we will generate confusion matrices for each model/data combination. Take your time and try and interpret these, and then try and answer the questions below."
@@ -773,7 +773,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "2efb0286",
+ "id": "348d2b4d",
"metadata": {},
"outputs": [],
"source": [
@@ -785,7 +785,7 @@
},
{
"cell_type": "markdown",
- "id": "5c8983a8",
+ "id": "19651455",
"metadata": {},
"source": [
"\n",
@@ -796,7 +796,7 @@
},
{
"cell_type": "markdown",
- "id": "93c1483f",
+ "id": "651dfee3",
"metadata": {},
"source": [
"\n",
@@ -807,7 +807,7 @@
},
{
"cell_type": "markdown",
- "id": "319465a3",
+ "id": "6d40345f",
"metadata": {},
"source": [
"\n",
@@ -818,7 +818,7 @@
},
{
"cell_type": "markdown",
- "id": "8467009a",
+ "id": "e17b0677",
"metadata": {},
"source": [
"\n",
@@ -829,7 +829,7 @@
},
{
"cell_type": "markdown",
- "id": "1b2061b7",
+ "id": "04ca2bfa",
"metadata": {},
"source": [
"\n",
@@ -841,7 +841,7 @@
},
{
"cell_type": "markdown",
- "id": "6ef0ce00",
+ "id": "8cf32682",
"metadata": {},
"source": [
"\n",
@@ -856,7 +856,7 @@
},
{
"cell_type": "markdown",
- "id": "d961c467",
+ "id": "afbe6a03",
"metadata": {},
"source": [
"### Part 4: Interpretation with Integrated Gradients\n",
@@ -865,7 +865,7 @@
},
{
"cell_type": "markdown",
- "id": "5a5aa094",
+ "id": "b290da92",
"metadata": {},
"source": [
"\n",
@@ -875,7 +875,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "5aa90a8b",
+ "id": "896bdba0",
"metadata": {},
"outputs": [],
"source": [
@@ -908,7 +908,7 @@
},
{
"cell_type": "markdown",
- "id": "6b73c86e",
+ "id": "b549928d",
"metadata": {},
"source": [
"Next we provide a function to visualize the output of integrated gradients, using the function above to actually run the algorithm."
@@ -917,7 +917,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "66a0588b",
+ "id": "8827a868",
"metadata": {},
"outputs": [],
"source": [
@@ -928,11 +928,11 @@
"\n",
" # Transpose integrated gradients output\n",
" attr_ig = np.transpose(attr_ig[0].cpu().detach().numpy(), (1, 2, 0))\n",
- " \n",
+ "\n",
" # Transpose and normalize original image:\n",
" original_image = np.transpose((test_input[0].detach().numpy() * 0.5) + 0.5, (1, 2, 0))\n",
"\n",
- " # This visualises the attribution of labels to pixels\n",
+ " # This visualises the attribution of labels to pixels\n",
" figure, axis = plt.subplots(nrows=1, ncols=2, figsize=(4, 2.5), width_ratios=[1, 1])\n",
" viz.visualize_image_attr(attr_ig, \n",
" original_image, \n",
@@ -956,10 +956,10 @@
},
{
"cell_type": "markdown",
- "id": "a637525d",
+ "id": "f39ba38b",
"metadata": {},
"source": [
- "To start examining the results, we will call the `visualize_integrated_gradients` with the tainted and clean models on the tainted and clean sevens. \n",
+ "To start examining the results, we will call the `visualize_integrated_gradients` with the tainted and clean models on the tainted and clean sevens.\n",
"\n",
"The visualization will show the original image plus an overlaid attribution map that generally signifies the importance of each pixel, plus the attribution map only. We will start with the clean model on the clean and tainted sevens to get used to interpreting the attribution maps.\n"
]
@@ -967,7 +967,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "a06d0634",
+ "id": "eadea48c",
"metadata": {},
"outputs": [],
"source": [
@@ -977,18 +977,18 @@
},
{
"cell_type": "markdown",
- "id": "0822d5ff",
+ "id": "b5599149",
"metadata": {},
"source": [
"
\n",
- " Task 4.1: Interpereting the Clean Model's Attention on 7s
\n",
+ " Task 4.1: Interpreting the Clean Model's Attention on 7s\n",
"Where did the clean model focus its attention for the clean and tainted 7s? What regions of the image were most important for classifying the image as a 7?\n",
""
]
},
{
"cell_type": "markdown",
- "id": "cee392ec",
+ "id": "09cd4b31",
"metadata": {},
"source": [
"Now let's look at the attention of the tainted model!"
@@ -997,7 +997,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "62929d9e",
+ "id": "004c2744",
"metadata": {},
"outputs": [],
"source": [
@@ -1007,18 +1007,18 @@
},
{
"cell_type": "markdown",
- "id": "e49b5678",
+ "id": "10f6e82a",
"metadata": {},
"source": [
"\n",
- " Task 4.2: Interpereting the Tainted Model's Attention on 7s
\n",
+ " Task 4.2: Interpreting the Tainted Model's Attention on 7s\n",
"Where did the tainted model focus its attention for the clean and tainted 7s? How was this different than the clean model? Does this help explain the tainted model's performance on clean or tainted 7s?\n",
""
]
},
{
"cell_type": "markdown",
- "id": "f33a3636",
+ "id": "5f1a65c7",
"metadata": {},
"source": [
"Now let's look at the regions of the image that Integrated Gradients highlights as important for classifying fours in the clean and tainted models."
@@ -1027,7 +1027,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "98f09f64",
+ "id": "c20db2dc",
"metadata": {},
"outputs": [],
"source": [
@@ -1039,18 +1039,18 @@
},
{
"cell_type": "markdown",
- "id": "059654d6",
+ "id": "db17eead",
"metadata": {},
"source": [
"\n",
- " Task 4.3: Interpereting the focus on 4s
\n",
+ " Task 4.3: Interpreting the focus on 4s\n",
"Where did the tainted model focus its attention for the tainted and clean 4s? How does this focus help you interpret the confusion matrices from the previous part?\n",
""
]
},
{
"cell_type": "markdown",
- "id": "a859b818",
+ "id": "30a9b553",
"metadata": {},
"source": [
"\n",
@@ -1061,20 +1061,20 @@
},
{
"cell_type": "markdown",
- "id": "9e419f62",
+ "id": "335772f7",
"metadata": {},
"source": [
"\n",
" Checkpoint 4
\n",
"
\n",
- " Congrats on finishing the intergrated gradients task! Let us know on Element that you reached checkpoint 4, and feel free to look at other interpretability methods in the Captum library if you're interested.\n",
+ " Congrats on finishing the integrated gradients task! Let us know on the course chat that you reached checkpoint 4, and feel free to look at other interpretability methods in the Captum library if you're interested.\n",
"
\n",
"
"
]
},
{
"cell_type": "markdown",
- "id": "6b2959a2",
+ "id": "8af404b4",
"metadata": {},
"source": [
"\n",
@@ -1088,7 +1088,7 @@
},
{
"cell_type": "markdown",
- "id": "76193ea5",
+ "id": "9295ffc7",
"metadata": {},
"source": [
"## Part 5: Importance of using the right training data\n",
@@ -1100,7 +1100,7 @@
},
{
"cell_type": "markdown",
- "id": "afdb487c",
+ "id": "7cbf3b1a",
"metadata": {},
"source": [
"First, we will write a function to add noise to the MNIST dataset, so that we can train a model to denoise it."
@@ -1109,7 +1109,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "66dd076a",
+ "id": "1a3769ac",
"metadata": {},
"outputs": [],
"source": [
@@ -1122,7 +1122,7 @@
},
{
"cell_type": "markdown",
- "id": "98cb3bb8",
+ "id": "3a3a8139",
"metadata": {},
"source": [
"Next we will visualize a couple MNIST examples with and without noise."
@@ -1131,7 +1131,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "63f337f7",
+ "id": "36f20530",
"metadata": {},
"outputs": [],
"source": [
@@ -1160,17 +1160,17 @@
},
{
"cell_type": "markdown",
- "id": "8853090e",
+ "id": "8622949e",
"metadata": {},
"source": [
"### UNet model\n",
"\n",
- "Let's try denoising with a UNet, \"CARE-style\". As UNets and denoising implementations are not the focus of this exercise, we provide the model for you in the following cell. "
+ "Let's try denoising with a UNet, \"CARE-style\". As UNets and denoising implementations are not the focus of this exercise, we provide the model for you in the following cell."
]
},
{
"cell_type": "markdown",
- "id": "bfba19a4",
+ "id": "9ab55c00",
"metadata": {},
"source": [
"The training loop code is also provided here. It is similar to the code used to train the image classification model previously, but look it over to make sure there are no surprises."
@@ -1179,47 +1179,47 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "34de1247",
+ "id": "66bd1d56",
"metadata": {},
"outputs": [],
"source": [
"from tqdm import tqdm\n",
"\n",
"def train_denoising_model(train_loader, model, criterion, optimizer, history):\n",
- " \n",
+ "\n",
" # Puts model in 'training' mode:\n",
" model.train()\n",
- " \n",
+ "\n",
" # Initialises progress bar:\n",
" pbar = tqdm(total=len(train_loader.dataset)//batch_size_train)\n",
" for batch_idx, (image, target) in enumerate(train_loader):\n",
"\n",
" # add line here during Task 2.2\n",
- " \n",
+ "\n",
" # Zeroing gradients:\n",
" optimizer.zero_grad()\n",
- " \n",
+ "\n",
" # Moves image to GPU memory:\n",
- " image = image.cuda()\n",
- " \n",
+ " image = image.to(device)\n",
+ "\n",
" # Adds noise to make the noisy image:\n",
" noisy = add_noise(image)\n",
- " \n",
+ "\n",
" # Runs model on noisy image:\n",
" output = model(noisy)\n",
- " \n",
+ "\n",
" # Computes loss:\n",
" loss = criterion(output, image)\n",
- " \n",
+ "\n",
" # Backpropagates gradients:\n",
" loss.backward()\n",
- " \n",
+ "\n",
" # Optimises model parameters given the current gradients:\n",
" optimizer.step()\n",
- " \n",
+ "\n",
" # appends loss history:\n",
" history[\"loss\"].append(loss.item())\n",
- " \n",
+ "\n",
" # updates progress bar:\n",
" pbar.update(1)\n",
" return history"
@@ -1227,7 +1227,7 @@
},
{
"cell_type": "markdown",
- "id": "86f458bb",
+ "id": "6d20945b",
"metadata": {},
"source": [
"Here we choose hyperparameters and initialize the model and data loaders."
@@ -1236,7 +1236,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "a69275b4",
+ "id": "827d2f32",
"metadata": {},
"outputs": [],
"source": [
@@ -1274,7 +1274,7 @@
},
{
"cell_type": "markdown",
- "id": "2cfec16b",
+ "id": "3a0153a5",
"metadata": {},
"source": [
"Finally, we run the training loop!"
@@ -1283,7 +1283,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "9b3103df",
+ "id": "716b936f",
"metadata": {},
"outputs": [],
"source": [
@@ -1294,7 +1294,7 @@
},
{
"cell_type": "markdown",
- "id": "a8ddab2b",
+ "id": "b24bdfbd",
"metadata": {},
"source": [
"As before, we will visualize the training loss. If all went correctly, it should decrease from around 1.0 to less than 0.2."
@@ -1303,7 +1303,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "b749e2d3",
+ "id": "bc71bff7",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1319,7 +1319,7 @@
},
{
"cell_type": "markdown",
- "id": "33b18ff9",
+ "id": "2b474711",
"metadata": {},
"source": [
"### Check denoising performance\n",
@@ -1330,7 +1330,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "7fb1ba9d",
+ "id": "e1d20e0b",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1339,7 +1339,7 @@
"def apply_denoising(image, model):\n",
" # add batch and channel dimensions\n",
" image = torch.unsqueeze(torch.unsqueeze(image, 0), 0)\n",
- " prediction = model(image.cuda())\n",
+ " prediction = model(image.to(device))\n",
" # remove batch and channel dimensions before returning\n",
" return prediction.detach().cpu()[0,0]"
]
@@ -1347,7 +1347,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "6d0b2276",
+ "id": "4b77f687",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1373,7 +1373,7 @@
},
{
"cell_type": "markdown",
- "id": "d64d4b73",
+ "id": "b5eb2c28",
"metadata": {},
"source": [
"We pick 8 images to show:"
@@ -1382,7 +1382,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "d201b55f",
+ "id": "3a1d22bd",
"metadata": {},
"outputs": [],
"source": [
@@ -1392,7 +1392,7 @@
},
{
"cell_type": "markdown",
- "id": "216613e6",
+ "id": "29912374",
"metadata": {},
"source": [
"\n",
@@ -1403,17 +1403,17 @@
},
{
"cell_type": "markdown",
- "id": "1e8bbf40",
+ "id": "8a598bb3",
"metadata": {},
"source": [
- "### Apply trained model on 'wrong' data \n",
+ "### Apply trained model on 'wrong' data\n",
"\n",
"Apply the denoising model trained above to some example _noisy_ images derived from the Fashion-MNIST dataset.\n"
]
},
{
"cell_type": "markdown",
- "id": "ec89d5cf",
+ "id": "4b63fc64",
"metadata": {},
"source": [
"### Load the Fashion MNIST dataset\n",
@@ -1424,7 +1424,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "e006bc77",
+ "id": "d03b2297",
"metadata": {},
"outputs": [],
"source": [
@@ -1445,7 +1445,7 @@
},
{
"cell_type": "markdown",
- "id": "d20560de",
+ "id": "31d01ee1",
"metadata": {},
"source": [
"Next we apply the denoising model we trained on the MNIST data to FashionMNIST, and visualize the results."
@@ -1454,7 +1454,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "2c0ffe7c",
+ "id": "aab3b99c",
"metadata": {},
"outputs": [],
"source": [
@@ -1464,7 +1464,7 @@
},
{
"cell_type": "markdown",
- "id": "e0bc45a6",
+ "id": "e12f3a1d",
"metadata": {},
"source": [
"\n",
@@ -1475,7 +1475,7 @@
},
{
"cell_type": "markdown",
- "id": "6d61dfab",
+ "id": "3c296abe",
"metadata": {},
"source": [
"\n",
@@ -1486,7 +1486,7 @@
},
{
"cell_type": "markdown",
- "id": "e6483d98",
+ "id": "749d2d87",
"metadata": {},
"source": [
"### Train the denoiser on both MNIST and FashionMNIST\n",
@@ -1497,7 +1497,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "09e48578",
+ "id": "e52a2f68",
"metadata": {},
"outputs": [],
"source": [
@@ -1534,7 +1534,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "2f080bc7",
+ "id": "76324612",
"metadata": {},
"outputs": [],
"source": [
@@ -1545,7 +1545,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "361df7de",
+ "id": "1544565a",
"metadata": {},
"outputs": [],
"source": [
@@ -1555,19 +1555,29 @@
},
{
"cell_type": "markdown",
- "id": "f88adf9e",
+ "id": "d2646697",
"metadata": {},
"source": [
"
\n",
" Task 5.4:
\n",
- "How does the new denoiser perform compared to the one from the previous section?\n",
+ "How does the new denoiser perform compared to the one from the previous section? Why?\n",
""
]
},
+ {
+ "cell_type": "markdown",
+ "id": "4f02d520",
+ "metadata": {},
+ "source": [
+ "### Train the denoiser on both MNIST and FashionMNIST, shuffling the training data\n",
+ "\n",
+ "We previously performed the training sequentially on the MNIST data first then followed by the FashionMNIST data. Now, we ask for the training data to be shuffled and observe the impact on performance. (noe the `shuffle=True` in the lines below)"
+ ]
+ },
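A sketch of the idea with placeholder tensors (the notebook's own loader setup follows in the next cell): concatenating the two datasets and letting the DataLoader shuffle mixes MNIST and FashionMNIST within every batch.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

mnist_like = TensorDataset(torch.rand(100, 1, 28, 28))
fashion_like = TensorDataset(torch.rand(100, 1, 28, 28))

combined = ConcatDataset([mnist_like, fashion_like])
loader = DataLoader(combined, batch_size=64, shuffle=True)  # batches now mix both sources
```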
{
"cell_type": "code",
"execution_count": null,
- "id": "ef8f51df",
+ "id": "fb070c5c",
"metadata": {},
"outputs": [],
"source": [
@@ -1604,7 +1614,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "34473ef0",
+ "id": "2cfefa77",
"metadata": {},
"outputs": [],
"source": [
@@ -1615,7 +1625,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "65bffa85",
+ "id": "56718c41",
"metadata": {},
"outputs": [],
"source": [
@@ -1625,7 +1635,7 @@
},
{
"cell_type": "markdown",
- "id": "388c8c72",
+ "id": "df6234dd",
"metadata": {},
"source": [
"\n",
@@ -1636,21 +1646,21 @@
},
{
"cell_type": "markdown",
- "id": "52244cd5",
+ "id": "dbe9b728",
"metadata": {},
"source": [
"\n",
"\n",
" Checkpoint 5
\n",
"
\n",
- " Congrats on reaching the final checkpoint! Let us know on Element, and we'll discuss the questions once reaching critical mass.\n",
+ " Congrats on reaching the final checkpoint! Let us know on the course chat, and we'll discuss the questions once reaching critical mass.\n",
"
\n",
"
"
]
},
{
"cell_type": "markdown",
- "id": "3af95611",
+ "id": "b69ac817",
"metadata": {},
"source": [
"\n",
@@ -1664,7 +1674,7 @@
},
{
"cell_type": "markdown",
- "id": "c0afb23d",
+ "id": "b682aed4",
"metadata": {},
"source": []
}
diff --git a/solution.ipynb b/solution.ipynb
index a2ce143..11e9d64 100644
--- a/solution.ipynb
+++ b/solution.ipynb
@@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "markdown",
- "id": "32e9a3ff",
+ "id": "f4c5998d",
"metadata": {},
"source": [
"# Exercise 7: Failure Modes And Limits of Deep Learning"
@@ -10,20 +10,20 @@
},
{
"cell_type": "markdown",
- "id": "14f7e5c5",
+ "id": "400de6a4",
"metadata": {},
"source": [
- "In the following exercise, we explore the failure modes and limits of neural networks. \n",
- "Neural networks are powerful, but it is important to understand their limits and the predictable reasons that they fail. \n",
+ "In the following exercise, we explore the failure modes and limits of neural networks.\n",
+ "Neural networks are powerful, but it is important to understand their limits and the predictable reasons that they fail.\n",
"These exercises illustrate how the content of datasets, especially differences between the training and inference/test datasets, can affect the network's output in unexpected ways.\n",
"
\n",
- "While neural networks are generally less interpretable than other types of machine learning, it is still important to investigate the \"internal reasoning\" of the network as much as possible to discover failure modes, or situations in which the network does not perform well. \n",
- "This exercise introduces a tool called Integrated Gradients that helps us makes sense of the network \"attention\". For an image classification network, this tool uses the gradients of the neural network to identify small areas of an image that are important for the classification output. "
+ "While neural networks are generally less interpretable than other types of machine learning, it is still important to investigate the \"internal reasoning\" of the network as much as possible to discover failure modes, or situations in which the network does not perform well.\n",
+ "This exercise introduces a tool called Integrated Gradients that helps us makes sense of the network \"attention\". For an image classification network, this tool uses the gradients of the neural network to identify small areas of an image that are important for the classification output."
]
},
{
"cell_type": "markdown",
- "id": "b3f01058",
+ "id": "3baf2a90",
"metadata": {},
"source": [
"\n",
@@ -44,7 +44,7 @@
},
{
"cell_type": "markdown",
- "id": "6e00d0a5",
+ "id": "fac88ce5",
"metadata": {},
"source": [
"### Acknowledgements\n",
@@ -53,7 +53,7 @@
},
{
"cell_type": "markdown",
- "id": "8e22caa6",
+ "id": "12f7ca06",
"metadata": {},
"source": [
"### Data Loading\n",
@@ -61,13 +61,13 @@
"The following will load the MNIST dataset, which already comes split into a training and testing dataset.\n",
"The MNIST dataset contains images of handwritten digits 0-9.\n",
"This data was already downloaded in the setup script.\n",
- "Documentation for this pytorch dataset is available at https://pytorch.org/vision/main/generated/torchvision.datasets.MNIST.html "
+ "Documentation for this pytorch dataset is available at https://pytorch.org/vision/main/generated/torchvision.datasets.MNIST.html"
]
},
{
"cell_type": "code",
"execution_count": null,
- "id": "78859698",
+ "id": "2eaac11e",
"metadata": {},
"outputs": [],
"source": [
@@ -90,7 +90,7 @@
},
{
"cell_type": "markdown",
- "id": "3a919a2b",
+ "id": "0fa59082",
"metadata": {},
"source": [
"### Part 1: Preparation of a Tainted Dataset\n",
@@ -101,11 +101,11 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "d97dd503",
+ "id": "44f8cc97",
"metadata": {},
"outputs": [],
"source": [
- "#Imports:\n",
+ "# Imports:\n",
"import torch\n",
"import numpy\n",
"from scipy.ndimage import convolve\n",
@@ -115,7 +115,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "6b7f185d",
+ "id": "fa4dbba7",
"metadata": {},
"outputs": [],
"source": [
@@ -126,18 +126,18 @@
},
{
"cell_type": "markdown",
- "id": "809f06b3",
+ "id": "3afbd53b",
"metadata": {},
"source": [
"## Part 1.1: Local Corruption of Data\n",
"\n",
- "First we will add a white pixel in the bottom right of all images of 7's, and visualize the results. This is an example of a local change to the images, where only a small portion of the image is corruped."
+ "First we will add a white pixel in the bottom right of all images of 7's, and visualize the results. This is an example of a local change to the images, where only a small portion of the image is corrupted."
]
},
{
"cell_type": "code",
"execution_count": null,
- "id": "ab84ac2a",
+ "id": "45d6aa77",
"metadata": {},
"outputs": [],
"source": [
@@ -149,7 +149,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "a6ac15eb",
+ "id": "60351b4b",
"metadata": {},
"outputs": [],
"source": [
@@ -172,7 +172,7 @@
},
{
"cell_type": "markdown",
- "id": "317b7750",
+ "id": "f2e929ca",
"metadata": {},
"source": [
"\n",
@@ -183,7 +183,7 @@
},
{
"cell_type": "markdown",
- "id": "15d65e34",
+ "id": "f7652227",
"metadata": {
"tags": [
"solution"
@@ -192,14 +192,14 @@
"source": [
"**1.1 Answer:**\n",
"\n",
- "In a microscopy lab, sample preparation error such as improper staining or sample contamination or other technical issues such as optical aberations and focus drift can cause image corruption. Environmental factors such as vibrations or lighting variations may also contribute to image corruption. Digital artifacts like compression artifacts or noise, and other issues like operator error (improper manipulation, incorrect magnification...) will also lead to corrupted images.\n",
+ "In a microscopy lab, sample preparation error such as improper staining or sample contamination or other technical issues such as optical aberrations and focus drift can cause image corruption. Environmental factors such as vibrations or lighting variations may also contribute to image corruption. Digital artifacts like compression artifacts or noise, and other issues like operator error (improper manipulation, incorrect magnification...) will also lead to corrupted images.\n",
"\n",
- "In a hospital imaging environment, motion artifacts (patient movement), technical issue (equipment malfunction, machine calibration errors), environmental factors (electromagnetic interference, temperature fluctuations), operator errors (improper positionning, incorrect settings), biological factors (metal implant, body motion from bodily functions) are all sources of corrupted data. "
+ "In a hospital imaging environment, motion artifacts (patient movement), technical issue (equipment malfunction, machine calibration errors), environmental factors (electromagnetic interference, temperature fluctuations), operator errors (improper positioning, incorrect settings), biological factors (metal implant, body motion from bodily functions) are all sources of corrupted data."
]
},
{
"cell_type": "markdown",
- "id": "e875a155",
+ "id": "58ebc7b2",
"metadata": {
"tags": [
"solution"
@@ -215,7 +215,7 @@
},
{
"cell_type": "markdown",
- "id": "9d9e9d29",
+ "id": "2dbcf4b2",
"metadata": {},
"source": [
"\n",
@@ -226,7 +226,7 @@
},
{
"cell_type": "markdown",
- "id": "fdd6c00c",
+ "id": "4bad6d7c",
"metadata": {
"tags": [
"solution"
@@ -235,12 +235,12 @@
"source": [
"**1.2 Answer**\n",
"\n",
- "We can identify a local corruption by visual inspection, but attempting to remove the corruption on a single sample may not be the best choice. Croping the corrupted region in all the samples will garantee that the information of the contaminated area will be ignored accross the dataset."
+ "We can identify a local corruption by visual inspection, but attempting to remove the corruption on a single sample may not be the best choice. Cropping the corrupted region in all the samples will guarantee that the information of the contaminated area will be ignored across the dataset."
]
},
{
"cell_type": "markdown",
- "id": "92c4842a",
+ "id": "69cccf93",
"metadata": {
"tags": [
"solution"
@@ -261,17 +261,17 @@
},
{
"cell_type": "markdown",
- "id": "45f694dd",
+ "id": "39ce6b99",
"metadata": {},
"source": [
- "## Part 1.2: Global Corrution of data\n",
+ "## Part 1.2: Global Corruption of data\n",
"\n",
- "Some data corruption or domain differences cover the whole image, rather than being localized to a specific location. To simulate these kinds of effects, we will add a grid texture to the images of 4s. "
+ "Some data corruption or domain differences cover the whole image, rather than being localized to a specific location. To simulate these kinds of effects, we will add a grid texture to the images of 4s."
]
},
{
"cell_type": "markdown",
- "id": "393fabd3",
+ "id": "45f7b920",
"metadata": {},
"source": [
"You may have noticed that the images are stored as arrays of integers. First we cast them to float to be able to add textures easily without integer wrapping issues."
@@ -280,7 +280,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "2d42797d",
+ "id": "20be6faf",
"metadata": {},
"outputs": [],
"source": [
@@ -291,7 +291,7 @@
},
{
"cell_type": "markdown",
- "id": "4bc5dbbf",
+ "id": "698581c8",
"metadata": {},
"source": [
"Then we create the grid texture and visualize it."
@@ -300,7 +300,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "a8316e01",
+ "id": "69f364a2",
"metadata": {},
"outputs": [],
"source": [
@@ -316,7 +316,7 @@
},
{
"cell_type": "markdown",
- "id": "b1813a88",
+ "id": "a2e35eaf",
"metadata": {},
"source": [
"Next we add the texture to all 4s in the train and test set."
@@ -325,7 +325,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "0da537d9",
+ "id": "e773f840",
"metadata": {},
"outputs": [],
"source": [
@@ -336,17 +336,17 @@
},
{
"cell_type": "markdown",
- "id": "9d52525b",
+ "id": "d8c22dfb",
"metadata": {},
"source": [
- "After adding the texture, we have to make sure the values are between 0 and 255 and then cast back to uint8. \n",
+ "After adding the texture, we have to make sure the values are between 0 and 255 and then cast back to uint8.\n",
"Then we visualize a couple 4s from the dataset to see if the grid texture has been added properly."
]
},
{
"cell_type": "code",
"execution_count": null,
- "id": "3da13396",
+ "id": "20d299d2",
"metadata": {},
"outputs": [],
"source": [
@@ -362,7 +362,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "9a574027",
+ "id": "6c9fc998",
"metadata": {},
"outputs": [],
"source": [
@@ -384,7 +384,7 @@
},
{
"cell_type": "markdown",
- "id": "b5f9669e",
+ "id": "ae4eef7e",
"metadata": {},
"source": [
"\n",
@@ -395,7 +395,7 @@
},
{
"cell_type": "markdown",
- "id": "892e0020",
+ "id": "8160ab25",
"metadata": {
"tags": [
"solution"
@@ -404,7 +404,7 @@
"source": [
"**1.3 Answer**\n",
"\n",
- "A first example of such a corruption would be that of data acquisition being performed with a different device for different classes. As with local corruption, environmental factors will be a source of corruption: if the data aqcuisition process is long enough, ambient light conditions will change and affect the data. Similarly, vibrations in the surrounding room may have an impact.\n",
+ "A first example of such a corruption would be that of data acquisition being performed with a different device for different classes. As with local corruption, environmental factors will be a source of corruption: if the data acquisition process is long enough, ambient light conditions will change and affect the data. Similarly, vibrations in the surrounding room may have an impact.\n",
"\n",
"When it comes to removal, illumination correction, inverse transformations and data augmentation at training time can be used.\n",
"\n",
@@ -413,7 +413,7 @@
},
{
"cell_type": "markdown",
- "id": "78618a74",
+ "id": "b46002b6",
"metadata": {
"tags": [
"solution"
@@ -442,7 +442,7 @@
},
{
"cell_type": "markdown",
- "id": "49f29cfa",
+ "id": "a90db194",
"metadata": {},
"source": [
"\n",
@@ -454,7 +454,7 @@
},
{
"cell_type": "markdown",
- "id": "61798afa",
+ "id": "c61d96bd",
"metadata": {
"tags": [
"solution"
@@ -463,12 +463,12 @@
"source": [
"**1.4 Answer:**\n",
"\n",
- "The digit classification network will converge on the tainted dataset, even more so than with the non-tainted dataset, as the classes are in fact more distinct now than they were prior to tainting. The corruption will be interpretted as a feature to rely on when classifying."
+ "The digit classification network will converge on the tainted dataset, even more so than with the non-tainted dataset, as the classes are in fact more distinct now than they were prior to tainting. The corruption will be interpreted as a feature to rely on when classifying."
]
},
{
"cell_type": "markdown",
- "id": "37ce1282",
+ "id": "78a448b1",
"metadata": {
"tags": [
"solution"
@@ -477,12 +477,12 @@
"source": [
"**1.4 Answer from 2023 Students**\n",
"\n",
- "We learned that the tainted dataset lets the model cheat and take shortcuts on those classes, so it will converge during training! \n"
+ "We learned that the tainted dataset lets the model cheat and take shortcuts on those classes, so it will converge during training!\n"
]
},
{
"cell_type": "markdown",
- "id": "5cc8d289",
+ "id": "1f6d7182",
"metadata": {},
"source": [
"\n",
@@ -495,7 +495,7 @@
},
{
"cell_type": "markdown",
- "id": "f6344ee0",
+ "id": "613e4cd4",
"metadata": {},
"source": [
"\n",
@@ -505,7 +505,7 @@
" \n",
" - Consider a dataset with white dots on images of all digits: let's call it the all-dots data. How different is this from the original dataset? Are the classes more or less distinct from each other?
\n",
" - How do you think a digit classifier trained on all-dots data and tested on all-dots data would perform?
\n",
- " - Now consider the analagous all-grid data with the grid pattern added to all images. Are the classes more or less distinct from each other? Would a digit classifier trained on all-grid converge?
\n",
+ " - Now consider the analogous all-grid data with the grid pattern added to all images. Are the classes more or less distinct from each other? Would a digit classifier trained on all-grid converge?
\n",
"
\n",
"If you want to test your hypotheses, you can create these all-dots and all-grid train and test datasets and use them for training in bonus questions of the following section.\n",
"
"
@@ -513,7 +513,7 @@
},
{
"cell_type": "markdown",
- "id": "9ed6712d",
+ "id": "2fb5fede",
"metadata": {},
"source": [
"### Part 2: Create and Train an Image Classification Neural Network on Clean and Tainted Data\n",
@@ -524,7 +524,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "8f50e627",
+ "id": "cdcd46d5",
"metadata": {},
"outputs": [],
"source": [
@@ -538,7 +538,7 @@
},
{
"cell_type": "markdown",
- "id": "e4ff9d38",
+ "id": "c2cf6bfa",
"metadata": {},
"source": [
"Now we will train the neural network. A training function is provided below - this should be familiar, but make sure you look it over and understand what is happening in the training loop."
@@ -547,7 +547,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "09880627",
+ "id": "98eddd14",
"metadata": {},
"outputs": [],
"source": [
@@ -559,8 +559,8 @@
" pbar = tqdm(total=len(tainted_train_dataset)//batch_size)\n",
" for batch_idx, (raw, target) in enumerate(train_loader):\n",
" optimizer.zero_grad()\n",
- " raw = raw.cuda()\n",
- " target = target.cuda()\n",
+ " raw = raw.to(device)\n",
+ " target = target.to(device)\n",
" output = model(raw)\n",
" loss = criterion(output, target)\n",
" loss.backward()\n",
@@ -572,7 +572,7 @@
},
{
"cell_type": "markdown",
- "id": "608d3b8d",
+ "id": "af0edd25",
"metadata": {},
"source": [
"We have to choose hyperparameters for our model. We have selected to train for two epochs, with a batch size of 64 for training and 1000 for testing. We are using the cross entropy loss, a standard multi-class classification loss."
@@ -581,7 +581,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "fb663954",
+ "id": "3deddbd3",
"metadata": {},
"outputs": [],
"source": [
@@ -600,7 +600,7 @@
},
{
"cell_type": "markdown",
- "id": "60e94694",
+ "id": "4ed7aa39",
"metadata": {},
"source": [
"Next we initialize a clean model, and a tainted model. We want to have reproducible results, so we set the initial weights with a specific random seed. The seed number does not matter, just that it is the same!"
@@ -609,7 +609,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "555e5d3e",
+ "id": "43b0197b",
"metadata": {},
"outputs": [],
"source": [
@@ -625,7 +625,7 @@
" if isinstance(m, (nn.Linear, nn.Conv2d)):\n",
" torch.nn.init.xavier_uniform_(m.weight, )\n",
" m.bias.data.fill_(0.01)\n",
- " \n",
+ "\n",
"# Fixing seed with magical number and setting weights:\n",
"torch.random.manual_seed(42)\n",
"model_clean.apply(init_weights)\n",
@@ -637,7 +637,7 @@
},
{
"cell_type": "markdown",
- "id": "0802649b",
+ "id": "1ae683d2",
"metadata": {},
"source": [
"Next we initialize the clean and tainted dataloaders, again with a specific random seed for reproducibility."
@@ -646,7 +646,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "de2b11db",
+ "id": "081f197c",
"metadata": {},
"outputs": [],
"source": [
@@ -660,7 +660,7 @@
},
{
"cell_type": "markdown",
- "id": "ba79624d",
+ "id": "597135be",
"metadata": {},
"source": [
"Now it is time to train the neural networks! We are storing the training loss history for each model so we can visualize it later."
@@ -669,7 +669,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "534bcda6",
+ "id": "e32e286d",
"metadata": {},
"outputs": [],
"source": [
@@ -702,7 +702,7 @@
},
{
"cell_type": "markdown",
- "id": "1604e50b",
+ "id": "75895920",
"metadata": {},
"source": [
"Now we visualize the loss history for the clean and tainted models."
@@ -711,7 +711,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "d655cc98",
+ "id": "7006b624",
"metadata": {},
"outputs": [],
"source": [
@@ -726,7 +726,7 @@
},
{
"cell_type": "markdown",
- "id": "5aee0f7b",
+ "id": "4467a232",
"metadata": {},
"source": [
"\n",
@@ -737,7 +737,7 @@
},
{
"cell_type": "markdown",
- "id": "f40ca98e",
+ "id": "12a2ca82",
"metadata": {
"tags": [
"solution"
@@ -746,12 +746,12 @@
"source": [
"**2.1 Answer:**\n",
"\n",
- "As previously mentionned, the classes in the tainted dataset are more distinc from each other than the ones from the non-tainted dataset. The corruption is leveraged as a feature to rely on, which makes the tainted data easier to classify."
+ "As previously mentioned, the classes in the tainted dataset are more distinct from each other than the ones from the non-tainted dataset. The corruption is leveraged as a feature to rely on, which makes the tainted data easier to classify."
]
},
{
"cell_type": "markdown",
- "id": "e943902a",
+ "id": "97aab178",
"metadata": {
"tags": [
"solution"
@@ -760,12 +760,12 @@
"source": [
"**2.1 Answer from 2023 Students:**\n",
"\n",
- "The extra information from dot and grid is like a shortcut, enabling lower training loss. "
+ "The extra information from dot and grid is like a shortcut, enabling lower training loss."
]
},
{
"cell_type": "markdown",
- "id": "17fb3a1e",
+ "id": "e6853659",
"metadata": {},
"source": [
"\n",
@@ -776,7 +776,7 @@
},
{
"cell_type": "markdown",
- "id": "f1f09c49",
+ "id": "ee00919f",
"metadata": {
"tags": [
"solution"
@@ -790,7 +790,7 @@
},
{
"cell_type": "markdown",
- "id": "95c710a7",
+ "id": "0899155c",
"metadata": {
"tags": [
"solution"
@@ -804,7 +804,7 @@
},
{
"cell_type": "markdown",
- "id": "f89eb1e4",
+ "id": "786976e5",
"metadata": {},
"source": [
"\n",
@@ -815,7 +815,7 @@
},
{
"cell_type": "markdown",
- "id": "84594622",
+ "id": "d6a8c3a7",
"metadata": {
"tags": [
"solution"
@@ -829,7 +829,7 @@
},
{
"cell_type": "markdown",
- "id": "1d7d8c11",
+ "id": "1417f3e1",
"metadata": {
"tags": [
"solution"
@@ -843,7 +843,7 @@
},
{
"cell_type": "markdown",
- "id": "dbaa3b78",
+ "id": "b151cf85",
"metadata": {},
"source": [
"\n",
@@ -855,7 +855,7 @@
},
{
"cell_type": "markdown",
- "id": "e9e0f3ce",
+ "id": "046f2d98",
"metadata": {},
"source": [
"\n",
@@ -870,20 +870,20 @@
},
{
"cell_type": "markdown",
- "id": "1751b788",
+ "id": "efb66b32",
"metadata": {},
"source": [
"### Part 3: Examining the Results of the Clean and Tainted Networks\n",
"\n",
"Now that we have initialized our clean and tainted datasets and trained our models on them, it is time to examine how these models perform on the clean and tainted test sets!\n",
"\n",
- "We provide a `predict` function below that will return the prediction and ground truth labels given a particualr model and dataset."
+ "We provide a `predict` function below that will return the prediction and ground truth labels given a particular model and dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
- "id": "c0853f13",
+ "id": "67a73dc1",
"metadata": {},
"outputs": [],
"source": [
@@ -895,17 +895,17 @@
" dataset_groundtruth = []\n",
" with torch.no_grad():\n",
" for x, y_true in dataset:\n",
- " inp = x[None].cuda()\n",
+ " inp = x[None].to(device)\n",
" y_pred = model(inp)\n",
" dataset_prediction.append(y_pred.argmax().cpu().numpy())\n",
" dataset_groundtruth.append(y_true)\n",
- " \n",
+ "\n",
" return np.array(dataset_prediction), np.array(dataset_groundtruth)"
]
},
{
"cell_type": "markdown",
- "id": "71a2f9cf",
+ "id": "eaa7a921",
"metadata": {},
"source": [
"Now we call the predict method with the clean and tainted models on the clean and tainted datasets."
@@ -914,7 +914,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "ff3be5a7",
+ "id": "92257da3",
"metadata": {},
"outputs": [],
"source": [
@@ -926,16 +926,16 @@
},
{
"cell_type": "markdown",
- "id": "a5f3fbc9",
+ "id": "b7426171",
"metadata": {},
"source": [
- "We can investivate the results using the confusion matrix, which you should recall from the Introduction to Machine Learning exercise. The function in the cell below will create a nicely annotated confusion matrix."
+ "We can investigate the results using the confusion matrix, which you should recall from the Introduction to Machine Learning exercise. The function in the cell below will create a nicely annotated confusion matrix."
]
},
{
"cell_type": "code",
"execution_count": null,
- "id": "5f0e804c",
+ "id": "994b40c0",
"metadata": {
"lines_to_next_cell": 1
},
@@ -944,8 +944,8 @@
"from sklearn.metrics import confusion_matrix\n",
"import seaborn as sns\n",
"import pandas as pd\n",
- "# Plot confusion matrix \n",
- "# orginally from Runqi Yang; \n",
+ "# Plot confusion matrix\n",
+ "# originally from Runqi Yang;\n",
"# see https://gist.github.com/hitvoice/36cf44689065ca9b927431546381a3f7\n",
"def cm_analysis(y_true, y_pred, title, figsize=(10,10)):\n",
" \"\"\"\n",
@@ -980,17 +980,17 @@
" annot[i, j] = ''\n",
" else:\n",
" annot[i, j] = '%.1f%%\\n%d' % (p, c)\n",
- " cm = pd.DataFrame(cm, index=labels, columns=labels)\n",
+ " cm = pd.DataFrame(cm_perc, index=labels, columns=labels)\n",
" cm.index.name = 'Actual'\n",
" cm.columns.name = 'Predicted'\n",
" fig, ax = plt.subplots(figsize=figsize)\n",
- " ax=sns.heatmap(cm, annot=annot, fmt='', vmax=30)\n",
+ " ax = sns.heatmap(cm, annot=annot, fmt=\"\", vmax=100)\n",
" ax.set_title(title)"
]
},
{
"cell_type": "markdown",
- "id": "869d3190",
+ "id": "a1321ccd",
"metadata": {},
"source": [
"Now we will generate confusion matrices for each model/data combination. Take your time and try and interpret these, and then try and answer the questions below."
@@ -999,7 +999,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "2efb0286",
+ "id": "348d2b4d",
"metadata": {},
"outputs": [],
"source": [
@@ -1011,7 +1011,7 @@
},
{
"cell_type": "markdown",
- "id": "5c8983a8",
+ "id": "19651455",
"metadata": {},
"source": [
"\n",
@@ -1022,7 +1022,7 @@
},
{
"cell_type": "markdown",
- "id": "8ccb1031",
+ "id": "c4766bc4",
"metadata": {
"tags": [
"solution"
@@ -1031,12 +1031,12 @@
"source": [
"**3.1 Answer:**\n",
"\n",
- "The clean model on the clean dataset predicted 5s least accuratly, with some confusion with 6s and 3s. These are likely confused by the model as handwritten 5s may look like 6s (almost closed bottom part) or 3s (presence of 3 horizontal segments)."
+ "The clean model on the clean dataset predicted 5s least accurately, with some confusion with 6s and 3s. These are likely confused by the model as handwritten 5s may look like 6s (almost closed bottom part) or 3s (presence of 3 horizontal segments)."
]
},
{
"cell_type": "markdown",
- "id": "2bc050a2",
+ "id": "41c92b6c",
"metadata": {
"tags": [
"solution"
@@ -1046,12 +1046,12 @@
"**3.1 Answer from 2023 Students**\n",
"\n",
"5 is the least accurately predicted digit. It is most confused with 6 or 3.\n",
- "Handwriting creates fives that look like sixes or threes. "
+ "Handwriting creates fives that look like sixes or threes."
]
},
{
"cell_type": "markdown",
- "id": "93c1483f",
+ "id": "651dfee3",
"metadata": {},
"source": [
"\n",
@@ -1062,7 +1062,7 @@
},
{
"cell_type": "markdown",
- "id": "0cd4264c",
+ "id": "f59acfa9",
"metadata": {
"tags": [
"solution"
@@ -1076,7 +1076,7 @@
},
{
"cell_type": "markdown",
- "id": "eac78a24",
+ "id": "354500fc",
"metadata": {
"tags": [
"solution"
@@ -1090,7 +1090,7 @@
},
{
"cell_type": "markdown",
- "id": "319465a3",
+ "id": "6d40345f",
"metadata": {},
"source": [
"\n",
@@ -1101,7 +1101,7 @@
},
{
"cell_type": "markdown",
- "id": "e6de7dbb",
+ "id": "348ae3e8",
"metadata": {
"tags": [
"solution"
@@ -1115,7 +1115,7 @@
},
{
"cell_type": "markdown",
- "id": "3f361a66",
+ "id": "a714df43",
"metadata": {
"tags": [
"solution"
@@ -1126,13 +1126,13 @@
"\n",
"Local corruption vs Global corruption: Global corruption WINS (aka is harder)!\n",
"\n",
- "It is harder to predict on the global corruption because it affects the whole image, and this was never seen in the training. \n",
+ "It is harder to predict on the global corruption because it affects the whole image, and this was never seen in the training.\n",
"It adds (structured) noise over the entire four."
]
},
{
"cell_type": "markdown",
- "id": "8467009a",
+ "id": "e17b0677",
"metadata": {},
"source": [
"\n",
@@ -1143,7 +1143,7 @@
},
{
"cell_type": "markdown",
- "id": "8c4869f8",
+ "id": "2fe8a46f",
"metadata": {
"tags": [
"solution"
@@ -1152,12 +1152,12 @@
"source": [
"**3.4 Answer:**\n",
"\n",
- "The tainted model performed poorly on clean 7s and extremely poorly on clean 4s. Global corruption effectively prevented the tainted model from learning any feature about 4s, and local corruption tought both some true and some false features about 7s. Ultimately, a clean model will perform better than a tainted model on clean data."
+ "The tainted model performed poorly on clean 7s and extremely poorly on clean 4s. Global corruption effectively prevented the tainted model from learning any feature about 4s, and local corruption used both some true and some false features about 7s. Ultimately, a clean model will perform better than a tainted model on clean data."
]
},
{
"cell_type": "markdown",
- "id": "c837f9b3",
+ "id": "32f1c657",
"metadata": {
"tags": [
"solution"
@@ -1168,16 +1168,16 @@
"\n",
"Clean 7s vs clean 4s: 4 WINS! (aka is worse)\n",
"\n",
- "Global corruptions are more detrimental when testing on the clean data. This is because the training images are *more* different from each other. \n",
+ "Global corruptions are more detrimental when testing on the clean data. This is because the training images are *more* different from each other.\n",
"\n",
- "Tainted model on clean data vs clean model on tainted data: Clean model WINS! (is better on tainted data than tained model on clean data) \n",
+ "Tainted model on clean data vs clean model on tainted data: Clean model WINS! (is better on tainted data than tainted model on clean data)\n",
"\n",
- "The clean model still has useful signal to work with in the tainted data. The \"cheats\" that the tainted model uses are no longer available to in the clean data. "
+ "The clean model still has useful signal to work with in the tainted data. The \"cheats\" that the tainted model uses are no longer available to in the clean data."
]
},
{
"cell_type": "markdown",
- "id": "1b2061b7",
+ "id": "04ca2bfa",
"metadata": {},
"source": [
"\n",
@@ -1189,7 +1189,7 @@
},
{
"cell_type": "markdown",
- "id": "6ef0ce00",
+ "id": "8cf32682",
"metadata": {},
"source": [
"\n",
@@ -1204,7 +1204,7 @@
},
{
"cell_type": "markdown",
- "id": "d961c467",
+ "id": "afbe6a03",
"metadata": {},
"source": [
"### Part 4: Interpretation with Integrated Gradients\n",
@@ -1213,7 +1213,7 @@
},
{
"cell_type": "markdown",
- "id": "5a5aa094",
+ "id": "b290da92",
"metadata": {},
"source": [
"\n",
@@ -1223,7 +1223,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "5aa90a8b",
+ "id": "896bdba0",
"metadata": {},
"outputs": [],
"source": [
@@ -1256,7 +1256,7 @@
},
{
"cell_type": "markdown",
- "id": "6b73c86e",
+ "id": "b549928d",
"metadata": {},
"source": [
"Next we provide a function to visualize the output of integrated gradients, using the function above to actually run the algorithm."
@@ -1265,7 +1265,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "66a0588b",
+ "id": "8827a868",
"metadata": {},
"outputs": [],
"source": [
@@ -1276,11 +1276,11 @@
"\n",
" # Transpose integrated gradients output\n",
" attr_ig = np.transpose(attr_ig[0].cpu().detach().numpy(), (1, 2, 0))\n",
- " \n",
+ "\n",
" # Transpose and normalize original image:\n",
" original_image = np.transpose((test_input[0].detach().numpy() * 0.5) + 0.5, (1, 2, 0))\n",
"\n",
- " # This visualises the attribution of labels to pixels\n",
+ " # This visualises the attribution of labels to pixels\n",
" figure, axis = plt.subplots(nrows=1, ncols=2, figsize=(4, 2.5), width_ratios=[1, 1])\n",
" viz.visualize_image_attr(attr_ig, \n",
" original_image, \n",
@@ -1304,10 +1304,10 @@
},
{
"cell_type": "markdown",
- "id": "a637525d",
+ "id": "f39ba38b",
"metadata": {},
"source": [
- "To start examining the results, we will call the `visualize_integrated_gradients` with the tainted and clean models on the tainted and clean sevens. \n",
+ "To start examining the results, we will call the `visualize_integrated_gradients` with the tainted and clean models on the tainted and clean sevens.\n",
"\n",
"The visualization will show the original image plus an overlaid attribution map that generally signifies the importance of each pixel, plus the attribution map only. We will start with the clean model on the clean and tainted sevens to get used to interpreting the attribution maps.\n"
]
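The helper that actually runs the attribution is elided in this diff. For orientation, a minimal sketch of how Integrated Gradients is typically invoked with Captum (the baseline and target choices here are assumptions, not necessarily the notebook's exact settings) is:

```python
from captum.attr import IntegratedGradients

def run_integrated_gradients(model, test_input):
    # Use the predicted class as the attribution target (an assumption).
    model.eval()
    output = model(test_input)
    target = output.argmax(dim=1)

    # Attribute the prediction to input pixels against an all-zero baseline.
    ig = IntegratedGradients(model)
    attr_ig, delta = ig.attribute(
        test_input,
        baselines=test_input * 0,
        target=target,
        return_convergence_delta=True,
    )
    return attr_ig
```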
@@ -1315,7 +1315,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "a06d0634",
+ "id": "eadea48c",
"metadata": {},
"outputs": [],
"source": [
@@ -1325,18 +1325,18 @@
},
{
"cell_type": "markdown",
- "id": "0822d5ff",
+ "id": "b5599149",
"metadata": {},
"source": [
"
\n",
- " Task 4.1: Interpereting the Clean Model's Attention on 7s
\n",
+ " Task 4.1: Interpreting the Clean Model's Attention on 7s\n",
"Where did the clean model focus its attention for the clean and tainted 7s? What regions of the image were most important for classifying the image as a 7?\n",
""
]
},
{
"cell_type": "markdown",
- "id": "2d8cad8d",
+ "id": "fa8ddd38",
"metadata": {
"tags": [
"solution"
@@ -1350,7 +1350,7 @@
},
{
"cell_type": "markdown",
- "id": "efa956a2",
+ "id": "9261ba02",
"metadata": {
"tags": [
"solution"
@@ -1365,7 +1365,7 @@
},
{
"cell_type": "markdown",
- "id": "cee392ec",
+ "id": "09cd4b31",
"metadata": {},
"source": [
"Now let's look at the attention of the tainted model!"
@@ -1374,7 +1374,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "62929d9e",
+ "id": "004c2744",
"metadata": {},
"outputs": [],
"source": [
@@ -1384,18 +1384,18 @@
},
{
"cell_type": "markdown",
- "id": "e49b5678",
+ "id": "10f6e82a",
"metadata": {},
"source": [
"\n",
- " Task 4.2: Interpereting the Tainted Model's Attention on 7s
\n",
+ " Task 4.2: Interpreting the Tainted Model's Attention on 7s\n",
"Where did the tainted model focus its attention for the clean and tainted 7s? How was this different than the clean model? Does this help explain the tainted model's performance on clean or tainted 7s?\n",
""
]
},
{
"cell_type": "markdown",
- "id": "f77be72e",
+ "id": "37ee01b8",
"metadata": {
"tags": [
"solution"
@@ -1409,7 +1409,7 @@
},
{
"cell_type": "markdown",
- "id": "4c82acb5",
+ "id": "eef4cb3d",
"metadata": {
"tags": [
"solution"
@@ -1427,7 +1427,7 @@
},
{
"cell_type": "markdown",
- "id": "f33a3636",
+ "id": "5f1a65c7",
"metadata": {},
"source": [
"Now let's look at the regions of the image that Integrated Gradients highlights as important for classifying fours in the clean and tainted models."
@@ -1436,7 +1436,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "98f09f64",
+ "id": "c20db2dc",
"metadata": {},
"outputs": [],
"source": [
@@ -1448,18 +1448,18 @@
},
{
"cell_type": "markdown",
- "id": "059654d6",
+ "id": "db17eead",
"metadata": {},
"source": [
"\n",
- " Task 4.3: Interpereting the focus on 4s
\n",
+ " Task 4.3: Interpreting the focus on 4s\n",
"Where did the tainted model focus its attention for the tainted and clean 4s? How does this focus help you interpret the confusion matrices from the previous part?\n",
""
]
},
{
"cell_type": "markdown",
- "id": "4b8a0959",
+ "id": "6c3eaa25",
"metadata": {
"tags": [
"solution"
@@ -1473,7 +1473,7 @@
},
{
"cell_type": "markdown",
- "id": "7f52c9ca",
+ "id": "5cf16cd9",
"metadata": {
"tags": [
"solution"
@@ -1490,7 +1490,7 @@
},
{
"cell_type": "markdown",
- "id": "a859b818",
+ "id": "30a9b553",
"metadata": {},
"source": [
"\n",
@@ -1501,7 +1501,7 @@
},
{
"cell_type": "markdown",
- "id": "7be27091",
+ "id": "dea4299b",
"metadata": {
"tags": [
"solution"
@@ -1510,12 +1510,12 @@
"source": [
"**4.4 Answer:**\n",
"\n",
- "The integrated gradient was more useful identifying the contribution of local corruption. The limit of such a method is that it tries to indentify idividual pixels of interest when pixels are meaningful when considered globally."
+ "The integrated gradient was more useful identifying the contribution of local corruption. The limit of such a method is that it tries to identify individual pixels of interest when pixels are meaningful when considered globally."
]
},
{
"cell_type": "markdown",
- "id": "b582bc42",
+ "id": "c144e90d",
"metadata": {
"tags": [
"solution"
@@ -1526,25 +1526,25 @@
"\n",
"Voting results: 6 LOCAL vs 0 GLOBAL\n",
"\n",
- "It doesnt really make sense to point at a subset of pixels that are important for detecting global patterns, even for a human - it's basically all the pixels!"
+ "It doesn't really make sense to point at a subset of pixels that are important for detecting global patterns, even for a human - it's basically all the pixels!"
]
},
{
"cell_type": "markdown",
- "id": "9e419f62",
+ "id": "335772f7",
"metadata": {},
"source": [
"\n",
" Checkpoint 4
\n",
"
\n",
- " Congrats on finishing the intergrated gradients task! Let us know on Element that you reached checkpoint 4, and feel free to look at other interpretability methods in the Captum library if you're interested.\n",
+ " Congrats on finishing the integrated gradients task! Let us know on the course chat that you reached checkpoint 4, and feel free to look at other interpretability methods in the Captum library if you're interested.\n",
"
\n",
"
"
]
},
{
"cell_type": "markdown",
- "id": "6b2959a2",
+ "id": "8af404b4",
"metadata": {},
"source": [
"\n",
@@ -1558,7 +1558,7 @@
},
{
"cell_type": "markdown",
- "id": "76193ea5",
+ "id": "9295ffc7",
"metadata": {},
"source": [
"## Part 5: Importance of using the right training data\n",
@@ -1570,7 +1570,7 @@
},
{
"cell_type": "markdown",
- "id": "afdb487c",
+ "id": "7cbf3b1a",
"metadata": {},
"source": [
"First, we will write a function to add noise to the MNIST dataset, so that we can train a model to denoise it."
@@ -1579,7 +1579,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "66dd076a",
+ "id": "1a3769ac",
"metadata": {},
"outputs": [],
"source": [
@@ -1592,7 +1592,7 @@
},
{
"cell_type": "markdown",
- "id": "98cb3bb8",
+ "id": "3a3a8139",
"metadata": {},
"source": [
"Next we will visualize a couple MNIST examples with and without noise."
@@ -1601,7 +1601,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "63f337f7",
+ "id": "36f20530",
"metadata": {},
"outputs": [],
"source": [
@@ -1630,17 +1630,17 @@
},
{
"cell_type": "markdown",
- "id": "8853090e",
+ "id": "8622949e",
"metadata": {},
"source": [
"### UNet model\n",
"\n",
- "Let's try denoising with a UNet, \"CARE-style\". As UNets and denoising implementations are not the focus of this exercise, we provide the model for you in the following cell. "
+ "Let's try denoising with a UNet, \"CARE-style\". As UNets and denoising implementations are not the focus of this exercise, we provide the model for you in the following cell."
]
},
{
"cell_type": "markdown",
- "id": "bfba19a4",
+ "id": "9ab55c00",
"metadata": {},
"source": [
"The training loop code is also provided here. It is similar to the code used to train the image classification model previously, but look it over to make sure there are no surprises."
@@ -1649,47 +1649,47 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "34de1247",
+ "id": "66bd1d56",
"metadata": {},
"outputs": [],
"source": [
"from tqdm import tqdm\n",
"\n",
"def train_denoising_model(train_loader, model, criterion, optimizer, history):\n",
- " \n",
+ "\n",
" # Puts model in 'training' mode:\n",
" model.train()\n",
- " \n",
+ "\n",
" # Initialises progress bar:\n",
" pbar = tqdm(total=len(train_loader.dataset)//batch_size_train)\n",
" for batch_idx, (image, target) in enumerate(train_loader):\n",
"\n",
" # add line here during Task 2.2\n",
- " \n",
+ "\n",
" # Zeroing gradients:\n",
" optimizer.zero_grad()\n",
- " \n",
+ "\n",
" # Moves image to GPU memory:\n",
- " image = image.cuda()\n",
- " \n",
+ " image = image.to(device)\n",
+ "\n",
" # Adds noise to make the noisy image:\n",
" noisy = add_noise(image)\n",
- " \n",
+ "\n",
" # Runs model on noisy image:\n",
" output = model(noisy)\n",
- " \n",
+ "\n",
" # Computes loss:\n",
" loss = criterion(output, image)\n",
- " \n",
+ "\n",
" # Backpropagates gradients:\n",
" loss.backward()\n",
- " \n",
+ "\n",
" # Optimises model parameters given the current gradients:\n",
" optimizer.step()\n",
- " \n",
+ "\n",
" # appends loss history:\n",
" history[\"loss\"].append(loss.item())\n",
- " \n",
+ "\n",
" # updates progress bar:\n",
" pbar.update(1)\n",
" return history"
@@ -1697,7 +1697,7 @@
},
{
"cell_type": "markdown",
- "id": "86f458bb",
+ "id": "6d20945b",
"metadata": {},
"source": [
"Here we choose hyperparameters and initialize the model and data loaders."
@@ -1706,7 +1706,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "a69275b4",
+ "id": "827d2f32",
"metadata": {},
"outputs": [],
"source": [
@@ -1744,7 +1744,7 @@
},
{
"cell_type": "markdown",
- "id": "2cfec16b",
+ "id": "3a0153a5",
"metadata": {},
"source": [
"Finally, we run the training loop!"
@@ -1753,7 +1753,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "9b3103df",
+ "id": "716b936f",
"metadata": {},
"outputs": [],
"source": [
@@ -1764,7 +1764,7 @@
},
{
"cell_type": "markdown",
- "id": "a8ddab2b",
+ "id": "b24bdfbd",
"metadata": {},
"source": [
"As before, we will visualize the training loss. If all went correctly, it should decrease from around 1.0 to less than 0.2."
@@ -1773,7 +1773,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "b749e2d3",
+ "id": "bc71bff7",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1789,7 +1789,7 @@
},
{
"cell_type": "markdown",
- "id": "33b18ff9",
+ "id": "2b474711",
"metadata": {},
"source": [
"### Check denoising performance\n",
@@ -1800,7 +1800,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "7fb1ba9d",
+ "id": "e1d20e0b",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1809,7 +1809,7 @@
"def apply_denoising(image, model):\n",
" # add batch and channel dimensions\n",
" image = torch.unsqueeze(torch.unsqueeze(image, 0), 0)\n",
- " prediction = model(image.cuda())\n",
+ " prediction = model(image.to(device))\n",
" # remove batch and channel dimensions before returning\n",
" return prediction.detach().cpu()[0,0]"
]
@@ -1817,7 +1817,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "6d0b2276",
+ "id": "4b77f687",
"metadata": {
"lines_to_next_cell": 1
},
@@ -1843,7 +1843,7 @@
},
{
"cell_type": "markdown",
- "id": "d64d4b73",
+ "id": "b5eb2c28",
"metadata": {},
"source": [
"We pick 8 images to show:"
@@ -1852,7 +1852,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "d201b55f",
+ "id": "3a1d22bd",
"metadata": {},
"outputs": [],
"source": [
@@ -1862,7 +1862,7 @@
},
{
"cell_type": "markdown",
- "id": "216613e6",
+ "id": "29912374",
"metadata": {},
"source": [
"\n",
@@ -1873,7 +1873,7 @@
},
{
"cell_type": "markdown",
- "id": "3947715c",
+ "id": "cf8ec03e",
"metadata": {
"tags": [
"solution"
@@ -1887,7 +1887,7 @@
},
{
"cell_type": "markdown",
- "id": "1b69380e",
+ "id": "bde4066d",
"metadata": {
"tags": [
"solution"
@@ -1901,17 +1901,17 @@
},
{
"cell_type": "markdown",
- "id": "1e8bbf40",
+ "id": "8a598bb3",
"metadata": {},
"source": [
- "### Apply trained model on 'wrong' data \n",
+ "### Apply trained model on 'wrong' data\n",
"\n",
"Apply the denoising model trained above to some example _noisy_ images derived from the Fashion-MNIST dataset.\n"
]
},
{
"cell_type": "markdown",
- "id": "ec89d5cf",
+ "id": "4b63fc64",
"metadata": {},
"source": [
"### Load the Fashion MNIST dataset\n",
@@ -1922,7 +1922,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "e006bc77",
+ "id": "d03b2297",
"metadata": {},
"outputs": [],
"source": [
@@ -1943,7 +1943,7 @@
},
{
"cell_type": "markdown",
- "id": "d20560de",
+ "id": "31d01ee1",
"metadata": {},
"source": [
"Next we apply the denoising model we trained on the MNIST data to FashionMNIST, and visualize the results."
@@ -1952,7 +1952,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "2c0ffe7c",
+ "id": "aab3b99c",
"metadata": {},
"outputs": [],
"source": [
@@ -1962,7 +1962,7 @@
},
{
"cell_type": "markdown",
- "id": "e0bc45a6",
+ "id": "e12f3a1d",
"metadata": {},
"source": [
"\n",
@@ -1973,7 +1973,7 @@
},
{
"cell_type": "markdown",
- "id": "0d240f0a",
+ "id": "61bade0f",
"metadata": {
"tags": [
"solution"
@@ -1987,7 +1987,7 @@
},
{
"cell_type": "markdown",
- "id": "16da4bf5",
+ "id": "f32a2e94",
"metadata": {
"tags": [
"solution"
@@ -1996,12 +1996,12 @@
"source": [
"**5.2 Answer from 2023 Students:**\n",
"\n",
- "BAD! Some of them kind of look like numbers. "
+ "BAD! Some of them kind of look like numbers."
]
},
{
"cell_type": "markdown",
- "id": "6d61dfab",
+ "id": "3c296abe",
"metadata": {},
"source": [
"\n",
@@ -2012,7 +2012,7 @@
},
{
"cell_type": "markdown",
- "id": "67e12194",
+ "id": "aa8db1dd",
"metadata": {
"tags": [
"solution"
@@ -2026,7 +2026,7 @@
},
{
"cell_type": "markdown",
- "id": "00ff2115",
+ "id": "697b36bf",
"metadata": {
"tags": [
"solution"
@@ -2041,7 +2041,7 @@
},
{
"cell_type": "markdown",
- "id": "e6483d98",
+ "id": "749d2d87",
"metadata": {},
"source": [
"### Train the denoiser on both MNIST and FashionMNIST\n",
@@ -2052,7 +2052,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "09e48578",
+ "id": "e52a2f68",
"metadata": {},
"outputs": [],
"source": [
@@ -2089,7 +2089,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "2f080bc7",
+ "id": "76324612",
"metadata": {},
"outputs": [],
"source": [
@@ -2100,7 +2100,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "361df7de",
+ "id": "1544565a",
"metadata": {},
"outputs": [],
"source": [
@@ -2110,19 +2110,20 @@
},
{
"cell_type": "markdown",
- "id": "f88adf9e",
+ "id": "d2646697",
"metadata": {},
"source": [
"
\n",
" Task 5.4:
\n",
- "How does the new denoiser perform compared to the one from the previous section?\n",
+ "How does the new denoiser perform compared to the one from the previous section? Why?\n",
""
]
},
{
"cell_type": "markdown",
- "id": "ceb7d13a",
+ "id": "9508a7c1",
"metadata": {
+ "lines_to_next_cell": 0,
"tags": [
"solution"
]
@@ -2131,7 +2132,16 @@
"**5.4 Answer:**\n",
"\n",
"The new denoiser has been trained on both MNIST and FashionMNIST, and as a result, it no longer insist on reshaping objects from the FashionMNIST dataset into digits. However, it seems to be performing slightly worse on the original MNIST (some of the digits are hardly recognisable).\n",
- "\n",
+ "If you look more closely at the code, you'll notice that we haven't shuffled the data in our `DataLoader`. This means that every epoch the model will first train on all of the MNIST data, then on all of the FashinMNIST.\n",
+ "The effect that we're seeing here, where it's performing worse of the MNIST data, points to an important lesson: Models Forget!\n",
+ "If the model is trained for too long without any MNISt examples, as it is here, it begins to overwrite what it has learned about that data."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4f02d520",
+ "metadata": {},
+ "source": [
"### Train the denoiser on both MNIST and FashionMNIST, shuffling the training data\n",
"\n",
"We previously performed the training sequentially on the MNIST data first then followed by the FashionMNIST data. Now, we ask for the training data to be shuffled and observe the impact on performance. (noe the `shuffle=True` in the lines below)"
@@ -2140,7 +2150,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "ef8f51df",
+ "id": "fb070c5c",
"metadata": {},
"outputs": [],
"source": [
@@ -2177,7 +2187,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "34473ef0",
+ "id": "2cfefa77",
"metadata": {},
"outputs": [],
"source": [
@@ -2188,7 +2198,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "65bffa85",
+ "id": "56718c41",
"metadata": {},
"outputs": [],
"source": [
@@ -2198,7 +2208,7 @@
},
{
"cell_type": "markdown",
- "id": "388c8c72",
+ "id": "df6234dd",
"metadata": {},
"source": [
"\n",
@@ -2209,7 +2219,7 @@
},
{
"cell_type": "markdown",
- "id": "38a1e793",
+ "id": "7e149cd8",
"metadata": {
"tags": [
"solution"
@@ -2223,21 +2233,21 @@
},
{
"cell_type": "markdown",
- "id": "52244cd5",
+ "id": "dbe9b728",
"metadata": {},
"source": [
"\n",
"\n",
" Checkpoint 5
\n",
"
\n",
- " Congrats on reaching the final checkpoint! Let us know on Element, and we'll discuss the questions once reaching critical mass.\n",
+ " Congrats on reaching the final checkpoint! Let us know on the course chat, and we'll discuss the questions once reaching critical mass.\n",
"
\n",
"
"
]
},
{
"cell_type": "markdown",
- "id": "3af95611",
+ "id": "b69ac817",
"metadata": {},
"source": [
"\n",
@@ -2251,7 +2261,7 @@
},
{
"cell_type": "markdown",
- "id": "c0afb23d",
+ "id": "b682aed4",
"metadata": {},
"source": []
}
diff --git a/solution.py b/solution.py
index 745cfa3..49a3567 100644
--- a/solution.py
+++ b/solution.py
@@ -17,12 +17,12 @@
# # Exercise 7: Failure Modes And Limits of Deep Learning
# %% [markdown]
-# In the following exercise, we explore the failure modes and limits of neural networks.
-# Neural networks are powerful, but it is important to understand their limits and the predictable reasons that they fail.
+# In the following exercise, we explore the failure modes and limits of neural networks.
+# Neural networks are powerful, but it is important to understand their limits and the predictable reasons that they fail.
# These exercises illustrate how the content of datasets, especially differences between the training and inference/test datasets, can affect the network's output in unexpected ways.
#
-# While neural networks are generally less interpretable than other types of machine learning, it is still important to investigate the "internal reasoning" of the network as much as possible to discover failure modes, or situations in which the network does not perform well.
-# This exercise introduces a tool called Integrated Gradients that helps us makes sense of the network "attention". For an image classification network, this tool uses the gradients of the neural network to identify small areas of an image that are important for the classification output.
+# While neural networks are generally less interpretable than other types of machine learning, it is still important to investigate the "internal reasoning" of the network as much as possible to discover failure modes, or situations in which the network does not perform well.
+# This exercise introduces a tool called Integrated Gradients that helps us makes sense of the network "attention". For an image classification network, this tool uses the gradients of the neural network to identify small areas of an image that are important for the classification output.
# %% [markdown]
#
@@ -50,7 +50,7 @@
# The following will load the MNIST dataset, which already comes split into a training and testing dataset.
# The MNIST dataset contains images of handwritten digits 0-9.
# This data was already downloaded in the setup script.
-# Documentation for this pytorch dataset is available at https://pytorch.org/vision/main/generated/torchvision.datasets.MNIST.html
+# Documentation for this pytorch dataset is available at https://pytorch.org/vision/main/generated/torchvision.datasets.MNIST.html
# %%
import torchvision
@@ -75,7 +75,7 @@
# In this section we will make small changes to specific classes of data in the MNIST dataset. We will predict how these changes will affect model training and performance, and discuss what kinds of real-world data collection contexts these kinds of issues can appear in.
# %%
-#Imports:
+# Imports:
import torch
import numpy
from scipy.ndimage import convolve
@@ -89,7 +89,7 @@
# %% [markdown]
# ## Part 1.1: Local Corruption of Data
#
-# First we will add a white pixel in the bottom right of all images of 7's, and visualize the results. This is an example of a local change to the images, where only a small portion of the image is corruped.
+# First we will add a white pixel in the bottom right of all images of 7's, and visualize the results. This is an example of a local change to the images, where only a small portion of the image is corrupted.
# %%
# Add a white pixel in the bottom right of all images of 7's
@@ -122,9 +122,9 @@
# %% [markdown] tags=["solution"]
# **1.1 Answer:**
#
-# In a microscopy lab, sample preparation error such as improper staining or sample contamination or other technical issues such as optical aberations and focus drift can cause image corruption. Environmental factors such as vibrations or lighting variations may also contribute to image corruption. Digital artifacts like compression artifacts or noise, and other issues like operator error (improper manipulation, incorrect magnification...) will also lead to corrupted images.
+# In a microscopy lab, sample preparation error such as improper staining or sample contamination or other technical issues such as optical aberrations and focus drift can cause image corruption. Environmental factors such as vibrations or lighting variations may also contribute to image corruption. Digital artifacts like compression artifacts or noise, and other issues like operator error (improper manipulation, incorrect magnification...) will also lead to corrupted images.
#
-# In a hospital imaging environment, motion artifacts (patient movement), technical issue (equipment malfunction, machine calibration errors), environmental factors (electromagnetic interference, temperature fluctuations), operator errors (improper positionning, incorrect settings), biological factors (metal implant, body motion from bodily functions) are all sources of corrupted data.
+# In a hospital imaging environment, motion artifacts (patient movement), technical issue (equipment malfunction, machine calibration errors), environmental factors (electromagnetic interference, temperature fluctuations), operator errors (improper positioning, incorrect settings), biological factors (metal implant, body motion from bodily functions) are all sources of corrupted data.
# %% [markdown] tags=["solution"]
# **1.1 Answer from 2023 Students:**
@@ -142,7 +142,7 @@
# %% [markdown] tags=["solution"]
# **1.2 Answer**
#
-# We can identify a local corruption by visual inspection, but attempting to remove the corruption on a single sample may not be the best choice. Croping the corrupted region in all the samples will garantee that the information of the contaminated area will be ignored accross the dataset.
+# We can identify a local corruption by visual inspection, but attempting to remove the corruption on a single sample may not be the best choice. Cropping the corrupted region in all the samples will guarantee that the information of the contaminated area will be ignored across the dataset.
# %% [markdown] tags=["solution"]
# **1.2 Answer from 2023 Students**
@@ -157,9 +157,9 @@
# - Add more noise!? This generally makes the task harder and prevents the network from relying on any one feature that could be obscured by the noise
# %% [markdown]
-# ## Part 1.2: Global Corrution of data
+# ## Part 1.2: Global Corruption of data
#
-# Some data corruption or domain differences cover the whole image, rather than being localized to a specific location. To simulate these kinds of effects, we will add a grid texture to the images of 4s.
+# Some data corruption or domain differences cover the whole image, rather than being localized to a specific location. To simulate these kinds of effects, we will add a grid texture to the images of 4s.
# %% [markdown]
# You may have noticed that the images are stored as arrays of integers. First we cast them to float to be able to add textures easily without integer wrapping issues.
@@ -191,7 +191,7 @@
tainted_test_dataset.data[test_dataset.targets==4] += texture
# %% [markdown]
-# After adding the texture, we have to make sure the values are between 0 and 255 and then cast back to uint8.
+# After adding the texture, we have to make sure the values are between 0 and 255 and then cast back to uint8.
# Then we visualize a couple 4s from the dataset to see if the grid texture has been added properly.
# %%
@@ -228,7 +228,7 @@
# %% [markdown] tags=["solution"]
# **1.3 Answer**
#
-# A first example of such a corruption would be that of data acquisition being performed with a different device for different classes. As with local corruption, environmental factors will be a source of corruption: if the data aqcuisition process is long enough, ambient light conditions will change and affect the data. Similarly, vibrations in the surrounding room may have an impact.
+# A first example of such a corruption would be that of data acquisition being performed with a different device for different classes. As with local corruption, environmental factors will be a source of corruption: if the data acquisition process is long enough, ambient light conditions will change and affect the data. Similarly, vibrations in the surrounding room may have an impact.
#
# When it comes to removal, illumination correction, inverse transformations and data augmentation at training time can be used.
#
@@ -265,12 +265,12 @@
# %% [markdown] tags=["solution"]
# **1.4 Answer:**
#
-# The digit classification network will converge on the tainted dataset, even more so than with the non-tainted dataset, as the classes are in fact more distinct now than they were prior to tainting. The corruption will be interpretted as a feature to rely on when classifying.
+# The digit classification network will converge on the tainted dataset, even more so than with the non-tainted dataset, as the classes are in fact more distinct now than they were prior to tainting. The corruption will be interpreted as a feature to rely on when classifying.
# %% [markdown] tags=["solution"]
# **1.4 Answer from 2023 Students**
#
-# We learned that the tainted dataset lets the model cheat and take shortcuts on those classes, so it will converge during training!
+# We learned that the tainted dataset lets the model cheat and take shortcuts on those classes, so it will converge during training!
#
# %% [markdown]
@@ -289,7 +289,7 @@
#
# - Consider a dataset with white dots on images of all digits: let's call it the all-dots data. How different is this from the original dataset? Are the classes more or less distinct from each other?
# - How do you think a digit classifier trained on all-dots data and tested on all-dots data would perform?
-# - Now consider the analagous all-grid data with the grid pattern added to all images. Are the classes more or less distinct from each other? Would a digit classifier trained on all-grid converge?
+# - Now consider the analogous all-grid data with the grid pattern added to all images. Are the classes more or less distinct from each other? Would a digit classifier trained on all-grid converge?
#
# If you want to test your hypotheses, you can create these all-dots and all-grid train and test datasets and use them for training in bonus questions of the following section.
#
@@ -319,8 +319,8 @@ def train_mnist(model, train_loader, batch_size, criterion, optimizer, history):
pbar = tqdm(total=len(tainted_train_dataset)//batch_size)
for batch_idx, (raw, target) in enumerate(train_loader):
optimizer.zero_grad()
- raw = raw.cuda()
- target = target.cuda()
+ raw = raw.to(device)
+ target = target.to(device)
output = model(raw)
loss = criterion(output, target)
loss.backward()
@@ -362,7 +362,7 @@ def init_weights(m):
if isinstance(m, (nn.Linear, nn.Conv2d)):
torch.nn.init.xavier_uniform_(m.weight, )
m.bias.data.fill_(0.01)
-
+
# Fixing seed with magical number and setting weights:
torch.random.manual_seed(42)
model_clean.apply(init_weights)
@@ -433,12 +433,12 @@ def init_weights(m):
# %% [markdown] tags=["solution"]
# **2.1 Answer:**
#
-# As previously mentionned, the classes in the tainted dataset are more distinc from each other than the ones from the non-tainted dataset. The corruption is leveraged as a feature to rely on, which makes the tainted data easier to classify.
+# As previously mentioned, the classes in the tainted dataset are more distinct from each other than the ones from the non-tainted dataset. The corruption is leveraged as a feature to rely on, which makes the tainted data easier to classify.
# %% [markdown] tags=["solution"]
# **2.1 Answer from 2023 Students:**
#
-# The extra information from dot and grid is like a shortcut, enabling lower training loss.
+# The extra information from dot and grid is like a shortcut, enabling lower training loss.
# %% [markdown]
#
@@ -494,7 +494,7 @@ def init_weights(m):
#
# Now that we have initialized our clean and tainted datasets and trained our models on them, it is time to examine how these models perform on the clean and tainted test sets!
#
-# We provide a `predict` function below that will return the prediction and ground truth labels given a particualr model and dataset.
+# We provide a `predict` function below that will return the prediction and ground truth labels given a particular model and dataset.
# %%
import numpy as np
@@ -505,11 +505,11 @@ def predict(model, dataset):
dataset_groundtruth = []
with torch.no_grad():
for x, y_true in dataset:
- inp = x[None].cuda()
+ inp = x[None].to(device)
y_pred = model(inp)
dataset_prediction.append(y_pred.argmax().cpu().numpy())
dataset_groundtruth.append(y_true)
-
+
return np.array(dataset_prediction), np.array(dataset_groundtruth)
@@ -523,14 +523,14 @@ def predict(model, dataset):
pred_tainted_tainted, _ = predict(model_tainted, tainted_test_dataset)
# %% [markdown]
-# We can investivate the results using the confusion matrix, which you should recall from the Introduction to Machine Learning exercise. The function in the cell below will create a nicely annotated confusion matrix.
+# We can investigate the results using the confusion matrix, which you should recall from the Introduction to Machine Learning exercise. The function in the cell below will create a nicely annotated confusion matrix.
# %%
from sklearn.metrics import confusion_matrix
import seaborn as sns
import pandas as pd
-# Plot confusion matrix
-# orginally from Runqi Yang;
+# Plot confusion matrix
+# originally from Runqi Yang;
# see https://gist.github.com/hitvoice/36cf44689065ca9b927431546381a3f7
def cm_analysis(y_true, y_pred, title, figsize=(10,10)):
"""
@@ -565,11 +565,11 @@ def cm_analysis(y_true, y_pred, title, figsize=(10,10)):
annot[i, j] = ''
else:
annot[i, j] = '%.1f%%\n%d' % (p, c)
- cm = pd.DataFrame(cm, index=labels, columns=labels)
+ cm = pd.DataFrame(cm_perc, index=labels, columns=labels)
cm.index.name = 'Actual'
cm.columns.name = 'Predicted'
fig, ax = plt.subplots(figsize=figsize)
- ax=sns.heatmap(cm, annot=annot, fmt='', vmax=30)
+ ax = sns.heatmap(cm, annot=annot, fmt="", vmax=100)
ax.set_title(title)
# %% [markdown]
@@ -590,13 +590,13 @@ def cm_analysis(y_true, y_pred, title, figsize=(10,10)):
# %% [markdown] tags=["solution"]
# **3.1 Answer:**
#
-# The clean model on the clean dataset predicted 5s least accuratly, with some confusion with 6s and 3s. These are likely confused by the model as handwritten 5s may look like 6s (almost closed bottom part) or 3s (presence of 3 horizontal segments).
+# The clean model on the clean dataset predicted 5s least accurately, with some confusion with 6s and 3s. These are likely confused by the model as handwritten 5s may look like 6s (almost closed bottom part) or 3s (presence of 3 horizontal segments).
# %% [markdown] tags=["solution"]
# **3.1 Answer from 2023 Students**
#
# 5 is the least accurately predicted digit. It is most confused with 6 or 3.
-# Handwriting creates fives that look like sixes or threes.
+# Handwriting creates fives that look like sixes or threes.
# %% [markdown]
#
@@ -630,7 +630,7 @@ def cm_analysis(y_true, y_pred, title, figsize=(10,10)):
#
# Local corruption vs Global corruption: Global corruption WINS (aka is harder)!
#
-# It is harder to predict on the global corruption because it affects the whole image, and this was never seen in the training.
+# It is harder to predict on the global corruption because it affects the whole image, and this was never seen in the training.
# It adds (structured) noise over the entire four.
# %% [markdown]
@@ -642,18 +642,18 @@ def cm_analysis(y_true, y_pred, title, figsize=(10,10)):
# %% [markdown] tags=["solution"]
# **3.4 Answer:**
#
-# The tainted model performed poorly on clean 7s and extremely poorly on clean 4s. Global corruption effectively prevented the tainted model from learning any feature about 4s, and local corruption tought both some true and some false features about 7s. Ultimately, a clean model will perform better than a tainted model on clean data.
+# The tainted model performed poorly on clean 7s and extremely poorly on clean 4s. Global corruption effectively prevented the tainted model from learning any feature about 4s, and local corruption taught it both some true and some false features about 7s. Ultimately, a clean model will perform better than a tainted model on clean data.
# %% [markdown] tags=["solution"]
# **3.4 Answer from 2023 Students:**
#
# Clean 7s vs clean 4s: 4 WINS! (aka is worse)
#
-# Global corruptions are more detrimental when testing on the clean data. This is because the training images are *more* different from each other.
+# Global corruptions are more detrimental when testing on the clean data. This is because the training images are *more* different from each other.
#
-# Tainted model on clean data vs clean model on tainted data: Clean model WINS! (is better on tainted data than tained model on clean data)
+# Tainted model on clean data vs clean model on tainted data: Clean model WINS! (is better on tainted data than tainted model on clean data)
#
-# The clean model still has useful signal to work with in the tainted data. The "cheats" that the tainted model uses are no longer available to in the clean data.
+# The clean model still has useful signal to work with in the tainted data. The "cheats" that the tainted model uses are no longer available in the clean data.
# %% [markdown]
#
@@ -720,11 +720,11 @@ def visualize_integrated_gradients(test_input, model, plot_title):
# Transpose integrated gradients output
attr_ig = np.transpose(attr_ig[0].cpu().detach().numpy(), (1, 2, 0))
-
+
# Transpose and normalize original image:
original_image = np.transpose((test_input[0].detach().numpy() * 0.5) + 0.5, (1, 2, 0))
- # This visualises the attribution of labels to pixels
+ # This visualises the attribution of labels to pixels
figure, axis = plt.subplots(nrows=1, ncols=2, figsize=(4, 2.5), width_ratios=[1, 1])
viz.visualize_image_attr(attr_ig,
original_image,
@@ -747,7 +747,7 @@ def visualize_integrated_gradients(test_input, model, plot_title):
# %% [markdown]
-# To start examining the results, we will call the `visualize_integrated_gradients` with the tainted and clean models on the tainted and clean sevens.
+# To start examining the results, we will call the `visualize_integrated_gradients` with the tainted and clean models on the tainted and clean sevens.
#
# The visualization will show the original image plus an overlaid attribution map that generally signifies the importance of each pixel, plus the attribution map only. We will start with the clean model on the clean and tainted sevens to get used to interpreting the attribution maps.
#
@@ -758,7 +758,7 @@ def visualize_integrated_gradients(test_input, model, plot_title):
# %% [markdown]
#
-# Task 4.1: Interpereting the Clean Model's Attention on 7s
+# Task 4.1: Interpreting the Clean Model's Attention on 7s
# Where did the clean model focus its attention for the clean and tainted 7s? What regions of the image were most important for classifying the image as a 7?
#
@@ -782,7 +782,7 @@ def visualize_integrated_gradients(test_input, model, plot_title):
# %% [markdown]
#
-# Task 4.2: Interpereting the Tainted Model's Attention on 7s
+# Task 4.2: Interpreting the Tainted Model's Attention on 7s
# Where did the tainted model focus its attention for the clean and tainted 7s? How was this different than the clean model? Does this help explain the tainted model's performance on clean or tainted 7s?
#
@@ -811,7 +811,7 @@ def visualize_integrated_gradients(test_input, model, plot_title):
# %% [markdown]
#
-# Task 4.3: Interpereting the focus on 4s
+# Task 4.3: Interpreting the focus on 4s
# Where did the tainted model focus its attention for the tainted and clean 4s? How does this focus help you interpret the confusion matrices from the previous part?
#
@@ -837,20 +837,20 @@ def visualize_integrated_gradients(test_input, model, plot_title):
# %% [markdown] tags=["solution"]
# **4.4 Answer:**
#
-# The integrated gradient was more useful identifying the contribution of local corruption. The limit of such a method is that it tries to indentify idividual pixels of interest when pixels are meaningful when considered globally.
+# The integrated gradient was more useful for identifying the contribution of local corruption. The limit of such a method is that it tries to identify individual pixels of interest, whereas pixels may only be meaningful when considered globally.
# %% [markdown] tags=["solution"]
# **4.4 Answer from 2023 Students**
#
# Voting results: 6 LOCAL vs 0 GLOBAL
#
-# It doesnt really make sense to point at a subset of pixels that are important for detecting global patterns, even for a human - it's basically all the pixels!
+# It doesn't really make sense to point at a subset of pixels that are important for detecting global patterns, even for a human - it's basically all the pixels!
# %% [markdown]
#
# Checkpoint 4
#
-# Congrats on finishing the intergrated gradients task! Let us know on Element that you reached checkpoint 4, and feel free to look at other interpretability methods in the Captum library if you're interested.
+# Congrats on finishing the integrated gradients task! Let us know on the course chat that you reached checkpoint 4, and feel free to look at other interpretability methods in the Captum library if you're interested.
#
#
@@ -911,7 +911,7 @@ def show(index):
# %% [markdown]
# ### UNet model
#
-# Let's try denoising with a UNet, "CARE-style". As UNets and denoising implementations are not the focus of this exercise, we provide the model for you in the following cell.
+# Let's try denoising with a UNet, "CARE-style". As UNets and denoising implementations are not the focus of this exercise, we provide the model for you in the following cell.
# %% [markdown]
# The training loop code is also provided here. It is similar to the code used to train the image classification model previously, but look it over to make sure there are no surprises.
@@ -920,40 +920,40 @@ def show(index):
from tqdm import tqdm
def train_denoising_model(train_loader, model, criterion, optimizer, history):
-
+
# Puts model in 'training' mode:
model.train()
-
+
# Initialises progress bar:
pbar = tqdm(total=len(train_loader.dataset)//batch_size_train)
for batch_idx, (image, target) in enumerate(train_loader):
# add line here during Task 2.2
-
+
# Zeroing gradients:
optimizer.zero_grad()
-
+
# Moves image to GPU memory:
- image = image.cuda()
-
+ image = image.to(device)
+
# Adds noise to make the noisy image:
noisy = add_noise(image)
-
+
# Runs model on noisy image:
output = model(noisy)
-
+
# Computes loss:
loss = criterion(output, image)
-
+
# Backpropagates gradients:
loss.backward()
-
+
# Optimises model parameters given the current gradients:
optimizer.step()
-
+
# appends loss history:
history["loss"].append(loss.item())
-
+
# updates progress bar:
pbar.update(1)
return history
@@ -1022,7 +1022,7 @@ def train_denoising_model(train_loader, model, criterion, optimizer, history):
def apply_denoising(image, model):
# add batch and channel dimensions
image = torch.unsqueeze(torch.unsqueeze(image, 0), 0)
- prediction = model(image.cuda())
+ prediction = model(image.to(device))
# remove batch and channel dimensions before returning
return prediction.detach().cpu()[0,0]
@@ -1068,7 +1068,7 @@ def visualize_denoising(model, dataset, index):
# It does decently well, not perfect cause it's lots of noise
# %% [markdown]
-# ### Apply trained model on 'wrong' data
+# ### Apply trained model on 'wrong' data
#
# Apply the denoising model trained above to some example _noisy_ images derived from the Fashion-MNIST dataset.
#
@@ -1114,7 +1114,7 @@ def visualize_denoising(model, dataset, index):
# %% [markdown] tags=["solution"]
# **5.2 Answer from 2023 Students:**
#
-# BAD! Some of them kind of look like numbers.
+# BAD! Some of them kind of look like numbers.
# %% [markdown]
#
@@ -1179,14 +1179,17 @@ def visualize_denoising(model, dataset, index):
# %% [markdown]
#
# Task 5.4:
-# How does the new denoiser perform compared to the one from the previous section?
+# How does the new denoiser perform compared to the one from the previous section? Why?
#
# %% [markdown] tags=["solution"]
# **5.4 Answer:**
#
# The new denoiser has been trained on both MNIST and FashionMNIST, and as a result, it no longer insists on reshaping objects from the FashionMNIST dataset into digits. However, it seems to be performing slightly worse on the original MNIST (some of the digits are hardly recognisable).
-#
+# If you look more closely at the code, you'll notice that we haven't shuffled the data in our `DataLoader`. This means that every epoch the model will first train on all of the MNIST data, then on all of the FashionMNIST data.
+# The effect that we're seeing here, where the model performs worse on the MNIST data, points to an important lesson: Models Forget!
+# If the model is trained for too long without any MNIST examples, as it is here, it begins to overwrite what it has learned about that data.
+# %% [markdown]
# ### Train the denoiser on both MNIST and FashionMNIST, shuffling the training data
#
# We previously performed the training sequentially on the MNIST data first, followed by the FashionMNIST data. Now, we ask for the training data to be shuffled and observe the impact on performance. (Note the `shuffle=True` in the lines below.)
@@ -1246,7 +1249,7 @@ def visualize_denoising(model, dataset, index):
#
# Checkpoint 5
#
-# Congrats on reaching the final checkpoint! Let us know on Element, and we'll discuss the questions once reaching critical mass.
+# Congrats on reaching the final checkpoint! Let us know on the course chat, and we'll discuss the questions once reaching critical mass.
#
#