Merge branch 'main' into new_branch

SkafteNicki · Jan 14, 2025 · fc2468c
2 parents 8f0282d + e1da4cf
commit fc2468c
Showing 28 changed files with 265 additions and 84 deletions.
diff --git a/.devcontainer/Dockerfile b/.devcontainer/Dockerfile
@@ -1,4 +1,4 @@
-FROM python:3.11-slim-buster
+FROM python:3.11-slim
 
 RUN apt update && \
     apt install --no-install-recommends -y build-essential gcc && \

diff --git a/.github/workflows/repo_scraper.yml b/.github/workflows/repo_scraper.yml
@@ -1,9 +1,9 @@
 name: Run repo scraper
 
 on:
-  #schedule:
-  #  - cron: "0 0 * * *"  # Run at the end of every day
-  workflow_dispatch: {}  # manual executions
+  schedule:
+    - cron: "0 */6 * * *"  # Runs every 6 hours
+  workflow_dispatch: {}  # Allows manual executions
 
 jobs:
   scrape:
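
Review note: the new expression `0 */6 * * *` fires at minute 0 of every sixth hour, and GitHub Actions evaluates `schedule` triggers in UTC. A minimal sketch of how the hour field expands, assuming a simplified parser (the helper `expand_cron_field` is hypothetical and covers only `*`, single values, ranges, steps and comma lists, not full cron syntax):

```python
def expand_cron_field(field: str, lo: int, hi: int) -> list[int]:
    """Expand one cron field into the sorted list of values it matches."""
    values: set[int] = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step_n = int(step) if step else 1
        if base == "*":
            start, end = lo, hi  # wildcard covers the whole range
        elif "-" in base:
            a, b = base.split("-")
            start, end = int(a), int(b)
        else:
            start = end = int(base)  # a single literal value
        values.update(range(start, end + 1, step_n))
    return sorted(values)


minutes = expand_cron_field("0", 0, 59)
hours = expand_cron_field("*/6", 0, 23)
print(minutes, hours)  # [0] [0, 6, 12, 18]
```

Under these assumptions the workflow would start four times a day instead of only on manual dispatch.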

diff --git a/README.md b/README.md
@@ -107,7 +107,7 @@ this. Finally, you may try to cut the cost of running your model in production,
 and trying to optimize some steps.
 
 The focus in this course is particularly on the **Operations** part of MLOps as this is what many data scientists are
-missing in their toolbox to take all the knowledge they have about data processing and model development into a
+missing in their toolbox to implement all the knowledge they have about data processing and model development into a
 production setting.
 
 ## ❔ Learning objectives

diff --git a/pages/timeplan.md b/pages/timeplan.md
@@ -20,8 +20,9 @@ be using in the exercises.
 
 Recordings (link to drive folder with mp4 files):
 
-* [🎥2023 Lectures](https://drive.google.com/drive/folders/1j56XyHoPLjoIEmrVcV_9S1FBkXWZBK0w?usp=sharing)
+* [🎥2025 Lectures](https://panopto.dtu.dk/Panopto/Pages/Sessions/List.aspx?folderID=14eeb1b7-5c39-4547-b7c3-b25d007cecd1)
 * [🎥2024 Lectures](https://drive.google.com/drive/folders/1mgLlvfXUT9xdg9EZusgeWAmfpUDSwfL6?usp=sharing)
+* [🎥2023 Lectures](https://drive.google.com/drive/folders/1j56XyHoPLjoIEmrVcV_9S1FBkXWZBK0w?usp=sharing)
 
 ## Week 1
 

diff --git a/reports/README.md b/reports/README.md
@@ -564,16 +564,17 @@ will check the repositories and the code to verify your answers.
 ### Question 31
 
 > **State the individual contributions of each team member. This is required information from DTU, because we need to**
-> **make sure all members contributed actively to the project**
+> **make sure all members contributed actively to the project. Additionally, state if/how you have used generative AI**
+> **tools in your project.**
 >
-> Recommended answer length: 50-200 words.
+> Recommended answer length: 50-300 words.
 >
 > Example:
 > *Student sXXXXXX was in charge of setting up the initial cookie cutter project and developing the*
 > *docker containers for training our applications.*
 > *Student sXXXXXX was in charge of training our models in the cloud and deploying them afterwards.*
 > *All members contributed to code by...*
->
+> *We have used ChatGPT to help debug our code. Additionally, we used GitHub Copilot to help write some of our code.*
 > Answer:
 
 --- question 31 fill here ---
diff --git a/reports/report.py b/reports/report.py
@@ -156,7 +156,7 @@ def check() -> None:
             ]
         ),
         "question_30": LengthConstraints(min_length=200, max_length=400),
-        "question_31": LengthConstraints(min_length=50, max_length=200),
+        "question_31": LengthConstraints(min_length=50, max_length=300),
     }
     if len(answers) != 31:
         msg = "Number of answers are different from the expected 31. Have you changed the template?"

diff --git a/requirements.txt b/requirements.txt
@@ -2,14 +2,14 @@
 mkdocs-material == 9.5.49
 mkdocs-glightbox == 0.4.0
 mkdocs-material-extensions == 1.3.1
-pymdown-extensions == 10.13
+pymdown-extensions == 10.14
 mkdocs-same-dir == 0.1.3
 mkdocs-git-revision-date-localized-plugin == 1.3.0
 mkdocs-exclude == 1.0.2
 markdown-exec[ansi] == 1.10.0
 
 # Developer stuff
-ruff == 0.8.6
+ruff == 0.9.1
 codespell == 2.3.0
 pre-commit == 4.0.1
 

diff --git a/s1_development_environment/deep_learning_software.md b/s1_development_environment/deep_learning_software.md
@@ -169,7 +169,7 @@ these two commands:
 
 ```bash
 pip install gdown
-gdown --folder https://drive.google.com/drive/folders/1ddWeCcsfmelqxF8sOGBihY9IU98S9JRP?usp=sharing
+gdown --folder 'https://drive.google.com/drive/folders/1ddWeCcsfmelqxF8sOGBihY9IU98S9JRP?usp=sharing'
 ```
 
 The data should be placed in a subfolder called `data/corruptedmnist` in the root of the project. Your overall
@@ -215,7 +215,7 @@ future as you start to add more and more features. As subgoals, please fulfill t
 
     ??? example "Starting point for `data.py`"
 
-        ```python linenums="1" title="model.py"
+        ```python linenums="1" title="data.py"
         --8<-- "s1_development_environment/exercise_files/final_exercise/data.py"
         ```
 
@@ -236,7 +236,7 @@ future as you start to add more and more features. As subgoals, please fulfill t
         We have additionally in the solution added functionality for plotting the images together with the labels for
         inspection. Remember: all good machine learning starts with a good understanding of the data.
 
-        ```python linenums="1" hl_lines="17 18" title="model.py"
+        ```python linenums="1" hl_lines="17 18" title="data.py"
         --8<-- "s1_development_environment/exercise_files/final_exercise/data_solution.py"
         ```
 
@@ -245,7 +245,7 @@ future as you start to add more and more features. As subgoals, please fulfill t
 
     ```bash
     python main.py train --lr 1e-4
-    python main.py evaluate trained_model.pt
+    python main.py evaluate model.pth
     ```
 
     which can be implemented in various ways. We provide you with a starting script that uses the `typer` library to
@@ -270,8 +270,8 @@ future as you start to add more and more features. As subgoals, please fulfill t
             "version": "0.2.0",
             "configurations": [
                 {
-                    "name": "Python: Current File",
-                    "type": "python",
+                    "name": "Train",
+                    "type": "debugpy",
                     "request": "launch",
                     "program": "${file}",
                     "args": [

diff --git a/s1_development_environment/exercise_files/fc_model.py b/s1_development_environment/exercise_files/fc_model.py
@@ -8,7 +8,7 @@ class Network(nn.Module):
     Arguments:
         input_size: integer, size of the input layer
         output_size: integer, size of the output layer
-        hidden_layers: list of integers, the sizes of the hidden layers
+        hidden_layers: list of integers (one for each hidden layer), the sizes of the hidden layers
 
     """
 

diff --git a/s1_development_environment/exercise_files/final_exercise/main_solution.py b/s1_development_environment/exercise_files/final_exercise/main_solution.py
@@ -1,8 +1,8 @@
 import matplotlib.pyplot as plt
 import torch
 import typer
-from data import corrupt_mnist
-from model import MyAwesomeModel
+from data_solution import corrupt_mnist
+from model_solution import MyAwesomeModel
 
 DEVICE = torch.device("cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu")
 

diff --git a/s2_organisation_and_version_control/README.md b/s2_organisation_and_version_control/README.md
@@ -6,7 +6,7 @@
 
 - ![](../figures/icons/git.png){align=right : style="height:100px;width:100px"}
 
-    Learn the basics of version control and how to use `git` to track changes to your code and collaborate with others.
+    Learn the basics of version control and how to use `git` to track changes in your code and collaborate with others.
 
     [:octicons-arrow-right-24: M5: Git](git.md)
 
@@ -38,7 +38,7 @@
 
 Today we take our first steps into the world of MLOps. The set of modules in this session focuses on getting organized
 and making sure that you are familiar with good development practices. While many of the practices you will learn about
-in these modules do not seem that important when you are a single person working on a project, it is crucial when
+in these modules do not seem that important when you are a single person working on a project, it becomes crucial when
 working in large groups that the difference in how different people organize and write their code is minimized.
 The topics in this session will focus on:
 

diff --git a/s2_organisation_and_version_control/cli.md b/s2_organisation_and_version_control/cli.md
@@ -160,7 +160,7 @@ for doing this, and of other excellent frameworks for creating command line inte
             app()
         ```
 
-3. Next, lets try on a bit harder example. Below is a simple script that trains a support vector machine on the iris
+3. Next, let's try a slightly harder example. Below is a simple script that trains a support vector machine on the iris
     dataset.
 
     !!! example "iris_classifier.py"
@@ -172,8 +172,8 @@ for doing this, and of other excellent frameworks for creating command line inte
     Implement a CLI for the script such that the following commands can be run
 
     ```bash
-    python iris_classifier.py train --output 'model.ckpt'  # should train the model and save it to 'model.ckpt'
-    python iris_classifier.py train -o 'model.ckpt'  # should be the same as above
+    python iris_classifier.py --output 'model.ckpt'  # should train the model and save it to 'model.ckpt'
+    python iris_classifier.py -o 'model.ckpt'        # should be the same as above
     ```
 
     ??? success "Solution"
@@ -186,7 +186,7 @@ for doing this, and of other excellent frameworks for creating command line inte
         --8<-- "s2_organisation_and_version_control/exercise_files/typer_exercise_solution.py"
         ```
 
-4. Next lets create a CLI that has more than a single command. Continue working in the basic machine learning
+4. Next let's create a CLI that has more than a single command. Continue working in the basic machine learning
     application from the previous exercise, but this time we want to define two separate commands
 
     ```bash
@@ -205,7 +205,7 @@ for doing this, and of other excellent frameworks for creating command line inte
 
 5. Finally, let's try to define subcommands for our subcommands e.g. something similar to how `git` has the subcommand
     `remote` which in itself has multiple subcommands like `add`, `rename` etc. Continue on the simple machine
-    learning application from the previous exercises, but this time define a cli such that
+    learning application from the previous exercises, but this time define a CLI such that
 
     ```bash
     python iris_classifier.py train svm --kernel 'linear'
@@ -222,7 +222,7 @@ for doing this, and of other excellent frameworks for creating command line inte
         --8<-- "s2_organisation_and_version_control/exercise_files/typer_exercise_solution3.py"
         ```
 
-6. (Optional) Let's try to combine what we have learned until now. Try to make your `typer` cli into a executable
+6. (Optional) Let's try to combine what we have learned until now. Try to make your `typer` CLI into an executable
     script using the `pyproject.toml` file and try it out!
 
     ??? success "Solution"
@@ -269,13 +269,13 @@ to interact with. Here is a example of long command that you might need to run i
 docker run -v $(pwd):/app -w /app --gpus all --rm -it my_image:latest python my_script.py --arg1 val1 --arg2 val2
 ```
 
-This can be a lot to remember, and it can be easy to make mistakes. Instead it would be nice if we could just do
+This can be a lot to remember, and it can be easy to make mistakes. Instead, it would be nice if we could just do
 
 ```bash
 run my_command --arg1=val1 --arg2=val2
 ```
 
-e.g. easier to remember because we have remove a lot of the hard-to-remember stuff, but we are still able to configure
+e.g. easier to remember because we have removed a lot of the hard-to-remember stuff, but we are still able to configure
 it to our liking. To help with this, we are going to look at the [invoke](http://www.pyinvoke.org/) package.
 `invoke` is a Python package that allows you to define tasks that can be
 run from the terminal. It is a bit like a more advanced version of the [Makefile](https://makefiletutorial.com/) that
@@ -324,7 +324,7 @@ easier.
     invoke python
     ```
 
-4. Lets try to create a task that simplifies the process of `git add`, `git commit`, `git push`. Create a task such
+4. Let's try to create a task that simplifies the process of `git add`, `git commit`, `git push`. Create a task such
     that the following command can be run
 
     ```bash

diff --git a/s2_organisation_and_version_control/code_structure.md b/s2_organisation_and_version_control/code_structure.md
@@ -263,7 +263,7 @@ your head around where files are located.
 
     ??? success "Solution"
 
-        ```python linenums="1" title="make_dataset.py"
+        ```python linenums="1" title="data.py"
         --8<-- "s2_organisation_and_version_control/exercise_files/data_solution.py"
         ```
 
@@ -273,9 +273,14 @@ your head around where files are located.
     project. It is similar to `Makefile`s if you are familiar with them. Try out some of the pre-defined tasks:
 
     ```bash
+    # first install invoke
+    pip install invoke
+    # then you can execute the tasks
     invoke preprocess-data  # runs the data.py file
     invoke requirements     # installs all requirements in the requirements.txt file
     invoke train            # runs the train.py file
+    # or get a list of all tasks
+    invoke --list
     ```
 
     In general, we recommend that you add commands to the `tasks.py` file as you move along in the course.
@@ -292,7 +297,7 @@ your head around where files are located.
 
         This is the CNN solution from yesterday and it may differ from the model architecture you have created.
 
-        ```python linenums="1" title="make_dataset.py"
+        ```python linenums="1" title="model.py"
         --8<-- "s2_organisation_and_version_control/exercise_files/model_solution.py"
         ```
 
@@ -304,7 +309,7 @@ your head around where files are located.
 
     ??? success "Solution"
 
-        ```python linenums="1" title="make_dataset.py"
+        ```python linenums="1" title="train.py"
         --8<-- "s2_organisation_and_version_control/exercise_files/train_solution.py"
         ```
 8. Transfer the remaining parts of the `main.py` script into the `src/<project-name>/evaluate.py` script e.g. the parts
@@ -313,7 +318,7 @@ your head around where files are located.
 
     ??? success "Solution"
 
-        ```python linenums="1" title="make_dataset.py"
+        ```python linenums="1" title="evaluate.py"
         --8<-- "s2_organisation_and_version_control/exercise_files/evaluate_solution.py"
         ```
 

diff --git a/s2_organisation_and_version_control/dvc.md b/s2_organisation_and_version_control/dvc.md
@@ -182,7 +182,7 @@ it contains excellent tutorials.
 
     ```bash
     pip install gdown
-    gdown --folder https://drive.google.com/drive/folders/1JTjbom7IrB41Chx6uxLCN16ZwIxHHVw1?usp=sharing
+    gdown --folder 'https://drive.google.com/drive/folders/1JTjbom7IrB41Chx6uxLCN16ZwIxHHVw1?usp=sharing'
     ```
 
     Copy the data to your `data/raw` folder and then rerun your data pipeline to incorporate the new data into the

diff --git a/s2_organisation_and_version_control/exercise_files/visualize_solution.py b/s2_organisation_and_version_control/exercise_files/visualize_solution.py
@@ -8,7 +8,8 @@
 
 def visualize(model_checkpoint: str, figure_name: str = "embeddings.png") -> None:
     """Visualize model predictions."""
-    model = MyAwesomeModel().load_state_dict(torch.load(model_checkpoint))
+    model: torch.nn.Module = MyAwesomeModel()
+    model.load_state_dict(torch.load(model_checkpoint))
     model.eval()
     model.fc = torch.nn.Identity()
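
Review note: the removed one-liner bound `model` to the return value of `load_state_dict`, which in PyTorch is a named tuple with `missing_keys` and `unexpected_keys` fields rather than the module itself, so the following `model.eval()` call would fail. The sketch below reproduces the pitfall without PyTorch; `TinyModel` and `LoadResult` are hypothetical stand-ins that only mimic that return behaviour:

```python
from collections import namedtuple

# Stand-in for the result object returned by load_state_dict (assumption: same field names).
LoadResult = namedtuple("LoadResult", ["missing_keys", "unexpected_keys"])


class TinyModel:
    """Hypothetical stand-in for a torch.nn.Module subclass."""

    def __init__(self) -> None:
        self.state: dict = {}

    def load_state_dict(self, state_dict: dict) -> LoadResult:
        self.state = dict(state_dict)  # loads in place, does not return self
        return LoadResult(missing_keys=[], unexpected_keys=[])

    def eval(self) -> "TinyModel":
        return self


# Old pattern: the name is bound to the result tuple, not the model.
broken = TinyModel().load_state_dict({"w": 1.0})
print(hasattr(broken, "eval"))  # False

# New pattern: construct first, then load in place.
model = TinyModel()
model.load_state_dict({"w": 1.0})
print(hasattr(model, "eval"))  # True
```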
 

diff --git a/s3_reproducibility/config_files.md b/s3_reproducibility/config_files.md
@@ -197,7 +197,7 @@ look online for your answers before looking at the solution. Remember: its not a
     |--my_app.py
     ```
 
-12. Finally, a awesome feature of hydra is the
+12. Finally, an awesome feature of hydra is the
     [instantiate](https://hydra.cc/docs/advanced/instantiate_objects/overview/) feature. This allows you to define a
     configuration file that can be used to directly instantiate objects in Python. Try to create a configuration file
     that can be used to instantiate the `Adam` optimizer in the `vae_mnist.py` script.
@@ -223,7 +223,10 @@ look online for your answers before looking at the solution. Remember: its not a
 
         @hydra.main(config_name="adam.yaml")
         def main(cfg):
-            optimizer = hydra.utils.instantiate(cfg.optimizer)
+            model = ...  # define the model we want to optimize
+            # the first argument of any optimizer is the parameters to optimize
+            # we add those dynamically when we instantiate the optimizer
+            optimizer = hydra.utils.instantiate(cfg.optimizer, params=model.parameters())
             print(optimizer)
 
         if __name__ == "__main__":
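
Review note: the change above passes `params=model.parameters()` at call time because the parameters only exist once the model is built, so they cannot live in the YAML file. The idea of merging static config values with call-time keyword arguments can be sketched without Hydra. The `instantiate` function below is a deliberately simplified stand-in (not Hydra's implementation), and `datetime.timedelta` is used as an arbitrary standard-library target:

```python
import importlib


def instantiate(config: dict, **overrides):
    """Simplified stand-in: import the `_target_` and call it with merged kwargs."""
    cfg = dict(config)  # copy so the caller's config is not mutated
    module_path, _, attr = cfg.pop("_target_").rpartition(".")
    target = getattr(importlib.import_module(module_path), attr)
    # call-time keyword arguments are added to (and take precedence over) the config
    return target(**{**cfg, **overrides})


cfg = {"_target_": "datetime.timedelta", "hours": 1}  # static part, as in a YAML file
delta = instantiate(cfg, minutes=30)  # dynamic part, supplied when instantiating
print(delta)  # 1:30:00
```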