Commit

Merge branch 'main' into new_branch
SkafteNicki authored Jan 14, 2025
2 parents 8f0282d + e1da4cf commit fc2468c
Showing 28 changed files with 265 additions and 84 deletions.
2 changes: 1 addition & 1 deletion .devcontainer/Dockerfile
@@ -1,4 +1,4 @@
FROM python:3.11-slim-buster
FROM python:3.11-slim

RUN apt update && \
apt install --no-install-recommends -y build-essential gcc && \
6 changes: 3 additions & 3 deletions .github/workflows/repo_scraper.yml
@@ -1,9 +1,9 @@
name: Run repo scraper

on:
#schedule:
# - cron: "0 0 * * *" # Run at the end of every day
workflow_dispatch: {} # manual executions
schedule:
- cron: "0 */6 * * *" # Runs every 6 hours
workflow_dispatch: {} # Allows manual executions

jobs:
scrape:
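
The new `cron: "0 */6 * * *"` expression can be checked by expanding its minute and hour fields by hand. The sketch below is only an illustration: `expand_step_field` is a hypothetical helper, not part of any cron library, and it handles only the `*/N`, `*`, and plain-integer forms. GitHub Actions evaluates schedules in UTC.

```python
def expand_step_field(field: str, low: int, high: int) -> list[int]:
    """Expand a cron field written as '*/N', '*', or a plain integer.

    Ranges, lists, and names from full cron syntax are deliberately out of scope.
    """
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(low, high + 1, step))
    if field == "*":
        return list(range(low, high + 1))
    return [int(field)]


# the first two cron fields are minute and hour
minute, hour = "0 */6 * * *".split()[:2]
run_times = [
    f"{h:02d}:{m:02d}"
    for h in expand_step_field(hour, 0, 23)
    for m in expand_step_field(minute, 0, 59)
]
print(run_times)  # ['00:00', '06:00', '12:00', '18:00']
```

So the scraper is triggered four times a day instead of being disabled.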
2 changes: 1 addition & 1 deletion README.md
@@ -107,7 +107,7 @@ this. Finally, you may try to cut the cost of running your model in production,
and trying to optimize some steps.

The focus in this course is particularly on the **Operations** part of MLOps as this is what many data scientists are
missing in their toolbox to take all the knowledge they have about data processing and model development into a
missing in their toolbox to implement all the knowledge they have about data processing and model development into a
production setting.

## ❔ Learning objectives
3 changes: 2 additions & 1 deletion pages/timeplan.md
@@ -20,8 +20,9 @@ be using in the exercises.

Recordings (link to drive folder with mp4 files):

* [🎥2023 Lectures](https://drive.google.com/drive/folders/1j56XyHoPLjoIEmrVcV_9S1FBkXWZBK0w?usp=sharing)
* [🎥2025 Lectures](https://panopto.dtu.dk/Panopto/Pages/Sessions/List.aspx?folderID=14eeb1b7-5c39-4547-b7c3-b25d007cecd1)
* [🎥2024 Lectures](https://drive.google.com/drive/folders/1mgLlvfXUT9xdg9EZusgeWAmfpUDSwfL6?usp=sharing)
* [🎥2023 Lectures](https://drive.google.com/drive/folders/1j56XyHoPLjoIEmrVcV_9S1FBkXWZBK0w?usp=sharing)

## Week 1

7 changes: 4 additions & 3 deletions reports/README.md
@@ -564,16 +564,17 @@ will check the repositories and the code to verify your answers.
### Question 31

> **State the individual contributions of each team member. This is required information from DTU, because we need to**
> **make sure all members contributed actively to the project**
> **make sure all members contributed actively to the project. Additionally, state if/how you have used generative AI**
> **tools in your project.**
>
> Recommended answer length: 50-200 words.
> Recommended answer length: 50-300 words.
>
> Example:
> *Student sXXXXXX was in charge of setting up the initial cookie cutter project and developing the*
> *docker containers for training our applications.*
> *Student sXXXXXX was in charge of training our models in the cloud and deploying them afterwards.*
> *All members contributed to code by...*
>
> *We have used ChatGPT to help debug our code. Additionally, we used GitHub Copilot to help write some of our code.*
> Answer:
--- question 31 fill here ---
2 changes: 1 addition & 1 deletion reports/report.py
@@ -156,7 +156,7 @@ def check() -> None:
]
),
"question_30": LengthConstraints(min_length=200, max_length=400),
"question_31": LengthConstraints(min_length=50, max_length=200),
"question_31": LengthConstraints(min_length=50, max_length=300),
}
if len(answers) != 31:
msg = "Number of answers are different from the expected 31. Have you changed the template?"
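
The hunk above raises the upper limit for question 31 from 200 to 300 words, matching the new "50-300 words" recommendation in `reports/README.md`. The implementation of `LengthConstraints` is not shown in this diff, so the class below is a hypothetical stand-in that assumes the limits are inclusive word counts. It only illustrates what the change permits.

```python
from dataclasses import dataclass


@dataclass
class LengthConstraints:
    """Hypothetical stand-in: accept answers with min_length <= word count <= max_length."""

    min_length: int
    max_length: int

    def __call__(self, answer: str) -> bool:
        # count whitespace-separated words (an assumption about how length is measured)
        n_words = len(answer.split())
        return self.min_length <= n_words <= self.max_length


old_limit = LengthConstraints(min_length=50, max_length=200)
question_31 = LengthConstraints(min_length=50, max_length=300)

# a 250-word answer, e.g. contributions plus a paragraph on generative AI usage
answer = " ".join(["word"] * 250)
print(old_limit(answer))    # False
print(question_31(answer))  # True
```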
4 changes: 2 additions & 2 deletions requirements.txt
@@ -2,14 +2,14 @@
mkdocs-material == 9.5.49
mkdocs-glightbox == 0.4.0
mkdocs-material-extensions == 1.3.1
pymdown-extensions == 10.13
pymdown-extensions == 10.14
mkdocs-same-dir == 0.1.3
mkdocs-git-revision-date-localized-plugin == 1.3.0
mkdocs-exclude == 1.0.2
markdown-exec[ansi] == 1.10.0

# Developer stuff
ruff == 0.8.6
ruff == 0.9.1
codespell == 2.3.0
pre-commit == 4.0.1

12 changes: 6 additions & 6 deletions s1_development_environment/deep_learning_software.md
@@ -169,7 +169,7 @@ these two commands:

```bash
pip install gdown
gdown --folder https://drive.google.com/drive/folders/1ddWeCcsfmelqxF8sOGBihY9IU98S9JRP?usp=sharing
gdown --folder 'https://drive.google.com/drive/folders/1ddWeCcsfmelqxF8sOGBihY9IU98S9JRP?usp=sharing'
```
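
The commit does not state why the URL is now quoted. A likely reason is that `?` is a glob character, so an unquoted URL can be rewritten or rejected by the shell before `gdown` sees it (zsh, for example, aborts by default when a glob pattern matches nothing). As a minimal sketch, Python's standard `shlex` module shows the same quoting decision:

```python
import shlex

url = "https://drive.google.com/drive/folders/1ddWeCcsfmelqxF8sOGBihY9IU98S9JRP?usp=sharing"

# shlex.quote wraps the string in single quotes because '?' is not shell-safe
quoted = shlex.quote(url)
command = f"gdown --folder {quoted}"
print(command)

# shlex.split mimics POSIX word splitting: the quoted URL comes back as one intact argument
print(shlex.split(command) == ["gdown", "--folder", url])
```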

The data should be placed in a subfolder called `data/corruptedmnist` in the root of the project. Your overall
@@ -215,7 +215,7 @@ future as you start to add more and more features. As subgoals, please fulfill t

??? example "Starting point for `data.py`"

```python linenums="1" title="model.py"
```python linenums="1" title="data.py"
--8<-- "s1_development_environment/exercise_files/final_exercise/data.py"
```

@@ -236,7 +236,7 @@ future as you start to add more and more features. As subgoals, please fulfill t
We have additionally in the solution added functionality for plotting the images together with the labels for
inspection. Remember: all good machine learning starts with a good understanding of the data.

```python linenums="1" hl_lines="17 18" title="model.py"
```python linenums="1" hl_lines="17 18" title="data.py"
--8<-- "s1_development_environment/exercise_files/final_exercise/data_solution.py"
```

@@ -245,7 +245,7 @@ future as you start to add more and more features. As subgoals, please fulfill t

```bash
python main.py train --lr 1e-4
python main.py evaluate trained_model.pt
python main.py evaluate model.pth
```

which can be implemented in various ways. We provide you with a starting script that uses the `typer` library to
@@ -270,8 +270,8 @@ future as you start to add more and more features. As subgoals, please fulfill t
"version": "0.2.0",
"configurations": [
{
"name": "Python: Current File",
"type": "python",
"name": "Train",
"type": "debugpy",
"request": "launch",
"program": "${file}",
"args": [
2 changes: 1 addition & 1 deletion s1_development_environment/exercise_files/fc_model.py
@@ -8,7 +8,7 @@ class Network(nn.Module):
Arguments:
input_size: integer, size of the input layer
output_size: integer, size of the output layer
hidden_layers: list of integers, the sizes of the hidden layers
hidden_layers: list of integers (one for each hidden layer), the sizes of the hidden layers
"""

@@ -1,8 +1,8 @@
import matplotlib.pyplot as plt
import torch
import typer
from data import corrupt_mnist
from model import MyAwesomeModel
from data_solution import corrupt_mnist
from model_solution import MyAwesomeModel

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu")

4 changes: 2 additions & 2 deletions s2_organisation_and_version_control/README.md
@@ -6,7 +6,7 @@

- ![](../figures/icons/git.png){align=right : style="height:100px;width:100px"}

Learn the basics of version control and how to use `git` to track changes to your code and collaborate with others.
Learn the basics of version control and how to use `git` to track changes in your code and collaborate with others.

[:octicons-arrow-right-24: M5: Git](git.md)

@@ -38,7 +38,7 @@

Today we take our first steps into the world of MLOps. The set of modules in this session focuses on getting organized
and making sure that you are familiar with good development practices. While many of the practices you will learn about
in these modules do not seem that important when you are a single person working on a project, it is crucial when
in these modules do not seem that important when you are a single person working on a project, it becomes crucial when
working in large groups that the difference in how different people organize and write their code is minimized.
The topics in this session will focus on:

18 changes: 9 additions & 9 deletions s2_organisation_and_version_control/cli.md
@@ -160,7 +160,7 @@ for doing this, and of other excellent frameworks for creating command line inte
app()
```
3. Next, lets try on a bit harder example. Below is a simple script that trains a support vector machine on the iris
3. Next, let's try on a bit harder example. Below is a simple script that trains a support vector machine on the iris
dataset.

!!! example "iris_classifier.py"
@@ -172,8 +172,8 @@ for doing this, and of other excellent frameworks for creating command line inte
Implement a CLI for the script such that the following commands can be run

```bash
python iris_classifier.py train --output 'model.ckpt' # should train the model and save it to 'model.ckpt'
python iris_classifier.py train -o 'model.ckpt' # should be the same as above
python iris_classifier.py --output 'model.ckpt' # should train the model and save it to 'model.ckpt'
python iris_classifier.py -o 'model.ckpt' # should be the same as above
```

??? success "Solution"
@@ -186,7 +186,7 @@ for doing this, and of other excellent frameworks for creating command line inte
--8<-- "s2_organisation_and_version_control/exercise_files/typer_exercise_solution.py"
```

4. Next lets create a CLI that has more than a single command. Continue working in the basic machine learning
4. Next let's create a CLI that has more than a single command. Continue working in the basic machine learning
application from the previous exercise, but this time we want to define two separate commands
```bash
@@ -205,7 +205,7 @@ for doing this, and of other excellent frameworks for creating command line inte
5. Finally, let's try to define subcommands for our subcommands e.g. something similar to how `git` has the subcommand
`remote` which in itself has multiple subcommands like `add`, `rename` etc. Continue on the simple machine
learning application from the previous exercises, but this time define a cli such that
learning application from the previous exercises, but this time define a CLI such that

```bash
python iris_classifier.py train svm --kernel 'linear'
@@ -222,7 +222,7 @@ for doing this, and of other excellent frameworks for creating command line inte
--8<-- "s2_organisation_and_version_control/exercise_files/typer_exercise_solution3.py"
```

6. (Optional) Let's try to combine what we have learned until now. Try to make your `typer` cli into a executable
6. (Optional) Let's try to combine what we have learned until now. Try to make your `typer` CLI into an executable
script using the `pyproject.toml` file and try it out!
??? success "Solution"
@@ -269,13 +269,13 @@ to interact with. Here is a example of long command that you might need to run i
docker run -v $(pwd):/app -w /app --gpus all --rm -it my_image:latest python my_script.py --arg1 val1 --arg2 val2
```
This can be a lot to remember, and it can be easy to make mistakes. Instead it would be nice if we could just do
This can be a lot to remember, and it can be easy to make mistakes. Instead, it would be nice if we could just do
```bash
run my_command --arg1=val1 --arg2=val2
```
e.g. easier to remember because we have remove a lot of the hard-to-remember stuff, but we are still able to configure
e.g. easier to remember because we have removed a lot of the hard-to-remember stuff, but we are still able to configure
it to our liking. To help with this, we are going to look at the [invoke](http://www.pyinvoke.org/) package.
`invoke` is a Python package that allows you to define tasks that can be
run from the terminal. It is a bit like a more advanced version of the [Makefile](https://makefiletutorial.com/) that
@@ -324,7 +324,7 @@ easier.
invoke python
```
4. Lets try to create a task that simplifies the process of `git add`, `git commit`, `git push`. Create a task such
4. Let's try to create a task that simplifies the process of `git add`, `git commit`, `git push`. Create a task such
that the following command can be run
```bash
13 changes: 9 additions & 4 deletions s2_organisation_and_version_control/code_structure.md
@@ -263,7 +263,7 @@ your head around where files are located.

??? success "Solution"

```python linenums="1" title="make_dataset.py"
```python linenums="1" title="data.py"
--8<-- "s2_organisation_and_version_control/exercise_files/data_solution.py"
```

@@ -273,9 +273,14 @@ your head around where files are located.
project. It is similar to `Makefile`s if you are familiar with them. Try out some of the pre-defined tasks:
```bash
# first install invoke
pip install invoke
# then you can execute the tasks
invoke preprocess-data # runs the data.py file
invoke requirements # installs all requirements in the requirements.txt file
invoke train # runs the train.py file
# or get a list of all tasks
invoke --list
```
In general, we recommend that you add commands to the `tasks.py` file as you move along in the course.
@@ -292,7 +297,7 @@ your head around where files are located.
This is the CNN solution from yesterday and it may differ from the model architecture you have created.
```python linenums="1" title="make_dataset.py"
```python linenums="1" title="model.py"
--8<-- "s2_organisation_and_version_control/exercise_files/model_solution.py"
```
@@ -304,7 +309,7 @@ your head around where files are located.
??? success "Solution"
```python linenums="1" title="make_dataset.py"
```python linenums="1" title="train.py"
--8<-- "s2_organisation_and_version_control/exercise_files/train_solution.py"
```
8. Transfer the remaining parts of the `main.py` script into the `src/<project-name>/evaluate.py` script e.g. the parts
@@ -313,7 +318,7 @@ your head around where files are located.
??? success "Solution"
```python linenums="1" title="make_dataset.py"
```python linenums="1" title="evaluate.py"
--8<-- "s2_organisation_and_version_control/exercise_files/evaluate_solution.py"
```
2 changes: 1 addition & 1 deletion s2_organisation_and_version_control/dvc.md
@@ -182,7 +182,7 @@ it contains excellent tutorials.
```bash
pip install gdown
gdown --folder https://drive.google.com/drive/folders/1JTjbom7IrB41Chx6uxLCN16ZwIxHHVw1?usp=sharing
gdown --folder 'https://drive.google.com/drive/folders/1JTjbom7IrB41Chx6uxLCN16ZwIxHHVw1?usp=sharing'
```
Copy the data to your `data/raw` folder and then rerun your data pipeline to incorporate the new data into the
@@ -8,7 +8,8 @@

def visualize(model_checkpoint: str, figure_name: str = "embeddings.png") -> None:
"""Visualize model predictions."""
model = MyAwesomeModel().load_state_dict(torch.load(model_checkpoint))
model: torch.nn.Module = MyAwesomeModel()
model.load_state_dict(torch.load(model_checkpoint))
model.eval()
model.fc = torch.nn.Identity()

7 changes: 5 additions & 2 deletions s3_reproducibility/config_files.md
@@ -197,7 +197,7 @@ look online for your answers before looking at the solution. Remember: its not a
|--my_app.py
```

12. Finally, a awesome feature of hydra is the
12. Finally, an awesome feature of hydra is the
[instantiate](https://hydra.cc/docs/advanced/instantiate_objects/overview/) feature. This allows you to define a
configuration file that can be used to directly instantiate objects in Python. Try to create a configuration file
that can be used to instantiate the `Adam` optimizer in the `vae_mnist.py` script.
@@ -223,7 +223,10 @@ look online for your answers before looking at the solution. Remember: its not a
@hydra.main(config_name="adam.yaml")
def main(cfg):
optimizer = hydra.utils.instantiate(cfg.optimizer)
model = ... # define the model we want to optimize
# the first argument of any optimizer is the parameters to optimize
# we add those dynamically when we instantiate the optimizer
optimizer = hydra.utils.instantiate(cfg.optimizer, params=model.parameters())
print(optimizer)
if __name__ == "__main__":
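
The revised solution passes `params=model.parameters()` at instantiation time because those values exist only at runtime and cannot be written into a YAML file. The sketch below imitates that pattern with the standard library only. `instantiate_sketch` is a hypothetical helper, not hydra's implementation, and `datetime.timedelta` stands in for the optimizer class so the example runs without hydra or torch.

```python
import importlib
from typing import Any


def instantiate_sketch(config: dict[str, Any], **runtime_kwargs: Any) -> Any:
    """Build the object named by config['_target_'], merging config and call-time kwargs."""
    module_name, _, attr_name = config["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_name), attr_name)
    static_kwargs = {key: value for key, value in config.items() if key != "_target_"}
    return target(**static_kwargs, **runtime_kwargs)


# static settings live in the config (think: lr, betas for an optimizer) ...
cfg = {"_target_": "datetime.timedelta", "hours": 6}
# ... while values known only at runtime are supplied at the call site (think: params)
delta = instantiate_sketch(cfg, minutes=30)
print(delta)  # 6:30:00
```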
