Commit 90a566a

Merge pull request #77 from ctr26/docs
[init] docs
2 parents 8eda614 + 76cd5f7 commit 90a566a

7 files changed, +178 -0 lines changed

.readthedocs.yaml

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
# Read the Docs configuration file for MkDocs projects
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the version of Python and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.12"

mkdocs:
  configuration: mkdocs.yml

# Optionally declare the Python requirements required to build your docs
python:
  install:
    - requirements: docs/requirements.txt

docs/cli.md

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
The CLI is mostly handled by Hydra (https://hydra.cc/docs/intro/). The main commands are:

- bie_train: Train a model
- bie_predict: Predict with a model

# Training

To train a model, use the following command:

```bash
bie_train
```

To see all the available options, use the `--help` flag:

```bash
bie_train --help
```

## Data

Out of the box, bie_train is configured to use torchvision.datasets.ImageFolder to load data.
This can be overridden using Hydra's configuration system (e.g. via `_target_`).
However, for most applications the stock ImageFolder class will work.
To point the model at your data, set the `recipe.data` key like so:

```bash
bie_train recipe.data=/path/to/data
```
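
As a rough illustration of what a dotted override such as `recipe.data=/path/to/data` means, the sketch below turns `key.subkey=value` strings into entries of a nested dictionary. It is a simplified stand-in and not Hydra's parser (Hydra's real override grammar is richer); `apply_overrides` is a hypothetical helper and the default values are placeholders.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied.

    Hypothetical helper: values stay plain strings and no type conversion is done.
    """
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the nested dictionaries, creating levels as needed.
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


# Placeholder defaults, not the real bie_train configuration.
defaults = {"recipe": {"data": "data", "model": "default-model"}}
config = apply_overrides(defaults, ["recipe.data=/path/to/data"])
print(config["recipe"]["data"])
```

Only the overridden key changes; every other default is left as it was.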

ImageFolder uses PIL to load images, so you can use any image format that PIL supports, including jpg, png, bmp and tif, among others.

More exotic formats require a custom dataset class, which is not covered here; realistically, you should convert your data to a more common format.
PNG, for instance, is a lossless format that loads quickly from disk due to its efficient compression.
The bie_train defaults tend to be sane: for instance, the data is shuffled and split into train and validation sets.

It is worth noting that ImageFolder expects the data to be organised into "classes", even though the default bie_train does not use the class labels during training.
To denote these classes, organise your data into folders, where each folder is a class and the images in that folder are instances of that class.
See here for more information: https://pytorch.org/vision/stable/datasets.html#imagefolder
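
The folder convention can be sketched with the standard library alone. The snippet below builds a throwaway directory tree (the class names `control` and `treated` and the file names are made up) and derives class indices from the sorted subfolder names, which mirrors how ImageFolder assigns labels. It does not call torchvision, and `find_classes` here is a local helper written for this example.

```python
import tempfile
from pathlib import Path


def find_classes(root):
    """Map each immediate subfolder of `root` to an integer index, sorted by name."""
    names = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(names)}


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: root/<class_name>/<image file>
    layout = [("treated", "a.png"), ("control", "b.png"), ("control", "c.png")]
    for class_name, file_name in layout:
        folder = Path(root) / class_name
        folder.mkdir(exist_ok=True)
        (folder / file_name).touch()  # empty placeholder, not a real image

    class_to_idx = find_classes(root)
    print(class_to_idx)
```

In this layout the top-level folder (`root` in the sketch) is what would be passed as `recipe.data`.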

## Models

The default model backbone is a "resnet18" with a "vae" architecture for autoencoding, but you can specify a different model using the `recipe.model` key:

```bash
bie_train recipe.model=resnet50_vqvae recipe.data=/path/to/data
```

N.B. the resnet series of models expects the input tensor to be (3, 224, 224) in shape.


### Supervised vs Unsupervised models

By default the model is unsupervised, meaning the class labels are ignored during training.
However, an (experimental) supervised model can be selected by setting:

```bash
bie_train lit_model.model=_target_="bioimage_embed.lightning.torch.AutoEncoderSupervised" recipe.data=/path/to/data
```

This uses contrastive learning on the labelled data, specifically SimCLR: https://arxiv.org/abs/2002.05709

## Recipes

The major components of the training process are controlled by the "recipe" schema.
These values are also used to generate the uuid of the training run.
This means that a model can resume from a crash or be retrained with the same configuration, and that multiple models can be trained in parallel using the same directory.
This is useful for hyperparameter search, or for training multiple models on the same data.
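
One way to picture a configuration-derived identifier is to hash a canonical serialisation of the recipe, so that identical settings always map to the same ID. The sketch below uses `uuid.uuid5` over sorted JSON. This is an assumption made for illustration, and the scheme bie_train actually uses may differ; the `recipe_uuid` helper, the `lr` key and its values are hypothetical.

```python
import json
import uuid


def recipe_uuid(recipe):
    """Derive a stable UUID from a recipe dictionary (hypothetical scheme)."""
    # Sorting the keys makes the serialisation independent of insertion order.
    canonical = json.dumps(recipe, sort_keys=True)
    return uuid.uuid5(uuid.NAMESPACE_URL, canonical)


recipe_a = {"model": "resnet50_vqvae", "data": "/path/to/data", "lr": 1e-4}
recipe_b = {"lr": 1e-4, "data": "/path/to/data", "model": "resnet50_vqvae"}
recipe_c = dict(recipe_a, lr=1e-3)

same_settings = recipe_uuid(recipe_a) == recipe_uuid(recipe_b)
changed_setting = recipe_uuid(recipe_a) == recipe_uuid(recipe_c)
print(same_settings, changed_setting)
```

Under such a scheme, rerunning with an unchanged recipe finds the earlier run's ID again, while any changed hyperparameter yields a fresh ID, so parallel runs need not collide in a shared directory.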

### lr_scheduler and optimizer

The lr_scheduler and optimizer follow the timm library and are built using its create_optimizer and create_scheduler functions.
https://timm.fast.ai/Optimizers
and
https://timm.fast.ai/schedulers

The default optimizer is "adamw" and the default scheduler is "cosine", along with some other hyperparameters borrowed from: https://arxiv.org/abs/2110.00476

The timm create_* functions receive a generic SimpleNamespace and only take the keys they need.
The consequence is that timm provides a controlled vocabulary for the hyperparameters in the recipe; this makes it possible to choose from the wide variety of optimizers and schedulers in timm.
https://timm.fast.ai
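
The "one shared namespace, each consumer takes only the keys it needs" pattern described above can be sketched without importing timm. The helper names `build_optimizer_settings` and `build_scheduler_settings` are hypothetical and are not timm functions; the attribute names (`opt`, `lr`, `weight_decay`, `sched`, `epochs`) are modelled on timm's argument naming but should be treated as assumptions, as should the values.

```python
from types import SimpleNamespace


def build_optimizer_settings(args):
    """Pick out only the optimizer-related attributes from a shared namespace."""
    return {"opt": args.opt, "lr": args.lr, "weight_decay": args.weight_decay}


def build_scheduler_settings(args):
    """Pick out only the scheduler-related attributes from the same namespace."""
    return {"sched": args.sched, "epochs": args.epochs}


# One flat namespace holds every hyperparameter (placeholder values).
recipe = SimpleNamespace(
    opt="adamw",
    lr=1e-4,
    weight_decay=0.05,
    sched="cosine",
    epochs=100,
    data="/path/to/data",  # ignored by both helpers
)

optimizer_settings = build_optimizer_settings(recipe)
scheduler_settings = build_scheduler_settings(recipe)
print(optimizer_settings)
print(scheduler_settings)
```

Because each helper reads only its own attributes, unrelated keys can sit in the same namespace without interfering.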

## Augmentation

The package includes a default augmentation, which is stored in the configuration file.
The default augmentation is written using albumentations, which is a powerful library for image augmentation.
https://albumentations.ai/docs/


The default augmentation is a simple set of augmentations that are useful for biological images; crucially, it mostly avoids RGB and non-physical augmentation effects.
It is recommended to edit the default augmentations in the configuration file rather than on the CLI, as the commands can get quite long.


## Config file

Running bie_train without arguments trains a model using the default configuration. You can also specify a configuration file using the `--config` flag:

```bash
bie_train --config path/to/config.yaml
```

docs/conf.py

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))


# -- Project information -----------------------------------------------------

project = "Bioimage Embed"
copyright = "2024, Craig Russell"
author = "Craig Russell"


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ["myst_parser"]


# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "alabaster"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]

docs/index.md

Whitespace-only changes.

docs/library.md

Whitespace-only changes.

docs/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
myst-parser==4.0.0

mkdocs.yml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
site_name: "Bioimage Embed"
site_url: ""
nav:
  - 'cli.md'
  - 'library.md'
theme: readthedocs
