
Commit 652ee5f

[init] docs
1 parent 8eda614 commit 652ee5f

File tree

2 files changed: +100 −0 lines changed


docs/cli.md

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
The CLI is mostly handled by Hydra (https://hydra.cc/docs/intro/). The main commands are:

- `bie_train`: train a model
- `bie_predict`: predict with a model
# Training
To train a model, you can use the following command:
```bash
bie_train
```
To see all the available options, you can use the `--help` flag:
```bash
bie_train --help
```
## Data
Out of the box, bie_train is configured to use torchvision.datasets.ImageFolder to load data.
This can be overridden via Hydra's configuration system (e.g. `_target_`).
However, for most applications the stock ImageFolder class will work.
To point the model at useful data, set the `recipe.data` key like so:
```bash
bie_train recipe.data=/path/to/data
```
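If you do need a different dataset class, Hydra's `_target_` mechanism can in principle swap it in from the command line. As a sketch only: the `dataset` key below is hypothetical and depends on how the app's config tree is laid out.

```bash
# hypothetical config key: swap the dataset class via a Hydra _target_ override
bie_train dataset._target_=torchvision.datasets.ImageFolder recipe.data=/path/to/data
```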
ImageFolder uses PIL to load images, so you can use any image format that PIL supports, including JPEG, PNG, BMP, and TIFF.

More exotic formats will require a custom dataset class, which is not covered here; realistically, you should convert your data to a more common format.
PNG, for instance, is a lossless format that loads quickly from disk thanks to its efficient compression.
The bie_train defaults tend to be sane: for instance, the data is shuffled and split into train and validation sets.
It is worth noting that ImageFolder expects the data to be organised into "classes", even though the default bie_train does not use the class labels during training.
To denote these classes, organise your data into folders, where each folder is a class and the images in that folder are instances of that class.
See here for more information: https://pytorch.org/vision/stable/datasets.html#imagefolder
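As a sketch, a minimal ImageFolder layout might look like this (the class and file names are purely illustrative):

```
/path/to/data/
├── class_a/
│   ├── img_001.png
│   └── img_002.png
└── class_b/
    ├── img_003.png
    └── img_004.png
```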
## Models
The default model backbone is a "resnet18" with a "vae" architecture for autoencoding, but you can specify a different model using the `recipe.model` flag:
```bash
bie_train recipe.model=resnet50_vqvae recipe.data=/path/to/data
```
N.B. the resnet series of models expects tensor input of shape (3, 224, 224).
### Supervised vs Unsupervised models
By default the model is unsupervised, meaning the class labels are ignored during training.
However, an (experimental) supervised model can be selected by setting:
```bash
bie_train lit_model._target_=bioimage_embed.lightning.torch.AutoEncoderSupervised recipe.data=/path/to/data
```
This uses contrastive learning on the labelled data, specifically SimCLR: https://arxiv.org/abs/2002.05709
## Recipes
The major components of the training process are controlled by the "recipe" schema.
These values are also used to generate the UUID of the training run.
This means a model can in fact resume from a crash or be retrained with the same configuration, and multiple models can be trained in parallel using the same directory.
This is useful for hyperparameter search, or for training multiple models on the same data.
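As a sketch, a hyperparameter search could then launch several runs sharing one directory; note that `recipe.lr` is a hypothetical key here (only `recipe.data` and `recipe.model` are documented above):

```bash
# hypothetical sweep: each distinct recipe produces its own run UUID,
# so the runs can safely share the same working directory
bie_train recipe.data=/path/to/data recipe.lr=1e-3 &
bie_train recipe.data=/path/to/data recipe.lr=1e-4 &
wait
```
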
### lr_scheduler and optimizer
The lr_scheduler and optimizer mimic the timm library and are built using create_optimizer and create_scheduler:
https://timm.fast.ai/Optimizers
and
https://timm.fast.ai/schedulers
The default optimizer is "adamw" and the default scheduler is "cosine", along with some other hyperparameters borrowed from: https://arxiv.org/abs/2110.00476
The timm create_* functions receive a generic SimpleNamespace and take only the keys they need.
The consequence is that timm provides a controlled vocabulary for the hyperparameters in recipe; this makes it possible to choose from the wide variety of optimizers and schedulers in timm:
https://timm.fast.ai
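As a sketch, switching optimizer and scheduler might then look like the following; `opt` and `sched` are timm's own argument names, and the assumption that they are exposed directly under `recipe` is mine:

```bash
# assumed mapping: recipe keys pass straight through to timm's
# create_optimizer/create_scheduler argument names
bie_train recipe.opt=sgd recipe.sched=step recipe.data=/path/to/data
```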
## Augmentation
The package includes a default augmentation, which is stored in the configuration file.
The default augmentation is written using albumentations, a powerful library for image augmentation:
https://albumentations.ai/docs/

The default augmentation is a simple set of augmentations that are useful for biological images; crucially, it mostly avoids RGB and other non-physical augmentation effects.
It is recommended to edit the default augmentations in the configuration file rather than on the CLI, as the commands can get quite long.
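To see what the composed configuration (including the default augmentation) contains before editing it, Hydra's standard `--cfg` flag should work, assuming bie_train exposes the usual Hydra command-line flags:

```bash
# print the composed job config and exit (a standard Hydra flag)
bie_train --cfg job
```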
## Config file
Running bie_train on its own will train a model using the default configuration. You can also specify a configuration file using the `--config` flag:
```bash
bie_train --config path/to/config.yaml
```

docs/library.md

Whitespace-only changes.
