The CLI is mostly handled by Hydra (https://hydra.cc/docs/intro/). The main commands are:

- `bie_train`: train a model
- `bie_predict`: predict with a model

# Training

To train a model, you can use the following command:

```bash
bie_train
```

To see all the available options, you can use the `--help` flag:

```bash
bie_train --help
```

## Data

Out of the box, `bie_train` is configured to use `torchvision.datasets.ImageFolder` to load data.
This can be endlessly overridden using Hydra's configuration system (e.g. `_target_`).
However, for most applications the stock `ImageFolder` class will work.
To point the model at useful data, set the `recipe.data` key like so:

```bash
bie_train recipe.data=/path/to/data
```

`ImageFolder` uses PIL to load images, so you can use any image format that PIL supports, including JPEG, PNG, BMP, and TIFF.

More exotic formats will require a custom dataset class, which is not covered here; realistically, you should convert your data to a more common format.
PNG, for instance, is a lossless format that loads quickly from disk due to its efficient compression.
The `bie_train` defaults tend to be sane: for instance, the data is shuffled and split into train and validation sets.

It is worth noting that `ImageFolder` expects the data to be organised into "classes", even though the default `bie_train` does not use the class labels during training.
To denote these classes, organise your data into folders, where each folder is a class and the images in that folder are instances of that class.
See here for more information: https://pytorch.org/vision/stable/datasets.html#imagefolder

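For example, a layout like the following (the class folder and file names here are purely illustrative) would be read as two classes:

```text
/path/to/data/
├── nucleus/
│   ├── cell_001.png
│   └── cell_002.png
└── membrane/
    ├── cell_003.png
    └── cell_004.png
```
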
## Models

The default model backbone is a "resnet18" with a "vae" architecture for autoencoding, but you can specify a different model using the `recipe.model` key:

```bash
bie_train recipe.model=resnet50_vqvae recipe.data=/path/to/data
```

N.B. the resnet series of models expects the tensor input to be (3, 224, 224) in shape.

### Supervised vs unsupervised models

By default the model is unsupervised, meaning the class labels are ignored during training.
However, an (experimental) supervised model can be selected by setting:

```bash
bie_train lit_model.model=_target_="bioimage_embed.lightning.torch.AutoEncoderSupervised" recipe.data=/path/to/data
```

This uses contrastive learning on the labelled data, specifically SimCLR: https://arxiv.org/abs/2002.05709

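The core of SimCLR is the NT-Xent contrastive loss, which pulls two augmented views of the same image together while pushing all other images in the batch apart. A minimal NumPy sketch of the idea (not `bie_train`'s actual implementation, which operates on torch tensors inside the Lightning module):

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss over a batch of paired embeddings.

    z1[i] and z2[i] are embeddings of two augmented views of the
    same image; every other row in the batch acts as a negative.
    """
    z = np.concatenate([z1, z2])                        # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # unit norm -> cosine sim
    sim = z @ z.T / temperature                         # (2N, 2N) similarity matrix
    np.fill_diagonal(sim, -np.inf)                      # exclude self-comparisons
    n = len(z1)
    # Index of each sample's positive: view i pairs with view i+n and vice versa.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()
```

Identical views yield a lower loss than unrelated ones, which is what drives the embeddings of matching augmentations together.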
## Recipes

The major components of the training process are controlled by the "recipe" schema.
These values are also what is used to generate the UUID of the training run.
This means that a model can in fact resume from a crash or be retrained with the same configuration, and multiple models can be trained in parallel using the same directory.
This is useful for hyperparameter search, or for training multiple models on the same data.

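The key property is that the run identifier is a deterministic function of the recipe, so the same configuration always maps to the same run directory. A minimal sketch of how such a scheme can work (the function name and hashing details here are hypothetical, not `bie_train`'s actual code):

```python
import hashlib
import json
import uuid

def recipe_uuid(recipe: dict) -> uuid.UUID:
    """Derive a stable UUID from a recipe dict: identical
    configurations always map to the same identifier."""
    canonical = json.dumps(recipe, sort_keys=True)  # key order must not matter
    digest = hashlib.md5(canonical.encode()).digest()  # 16 bytes, as UUID expects
    return uuid.UUID(bytes=digest)

# The same recipe yields the same UUID regardless of key order:
a = recipe_uuid({"model": "resnet18_vae", "data": "/path/to/data"})
b = recipe_uuid({"data": "/path/to/data", "model": "resnet18_vae"})
assert a == b
```

This is what makes crash-resume safe: rerunning with an unchanged recipe lands in the same directory, while any change to a recipe value produces a fresh run.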
### lr_scheduler and optimizer

The lr_scheduler and optimizer mimic the timm library and are built using create_optimizer and create_scheduler.
https://timm.fast.ai/Optimizers
and
https://timm.fast.ai/schedulers

The default optimizer is "adamw" and the default scheduler is "cosine", along with some other hyperparameters borrowed from: https://arxiv.org/abs/2110.00476

The way the timm create_* functions work is that they receive a generic SimpleNamespace and only take the keys they need.
The consequence is that timm provides a controlled vocabulary for the hyperparameters in the recipe; this makes it possible to choose from the wide variety of optimizers and schedulers in timm.
https://timm.fast.ai

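The SimpleNamespace pattern can be sketched as follows. The stub below stands in for timm's factory functions (which additionally take the model or optimizer as an argument); it shows how one namespace carries all hyperparameters while each consumer reads only its own keys:

```python
from types import SimpleNamespace

# Hypothetical stand-in for a timm-style factory: it receives the whole
# namespace but reads only the optimizer-related keys, ignoring the rest.
def create_optimizer_stub(args):
    return {"opt": args.opt, "lr": args.lr, "weight_decay": args.weight_decay}

# One namespace holds optimizer *and* scheduler hyperparameters;
# each create_* helper picks out only what it needs.
args = SimpleNamespace(
    opt="adamw", lr=1e-3, weight_decay=0.05,  # consumed by the optimizer factory
    sched="cosine", epochs=100,               # consumed by the scheduler factory
)

print(create_optimizer_stub(args))
# → {'opt': 'adamw', 'lr': 0.001, 'weight_decay': 0.05}
```

This is why the recipe keys form a controlled vocabulary: they must match the names the timm factories look up on the namespace.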
## Augmentation

The package includes a default augmentation, which is stored in the configuration file.
The default augmentation is written using albumentations, a powerful library for image augmentation.
https://albumentations.ai/docs/

The default augmentation is a simple set of augmentations that are useful for biological images; crucially, it mostly neglects RGB and non-physical augmentation effects.
It is recommended to edit the default augmentations in the configuration file rather than via the CLI, as the commands can get quite long.

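In a Hydra configuration, an albumentations pipeline is typically expressed with `_target_` entries. The fragment below is a hypothetical sketch of that shape (the actual key names and default transforms depend on `bie_train`'s config schema; check the shipped configuration file), restricted to geometric, physically plausible transforms in keeping with the defaults:

```yaml
transform:
  _target_: albumentations.Compose
  transforms:
    - _target_: albumentations.RandomRotate90
      p: 0.5
    - _target_: albumentations.HorizontalFlip
      p: 0.5
```
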
## Config file

This will train a model using the default configuration. You can also specify a configuration file using the `--config` flag:

```bash
bie_train --config path/to/config.yaml
```