Documentation #51

Open
wants to merge 21 commits into base: master
13 changes: 13 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,13 @@
version: 2

build:
os: ubuntu-20.04
tools:
python: "3.9"

mkdocs:
configuration: mkdocs.yml

python:
install:
- requirements: docs/requirements.txt
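
The `docs/requirements.txt` referenced above is not part of this hunk. A minimal, hypothetical version for an MkDocs site might look like the following (the package and pin are assumptions, not taken from this PR):

```txt
# Hypothetical docs/requirements.txt — replace with the project's actual doc dependencies
mkdocs>=1.3
```
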
38 changes: 26 additions & 12 deletions README.md
@@ -1,12 +1,15 @@
# Med-Imagetools: Transparent and Reproducible Medical Image Processing Pipelines in Python

[![main-ci](https://github.com/bhklab/med-imagetools/actions/workflows/main-ci.yml/badge.svg)](https://github.com/bhklab/med-imagetools/actions/workflows/main-ci.yml)
![GitHub repo size](https://img.shields.io/github/repo-size/bhklab/med-imagetools)
![GitHub contributors](https://img.shields.io/github/contributors/bhklab/med-imagetools)
![GitHub stars](https://img.shields.io/github/stars/bhklab/med-imagetools?style=social)
![GitHub forks](https://img.shields.io/github/forks/bhklab/med-imagetools?style=social)

## Latest Updates (v0.4.4) - July 27th, 2022

New features include:

* AutoPipeline CLI
* `nnunet` nnU-Net compatibility mode
* Built-in train/test split for both normal/nnU-Net modes
@@ -19,42 +22,49 @@
Med-Imagetools, a Python package, offers the tools to transform messy medical dataset folders into a deep-learning-ready format in a few lines of code. It not only processes DICOMs of different modalities (such as CT, PET, RTDOSE, and RTSTRUCT), it also transforms them into a deep-learning-ready, subject-based format that takes the dependencies between these modalities into consideration.

## Introduction

A medical dataset typically contains multiple types of scans for a single patient in a single study. As seen in the figure below, the scans of different modalities are interdependent. To build effective machine learning models, one ought to take the different modalities into account.

<img src="./docs/images/graph.png" align="center" width="480" ><figcaption>Fig.1 - Different network topology for different studies of different patients</figcaption></a>

Med-Imagetools focuses on subject-based machine learning. It crawls the dataset and builds a graph connecting the different modalities present in it. Based on the user-defined modalities, Med-Imagetools queries the graph and processes the matching raw DICOMs. The processed DICOMs are saved as NRRDs, which Med-Imagetools converts into a torchio subject dataset and, eventually, a torch dataloader for an ML pipeline.

<img src="./docs/images/autopipeline.png" align="center" width="500"><figcaption>Fig.2 - Med-Imagetools AutoPipeline diagram</figcaption></a>

## Installing med-imagetools

```sh
pip install med-imagetools
```

### (recommended) Create new conda virtual environment

```sh
conda create -n mit
conda activate mit
pip install med-imagetools
```

### (optional) Install in development mode

```sh
conda create -n mit
conda activate mit
pip install -e git+https://github.com/bhklab/med-imagetools.git
```

This will install the package in editable mode, so that the installed package will update when the code is changed.

## Getting Started

Med-Imagetools takes a two-step approach to turn a messy raw medical dataset into an ML-ready dataset.

1. ***AutoPipeline***: Crawls the raw dataset, forms a network, and performs a graph query based on the user-defined modalities. The relevant DICOMs are processed and saved as NRRDs.

```sh
autopipeline \
[INPUT_DIRECTORY] \
[OUTPUT_DIRECTORY] \
--modalities [str: CT,RTSTRUCT,PT] \
--spacing [Tuple: (int,int,int)] \
--n_jobs [int] \
@@ -66,8 +76,10 @@
--continue_processing [flag] \
--dry_run [flag]
```

2. ***class Dataset***: Converts the processed NRRDs into torchio subjects, which can easily be converted into a torch dataset.

```py
from imgtools.io import Dataset

subjects = Dataset.load_from_nrrd(output_directory, ignore_multi=True)
@@ -76,12 +88,14 @@
```
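
The collapsed lines above hide the rest of this snippet. As a hedged sketch of the step the surrounding prose describes — wrapping the loaded subjects into a torchio dataset and a torch dataloader (variable names and batch size are illustrative assumptions, not taken from the PR):

```py
# Hypothetical continuation of the snippet above.
import torchio as tio
from torch.utils.data import DataLoader

dataset = tio.SubjectsDataset(subjects)  # subjects from Dataset.load_from_nrrd above
loader = DataLoader(dataset, batch_size=2)  # ready for a training loop
```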

## Demo (Outdated as of v0.4)

These Google Colab notebooks introduce the main functionalities of med-imagetools. More information can be found [here](https://github.com/bhklab/med-imagetools/blob/master/examples/README.md).

### Tutorial 1: Forming Dataset with med-imagetools Autopipeline

[![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/skim2257/tcia_samples/blob/main/notebooks/Tutorial_1_Forming_Dataset_with_Med_Imagetools.ipynb)

### Tutorial 2: Machine Learning with med-imagetools and torchio

[![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/skim2257/tcia_samples/blob/main/notebooks/Tutorial_2_Machine_Learning_with_Med_Imagetools_and_torchio.ipynb)

253 changes: 253 additions & 0 deletions docs/AutoPipeline.md
@@ -0,0 +1,253 @@
# AutoPipeline Usage

To use AutoPipeline, follow the installation instructions found at <https://github.com/bhklab/med-imagetools#installing-med-imagetools>.

## Intro to AutoPipeline

AutoPipeline will crawl and process any DICOM dataset. To run the most basic variation of the script, run the following command:

```sh
autopipeline INPUT_DIRECTORY OUTPUT_DIRECTORY --modalities MODALITY_LIST
```

Replace INPUT_DIRECTORY with the directory containing all your DICOM data, and OUTPUT_DIRECTORY with the directory that you want the data to be output to.

The `--modalities` option allows you to only process certain modalities that are present in the DICOM data. The available modalities are:

1. CT
2. MR
3. RTSTRUCT
4. PT
5. RTDOSE

Set the modalities you want to use by separating each one with a comma. For example, to use CT and RTSTRUCT, run AutoPipeline with `--modalities CT,RTSTRUCT`.
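
Putting it together, a hypothetical basic invocation looks like this (the directory names are placeholders):

```sh
autopipeline ./dicom_data ./processed_data --modalities CT,RTSTRUCT
```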

## AutoPipeline Flags

AutoPipeline comes with many built-in features to make your data processing easier:

1. **Spacing**

The spacing for the output image. default = (1., 1., 0.). A spacing of 0. along an axis keeps that axis's original spacing, so a spacing of (0., 0., 0.) will not resample the image at all.

```sh
--spacing [Tuple: (int,int,int)]
```

2. **Parallel Job Execution**

The number of jobs to be run in parallel. Set -1 to use all cores. default = -1

```sh
--n_jobs [int]
```

3. **Dataset Graph Visualization (not recommended for large datasets)**

Whether to visualize the entire dataset using PyViz.

```sh
--visualize [flag]
```

4. **Continue Pipeline Processing**

Whether to continue the most recent run of AutoPipeline that terminated prematurely for that output directory. This will only work if the `.imgtools` directory from the previous run was not deleted. Using this flag retains the same flags and parameters as the previous run.

```sh
--continue_processing [flag]
```

5. **Processing Dry Run**

Whether to execute a dry run, only generating the .imgtools folder, which includes the crawled index.

```sh
--dry_run [flag]
```

6. **Show Progress**

Whether to print AutoPipeline progress to the standard output.

```sh
--show_progress [flag]
```

7. **Warning on Subject Processing Errors**

Whether to warn instead of throwing an error when processing subjects.

```sh
--warn_on_error [flag]
```

8. **Overwrite Existing Output Files**

Whether to overwrite existing file outputs.

```sh
--overwrite [flag]
```

9. **Update existing crawled index**

Whether to update the existing crawled index.

```sh
--update [flag]
```



## Flags for parsing RTSTRUCT contours/regions of interest (ROI)

Contours can be selected by creating a YAML file that defines a regular expression (regex), a list of potential contour names, or a combination of both. **If none of the flags are set or the YAML file does not exist, AutoPipeline will default to processing every contour.**

1. **Defining YAML file path for contours**

Whether to read a YAML file that defines regex or string options for contour names for regions of interest (ROI). By default, it will look for and read from `INPUT_DIRECTORY/roi_names.yaml`.

```sh
--read_yaml_label_names [flag]
```

Path to the above-mentioned YAML file. Path can be absolute or relative. default = "" (each ROI will have its own label index in dataset.json for nnUNet)

```sh
--roi_yaml_path [str]
```
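
For example, a hypothetical invocation combining these two flags (paths are placeholders):

```sh
autopipeline ./dicom_data ./processed_data --modalities CT,RTSTRUCT \
    --read_yaml_label_names --roi_yaml_path ./roi_names.yaml
```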

2. **Defining contour selection behaviour**

A typical ROI YAML file may look like this:
```yaml
GTV: GTV*
LUNG:
- LUNG*
- LNUG
- POUMON*
NODES:
- IL1
- IIL2
- IIIL3
- IVL4
```

By default, **all ROIs** that match any of the regex or strings will be **saved as one label**. For example, GTVn, GTVp, GTVfoo will be saved as GTV. However, this is not always the desirable behaviour.
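
To make this concrete, here is a minimal sketch — not Med-Imagetools' internal code — of how the default collapse-into-one-label behaviour can be reproduced, assuming patterns are applied with Python's `re.match`:

```py
import re

# Patterns from the example YAML above.
roi_patterns = {
    "GTV": ["GTV*"],
    "LUNG": ["LUNG*", "LNUG", "POUMON*"],
    "NODES": ["IL1", "IIL2", "IIIL3", "IVL4"],
}

contours = ["GTVn", "GTVp", "GTVfoo", "LNUG"]

labels = {}
for roi, patterns in roi_patterns.items():
    for name in contours:
        # A contour matching ANY pattern under an ROI key is saved under that key.
        if any(re.match(pattern, name) for pattern in patterns):
            labels.setdefault(roi, []).append(name)

print(labels)  # {'GTV': ['GTVn', 'GTVp', 'GTVfoo'], 'LUNG': ['LNUG']}
```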

**Only select the first matching regex/string**

The StructureSet iterates through the regexes and strings in the order they are written in the YAML. When this flag is set, as soon as any contour matches a regex or string, the search for that ROI stops and moves on to the next ROI. This may be useful if you have a priority order of potentially matching contour names.

```sh
--roi_select_first [flag]
```

If a patient has contours `[GTVp, LNUG, IL1, IVL4]`, with the above YAML file and `--roi_select_first` flag set, it will only process `[GTVp, LNUG, IL1]` contours as `[GTV, LUNG, NODES]`, respectively.

**Process each matching contour as a separate ROI**

Each matching contour will be saved separately, with its contour name appended as a suffix to the ROI name. This does not apply to ROIs that only have one regex/string.

```sh
--roi_separate [flag]
```
If a patient had contours `[GTVp, LNUG, IL1, IVL4]`, with the above YAML file and the `--roi_separate` flag set, the contours will be processed as `[GTV, LUNG_LNUG, NODES_IL1, NODES_IVL4]`, respectively.

3. **Ignore patients with no contours**

Ignore patients with no contours that match any of the defined regexes or strings, instead of throwing an error.

```sh
--ignore_missing_regex [flag]
```

## Additional nnUNet-specific flags

1. **Format Output for nnUNet Training**

Whether to format output for nnUNet training. Modalities must be CT,RTSTRUCT or MR,RTSTRUCT: `--modalities CT,RTSTRUCT` or `--modalities MR,RTSTRUCT`.

```sh
--nnunet [flag]
```

```sh
OUTPUT_DIRECTORY
├── nnUNet_preprocessed
├── nnUNet_raw_data_base
│   └── nnUNet_raw_data
│   └── Task500_HNSCC
│   ├── imagesTr
│   ├── imagesTs
│   ├── labelsTr
│   └── labelsTs
└── nnUNet_trained_models
```

2. **Training Size**

Training size of the train-test-split. default = 1.0 (all data will be in imagesTr/labelsTr)

```sh
--train_size [float]
```

3. **Random State**

Random state for the train-test-split. Uses sklearn's train_test_split(). default = 42

```sh
--random_state [int]
```
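
Since the docs state the split relies on sklearn, its behaviour can be previewed in isolation — a sketch with placeholder subject IDs:

```py
from sklearn.model_selection import train_test_split

subjects = ["subject_1", "subject_2", "subject_3", "subject_4", "subject_5"]
# Mirrors --train_size 0.8 --random_state 42; a fixed seed makes the split reproducible.
train, test = train_test_split(subjects, train_size=0.8, random_state=42)
```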

4. **Custom Train-Test-Split YAML**

Whether to use a custom train-test-split. Must be in a file found at `INPUT_DIRECTORY/custom_train_test_split.yaml`. All subjects not defined in this file will be randomly split to fill the defined value for `--train_size` (default = 1.0). File must conform to:

```yaml
train:
- subject_1
- subject_2
...
test:
- subject_1
- subject_2
...
```

```sh
--custom_train_test_split [flag]
```
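
A hypothetical end-to-end nnU-Net training invocation combining the flags above (paths are placeholders):

```sh
autopipeline ./dicom_data ./nnunet_output --modalities CT,RTSTRUCT \
    --nnunet --train_size 0.8 --random_state 42
```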

## Additional flags for nnUNet Inference

1. **Format Output for nnUNet Inference**

Whether to format output for nnUNet Inference.

```sh
--nnunet_inference [flag]
```

```sh
OUTPUT_DIRECTORY
├── 0_subject1_0000.nii.gz
└── ...
```

2. **Path to `dataset.json`**

The path to the `dataset.json` file for nnUNet inference.

```sh
--dataset_json_path [str]
```

A dataset json file may look like this:
```json
{
    "modality": {
        "0": "CT"
    }
}
```
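
A hypothetical inference invocation combining these flags (paths are placeholders, and it is assumed `--modalities` is still required):

```sh
autopipeline ./dicom_data ./inference_output --modalities CT \
    --nnunet_inference --dataset_json_path ./dataset.json
```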