diff --git a/.readthedocs.yaml b/.readthedocs.yaml
new file mode 100644
index 0000000..467541c
--- /dev/null
+++ b/.readthedocs.yaml
@@ -0,0 +1,13 @@
+version: 2
+
+build:
+  os: ubuntu-20.04
+  tools:
+    python: "3.9"
+
+mkdocs:
+  configuration: mkdocs.yml
+
+python:
+  install:
+    - requirements: docs/requirements.txt
\ No newline at end of file
diff --git a/README.md b/README.md
index c9c0872..29c02dd 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,15 @@
 # Med-Imagetools: Transparent and Reproducible Medical Image Processing Pipelines in Python
+
 [![main-ci](https://github.com/bhklab/med-imagetools/actions/workflows/main-ci.yml/badge.svg)](https://github.com/bhklab/med-imagetools/actions/workflows/main-ci.yml)
 ![GitHub repo size](https://img.shields.io/github/repo-size/bhklab/med-imagetools)
 ![GitHub contributors](https://img.shields.io/github/contributors/bhklab/med-imagetools)
 ![GitHub stars](https://img.shields.io/github/stars/bhklab/med-imagetools?style=social)
 ![GitHub forks](https://img.shields.io/github/forks/bhklab/med-imagetools?style=social)
 
-### Latest Updates (v0.4.4) - July 27th, 2022
+## Latest Updates (v0.4.4) - July 27th, 2022
+
 New features include:
+
 * AutoPipeline CLI
 * `nnunet` nnU-Net compatibility mode
 * Built-in train/test split for both normal/nnU-Net modes
@@ -19,21 +22,24 @@ New features include:
 Med-Imagetools, a Python package, offers the perfect tool to transform messy medical dataset folders into a deep learning-ready format in a few lines of code. It not only processes DICOMs of different modalities (such as CT, PET, RTDOSE and RTSTRUCT); it also transforms them into a deep learning-ready, subject-based format, taking the dependencies between these modalities into consideration.
 
 ## Introduction
+
 A medical dataset typically contains multiple types of scans for a single patient in a single study. As seen in the figure below, the scans of different modalities are interdependent on each other. To make effective machine learning models, one ought to take the different modalities into account.
 
-![Fig.1 - Different network topology for different studies of different patients](images/graph.png)
+![Fig.1 - Different network topology for different studies of different patients](docs/images/graph.png)
 
 Med-Imagetools is a unique tool which focuses on subject-based machine learning. It crawls the dataset and builds a network by connecting the different modalities present in the dataset. Based on the user-defined modalities, med-imagetools queries the graph and processes the queried raw DICOMs. The processed DICOMs are saved as NRRDs, which med-imagetools converts to a torchio subject dataset and eventually a torch dataloader for the ML pipeline.
 
-![Fig.2 - Med-Imagetools AutoPipeline diagram](images/autopipeline.png)
+![Fig.2 - Med-Imagetools AutoPipeline diagram](docs/images/autopipeline.png)
 
 ## Installing med-imagetools
 
-```
+```sh
 pip install med-imagetools
 ```
+
 ### (recommended) Create new conda virtual environment
+
-```
+```sh
 conda create -n mit
 conda activate mit
 pip install med-imagetools
@@ -41,20 +47,24 @@ pip install med-imagetools
 
 ### (optional) Install in development mode
 
-```
+```sh
 conda create -n mit
 conda activate mit
 pip install -e git+https://github.com/bhklab/med-imagetools.git#egg=med-imagetools
 ```
+
 This will install the package in editable mode, so that the installed package will update when the code is changed.
 
 ## Getting Started
+
 Med-Imagetools takes a two-step approach to turn a messy raw medical dataset into an ML-ready dataset.
+
 1. ***Autopipeline***: Crawls the raw dataset, forms a network, and performs a graph query based on the user-defined modalities. The relevant DICOMs get processed and saved as NRRDs.
 
-    ```
+
+    ```sh
     autopipeline\
-      [INPUT DIRECTORY] \
-      [OUTPUT DIRECTORY] \
+      [INPUT_DIRECTORY] \
+      [OUTPUT_DIRECTORY] \
      --modalities [str: CT,RTSTRUCT,PT] \
      --spacing [Tuple: (float,float,float)]\
      --n_jobs [int]\
@@ -66,8 +76,10 @@ Med-Imagetools takes two step approch to turn messy medical raw dataset to ML re
      --continue_processing [flag]\
      --dry_run [flag]
     ```
+
 2. ***class Dataset***: This class converts the processed NRRDs to torchio subjects, which can be easily converted to a torch dataset.
 
-    ```
+
+    ```py
+    import torch
+    import torchio as tio
+
     from imgtools.io import Dataset
 
     subjects = Dataset.load_from_nrrd(output_directory, ignore_multi=True)
     data_set = tio.SubjectsDataset(subjects)
     data_loader = torch.utils.data.DataLoader(data_set, batch_size=4, shuffle=True, num_workers=4)
     ```
@@ -76,12 +88,14 @@ Med-Imagetools takes two step approch to turn messy medical raw dataset to ML re
 
 ## Demo (Outdated as of v0.4)
+
 These Google Colab notebooks will introduce the main functionalities of med-imagetools. More information can be found [here](https://github.com/bhklab/med-imagetools/blob/master/examples/README.md)
 
-#### Tutorial 1: Forming Dataset with med-imagetools Autopipeline
+
+### Tutorial 1: Forming Dataset with med-imagetools Autopipeline
 
 [![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/skim2257/tcia_samples/blob/main/notebooks/Tutorial_1_Forming_Dataset_with_Med_Imagetools.ipynb)
 
-#### Tutorial 2: Machine Learning with med-imagetools and torchio
+### Tutorial 2: Machine Learning with med-imagetools and torchio
 
 [![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/skim2257/tcia_samples/blob/main/notebooks/Tutorial_2_Machine_Learning_with_Med_Imagetools_and_torchio.ipynb)
diff --git a/docs/AutoPipeline.md b/docs/AutoPipeline.md
new file mode 100644
index 0000000..d321fc7
--- /dev/null
+++ b/docs/AutoPipeline.md
@@ -0,0 +1,253 @@
+# AutoPipeline Usage
+
+To use AutoPipeline, follow the installation instructions on the [documentation home page](index.md).
+
+## Intro to AutoPipeline
+
+AutoPipeline will crawl and process any DICOM dataset. To run the most basic variation of the script, run the following command:
+
+```sh
+autopipeline INPUT_DIRECTORY OUTPUT_DIRECTORY --modalities MODALITY_LIST
+```
+
+Replace INPUT_DIRECTORY with the directory containing all your DICOM data and OUTPUT_DIRECTORY with the directory that you want the data to be written to.
+
+The `--modalities` option allows you to process only certain modalities that are present in the DICOM data. The available modalities are:
+
+1. CT
+2. MR
+3. RTSTRUCT
+4. PT
+5. RTDOSE
+
+Set the modalities you want to use by separating each one with a comma. For example, to use CT and RTSTRUCT, run AutoPipeline with `--modalities CT,RTSTRUCT`.
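+
+If you would rather drive the CLI from a Python script, a minimal sketch is shown below (the dataset paths here are hypothetical placeholders):
+
+```py
+import subprocess
+
+# Hypothetical input/output locations; replace them with your own.
+subprocess.run(
+    [
+        "autopipeline",
+        "data/raw_dicoms",   # INPUT_DIRECTORY
+        "data/processed",    # OUTPUT_DIRECTORY
+        "--modalities", "CT,RTSTRUCT",
+    ],
+    check=True,  # raise CalledProcessError if AutoPipeline fails
+)
+```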
+
+## AutoPipeline Flags
+
+AutoPipeline comes with many built-in features to make your data processing easier:
+
+1. **Spacing**
+
+    The spacing for the output image. Default = (1., 1., 0.). A spacing of 0. along an axis keeps that axis's spacing as-is; a spacing of (0., 0., 0.) will not resample any image.
+
+    ```sh
+    --spacing [Tuple: (float,float,float)]
+    ```
+
+2. **Parallel Job Execution**
+
+    The number of jobs to be run in parallel. Set to -1 to use all cores. Default = -1.
+
+    ```sh
+    --n_jobs [int]
+    ```
+
+3. **Dataset Graph Visualization (not recommended for large datasets)**
+
+    Whether to visualize the entire dataset using PyViz.
+
+    ```sh
+    --visualize [flag]
+    ```
+
+4. **Continue Pipeline Processing**
+
+    Whether to continue the most recent run of AutoPipeline that terminated prematurely for that output directory. This will only work if the `.imgtools` directory was not deleted from the previous run. Using this flag will retain the flags and parameters carried over from the previous run.
+
+    ```sh
+    --continue_processing [flag]
+    ```
+
+5. **Processing Dry Run**
+
+    Whether to execute a dry run, only generating the `.imgtools` folder, which includes the crawled index.
+
+    ```sh
+    --dry_run [flag]
+    ```
+
+6. **Show Progress**
+
+    Whether to print AutoPipeline progress to the standard output.
+
+    ```sh
+    --show_progress [flag]
+    ```
+
+7. **Warning on Subject Processing Errors**
+
+    Whether to warn instead of raising an error when processing subjects.
+
+    ```sh
+    --warn_on_error [flag]
+    ```
+
+8. **Overwrite Existing Output Files**
+
+    Whether to overwrite existing file outputs.
+
+    ```sh
+    --overwrite [flag]
+    ```
+
+9. **Update Existing Crawled Index**
+
+    Whether to update an existing crawled index.
+
+    ```sh
+    --update [flag]
+    ```
+
+## Flags for parsing RTSTRUCT contours/regions of interest (ROI)
+
+Contours can be selected by creating a YAML file that defines a regular expression (regex), a list of potential contour names, or a combination of both. **If none of the flags are set or the YAML file does not exist, AutoPipeline will default to processing every contour.**
+
+1. **Defining YAML file path for contours**
+
+    Whether to read a YAML file that defines regex or string options for contour names for regions of interest (ROI). By default, it will look for and read from `INPUT_DIRECTORY/roi_names.yaml`.
+
+    ```sh
+    --read_yaml_label_names [flag]
+    ```
+
+    Path to the above-mentioned YAML file. The path can be absolute or relative. Default = "" (each ROI will have its own label index in dataset.json for nnUNet).
+
+    ```sh
+    --roi_yaml_path [str]
+    ```
+
+2. **Defining contour selection behaviour**
+
+    A typical ROI YAML file may look like this:
+
+    ```yaml
+    GTV: GTV*
+    LUNG:
+      - LUNG*
+      - LNUG
+      - POUMON*
+    NODES:
+      - IL1
+      - IIL2
+      - IIIL3
+      - IVL4
+    ```
+
+    By default, **all ROIs** that match any of the regexes or strings will be **saved as one label**. For example, GTVn, GTVp and GTVfoo will all be saved as GTV. However, this is not always the desirable behaviour.
+
+    **Only select the first matching regex/string**
+
+    The StructureSet iterates through the regexes and strings in the order they are written in the YAML. When this flag is set, as soon as any contour matches a regex or string, the search for that ROI stops and moves on to the next ROI. This may be useful if you have a priority order of potentially matching contour names (see the sketch after this list).
+
+    ```sh
+    --roi_select_first [flag]
+    ```
+
+    If a patient has contours `[GTVp, LNUG, IL1, IVL4]`, with the above YAML file and the `--roi_select_first` flag set, it will only process the `[GTVp, LNUG, IL1]` contours as `[GTV, LUNG, NODES]`, respectively.
+
+    **Process each matching contour as a separate ROI**
+
+    Any matching contour will be saved separately with its contour name as a suffix to the ROI name. This will not apply to ROIs that only have one regex/string.
+
+    ```sh
+    --roi_separate [flag]
+    ```
+
+    If a patient has contours `[GTVp, LNUG, IL1, IVL4]`, with the above YAML file and the `--roi_separate` flag set, it will process the contours as `[GTV, LUNG_LNUG, NODES_IL1, NODES_IVL4]`, respectively.
+
+3. **Ignore patients with no contours**
+
+    Ignore patients with no contours that match any of the defined regexes or strings instead of throwing an error.
+
+    ```sh
+    --ignore_missing_regex [flag]
+    ```
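+
+For intuition, the first-match selection described above can be pictured with a small, standalone sketch. This is illustrative only, not the actual StructureSet implementation; it treats the YAML values as glob-style patterns:
+
+```py
+import re
+
+roi_spec = {"GTV": ["GTV*"], "LUNG": ["LUNG*", "LNUG", "POUMON*"]}
+contours = ["GTVp", "LNUG", "POUMON_D"]
+
+def first_match(patterns, names):
+    # Honour the order patterns are written in: the first pattern with any
+    # matching contour wins, and the search stops there.
+    for pattern in patterns:
+        regex = re.compile(pattern.replace("*", ".*") + "$")
+        for name in names:
+            if regex.match(name):
+                return name
+    return None
+
+for roi, patterns in roi_spec.items():
+    print(roi, "->", first_match(patterns, contours))  # GTV -> GTVp, LUNG -> LNUG
+```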
+
+## Additional nnUNet-specific flags
+
+1. **Format Output for nnUNet Training**
+
+    Whether to format output for nnUNet training. Modalities must be CT,RTSTRUCT or MR,RTSTRUCT: `--modalities CT,RTSTRUCT` or `--modalities MR,RTSTRUCT`.
+
+    ```sh
+    --nnunet [flag]
+    ```
+
+    ```sh
+    OUTPUT_DIRECTORY
+    ├── nnUNet_preprocessed
+    ├── nnUNet_raw_data_base
+    │   └── nnUNet_raw_data
+    │       └── Task500_HNSCC
+    │           ├── imagesTr
+    │           ├── imagesTs
+    │           ├── labelsTr
+    │           └── labelsTs
+    └── nnUNet_trained_models
+    ```
+
+2. **Training Size**
+
+    Training size of the train-test-split. Default = 1.0 (all data will be in imagesTr/labelsTr).
+
+    ```sh
+    --train_size [float]
+    ```
+
+3. **Random State**
+
+    Random state for the train-test-split. Uses sklearn's train_test_split(). Default = 42 (see the sketch after this list).
+
+    ```sh
+    --random_state [int]
+    ```
+
+4. **Custom Train-Test-Split YAML**
+
+    Whether to use a custom train-test-split. It must be defined in a file found at `INPUT_DIRECTORY/custom_train_test_split.yaml`. All subjects not defined in this file will be randomly split to fill the defined value for `--train_size` (default = 1.0). The file must conform to:
+
+    ```yaml
+    train:
+      - subject_1
+      - subject_2
+      ...
+    test:
+      - subject_1
+      - subject_2
+      ...
+    ```
+
+    ```sh
+    --custom_train_test_split [flag]
+    ```
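+
+Because the split is delegated to scikit-learn, it can be reproduced (or sanity-checked) outside the pipeline. A minimal sketch with hypothetical subject IDs:
+
+```py
+from sklearn.model_selection import train_test_split
+
+subjects = ["subject_1", "subject_2", "subject_3", "subject_4", "subject_5"]
+
+# Mirrors --train_size 0.8 --random_state 42; the same seed yields the same split.
+train, test = train_test_split(subjects, train_size=0.8, random_state=42)
+print(train, test)
+```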
+
+## Additional flags for nnUNet Inference
+
+1. **Format Output for nnUNet Inference**
+
+    Whether to format output for nnUNet inference.
+
+    ```sh
+    --nnunet_inference [flag]
+    ```
+
+    ```sh
+    OUTPUT_DIRECTORY
+    ├── 0_subject1_0000.nii.gz
+    └── ...
+    ```
+
+2. **Path to `dataset.json`**
+
+    The path to the `dataset.json` file for nnUNet inference.
+
+    ```sh
+    --dataset_json_path [str]
+    ```
+
+    A dataset json file may look like this:
+
+    ```json
+    {
+        "modality": {
+            "0": "CT"
+        }
+    }
+    ```
\ No newline at end of file
diff --git a/images/autopipeline.png b/docs/images/autopipeline.png
similarity index 100%
rename from images/autopipeline.png
rename to docs/images/autopipeline.png
diff --git a/images/graph.png b/docs/images/graph.png
similarity index 100%
rename from images/graph.png
rename to docs/images/graph.png
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000..a06b218
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,107 @@
+# Med-Imagetools: Transparent and Reproducible Medical Image Processing Pipelines in Python
+
+[![main-ci](https://github.com/bhklab/med-imagetools/actions/workflows/main-ci.yml/badge.svg)](https://github.com/bhklab/med-imagetools/actions/workflows/main-ci.yml)
+![GitHub repo size](https://img.shields.io/github/repo-size/bhklab/med-imagetools)
+![GitHub contributors](https://img.shields.io/github/contributors/bhklab/med-imagetools)
+![GitHub stars](https://img.shields.io/github/stars/bhklab/med-imagetools?style=social)
+![GitHub forks](https://img.shields.io/github/forks/bhklab/med-imagetools?style=social)
+
+## Latest Updates (v0.4.4) - July 27th, 2022
+
+New features include:
+
+* AutoPipeline CLI
+* `nnunet` nnU-Net compatibility mode
+* Built-in train/test split for both normal/nnU-Net modes
+* `random_state` for reproducible seeds
+* Region of interest (ROI) yaml dictionary intake for RTSTRUCT processing
+* Markdown report output post-processing
+* `continue_processing` flag to continue autopipeline
+* `dry_run` flag to only crawl the dataset
+
+Med-Imagetools, a Python package, offers the perfect tool to transform messy medical dataset folders into a deep learning-ready format in a few lines of code. It not only processes DICOMs of different modalities (such as CT, PET, RTDOSE and RTSTRUCT); it also transforms them into a deep learning-ready, subject-based format, taking the dependencies between these modalities into consideration.
+
+## Introduction
+
+A medical dataset typically contains multiple types of scans for a single patient in a single study. As seen in the figure below, the scans of different modalities are interdependent on each other. To make effective machine learning models, one ought to take the different modalities into account.
+
+![Fig.1 - Different network topology for different studies of different patients](images/graph.png)
+
+Med-Imagetools is a unique tool which focuses on subject-based machine learning. It crawls the dataset and builds a network by connecting the different modalities present in the dataset. Based on the user-defined modalities, med-imagetools queries the graph and processes the queried raw DICOMs. The processed DICOMs are saved as NRRDs, which med-imagetools converts to a torchio subject dataset and eventually a torch dataloader for the ML pipeline.
+
+![Fig.2 - Med-Imagetools AutoPipeline diagram](images/autopipeline.png)
+
+## Installing med-imagetools
+
+```sh
+pip install med-imagetools
+```
+
+### (recommended) Create new conda virtual environment
+
+```sh
+conda create -n mit
+conda activate mit
+pip install med-imagetools
+```
+
+### (optional) Install in development mode
+
+```sh
+conda create -n mit
+conda activate mit
+pip install -e git+https://github.com/bhklab/med-imagetools.git#egg=med-imagetools
+```
+
+This will install the package in editable mode, so that the installed package will update when the code is changed.
+
+## Getting Started
+
+Med-Imagetools takes a two-step approach to turn a messy raw medical dataset into an ML-ready dataset.
+
+1. ***Autopipeline***: Crawls the raw dataset, forms a network, and performs a graph query based on the user-defined modalities. The relevant DICOMs get processed and saved as NRRDs.
+
+    ```sh
+    autopipeline\
+      [INPUT_DIRECTORY] \
+      [OUTPUT_DIRECTORY] \
+      --modalities [str: CT,RTSTRUCT,PT] \
+      --spacing [Tuple: (float,float,float)]\
+      --n_jobs [int]\
+      --visualize [flag]\
+      --nnunet [flag]\
+      --train_size [float]\
+      --random_state [int]\
+      --roi_yaml_path [str]\
+      --continue_processing [flag]\
+      --dry_run [flag]
+    ```
+
+2. ***class Dataset***: This class converts the processed NRRDs to torchio subjects, which can be easily converted to a torch dataset.
+
+    ```py
+    import torch
+    import torchio as tio
+
+    from imgtools.io import Dataset
+
+    subjects = Dataset.load_from_nrrd(output_directory, ignore_multi=True)
+    data_set = tio.SubjectsDataset(subjects)
+    data_loader = torch.utils.data.DataLoader(data_set, batch_size=4, shuffle=True, num_workers=4)
+    ```
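+
+The dataloader can then be consumed like any other torch DataLoader. A minimal, illustrative sketch (the exact keys in each batch depend on how the subjects were saved, so they are not spelled out here):
+
+```py
+# Continues from the snippet above: peek at one collated batch.
+for batch in data_loader:
+    print({key: type(value) for key, value in batch.items()})
+    break
+```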
+
+## Contributors
+
+Thanks to the following people who have contributed to this project:
+
+* [@mkazmier](https://github.com/mkazmier)
+* [@skim2257](https://github.com/skim2257)
+* [@fishingguy456](https://github.com/fishingguy456)
+* [@Vishwesh4](https://github.com/Vishwesh4)
+* [@mnakano](https://github.com/mnakano)
+
+## Contact
+
+If you have any questions/concerns, you can reach the main development team at sejin.kim@uhnresearch.ca or open an issue on our [GitHub repository](https://github.com/bhklab/med-imagetools).
+
+## License
+
+This project uses the following license: [Apache License 2.0](http://www.apache.org/licenses/)
diff --git a/docs/nnUNet.md b/docs/nnUNet.md
new file mode 100644
index 0000000..f46cf55
--- /dev/null
+++ b/docs/nnUNet.md
@@ -0,0 +1,116 @@
+# Preparing Data for nnUNet
+
+The nnUNet repo can be found at <https://github.com/MIC-DKFZ/nnUNet>.
+
+## Processing DICOM Data with Med-ImageTools
+
+Ensure that you have followed the installation steps on the [documentation home page](index.md) before proceeding.
+
+To convert your data from DICOM to NIfTI for training an nnUNet auto-segmentation model, run the following command:
+
+```sh
+autopipeline\
+    [INPUT_DIRECTORY] \
+    [OUTPUT_DIRECTORY] \
+    --modalities CT,RTSTRUCT \
+    --nnunet
+```
+
+Modalities can also be set to `--modalities MR,RTSTRUCT`.
+
+AutoPipeline offers many more options and features for you to customize your outputs; see [AutoPipeline](AutoPipeline.md) for the full list.
+
+## nnUNet Preprocess and Train
+
+### One-Step Preprocess and Train
+
+Med-ImageTools generates a file in your output folder called `nnunet_preprocess_and_train.sh` that combines all the commands needed for preprocessing and training your nnUNet model. Run that shell script to get a fully trained nnUNet model.
+
+Alternatively, you can go through each step individually as follows:
+
+### nnUNet Preprocessing
+
+Follow nnUNet's instructions for setting up your paths: <https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/setting_up_paths.md>.
+
+Med-ImageTools generates the dataset.json that nnUNet requires in the output directory that you specify.
+
+The generated output directory structure will look something like:
+
+```sh
+OUTPUT_DIRECTORY
+├── nnUNet_preprocessed
+├── nnUNet_raw_data_base
+│   └── nnUNet_raw_data
+│       └── Task500_HNSCC
+│           ├── nnunet_preprocess_and_train.sh
+│           └── ...
+└── nnUNet_trained_models
+```
+
+nnUNet requires that environment variables be set before any commands are executed. To set them temporarily, run the following:
+
+```sh
+export nnUNet_raw_data_base="/OUTPUT_DIRECTORY/nnUNet_raw_data_base"
+export nnUNet_preprocessed="/OUTPUT_DIRECTORY/nnUNet_preprocessed"
+export RESULTS_FOLDER="/OUTPUT_DIRECTORY/nnUNet_trained_models"
+```
+
+To set these environment variables permanently, add them to your `~/.bashrc` file. The `nnUNet_preprocessed` and `nnUNet_trained_models` folders are generated as empty folders for you by Med-ImageTools, while `nnUNet_raw_data_base` is populated with the required raw data files. Add this to the file:
+
+```sh
+export nnUNet_raw_data_base="/OUTPUT_DIRECTORY/nnUNet_raw_data_base"
+export nnUNet_preprocessed="/OUTPUT_DIRECTORY/nnUNet_preprocessed"
+export RESULTS_FOLDER="/OUTPUT_DIRECTORY/nnUNet_trained_models"
+```
+
+Then, execute the command:
+
+```sh
+source ~/.bashrc
+```
+
+To allow nnUNet to preprocess your data for training, run the following command. Set XXX to the task ID you want to preprocess. For example, for Task500_HNSCC, the task ID is 500. Task IDs must be between 500 and 999, so Med-ImageTools can generate up to 500 task instances with the `--nnunet` flag in a single output folder.
+
+```sh
+nnUNet_plan_and_preprocess -t XXX --verify_dataset_integrity
+```
+
+### nnUNet Training
+
+Once nnUNet has finished preprocessing, you may begin training your nnUNet model. To train your model, run the following command. Learn more about nnUNet's options in the nnUNet repo: <https://github.com/MIC-DKFZ/nnUNet>.
+
+```sh
+nnUNet_train CONFIGURATION TRAINER_CLASS_NAME TASK_NAME_OR_ID FOLD
+```
+
+## nnUNet Inference
+
+For inference, nnUNet requires the data to be in a different output format. To run AutoPipeline for nnUNet inference, run the following command:
+
+```sh
+autopipeline\
+    [INPUT_DIRECTORY] \
+    [OUTPUT_DIRECTORY] \
+    --modalities CT \
+    --nnunet_inference \
+    --dataset_json_path [DATASET_JSON_PATH]
+```
+
+To execute this command, AutoPipeline needs a JSON file with the image modality definitions.
+
+Modalities can also be set to `--modalities MR`.
+
+The directory structure will look like:
+
+```sh
+OUTPUT_DIRECTORY
+├── 0_subject1_0000.nii.gz
+└── ...
+```
+
+To run inference, run the command:
+
+```sh
+nnUNet_predict -i INPUT_FOLDER -o OUTPUT_FOLDER -t TASK_NAME_OR_ID -m CONFIGURATION
+```
+
+In this case, the `INPUT_FOLDER` of nnUNet is the `OUTPUT_DIRECTORY` of Med-ImageTools.
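+
+As a quick sanity check before inference, you can inspect the modality definitions in the JSON file you pass to `--dataset_json_path`. A minimal sketch (the path below is hypothetical):
+
+```py
+import json
+from pathlib import Path
+
+# Hypothetical location; point this at your own dataset.json.
+spec = json.loads(Path("OUTPUT_DIRECTORY/dataset.json").read_text())
+
+# nnUNet maps each 4-digit filename suffix (e.g. _0000) to one input channel.
+for channel, modality in spec["modality"].items():
+    print(f"channel {channel} -> {modality}")
+```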
diff --git a/docs/requirements.txt b/docs/requirements.txt
new file mode 100644
index 0000000..ed776c6
--- /dev/null
+++ b/docs/requirements.txt
@@ -0,0 +1,19 @@
+h5py
+joblib
+matplotlib
+numpy
+pandas
+pydicom
+pynrrd
+scikit-image
+SimpleITK
+tqdm
+torch
+torchio
+scikit-learn
+pyyaml
+dill
+attr
+jinja2==3.0.3
+mkdocs
+mkdocs-material
\ No newline at end of file
diff --git a/imgtools/autopipeline.py b/imgtools/autopipeline.py
index 3e7b79e..0448edc 100644
--- a/imgtools/autopipeline.py
+++ b/imgtools/autopipeline.py
@@ -4,7 +4,7 @@
 import glob
 import pickle
 import struct
-from attr import has
+# from attr import has
 from matplotlib.style import available
 import numpy as np
 import sys
@@ -145,7 +145,7 @@ def __init__(self,
             roi_yaml_path = ""
             custom_train_test_split = False
             is_nnunet = False
-            if modalities != "CT" or modalities != "MR":
+            if modalities != "CT" and modalities != "MR":
                 raise ValueError("nnUNet inference can only be run on image files. Please set modalities to 'CT' or 'MR'")
         if is_nnunet:
             self.base_output_directory = self.output_directory
@@ -621,20 +621,20 @@ def save_data(self):
                                  tuple(self.nnunet_info["modalities"].keys()),
                                  {v: k for k, v in self.existing_roi_names.items()},
                                  os.path.split(self.input_directory)[1])
-            _, child = os.path.split(self.output_directory)
-            shell_path = pathlib.Path(self.output_directory, child.split("_")[1]+".sh").as_posix()
+            # _, child = os.path.split(self.output_directory)
+            shell_path = pathlib.Path(self.output_directory, "nnunet_preprocess_and_train.sh").as_posix()
             if os.path.exists(shell_path):
                 os.remove(shell_path)
             with open(shell_path, "w", newline="\n") as f:
                 output = "#!/bin/bash\n"
-                output += "set -e"
+                output += "set -e\n\n"
                 output += f'export nnUNet_raw_data_base="{self.base_output_directory}/nnUNet_raw_data_base"\n'
                 output += f'export nnUNet_preprocessed="{self.base_output_directory}/nnUNet_preprocessed"\n'
                 output += f'export RESULTS_FOLDER="{self.base_output_directory}/nnUNet_trained_models"\n\n'
                 output += f'nnUNet_plan_and_preprocess -t {self.task_id} --verify_dataset_integrity\n\n'
                 output += 'for (( i=0; i<5; i++ ))\n'
                 output += 'do\n'
-                output += f'    nnUNet_train 3d_fullres nnUNetTrainerV2 {os.path.split(self.output_directory)[1]} $i --npz\n'
+                output += f'    nnUNet_train 3d_fullres nnUNetTrainerV2 {os.path.split(self.output_directory)[1]} $i\n'
                 output += 'done'
                 f.write(output)
         markdown_report_images(self.output_directory, self.total_modality_counter) #images saved to the output directory
@@ -648,7 +648,7 @@ def save_data(self):
             formatted_list = "\n\t".join(self.broken_patients)
             output += f"{formatted_list}\n"
             output += "\n\n"
-        if self.is_nnunet:
+        else:
             output += "## Train Test Split\n\n"
             # pie_path = pathlib.Path(self.output_directory, "markdown_images", "nnunet_train_test_pie.png").as_posix()
             pie_path = pathlib.Path("markdown_images", "nnunet_train_test_pie.png").as_posix()
@@ -702,7 +702,7 @@ def run(self):
         # for subject_id in subject_ids:
         #     self._process_wrapper(subject_id)
         self.broken_patients = []
-        if not self.is_nnunet:
+        if not self.is_nnunet and not self.is_nnunet_inference:
             all_patient_names = glob.glob(pathlib.Path(self.input_directory, "*", " ").as_posix()[0:-1])
             all_patient_names = [os.path.split(os.path.split(x)[0])[1] for x in all_patient_names]
             for e in all_patient_names:
@@ -730,7 +730,7 @@ def main():
             ignore_missing_regex=args.ignore_missing_regex,
             roi_yaml_path=args.roi_yaml_path,
             custom_train_test_split=args.custom_train_test_split,
-            is_nnunet_inference=args.is_nnunet_inference,
+            is_nnunet_inference=args.nnunet_inference,
             dataset_json_path=args.dataset_json_path,
             continue_processing=args.continue_processing,
             dry_run=args.dry_run)
diff --git a/imgtools/io/loaders.py b/imgtools/io/loaders.py
index 7ddcb4d..266ee13 100644
--- a/imgtools/io/loaders.py
+++ b/imgtools/io/loaders.py
@@ -110,7 +110,7 @@ def read_dicom_auto(path, series=None):
             rtstruct.metadata.update(all_modality_metadata)
             return rtstruct
         elif modality == 'RTDOSE':
-            rtdose = read_dicom_rtdose(path)
+            rtdose = read_dicom_rtdose(dcm)
             rtdose.metadata.update(all_modality_metadata)
             return rtdose
         else:
diff --git a/imgtools/modules/dose.py b/imgtools/modules/dose.py
index ee7d87a..e1a48ca 100644
--- a/imgtools/modules/dose.py
+++ b/imgtools/modules/dose.py
@@ -38,7 +38,10 @@ def from_dicom_rtdose(cls, path):
         '''
         Reads the data and returns the data frame and the image dosage in SITK format
         '''
-        dcms = glob.glob(pathlib.Path(path, "*.dcm").as_posix())
+        if isinstance(path, str) or len(path) == 1:
+            dcms = [path]
+        else:
+            dcms = glob.glob(pathlib.Path(path, "*.dcm").as_posix())
 
         if len(dcms) < 2:
             dose = sitk.ReadImage(dcms[0])
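Taken together, the two fixes above mean RTDOSE is read from the DICOM that `read_dicom_auto` actually resolved, rather than from the caller's original argument. A minimal usage sketch, assuming `read_dicom_auto` is imported from its module and using a hypothetical path layout:

```py
from imgtools.io.loaders import read_dicom_auto

# Hypothetical path to a folder containing one RTDOSE series.
rtdose = read_dicom_auto("data/subject_1/RTDOSE")

# Loaded objects carry the merged per-modality DICOM metadata (see loaders.py above).
print(rtdose.metadata)
```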
diff --git a/imgtools/ops/README.md b/imgtools/ops/README.md
index 491bec9..d679262 100644
--- a/imgtools/ops/README.md
+++ b/imgtools/ops/README.md
@@ -1,32 +1,34 @@
 # IO module
 
 ## Input
+
 * ImageAutoInput
 * ImageCSVInput
 * ImageFileInput
 
 ### 1. Initialize loader with required parameters
+
 * ImageCSVInput
-    * `csv_path_or_dataframe` (str): Path to CSV or pandas DataFrame instance
-    * `colnames` (List[str]): Columns that store path to images. Passed to `pd.read_csv(usecols=colnames)`
-    * `id_column` (int): Index of index column. Passed to `pd.read_csv(index_col=id_column)`
-    * `expand_paths` (bool): Expand paths deeper.
-    * `readers` (List[Callable]): List of function used to read individual images.
+  * `csv_path_or_dataframe` (str): Path to CSV or pandas DataFrame instance
+  * `colnames` (List[str]): Columns that store paths to images. Passed to `pd.read_csv(usecols=colnames)`
+  * `id_column` (int): Index of the ID column. Passed to `pd.read_csv(index_col=id_column)`
+  * `expand_paths` (bool): Expand paths deeper.
+  * `readers` (List[Callable]): List of functions used to read individual images.
 * ImageFileInput
-    * `root_directory` (str): Parent directory of all subject directories
-    * `get_subject_id_from` (str): Specify how to derive subject_id of a sample. ['filename', 'subject_directory']
-    * `subdir_path` (str): If images are stored in a subdirectory of the subject diretory. Accepts glob expressions for flexible subdirectory names.
-    * `reader` (List[Callable]): Function used to read individual images
+  * `root_directory` (str): Parent directory of all subject directories
+  * `get_subject_id_from` (str): Specify how to derive the subject_id of a sample. ['filename', 'subject_directory']
+  * `subdir_path` (str): If images are stored in a subdirectory of the subject directory. Accepts glob expressions for flexible subdirectory names.
+  * `reader` (List[Callable]): Function used to read individual images
 * ImageAutoInput
-    * `dir_path` (str): Path to dataset top-level directory.
-    * `modality` (str): List of modalities to process. Only samples with ALL modalities will be processed. Make sure there are no space between list elements as it is parsed as a string.
-    * `n_jobs` (Optional(int)): Number of threads to use for multiprocessing.
-
+  * `dir_path` (str): Path to dataset top-level directory.
+  * `modality` (str): List of modalities to process. Only samples with ALL modalities will be processed. Make sure there are no spaces between list elements, as the value is parsed as a string.
+  * `n_jobs` (Optional(int)): Number of threads to use for multiprocessing.
 
 ### 2. Call loader with subject_id
 
 ### Code Examples
-```
+
+```py
 input = ImageCSVInput("folder/to/dataset/indexing.csv",
                       colnames=['ct_path', 'rt_path'],
                       id_column=0)
 
 image, structureset = input(subject_id)
 ```
 
-```
+```py
 input = ImageFileInput("folder/to/dataset/images",
                        get_subject_id_from='subject_directory',
                        subdir_path="*/structures/RTSTRUCT.dcm")
 
 image = input(subject_id)
 ```
-
-```
+
+```py
 input = ImageAutoInput("folder/to/dataset", modality="CT,RTSTRUCT")
 
 image, structureset = input(subject_id)
 ```
-
\ No newline at end of file
diff --git a/imgtools/utils/args.py b/imgtools/utils/args.py
index 217ba22..6b442f0 100644
--- a/imgtools/utils/args.py
+++ b/imgtools/utils/args.py
@@ -52,7 +52,7 @@ def parser():
     parser.add_argument("--custom_train_test_split", default=False, action="store_true",
                         help="Whether to use a custom train-test-split, stored in custom_train_test_split.yaml in the input directory.")
 
-    parser.add_argument("--is_nnunet_inference", default=False, action="store_true",
+    parser.add_argument("--nnunet_inference", default=False, action="store_true",
                         help="Whether to generate data for nnUNet inference.")
 
     parser.add_argument("--dataset_json_path", type=str,
@@ -67,4 +67,4 @@ def parser():
     # parser.add_argument("--custom_train_test_split_path", type=str,
     #                     help="Path to the YAML file defining the custom train-test-split.")
 
-    return parser.parse_known_args()[0]
\ No newline at end of file
+    return parser.parse_args()
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
new file mode 100644
index 0000000..19500c8
--- /dev/null
+++ b/mkdocs.yml
@@ -0,0 +1,18 @@
+site_name: Med-ImageTools Documentation
+site_url: https://med-imagetools.readthedocs.io/
+nav:
+  - Home: index.md
+  - AutoPipeline: AutoPipeline.md
+  - nnUNet: nnUNet.md
+
+markdown_extensions:
+  - pymdownx.highlight:
+      anchor_linenums: true
+  - pymdownx.inlinehilite
+  - pymdownx.snippets
+  - pymdownx.superfences
+
+theme:
+  name: material
+  features:
+    - content.code.annotate
\ No newline at end of file
diff --git a/requirements.txt b/requirements.txt
index eb8bd33..bf54e49 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -14,3 +14,4 @@ torchio
 scikit-learn
 pyyaml
 dill
+attr
diff --git a/setup.py b/setup.py
index ae047ef..4d4e01b 100644
--- a/setup.py
+++ b/setup.py
@@ -8,7 +8,7 @@
 setup(
     name="med-imagetools",
-    version="0.4.4",
+    version="1.0.0",
     author="Michal Kazmierski, Sejin Kim, Kevin Qu, Vishwesh Ramanathan, Benjamin Haibe-Kains",
     author_email="benjamin.haibe.kains@utoronto.ca",
     description="Transparent and reproducible image processing pipelines in Python.",