Merge branch 'dev'

aramis-lab · Apr 13, 2021 · f8b707b
2 parents fdbe09e + fad1735
commit f8b707b
Showing 370 changed files with 1,428 additions and 76,067 deletions.
diff --git a/.gitignore b/.gitignore
@@ -35,4 +35,3 @@ clinicadl/notebooks/
 # Mask and other files
 clinicadl/clinicadl/resources/masks/*.nii
 clinicadl/clinicadl/resources/masks/*.nii.gz
-
diff --git a/CHANGELOG b/CHANGELOG
@@ -26,6 +26,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Security
 
+## ClinicaDL 0.2.1
+
+### Added
+
+- the `multi_cohort` flag in train allows to train on several CAPS at the same time.
+
+### Changed
+
+- `clinicadl train roi` now allows any ROI defined by a mask.
+- Update README.md to avoid duplicates.
+- JSON files are added for `clinicadl classify` and `clinicadl tsvtool getlabels|split|kfold`
+
+### Removed
+
+- Scripts and data related to MedIA publication.
+
+
 ## ClinicaDL 0.2.0
 
 ### Added

diff --git a/README.md b/README.md
@@ -39,23 +39,13 @@
 
 This repository hosts the source code of a **framework for the reproducible
 evaluation of deep learning classification experiments using anatomical MRI
-data for the computer-aided diagnosis of Alzheimer's disease (AD)**. This work
-has been published in [Medical Image
-Analysis](https://doi.org/10.1016/j.media.2020.101694) and is also available on
-[arXiv](https://arxiv.org/abs/1904.07773).
-
-Automatic classification of AD using classical machine learning approaches can
-be performed using the framework available here:
-<https://github.com/aramis-lab/AD-ML>.
+data for the computer-aided diagnosis of Alzheimer's disease (AD)**.
 
 > **Disclaimer:** this software is **under development**. Some features can
-change between different commits. A stable version is planned to be released
-soon. The release v.0.0.1 corresponds to the date of submission of the
-publication but in the meantime important changes are being done to facilitate
-the use of the package.
+change between different releases and/or commits.
 
-The complete documentation of the project can be found on 
-this [page](https://clinicadl.readthedocs.io/). 
+To access the full documentation of the project, follow the link 
+[https://clinicadl.readthedocs.io/](https://clinicadl.readthedocs.io/). 
 If you find a problem when using it or if you want to provide us feedback,
 please [open an issue](https://github.com/aramis-lab/ad-dl/issues) or write on
 the [forum](https://groups.google.com/forum/#!forum/clinica-user).
@@ -78,58 +68,18 @@ pip install clinicadl
 site](https://aramislab.paris.inria.fr/clinicadl/tuto/intro.html) to start
 using **ClinicaDL** directly in a Google Colab instance!
 
-## Overview
-
-### How to use ClinicaDL?
-
-`clinicadl` is an utility that is used through the command line. Several tasks
-can be performed:
-
-- **Preparation of your imaging data**
-    * **T1w-weighted MR image preprocessing.** The `preprocessing` task
-      processes a dataset of T1 images stored in BIDS format and prepares to
-      extract the tensors (see paper for details on the preprocessing). Output
-      is stored using the [CAPS](https://aramislab.paris.inria.fr/clinica/docs/public/latest/CAPS/Introduction/)
-      hierarchy.
-    * **Quality check of preprocessed data.** The `quality_check` task uses a
-      pretrained network [(Fonov et al,
-      2018)](https://www.biorxiv.org/content/10.1101/303487v1) to classify
-      adequately registered images.
-    * **Tensor extraction from preprocessed data.** The `extract` task allows
-      to create files in PyTorch format (`.pt`) with different options: the
-      complete MRI, 2D slices and/or 3D patches. This files are also stored in
-      the [CAPS](https://aramislab.paris.inria.fr/clinica/docs/public/latest/CAPS/Introduction/) hierarchy.
-
-- **Train & test your classifier**
-    * **Train neural networks.** The `train` task is designed to perform
-      training of CNN models using different kind of inputs, e.g., a full MRI
-      (3D-image), patches from a MRI (3D-patch), specific regions of a MRI
-      (ROI-based) or slices extracted from the MRI (2D-slices). Parameters used
-      during the training are configurable. This task allow also to train
-      autoencoders.
-    * **MRI classification.** The `classify` task uses previously trained models
-      to perform the inference of a particular or a set of MRI.
-
-
-- **Utilitaries used for the preparation of imaging data and/or training your
-  classifier**
-    * **Process TSV files**. `tsvtool` includes many functions to get labels
-      from BIDS, perform k-fold or single splits, produce demographic analysis
-      of extracted labels and reproduce the restrictions made on AIBL and OASIS
-      in the original paper.
-    * **Generate a synthetic dataset.** The `generate` task is useful to obtain
-      synthetic datasets frequently used in functional tests.
-
-## Pretrained models
-
-Some of the pretained models for the CNN networks described in 
-([Wen et al., 2020](https://doi.org/10.1016/j.media.2020.101694)) 
-are available on Zenodo:
-<https://zenodo.org/record/3491003>
-
-Updated versions of the models will be published soon.
-
 ## Related Repositories
 
 - [Clinica: Software platform for clinical neuroimaging studies](https://github.com/aramis-lab/clinica)
 - [AD-ML: Framework for the reproducible classification of Alzheimer's disease using machine learning](https://github.com/aramis-lab/AD-ML)
+
+## Citing us
+
+- Wen, J., Thibeau-Sutre, E., Samper-González, J., Routier, A., Bottani, S., Durrleman, S., Burgos, N., and Colliot, O.: ‘Convolutional Neural Networks for Classification of Alzheimer’s Disease: Overview and Reproducible Evaluation’, *Medical Image Analysis*, 63: 101694, 2020. [doi:10.1016/j.media.2020.101694](https://doi.org/10.1016/j.media.2020.101694)
+- Routier, A., Burgos, N., Díaz, M., Bacci, M., Bottani, S., El-Rifai O., Fontanella, S., Gori, P., Guillon, J., Guyot, A., Hassanaly, R., Jacquemont, T.,  Lu, P., Marcoux, A.,  Moreau, T., Samper-González, J., Teichmann, M., Thibeau-Sutre, E., Vaillant G., Wen, J., Wild, A., Habert, M.-O., Durrleman, S., and Colliot, O.: ‘Clinica: An Open Source Software Platform for Reproducible Clinical Neuroscience Studies’, 2021. [hal-02308126](https://hal.inria.fr/hal-02308126)
+
+
+## Reproducibility
+
+To reproduce the results published in [Wen et al., MedIA, 2020](https://doi.org/10.1016/j.media.2020.101694) ([arXiv version](https://arxiv.org/abs/1904.07773))
+please use the version of ClinicaDL tagged `[v0.0.1](https://github.com/aramis-lab/AD-DL/tree/v.0.0.1)`.
diff --git a/clinicadl/clinicadl/VERSION b/clinicadl/clinicadl/VERSION
@@ -1 +1 @@
-0.2.0
+0.2.1
diff --git a/clinicadl/clinicadl/classify/inference.py b/clinicadl/clinicadl/classify/inference.py
@@ -1,14 +1,13 @@
 # coding: utf8
 
-from os.path import isdir, join, abspath, exists
+from os.path import join, exists
 from os import strerror, makedirs, listdir
 import errno
 import pathlib
-from clinicadl.tools.deep_learning import create_model, load_model, read_json
+from clinicadl.tools.deep_learning import create_model, load_model, read_json, commandline_to_json
 from clinicadl.tools.deep_learning.iotools import return_logger, translate_parameters
 from clinicadl.tools.deep_learning.data import return_dataset, get_transforms, compute_num_cnn, load_data_test
 from clinicadl.tools.deep_learning.cnn_utils import test, soft_voting_to_tsvs, mode_level_to_tsvs, get_criterion
-import torch.nn as nn
 from torch.utils.data import DataLoader
 
 
@@ -23,7 +22,8 @@ def classify(caps_dir,
              prepare_dl=True,
              selection_metrics=None,
              diagnoses=None,
-             verbose=0):
+             verbose=0,
+             multi_cohort=False):
     """
     This function verifies the input folders, and the existence of the json file
     then it launch the inference stage from a specific model.
@@ -44,30 +44,13 @@ def classify(caps_dir,
         selection_metrics: list of metrics to find best models to be evaluated.
         diagnoses: list of diagnoses to be tested if tsv_path is a folder.
         verbose: level of verbosity.
+        multi_cohort (bool): If True caps_directory is the path to a TSV file linking cohort names and paths.
 
     """
     logger = return_logger(verbose, "classify")
 
-    # Verify that paths exist
-    caps_dir = abspath(caps_dir)
-    model_path = abspath(model_path)
-    tsv_path = abspath(tsv_path)
-
-    if not isdir(caps_dir):
-        logger.error("Folder containing MRIs was not found, please verify its location.")
-        raise FileNotFoundError(
-            errno.ENOENT, strerror(errno.ENOENT), caps_dir)
-    if not isdir(model_path):
-        logger.error("A valid model in the path was not found. Donwload them from aramislab.inria.fr")
-        raise FileNotFoundError(
-            errno.ENOENT, strerror(errno.ENOENT), model_path)
-    if not exists(tsv_path):
-        raise FileNotFoundError(
-            errno.ENOENT, strerror(errno.ENOENT), tsv_path)
-
     # Infer json file from model_path (suppose that json file is at the same
     # folder)
-
     json_file = join(model_path, 'commandline.json')
 
     if not exists(json_file):
@@ -88,7 +71,8 @@ def classify(caps_dir,
         prepare_dl,
         selection_metrics,
         diagnoses,
-        logger
+        logger,
+        multi_cohort
     )
 
 
@@ -104,7 +88,8 @@ def inference_from_model(caps_dir,
                          prepare_dl=False,
                          selection_metrics=None,
                          diagnoses=None,
-                         logger=None):
+                         logger=None,
+                         multi_cohort=False):
     """
     Inference from previously trained model.
 
@@ -131,6 +116,7 @@ def inference_from_model(caps_dir,
         selection_metrics: list of metrics to find best models to be evaluated.
         diagnoses: list of diagnoses to be tested if tsv_path is a folder.
         logger: Logger instance.
+        multi_cohort (bool): If True caps_directory is the path to a TSV file linking cohort names and paths.
 
     Returns:
         Files written in the output folder with prediction results and metrics. By
@@ -160,7 +146,6 @@ def inference_from_model(caps_dir,
     options.use_cpu = not gpu
     options.nproc = num_workers
     options.batch_size = batch_size
-    options.prepare_dl = prepare_dl
     if diagnoses is not None:
         options.diagnoses = diagnoses
 
@@ -178,48 +163,57 @@ def inference_from_model(caps_dir,
     # loop depending the number of folds found in the model folder
     for fold_dir in currentDirectory.glob(currentPattern):
         fold = int(str(fold_dir).split("-")[-1])
-        fold_path = join(model_path, fold_dir)
-        model_path = join(fold_path, 'models')
+        out_path = join(fold_dir, 'models')
 
         for selection_metric in selection_metrics:
 
             if options.mode_task == 'multicnn':
-                for cnn_dir in listdir(model_path):
-                    if not exists(join(model_path, cnn_dir, "best_%s" % selection_metric, 'model_best.pth.tar')):
+                for cnn_dir in listdir(out_path):
+                    if not exists(join(out_path, cnn_dir, "best_%s" % selection_metric, 'model_best.pth.tar')):
                         raise FileNotFoundError(
                             errno.ENOENT,
                             strerror(errno.ENOENT),
-                            join(model_path,
+                            join(out_path,
                                  cnn_dir,
                                  "best_%s" % selection_metric,
                                  'model_best.pth.tar')
                         )
 
             else:
-                full_model_path = join(model_path, "best_%s" % selection_metric)
+                full_model_path = join(out_path, "best_%s" % selection_metric)
                 if not exists(join(full_model_path, 'model_best.pth.tar')):
                     raise FileNotFoundError(
                         errno.ENOENT,
                         strerror(errno.ENOENT),
                         join(full_model_path, 'model_best.pth.tar'))
 
-            performance_dir = join(fold_path, 'cnn_classification', 'best_%s' % selection_metric)
+            performance_dir = join(fold_dir, 'cnn_classification', 'best_%s' % selection_metric)
 
             makedirs(performance_dir, exist_ok=True)
 
+            commandline_to_json({
+                "output_dir": model_path,
+                "caps_dir": caps_dir,
+                "tsv_path": tsv_path,
+                "prefix": prefix,
+                "labels": labels
+            }, filename=f"commandline_classify-{prefix}")
+
             # It launch the corresponding function, depending on the mode.
             inference_from_model_generic(
                 caps_dir,
                 tsv_path,
-                model_path,
+                out_path,
                 options,
                 prefix,
                 currentDirectory,
                 fold,
                 "best_%s" % selection_metric,
                 labels=labels,
                 num_cnn=num_cnn,
-                logger=logger
+                logger=logger,
+                multi_cohort=multi_cohort,
+                prepare_dl=prepare_dl
             )
 
             # Soft voting
@@ -241,7 +235,8 @@ def inference_from_model(caps_dir,
 
 def inference_from_model_generic(caps_dir, tsv_path, model_path, model_options,
                                  prefix, output_dir, fold, selection,
-                                 labels=True, num_cnn=None, logger=None):
+                                 labels=True, num_cnn=None, logger=None,
+                                 multi_cohort=False, prepare_dl=True):
     from os.path import join
     import logging
 
@@ -252,7 +247,7 @@ def inference_from_model_generic(caps_dir, tsv_path, model_path, model_options,
 
     _, all_transforms = get_transforms(model_options.mode, model_options.minmaxnormalization)
 
-    test_df = load_data_test(tsv_path, model_options.diagnoses)
+    test_df = load_data_test(tsv_path, model_options.diagnoses, multi_cohort=multi_cohort)
 
     # Define loss and optimizer
     criterion = get_criterion(model_options.loss)
@@ -270,7 +265,9 @@ def inference_from_model_generic(caps_dir, tsv_path, model_path, model_options,
                 all_transformations=all_transforms,
                 params=model_options,
                 cnn_index=n,
-                labels=labels
+                labels=labels,
+                prepare_dl=prepare_dl,
+                multi_cohort=multi_cohort
             )
 
             test_loader = DataLoader(
@@ -315,7 +312,9 @@ def inference_from_model_generic(caps_dir, tsv_path, model_path, model_options,
             train_transformations=None,
             all_transformations=all_transforms,
             params=model_options,
-            labels=labels
+            labels=labels,
+            prepare_dl=prepare_dl,
+            multi_cohort=multi_cohort
         )
 
         # Load the data
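
Note on `multi_cohort`: the docstrings added in this commit state that, when the flag is true, `caps_directory` is the path to a TSV file linking cohort names and paths. The diff does not show the file layout or how ClinicaDL parses it, so the following is only a minimal sketch of the idea. The column names `cohort` and `path`, the helper `read_cohort_tsv`, and the example paths are all assumptions made for illustration, not ClinicaDL's actual API.

```python
import csv
import io


def read_cohort_tsv(handle):
    """Map each cohort name to its CAPS directory path.

    Hypothetical helper: assumes a tab-separated file with a header row
    containing the columns 'cohort' and 'path' (names are assumptions).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    cohorts = {}
    for row in reader:
        name = row["cohort"]
        # A cohort listed twice would make the mapping ambiguous.
        if name in cohorts:
            raise ValueError(f"Duplicate cohort name: {name}")
        cohorts[name] = row["path"]
    return cohorts


# In-memory stand-in for a TSV file on disk (the paths are made up).
example = io.StringIO(
    "cohort\tpath\n"
    "ADNI\t/data/adni_caps\n"
    "AIBL\t/data/aibl_caps\n"
)
cohort_paths = read_cohort_tsv(example)
print(cohort_paths)
```

With a mapping like this, each row of the test TSV could be resolved to the CAPS directory of its cohort before the dataset is built.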