Evaluation Pipeline for Models #271

WGierke · 2018-01-03T09:40:37Z

As discussed in #131, it would be helpful to have a consistent pipeline for evaluating prediction models. That way we learn how well the currently implemented models perform, which ones need to be improved, and how well a new model performs. The pipeline should calculate the appropriate metrics specified in #221; some of them are already available here.

Acceptance Criteria

all evaluation methods take a model and a parameter specifying whether to use 5-fold cross-validation or a test set (default)
for classification calculate accuracy, Log Loss
for identification calculate accuracy, Log Loss
for segmentation calculate Dice coefficient, Hausdorff distance, Jaccard index, sensitivity, specificity
for all of them calculate data IO, disk space usage, memory usage, prediction time (if not (easily) possible, specify why and how to manually measure it)
for all of them calculate the training time
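
The classification and segmentation metrics listed above can be sketched in a few lines. This is a minimal, standard-library-only illustration and not the project's implementation: every function name here is hypothetical, masks are assumed to be flattened sequences of 0/1 values, log loss is shown for the binary case only, and the Hausdorff distance is a brute-force version over lists of coordinate tuples (it needs Python 3.8+ for `math.dist` and raises on empty point sets).

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy; y_prob holds predicted probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


def confusion_counts(truth, pred):
    """Return (tp, fp, fn, tn) for two flattened binary masks."""
    tp = fp = fn = tn = 0
    for t, p in zip(truth, pred):
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(truth, pred):
    """Overlap-based metrics; a ratio with a zero denominator becomes NaN."""
    tp, fp, fn, tn = confusion_counts(truth, pred)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(s, d) for d in dst) for s in src)

    return max(directed(points_a, points_b), directed(points_b, points_a))
```

For real CT volumes an array library would be the more practical choice; the plain-Python loops are only meant to make the definitions explicit.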

Please refer to the PR template for further explanations of the metrics.
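
One possible shape for the shared entry point from the first criterion is sketched below. Everything in it is an assumption made for illustration: the `evaluate` signature, the `use_kfold` flag, the `fit`/`predict` model interface, and the contiguous (unshuffled) fold layout are hypothetical, not taken from the project. The timing helper uses `time.perf_counter` and `tracemalloc`; note that `tracemalloc` only sees allocations made through Python's allocator, so memory used by native libraries would still have to be measured separately.

```python
import time
import tracemalloc


def kfold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def timed(fn, *args):
    """Run fn(*args); return (result, seconds, peak traced bytes)."""
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


def evaluate(model, X, y, metric, use_kfold=False, test_fraction=0.2):
    """Score a model on a held-out tail (default) or on 5 folds."""
    n = len(X)
    if use_kfold:
        splits = list(kfold_indices(n, 5))
    else:
        # Hold out the last test_fraction of the samples as the test set.
        cut = n - max(1, int(n * test_fraction))
        splits = [(list(range(cut)), list(range(cut, n)))]

    reports = []
    for train, test in splits:
        _, train_s, _ = timed(
            model.fit, [X[i] for i in train], [y[i] for i in train]
        )
        preds, pred_s, peak = timed(model.predict, [X[i] for i in test])
        reports.append({
            "score": metric([y[i] for i in test], preds),
            "train_seconds": train_s,
            "predict_seconds": pred_s,
            "predict_peak_bytes": peak,
        })
    return reports
```

The function returns one report per split, so the test-set default yields a single entry and the 5-fold option yields five that can be averaged afterwards.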

vessemer · 2018-01-04T23:29:09Z

For both the identification and classification (false positive reduction) tasks, the LUNA16 authors proposed a handy evaluation framework.
They employ the Free-Response Receiver Operating Characteristic (FROC) and the competition performance metric (CPM). The true positive rate (sensitivity) is computed at each of seven false positives per scan (FPPS) thresholds, FPPS ∈ {0.125, 0.25, 0.5, 1, 2, 4, 8}, and the mean of these seven sensitivities forms the CPM. From my point of view, it is worth paying attention to the CPM rather than the log loss.
I can work on adjusting their pipeline, if no one minds.
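
To make the CPM definition concrete, here is a minimal sketch that averages the sensitivity at the seven FPPS thresholds. It is not the LUNA16 evaluation script: the function names are hypothetical, the FROC curve is assumed to be given as two parallel lists with FPPS in ascending order, and sensitivities between curve points are obtained by simple linear interpolation, clamped at both ends of the curve.

```python
FPPS_THRESHOLDS = (0.125, 0.25, 0.5, 1, 2, 4, 8)


def interpolate(x, xs, ys):
    """Linearly interpolate ys over ascending xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            # Here xs[i - 1] < x <= xs[i], so the segment has non-zero width.
            weight = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + weight * (ys[i] - ys[i - 1])


def cpm(fpps, sensitivity):
    """Mean sensitivity at the seven predefined false-positive rates."""
    values = [interpolate(t, fpps, sensitivity) for t in FPPS_THRESHOLDS]
    return sum(values) / len(values)


# Usage with a made-up FROC curve (illustrative numbers only).
example_fpps = [0.125, 0.25, 0.5, 1, 2, 4, 8]
example_sensitivity = [0.5, 0.6, 0.7, 0.8, 0.9, 0.9, 1.0]
example_cpm = cpm(example_fpps, example_sensitivity)
```

Unlike log loss, this score depends only on how detections rank against the false-positive budget per scan, which is why it suits the detection-style tasks discussed here.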

reubano · 2018-01-05T16:17:33Z

@vessemer great observation! I see that they provide evaluation code as well. So yes, adjusting it to fit our use case will be extremely useful!

reubano added enhancement official prediction labels Jan 5, 2018

reubano added this to the 3-packaging milestone Jan 5, 2018

isms removed this from the 3-packaging milestone Jan 5, 2018

WGierke mentioned this issue Jan 19, 2018

#271 Classification Evaluation #290

Merged

1 task

vessemer mentioned this issue Jan 22, 2018

#271 LUNA16 evaluation #292

Merged

1 task

WGierke mentioned this issue Jan 24, 2018

#271 Add segmentation evaluation #299

Merged

1 task

swarm-ai mentioned this issue Jan 25, 2018

Added detection evaluation method for detection #301

Merged

1 task

reubano added the medium label Jan 26, 2018
