diff --git a/MANIFEST.in b/MANIFEST.in
new file mode 100644
index 0000000..959d48e
--- /dev/null
+++ b/MANIFEST.in
@@ -0,0 +1,13 @@
+include maps.txt
+include README.md
+include README.rst
+recursive-include data *
+include README.txt
+include setup.cfg
+include setup.py
+include maps.txt
+include ivtmetrics/__init__.py
+include ivtmetrics/detection.py
+include ivtmetrics/disentangle.py
+include ivtmetrics/recognition.py
+include ivtmetrics/maps.txt
diff --git a/README.md b/README.md
index ba28050..93797af 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,3 @@
-
-
 [![PyPI version](https://badge.fury.io/py/ivtmetrics.svg)](https://pypi.org/project/ivtmetrics/0.0.1/)
 
 # ivtmetrics
@@ -7,39 +5,41 @@
 The **ivtmetrics** library provides a Python implementation of metrics for benchmarking surgical action triplet detection and recognition.
 
 ## Features at a glance
-- *Recognition Evaluation*<br>
-Provides AP metrics to measure the performance of a model on action triplet recognition.
-- *Detection Evaluation*<br>
-Supports Intersection over Union distances measure of the triplet localization with respect to the instruments.
-- *Flexible Analysis*<br>
-  - Supports for switching between frame-wise to video-wise averaging of the AP.
-  - Supports disentangle prediction and obtained filtered performance for the various components of the triplets as well as their association performances at various levels.
+
+The following are available with ivtmetrics:
+1. **Recognition Evaluation**: Provides AP metrics to measure the performance of a model on action triplet recognition.
+2. **Detection Evaluation**: Supports Intersection over Union (IoU) measurement of triplet localization with respect to the instruments.
+3. **Flexible Analysis**: (1) Supports switching between frame-wise and video-wise averaging of the AP.
+(2) Supports disentangling predictions to obtain filtered performance for the individual components of the triplets, as well as their association performance at various levels.
 
 ## Installation
+
 ### Install via PyPi
+
 To install **ivtmetrics**, use `pip`:
 ```
 pip install ivtmetrics
 ```
-Python 3.5-3.9 and numpy and scikit-learn are required.
 
 ### Install via Conda
+
 ```
 conda install -c nwoye ivtmetrics
 ```
+Python 3.5-3.9, numpy, and scikit-learn are required.
+
 ## Metrics
 
 The metrics have been aligned with what is reported by the [CholecT50](https://arxiv.org/abs/2109.03223) benchmark.
 **ivtmetrics** can be imported in the following way:
 
-```python
+``` python
 import ivtmetrics
-
 ```
 
 The metrics implement both **recognition** and **detection** evaluation.
@@ -49,51 +49,55 @@ The metrics internally implement a disentangle function to help filter the tripl
 **Recognition ivtmetrics** can be used in the following ways:
 
-```python
+``` python
 metric = ivtmetrics.Recognition(num_class)
-
 ```
 
 This takes an argument `num_class`, which defaults to `100`.
 
-The following function are possible with the 'Recognition` class:
-Name|Description
-:---|:---
+The following functions are available with the `Recognition` class (a video-wise usage sketch follows the table):
+
+Name | Description
+:--- | :---
 update(`targets, predictions`)|Takes in a (batch of) vector predictions and their corresponding groundtruth. The vector size must match `num_class` from the class initialization.
 video_end()|Call to mark the end of one video sequence.
 reset()|Resets the current records. Useful during training; can be called at the beginning of each epoch to avoid overlapping epoch performances.
 reset_global()|Resets all records. Useful for switching between training/validation/testing, or can be called at the beginning of a new experiment.
-compute_AP(`component, ignore_null`)|Obtain the average precision on the fly. This gives the AP only on examples cases after the last `reset()` call. Useful for epoch performance during training. 
+compute_AP(`component, ignore_null`)|Obtains the average precision on the fly. This gives the AP only on the examples seen since the last `reset()` call. Useful for epoch performance during training.
 compute_video_AP(`component, ignore_null`)|(RECOMMENDED) Computes video-wise AP performance as used in the CholecT50 benchmarks.
 compute_global_AP(`component, ignore_null`)|Computes frame-wise AP performance over all seen samples.
 topK(`k, component`)|Obtains top-K performance on action triplet recognition for all seen examples. The arg `k` can be any int between 1-99; k = [5,10,15,20] have been used in benchmark papers.
 topClass(`k, component`)|Obtains the top-K recognized classes on action triplet recognition for all seen examples. The arg `k` can be any int between 1-99; k = 10 has been used in benchmark papers.
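+
+For the recommended video-wise evaluation, a minimal sketch of the call sequence described in the table above (here `network` and `video_loaders` are hypothetical stand-ins for your own model and per-video data loaders):
+
+```python
+import ivtmetrics
+
+recognize = ivtmetrics.Recognition(num_class=100)
+recognize.reset_global()
+
+for video_loader in video_loaders:            # hypothetical: one loader per test video
+    for images, labels in video_loader:
+        predictions = network(images)         # your model here
+        recognize.update(labels, predictions)
+    recognize.video_end()                     # mark the end of this video sequence
+
+# video-wise AP, as used in the CholecT50 benchmarks
+results_ivt = recognize.compute_video_AP('ivt', ignore_null=False)
+print("video-wise triplet mean AP", results_ivt["mAP"])
+
+# top-10 recognition performance over all seen examples
+top10 = recognize.topK(10, 'ivt')
+```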
+### args:
+- args `component` can be any of the following ('i', 'v', 't', 'iv', 'it', 'ivt') to compute performance for (instrument, verb, target, instrument-verb, instrument-target, instrument-verb-target) respectively. The default is 'ivt' for triplets.
+- args `ignore_null` (optional, default=False): ignores the null triplet classes in the evaluation. This option is enabled in the CholecTriplet2021 challenge.
+- The output is a `dict` with keys ("AP", "mAP") for the per-class and mean AP respectively.
+
+
 #### Example usage
 
 ```python
 import ivtmetrics
 recognize = ivtmetrics.Recognition(num_class=100)
-
-network = MyModel(...) # your model here
-
+network = MyModel(...)  # your model here
 
 # training
-for epoch in number of epochs:
+for epoch in range(num_epochs):  # num_epochs: your number of training epochs
     recognize.reset()
     for images, labels in dataloader(...):  # your data loader
-    predictions = network(image)
-    recognize.update(labels, predictions)
-
+        predictions = network(images)
+        recognize.update(labels, predictions)
     results_i = recognize.compute_AP('i')
     print("instrument per class AP", results_i["AP"])
     print("instrument mean AP", results_i["mAP"])
-
     results_ivt = recognize.compute_AP('ivt')
     print("triplet mean AP", results_ivt["mAP"])
-
-
 # evaluation
 recognize.reset_global()
@@ -130,30 +134,46 @@ metric = ivtmetrics.Detection(num_class, num_tool)
 ```
 
 This takes an argument `num_class`, which defaults to `100`, and `num_tool`, which defaults to `6`.
 
-The following function are possible with the 'Recognition` class:
-Name|Description
-:---|:---
 update(`targets, predictions, format`)|input: takes in a (batch of) list/dict predictions and their corresponding groundtruth. Each frame prediction/groundtruth can be either a `list of list` or a `list of dict`.<br>