
[BUG] inpainting metrics are not computed correctly #957

Closed
neuronflow opened this issue Oct 10, 2024 · 21 comments · Fixed by #981

@neuronflow

neuronflow commented Oct 10, 2024

GaNDLF produces metrics that are different from our official inpainting package (https://pypi.org/project/inpainting/).

To compute metrics with the official package:

$ pip install inpainting

from inpainting.challenge_metrics_2023 import generate_metrics, read_nifti_to_tensor


def compute_image_quality_metrics(
    prediction: str,
    healthy_mask: str,
    reference_t1: str,
    voided_t1: str,
) -> dict:
    print("computing metrics!")
    print("prediction:", prediction)
    print("healthy_mask:", healthy_mask)
    print("reference_t1:", reference_t1)
    print("voided_t1:", voided_t1)

    prediction_data = read_nifti_to_tensor(prediction)
    healthy_mask_data = read_nifti_to_tensor(healthy_mask).bool()
    reference_t1_data = read_nifti_to_tensor(reference_t1)
    voided_t1_data = read_nifti_to_tensor(voided_t1)

    metrics = generate_metrics(
        prediction=prediction_data,
        target=reference_t1_data,
        normalization_tensor=voided_t1_data,
        mask=healthy_mask_data,
    )

    return metrics


if __name__ == "__main__":
    official_metrics = compute_image_quality_metrics(
        prediction="path_to_prediction.nii.gz",
        healthy_mask="path_to_healthy_mask.nii.gz",
        reference_t1="path_to_reference.nii.gz",
        voided_t1="path_to_voided.nii.gz",
    )

    print(official_metrics)


@MarcelRosier will upload some test data to reproduce.

@MarcelRosier

MarcelRosier commented Oct 10, 2024

Test data: INP-BraTS-GLI-00000-000.zip
(The prediction was generated using last year's winning algorithm.)

Contributor

Stale issue message

@sarthakpati
Collaborator

I am getting an error with this. Here are the steps I followed:

> conda create -p ./venv python=3.11 -y
[SNIP!]
> conda activate ./venv
> pip install inpainting
[SNIP!]
> python
>>> from inpainting.challenge_metrics_2023 import generate_metrics, read_nifti_to_tensor
>>> pred=read_nifti_to_tensor(r"C:\Users\sarth\Downloads\INP-BraTS-GLI-00000-000\INP-BraTS-GLI-00000-000\prediction.nii.gz")
>>> mask=read_nifti_to_tensor(r"C:\Users\sarth\Downloads\INP-BraTS-GLI-00000-000\INP-BraTS-GLI-00000-000\mask-healthy.nii.gz")
>>> reft1=read_nifti_to_tensor(r"C:\Users\sarth\Downloads\INP-BraTS-GLI-00000-000\INP-BraTS-GLI-00000-000\t1n-reference.nii.gz")
>>> voit1=read_nifti_to_tensor(r"C:\Users\sarth\Downloads\INP-BraTS-GLI-00000-000\INP-BraTS-GLI-00000-000\t1n-voided.nii.gz")
>>> generate_metrics(pred,reft1,voit1,mask)
Error: tensors used as indices must be long, int, byte or bool tensors
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Projects\temp_brats_synthesis_metrics\venv\Lib\site-packages\inpainting\challenge_metrics_2023.py", line 266, in generate_metrics
    output["ssim"] = _structural_similarity_index(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\temp_brats_synthesis_metrics\venv\Lib\site-packages\inpainting\challenge_metrics_2023.py", line 92, in _structural_similarity_index
    return ssim_idx.mean()
           ^^^^^^^^
UnboundLocalError: cannot access local variable 'ssim_idx' where it is not associated with a value
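The first line of that output is the `IndexError` PyTorch raises when a tensor whose dtype is not bool, uint8, int32 or int64 is used as an index; the `UnboundLocalError` that follows suggests the package catches that error, prints it, and then returns a variable that was never assigned. One likely cause (not confirmed in the thread): the session above reads the mask without the `.bool()` conversion that the original script applies. A minimal PyTorch sketch of the difference on a made-up 2×2×2 volume; the helper `masked_values` is hypothetical and not part of either package:

```python
import torch


def masked_values(image: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Return the voxels of `image` where `mask` is nonzero (hypothetical helper).

    Casting to bool avoids the IndexError raised for float masks
    (and the deprecation warning emitted for uint8 masks).
    """
    return image[mask.bool()]


# Toy 2x2x2 "volume" and a float32 mask, as a NIfTI mask may be loaded.
image = torch.arange(8, dtype=torch.float32).reshape(2, 2, 2)
float_mask = torch.zeros(2, 2, 2)
float_mask[0, 0, :] = 1.0

try:
    image[float_mask]  # float tensors are not valid indices
except IndexError as err:
    print("IndexError:", err)

print(masked_values(image, float_mask))
```

With the cast, only the two voxels under the mask (values 0 and 1) are selected.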

@neuronflow
Author

Can you try with Python 3.10? I believe that is what we used, though I'm not sure whether that is what is causing your problem.

@sarthakpati
Collaborator

Unsure if this has anything to do with the Python version, but I will give it a go.

@sarthakpati
Collaborator

sarthakpati commented Dec 16, 2024

Same error with 3.10. For reference, here is the result from GaNDLF:

(C:\Projects\GaNDLF\venv) PS C:\Projects\GaNDLF> gandlf generate-metrics -c "C:\Users\sarth\Downloads\INP-BraTS-GLI-00000-000\INP-BraTS-GLI-00000-000\config_synthesis.yaml" -i "C:\Users\sarth\Downloads\INP-BraTS-GLI-00000-000\INP-BraTS-GLI-00000-000\metrics_data_csv_gandlf.csv"
The ``converters`` are currently experimental. It may not support operations including (but not limited to) Functions in ``torch.nn.functional`` that involved data dimension
C:\Projects\GaNDLF\venv\Lib\site-packages\_distutils_hack\__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
2024-12-16 13:26:55 - INFO - The logs are saved in C:\Users\sarth\.gandlf\20241216_132655.log
WARNING: Initializing 'norm_type' as 'batch'
WARNING: This is a special case for multi-class computation, where different labels are processed together, `reverse_one_hot` will need mapping information to work correctly
WARNING: Defining 'patch_sampler' as a string will be deprecated in a future release, please use a dictionary instead
WARNING: Initializing 'verbose' as False
WARNING: Initializing 'medcam_enabled' as False
WARNING: Initializing 'save_training' as False
WARNING: Initializing 'save_output' as False
WARNING: Initializing 'in_memory' as False
WARNING: Initializing 'pin_memory_dataloader' as False
WARNING: Initializing 'scaling_factor' as 1
WARNING: Initializing 'clip_grad' as None
WARNING: Initializing 'track_memory_usage' as False
WARNING: Initializing 'memory_save_mode' as False
WARNING: Initializing 'print_rgb_label_warning' as True
WARNING: Initializing 'grid_aggregator_overlap' as crop
WARNING: Initializing 'determinism' as False
WARNING: Initializing 'previous_parameters' as None
WARNING: Initializing 'clip_mode' as None
WARNING: Setting default step_size to: 0.1
  0%|                                                  | 0/1 [00:00<?, ?it/s]
2024-12-16 13:26:56 - py.warnings - WARNING - warnings:_showwarnmsg:109 - C:\Projects\GaNDLF\venv\Lib\site-packages\torchmetrics\utilities\prints.py:62: FutureWarning: Importing `StructuralSimilarityIndexMeasure` from `torchmetrics` was deprecated and will be removed in 2.0. Import `StructuralSimilarityIndexMeasure` from `torchmetrics.image` instead.
  _future_warning(

2024-12-16 13:26:56 - py.warnings - WARNING - warnings:_showwarnmsg:109 - C:\Projects\GaNDLF\GANDLF\metrics\synthesis.py:34: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen/native/IndexingUtils.h:28.)
  ssim_idx = ssim_idx_full_image[mask]

2024-12-16 13:33:14 - py.warnings - WARNING - warnings:_showwarnmsg:109 - C:\Projects\GaNDLF\GANDLF\cli\generate_metrics.py:382: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen/native/IndexingUtils.h:28.)
  gt_image_infill = gt_image_infill[mask]

2024-12-16 13:33:14 - py.warnings - WARNING - warnings:_showwarnmsg:109 - C:\Projects\GaNDLF\GANDLF\cli\generate_metrics.py:383: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen/native/IndexingUtils.h:28.)
  output_infill = output_infill[mask]

2024-12-16 13:33:14 - py.warnings - WARNING - warnings:_showwarnmsg:109 - C:\Projects\GaNDLF\venv\Lib\site-packages\torchmetrics\utilities\prints.py:62: FutureWarning: Importing `PeakSignalNoiseRatio` from `torchmetrics` was deprecated and will be removed in 2.0. Import `PeakSignalNoiseRatio` from `torchmetrics.image` instead.
  _future_warning(

100%|████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [06:18<00:00, 378.84s/it]
{1: {'mae': 0.0005344419041648507,
     'mse': 5.740170649914944e-07,
     'msle': 5.466881134452706e-07,
     'ncc_max': 0.9996438986031039,
     'ncc_mean': 1.736029613960155e-06,
     'ncc_min': -0.0003618681524009567,
     'ncc_std': 0.0016800671663601758,
     'psnr': 32.42278289794922,
     'psnr_01': 62.41074752807617,
     'psnr_01_eps': 62.41075134277344,
     'psnr_eps': 32.422786712646484,
     'ssim': 0.9964786171913147}}
Finished.

Relevant files (metrics_data_csv_gandlf.csv):

SubjectID,Target,Prediction,Mask
001,"C:\Users\sarth\Downloads\INP-BraTS-GLI-00000-000\INP-BraTS-GLI-00000-000\t1n-reference.nii.gz","C:\Users\sarth\Downloads\INP-BraTS-GLI-00000-000\INP-BraTS-GLI-00000-000\prediction.nii.gz","C:\Users\sarth\Downloads\INP-BraTS-GLI-00000-000\INP-BraTS-GLI-00000-000\mask-healthy.nii.gz"
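To reproduce this on another machine, the CSV above can be generated instead of hand-edited. A small standard-library sketch: the column names and file names are copied from this thread, while the helper name `gandlf_metrics_csv` and the Linux-style example path are made up for illustration:

```python
import csv
import io
from pathlib import PurePosixPath

# Column layout copied from metrics_data_csv_gandlf.csv above.
HEADER = ["SubjectID", "Target", "Prediction", "Mask"]


def gandlf_metrics_csv(case_dir: PurePosixPath, subject_id: str) -> str:
    """Return CSV text for one INP-BraTS test case (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerow([
        subject_id,
        str(case_dir / "t1n-reference.nii.gz"),  # Target
        str(case_dir / "prediction.nii.gz"),     # Prediction
        str(case_dir / "mask-healthy.nii.gz"),   # Mask
    ])
    return buf.getvalue()


print(gandlf_metrics_csv(PurePosixPath("/data/INP-BraTS-GLI-00000-000"), "001"))
```

The `csv` module only adds quotes when a field needs them (for example, a path containing a comma).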

@neuronflow
Author

neuronflow commented Dec 16, 2024

I checked, and the code works without issues for me. Maybe it is not compatible with Windows?

This is a Python 3.10 environment on an Ubuntu machine:

from inpainting.challenge_metrics_2023 import generate_metrics, read_nifti_to_tensor


def compute_image_quality_metrics(
    prediction: str,
    healthy_mask: str,
    reference_t1: str,
    voided_t1: str,
) -> dict:
    print("computing metrics!")
    print("prediction:", prediction)
    print("healthy_mask:", healthy_mask)
    print("reference_t1:", reference_t1)
    print("voided_t1:", voided_t1)

    prediction_data = read_nifti_to_tensor(prediction)
    healthy_mask_data = read_nifti_to_tensor(healthy_mask).bool()
    reference_t1_data = read_nifti_to_tensor(reference_t1)
    voided_t1_data = read_nifti_to_tensor(voided_t1)

    metrics = generate_metrics(
        prediction=prediction_data,
        target=reference_t1_data,
        normalization_tensor=voided_t1_data,
        mask=healthy_mask_data,
    )

    return metrics


if __name__ == "__main__":
    official_metrics = compute_image_quality_metrics(
        prediction="/home/florian/flow/inpainting_test/INP-BraTS-GLI-00000-000/prediction.nii.gz",
        healthy_mask="/home/florian/flow/inpainting_test/INP-BraTS-GLI-00000-000/mask-healthy.nii.gz",
        reference_t1="/home/florian/flow/inpainting_test/INP-BraTS-GLI-00000-000/t1n-reference.nii.gz",
        voided_t1="/home/florian/flow/inpainting_test/INP-BraTS-GLI-00000-000/t1n-voided.nii.gz",
    )

    print(official_metrics)

(ipt) florian@a4000-21an1:~/flow/inpainting_test$ python check.py 
computing metrics!
prediction: /home/florian/flow/inpainting_test/INP-BraTS-GLI-00000-000/prediction.nii.gz
healthy_mask: /home/florian/flow/inpainting_test/INP-BraTS-GLI-00000-000/mask-healthy.nii.gz
reference_t1: /home/florian/flow/inpainting_test/INP-BraTS-GLI-00000-000/t1n-reference.nii.gz
voided_t1: /home/florian/flow/inpainting_test/INP-BraTS-GLI-00000-000/t1n-voided.nii.gz
{'ssim': 0.9964787364208368, 'mse': 0.000503217859659344, 'rmse': 0.022432517260313034, 'msle': 0.0001831933914218098, 'mae': 0.01582399569451809, 'psnr': 32.57281062728802, 'psnr_eps': 32.57281284650872, 'psnr_01': 32.98243713378906, 'psnr_01_eps': 32.98244094848633}

@neuronflow
Author

Interestingly, you get the same or very similar values. However, we get different results from Synapse when they run GaNDLF.
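Compared key by key, the two outputs posted above agree closely only on SSIM; MAE, MSE and the PSNR variants differ. A quick standard-library check makes this explicit. The values are copied verbatim from the two outputs, and the 1e-3 relative tolerance is an arbitrary choice. The last lines compare the MAE ratio with the square root of the MSE ratio: if the two pipelines differed only by a constant intensity factor s, MAE would scale by s and MSE by s², so the two ratios would match. Here they come out nearly identical (about 29.6), which is consistent with a normalization difference, though the numbers alone do not prove it.

```python
import math

# Values copied from the GaNDLF and inpainting outputs posted above.
gandlf = {
    "mae": 0.0005344419041648507,
    "mse": 5.740170649914944e-07,
    "psnr": 32.42278289794922,
    "psnr_01": 62.41074752807617,
    "ssim": 0.9964786171913147,
}
inpainting = {
    "mae": 0.01582399569451809,
    "mse": 0.000503217859659344,
    "psnr": 32.57281062728802,
    "psnr_01": 32.98243713378906,
    "ssim": 0.9964787364208368,
}


def mismatches(a: dict, b: dict, rel_tol: float = 1e-3) -> list:
    """Return the shared keys whose values differ by more than rel_tol."""
    return sorted(k for k in a.keys() & b.keys()
                  if not math.isclose(a[k], b[k], rel_tol=rel_tol))


print(mismatches(gandlf, inpainting))  # ['mae', 'mse', 'psnr', 'psnr_01']

# A constant intensity factor s scales MAE by s and MSE by s**2.
mae_ratio = inpainting["mae"] / gandlf["mae"]
rms_ratio = math.sqrt(inpainting["mse"] / gandlf["mse"])
print(f"MAE ratio {mae_ratio:.2f}, sqrt(MSE ratio) {rms_ratio:.2f}")
```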

@sarthakpati
Collaborator

This is using the latest version [ref]. Perhaps they were using an older version? Could you tag the person from Synapse who was running this?

@neuronflow
Author

neuronflow commented Dec 16, 2024

I don't know their GitHub accounts. Rong and Verena both ran this code base.

Or perhaps some algorithms save files in a different format, and then something goes wrong with the file loading in GaNDLF?

The issue must be somewhere in this direction.

@neuronflow
Copy link
Author

@vpchung, can you please have a look? Sarthak is unable to reproduce the issue.

@sarthakpati
Copy link
Collaborator

sarthakpati commented Dec 16, 2024

GaNDLF is letting SimpleITK do its thing WRT loading, so there is nothing special going on there.

Update: sent email to Rong and Verena.

@vpchung
Copy link

vpchung commented Dec 17, 2024

I was able to generate the same scores as shared above:

$ conda create -n inpainting python=3.10 -y && conda activate inpainting
$ pip install inpainting numpy==1.26.4
>>> from inpainting.challenge_metrics_2023 import generate_metrics, read_nifti_to_tensor
>>> def compute_image_quality_metrics(
...     ... (truncated for readability)
...     ...
... )
>>>
>>> scores = compute_image_quality_metrics(
...     prediction=os.path.join(parent, "prediction.nii.gz"),
...     healthy_mask=os.path.join(parent, "mask-healthy.nii.gz"),
...     reference_t1=os.path.join(parent, "t1n-reference.nii.gz"),
...     voided_t1=os.path.join(parent, "t1n-voided.nii.gz")
... )
computing metrics!
prediction: /Users/vchung/Downloads/INP-BraTS-GLI-00000-000/prediction.nii.gz
healthy_mask: /Users/vchung/Downloads/INP-BraTS-GLI-00000-000/mask-healthy.nii.gz
reference_t1: /Users/vchung/Downloads/INP-BraTS-GLI-00000-000/t1n-reference.nii.gz
voided_t1: /Users/vchung/Downloads/INP-BraTS-GLI-00000-000/t1n-voided.nii.gz
>>> 
>>> pprint(scores)
{'mae': 0.01582399569451809,
 'mse': 0.000503217801451683,
 'msle': 0.00018319336231797934,
 'psnr': 32.57281494140625,
 'psnr_01': 32.98244094848633,
 'psnr_01_eps': 32.98244094848633,
 'psnr_eps': 32.57281494140625,
 'rmse': 0.022432517260313034,
 'ssim': 0.996478796005249}
>>>

As mentioned in my email response, I think the mismatch of scores may actually be due to the challenge re-using the BraTS 2023 metrics MLCube, rather than this being a GaNDLF issue.

@neuronflow
Author

> As mentioned in my email response, I think the mismatch of scores may actually be due to the challenge re-using the BraTS 2023 metrics MLCube, rather than this being a GaNDLF issue.

Thanks! What is running under the hood there?

Would it be possible to have an MLCube wrapping our metrics package for the upcoming Lighthouse challenge?

@vpchung
Copy link

vpchung commented Dec 17, 2024

> what is running under the hood there?

I'm not sure, as I did not create any of the metrics MLCubes. My best guess is that this is the source used to create the inpainting metrics MLCube, which, according to its setup README, uses this branch from Felix's GaNDLF fork.

> Would it be possible to have an ML cube wrapping around our metric pkg for the upcoming light house challenge?

Yes, in my opinion, it would be best to create a new metrics MLCube. But perhaps @sarthakpati (or someone from MLCommons) has a better suggestion.

@sarthakpati
Copy link
Collaborator

We are currently working on a common solution for all metrics (see #942). This would provide a single "source of truth" for all metrics, and organizers would only need to incorporate their implementations in GaNDLF. MLCube generation and the subsequent steps will be taken care of automatically.

Any feedback/help would be much appreciated!

@vpchung

vpchung commented Dec 18, 2024

@sarthakpati that sounds amazing! Is there a proposed timeline for this effort, i.e., would it be ready in time for the 2025 Lighthouse challenge? I don't know any of the dates yet (maybe @neuronflow does), but I imagine the MLCube portion would start around July/August again, like the previous BraTS challenges.

@sarthakpati
Copy link
Collaborator

The goal is for us to have this PR ready for public testing around the end of Jan.

Since this PR ties in with another major effort, I am tagging @hasan7n for more clarification regarding the specific timeline.

@neuronflow
Copy link
Author

neuronflow commented Dec 19, 2024

Should the inpainting metrics package be incorporated into GaNDLF, then?

@vpchung I don't know the exact dates, but from my understanding it will be similar to 2023/2024. Spyros should know.

@sarthakpati
Copy link
Collaborator

> should the inpainting metrics package be incorporated into GaNDLF then?

Since the outputs are basically the same [1, 2], does it make sense to do so? Might as well have one less package to support from your end, right?

This does raise the question about the segmentation metrics, though.

Regardless, I believe this issue is now resolved, and any further discussion should be done on a separate thread. Thoughts @vpchung @neuronflow?

@sarthakpati
Copy link
Collaborator

FYI, I discovered a significant source of variation between the metrics calculated by inpainting and GaNDLF: the use of the voided image. This had not been communicated to the original developer of the synthesis metrics, which is why only the "Mask" option for normalization was implemented. Anyway, I just added it in sarthakpati@1c4afd7, and here are the results:

{'mae': 0.01582399569451809,
 'mse': 0.000503217801451683,
 'msle': 0.00018319336231797934,
 'psnr': 32.42278289794922,
 'psnr_01': 32.98244094848633,
 'psnr_01_eps': 32.98244094848633,
 'psnr_eps': 32.422786712646484,
 'rmse': 0.022432517260313034,
 'ssim': 0.996478796005249}

As you can see, the results are pretty much the same as what inpainting calculates. An added advantage of the GaNDLF metrics is that the normalization can be based on either a reference brain mask or a voided image [ref].
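A toy example shows why the normalization reference matters. The thread does not spell out the exact normalization either package uses, so this numpy sketch assumes plain min-max scaling as a stand-in; the helpers `minmax_normalize` and `masked_mse` and the 4-voxel data are made up. Scaling by the range of the voided image (0 to 400 here) instead of the range inside the brain mask (0 to 200) changes the MSE by the squared ratio of the two ranges, a factor of 4, even though the underlying prediction error is identical.

```python
import numpy as np


def minmax_normalize(image: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Scale `image` with the min/max of `reference` (stand-in, not either package's scheme)."""
    lo, hi = float(reference.min()), float(reference.max())
    return (image - lo) / (hi - lo)


def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared error over the voxels selected by a boolean mask."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))


# Four voxels: the infill region is voxels 1-2; the brain mask covers voxels 0-2.
target = np.array([0.0, 100.0, 200.0, 400.0])
pred = np.array([0.0, 110.0, 190.0, 400.0])
infill = np.array([False, True, True, False])
brain = np.array([True, True, True, False])
voided = np.where(infill, 0.0, target)  # reference with the infill region blanked out

for name, ref in [("voided image", voided), ("brain mask", target[brain])]:
    mse = masked_mse(minmax_normalize(pred, ref), minmax_normalize(target, ref), infill)
    print(f"{name}: MSE = {mse:.6f}")
```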


4 participants