[BUG] inpainting metrics are not computed correctly #957
GaNDLF produces metrics that are different from our official inpainting package (https://pypi.org/project/inpainting/). To compute metrics with the official package, see the sketch below. @MarcelRosier will upload some test data to reproduce.
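The original invocation was not preserved in this thread. As a rough illustration of the kind of region-restricted comparison such a metrics package performs (this is not the inpainting package's actual API; the function names below are my own), the core computation looks like:

```python
# Illustration only: voxel-wise metrics restricted to a region of interest.
import numpy as np

def mse(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared error restricted to a region of interest (e.g. the voided area)."""
    diff = (pred.astype(np.float64) - gt.astype(np.float64))[mask > 0]
    return float(np.mean(diff ** 2))

def psnr(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio over the same region."""
    return float(10.0 * np.log10(data_range ** 2 / mse(pred, gt, mask)))
```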
Comments

Test data: INP-BraTS-GLI-00000-000.zip
I am getting an error with this. Here are the steps I followed:

> conda create -p ./venv python=3.11 -y
[SNIP!]
> conda activate ./venv
> pip install inpainting
[SNIP!]
> python
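Not part of the original report, but a quick sanity check inside such an environment can confirm the interpreter and installed package version before digging further:

```python
# Quick environment sanity check (my addition, not from the original report).
import sys
from importlib.metadata import version

print(sys.version)            # should report the conda env's Python (3.11 above)
print(version("inpainting"))  # confirms which release of the package got installed
```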
Can you try with Python 3.10? I believe we used that. Not sure whether this is your problem, though.
Unsure if this has anything to do with the Python version, but I will give it a go.
Same error with 3.10. For reference, here is the result from GaNDLF:
Relevant files:
I checked, and for me the code works without issues; maybe it is not compatible with Windows? This is a Python 3.10 env on an Ubuntu machine:
What is interesting here is that you get the same/very similar values, though. However, we get different results from Synapse when they run GaNDLF.
This is using the latest version [ref]. Perhaps they were using an older version? Could you tag the person from Synapse who was running this?
I don't know their GitHub accounts. Rong and Verena both ran this code base. Or perhaps some algorithms save files in a different format, and then something goes wrong with the file loading in GaNDLF? The issue must lie somewhere in this direction.
@vpchung, can you please have a look? Sarthak is unable to reproduce the issue.
GaNDLF is letting SimpleITK do its thing WRT loading, so there is nothing special going on there. Update: sent an email to Rong and Verena.
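For context, "letting SimpleITK do its thing" amounts to something like the following minimal sketch (illustrative only; the filename is hypothetical and this is not GaNDLF's actual code path):

```python
# Minimal sketch of SimpleITK-based loading (illustrative; not GaNDLF's code).
import SimpleITK as sitk

image = sitk.ReadImage("INP-BraTS-GLI-00000-000-t1n.nii.gz")  # hypothetical filename
array = sitk.GetArrayFromImage(image)  # numpy array in (z, y, x) order

# If an algorithm writes files with unusual dtypes, spacings, or orientations,
# these properties are where a downstream mismatch would first surface.
print(array.dtype, image.GetSpacing(), image.GetDirection())
```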
I was able to generate the same scores as shared above:
As mentioned in my email response, I think the mismatch of scores may actually be due to the challenge re-using the BraTS 2023 metrics MLCube, rather than this being a GaNDLF issue.
Thanks. What is running under the hood there? Would it be possible to have an MLCube wrapping our metrics package for the upcoming Lighthouse challenge?
I'm not sure, as I did not create any of the metrics MLCubes. My best guess is that this is the source used to create the inpainting metrics MLCube, which, from the setup README, uses this branch from Felix's GaNDLF fork.
Yes, in my opinion, it would be best to create a new metrics MLCube. But perhaps @sarthakpati (or someone from MLCommons) has a better suggestion.
We are currently working on a common solution for all metrics (see #942). This would allow a single "source of truth" for all metrics, and organizers would only need to incorporate their implementations in GaNDLF. The MLCube generation and subsequent steps will be taken care of automatically. Any feedback/help would be much appreciated!
@sarthakpati, that sounds amazing! Is there a proposed timeline for this effort, i.e., would it be ready in time for the 2025 Lighthouse challenge? I don't know any of the dates yet (maybe @neuronflow does), but I imagine the MLCube portion would start around July/August again, like the previous BraTS challenges.
The goal is for us to have this PR ready for public testing around the end of January. Since this PR ties in with another major effort, I am tagging @hasan7n for more clarification regarding the specific timeline.
Should the inpainting metrics package be incorporated into GaNDLF, then?
@vpchung I don't know the exact dates, but from my understanding it will be similar to 2023/2024. Spyros should know.
Since the outputs are basically the same [1, 2], does it make sense to do so? Might as well have one less package to support from your end, right? This does raise the question about the segmentation metrics, though. Regardless, I believe this issue is now resolved, and any further discussion should be done on a separate thread. Thoughts, @vpchung @neuronflow?
FYI, I discovered a significant source of variation between the metrics calculated by inpainting and GaNDLF: the use of the voided image. This was not something that was communicated to the original developer of the synthesis metrics, hence they had only put the brain-mask-based normalization in place.
As you can see, the results are pretty much the same as what inpainting calculates. The added advantage of the GaNDLF metrics is that the normalization can be done on the basis of a reference brain mask as well as a voided image [ref].
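To make the normalization point concrete, here is a minimal sketch of mask-dependent normalization (my own illustration, not code from either package; the function name is hypothetical):

```python
import numpy as np

def normalize_over_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Min-max normalize `image` using only intensities found inside `mask`.

    Passing a whole-brain mask vs. a voided-region mask changes the
    normalization constants, and therefore the downstream metric values.
    """
    region = image[mask > 0]
    lo, hi = float(region.min()), float(region.max())
    return (image - lo) / (hi - lo + 1e-8)
```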