How to choose the cut-off value for deleting images with bad quality? #20

Open
EdithGaspar opened this issue Sep 15, 2023 · 7 comments
Labels
question Further information is requested


@EdithGaspar

I have the MRIQC results from 1000 subjects, but I don't understand how to choose the cut-off value for selecting my best images or the ones to delete.

@celprov
Collaborator

celprov commented Oct 12, 2023

Hi @EdithGaspar,
Could you specify which value you are referring to?

@oesteban
Member

> I have the MRIQC results from 1000 subjects, but I don't understand how to choose the cut-off value for selecting my best images or the ones to delete.

There's no rule of thumb for choosing such a cut-off. As we introduced in our MRIQC paper (https://doi.org/10.1371/journal.pone.0184661), you can manually annotate a subset of your data, train a classifier on it, and then apply that classifier to the remainder of the dataset. The original code for the classifier was moved into the nipreps/mriqc-learn repo.
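The annotate, train, apply workflow can be sketched with scikit-learn as below. This is a minimal illustration, not the MRIQC or mriqc-learn classifier: the random IQM matrix, the simulated ratings, the 200-image annotated subset, and the choice of `RandomForestClassifier` are all hypothetical stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical IQM table: 1000 images x 5 quality metrics (stand-in for MRIQC's output).
n_images, n_iqms = 1000, 5
X = rng.normal(size=(n_images, n_iqms))

# Hypothetical manual ratings for the first 200 images only (1 = accept, 0 = exclude).
# Simulated from the first IQM here; in practice they come from visual inspection.
n_annotated = 200
y_annotated = (X[:n_annotated, 0] + 0.5 * rng.normal(size=n_annotated) > -0.5).astype(int)

# Train on the annotated subset...
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X[:n_annotated], y_annotated)

# ...then apply the classifier to the remaining, unrated images.
X_rest = X[n_annotated:]
# predict_proba columns follow clf.classes_, so look up the column for class 0 explicitly.
p_exclude = clf.predict_proba(X_rest)[:, list(clf.classes_).index(0)]
# Indices into the full table of unrated images flagged for exclusion.
to_review = np.flatnonzero(p_exclude > 0.5) + n_annotated
print(f"{len(to_review)} of {len(X_rest)} unrated images flagged for exclusion")
```

The 0.5 cut-off on `p_exclude` is only a starting point; the annotated subset is also what you would use to judge whether a stricter or laxer threshold suits your study.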

Perhaps @jaimebarran or @t-sanchez, who have recently worked with mriqc-learn, can give you some insights or share their experience.

oesteban added the question (Further information is requested) label on Nov 15, 2023
oesteban transferred this issue from nipreps/mriqc on Nov 15, 2023
@jaimebarran

Hi @EdithGaspar,

You can use the baseline model from https://github.com/nipreps/mriqc-learn as follows. First, load it:

```python
from joblib import load
# Load the trained model
model = load("/mriqc-learn/mriqc_learn/data/classifier.joblib")  # check your path
```

Then you can use y_pred = model.predict(your_loaded_dataset), which returns binary labels (with a cutoff of 0.5), or alternatively y_scores = model.predict_proba(your_loaded_dataset)[:, 0], which returns, for each image, the probability of belonging to class '0' (the negative class, i.e., excluded for quality). You can then choose your own threshold and get the indices of the images whose scores fall above or below it, for example:

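```python
threshold = 0.7
# Indices of images whose probability of belonging to class 0 (excluded) exceeds the threshold
y_pred_idx = (y_scores > threshold).nonzero()[0]
```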
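To pick a cut-off, it can help to see how many images each candidate threshold would exclude. A minimal sketch, assuming y_scores holds the class-0 (excluded) probabilities as above; the score values and the helper name `flag_images` are made up for illustration:

```python
import numpy as np

# Hypothetical probabilities of belonging to class 0 ("excluded"), standing in for
# model.predict_proba(your_loaded_dataset)[:, 0]
y_scores = np.array([0.05, 0.42, 0.55, 0.71, 0.93, 0.18, 0.66])

def flag_images(scores, threshold):
    """Return indices of images whose exclusion probability exceeds `threshold`."""
    return (scores > threshold).nonzero()[0]

# A lower threshold flags more images for exclusion; a higher one flags fewer.
for threshold in (0.5, 0.7, 0.9):
    idx = flag_images(y_scores, threshold)
    print(f"threshold={threshold}: exclude {len(idx)} images -> {idx.tolist()}")
```

The images not flagged (scores <= threshold) are the ones you would keep. A threshold of 0.5 on these scores roughly corresponds to the binary decision returned by model.predict.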

I would recommend retraining the model with up-to-date Python libraries (numpy, sklearn, etc.) rather than using the model file from the repo directly. You can do that by following the tutorial https://github.com/nipreps/mriqc-learn/blob/main/docs/notebooks/Tutorial.ipynb and then saving the trained model with:

```python
from joblib import dump
# Save the retrained model (adjust the path to your setup)
dump(model, "/mriqc-learn/mriqc_learn/data/your_new_classifier.joblib")
```
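A pickled model only loads reliably with library versions compatible with the ones that created it, so it can help to record those versions next to the model when saving it. A minimal sketch, assuming a toy `LogisticRegression` in place of the tutorial's trained model, a temporary directory instead of the repo path, and a hypothetical `.versions.json` sidecar file:

```python
import json
import os
import tempfile

import joblib
import numpy as np
import sklearn
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the model trained in the tutorial (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (X[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X, y)

def current_versions():
    """Versions of the libraries the pickle depends on, in the running environment."""
    return {"numpy": np.__version__, "sklearn": sklearn.__version__, "joblib": joblib.__version__}

out_dir = tempfile.mkdtemp()  # hypothetical location; use your own data folder
model_path = os.path.join(out_dir, "your_new_classifier.joblib")
joblib.dump(model, model_path)

# Plain-JSON sidecar: still readable even if the pickle itself can no longer be loaded.
with open(model_path + ".versions.json", "w") as f:
    json.dump(current_versions(), f)

# Later, before loading: compare the recorded versions with the installed ones.
with open(model_path + ".versions.json") as f:
    recorded = json.load(f)
installed = current_versions()
mismatches = {k: (v, installed[k]) for k, v in recorded.items() if v != installed[k]}
if mismatches:
    print("Model was saved with different library versions:", mismatches)
reloaded = joblib.load(model_path)
```

Keeping the versions in a separate JSON file (rather than inside the pickle) means you can still find out which environment to recreate when unpickling fails.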

You could also train the model on your own data, as long as you have subjective ratings for it, loading your prepared data with the load_dataset function.

Let me know if you need additional help!

Cheers!

@andrew-yian-sun

@jaimebarran
I was trying to see if I could run the baseline model. A couple of issues:

  1. When I installed mriqc-learn (pip install mriqc-learn), the classifier.joblib file didn't come with the install (neither did production.py).

So I downloaded the raw classifier.joblib file from this repo and added it to where I thought it should be:
C:\Users\Andrew\anaconda3\Lib\site-packages\mriqc_learn\data

  2. However, when I try running your first code block to load the model:

```python
from joblib import load
# Load the trained model
model = load(r"C:\Users\Andrew\anaconda3\Lib\site-packages\mriqc_learn\data\classifier.joblib") # check your path
```

I get the following error. Any ideas?

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 3
      1 from joblib import load
      2 # Load the trained model
----> 3 model = load(r"C:\Users\Andrew\anaconda3\Lib\site-packages\mriqc_learn\data\classifier.joblib")

File c:\Users\Andrew\anaconda3\Lib\site-packages\joblib\numpy_pickle.py:658, in load(filename, mmap_mode)
    652             if isinstance(fobj, str):
    653                 # if the returned file object is a string, this means we
    654                 # try to load a pickle file generated with an version of
    655                 # Joblib so we load it with joblib compatibility function.
    656                 return load_compatibility(fobj)
--> 658             obj = _unpickle(fobj, filename, mmap_mode)
    659 return obj

File c:\Users\Andrew\anaconda3\Lib\site-packages\joblib\numpy_pickle.py:577, in _unpickle(fobj, filename, mmap_mode)
    575 obj = None
    576 try:
--> 577     obj = unpickler.load()
    578     if unpickler.compat_mode:
    579         warnings.warn("The file '%s' has been generated with a "
    580                       "joblib version less than 0.10. "
    581                       "Please regenerate this pickle file."
    582                       % filename,
...
File sklearn\tree\_tree.pyx:1418, in sklearn.tree._tree._check_node_ndarray()

ValueError: node array from the pickle has an incompatible dtype:
- expected: {'names': ['left_child', 'right_child', 'feature', 'threshold', 'impurity', 'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left'], 'formats': ['<i8', '<i8', '<i8', '<f8', '<f8', '<i8', '<f8', 'u1'], 'offsets': [0, 8, 16, 24, 32, 40, 48, 56], 'itemsize': 64}
- got     : [('left_child', '<i8'), ('right_child', '<i8'), ('feature', '<i8'), ('threshold', '<f8'), ('impurity', '<f8'), ('n_node_samples', '<i8'), ('weighted_n_node_samples', '<f8')]
```

@jaimebarran

jaimebarran commented May 1, 2024

Hi @andrew-yian-sun !

> When I installed mriqc-learn (pip install mriqc-learn), the classifier.joblib file didn't come with the install (neither did production.py).

Is this how it should be? @oesteban @celprov

> When I try running your first code block to load the model... I get the following error

I see you are using model = load(r"C:\Users\Andrew\anaconda3\Lib\site-packages\mriqc_learn\data\classifier.joblib") # check your path. The mmap_mode parameter of joblib's load controls memory-mapping of the NumPy arrays stored in the file. With memory-mapping, array data is read from disk on demand instead of being copied into memory up front, which can be useful when working with large datasets.
Here's what each option means:

  • None: No memory-mapping. The data is loaded into memory normally.
  • 'r': Memory-map the file in read-only mode. Instead of loading the arrays into memory, a memory-mapped object is returned that behaves like a read-only array.
  • 'r+': Memory-map the file in read-write mode. A memory-mapped object is returned that behaves like a read-write array, and modifications are written back to the file.
  • 'w+': Memory-map the file in write mode. A memory-mapped object is returned that behaves like a writeable array.
  • 'c': Memory-map the file in copy-on-write mode. A memory-mapped object is returned; modifications are not saved to the original file but are kept in a copy in memory.
If mmap_mode is not specified, the default is None, meaning the data is loaded into memory normally. Your call does not pass mmap_mode, so it uses this default. Still, try other modes and see whether your problem persists.
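To see what mmap_mode actually changes, here is a minimal sketch with a toy object (a hypothetical dictionary saved to a temporary directory). Memory-mapping only applies to the NumPy arrays stored inside the pickle; the rest of the object is still unpickled normally.

```python
import os
import tempfile

import joblib
import numpy as np

# A pickled object that contains a NumPy array plus a non-array value.
obj = {"weights": np.arange(6, dtype=np.float64), "name": "toy"}
path = os.path.join(tempfile.mkdtemp(), "toy.joblib")
joblib.dump(obj, path)  # uncompressed, so arrays can be memory-mapped

default = joblib.load(path)                # mmap_mode=None: arrays loaded into memory
mapped = joblib.load(path, mmap_mode="r")  # arrays memory-mapped read-only

print(type(default["weights"]).__name__)  # ndarray
print(type(mapped["weights"]).__name__)   # memmap
print(mapped["weights"].flags.writeable)  # False: read-only mapping
print(mapped["name"])                     # non-array values are unpickled as usual
```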

PS: I didn't install mriqc-learn; I forked the repo and modified it my own way.

@andrew-yian-sun

Hi @jaimebarran, thanks for the tip, but both approaches (forking the repo and trying different options for mmap_mode) result in the same error message. I wonder whether it's because the model was created with an older version of joblib and mine is too recent? My version is 1.4.0.

@jaimebarran

Hi @andrew-yian-sun,

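Yes, it looks like a version mismatch. Note, though, that the joblib message visible in your traceback ("The file '%s' has been generated with a joblib version less than 0.10. Please regenerate this pickle file.") is only source code shown as context. The error actually raised comes from scikit-learn: the pickled tree nodes lack the missing_go_to_left field that your installed scikit-learn expects, which indicates the model was pickled with an older scikit-learn version.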

You can try to regenerate the .joblib file with your up-to-date Python libraries by running /scripts/train_model.py. You can change which columns (i.e., IQMs) are dropped by editing the pp.DropColumns(...) step in init_pipeline() in /models/production.py. This regenerates classifier.joblib with your libraries; then try loading it again to see whether it works.

I was using joblib v1.2.0 and it worked with some warnings. I updated it to v1.4.0 and it worked without warnings.
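The idea behind a column-dropping step (removing some IQMs before the classifier sees them) can be illustrated with plain scikit-learn. This is a sketch under assumptions: it uses `ColumnTransformer` rather than mriqc-learn's own pp.DropColumns, and the IQM names, data, and ratings are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# Hypothetical IQM table; the column names are placeholders, not MRIQC's full IQM list.
rng = np.random.default_rng(0)
iqms = pd.DataFrame(rng.normal(size=(100, 4)), columns=["cjv", "cnr", "snr_total", "size_x"])
ratings = (iqms["cnr"] > 0).astype(int)  # hypothetical ratings (1 = accept, 0 = exclude)

# IQMs to leave out of training (edit this list to change which columns are dropped).
columns_to_drop = ["size_x"]

pipeline = Pipeline([
    # Drop the listed columns and pass all other columns through unchanged.
    ("drop", ColumnTransformer([("dropped", "drop", columns_to_drop)], remainder="passthrough")),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
pipeline.fit(iqms, ratings)
print("features seen by the classifier:", pipeline.named_steps["clf"].n_features_in_)
```

Because the column selection lives inside the pipeline, saving the fitted pipeline with joblib keeps the dropped-column choice together with the classifier.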

Cheers!
