Lazy image #1232
base: main
Conversation
Codecov Report — Attention: patch coverage dropped. Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #1232      +/-   ##
==========================================
- Coverage   84.51%   84.16%   -0.35%
==========================================
  Files          92       92
  Lines        5966     6013      +47
==========================================
+ Hits         5042     5061      +19
- Misses        924      952      +28
==========================================
```

☔ View full report in Codecov by Sentry.
Hi @romainVala. I'm not sure I understand what's going on here. Can you maybe share a usage example? Is this what @csparker247 had in mind? Do you know why the tests are failing? Can this be tested? Codecov is complaining that most of the new lines are not covered by the unit tests.
Hi @fepegar. I always thought there was no point working in this direction, because most transforms expect the whole image to be loaded. But talking with @csparker247 made me realize two things:

1. Lazy loading plus a patch sampler is already a big improvement on its own: only the sampled patches need to be read from disk.
2. One could go further and apply a spatial transform that is coherent on the whole volume but processed patch by patch.

I was excited to code point 2, but now I am not sure it is the main need. The first point is already a big improvement for TorchIO (and it should be enough for training). Anyway, this PR focuses only on the first point.

About a usage example: what is unclear in the one I gave above? It shows exactly the use case we are interested in: how to parse the whole volume with a sampler.

Sorry for this PR, which is not ready at all. I did not try to add tests since I am not yet sure this is the best way to go. In this comment #1224 (comment) I give alternative changes to make in TorchIO in order to make this possible. As you can see, there are only two small issues preventing the current code from working. The first one is to allow Image construction with data of a type other than numpy array or tensor (https://github.com/fepegar/torchio/blob/ec95cd31b5587cd3d1717b7b64f48e92a2f4f175/src/torchio/data/image.py#L490). This is a major change, because the whole logic of TorchIO relies on tensors. But since the current code already handles constructing an Image without reading the data, I think it would not be a big change for TorchIO.

The current PR is an alternative implementation, which chooses to separate the 'usual' Image loading from a "virtual" loading by introducing a new Image class (LazyImage). The drawback is that I need to duplicate part of the code with the same logic as the Image class (and new tests need to be added), and I am not sure this is the best way to go. Maybe the initial change I proposed there is a better solution.

Sorry for the (again) too long message, but I hope it still helps to understand.
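The "data of a type other than numpy array or tensor" idea above can be illustrated with a duck-typed proxy. This is a minimal sketch, not TorchIO code: `H5LikeProxy` is an invented stand-in for an on-disk dataset (such as an h5py `Dataset`) that exposes `shape`, `dtype`, and numpy-style indexing, and only materializes what is indexed.

```python
import numpy as np

class H5LikeProxy:
    """Hypothetical array-like wrapper mimicking an on-disk dataset:
    it counts how many elements are actually read, so we can verify
    that only the requested patch is loaded."""

    def __init__(self, backing):
        self._backing = backing
        self.elements_read = 0

    @property
    def shape(self):
        return self._backing.shape

    @property
    def dtype(self):
        return self._backing.dtype

    def __getitem__(self, idx):
        # Only the indexed region is pulled from the backing store
        out = self._backing[idx]
        self.elements_read += int(np.asarray(out).size)
        return out

volume = H5LikeProxy(np.arange(64).reshape(4, 4, 4))
patch = volume[1:3, 1:3, 1:3]  # only 8 of 64 elements are touched
```

Any code that restricts itself to `shape`, `dtype`, and slicing (as a patch sampler does) works unchanged on such an object, which is the agnosticism discussed above.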
As you say, this isn't the full thing I mentioned, but it certainly is a logical first step as far as I can tell. Another way to think about this is that these changes represent a minimal set of modifications one could make to the Image class (and certain library functions) which seem to make TorchIO more or less agnostic to the type of the array being wrapped by Image. I think this is generally a nice solution, since it doesn't add any new imports or ever try to guess what those types could be. It does make me wonder: why does this need to be a new class and not just changes to the Image class? Is there a strong reason why Tensor was used as the internal storage type in the first place that might be getting overlooked here? The read_and_check() and clone() calls that still remain make me wonder about the internal library assumptions. Maybe this class should get deployed and tested in the wild as an experimental feature for a bit, and then its functionality can get merged back into Image? Lots of options.
@csparker247, what part of the full thing is missing here? Do I understand correctly that the missing part is what I mentioned in point 2, i.e. having a full spatial (ITK) transform that is coherent on the whole volume but processed by patches? If yes, can you explain some use cases? As mentioned in the previous discussion, I have a solution for the affine (and elastic) transforms, but this implies many more changes, and my feeling is that it would be better to start with code cleaning to regroup all ITK-related functions and try to make a unique ITK transform class.

About the changes proposed here, I do agree that the extra class (LazyImage) may not be necessary. I would prefer changing the TorchIO assumption that all data need to be tensors (@fepegar, we need your advice here). Doing so, the needed changes are minimal, but we need to be careful since it may induce some incoherence. Maybe an in-between solution would be to just add an extra attribute, like "VirtualData", so that we can easily check and adapt the current logic where needed, at least to give an explicit error message if the user tries to use a "full volume" transform on lazy Images (since it would blow up the CPU memory).
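The explicit-error idea could look roughly like this. A minimal, hypothetical sketch: `is_virtual`, `check_not_virtual`, and `FakeLazyImage` are invented names for illustration, not current TorchIO API.

```python
class FullVolumeTransformError(RuntimeError):
    """Raised when a whole-volume transform is applied to a lazy image."""

def check_not_virtual(image, transform_name):
    # `is_virtual` is an invented attribute standing in for the
    # "VirtualData" flag discussed above; plain Images lack it and pass.
    if getattr(image, 'is_virtual', False):
        raise FullVolumeTransformError(
            f'{transform_name} needs the whole volume in memory; '
            'apply it to sampled patches instead of a lazy image.'
        )

class FakeLazyImage:
    """Toy stand-in for a lazily loaded image."""
    is_virtual = True
```

A guard like this keeps the failure mode explicit (a clear error message) instead of silently exhausting memory when a full-volume transform touches a lazy image.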
Yes, this is what I meant. What I eventually want is to apply a random transform to the whole volume and then sample random patches from it:

```python
transform = tio.RandomAffine(
    scales=(0.9, 1.2),
    degrees=15,
)
transformed = transform(subject)
for patch in sampler(transformed):
    do_train(patch)
```

I don't see how applying at the patch level gives you the same thing without changes to the internals of the transform classes here:

```python
for patch in sampler(subject):
    # get new patch origin and dimensions to add buffer
    origin, dims = ...
    # reload the patch from the volume with this new size
    patch_new = ...
    # apply transform
    # **a random transform of each patch is not the same as one transform of the whole volume**
    transformed = transform(patch_new)
    # crop to expected size
    patch = crop(patch_new, patch.shape)
    do_train(patch)
```

Anyway, like I say, I think this is a good first step.
Yes it does. As I said, it needs much bigger changes (in the transforms and in the patch sampler).
Then I misunderstood what you were proposing for Step 2 and thought you were saying that Step 2 was already covered by this change. I see now that's not what you meant.
Fixes #328 #1224
Description
Allow lazy loading, as described previously (especially recently in #1224).

The previous PR #328 was intended to do this but was never merged into main; instead, pull request #454 was added to let the user define their own format (so including HDF5, for instance). This is nice, since it avoids adding an extra dependency (like h5py), but it assumes the custom reader will end up with a full tensor. Here we need to avoid reading the whole image.

The idea with this LazyImage class is not to use any transform (because most transforms would require loading the whole volume) but just to use a sampler. Once we get patches, one can then use any transform.
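Under the assumption that the lazy image only needs numpy-style indexing, the sample-first workflow described above can be mimicked without TorchIO. In this sketch a numpy memmap stands in for the lazy/HDF5 backend, and `sample_patches` is a toy grid sampler invented for illustration (TorchIO's real samplers would play this role).

```python
import os
import tempfile
import numpy as np

def sample_patches(shape, patch_size, step):
    """Toy grid sampler: yield slice tuples covering a 3D volume."""
    for i in range(0, shape[0] - patch_size + 1, step):
        for j in range(0, shape[1] - patch_size + 1, step):
            for k in range(0, shape[2] - patch_size + 1, step):
                yield (slice(i, i + patch_size),
                       slice(j, j + patch_size),
                       slice(k, k + patch_size))

# Create an on-disk volume; only sampled patches are materialized in RAM
path = os.path.join(tempfile.mkdtemp(), 'vol.dat')
mm = np.memmap(path, dtype='float32', mode='w+', shape=(8, 8, 8))
mm[:] = 1.0
mm.flush()

volume = np.memmap(path, dtype='float32', mode='r', shape=(8, 8, 8))
# Each patch is a small in-memory array; transforms would apply here
patches = [np.asarray(volume[s]) for s in sample_patches(volume.shape, 4, 4)]
```

The key property is that the "volume" object is never fully loaded; only the indexed patches are copied into memory, which is exactly what the LazyImage + sampler combination is after.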
Everything was almost already working out of the box. The important change I had to make is in the Image.load function: we do not allow multiple paths, since that would require a torch.cat.

Changes related to _parse_tensor and _parse_tensor_shape are just to avoid lines expecting a torch.Tensor. We may make fewer checks here, but it is a good start.

Same for the repr function; the changes only concern the memory and dtype attributes.

Finally, there is an important change in the crop function to make it work without tensors (just an indexed array). It is almost nothing, since we no longer depend on SimpleITK here!
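The crop-by-indexing change described above can be sketched roughly like this. `crop_by_indexing` and its bounds arguments are illustrative names, not the PR's actual code; the point is that plain slicing works identically on tensors, numpy arrays, or any lazy array-like, with no SimpleITK involved.

```python
import numpy as np

def crop_by_indexing(data, index_ini, index_fin):
    """Crop a (C, W, H, D)-style array-like using per-axis start/stop
    bounds, via plain slicing only (no SimpleITK, no full-volume copy
    required for lazy backends that support slicing)."""
    i0, j0, k0 = index_ini
    i1, j1, k1 = index_fin
    return data[..., i0:i1, j0:j1, k0:k1]

cropped = crop_by_indexing(np.zeros((1, 10, 10, 10)), (2, 2, 2), (5, 6, 7))
```

Because slicing is the only operation used, the same function works on an h5py dataset or memmap without loading the rest of the volume.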
Checklist

- I have read the CONTRIBUTING docs and have a developer setup (especially important are `pre-commit` and `pytest`)
- Tests pass locally with `pytest`
- Docs build with `make html` inside the `docs/` folder