-
Notifications
You must be signed in to change notification settings - Fork 11
feature/plugin-priority #162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
kmitcham
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An explicitly defined default order would improve the clarity of behavior.
bioio/bio_image.py
Outdated
| extension against the known plugins. | ||
| - For each matching plugin, it tries to instantiate a reader and checks | ||
| if it supports the image. | ||
| - If multiple plugins match, optional ``plugin_priority`` takes precedence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an explicit definition of what happens with multiple matches, but plugin_priority is not set? I think we want that, even if the defined order is the first plug alphabetically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of an order but alphabetical has some specific issues. one being that we would be reccomending bioio-bioformats as our primary reader, which is not what we want to do. A thought that came out of a conversation between kevin and I was that we sort default plugin priority by the length of the extension list. This would move more specific readers to the top and more general readers to the bottom. Then if a user wants something specific they define a priority list or a single reader. This could have some pitfalls one I can think of is that ome-tiff has length 4 and tifffile has length 3. @toloudis what are your thoughts about a default order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this priority order?
- Check extensions in order of length (e.g., if plugins support
.c,.b.c, and.a.b.c, then ingestingfile.a.b.cwill first look for plugins supporting.a.b.c, then.b.c)- Among plugins that support the same extension, choose the one that supports the fewest extensions total.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im working on an implementation now for bioio-tiff-glob, bioio-ome-tiff, bioio-tifffile,bioio-bioformats our priority list would be
Extension: .tiff
----------------
[0] entrypoint: bioio-tiff-glob
Reader: bioio_tiff_glob.reader.Reader
Supported extensions: ['.tif', '.tiff']
[1] entrypoint: bioio-ome-tiff
Reader: bioio_ome_tiff.reader.Reader
Supported extensions: ['.ome.tiff', '.tiff', 'ome.tif', '.tif']
[2] entrypoint: bioio-tifffile
Reader: bioio_tifffile.reader.Reader
Supported extensions: ['tif', 'tiff', 'lsm']
[3] entrypoint: bioio-bioformats
Reader: bioio_bioformats.reader.Reader
Supported extensions: ['.1sc', '.2fl', '.acff', '.afi', '.afm', '.aim', '.al3d', '.am', '.amiramesh', '.apl', '.arf', '.avi', '.bin', '.bip', '.bmp', '.btf', '.c01', '.cfg', '.ch5', '.cif', '.cr2', '.crw', '.csv', '.cxd', '.czi', '.dat', '.dcm', '.dib', '.dm2', '.dm3', '.dm4', '.dti', '.dv', '.eps ', '.exp', '.fdf', '.fff', '.ffr', '.fits', '.flex', '.fli', '.frm', '.gel', '.gif', '.grey', '.hdf', '.hdr', '.hed', '.his', '.htd', '.html', '.hx', '.i2i', '.ics', '.ids', '.im3', '.img', '.ims', '.inr', '.ipl', '.ipm', '.ipw', '.jp2', '.jpg', '.jpk', '.jpx', '.l2d', '.labels', '.lei', '.lif', '.liff', '.lim', '.lms', '.lsm', '.mdb', '.mnc', '.mng', '.mod', '.mov', '.mrc', '.mrw', '.msr', '.mtb', '.mvd2', '.naf', '.nd', '.nd2', '.ndpi', '.ndpis', '.nef', '.nhdr', '.nii', '.nii.gz', '.nrrd', '.obf', '.obsep', '.oib', '.oif', '.oir', '.ome', '.ome.btf', '.ome.tf2', '.ome.tf8', '.ome.tif', '.ome.tiff', '.ome.xml', '.par', '.pbm', '.pcoraw', '.pcx', '.pds', '.pgm', '.pic', '.pict', '.png', '.pnl', '.ppm', '.pr3', '.psd', '.qptiff', '.r3d', '.raw', '.rcpnl', '.rec', '.scn', '.sdt', '.seq', '.sif', '.sld', '.sm2', '.sm3', '.spc', '.spe', '.spi', '.stk', '.stp', '.svs', '.sxm', '.tf2', '.tf8', '.tfr', '.tga', '.tif', '.tiff', '.tnb', '.top', '.txt', '.v', '.vms', '.vsi', '.vws', '.wat', '.wlz', '.xdce', '.xml', '.xqd', '.xqf', '.xv', '.xys', '.zfp', '.zfr', '.zvi']
This would essentially say the more specific a plugin you install the higher in the priority list ill be. ome-tiff-glob is extremely specific and has 2 families of support .tiff and .tif. bioio-ome-tifff also has 2 families but a larger extension list bioio-tifffile has 3 families and bioio-bioformats has a ton.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is the intention to have a different priority list for every extension? That seems like a lot of effort for little reward (given the number of relevant extensions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No we determine priority from ext list for each plugin based on (at bioio level):
- Fewer extension families first — Plugins that support fewer groups of suffix-related extensions (e.g., .ome.tiff + .tiff is one family) are considered more specific and are ranked higher.
- If tied, fewer raw extensions first — A plugin advertising fewer total extensions in its get_supported_extensions() list is treated as more focused and ranked higher.
- If still tied, alphabetical by plugin name — A stable, deterministic tie-breaker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the above description, the index of the plugin is its priority (0 being highest)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The priority approach of @pgarrison resonates with how I like to think about the various plugins, especially where I want the (presumed) most specific plugin to be prioritized first, and seeing how @BrianWhitneyAI laid it out makes me more certain that its both logical and easily understood with minimal documentation (which, right now is actually quite hard). I've followed y'alls discussions for a while, and while I'm unhappy about the current prioritization (i.e. time-of-install) nothing else has stood out as a compelling alternative until this.
I think this proposed prioritization logic would cover all edge cases that I keep running in to, but it is nice that there still is this proposed API for a priority override. It will make troubleshooting a set of reader behaviors more quickly than just trying to pass the correct reader function.
bioio/plugins.py
Outdated
| Prioritized plugin list. | ||
| """ | ||
| if not plugin_priority: | ||
| return plugins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we do a default sort here? just to make it deterministic, and not based on whatever drives it now?
pgarrison
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I like this approach
bioio/bio_image.py
Outdated
| extension against the known plugins. | ||
| - For each matching plugin, it tries to instantiate a reader and checks | ||
| if it supports the image. | ||
| - If multiple plugins match, optional ``plugin_priority`` takes precedence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this priority order?
- Check extensions in order of length (e.g., if plugins support
.c,.b.c, and.a.b.c, then ingestingfile.a.b.cwill first look for plugins supporting.a.b.c, then.b.c)- Among plugins that support the same extension, choose the one that supports the fewest extensions total.
bioio/tests/test_plugins.py
Outdated
| monkeypatch.setattr( | ||
| bio_image, | ||
| "get_plugins", | ||
| lambda use_cache: fake_plugins_by_ext, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we define a default plugin order more explicitly, maybe this test can rely on that instead of modifying get_plugins
Co-authored-by: Philip Garrison <pgarrison@users.noreply.github.com>
|
I have a few thoughts I'd like to add.
|
bioio/bio_image.py
Outdated
| image: biob.types.ImageLike, | ||
| fs_kwargs: Dict[str, Any] = {}, | ||
| use_plugin_cache: bool = False, | ||
| plugin_priority: Optional[Sequence[Type[biob.reader.Reader]]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a thought, how difficult would it be to accept the names of the entrypoints here, rather than the Reader? That way the sequence could be constructed without having to do imports. It's just, simpler perhaps? I suppose the same would be nice if I could pass the entry point name to the reader parameter as well, maybe I should make a new issue? Using the entrypoint name would be much easier for colleagues that are less familiar with finding python objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be hesitant to allow a string parameter here. Importing the Readers directly allows us to tighten our typing and doesn't require a users knowledge of entrypoint logic. Im also not sure we want to point users towards entrypoint logic from a documentation standpoint as we have pushed so hard for the plugin reader structure.
|
I already create my own preferred ordering system as a workaround so this all makes sense to me as something that should be supported. I think I would be able to remove near all of it. The only value of this code that I would keep is having a configurable global setting override, but that would just become this new plugin priority list instead, so still using the new API. I mean maybe its worth looking at briefly just to make sure I haven't created something kinda useful? 🫣 |
This makes sense to me; fewer paramaters to learn and the same functionality.
@toloudis I don't think "reinstall in the priority order you want" is a great user experience. I haven't encountered any other library where install order is part of the API. For projects where you install from a lockfile, controlling install order is nearly impossible. And on larger projects it's complicated to keep plugin install order consistent across developers or machines. I'm surprised we haven't seen more "doesn't happen on my machine" issues related to this. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #162 +/- ##
==========================================
+ Coverage 83.14% 85.40% +2.25%
==========================================
Files 11 13 +2
Lines 724 911 +187
==========================================
+ Hits 602 778 +176
- Misses 122 133 +11 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I added df9a7bd to show a possible priority policy. We can revert it if/when we decide a different route |
So, I think this is the cause of how I even identified bioio-devs/bioio-imageio#29 I haven't had time to fully debug, but I'm pretty sure the behavior is different between I hadn't made an issue because maybe I was going crazy, or maybe you had changed resolution order, but I did want to show where I ran into this. |
I have some suspicions about version management as well. Particularly around envs sharing installed package versions. |
I understand. I am not in love with "install order", and realize how silly it sounded to document how to install things in a certain order. In a world where users can write their own plugins which report supported file extensions, ties will have to be broken somehow. (e.g. There can be N different plugins installed, all claiming to support ".tiff") The feasibility report thing is a nice way to diagnose (assuming it works predictably and consistently). Anyway, I don't want to block progress on a better way to determine plugin precedence. It sounds like install order is truly unreliable due to differences in the sequence of processing pyproject.toml or lock files. |
kmitcham
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple more doc nits, but looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted time to do a more full review, and I think the biggest opinion I have is in https://github.com/bioio-devs/bioio/pull/162/files#r2595957308 about how to think about the code pathways and trying to avoid confusion.
I did have a go at testing out how this would change things for my design (i.e. my own ordering of plugin priority to overcome the current issues) and really like it! ndev-kit/ndevio#42. I especially like that I can pass reader=[Reader] with just one, and it will still use the fallbacks of bioio. This way I can emit a warning to users that their intended reader isn't working, but if they expect certain behavior they can easily "choose" a reader. This is just a better experience for someone working in napari.
If the proposed deterministic ordering is used (fewer suffixes prioritized as more specific) I will be very happy and overall these changes would be a big improvement to my experience with bioio, and I hope for others.
Thanks for the great work and continued tuning to get, what I think, is a key and fantastic feature. If it goes it as is, that will also work for me :)
bioio/bio_image.py
Outdated
|
|
||
| # Determine reader class based on available plugins | ||
| # Determine plugin with priority | ||
| plugin_priority: Optional[Sequence[Type[biob.reader.Reader]]] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll offer my biggest opinion here. I think this discussion shows the challenge of condensing these behaviors to just the reader arg.
First, I find that type-dependent behavior, as here with reader=Reader and reader=[Reader], taking different code paths to be very confusing and requires clear documentation as to why. From my perspective passing a list of just 1 item should have the same behavior as passing that item outside a list.
oh I thought that if a user passed a list of readers, then we should only try those readers and then stop. that seems more predictable from an api standpoint but I can see why you might want to do it the other way too.
Second, I think @toloudis points out that perhaps this makes even more room for separating this functionality into two different args. With this in mind I would propose for BioImage:
reader: Reader | Sequence[Reader] | None = Noneshould try only those plugins if not None. This would be an extension of current API and result in ordered passing of each Reader, only.plugin_priority: Reader | Sequence[Reader] | None = Noneshould try these plugins "at the top of the list" of bioio's ordering, otherwise fallback to bioio's ordering, with the warning emitted as now during fallback- If
readerandplugin_priorityare both not None, then emit a warning to the user, but use thereaderpathway
This allows clear separation of the pathways and intent of each.
bioio/bio_image.py
Outdated
| image: biob.types.ImageLike, | ||
| reader: biob.reader.Reader, | ||
| reader: Optional[ | ||
| Union[Type[biob.reader.Reader], Sequence[Type[biob.reader.Reader]]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe at this layer you could simplify by always passing in a sequence of Reader (or None)
| image: biob.types.ImageLike, | ||
| reader: Optional[Type[biob.reader.Reader]] = None, | ||
| reader: Optional[ | ||
| Union[Type[biob.reader.Reader], Sequence[Type[biob.reader.Reader]]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inside this function, could convert single reader to [reader] as early as possible and then treat it as a sequence for the get_reader function call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved in 3d3f727
bioio/tests/test_bio_image.py
Outdated
| def test_bioimage_attempts_s3_read_with_anon_attr( | ||
| sample_text_file: pathlib.Path, dummy_plugin: str | ||
| sample_text_file: pathlib.Path, | ||
| plugin_factory: Callable[[Iterable[TestPluginSpec]], list[EntryPoint]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be naive about how to write Python tests, but the code looks weird having plugin_factory be an input argument to all the functions. A type definition to shorten Callable[[Iterable[TestWriterSpec]], list[EntryPoint]] would probably be nice for readability. Maybe name it as plugin_factory: PluginFactoryFixture ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added in 3d3f727 Type definition doesn't quite work without some aliasing so I added some protocol support should work the same way
| ), | ||
| ], | ||
| ["list_reader_plugin_a", "list_reader_plugin_b"], | ||
| 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make one where the expected winner_idx is not zero. (add a plugin spec that has the wrong extension or something?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added two new cases, removing the test below this one and moving its case into the params see 3d3f727
| # Check if package name is in the output | ||
| assert dummy_plugin in output | ||
| # Assert: plugin name appears in dump | ||
| assert "dummy_plugin" in output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a good enough test of dump_plugins? what else could you assert here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test assertions are pre-existing. We could add some specific assertions about the wording but Im not sure what else we would want.
toloudis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, one small thing that should be easy to update; see if you agree.
| self._plugin: Optional[PluginEntry] = None | ||
|
|
||
| # Normalize Single Reader | ||
| if reader is not None and isinstance(reader, type): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this type check could be a little bit stronger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We go on to do another check in _get_reader so I think its ok.
if not readers or not all(
isinstance(r, type) and issubclass(r, biob.reader.Reader) for r in readers
):
raise TypeError(
"BioImage(reader=...) must be a Reader subclass, "
"a non-empty sequence of Reader subclasses, or None."
)
# References and relevant issues Depends on bioio-devs/bioio#162 / bioio v3.2.0 # Description 1. Remove bespoke prioritization system 2. Just detect installed plugins and offer suggestions to install known bioio plugins if the file can't be read by the currently installed 3. Depends on bioio's new deterministic orderering (fewest suffixes are prioritized as more specific) 4. Changes default "preferred_reader" setting to None, and passes this now as `reader=[Reader]` to allow fallback to bioio's ordering (though this fallback is in nImage now) 5. updates extensions used by directly copying from bioio libraries 6. simplifies napari reader check to just pass forward with less checks, expecting that our extensions check by napari will cover it
Description
The purpose of this PR is to improve our plugin selection. It aims to resolve bioio-devs/bioio-imageio#29 , bioio-devs/bioio-bioformats#45 and #161.
We have increasingly become aware that users like to have a lot of plugins installed at the same time. The priority of usage of a readers that support the same file (such as our tiff suites and imageio/bioformats catch-alls) didn't have an easy way for users to control what was used for their files other than specific reader definition. This PR allows for a priority list.
There was some hidden nuance around the plugin_feasibility_report (see #) The list of extensions is actually a approved list by bioio, the reader itself can support more file types (and often will). At runtime BioImage will only read files with the specific instruction.
plugin_feasibility_reportis not bound by extension and will try all readers regardless. This shows things likebioio-imageiosupporting tiff images. This is intended but not explained. This pr adds a warning to describe the case.The use case for this is if you have a file with a strange or new ext. you want to see if any of the readers can parse it. you use the
plugin_feasibility_reportto assess the file and see thatbioio-bioformatscan read it! you can then override the extension restriction by passing the thebioio-bioformatsreader directly.Description of Changes
plugin_priorityparameter allowing users to specify a reader priority schemaplugin_priority