Using CH5 Files in Python #160
Comments
@roshankern: indeed the raw data for Importantly, Aspera will only give you access to the data in its original submitted format. For this study, all we have is CellH5 data for each well (as discussed in #158) and we introduced our custom For 1, I would also naively have expected For 2, as mentioned above, we only have the original data in CellH5 format so anything else would require some form of export. As you might be aware, we are actively using IDR to drive the OME-NGFF specification and a subset of images and plates have been converted into the cloud-optimized OME-Zarr format - see idr.github.io/ome-ngff-samples. Would this be something of interest for your use case? If so, it should be very easy to convert and upload a test plate from
Thank you for the help @sbesson! I reached out to the CellProfiler team (CellProfiler/python-bioformats#159), but would still be interested in downloading/using the
Thanks for the interest, I converted the first plate from
The sample plate above is hosted on EMBL-EBI Embassy object store. The Aspera service gives access to the data stored on the EMBL-EBI NFS servers so the two storage and download mechanisms are separate at the moment. The mass conversion of an entire study is not something we have done so far although we have talked about it several times internally. It raises several interesting questions including terms of storage & accessibility which will need to be resolved together with our partners at EMBL-EBI providing the underlying infrastructure. In this context, it would be useful to hear about your experience accessing this data, e.g. what is the typical access speed when using
Thanks for your work supporting us @sbesson! It has been a real learning experience trying to wrangle publicly available data! Lots of challenges and opportunities, which I'm sure you're well aware of :) I'll respond to many of your points below based on what @roshankern and I discussed. I'll also point you to this issue WayScience/mitocheck_data/issues/1, where we've outlined our decision process to pursue the
Roshan was successfully able to access and use the file format - but, given time constraints, this is no longer useful for us in the immediate term (see WayScience/mitocheck_data/issues/1)
Roshan and I tried to figure out the implications of this, and we landed on this explanation: If you indeed pursue the file format transfer, the original
I agree that all these questions are quite interesting... My lab intends to use IDR data heavily, so I am very much interested in helping to resolve these issues! I admire the IDR team effort on this front, especially with regard to the emphasis on metadata and OME-NGFF. We have not tested either
Yes, the transformation into a cloud-optimized format might happen for some IDR studies in the future, but for users the expectation is that the original data will remain available for download. Assuming you guys have settled on using Aspera for downloading the raw data, I think the only remaining issue is on the python-bioformats front, so I'll close this issue.
After @pwalczysko's great help with #158, I am able to use the Aspera download client to download well data for idr0013 in the form of a .ch5 file. From what I have read, the CH5 (CellH5) format is quite outdated and does not have much support. I have been able to open the files in ImageJ with the Bio-Formats plugin, so the data is readable. However, when I try to install the CellH5 Python library I have the same issue described in CellH5/cellh5#14. My attempts to load a CH5 file with python-bioformats result in the error: OSError: [Errno 22] Could not load the file as an image (see log for details).
What is the best way to either:
idr0013 well data in a different format (e.g. tiff)
Thanks!
CC @gwaygenomics to keep you in the loop
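Since CellH5 files are HDF5 containers, one possible way to inspect a downloaded .ch5 file from Python without the CellH5 library is to open it with h5py. The following is only a minimal sketch under stated assumptions, not the CellH5 or IDR API: `looks_like_hdf5`, `walk` and `list_ch5` are hypothetical helper names, the signature check only covers HDF5 files without a user block, and nothing is assumed about the internal layout of the idr0013 files.

```python
from typing import Any, Iterator, Tuple

# First 8 bytes of an HDF5 file that has no user block (assumption: the
# downloaded .ch5 files start directly with the HDF5 superblock).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path: str) -> bool:
    """Return True if the file starts with the HDF5 signature."""
    with open(path, "rb") as handle:
        return handle.read(8) == HDF5_SIGNATURE


def walk(group: Any, prefix: str = "") -> Iterator[Tuple[str, tuple, Any]]:
    """Yield (path, shape, dtype) for every dataset found below `group`.

    Works on any mapping-like object whose leaves expose `.shape` and
    `.dtype`, which includes h5py groups and datasets.
    """
    for name, node in group.items():
        path = f"{prefix}/{name}"
        if hasattr(node, "shape"):      # dataset-like leaf
            yield path, tuple(node.shape), node.dtype
        elif hasattr(node, "items"):    # nested group
            yield from walk(node, path)
        # Anything else (e.g. a link that could not be resolved) is skipped.


def list_ch5(path: str) -> list:
    """List the datasets in a .ch5 file (hypothetical helper, needs h5py)."""
    import h5py  # third-party; imported lazily so the helpers above need only the stdlib

    if not looks_like_hdf5(path):
        raise OSError(f"{path} does not start with the HDF5 signature")
    with h5py.File(path, "r") as handle:
        return list(walk(handle))
```

Calling `list_ch5("path/to/well.ch5")` (a placeholder path) would return one `(path, shape, dtype)` tuple per dataset, which may be enough to locate the image arrays before exporting them to another format such as TIFF. If `looks_like_hdf5` returns False, the download itself is worth re-checking before debugging the reader.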