Does IGV support non amazon s3 buckets? #1636
I don't have enough information to help you debug your problem. For starters, what version of IGV are you using?
And what is running that S3 server - MinIO or something similar? We have some on-prem S3-compatible object stores that we could test against if needed.
@mrvollger If you could set up a test instance and give me credentials I will look further, but first verify you are using the latest IGV. There were S3 issues with some versions of 2.18.x.
Thanks for the reply! And sorry, I wanted first to establish that this should work before troubleshooting, but it sounds like it should. I downloaded the latest version of IGV (MacOS Apple) yesterday during my test, 2.19.1. I am not sure about the details of what is running the S3 server. It is a service provided by UW, so I will inquire with them and get back to you. They also administer credentials, so I will request those to establish a test case. Thanks,
If you're able to create test credentials you can email them privately to igv-team@broadinstitute.org. I don't know what I will be able to determine, but I will look. There might also be information in your igv log file (usually named igv0.log in /igv).
Hi Jim, I just sent an email with the credentials! This is an example error log I get when I try doing this:
Finally, our IT team shared that this s3 endpoint is hosted by a "Ceph RADOS Gateway"; hopefully, that is useful information, but I can always ask for more details if this does not help. I really appreciate the help. Thanks,
AFAICT, currently non-Amazon buckets shouldn't work since there's no provision for so-called "custom endpoints" in the current
The change could potentially be relatively straightforward by (conditionally via BUT, as you can see in the issue above, this use case doesn't seem to be fully supported by the Java2 AWS SDK, so caution must be exercised to not hit the wrong buttons (i.e. avoiding parsing non-S3 URIs)... and/or find suitable workarounds. I'll leave Jim to put those bits together, but let me know if you need assistance, @jrobinso ;)
Thanks @brainstorm, but they are not configuring OAuth. They are using .aws/credentials.
Oh, I didn't know the S3Client code paths became independent (imho they shouldn't), I will revisit the implementation... anyway, here's something that could help you as well, Jim:
@mrvollger Thanks for the test credentials and file. No success yet. The link @brainstorm includes above seemed somewhat promising; I tried setting the endpoint in the "config" file as suggested there, but no luck, I got an exception that "region" was not set. I'm not sure what the region should be for a non-AWS endpoint, but setting it to us-east-1 did not work either (not surprisingly). There are probably IGV bugs as well, or implicit assumptions that the provider is AWS. So this is going to take more time than I have this evening, or even the next couple of days, but keep the issue open. I'm hopeful there will be a solution, but not certain. If you could ask the powers-that-be to leave those test credentials in place for a while, that would be helpful. @brainstorm I'm not sure what you are referring to wrt "S3Client code paths became independent", but it's probably not relevant at the moment given other issues I've found.
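For reference, the endpoint setting discussed here lives in the AWS shared config file. A hypothetical sketch (the profile name and region value are illustrative, not taken from this thread); newer AWS CLI releases honor a top-level `endpoint_url` key like this, while the Java v2 SDK does not pick it up automatically, which is the gap being debugged here:

```ini
# ~/.aws/config -- hypothetical example
[default]
# The SDK still demands a region even for non-AWS endpoints;
# any syntactically valid value satisfies the check.
region = us-east-1
endpoint_url = https://s3.kopah.orci.washington.edu
```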
@mrvollger One more question: it might be relevant, or at least helpful, to know what S3-compatible server you are running (e.g. MinIO). Putting this here as possibly relevant, mostly for myself, for when I return to this. https://stackoverflow.com/questions/76780500/how-to-using-aws-s3-java-v2-sdk-to-talk-to-s3-compatible-storage-minio
Thanks so much for helping! We are really glad that you are willing to put time into this. We will keep the credentials valid as long as you need them, and if something happens, we can get another set up. I think when using a custom endpoint, the region doesn't matter or isn't used. But I am only guessing that based off this AWS CLI test:
Last time, they told us the underlying vendor was Ceph RADOS Gateway, but I will ask more specifically what server is running the s3 instance and get back to you. Thanks again!
CCing @sjneph. |
@mrvollger A workaround for the interim, if it's allowed there, would be to create signed URLs with the AWS command line tools. The signed URLs should work.
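The workaround can be done with the AWS CLI's `s3 presign` command. A sketch assuming a hypothetical bucket and key (the `--endpoint-url` global option points the CLI at the non-AWS service):

```shell
# Generate a presigned GET URL, valid for 7 days (the SigV4 maximum).
aws s3 presign s3://my-bucket/sample.bam \
    --expires-in 604800 \
    --endpoint-url https://s3.kopah.orci.washington.edu
```

The printed URL can then be pasted into IGV's "Load from URL" dialog like any other https link.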
Thanks for the suggestion! We have been doing that for things in a real-time crunch, but we have a couple of applications where we cannot do that since the URLs are posted publicly. (I also hate when the links in my painfully constructed IGV sessions (XMLs) die). We are still waiting for our IT to share details about what runs the s3 server.
Another potentially relevant Stack Overflow post, but it does look like it covers ground already addressed by the other two linked here.
@mrvollger Thanks, that looks like potentially new info, and from a vendor (Cloudflare). Amazon is not interested in fixing bugs on this or making this easy, understandably. I'm in the middle of a block of work I must finish before concentrating on this, hopefully next week. Feel free to continue posting links here.
BTW it would be really helpful, possibly essential, to know what backend you are running. Cloudflare has some instructions here, but I won't waste my time with this if you are running something else. https://developers.cloudflare.com/r2/examples/aws/aws-sdk-java/
We have a Ceph RADOS Gateway, and I believe that is built on a service called librados (https://docs.ceph.com/en/latest/radosgw/), but I am still waiting for official confirmation... If I am right about this, I just found some docs that look relevant: UW also has some docs on programmatically accessing data on kopah (our s3 server), but it is focused on a Python library called boto3: I will send a reminder email to make sure they follow up with me. Thanks for the help!
I have confirmed that our s3 server is based on Ceph Rados Gateway, and the Ceph version is 17.2.7 quincy. |
@mrvollger OK thanks, I'll get to this soon. |
@mrvollger OK, I hardcoded the access keys and endpoint URL and tried to connect as described here: https://docs.ceph.com/en/latest/radosgw/s3/java/ I'm getting the following error when trying to load the bam url you sent to igv-team, where the endpoint url is the value you sent us. I have no idea where the "sjn." prefix is coming from; it is not in the code. That is the first part of the path for the bam file.
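For context, a connection attempt like the one described can be sketched with the Java v2 AWS SDK (requires the `software.amazon.awssdk:s3` dependency; the endpoint and keys below are placeholders, not the real values exchanged in this thread):

```java
import java.net.URI;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

public class CephS3Connect {
    public static void main(String[] args) {
        // Endpoint and credentials are hypothetical placeholders.
        S3Client s3 = S3Client.builder()
                .endpointOverride(URI.create("https://s3.example.edu"))
                // The SDK requires a region even for non-AWS endpoints;
                // any valid value passes the check.
                .region(Region.US_EAST_1)
                .credentialsProvider(StaticCredentialsProvider.create(
                        AwsBasicCredentials.create("ACCESS_KEY", "SECRET_KEY")))
                // Ceph RADOS Gateway deployments typically expect
                // path-style rather than virtual-host addressing.
                .forcePathStyle(true)
                .build();
        s3.listBuckets().buckets().forEach(b -> System.out.println(b.name()));
    }
}
```

Without `forcePathStyle(true)`, the SDK prepends the bucket name to the hostname, which would produce exactly the kind of unexpected "bucket." host prefix described above.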
When trying to use the AWS command line tools with the credentials you sent, I get the following error
My credentials file consists of the following; is this all you have in yours?
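For comparison, a minimal shared credentials file typically contains just the two key entries (the values below are AWS's documented placeholder examples, not real keys):

```ini
# ~/.aws/credentials -- hypothetical example
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```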
Darn, that is no good. I double-checked, and that is all I have in my credentials file. Can you confirm for me that the AWS CLI (or another s3 CLI tool) can access these files?
I just want to make sure the problem is in the Java SDK and not something else. That is weird; sjn is the name of the bucket, but as you said, that shouldn't matter until later.
I cannot access this file (see below)
@mrvollger Could you give
Well this is interesting. I was able to get as far as creating presigned URLs with a standalone java program. The presigned URLs start with
So that's where the "sjn" prefix is coming from. However, I get an error when trying to use them. It's the same error I am getting with IGV
So I don't think the credentials I'm using have permission to access that object. When we can get
Hi @jrobinso, Really sorry, I think the IT team set up the wrong permissions with the credentials I shared, and I got confused because I had some env variables that were taking precedence, making me think those keys were working. I sent you a new email with my keys, and I checked that they have access to this bam file. Sorry for my mistake and for taking up your time with the wrong credentials.
I haven't received the new email yet. However it occurred to me there could be another problem. Is it possible that this host is not available outside your internal network? Do you have to be on campus or VPNed in to use it?
I just resent the email in case I did something wrong. Should I continue to use igv-team@broadinstitute.org? I can send somewhere else as well if you want. I just used a generic VPN to access from Germany (the server is in Seattle) to make sure it isn't a VPN or region issue, and it seems to be working, so I don't think that is the issue, but I could be wrong.
I got the second email. Yes, the address is correct.
I made some progress. I can't really test the menu because the buckets in the account I'm using seem to be empty of readable files, but I can list the bucket names. I can now load the test bam with some hardcoded values in the java code. Obviously I can't leave those there, but it works in principle. I'll have to give some thought now to how to make this work for your organization without breaking it, or changing behavior, for everyone else. One issue I had to overcome: the Java SDK creates presigned URLs in the virtual-host style by default, of the form
These do not work with your service. I found a configuration option to force path style urls which do work, e.g.
However, it doesn't feel right to force this arbitrarily, and I am definitely not going to make a change like this that affects all users. So this might become an obscure IGV configuration option.
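The difference between the two addressing styles is pure string manipulation. A minimal sketch (the `s3.example.edu` endpoint is made up; `sjn` is the bucket name from this thread) of rewriting a virtual-host style URL into the path style that the configuration option would force:

```java
import java.net.URI;

public class S3UrlStyle {
    // Rewrite a virtual-host style S3 URL (bucket.endpoint/key) as
    // path-style (endpoint/bucket/key). Query strings, e.g. presigned-URL
    // signature parameters, are carried over unchanged.
    static String toPathStyle(String virtualHostUrl) {
        URI uri = URI.create(virtualHostUrl);
        String host = uri.getHost();               // e.g. "sjn.s3.example.edu"
        int dot = host.indexOf('.');
        String bucket = host.substring(0, dot);    // leading label is the bucket
        String endpoint = host.substring(dot + 1); // remainder is the endpoint
        String query = uri.getRawQuery() == null ? "" : "?" + uri.getRawQuery();
        return uri.getScheme() + "://" + endpoint + "/" + bucket
                + uri.getRawPath() + query;
    }

    public static void main(String[] args) {
        // -> https://s3.example.edu/sjn/data/test.bam
        System.out.println(toPathStyle("https://sjn.s3.example.edu/data/test.bam"));
    }
}
```

Services like Ceph RADOS Gateway often lack the wildcard DNS needed for virtual-host style, which is why only the path-style form resolves.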
Awesome, great to hear! I can add some files and sub-directories so that you see something in the menu; I will let you know when that happens. Regarding the hardcoded values, would using an env variable for a non-default endpoint be reasonable? For example, most recent AWS CLI tools seem to respect one. Re the two styles of URLs, I am totally happy to click an obscure box, but I wanted to suggest an alternative: if it is quick and easy to test whether the URL is valid, could you use this style by default? Thanks for your help!
Checking for an AWS_ENDPOINT_URL environment variable seems like a reasonable thing to do. RE the url validity test, I'll think about it. Yes, that would work for the UW site, but it's a complication and feels like something that should not be necessary. Is the real problem here with the Ceph service? I also find it odd that the AWS CLI defaults to one style, and the Java SDK the other. Neither of these things (Ceph or Amazon tools) are under our control, of course, so I will give your suggestion some thought.
@mrvollger Are you on a Mac or Windows machine? If Windows where is the .aws directory? |
I think we can switch everything on the presence (or absence) of an endpoint_url. IGV can look for this in .aws/credentials, if the file exists, or it can be supplied as an IGV property. I suppose we can look for it as an environment variable also. Environment variables are fine if they are defined in the shell IGV is running in. In the end, after all of this, it was pretty simple. The SDK seems to do the right thing with credentials with no hardcoding, but it ignores the endpoint url, so we have to get that manually somehow. If an endpoint url is found, I'm just assuming that path style is used for the bucket in the url. If something more than that is needed, it can be dealt with when it arrives.
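The manual lookup described above can be sketched as a small scan over the credentials file text for an `endpoint_url` key. A simplified sketch (real AWS config files are INI-style with `[profile]` sections, which this deliberately ignores; the kopah URL is the one from this thread):

```java
import java.util.Optional;

public class EndpointConfig {
    // Scan the text of an AWS credentials/config file for an
    // "endpoint_url = ..." entry and return its value if present.
    static Optional<String> findEndpointUrl(String fileText) {
        for (String line : fileText.split("\\R")) {
            String[] kv = line.split("=", 2); // limit 2: URLs may contain '='
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("endpoint_url")) {
                return Optional.of(kv[1].trim());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String creds = "[default]\n"
                + "aws_access_key_id = PLACEHOLDER\n"
                + "endpoint_url = https://s3.kopah.orci.washington.edu\n";
        // -> https://s3.kopah.orci.washington.edu
        System.out.println(findEndpointUrl(creds).orElse("(none)"));
    }
}
```

A caller could fall back to `System.getenv("AWS_ENDPOINT_URL")` or an IGV property when the file yields nothing, matching the lookup chain described above.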
That sounds perfect, thanks! I am using Linux/Mac, but I think these are the locations on various OSes: In my eyes, the environment variable is inferior to these config files; I just thought it would be easier to parse. But I'd prefer using the config files, so you can skip my env suggestion. Thanks again!
@mrvollger When I'm able to test the Amazon menu (bucket explorer) I will be ready to merge this to the main branch. |
@jrobinso sorry for the delay; under this new subdirectory, I quickly tested presigned URLs for the indexes, and I think all the permissions are set correctly, so they will appear and work in IGV. Thanks!
That did expose more issues with virtual vs. path style URLs, but they should be fixed now. This should be working now in the snapshot build, available here: https://igv.org/doc/desktop/#DownloadSnapshot/. The required special-casing is triggered by the presence of an endpoint url in any of the following, searched in this order:
I did have to make a change to how bams with .csi indexes are loaded. Selecting the bam file alone would result in IGV searching for a ".bai" index, which would fail, preventing the bam from loading. This is unrelated to the current issue; it would affect anyone using the s3 loader. I'm surprised this hasn't been reported before. To fix this I enabled selection of both bam and index, as seen in the screenshot.
Fantastic, we will start testing right away! |
Hi @jrobinso, Everything seems to be working on our end (save some weird s3 issues that are our own). Really appreciate it! Cheers, |
Hi Jim,
I have been trying to determine whether IGV can work with non-AWS S3 buckets. For example, the University of Washington hosts its own s3 endpoint (endpoint_url = https://s3.kopah.orci.washington.edu), on which we host a lot of data we want to view with IGV.
I can go into my .aws/credentials and .aws/config and set my defaults such that I can see into these buckets by default with the aws cli (or any other s3 cli):
But when I try opening up IGV and viewing these buckets, nothing happens when I click the "Load from S3 bucket" menu item.
Any advice would be much appreciated! And if IGV doesn't support this, it would be awesome if it could be added. I would think the change should be small, though I don't know much about the s3 API.
Thanks,
Mitchell