Does IGV support non amazon s3 buckets? #1636

Open
mrvollger opened this issue Jan 7, 2025 · 41 comments

@mrvollger

Hi Jim,

I have been trying to determine whether IGV can work with non-AWS S3 buckets. For example, the University of Washington hosts its own s3 endpoint (endpoint_url = https://s3.kopah.orci.washington.edu), on which we host a lot of data we want to view with IGV.

I can go into my .aws/credentials and .aws/config and set my defaults such that I can see into these buckets by default with the aws cli (or any other s3 cli):

$ aws s3 ls s3://stergachis/data/
                           PRE UDN/
                           PRE assemblies/
                           PRE bulk/
                           PRE iso-seq/

But when I try opening up IGV and viewing these buckets, nothing happens when I click the "Load from S3 bucket" menu item.

Any advice would be much appreciated! And if IGV doesn't support this it would be awesome if it could be added. I would think the change should be small, though I don't know much about the API for s3.

Thanks,
Mitchell

@jrobinso
Contributor

jrobinso commented Jan 7, 2025

I don't have enough information to help you debug your problem. For starters what version of IGV are you using?

@ohofmann

ohofmann commented Jan 7, 2025

And what is running that S3 server - MinIO or something similar? We have some on-prem S3-compatible object stores that we could test against if needed.

@jrobinso
Contributor

jrobinso commented Jan 7, 2025

@mrvollger If you could set up a test instance and give me credentials I will look further, but first verify you are using the latest IGV. There were S3 issues with some versions of 2.18.x.

@mrvollger
Author

mrvollger commented Jan 7, 2025

Thanks for the reply!

And sorry, I wanted first to establish that this should work before troubleshooting, but it sounds like it should.

I downloaded the latest version of IGV (2.19.1, the macOS Apple build) yesterday for my test.

I am not sure about the details of what is running the s3 server. It is a service provided by UW, so I will inquire about it with them and get back to you. They also administer credentials, so I will request those to establish a test case.
https://hyak.uw.edu/docs/storage/kopah

Thanks,
Mitchell

@jrobinso
Contributor

jrobinso commented Jan 7, 2025

If you're able to create test credentials you can email them privately to igv-team@broadinstitute.org. I don't know what I will be able to determine but I will look.

There might also be information in your IGV log file (usually named igv0.log, in the igv directory under your home directory).

@mrvollger
Author

mrvollger commented Jan 8, 2025

Hi Jim,

I just sent an email with the credentials!

This is an example error log I get when I try doing this:

INFO [Jan 06,2025 17:26] [Main] Startup  IGV Version 2.19.1 12/04/2024 02:15 PM
INFO [Jan 06,2025 17:26] [Main] Java 21.0.5 (build 21.0.5+11-LTS) 2024-10-15
INFO [Jan 06,2025 17:26] [Main] Java Vendor: Eclipse Adoptium https://adoptium.net/
INFO [Jan 06,2025 17:26] [Main] JVM: OpenJDK 64-Bit Server VM Temurin-21.0.5+11
INFO [Jan 06,2025 17:26] [Main] OS: Mac OS X 15.1.1 aarch64
INFO [Jan 06,2025 17:26] [Main] IGV Directory: /Users/mrvollger/igv
INFO [Jan 06,2025 17:26] [Main] Resoluction scale = 0.0
INFO [Jan 06,2025 17:26] [OAuthUtils] Loading Google oAuth properties
INFO [Jan 06,2025 17:26] [CommandListener] Listening on port 60151
INFO [Jan 06,2025 17:26] [AmazonUtils] AWS default credentials found. AWS support enabled.
INFO [Jan 06,2025 17:26] [GenomeManager] Loading genome: /Users/mrvollger/igv/genomes/hg38.json
INFO [Jan 06,2025 17:26] [TrackLoader] Loading resource:  https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ncbiRefSeq.txt.gz
SEVERE [Jan 06,2025 17:26] [DefaultExceptionHandler] Unhandled exception
SEVERE [Jan 06,2025 17:26] [DefaultExceptionHandler] software.amazon.awssdk.services.s3.model.S3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: S3, Status Code: 403, Request ID: 1YD1BWPNB43A8ZBP, Extended Request ID: LFAY19Xg5oNJl0qlTAPQZdv/zeJz8QNNrltcBwNL7tCuO2RKed3+vOSSLAmzWPlMWxamM16RhbmXVjOLgucj4gYMm/9exK+dyTkPoF0T9xs=)
	at software.amazon.awssdk.protocols.xml@2.27.4/software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
	at software.amazon.awssdk.protocols.xml@2.27.4/software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
	at software.amazon.awssdk.protocols.xml@2.27.4/software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85)
	at software.amazon.awssdk.protocols.xml@2.27.4/software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:43)
	at software.amazon.awssdk.awscore@2.27.4/software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:93)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$7(BaseClientHandler.java:279)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:50)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:38)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:74)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:43)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:79)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:41)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:55)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:39)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage2.executeRequest(RetryableStage2.java:93)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage2.execute(RetryableStage2.java:56)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage2.execute(RetryableStage2.java:36)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:53)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:35)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:82)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:210)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
	at software.amazon.awssdk.core@2.27.4/software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
	at software.amazon.awssdk.awscore@2.27.4/software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
	at software.amazon.awssdk.services.s3@2.27.4/software.amazon.awssdk.services.s3.DefaultS3Client.listBuckets(DefaultS3Client.java:6786)
	at org.igv/org.broad.igv.util.AmazonUtils.ListBucketsForUser(AmazonUtils.java:275)
	at org.igv/org.broad.igv.ui.IGVMenuBar.lambda$createAWSMenu$13(IGVMenuBar.java:1018)
	at java.desktop/javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
	at java.desktop/javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
	at java.desktop/javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
	at java.desktop/javax.swing.DefaultButtonModel.setPressed(Unknown Source)
	at java.desktop/javax.swing.AbstractButton.doClick(Unknown Source)
	at java.desktop/com.apple.laf.ScreenMenuItem.actionPerformed(Unknown Source)
	at java.desktop/java.awt.MenuItem.processActionEvent(Unknown Source)
	at java.desktop/java.awt.MenuItem.processEvent(Unknown Source)
	at java.desktop/java.awt.MenuComponent.dispatchEventImpl(Unknown Source)
	at java.desktop/java.awt.MenuComponent.dispatchEvent(Unknown Source)
	at java.desktop/java.awt.EventQueue.dispatchEventImpl(Unknown Source)
	at java.desktop/java.awt.EventQueue$4.run(Unknown Source)
	at java.desktop/java.awt.EventQueue$4.run(Unknown Source)
	at java.base/java.security.AccessController.doPrivileged(Unknown Source)
	at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
	at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
	at java.desktop/java.awt.EventQueue$5.run(Unknown Source)
	at java.desktop/java.awt.EventQueue$5.run(Unknown Source)
	at java.base/java.security.AccessController.doPrivileged(Unknown Source)
	at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
	at java.desktop/java.awt.EventQueue.dispatchEvent(Unknown Source)
	at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
	at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
	at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
	at java.desktop/java.awt.EventDispatchThread.pumpEvents(Unknown Source)
	at java.desktop/java.awt.EventDispatchThread.pumpEvents(Unknown Source)
	at java.desktop/java.awt.EventDispatchThread.run(Unknown Source)

Finally, our IT team shared that this s3 endpoint is hosted by a "Ceph RADOS Gateway"; hopefully, that is useful information, but I can always ask for more details if this does not help.

I really appreciate the help.

Thanks,
Mitchell

@brainstorm
Contributor

AFAICT, currently non-Amazon buckets shouldn't work since there's no provision for so-called "custom endpoints" in the current S3Client builder:

s3Client = S3Client.builder().credentialsProvider(s3CredsProvider).region(region).build();

The change could potentially be relatively straightforward by (conditionally via oauth-config.json) using the .endpointOverride() builder method, see the following as an example:

aws/aws-sdk-java-v2#4996

BUT, as you can see in the issue above, this use case doesn't seem to be fully supported by the AWS SDK for Java 2.x, so care must be taken not to hit the wrong buttons (i.e., avoid parsing non-S3 URIs)... and/or to find suitable workarounds.

I'll leave it to Jim to put those bits together, but let me know if you need assistance, @jrobinso ;)

@jrobinso
Contributor

jrobinso commented Jan 8, 2025

Thanks @brainstorm, but they are not configuring oAuth. They are using .aws/credentials.

@brainstorm
Contributor

> Thanks @brainstorm, but they are not configuring oAuth. They are using .aws/credentials.

Oh, I didn't know the S3Client code paths became independent (imho they shouldn't), I will revisit the implementation... anyway, here's something that could help you as well, Jim:

https://stackoverflow.com/questions/52494196/is-there-any-way-to-specify-endpoint-url-in-aws-cli-config-file

@jrobinso
Contributor

jrobinso commented Jan 9, 2025

@mrvollger Thanks for the test credentials and file. No success yet. The link @brainstorm includes above seemed somewhat promising. I tried setting the endpoint in the "config" file as suggested there, but no luck: I got an exception that "region" was not set. I'm not sure what the region should be for a non-AWS endpoint, but setting it to us-east-1 did not work either (not surprisingly). There are probably IGV bugs as well, or implicit assumptions that the provider is AWS. So this is going to take more time than I have this evening, or even the next couple of days, but keep the issue open. I'm hopeful there will be a solution, but not certain. If you could ask the powers-that-be to leave those test credentials in place for a while, that would be helpful.

@brainstorm I'm not sure what you are referring to wrt "S3Client code paths became independent", but it's probably not relevant at the moment given other issues I've found.

@jrobinso jrobinso self-assigned this Jan 9, 2025
@jrobinso jrobinso added this to the 2.19.2 milestone Jan 9, 2025
@jrobinso
Contributor

jrobinso commented Jan 9, 2025

@mrvollger One more question: it might be relevant, or at least helpful, to know what S3-compatible server you are running (e.g. MinIO).

Putting this here as possibly relevant, for myself mostly when I return to this. https://stackoverflow.com/questions/76780500/how-to-using-aws-s3-java-v2-sdk-to-talk-to-s3-compatible-storage-minio

@mrvollger
Author

Thanks so much for helping! We are really glad that you are willing to put time into this.

We will keep the credentials valid as long as you need them, and if something happens, we can get another set up.

I think when using a custom endpoint, the region doesn't matter or isn't used. But I am only guessing that based on this AWS CLI test:

$ aws s3 ls s3://userprod/web/private/hashed.PacBio-Fiber-seq/PS00272/d194a9281237ccb68a548092fb434216/hg38/phased/PS00272.phased.bam --profile k_stergachislab --region 'us-west-2'
2025-01-01 07:47:48 93166395386 PS00272.phased.bam
2025-01-01 07:40:15   23788032 PS00272.phased.bam.bai
$ aws s3 ls s3://userprod/web/private/hashed.PacBio-Fiber-seq/PS00272/d194a9281237ccb68a548092fb434216/hg38/phased/PS00272.phased.bam --profile k_stergachislab --region 'us-east-2'
2025-01-01 07:47:48 93166395386 PS00272.phased.bam
2025-01-01 07:40:15   23788032 PS00272.phased.bam.bai
$ aws s3 ls s3://userprod/web/private/hashed.PacBio-Fiber-seq/PS00272/d194a9281237ccb68a548092fb434216/hg38/phased/PS00272.phased.bam --profile k_stergachislab --region 'us-east-1'
2025-01-01 07:47:48 93166395386 PS00272.phased.bam
2025-01-01 07:40:15   23788032 PS00272.phased.bam.bai

Last time, they told us the underlying vendor was Ceph RADOS Gateway, but I will ask more specifically what server is running the s3 instance and get back to you.

Thanks again!

@mrvollger
Author

CCing @sjneph.

@jrobinso
Contributor

jrobinso commented Jan 9, 2025

@mrvollger A workaround for the interim, if it's allowed there, would be to create signed URLs with the AWS command line tools. The signed URLs should work.

@mrvollger
Author

Thanks for the suggestion! We have been doing that for things in a real-time crunch, but we have a couple of applications where we cannot do that since the URLs are posted publicly. (I also hate when the links in my painfully constructed IGV sessions (XMLs) die).

We are still waiting for our IT to share details about what runs the s3 server.

@mrvollger
Author

https://stackoverflow.com/questions/68005239/how-do-you-configure-the-endpoint-for-amazon-s3-by-using-the-aws-sdk-v2

Another potentially relevant Stack Overflow question, though it looks like it is already covered by the other two linked here.

@jrobinso
Contributor

@mrvollger Thanks, that looks like potentially new info, and from a vendor (Cloudflare). Amazon is not interested in fixing bugs on this or making this easy, understandably. I'm in the middle of a block of work I must finish before concentrating on this, hopefully next week.

Feel free to continue posting links here.

@jrobinso
Contributor

BTW it would be really helpful, possibly essential, to know what backend you are running. Cloudflare has some instructions here, but I won't waste my time with this if you are running something else. https://developers.cloudflare.com/r2/examples/aws/aws-sdk-java/

@mrvollger
Author

mrvollger commented Jan 11, 2025

We have a Ceph RADOS Gateway and I believe that is built on a service called librados (https://docs.ceph.com/en/latest/radosgw/) but I am still waiting to get official confirmation...

If I am right about this, I just found some docs that look relevant:
https://docs.ceph.com/en/latest/radosgw/s3/java/

UW also has some docs on programmatically accessing data on kopah (our s3 server), but it is focused on an API called boto3:
https://hyak.uw.edu/docs/storage/boto3

I will send a reminder email to make sure they follow up with me.

Thanks for the help!

@mrvollger
Author

I have confirmed that our s3 server is based on Ceph Rados Gateway, and the Ceph version is 17.2.7 quincy.

@jrobinso
Contributor

@mrvollger OK thanks, I'll get to this soon.

@jrobinso
Contributor

jrobinso commented Jan 14, 2025

@mrvollger OK I hardcoded the access keys and endpoint URL and tried to connect as described here

https://docs.ceph.com/en/latest/radosgw/s3/java/

I'm getting the following error when trying to load the bam url you sent to igv-team, where endpoint url is the value you sent us. I have no idea where the "sjn." prefix is coming from, it is not in the code. That is the first part of the path for the bam file.

Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: sjn.s3....endpoint url...

When trying to use the AWS command line tools with the credentials you sent, I get the following error

An error occurred (InvalidAccessKeyId) when calling the ListBuckets operation: The AWS Access Key Id you provided does not exist in our records.

My credentials file consists of the following. Is this all you have in yours?

[default]
aws_access_key_id=...
aws_secret_access_key=...
endpoint_url = https://s3....

@mrvollger
Author

Darn, that is no good. I double-checked, and that is all I have in my credentials file.

Can you confirm for me that the AWS CLI (or another s3 CLI tool) can access these files?

aws s3 ls s3://sjn/web/private/hashed.PacBio-Fiber-seq/PS00272/4253ac0ccbc88914b67e543bbbfb94a7/hg38/phased/PS00272.phased.bam
2024-12-29 05:29:17 93166395386 PS00272.phased.bam
2024-12-29 05:22:52   23788032 PS00272.phased.bam.bai

I just want to make sure the problem is in the Java SDK and not something else.

That is weird. sjn is the name of the bucket, but as you said, that shouldn't matter until later.

sjn is also @sjneph's user ID. @sjneph do you have any ideas?

@jrobinso
Contributor

I cannot access this file (see below)

aws s3 ls s3://sjn/web/private/hashed.PacBio-Fiber-seq/PS00272/4253ac0ccbc88914b67e543bbbfb94a7/hg38/phased/PS00272.phased.bam


An error occurred (InvalidAccessKeyId) when calling the ListObjectsV2 operation: The AWS Access Key Id you provided does not exist in our records.

@jrobinso
Contributor

jrobinso commented Jan 14, 2025

@mrvollger Could you give aws s3 ls s3://... a try with the credentials and endpoint you sent?

@jrobinso
Contributor

jrobinso commented Jan 14, 2025

Well this is interesting. I was able to get as far as creating presigned URLs with a standalone java program. The presigned URLs start with

https://sjn.s3.kopah.orci.washington.e....

So that's where the "sjn" prefix is coming from. However, I get an error when trying to use them. It's the same error I am getting with IGV:

Error loading BAM file: java.net.UnknownHostException: sjn.s3.kopah.orci.washington.edu

So I don't think the credentials I'm using have permission to access that object. When we can get aws s3 ls s3:// to work I think the Java will also work.

@mrvollger
Author

Hi @jrobinso,

Really sorry, I think the IT team set up the wrong permissions for the credentials I shared, and I got confused because I had some environment variables that were taking precedence, making me think those keys were working.

I sent you a new email with my keys, and I checked that they have access to this bam file. Sorry for my mistake and taking up your time with the wrong credentials.

@jrobinso
Contributor

I haven't received the new email yet. However it occurred to me there could be another problem. Is it possible that this host is not available outside your internal network? Do you have to be on campus or VPNed in to use it?

sjn.s3.kopah.orci.washington.edu

@mrvollger
Author

mrvollger commented Jan 14, 2025

I just resent the email in case I did something wrong. Should I continue to use igv-team@broadinstitute.org? I can send somewhere else as well if you want.

I just used a generic VPN to access it from Germany (the server is in Seattle) to make sure it isn't a VPN or region issue, and it seems to be working, so I don't think that is the issue, but I could be wrong.

@jrobinso
Contributor

I got the second email. Yes address is correct.

@jrobinso
Contributor

jrobinso commented Jan 15, 2025

I made some progress. I can't really test the menu because the buckets in the account I'm using seem to be empty of readable files, but I can list the bucket names.

I can now load the test bam with some hardcoded values in the Java code. Obviously I can't leave those there, but it works in principle. I'll have to give some thought now to how to make this work for your organization without breaking it, or changing behavior, for everyone else.

One issue I had to overcome, the Java SDK creates presigned URLs in the virtual host style by default, of the form

https://mvollger-test-igv.s3.kopah.orci.washington.edu/chr20.bam?X-Amz-Algorithm=AWS...

These do not work with your service. I found a configuration option to force path style urls which do work, e.g.

https://s3.kopah.orci.washington.edu/mvollger-test-igv/chr20.bam?X-Amz-Algorithm=AWS...

However it doesn't feel right to force this arbitrarily, and I am definitely not going to make a change like this that affects all users. So this might become an obscure IGV configuration option.
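
To make the two URL styles concrete, here is a minimal Java sketch that builds both forms from an endpoint, a bucket, and a key. The class and method names (`S3UrlStyles`, `virtualHostStyle`, `pathStyle`) are hypothetical, not IGV or SDK code, and the key is assumed to need no URL encoding. It only illustrates the layout: a presigned URL presumably cannot be rewritten from one style to the other after the fact, since the signature covers the host and path, so it has to be generated in the desired style.

```java
import java.net.URI;

// Hypothetical helper, for illustration only (not part of IGV or the AWS SDK).
final class S3UrlStyles {

    /** Virtual-host style: the bucket becomes the leftmost DNS label of the endpoint host. */
    static URI virtualHostStyle(URI endpoint, String bucket, String key) {
        return URI.create(endpoint.getScheme() + "://" + bucket + "." + endpoint.getAuthority() + "/" + key);
    }

    /** Path style: the endpoint host is unchanged and the bucket is the first path segment. */
    static URI pathStyle(URI endpoint, String bucket, String key) {
        return URI.create(endpoint.getScheme() + "://" + endpoint.getAuthority() + "/" + bucket + "/" + key);
    }

    public static void main(String[] args) {
        // Endpoint, bucket, and key are the ones quoted in this thread.
        URI endpoint = URI.create("https://s3.kopah.orci.washington.edu");
        System.out.println(virtualHostStyle(endpoint, "mvollger-test-igv", "chr20.bam"));
        System.out.println(pathStyle(endpoint, "mvollger-test-igv", "chr20.bam"));
    }
}
```

The virtual-host form only works if the server's DNS resolves every `bucket.endpoint` name, which would explain the `UnknownHostException` seen earlier in this thread.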

@mrvollger
Author

Awesome great to hear!

I can add some files and sub-directories so that you see something in the menu; I will let you know when that happens.

Regarding the hardcoded values, would using an env variable for a non-default endpoint be reasonable? For example, most recent AWS CLI tools seem to respect AWS_ENDPOINT_URL=https://s3.kopah.orci.washington.edu.

Re the two styles of URLs, I am totally happy to click an obscure box, but I wanted to suggest an alternative. If it is quick and easy to test if the URL is valid, could you use this style by default https://mvollger-test-igv.s3.kopah.orci.washington.edu... test its validity, and if it fails, try the style https://s3.kopah.orci.washington.edu/mvollger-test-igv/... before giving up? Just a thought, but only if it would be easy to implement and fast.
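
A rough Java sketch of that fallback idea, under stated assumptions: the candidates are tried in order and the first one that passes a probe is used. `UrlStyleFallback` and `firstWorking` are made-up names, and the probe is passed in as a `Predicate` so the sketch does not commit to how validity is checked (a real probe might issue a HEAD request, which costs one round trip per candidate).

```java
import java.net.URI;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper, for illustration only (not IGV code).
final class UrlStyleFallback {

    /** Returns the first candidate accepted by the probe, or empty if none is. */
    static Optional<URI> firstWorking(List<URI> candidates, Predicate<URI> probe) {
        for (URI candidate : candidates) {
            if (probe.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Virtual-host style first, path style second.
        List<URI> candidates = List.of(
                URI.create("https://mvollger-test-igv.s3.kopah.orci.washington.edu/chr20.bam"),
                URI.create("https://s3.kopah.orci.washington.edu/mvollger-test-igv/chr20.bam"));

        // Stand-in probe (an assumption): pretend only the bare endpoint host is reachable.
        Predicate<URI> probe = u -> "s3.kopah.orci.washington.edu".equals(u.getHost());

        System.out.println(firstWorking(candidates, probe).map(URI::toString).orElse("none"));
    }
}
```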

Thanks for your help!

@jrobinso
Contributor

Checking for an AWS_ENDPOINT_URL environment variable seems like a reasonable thing to do.

RE the URL validity test, I'll think about it. Yes, that would work for the UW site, but it's a complication and feels like something that should not be necessary. Is the real problem here with the Ceph service? I also find it odd that the AWS CLI defaults to one style and the Java SDK to the other. Neither of these things (Ceph or the Amazon tools) is under our control, of course, so I will give your suggestion some thought.

@jrobinso
Contributor

@mrvollger Are you on a Mac or Windows machine? If Windows, where is the .aws directory?

@jrobinso
Contributor

I think we can switch everything on the presence (or absence) of an endpoint_url. IGV can look for this in .aws/credentials, if the file exists, or it can be supplied as an IGV property. I suppose we can look for it as an environment property also. Environment properties are fine if they are defined in the shell IGV is running in.

In the end, after all of this, it was pretty simple. The SDK seems to do the right thing with credentials with no hardcoding, but ignores the endpoint url, so we have to get that manually somehow. If an endpoint url is found, I'm just assuming that path style is used for the bucket in the url. If something more than that is needed, it can be dealt with when it arrives.
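
As an illustration of getting the endpoint url "manually", here is a minimal Java sketch that pulls `endpoint_url` for one profile out of the text of a credentials or config file. This is not IGV's actual code; `AwsProfileEndpoint` and `endpointUrl` are hypothetical names, and the parser only handles the simple `key = value` layout shown earlier in this thread (no nested or service-specific sections).

```java
import java.util.Optional;

// Hypothetical helper, for illustration only (not IGV code).
final class AwsProfileEndpoint {

    /** Finds endpoint_url inside the given profile section of INI-style file contents. */
    static Optional<String> endpointUrl(String fileContents, String profile) {
        boolean inProfile = false;
        for (String raw : fileContents.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
                continue; // blank line or comment
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                String name = line.substring(1, line.length() - 1).trim();
                // ~/.aws/config names non-default sections "[profile name]".
                if (name.startsWith("profile ")) {
                    name = name.substring("profile ".length()).trim();
                }
                inProfile = name.equals(profile);
                continue;
            }
            int eq = line.indexOf('=');
            if (inProfile && eq > 0 && line.substring(0, eq).trim().equals("endpoint_url")) {
                return Optional.of(line.substring(eq + 1).trim());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Placeholder contents; the key values are not real credentials.
        String contents = "[default]\n"
                + "aws_access_key_id=EXAMPLEKEY\n"
                + "aws_secret_access_key=EXAMPLESECRET\n"
                + "endpoint_url = https://s3.kopah.orci.washington.edu\n";
        System.out.println(endpointUrl(contents, "default").orElse("not set"));
    }
}
```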

@mrvollger
Author

mrvollger commented Jan 15, 2025

That sounds perfect, thanks!

I am using Linux/Mac, but I think these are the locations on the various operating systems:
Linux and macOS: ~/.aws/config and ~/.aws/credentials
Windows: %USERPROFILE%\.aws\config and %USERPROFILE%\.aws\credentials
Taken from:
https://docs.aws.amazon.com/sdkref/latest/guide/file-location.html
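
For what it's worth, those locations can be resolved the same way on every OS from Java. The sketch below is only an illustration with hypothetical names (`AwsFileLocations`, `resolve`); it assumes the `user.home` system property matches `~` / `%USERPROFILE%`, and it also honors the `AWS_SHARED_CREDENTIALS_FILE` and `AWS_CONFIG_FILE` environment variables, which the AWS tools document as overrides for the default locations.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only (not IGV code).
final class AwsFileLocations {

    /** Uses the override path if one is set, otherwise <userHome>/.aws/<fileName>. */
    static Path resolve(String envOverride, String userHome, String fileName) {
        if (envOverride != null && !envOverride.isEmpty()) {
            return Paths.get(envOverride);
        }
        return Paths.get(userHome, ".aws", fileName);
    }

    public static void main(String[] args) {
        String home = System.getProperty("user.home");
        System.out.println(resolve(System.getenv("AWS_SHARED_CREDENTIALS_FILE"), home, "credentials"));
        System.out.println(resolve(System.getenv("AWS_CONFIG_FILE"), home, "config"));
    }
}
```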

In my eyes, the environment variable is inferior to these config files ^, just thought it would be easier to parse. But I'd prefer using the config files, so you can skip my env suggestion.

Thanks again!

@jrobinso
Contributor

@mrvollger When I'm able to test the Amazon menu (bucket explorer) I will be ready to merge this to the main branch.

@mrvollger
Author

mrvollger commented Jan 16, 2025

@jrobinso sorry for the delay; under this new subdirectory, s3://mvollger-test-igv/more-test-bams/, there should be three new bams, a.bam, b.bam, and c.bam, along with their indexes.

I quickly tested presigned URLs for the indexes, and I think all the permissions are set correctly, so they should appear and work in IGV like chr20.bam.

Thanks!

@jrobinso
Contributor

That did expose more issues with virtual vs path style URLs but they should be fixed now.

This should be working now in the snapshot build, available here: https://igv.org/doc/desktop/#DownloadSnapshot/. The required special casing is triggered by the presence of an endpoint url in any of the following, searched in this order

  1. IGV user preference (advanced tab)
  2. oauth config json if using Cognito (property "endpoint_url")
  3. environment variable AWS_ENDPOINT_URL
  4. .aws/credentials file
  5. .aws/config file
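
The search order above amounts to "first non-empty source wins". A minimal Java sketch of that pattern follows; it is not the IGV implementation. `EndpointResolver` and `firstEndpoint` are hypothetical names, and every source except the environment variable is stubbed out with a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper, for illustration only (not IGV code).
final class EndpointResolver {

    /** Queries the sources in order and returns the first non-blank value. */
    static Optional<String> firstEndpoint(List<Supplier<String>> sources) {
        for (Supplier<String> source : sources) {
            String value = source.get();
            if (value != null && !value.trim().isEmpty()) {
                return Optional.of(value.trim());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Supplier<String>> sources = new ArrayList<>();
        sources.add(() -> null);                              // 1. IGV user preference (placeholder)
        sources.add(() -> null);                              // 2. oauth config json (placeholder)
        sources.add(() -> System.getenv("AWS_ENDPOINT_URL")); // 3. environment variable
        sources.add(() -> null);                              // 4. .aws/credentials (placeholder)
        sources.add(() -> null);                              // 5. .aws/config (placeholder)
        System.out.println(firstEndpoint(sources).orElse("no custom endpoint; default AWS behavior"));
    }
}
```

Because the sources are suppliers, the later (file-based) lookups are skipped whenever an earlier source already provides a value.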

I did have to make a change to how bams with .csi indexes are loaded. Selecting the bam file alone would cause IGV to search for a ".bai" index, which would fail and prevent the bam from loading. This is unrelated to the current issue; it would affect anyone using the S3 loader. I'm surprised this hasn't been reported before. To fix this I enabled selection of both the bam and its index, as seen in the screenshot.

Image
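
For comparison, a different approach from the one taken here (which lets the user select the index explicitly) would be to look for an index automatically among the keys already listed in the bucket. The Java sketch below is only an illustration of that alternative: `BamIndexLookup` is a hypothetical name, and the candidate names (`x.bam.bai`, `x.bai`, `x.bam.csi`) are common naming conventions assumed for the example, not a list taken from IGV.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, for illustration only (not IGV code).
final class BamIndexLookup {

    /** Index names to try for a bam key, in order of preference (assumed conventions). */
    static List<String> candidateIndexKeys(String bamKey) {
        String stem = bamKey.endsWith(".bam") ? bamKey.substring(0, bamKey.length() - 4) : bamKey;
        return List.of(bamKey + ".bai", stem + ".bai", bamKey + ".csi");
    }

    /** Returns the first candidate index that is present among the listed keys. */
    static Optional<String> findIndex(String bamKey, Set<String> keysInBucket) {
        return candidateIndexKeys(bamKey).stream().filter(keysInBucket::contains).findFirst();
    }

    public static void main(String[] args) {
        // Made-up listing: a.bam has a .csi index only.
        Set<String> listing = Set.of("more-test-bams/a.bam", "more-test-bams/a.bam.csi");
        System.out.println(findIndex("more-test-bams/a.bam", listing).orElse("no index found"));
    }
}
```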

@mrvollger
Author

Fantastic, we will start testing right away!

@mrvollger
Author

Hi @jrobinso,

Everything seems to be working on our end (save some weird s3 issues that are our own).

Really appreciate it!

Cheers,
Mitchell
