feat(nf-tower): add dataset:// FileSystem provider #6866
edmundmiller wants to merge 8 commits into nextflow-io:master from
DatasetFileSystemProvider: NIO SPI for the 'dataset' scheme, read-only. Delegates I/O to the resolved cloud path's provider. Write ops throw ReadOnlyFileSystemException.
DatasetFileSystem: minimal read-only FileSystem implementation.
DatasetPath: Path wrapping a dataset name + optional version. Parses dataset://name?version=N URIs. Lazy resolution to the backing cloud path.
Signed-off-by: Edmund Miller <edmund.miller@seqera.io>
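A minimal sketch of how a dataset://name?version=N string could be split into a dataset name and an optional version with java.net.URI. DatasetUriSketch and DatasetRef are hypothetical names for illustration, not the PR's DatasetPath. The sketch assumes the dataset name sits in the URI authority and that version is an integer query parameter.

```java
import java.net.URI;
import java.util.Optional;

/** Hypothetical sketch: parse "dataset://name?version=N" (not the PR's DatasetPath). */
public class DatasetUriSketch {

    /** Parsed form of a dataset:// URI: a name plus an optional version. */
    public record DatasetRef(String name, Optional<Integer> version) {}

    public static DatasetRef parse(String uriString) {
        URI uri = URI.create(uriString);
        if (!"dataset".equals(uri.getScheme())) {
            throw new IllegalArgumentException("Not a dataset:// URI: " + uriString);
        }
        // For "dataset://my-samplesheet" the name is parsed as the URI authority
        String name = uri.getAuthority();
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("Missing dataset name: " + uriString);
        }
        // Look for an optional "version=N" query parameter
        Optional<Integer> version = Optional.empty();
        String query = uri.getQuery();
        if (query != null) {
            for (String param : query.split("&")) {
                String[] kv = param.split("=", 2);
                if (kv.length == 2 && kv[0].equals("version")) {
                    version = Optional.of(Integer.parseInt(kv[1]));
                }
            }
        }
        return new DatasetRef(name, version);
    }

    public static void main(String[] args) {
        System.out.println(parse("dataset://my-samplesheet"));
        System.out.println(parse("dataset://my-samplesheet?version=2"));
    }
}
```

Keeping the version optional lets a bare dataset://name mean "latest", which is deferred to resolution time rather than fixed at parse time.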
Resolves dataset name → cloud storage Path via Platform API:
1. GET /datasets?workspaceId=X → match by name → dataset ID
2. GET /datasets/{id}/versions → latest or specific version → cloud URL
3. FileHelper.asPath(cloudUrl) → concrete S3/GCS/Azure Path
Uses java.net.http.HttpClient with Bearer token auth. Config from
existing tower.* settings (endpoint, accessToken, workspaceId).
Signed-off-by: Edmund Miller <edmund.miller@seqera.io>
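The first two resolution steps above could be sketched with java.net.http.HttpRequest roughly as follows. This is an illustration under stated assumptions: JSON responses are not parsed, DatasetVersion is a hypothetical stand-in for one entry of the versions response, and https://api.example.com is a placeholder endpoint. Requests are only built here, never sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class DatasetResolveSketch {

    /** Hypothetical shape of one entry returned by GET /datasets/{id}/versions. */
    public record DatasetVersion(int version, String url) {}

    /** Step 1 request: GET {endpoint}/datasets?workspaceId=X with Bearer token auth. */
    public static HttpRequest listDatasets(String endpoint, String accessToken, String workspaceId) {
        String uri = endpoint + "/datasets";
        if (workspaceId != null) {
            uri += "?workspaceId=" + URLEncoder.encode(workspaceId, StandardCharsets.UTF_8);
        }
        return HttpRequest.newBuilder(URI.create(uri))
                .header("Authorization", "Bearer " + accessToken)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    /** Step 2 selection: the requested version, or the highest one when none is given. */
    public static String selectUrl(List<DatasetVersion> versions, Optional<Integer> requested) {
        Optional<DatasetVersion> match = requested.isPresent()
                ? versions.stream().filter(v -> v.version() == requested.get()).findFirst()
                : versions.stream().max(Comparator.comparingInt(DatasetVersion::version));
        return match.map(DatasetVersion::url)
                .orElseThrow(() -> new IllegalArgumentException("No matching dataset version"));
    }

    public static void main(String[] args) {
        // Placeholder endpoint, token and workspace ID; the request is not sent
        HttpRequest req = listDatasets("https://api.example.com", "my-token", "12345");
        System.out.println(req.uri());

        List<DatasetVersion> versions = List.of(
                new DatasetVersion(1, "s3://bucket/v1/samples.csv"),
                new DatasetVersion(3, "s3://bucket/v3/samples.csv"),
                new DatasetVersion(2, "s3://bucket/v2/samples.csv"));
        System.out.println(selectUrl(versions, Optional.empty()));
        System.out.println(selectUrl(versions, Optional.of(2)));
    }
}
```

The selected cloud URL would then go through step 3 (FileHelper.asPath) to obtain the concrete S3/GCS/Azure Path.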
DatasetPathFactory: FileSystemPathFactory extension point that intercepts dataset:// URIs in parseUri(), making FileHelper.asPath() and Nextflow.file() work transparently.
Register DatasetFileSystemProvider via META-INF/services SPI.
Add DatasetPathFactory to plugin extensionPoints in build.gradle.
Signed-off-by: Edmund Miller <edmund.miller@seqera.io>
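The interception idea can be sketched as a chain of factories where each one either claims a URI or declines, and the first claim wins. UriPathFactory, DatasetFactory and asPath are hypothetical, simplified stand-ins rather than Nextflow's FileSystemPathFactory API. The sketch assumes a factory declines by returning null, and it resolves eagerly through a supplied function, whereas the PR's DatasetPath resolves lazily. SPI registration via META-INF/services is not shown.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

public class PathFactorySketch {

    /** Hypothetical, simplified stand-in for a path-factory extension point. */
    public interface UriPathFactory {
        /** Returns a Path for URIs this factory understands, or null to decline. */
        Path parseUri(String uri);
    }

    /** Claims only dataset:// URIs; the name-to-path mapping is supplied by the caller. */
    public static final class DatasetFactory implements UriPathFactory {
        private static final String PREFIX = "dataset://";
        private final Function<String, Path> resolver;

        public DatasetFactory(Function<String, Path> resolver) {
            this.resolver = Objects.requireNonNull(resolver);
        }

        @Override
        public Path parseUri(String uri) {
            if (!uri.startsWith(PREFIX)) {
                return null; // not a dataset URI: let the next factory try
            }
            // Everything after the prefix (name and any query) goes to the resolver
            return resolver.apply(uri.substring(PREFIX.length()));
        }
    }

    /** Tries each factory in order, falling back to the default file system. */
    public static Path asPath(String uri, List<UriPathFactory> factories) {
        for (UriPathFactory factory : factories) {
            Path p = factory.parseUri(uri);
            if (p != null) {
                return p;
            }
        }
        return Path.of(uri);
    }

    public static void main(String[] args) {
        List<UriPathFactory> factories =
                List.of(new DatasetFactory(name -> Path.of("/tmp", name + ".csv")));
        System.out.println(asPath("dataset://samples", factories));
        System.out.println(asPath("/data/input.csv", factories));
    }
}
```

Because non-dataset URIs fall through untouched, existing file() calls on local or cloud paths keep their current behavior.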
DatasetPathTest: URI/string parsing, Path interface, equality
DatasetFileSystemProviderTest: scheme, FS creation, read-only enforcement
DatasetPathFactoryTest: URI matching, toUriString
DatasetResolverTest: WireMock API error cases, auth, workspace param
Signed-off-by: Edmund Miller <edmund.miller@seqera.io>
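One way a provider could enforce read-only access, sketched under the assumption that any write-capable OpenOption should be rejected, is to check the options before delegating reads to Files.newInputStream on the resolved backing path. Files.newInputStream dispatches to that path's own provider (S3, GCS, Azure, or the local file system). ReadOnlySketch is a hypothetical helper, not the PR's DatasetFileSystemProvider.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.OpenOption;
import java.nio.file.Path;
import java.nio.file.ReadOnlyFileSystemException;
import java.nio.file.StandardOpenOption;
import java.util.Set;

public class ReadOnlySketch {

    /** Options that would create, modify or delete the underlying data. */
    private static final Set<OpenOption> WRITE_OPTIONS = Set.of(
            StandardOpenOption.WRITE,
            StandardOpenOption.APPEND,
            StandardOpenOption.CREATE,
            StandardOpenOption.CREATE_NEW,
            StandardOpenOption.DELETE_ON_CLOSE,
            StandardOpenOption.TRUNCATE_EXISTING);

    /** Throws ReadOnlyFileSystemException if any option implies a write. */
    public static void checkReadOnly(OpenOption... options) {
        for (OpenOption option : options) {
            if (WRITE_OPTIONS.contains(option)) {
                throw new ReadOnlyFileSystemException();
            }
        }
    }

    /** Reads are delegated to the backing path's own provider via Files. */
    public static InputStream newInputStream(Path resolved, OpenOption... options) throws IOException {
        checkReadOnly(options);
        return Files.newInputStream(resolved, options);
    }
}
```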
WireMock Platform API + local file:// as resolved storage. Tests:
- Full resolve → read file contents
- Specific version selection
- Latest version selection (picks highest)
- Provider newInputStream/readAttributes delegation
- Resolved path caching (API called once across multiple reads)
Signed-off-by: Edmund Miller <edmund.miller@seqera.io>
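The caching behavior tested above (API called once across multiple reads) can be illustrated with a lazily resolved, memoized backing path. LazyDatasetPathSketch is a hypothetical class, and the Supplier stands in for the Platform API lookup; the thread-safe double-checked locking is a design choice of this sketch, not necessarily what the PR does.

```java
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyDatasetPathSketch {

    private final Supplier<Path> resolver; // stands in for the Platform API lookup
    private volatile Path resolved;         // null until the first resolve() call

    public LazyDatasetPathSketch(Supplier<Path> resolver) {
        this.resolver = resolver;
    }

    /** Resolves on first use, then returns the cached backing path. */
    public Path resolve() {
        Path p = resolved;
        if (p == null) {
            synchronized (this) {
                p = resolved;
                if (p == null) {
                    p = resolver.get();
                    resolved = p;
                }
            }
        }
        return p;
    }

    public static void main(String[] args) {
        AtomicInteger apiCalls = new AtomicInteger();
        LazyDatasetPathSketch path = new LazyDatasetPathSketch(() -> {
            apiCalls.incrementAndGet(); // count simulated API round trips
            return Path.of("/tmp/samples.csv");
        });
        path.resolve();
        path.resolve();
        System.out.println("API calls: " + apiCalls.get());
    }
}
```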
Neat idea, but I dislike the name … This would remove the requirement for an ephemeral URI for the dataset when running the pipeline and make it easier to use datasets as a general data store.
I think it would be better to do … Maybe some of these can be omitted, but the extra scoping would keep the URI open to future extensions, like accessing data links.
I like Ben's suggestion; it allows future extensions. Currently we have an experimental Fusion version that maps multiple Seqera resources to a path like this: … I think this new file system provider, even if it's only used for datasets now, can be extended to fetch other Platform resources in the future.
Summary
NIO FileSystemProvider for dataset:// URIs in the nf-tower plugin. Resolves Seqera Platform dataset references to their backing cloud storage paths transparently: pipelines use file('dataset://my-samplesheet') with zero code changes.
Replaces the rejected Channel.fromDataset() approach (seqeralabs/nextflow#19) with a platform-agnostic file system abstraction.
Resolution flow
Components
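- DatasetFileSystemProvider.java: NIO SPI for the dataset scheme; read-only, delegates I/O to the resolved cloud path's provider
- DatasetFileSystem.java: minimal read-only FileSystem
- DatasetPath.java: Path for dataset://name?version=N, lazily resolved to the backing cloud path
- DatasetResolver.groovy: resolves a dataset name to a cloud storage Path via the Platform API
- DatasetPathFactory.groovy: FileSystemPathFactory extension point that intercepts dataset:// URIs
Phase 1 scope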
- Read-only: write operations throw ReadOnlyFileSystemException
- Works with file(), Channel.fromPath(), and nf-schema samplesheetToList()
- Configured from existing tower.* settings (endpoint, accessToken, workspaceId)
Tests
Related
Channel.fromDataset)