[Data explorer] basic data links operations support #411

Merged
robnewman merged 27 commits into master from task/405-list-data-links-support on Aug 7, 2024

weronikasosnowskaseqera
Contributor

@weronikasosnowskaseqera weronikasosnowskaseqera commented May 22, 2024

Description

Closes #405 #406 #407 #413

Output

Output of tw data-links list -w 98363125922927 -n adrian -p aws -r us-east-1:

  Data links at [Org / Wsp] workspace:

 ID                                        | Provider | Name                | Resource ref             | Region    
-------------------------------------------+----------+---------------------+--------------------------+-----------
 v1-cloud-b89b60014c225c11f59048294354d174 | aws      | adrian-navarro-test | s3://adrian-navarro-test | us-east-1 

  Showing from 0 to 99 from a total of 1 entries. 

@weronikasosnowskaseqera weronikasosnowskaseqera changed the base branch from feature/381-data-explorer-implementation to master June 20, 2024 10:29
@weronikasosnowskaseqera weronikasosnowskaseqera changed the title from "[Data explorer] WIP: List data links support" to "[Data explorer] WIP: basic data links operations support" on Jul 10, 2024
@weronikasosnowskaseqera weronikasosnowskaseqera changed the title from "[Data explorer] WIP: basic data links operations support" to "[Data explorer] basic data links operations support" on Jul 17, 2024
@robnewman
Member

Validation tests:

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links list -w seqeralabs/showcase

  Data links at [seqeralabs / showcase] workspace:

Data links are being fetched: result might be incomplete, launch the command again to check the status
 ID                                       | Provider | Name                           | Resource ref                                                    | Region    
------------------------------------------+----------+--------------------------------+-----------------------------------------------------------------+-----------
 v1-user-09705781697816b62f9454bc4b9434b4 | aws      | vscode-analysis-demo           | s3://seqera-development-permanent-bucket/studios-demo/vscode/   | eu-west-2 
 v1-user-0dede00fabbc4b9e2610261822a2d6ae | aws      | seqeralabs-showcase            | s3://seqeralabs-showcase                                        | eu-west-1 
 v1-user-171aa8801cabe4af71500335f193d649 | aws      | projectA-rnaseq-analysis       | s3://seqeralabs-showcase/demo/nf-core-rnaseq/                   | eu-west-1 
 v1-user-29786f04b38afd495ca55352eccb7f86 | aws      | openfold                       | s3://openfold/                                                  | us-east-1 
 v1-user-3489f8107ca32d21258f4528bee1c52b | aws      | ngi-igenomes                   | s3://ngi-igenomes                                               | eu-west-1 
 v1-user-579c2887de5b4d442c85d0eda5eb27e6 | aws      | The_Cancer_Genome_Atlas        | s3://tcga-2-open/                                               | us-east-1 
 v1-user-65f9d07f792db42810ebe6ddc97f38cf | aws      | rnaseq_testfull_diffab_results | s3://seqeralabs-showcase/nf-core-differentialabundance/results/ | eu-west-1 
 v1-user-69c990ae23ee22ff1951e10e06309486 | aws      | nextflow-summit                | s3://nextflow-summit                                            | eu-west-1 
 v1-user-6d8f44c239e2a098b3e02e918612452a | aws      | 1000genomes                    | s3://1000genomes                                                | us-east-1 
 v1-user-898a2b1b249777cb70649d4783c624c7 | aws      | jupyter-analysis-demo          | s3://seqera-development-permanent-bucket/studios-demo/jupyter/  | eu-west-2 
 v1-user-90b0d72e0c7b13f72cd3898811f056dc | aws      | rstudio-analysis-demo          | s3://seqera-development-permanent-bucket/studios-demo/rstudio/  | eu-west-2 
 v1-user-bb4fa9625a44721510c47ac1cb97905b | aws      | genome-in-a-bottle             | s3://giab                                                       | us-east-1 
 v1-user-e7bf26921ba74032bd6ae1870df381fc | aws      | NCBI_Sequence_Read_Archive_SRA | s3://sra-pub-src-1/                                             | us-east-1 

  Showing from 0 to 99 from a total of 13 entries.

@robnewman
Member

robnewman commented Aug 2, 2024

Error adding a data link for a public bucket (no creds required):

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links add -w seqeralabs/showcase -n Common_Crawl -u s3://commoncrawl/ -p aws

 ERROR: Credentials for 'commoncrawl' are missing or not valid

If I specify valid creds from the workspace, same error:

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links add -w seqeralabs/showcase -n Common_Crawl -u s3://commoncrawl/ -p aws -c seqera_aws_development_credentials

 ERROR: Unknown. Check that the provided identifier is correct.

Hmmm. But with other buckets it works:

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links add -w seqeralabs/showcase -n TCGA -u s3://tcga-2-open -p aws                   

  Data link created:

 ID                                       | Provider | Name | Resource ref     | Region    
------------------------------------------+----------+------+------------------+-----------
 v1-user-b97a303c349802a874fd4fa423b69f8f | aws      | TCGA | s3://tcga-2-open | us-east-1 

Browse command doesn't return expected results:

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links browse -w seqeralabs/showcase -i v1-user-b97a303c349802a874fd4fa423b69f8f

 ERROR: Missing the required parameter 'path' when calling exploreDataLink1

@robnewman
Member

Delete command works as expected:

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links delete -w seqeralabs/showcase -i v1-user-b97a303c349802a874fd4fa423b69f8f

  Data link 'v1-user-b97a303c349802a874fd4fa423b69f8f' deleted at '138659136604200' workspace.

@robnewman
Member

Update command works as expected:

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links add -w seqeralabs/showcase -n TCGA -u s3://tcga-2-open -p aws            

  Data link created:

 ID                                       | Provider | Name | Resource ref     | Region    
------------------------------------------+----------+------+------------------+-----------
 v1-user-2801e3780af40e52bcfcebed1ea13e60 | aws      | TCGA | s3://tcga-2-open | us-east-1 

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links update -n FooBarBaz -w seqeralabs/showcase -i v1-user-2801e3780af40e52bcfcebed1ea13e60

  Data link updated:

 ID                                       | Provider | Name      | Resource ref     | Region    
------------------------------------------+----------+-----------+------------------+-----------
 v1-user-2801e3780af40e52bcfcebed1ea13e60 | aws      | FooBarBaz | s3://tcga-2-open | us-east-1 

@robnewman
Member

robnewman commented Aug 2, 2024

It would be nice if the browse and delete actions accepted the data link name as an argument, not just the ID, since the name is also unique within the workspace, correct? You can't get the ID from the GUI, but you can get the name.

@weronikasosnowskaseqera
Contributor Author

@robnewman I'm afraid that getting a data link by name is not possible, because the name is unique only within the context of custom or cloud data links. The ID is visible in the address, which is the same thing we do for credentials, if I'm not mistaken.
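
To make the ambiguity concrete, here is a minimal Python sketch. It is not tower-cli code: the `DataLink` record, its `kind` field, the `find_by_name` helper, and the IDs and names in `links` are all hypothetical. It only assumes what the comment above states, namely that a name is unique among custom data links and among cloud data links separately, so one workspace can hold two links with the same name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataLink:
    id: str
    kind: str  # hypothetical field: "custom" or "cloud"
    name: str


def find_by_name(links, name):
    """Return the single link called `name`, or raise if none or several match."""
    matches = [link for link in links if link.name == name]
    if not matches:
        raise LookupError(f"no data link named {name!r}")
    if len(matches) > 1:
        ids = ", ".join(link.id for link in matches)
        raise LookupError(f"name {name!r} is ambiguous, matches: {ids}")
    return matches[0]


# Hypothetical workspace content: the same name exists once per kind.
links = [
    DataLink("v1-user-aaaa", "custom", "shared-bucket"),
    DataLink("v1-cloud-bbbb", "cloud", "shared-bucket"),
    DataLink("v1-user-cccc", "custom", "only-once"),
]
```

Looking up "only-once" returns one record, while "shared-bucket" raises because two IDs match. A name-based lookup would need an extra disambiguator (for example the kind), whereas the ID alone is always sufficient.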

@weronikasosnowskaseqera
Contributor Author

@robnewman Hey Rob, thanks a lot for testing!
From what I checked, there were two issues:

  • Credentials were not optional
  • Credentials could be referenced only by ID, not by name

I ran a couple of tests to confirm that those cases now work, but I would appreciate you checking again.
Thanks!
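
The two fixes can be pictured with a small Python sketch. This is an illustration under stated assumptions, not the code in this PR: `resolve_credentials_id` is a hypothetical helper, and the ID-before-name matching order is an assumption. The credential IDs and names are the ones that appear in the test transcripts in this thread.

```python
def resolve_credentials_id(ref, credentials):
    """Map a user-supplied reference (ID or name) to a credentials ID.

    `credentials` is a list of (id, name) pairs, as a workspace listing
    might return them. Hypothetical helper, for illustration only.
    """
    if ref is None:
        # Public bucket: credentials are optional, so nothing to resolve.
        return None
    # Assumed order: try an exact ID match first, then fall back to the name.
    for cred_id, _name in credentials:
        if ref == cred_id:
            return cred_id
    for cred_id, name in credentials:
        if ref == name:
            return cred_id
    raise LookupError(f"Credentials {ref} not found")


# (id, name) pairs taken from the transcripts in this conversation.
workspace_credentials = [
    ("4B0YNvZcTEh1mLZuRbMRbq", "seqera_azure_credentials"),
    ("3q1zkL3Qptdlz94Az3Nuzn", "seqera_aws_development_credentials"),
]
```

With this shape, omitting the credentials reference, passing the ID, and passing the name all resolve without error, which matches the three cases exercised in the validation runs.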

@robnewman
Member

add (for public buckets with no creds) and browse working as expected:

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links add -w seqeralabs/showcase -n POWER -u s3://power-analysis-ready-datastore/ -p aws       

  Data link created:

 ID                                       | Provider | Name  | Resource ref                         | Region    
------------------------------------------+----------+-------+--------------------------------------+-----------
 v1-user-af0669f727f67e04e0fa64b0c2d5dcb4 | aws      | POWER | s3://power-analysis-ready-datastore/ | us-west-2 

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links browse -w seqeralabs/showcase -i v1-user-af0669f727f67e04e0fa64b0c2d5dcb4

  Content of 's3://power-analysis-ready-datastore/' and path 'null':

 Type   | Name                                    | Size   
--------+-----------------------------------------+--------
 FILE   | ceres.json                              | 28     
 FILE   | extra_last_data_processing.json         | 28     
 FILE   | extra_last_data_sync.json               | 28     
 FILE   | extra_processing_allowed.json           | 23     
 FILE   | flashflux.json                          | 28     
 FILE   | geos5124.json                           | 28     
 FILE   | imerg-final.json                        | 28     
 FILE   | imerg-late.json                         | 28     
 FILE   | index.html                              | 147275 
 FILE   | last_data_processing.json               | 28     
 FILE   | last_data_sync.json                     | 28     
 FILE   | merra2.json                             | 28     
 FILE   | processing_allowed.json                 | 23     
 FOLDER | power_901_annual_meteorology_utc.zarr/  | 0      
 FOLDER | power_901_annual_radiation_utc.zarr/    | 0      
 FOLDER | power_901_constants.zarr/               | 0      
 FOLDER | power_901_daily_meteorology_lst.zarr/   | 0      
 FOLDER | power_901_daily_meteorology_utc.zarr/   | 0      
 FOLDER | power_901_daily_precipitation_utc.zarr/ | 0      
 FOLDER | power_901_daily_radiation_lst.zarr/     | 0      
 FOLDER | power_901_daily_radiation_utc.zarr/     | 0      
 FOLDER | power_901_hourly_meteorology_utc.zarr/  | 0      
 FOLDER | power_901_hourly_radiation_utc.zarr/    | 0      
 FOLDER | power_901_monthly_meteorology_utc.zarr/ | 0      
 FOLDER | power_901_monthly_radiation_utc.zarr/   | 0

@robnewman
Member

Works for GCP public buckets:

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links add -w seqeralabs/showcase -n public-data-landsat -u gs://gcp-public-data-landsat -p google

  Data link created:

 ID                                       | Provider | Name                | Resource ref                 | Region 
------------------------------------------+----------+---------------------+------------------------------+--------
 v1-user-1b4824bd9560a45acc88e569028b0bb3 | google   | public-data-landsat | gs://gcp-public-data-landsat | us     

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links browse -w seqeralabs/showcase -i v1-user-1b4824bd9560a45acc88e569028b0bb3                  

  Content of 'gs://gcp-public-data-landsat' and path 'null':

 Type   | Name         | Size      
--------+--------------+-----------
 FILE   | index.csv.gz | 767466786 
 FOLDER | LC08/        | 0         
 FOLDER | LE07/        | 0         
 FOLDER | LM01/        | 0         
 FOLDER | LM02/        | 0         
 FOLDER | LM03/        | 0         
 FOLDER | LM04/        | 0         
 FOLDER | LM05/        | 0         
 FOLDER | LO08/        | 0         
 FOLDER | LT04/        | 0         
 FOLDER | LT05/        | 0         
 FOLDER | LT08/        | 0

@robnewman
Member

robnewman commented Aug 5, 2024

@weronikasosnowskaseqera The Azure add command doesn't work as expected when using the credentials name (it works from the UI), but it works just fine when using the ID:

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links add -w seqeralabs/showcase -n azure-test -u az://seqeralabs.azure-benchmarking -p azure -c seqera_azure_credentials

 ERROR: Credentials seqera_azure_credentials not found

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw credentials list -w seqeralabs/showcase | grep seqera_azure_credentials
     4B0YNvZcTEh1mLZuRbMRbq | azure         | seqera_azure_credentials               | Tue, 6 Aug 2024 11:03:12 GMT

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links add -w seqeralabs/showcase -n azure-test -u az://seqeralabs.azure-benchmarking -p azure -c 4B0YNvZcTEh1mLZuRbMRbq  

  Data link created:

 ID                                       | Provider | Name       | Resource ref                       | Region 
------------------------------------------+----------+------------+------------------------------------+--------
 v1-user-6bf157041871dedf7a15523ec6c48cf0 | azure    | azure-test | az://seqeralabs.azure-benchmarking |    

@robnewman
Member

Same problem with the browse command for custom Azure data-link:

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links browse -w seqeralabs/showcase -i v1-user-6bf157041871dedf7a15523ec6c48cf0 -c seqera_azure_credentials

 ERROR: Unknown. Check that the provided identifier is correct.

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links browse -w seqeralabs/showcase -i v1-user-6bf157041871dedf7a15523ec6c48cf0 -c 4B0YNvZcTEh1mLZuRbMRbq  

  Content of 'az://seqeralabs.azure-benchmarking' and path 'null':

 Type   | Name    | Size 
--------+---------+------
 FOLDER | .cache/ | 0

@robnewman
Member

robnewman commented Aug 6, 2024

Interesting! The same issue appears with GCP and AWS:

GCP

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links browse -w seqeralabs/showcase -i v1-cloud-450a6c3a56f40669e0cd6453746df5cd -c seqera_gcp_credentials

 ERROR: Unknown. Check that the provided identifier is correct.

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links browse -w seqeralabs/showcase -i v1-cloud-450a6c3a56f40669e0cd6453746df5cd -c 1EB98PuvwvUCiOkp05b6Ba

  Content of 'gs://leak-test' and path 'null':

 Type | Name                        | Size     
------+-----------------------------+----------
 FILE | flight_recording_rnaseq.jfr | 36384783 

AWS:

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links browse -w seqeralabs/showcase -i v1-cloud-ed9093a21bd1eca20e0c46b16ab48edc -c seqera_aws_development_credentials

 ERROR: Unknown. Check that the provided identifier is correct.

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links browse -w seqeralabs/showcase -i v1-cloud-ed9093a21bd1eca20e0c46b16ab48edc -c 3q1zkL3Qptdlz94Az3Nuzn            

  Content of 's3://nf-oregon' and path 'null':

 Type   | Name     | Size 
--------+----------+------
 FOLDER | .cache/  | 0    
 FOLDER | graham/  | 0    
 FOLDER | scratch/ | 0  

Perhaps I'm using the CLI wrong and only the credentials ID is supported, not the credentials name.

@robnewman
Member

robnewman commented Aug 6, 2024

Ack - was using the old version! 🙀 Recompiled, tested and works as expected:

AWS:

./build/native/nativeCompile/tw data-links browse -w seqeralabs/showcase -i v1-cloud-ed9093a21bd1eca20e0c46b16ab48edc -c seqera_aws_development_credentials

  Content of 's3://nf-oregon' and path 'null':

 Type   | Name     | Size 
--------+----------+------
 FOLDER | .cache/  | 0    
 FOLDER | graham/  | 0    
 FOLDER | scratch/ | 0    

GCP:

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links browse -w seqeralabs/showcase -i v1-cloud-450a6c3a56f40669e0cd6453746df5cd -c seqera_gcp_credentials            

  Content of 'gs://leak-test' and path 'null':

 Type | Name                        | Size     
------+-----------------------------+----------
 FILE | flight_recording_rnaseq.jfr | 36384783 

Azure:

(base) ➜  tower-cli git:(task/405-list-data-links-support) ./build/native/nativeCompile/tw data-links browse -w seqeralabs/showcase -i v1-user-6bf157041871dedf7a15523ec6c48cf0 -c seqera_azure_credentials           

  Content of 'az://seqeralabs.azure-benchmarking' and path 'null':

 Type   | Name    | Size 
--------+---------+------
 FOLDER | .cache/ | 0   

@robnewman robnewman self-requested a review August 6, 2024 11:56
@weronikasosnowskaseqera
Contributor Author

@JaimeSeqLabs or @jordeu can you merge this PR? I don't have permissions.

@robnewman robnewman merged commit ba9b0d2 into master Aug 7, 2024
7 checks passed
@robnewman robnewman deleted the task/405-list-data-links-support branch August 7, 2024 13:50
Development

Successfully merging this pull request may close these issues.

[Data explorer] Add list data links support to CLI
4 participants