Moving the contents of your workspace's Google bucket elsewhere

The scripts copy_bucket.sh and copy_bucket-mirror.sh can be used to copy your workspace's bucket, or any other Google bucket, to another location, either locally or on Google Cloud. If you want the contents of your original bucket to appear in the root directory of your destination, use copy_bucket-mirror.sh. Otherwise, use copy_bucket.sh and the contents will be copied to the provided destination inside a folder named after the source bucket, with the structure intact. In both cases, a gsutil copy log is produced and named {SOURCE}-to-{DESTINATION}.gsutil_copy_log.csv. gsutil has sometimes failed to exit after the copy completes; if the copy has finished and the process is still running, terminate it manually.

Required arguments:

    SOURCE                  <string>    Source Google bucket, with or without the gs:// prefix
    DESTINATION             <string>    Destination Google bucket, with or without the gs:// prefix
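
The naming rules described above can be sketched in Python. This is only an illustration, not the contents of copy_bucket.sh or copy_bucket-mirror.sh: the function names are hypothetical, the destination is assumed to be a bucket, and the use of a trailing wildcard for the mirrored copy is an assumption about how such a copy could be expressed with gsutil.

```python
from __future__ import annotations


def bucket_name(bucket: str) -> str:
    """Return the bare bucket name, accepting input with or without gs://."""
    name = bucket.strip()
    if name.startswith("gs://"):
        name = name[len("gs://"):]
    return name.rstrip("/")


def copy_log_name(source: str, destination: str) -> str:
    """Build the log name in the {SOURCE}-to-{DESTINATION} pattern."""
    return f"{bucket_name(source)}-to-{bucket_name(destination)}.gsutil_copy_log.csv"


def build_copy_command(source: str, destination: str, mirror: bool = False) -> list[str]:
    """Assemble a gsutil copy command as an argument list (hypothetical helper)."""
    src = f"gs://{bucket_name(source)}"
    dst = f"gs://{bucket_name(destination)}"
    # Assumption: copying "gs://src/*" places the contents at the destination
    # root, while copying "gs://src" places them in a folder named after it.
    src_arg = f"{src}/*" if mirror else src
    log = copy_log_name(source, destination)
    # -m runs in parallel, -r recurses, -L writes the copy log (manifest).
    return ["gsutil", "-m", "cp", "-r", "-L", log, src_arg, dst]
```

The returned list can be inspected before anything is run, which makes it easy to confirm the log name and the source argument for the mirrored and non-mirrored cases.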

If you want to delete the files successfully copied from the source location, you can extract a list of successfully copied files with list_source_files.py.

Required arguments:

    --input                 <string>    Path to log file from gsutil
    --output                <string>    Prefix for output; output is written to '{output}.files_to_remove.txt'
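
A minimal sketch of the extraction step follows. It is not the source of list_source_files.py. It assumes the gsutil copy log is a CSV with a header row that includes `Source` and `Result` columns and that successful copies are marked `OK`; check these assumptions against your own log. The function names are hypothetical.

```python
import csv


def list_copied_sources(log_lines):
    """Return the Source value of every row whose Result is OK.

    log_lines is any iterable of CSV lines, such as an open file handle.
    Assumption: the log has 'Source' and 'Result' columns.
    """
    reader = csv.DictReader(log_lines)
    return [
        row["Source"]
        for row in reader
        if (row.get("Result") or "").strip() == "OK"
    ]


def write_removal_list(log_path, output_prefix):
    """Write the copied sources to '{output_prefix}.files_to_remove.txt'."""
    with open(log_path, newline="") as handle:
        sources = list_copied_sources(handle)
    out_path = f"{output_prefix}.files_to_remove.txt"
    with open(out_path, "w") as out:
        for source in sources:
            out.write(source + "\n")  # one file per line
    return out_path
```

Rows with any other result are left out, so files that were not copied successfully never reach the removal list.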

remove_files.sh passes the provided file, which should be a plain text file listing one Google Cloud file per line, to gsutil for deletion. Progress is displayed as the files are deleted.

Required arguments:

    HANDLE                  <string>    File path to list of files for removal
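
The removal step could look roughly like the sketch below. It is an illustration under stated assumptions, not the contents of remove_files.sh: the function names are hypothetical, the gs:// check and the dry-run default are additions for safety, and it assumes gsutil is installed and that `gsutil rm -I` is used to read the list of files from standard input.

```python
import subprocess


def read_removal_list(lines):
    """Return the non-empty lines, checking that each is a gs:// URL."""
    urls = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        if not url.startswith("gs://"):
            raise ValueError(f"not a Google Cloud Storage URL: {url!r}")
        urls.append(url)
    return urls


def remove_files(handle_path, dry_run=True):
    """Pass the listed files to gsutil for deletion (hypothetical helper)."""
    with open(handle_path) as handle:
        urls = read_removal_list(handle)
    if dry_run:
        # Report what would be deleted without calling gsutil.
        print(f"would delete {len(urls)} file(s)")
        return urls
    # -I tells gsutil rm to read the URLs from standard input.
    subprocess.run(
        ["gsutil", "-m", "rm", "-I"],
        input="\n".join(urls) + "\n",
        text=True,
        check=True,
    )
    return urls
```

Running with the default `dry_run=True` first gives a count to compare against the number of successful rows in the copy log before anything is deleted.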

Example

To move the workspace vanallen-firecloud-dfci/Robinson2015_dev to vanallen-firecloud-nih/Robinson2015_dev, first note the bucket of each workspace: gs://fc-f804af42-ded2-4bf2-af24-99d8b6d3b969 and gs://fc-7894f74c-6827-40ab-a26e-57d0bcb295ce, respectively. copy_bucket.sh is then run to copy the contents of one to the other, though copy_bucket-mirror.sh could be run instead if you want the root directories to match exactly. The gsutil copy log is then passed to list_source_files.py, and the resulting file list is passed to remove_files.sh.

copy_bucket.sh is run, which displays the status of the copy and records the progress to fc-f804af42-ded2-4bf2-af24-99d8b6d3b969-to-fc-7894f74c-6827-40ab-a26e-57d0bcb295ce.gsutil_copy_log.csv.

    bash copy_bucket.sh gs://fc-f804af42-ded2-4bf2-af24-99d8b6d3b969 gs://fc-7894f74c-6827-40ab-a26e-57d0bcb295ce

The gsutil copy log is then passed to list_source_files.py to extract the list of files successfully copied. The output prefix "vanallen-firecloud-dfci.Robinson2015_dev" is passed to specify the prefix for the output file, vanallen-firecloud-dfci.Robinson2015_dev.files_to_remove.txt.

    python list_source_files.py --input fc-f804af42-ded2-4bf2-af24-99d8b6d3b969-to-fc-7894f74c-6827-40ab-a26e-57d0bcb295ce.gsutil_copy_log.csv --output "vanallen-firecloud-dfci.Robinson2015_dev"

vanallen-firecloud-dfci.Robinson2015_dev.files_to_remove.txt is then passed to remove_files.sh. Every file listed in the file passed to remove_files.sh will be deleted, but you can pass this list without review with reasonable confidence because list_source_files.py lists only the files that were successfully copied.

    bash remove_files.sh vanallen-firecloud-dfci.Robinson2015_dev.files_to_remove.txt

You should then update the data model in the new workspace to point to the copied files. If you copied the data model from your old workspace, you should be able to either find and replace the old bucket name with the new one, if you mirrored, or add the new bucket name as a prefix, if you did not mirror.
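
The path update can be sketched as a small Python function. This is an assumption-laden illustration: `rewrite_path` is a hypothetical helper, and it assumes that a mirrored copy keeps each file at the same path under the new bucket while a non-mirrored copy places it under a folder named after the source bucket, as described for the two copy scripts. The file name `sample1.bam` in the usage lines is made up.

```python
def rewrite_path(path, old_bucket, new_bucket, mirrored):
    """Return the path a data model entry should point to after the move."""
    old_prefix = f"gs://{old_bucket}/"
    if not path.startswith(old_prefix):
        return path  # not in the old bucket; leave unchanged
    remainder = path[len(old_prefix):]
    if mirrored:
        # Mirrored copy: replace the old bucket name with the new one.
        return f"gs://{new_bucket}/{remainder}"
    # Non-mirrored copy: files sit in a folder named after the old bucket.
    return f"gs://{new_bucket}/{old_bucket}/{remainder}"


old = "fc-f804af42-ded2-4bf2-af24-99d8b6d3b969"
new = "fc-7894f74c-6827-40ab-a26e-57d0bcb295ce"
print(rewrite_path(f"gs://{old}/sample1.bam", old, new, mirrored=True))
print(rewrite_path(f"gs://{old}/sample1.bam", old, new, mirrored=False))
```

Applying the function to each path column of an exported data model, then re-importing it, would be one way to make the update; spot-check a few rewritten paths against the destination bucket first.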