Operator Tips & Tricks
Automated file transfers often use scp, but s3cmd is increasingly used due to the prevalence of cloud storage with an S3-compatible interface (originally developed by Amazon for their cloud services, but now an industry-standard protocol in its own right). The DigitalOcean Spaces storage that we use for ASGS is also S3 compatible, hence this document.
At some point, it would be great for ASGS to install and set up s3cmd by default. But until then, this document will help us remember some of the details so we can set it up and use it on an as-needed basis.
Download: S3cmd
By default, s3cmd stores its configuration file, .s3cfg, in the home directory of the user that ran the configuration command. .s3cfg is a plain text file of key/value pairs that can be edited directly once it has been created.
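For reference, the resulting file looks something like the fragment below, assuming the DigitalOcean sfo2 endpoint used in the configuration session later in this document (the credential values are of course placeholders):

```ini
; ~/.s3cfg (fragment) as written by s3cmd --configure
[default]
access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SECRET_KEY
host_base = sfo2.digitaloceanspaces.com
host_bucket = %(bucket)s.sfo2.digitaloceanspaces.com
use_https = True
```

Editing host_base and host_bucket by hand is an easy way to retarget an existing configuration at a different region or provider without rerunning the configuration dialog.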
s3cmd uses the options set in its default configuration file when you run commands. You can specify a different configuration file by appending -c ~/path/to/config/file to each command you run.
If DigitalOcean is the main or only provider you'll connect to with s3cmd, and you don't want to specify its configuration file every time you use s3cmd, configure the default ~/.s3cfg file with the following command:
s3cmd --configure
Here is how it played out on mike:
```
[username@mike1 ~]$ module load python
[username@mike1 ~]$ module list
Currently Loaded Modulefiles:
 1) intel/2021.5.0   2) intel-mpi/2021.5.1   3) python/3.9.7-anaconda
[username@mike1 ~]$ s3cmd --configure

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: [[cut/paste accesskey]]
Secret Key: [[cut/paste secretkey]]
Default Region [US]: <just hit enter>

Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]: sfo2.digitaloceanspaces.com

Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: %(bucket)s.sfo2.digitaloceanspaces.com

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password: <just hit enter>
Path to GPG program [/usr/bin/gpg]: <just hit enter>

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]: <just hit enter>

On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name: <just hit enter>

New settings:
  Access Key: [[cut/pasted accesskey]]
  Secret Key: [[cut/pasted secretkey]]
  Default Region: US
  S3 Endpoint: sfo2.digitaloceanspaces.com
  DNS-style bucket+hostname:port template for accessing a bucket: %(bucket)s.sfo2.digitaloceanspaces.com
  Encryption password:
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: True
  HTTP Proxy server name:
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] <just hit enter>
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...
Not configured. Never mind.

Save settings? [y/N] y
Configuration saved to '/home/jgflemin/.s3cfg'
[username@mike1 ~]$ s3cmd ls
2019-03-12 21:06  s3://asgs-static-assets
```
Create Buckets: Use the mb command, short for "make bucket", to create a new bucket:
s3cmd mb s3://spacename s3://secondspace
List Buckets: s3cmd ls
List Buckets and Contents: s3cmd ls s3://spacename s3://secondspace
List the contents of all existing buckets: s3cmd la --recursive
Use the put command to copy files from your local machine to a bucket. In all of these commands, pay attention to the trailing slash. When you include it, as in the example below, the original file name is kept. If you omit the slash, the file is instead uploaded to the bucket under the new object name path.
s3cmd put file.txt s3://spacename/path/
New name: You can change the name of a file as you put it in a bucket by typing the new name at the end of the path as follows:
s3cmd put file.txt s3://spacename/newname.txt
Multiple Files:
s3cmd put file1.txt file2.txt path/to/file3.txt s3://spacename/path/
Using the * with the put command will copy everything in the current working directory, recursively, into your bucket:
s3cmd put * s3://spacename/path/ --recursive
You can set public permissions for all files at once by adding --acl-public, and you can similarly set metadata with --add-header (e.g., --add-header=Cache-Control:max-age=86400):
s3cmd put * s3://yourfolder --acl-public --add-header=Cache-Control:max-age=86400 --recursive
The get command copies files from a bucket to your local computer.
One File: s3cmd get s3://spacename/path/to/file.txt
All Files in a Directory: To get multiple files, the S3 address must end with a trailing slash, and the command requires the --recursive flag:
s3cmd get s3://spacename/path/ --recursive
New Name: Like the put command, the get command allows you to give the file a different name:
s3cmd get s3://spacename/file.txt newfilename.txt
s3cmd only provides output when the command you issue actually changes access levels. For example, when you change the ACL from private to public, you'll see output like s3://spacename/: ACL set to Public. If the ACL is already public, s3cmd returns silently to the command prompt.
Enable directory listings
s3cmd setacl s3://spacename/ --acl-public
Disable directory listings
s3cmd setacl s3://spacename/ --acl-private
Using the setacl command, files can be made private, so that only someone connecting with a valid key pair can read them, or public, so that anyone can read them with either an S3-compatible client or via HTTPS.
s3cmd only provides output when the command you issue changes the access level. For example, when you change the ACL from private to public, you'll see output like s3://spacename/test.txt: ACL set to Public [1 of 1]. If the ACL is already public, s3cmd returns silently to the command prompt.
Make a file public
s3cmd setacl s3://spacename/file.txt --acl-public
Make all the files at a path public: Use the --recursive flag to apply the change to every file under the path:
s3cmd setacl s3://spacename/path/to/files/ --acl-public --recursive
Make a file private
s3cmd setacl s3://spacename/file.txt --acl-private
Make all the files at a path private: Use the --recursive flag to apply the change to every file under the path:
s3cmd setacl s3://spacename/path/to/files/ --acl-private --recursive
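Once an object is public, it can be fetched over plain HTTPS without credentials. The URL follows the DNS-style bucket+hostname template from the configuration above; a minimal sketch of constructing it in the shell (the space name, region, and object key here are hypothetical placeholders):

```shell
# Build the public HTTPS URL for an object in a DigitalOcean Space.
# These values are placeholders -- substitute your own space/region/key.
SPACE=spacename
REGION=sfo2
OBJECT=path/to/file.txt
URL="https://${SPACE}.${REGION}.digitaloceanspaces.com/${OBJECT}"
echo "$URL"
```

The resulting URL can then be fetched with any HTTP client, e.g. curl -O "$URL".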
Traditionally, the Operator is required to compute the COLDSTARTDATE datetime based on the HINDCASTLENGTH number of days relative to the date at which the first nowcast or forecast is to start (often on or just before the current datetime). For example:
HINDCASTLENGTH=30
COLDSTARTDATE=...<<operator calculates 30 days prior to midnight on the current date in head>>
This is cumbersome and error prone, so we have developed a way for the Operator to set the HINDCASTLENGTH and then use the date command in bash to calculate the COLDSTARTDATE automatically, based on the date on which the Operator would like the tidal/river spinup to end, which is much easier. For reference on doing date math with the date command, see https://stackoverflow.com/questions/18180581/subtract-days-from-a-date-in-bash
For example:
HINDCASTLENGTH=30
HINDCASTENDDATE=$(date +%Y%m%d) # e.g., 20210505
COLDSTARTDATE=$(date --date="${HINDCASTENDDATE} -${HINDCASTLENGTH} days" +%Y%m%d%H)
# e.g., date --date="20210505 -30 days" +%Y%m%d%H
The date on which the tide+river initialization is to end is given in YYYYmmdd format; the HINDCASTLENGTH (in days) is then used to calculate the COLDSTARTDATE, relieving the Operator of doing this manually or with a more elaborate helper script. It does require the Operator to make sure that HINDCASTLENGTH is specified before the COLDSTARTDATE code in the ASGS configuration file.
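As a concrete check of the date arithmetic, here is the same calculation with a fixed end date rather than today's date (the dates are just examples; this requires GNU date, as found on Linux HPC systems like mike):

```shell
# Worked example: a 30-day hindcast ending on 2021-05-05
# cold starts 30 days earlier, at 2021-04-05 00Z.
HINDCASTLENGTH=30
HINDCASTENDDATE=20210505
COLDSTARTDATE=$(date --date="${HINDCASTENDDATE} -${HINDCASTLENGTH} days" +%Y%m%d%H)
echo "$COLDSTARTDATE"   # 2021040500
```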