
AWS Simple Storage Service (S3)


Requirements

At least one AWS EC2 node (or you can use your laptop to test the set-up described here)

Introduction

Using the AWS Management Console

Log in to the AWS Management Console and click the S3 icon.

Click the “Create Bucket” button to create a storage location for your data. Bucket names must be globally unique, and you should choose a region near the rest of your cluster (e.g., N. Virginia if you are in New York, N. California if you are in Silicon Valley).

While you can transfer data from region to region, doing so can be costly and slow. Continue by clicking “Create”.

You can now create folders by clicking on your newly created bucket and clicking the “Create Folder” button. You can upload files from your local system by using the GUI, which is fairly straightforward. See the documentation for more details.

Using the AWS Command Line Interface (CLI)

In order to utilize S3 to its fullest, you should use the AWS Command Line Interface (CLI).

AWS Credentials

To use the CLI you’ll need your AWS Access Key ID and Secret Access Key handy (you should have received these from your Program Director). Your credentials should look something like the examples below:

  • Access key ID example: AKIAIOSFODNN7EXAMPLE
  • Secret access key example: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Note: Your account will be temporarily shut down if you post your AWS credentials online (e.g., accidentally push them to GitHub). Anything you push to a public GitHub repository is immediately public, and Amazon regularly scans for leaked credentials. It’s therefore important to store your credentials in a file that Git ignores (via the .gitignore file) or in your .profile file.
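For example, if you keep your keys in a local file named aws_credentials.sh (a hypothetical name; use whatever fits your setup), you can tell Git to ignore it:

any-node:~$ echo "aws_credentials.sh" >> .gitignore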

Installing the AWS CLI

SSH into any one of your EC2 instances.
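For example, assuming an Ubuntu AMI and a key pair file (replace both placeholders with your own values):

laptop:~$ ssh -i ~/.ssh/<your-key>.pem ubuntu@<your-ec2-public-dns>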

Create a Downloads directory (if it doesn’t already exist).

any-node:~$ mkdir -p ~/Downloads

Then download, unzip, and install the AWS CLI:

any-node:~$ curl -L https://s3.amazonaws.com/aws-cli/awscli-bundle.zip -o ~/Downloads/awscli-bundle.zip
any-node:~$ unzip ~/Downloads/awscli-bundle.zip
any-node:~$ sudo ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws

You can ensure that the CLI is installed correctly and get help with the aws help command.
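For a quick sanity check, aws --version prints the installed version and aws help shows the top-level help pages:

any-node:~$ aws --version
any-node:~$ aws help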

Configuring the AWS CLI

Configure your AWS credentials as environment variables by exporting the keys described above in your .profile file.

Edit your .profile file:

any-node:~$ nano ~/.profile

Add the following AWS environment variables (substituting your own credentials):

export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
export AWS_DEFAULT_REGION=<your-cluster-region>

Then source the profile to load the environment variables:

any-node:~$ . ~/.profile
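You can verify that the variables were loaded by printing them back out:

any-node:~$ env | grep AWS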

Your cluster region corresponds to its location according to the table below:

  • N. California: us-west-1
  • Oregon: us-west-2
  • N. Virginia: us-east-1
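You can also confirm which credentials and region the CLI will actually use:

any-node:~$ aws configure list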

Example Commands

Next, create an example directory and a file named Who.txt using an editor:

any-node$ mkdir ~/s3-examples
any-node$ nano ~/s3-examples/Who.txt

and copy the following text into it:

So call a big meeting
Get everyone out out
Make every Who holler
Make every Who shout shout

At this point, the file is only on the local Linux File System of the EC2 node (or your laptop if that is where you are testing this).

Make a new bucket on S3 (with your own globally unique name) using the mb command, then copy the file to it:

any-node$ aws s3 mb s3://<your-unique-bucket-name>
any-node$ aws s3 cp ~/s3-examples/Who.txt s3://<your-unique-bucket-name>/examples/

Note that the examples folder didn’t already exist on S3; it was created implicitly by the cp command (S3 has no true folders, only key-name prefixes).

You can view the copied file in the S3 web console, list it with the CLI, or read it with the SDK as demonstrated in the next section.
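For example, to list the contents of the new prefix from the CLI:

any-node$ aws s3 ls s3://<your-unique-bucket-name>/examples/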

Using the AWS SDK

The best way to use S3 (and many other AWS tools) is through one of the Software Development Kits (SDKs). In particular, AWS provides SDKs for Java, Python, and many other languages. The Python SDK, Boto, is very popular among Fellows, and you can find good examples in the Boto documentation. Getting started is usually as simple as installing boto with pip or from GitHub:
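any-node$ pip install boto

Then load your access keys from the environment variables and read the file back with something like: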

import os
import boto

# Read the credentials from the environment variables set in ~/.profile
aws_access_key = os.getenv('AWS_ACCESS_KEY_ID')
aws_secret_access_key = os.getenv('AWS_SECRET_ACCESS_KEY')

bucket_name = "your-bucket-name"
folder_name = "your-folder-name/"   # note the trailing slash
file_name = "your-file-name"

# Open a connection to S3 and look up the bucket and the key
# (the key is the object's full path within the bucket)
conn = boto.connect_s3(aws_access_key, aws_secret_access_key)
bucket = conn.get_bucket(bucket_name)
key = bucket.get_key(folder_name + file_name)

# Download the object's contents and print them
data = key.get_contents_as_string()
print(data)

Warning: DO NOT commit your access or secret key to GitHub; use environment variables instead.
