AWS
This tutorial expects you to have the following items already configured.
- VPC
- Subnet
- Key-pair (and associated PEM key)
- Security Group
- You can do a lot of different things with AWS. For now, let’s just start a virtual machine. Click on EC2 (“Elastic Compute Cloud”).
- Click on “Launch Instance”. It’s worth noting that AWS has different regions, and that you can launch an instance in any of them. (So if you’ve always wanted a server in Brazil or Japan, this is your chance!) Your choice will affect both latency and price - if you are a Fellow in Silicon Valley, choose Oregon. If you are a Fellow in New York, choose Virginia.
- Now it’s time to select the image for your VM. For this tutorial, we’ll use Ubuntu Server 14.04 LTS (HVM). Be sure to select the 64-bit version.
- Now choose your instance size.
You can spin up one of the following instance types:
- "t2.*"
- "m4.large"
- "m4.xlarge"
- "c4.large"
- "c4.xlarge"
- "r3.large"
- We can then specify the number of instances to spin up, the VPC to be used and the subnet to be used in that VPC. Choose the VPC and subnets that you created earlier.
- We will use the magnetic volume type for storage.
- Next we can give these instances a name so they are easily recognizable among other instances in your account. Here we gave all the instances the name ‘master’, but you need to use a uniquely identifiable tag like your name (e.g. david-d).
- The next step is to configure the security group settings for these instances. For this exercise, we will leave SSH and all other ports open for ease of access. Note that these settings should be much stricter in production. If no security group with this configuration exists, you can create a new one.
- We will then review our instances and launch them. You will be asked to choose a pem key, which will be used to log in to these instances. Use the pem key you downloaded earlier. WARNING: If you lose your pem key, there is no way to recover it, and you will lose access to any instances associated with it.
- Congratulations! Your AWS instance is now spinning up! Let’s log into it.
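For readers who prefer the command line, the console steps above can be approximated with the AWS CLI. The sketch below assumes the AWS CLI is installed and configured with credentials and a default region. The function name `launch_instances` and every ID are hypothetical placeholders, and storage is left at the image's defaults rather than explicitly set to magnetic.

```shell
# Hypothetical placeholders: replace with the IDs from your own account.
AMI_ID="${AMI_ID:-ami-xxxxxxxx}"                      # Ubuntu Server 14.04 LTS (HVM) image ID for your region
SUBNET_ID="${SUBNET_ID:-subnet-xxxxxxxx}"             # subnet inside the VPC you created earlier
SECURITY_GROUP_ID="${SECURITY_GROUP_ID:-sg-xxxxxxxx}" # security group configured for this exercise
KEY_NAME="${KEY_NAME:-my-key-pair}"                   # name of the key pair (not the path to the PEM file)

# launch_instances NAME COUNT INSTANCE_TYPE
# Requests COUNT instances of INSTANCE_TYPE, each tagged with Name=NAME.
launch_instances() {
    aws ec2 run-instances \
        --image-id "$AMI_ID" \
        --instance-type "$3" \
        --count "$2" \
        --subnet-id "$SUBNET_ID" \
        --security-group-ids "$SECURITY_GROUP_ID" \
        --key-name "$KEY_NAME" \
        --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$1}]"
}
```

For example, `launch_instances david-d 4 m4.large` would request four m4.large instances tagged with the name david-d.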
Use the blue “Allocate New Address” button on the top left, and allocate one new address for each instance you spun up. Then check the box on one of the new Elastic IPs and choose “Associate Address” under the Actions dropdown.
Type the name of your cluster, select one of the instances to associate, and click the blue Associate button in the bottom right. Repeat this association step until each instance has exactly one Elastic IP.
Until you disassociate these Elastic IPs, they will be the new Public IP of the instances, even if you Stop the node (but not if you Terminate it).
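The same allocate-then-associate sequence can be sketched with the AWS CLI. This assumes the CLI is installed and configured, and that the instance lives in a VPC. The function name `attach_elastic_ip` and the instance ID in the usage note are hypothetical.

```shell
# attach_elastic_ip INSTANCE_ID
# Allocates one new Elastic IP and associates it with the given instance.
attach_elastic_ip() {
    # Ask for only the allocation ID as plain text so it can be reused below.
    allocation_id=$(aws ec2 allocate-address --domain vpc \
        --query AllocationId --output text) || return 1

    aws ec2 associate-address \
        --instance-id "$1" \
        --allocation-id "$allocation_id"
}
```

Calling `attach_elastic_ip i-xxxxxxxx` once per instance gives each instance exactly one Elastic IP, mirroring the console steps.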
## Logging into an EC2 Instance
- Return to the AWS console. You should now see “4 Running Instances” if you chose 4 instances in the beginning. Click on it.
- You can then choose one of these instances. You can rename an instance by clicking the pencil to the right of its name, which is helpful for technologies that use a single “master” that needs to be distinguished from the rest. You should now be able to see the public IP address of your virtual machine (VM). This is the endpoint we’re going to use to ssh into the machine.
- First, change the permissions of the private key that we emailed you earlier so that only you can read and write it. Do this with the command chmod 600 <path to your key>. Now ssh into the machine with the command ssh -i <path to your key> ubuntu@<your VM’s public IP address>. If prompted “Are you sure you want to continue connecting (yes/no)?”, enter “yes”.
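The two login steps above can be wrapped in small shell helpers. This is a minimal sketch: the function names `prepare_key`, `key_is_private`, and `connect` are hypothetical, as are the key path and IP address in the usage note.

```shell
# prepare_key KEY_PATH
# Restricts the key to owner read/write, since ssh rejects private keys
# that other users can read.
prepare_key() {
    chmod 600 "$1"
}

# key_is_private KEY_PATH
# Succeeds only when the file's permission bits are exactly 600.
key_is_private() {
    [ -n "$(find "$1" -perm 600 2>/dev/null)" ]
}

# connect KEY_PATH PUBLIC_IP
# Logs in as the default "ubuntu" user of the Ubuntu image.
connect() {
    if ! key_is_private "$1"; then
        echo "Key permissions are too open; run: chmod 600 $1" >&2
        return 1
    fi
    ssh -i "$1" "ubuntu@$2"
}
```

A typical session would be `prepare_key ~/.ssh/my-key.pem` followed by `connect ~/.ssh/my-key.pem 203.0.113.10`, with your own key path and the instance's public IP substituted.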
This will be the foundation for spinning up your clusters for various technologies such as Hadoop, Spark, Kafka, Cassandra, Elasticsearch and many others.
Additional Tips:
- To exit an SSH session you can either press Ctrl-D in the terminal or type exit
- To terminate instances when you are finished with them, go to the AWS console and find the Instances tab along the left panel. Next, highlight the instances you wish to terminate and then click on Actions -> Instance State -> Terminate
- You may find it useful to set up a configuration file for ssh, which makes it more convenient to connect to your nodes. To do so, create a file named config in your .ssh directory and enter something like the following text in it (but using your own host names, public DNS names, and pem keys):
~/.ssh/config

    Host <hostname1>
        HostName <your master’s public dns>
        User ubuntu
        Port 22
        IdentityFile <path to your pem key>

    Host <hostname2>
        HostName <your worker’s public dns>
        User ubuntu
        Port 22
        IdentityFile <path to your pem key>

Now you can ssh into these two hosts using ssh <hostname1> and ssh <hostname2>, respectively, rather than the longer ssh command from above.
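If you have several nodes, writing each entry by hand gets tedious. The sketch below prints one entry in the same layout as the config file above. The function name `ssh_config_entry` is hypothetical, and so are the alias, host name, and key path in the usage note.

```shell
# ssh_config_entry ALIAS HOSTNAME KEY_PATH
# Prints a single Host block for ~/.ssh/config using the "ubuntu" user
# and the default SSH port.
ssh_config_entry() {
    printf 'Host %s\n' "$1"
    printf '    HostName %s\n' "$2"
    printf '    User ubuntu\n'
    printf '    Port 22\n'
    printf '    IdentityFile %s\n' "$3"
}
```

To append an entry, redirect the output into the config file, for example `ssh_config_entry master <your master's public dns> ~/.ssh/my-key.pem >> ~/.ssh/config`.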
## Export access keys as environment variables
#### If you are using Mac:

    user:~$ nano ~/.bash_profile

Copy the access key id and secret access key that you generated earlier within your IAM role <firstname-lastname-access-keys.txt> and paste them at the bottom of your .bash_profile file as follows:

    export AWS_ACCESS_KEY_ID=<your access key id>
    export AWS_SECRET_ACCESS_KEY=<your secret access key>

Source your .bash_profile as follows:

    user:~$ . ~/.bash_profile

#### If you are using Linux:

    user:~$ nano ~/.profile

Copy the access key id and secret access key that you generated earlier within your IAM role <firstname-lastname-access-keys.txt> and paste them at the bottom of your .profile file as follows:

    export AWS_ACCESS_KEY_ID=<your access key id>
    export AWS_SECRET_ACCESS_KEY=<your secret access key>

Source your .profile as follows:

    user:~$ . ~/.profile
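After sourcing the profile, it can help to confirm that both variables are visible in the current shell without printing the secrets themselves. This is a minimal sketch; the function name `check_aws_env` is hypothetical.

```shell
# check_aws_env
# Reports whether each access key variable is set, without echoing its value.
# Returns 0 when both are set and 1 when either is missing.
check_aws_env() {
    status=0
    if [ -n "${AWS_ACCESS_KEY_ID:-}" ]; then
        echo "AWS_ACCESS_KEY_ID is set"
    else
        echo "AWS_ACCESS_KEY_ID is missing" >&2
        status=1
    fi
    if [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "AWS_SECRET_ACCESS_KEY is set"
    else
        echo "AWS_SECRET_ACCESS_KEY is missing" >&2
        status=1
    fi
    return "$status"
}
```

Running `check_aws_env` in a new terminal is a quick way to verify that the profile file is being loaded.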