How to run H2O tests on EC2 clusters


Hi, here is an overview of manual EC2 test execution. -michal.

###Required configuration

  • EC2 ssh key (mrjenkins_test.pem - accessible via the AWS management console, or ask @michal)
  • EC2 API ID and KEY (also in the AWS management console, or ask @michal)
  • ssh access to an EC2 machine in the same region as the desired cluster (default: us-east-1)

The images you'll get are set up for Oracle Java 1.6: /usr/lib/jvm/java-6-sun-1.6.0.38/jre/lib/jsse.jar

They should have all required Python packages installed. If not, use 'sudo easy_install'.
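
For example, if an import fails when you run a test, install the missing package by name. psutil is one package the test harness uses (see the -ip option of test.py below):

sudo easy_install psutil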

Heads up on su/sudo on Ubuntu

These machines (unlike the 0xdata mr_0x* machines) don't have a password assigned to root, so you can't type 'su' and then give a root password. Well, you can, but it will never succeed, no matter what you type. This is the default on Ubuntu.

You have to use sudo in front of every command that needs root permissions. Alternatively, you can use 'sudo sh' to get a root shell. You have passwordless sudo, so you won't need to type in a password. See below for adding a user to the sudo group if you create a new user.
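
A quick way to confirm passwordless sudo is working:

sudo whoami    # should print 'root' with no password prompt
sudo sh        # or get a full root shell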

Also, don't shut down the initial ssh target described below. Only terminate cluster nodes, preferably using ec2_cmd.py (although it's okay to use any method).

###Review the costs

It's good to look at the costs for node startup (you get charged for a one-hour minimum), so you're okay with the costs of what you're doing. Normal use is not much, so don't worry. But you do want to understand the DRAM, cores, and network I/O bandwidth you're getting, for H2O test reasons (behavior varies with configuration).

Nice summary: http://www.ec2instances.info/

Amazon descriptive page: http://aws.amazon.com/ec2/

Good detail by Amazon on instance types: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html

  • M1 Large (m1.large): 7.5GB DRAM, 4 cores, and only 500 Mbps network (half of the typical 1GbE!). $0.24/hour.
  • M1 Extra Large (m1.xlarge): 15GB DRAM, 8 cores, 1GbE network. $0.48/hour.
  • High-Memory Double Extra Large (m2.2xlarge): 34.2GB DRAM (similar to our 32GB servers), 4 cores. $0.82/hour.

###Get set up on a single EC2 node first

For a manual run of EC2 tests you need to connect to one of the running EC2 instances. I use the following (the key was sent in the 'ec2 info' email from me). This is a us-east-1 machine, which is what you want.

ssh -i ~/.ec2/keys/mrjenkins_test.pem hduser@23.21.237.69

That public address doesn't work anymore, but you can also use the DNS addresses:

ssh -i ~/.ec2/keys/mrjenkins_test.pem hduser@ec2-184-73-55-110.compute-1.amazonaws.com

hduser has passwordless sudo, so you can create your own user, or ask michal. If you created an Ubuntu instance at EC2, the default username should be ubuntu, so you can try that also.

Another example (this is a us-west-1 region machine, which will create hassles for you, since we default all scripts to us-east-1 because it's cheaper; better to start with a us-east-1 machine, otherwise you'll be adding '-r us-west-1' to subsequent commands):

ssh -l kevin -i /home/kevin/.ec2/keys/0xdata_Big.pem ec2-50-18-147-48.us-west-1.compute.amazonaws.com

Note that you need to know the password for the username, as well as have the .pem files.

Understand EC2 regions.

Our test harness uses private IPs and requires the cluster, and the node you're dispatching from, to be in the same region. To use the defaults, you want to initially ssh to a us-east-1 machine as shown above. The s3/s3n files are accessible from either region, but you don't want your cluster building/test harness to straddle regions.
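
If you're not sure which region the node you're on is in, you can query the standard EC2 instance metadata service:

curl http://169.254.169.254/latest/meta-data/placement/availability-zone
# prints e.g. us-east-1a; drop the trailing letter to get the region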

If you want your own user.

This is useful because clusters get created with your username for easy identification. On Ubuntu it's best to use

sudo adduser <username>

see http://www.howtogeek.com/howto/ubuntu/add-a-user-on-ubuntu-server

You'll want to set up ~/.ec2 and ~/.ssh also. See below. You want to be able to sudo from this user, so do:

sudo usermod -a -G sudo <username>
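
If you copy ~/.ec2 and ~/.ssh from an existing user (hduser here, as an illustration), fix the ownership afterwards:

sudo cp -r /home/hduser/.ec2 /home/<username>/
sudo cp -r /home/hduser/.ssh /home/<username>/
sudo chown -R <username>:<username> /home/<username>/.ec2 /home/<username>/.ssh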

ec2_cmd.py setup

ec2_cmd.py requires two environment variables to be set. Check them. If they're not correct for the username you're using, put the correct values in .bashrc and source it. These env variables should be in the environment of the username (ubuntu?) that you used to ssh to the EC2 instance. The keys need to be kept out of anything that's pushed to git; they are equivalent to passwords.

x's are used to obscure here:

$ printenv | grep AWS
AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxN
AWS_ACCESS_KEY_ID=xxxxxxxxxxxxxxxxxQ

in the .bashrc

export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxN
export AWS_ACCESS_KEY_ID=xxxxxxxxxxxxxxxxxQ

Or, more simply, just get them from the .ec2 dir, as long as you copied it from an existing user with the right contents:

export AWS_ACCESS_KEY_ID=`head ~/.ec2/aws_id`
export AWS_SECRET_ACCESS_KEY=`head ~/.ec2/aws_key`

You also want these files in the right place for the username in use. You can get these files from michal, or copy them from an existing username.

~/.ec2/AwsCredentials.properties
~/.ec2/keys/mrjenkins_test.pem
~/.ec2/core-site.xml

See https://github.com/0xdata/h2o/wiki/H2O-and-S3 for the contents of core-site.xml, or you can copy it from an existing user (check that it has s3n info, s3 info, or both, depending on what you're doing).
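
ssh will refuse a private key file that other users can read, so after copying, it's worth tightening the permissions (a suggested setup):

chmod 700 ~/.ec2
chmod 400 ~/.ec2/keys/mrjenkins_test.pem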

Clone the h2o repository (at EC2, cloning is fast). There is not much space on / (run df). But if you go to /tmp and create /tmp/<username>, you can clone there:

git clone https://github.com/0xdata/h2o.git
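
For example, to clone under /tmp (the directory name is just a suggestion):

mkdir -p /tmp/<username>
cd /tmp/<username>
git clone https://github.com/0xdata/h2o.git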

Creating a cluster of EC2 instances

Go to h2o/py/ and create your 3/4/5... EC2 instances:

python ec2_cmd.py create --instances 5

If you're using a shared username like hduser, it's nice to use --name yourname so running instances from different people can be identified.

python ec2_cmd.py create --instances 5 --name kevin
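
To see whose instances are currently up, you can list the reservations (show_reservations is one of the ec2_cmd.py actions shown in the usage dump below):

python ec2_cmd.py show_reservations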

ec2_cmd.py will print some ssh commands for ssh'ing to the created instances with private IP addresses. You should try one. If it doesn't work, there are two possibilities:

  1. If you are in the us-west-1 region (e.g., zone us-west-1c) and the default creates instances in the us-east-1 region, you won't be able to run the test harness. Terminate the instances you created, using the created json:

     python ec2_cmd.py terminate --hosts ec2-config-r-<some number>.json
    

and create new ones in the us-west-1 region with:

    python ec2_cmd.py create --instances 5 -r us-west-1

or vice-versa if the region mismatch was the opposite case. It's okay to create us-west-1 instances if you are on an ec2 instance in us-west-1c.

  2. You may not have authorized_keys in your ~/.ssh. Copy authorized_keys from /home/ubuntu/.ssh on the ec2 node you are on. Be sure to chmod correctly:

     chmod 700 ~/.ssh
     chmod 400 ~/.ssh/authorized_keys
    

Then try the ssh with private ips (from the ec2_cmd.py create stdout) again. A useful alias in your .bashrc is:

alias essh='ssh -i ~/.ec2/keys/mrjenkins_test.pem'

Then you can do things like this:

essh ubuntu@ec2-54-242-71-176.compute-1.amazonaws.com

The ec2_cmd command will produce a file called ec2-config-r-<some number>.json, which is a description of the EC2 hosts and follows the same structure as the hosts file used by tests.

So if you want to change the number of JVMs per machine, you can edit that json file and modify this line. Some tests override the config json, so this is just the default if the test doesn't specify:

"h2o_per_host": 2,

Tests can override the json file settings when they build_cloud_with_hosts(). Another entry in the json file that's useful to modify is the Java heap size (but don't exceed available DRAM). Again, some tests override the default:

"java_heap_GB": 11,

Watch your commas in the json. The parser isn't user-friendly, and the most common error after hand-editing is a missing comma, or an extra comma at the end of a list (json can't tolerate a trailing comma after the last entry).
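
A quick way to catch comma mistakes after hand-editing is to run the file through Python's built-in json parser:

python -m json.tool ec2-config-r-<some number>.json > /dev/null && echo "json OK"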

Run some tests

You can run tests from testdir_ec2, testdir_single_jvm, testdir_multi_jvm, or testdir_hosts:

python testdir_ec2/test_rf_iris.py -cj ./ec2-config-r...json

or build a cloud for manual testing

python testdir_ec2/test_cloud_static.py -cj ./ec2-config-r...json

Or, if you want to use nosetests to run tests (to limit stdout on pass)

If you prefer nosetests, so stdout is captured and hidden unless there is an error, then cd into one of the directories above and cp the ../ec2-config-r...json to pytest_config-<username>.json, replacing <username> with whatever you're using.

Nosetests doesn't pass args to unittest easily, so we recognize the existence of that json file, and use it. You should see that confirmed with a print to stdout.
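
For example (reusing the json file name from the example further down this page, and the iris test from above):

cd testdir_ec2
cp ../ec2-config-r-0023e27d.json pytest_config-kevin.json
nosetests test_rf_iris.py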

Most people won't bother with this, but it's useful if you want to use the n0 or n1 lists of tests in those directories to run a series of tests. This mimics what jenkins does, although you don't need the xml summary like jenkins does.

n0.doit has:

nosetests $1.py --nologcapture --with-xunit --xunit-file=$1.nosetests.xml

so in n0 I can do:

./n0.doit test_badchars.py
./n0.doit test_billion_rows.py
./n0.doit test_cols_enum_multi_import.py
./n0.doit test_exec_covtype_cols.py

Changing the default values used by ec2_cmd.py

Just create a python dictionary in a file with the key/value pairs you want to change. You can use 'python ec2_cmd.py show_defaults' to see what names and values exist for defaults. E.g., you can put this in my_default_conf:

{"instance_type": "m3.2xlarge"}

It seems like we don't have a way of changing the virtualization type for instances that require HVM virtualization. Also note that the java heap size arg for h2o is changed for some instance_types in ec2_cmd.py. Check that you got what you expected.

Then:

python ec2_cmd.py create --config my_default_conf --instances 1
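
To see which names and values you're overriding (and their defaults), use the show_defaults action mentioned above:

python ec2_cmd.py show_defaults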

Can use pattern match for python tests

Once you get used to knowing what files are where, use shell glob patterns freely to minimize typing. Just make sure a pattern matches exactly one thing, and end your python patterns with .py so you don't match the .pyc files python creates. Also, put this in your .bashrc:

alias py=python

For example, when I'm in testdir_multi_jvm

py *browser*py -cj ../*1824*json

Remember to terminate your instances

After your manual testing, or before leaving the office, you should terminate the instances:

python ec2_cmd.py terminate --hosts ./ec2-config-r...json

Detail on ec2_cmd.py args

$ py ec2_cmd.py -h

usage: ec2_cmd.py [-h] [-c CONFIG] [-i INSTANCES] [-H HOSTS] [-r REGION]
              [--reservation RESERVATION] [--name NAME]
              [--timeout TIMEOUT] [--instance_type INSTANCE_TYPE]
              [--cmd CMD]

{help,demo,create,terminate,stop,reboot,start,distribute_h2o,
     start_h2o,show_defaults,dump_reservation,show_reservations,clean_tmp,nexec}

H2O EC2 instances launcher

positional arguments:
{help,demo,create,terminate,stop,reboot,start,distribute_h2o,
    start_h2o,show_defaults,dump_reservation,show_reservations,clean_tmp,nexec}

                    EC2 instances action!

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                    Configuration file to configure NEW EC2 instances (if
                    not specified default is used - see "show_defaults")
  -i INSTANCES, --instances INSTANCES
                    Number of instances to launch
  -H HOSTS, --hosts HOSTS
                    Hosts file describing existing "EXISTING" EC2
                    instances
  -r REGION, --region REGION
                    Specifies target create region
  --reservation RESERVATION
                    Reservation ID, for example "r-1824ec65"
  --name NAME           Name for launched instances
  --timeout TIMEOUT     Timeout in seconds.
  --instance_type INSTANCE_TYPE
                    Enforce a type of EC2 to launch (e.g., m2.2xlarge).
  --cmd CMD             Shell command to be executed by nexec.

###S3

Existing buckets (current state):

h2o-airlines (gzipped airlines data per year)
h2o-airlines-unpacked (all unzipped data per year + 12GB 1988-2008 file + 120GB file)
h2o-datasets (more or less the content of /home/0xdiag/datasets/)
h2o-smalldatasets (files from smalldatasets folder)

To upload files into S3, please use the S3 web management console (https://console.aws.amazon.com/s3/) or an installed s3cmd:

s3cmd put iris.csv s3://h2o-datasets/bflmpsvz.csv

More info about S3 via HDFS: https://github.com/0xdata/h2o/wiki/H2O-and-S3

More info about running EC2 tests: https://github.com/0xdata/h2o/wiki/How-to-run-H2O-tests-on-EC2-clusters

More info about our Jenkins tests: https://github.com/0xdata/h2o-test

Useful hints when running tests

Sometimes you'll notice an odd "browser" window that comes up if you don't disable it with -bd (you can exit it; it's a text-based browser thing, and you'll notice it's pointing to the cloud status). Not all tests pop a browser, but some do.

For instance, this is useful in testdir_multi_jvm

py test_GLM_covtype20x_hosts.py -cj ../ec2-config-r-0023e27d.json -bd

../ec2-config-r-0023e27d.json is the ec2 config json that michal's stuff created for me when I created the instances with

python ec2_cmd.py create --instances 5

Also, you can always do

python test.py -h

to get the available args

Running: python test.py
usage: test.py [-h] [-bd] [-b] [-v] [-ip IP] [-cj CONFIG_JSON] [-dbg] [-rud]
           [unittest_args [unittest_args ...]]



optional arguments:
-h, --help            show this help message and exit
-bd, --browse_disable
                      Disable any web browser stuff. Needed for batch.
                      nosetests and jenkins disable browser through other
                      means already, so don't need
-b, --browse_json     Pops a browser to selected json equivalent urls.
                      Selective. Also keeps test alive (and H2O alive) till
                      you ctrl-c. Then should do clean exit
-v, --verbose         increased output
-ip IP, --ip IP       IP address to use for single host H2O with psutil
                      control
-cj CONFIG_JSON, --config_json CONFIG_JSON
                      Use this json format file to provide multi-host
                      defaults. Overrides the default file
                      pytest_config-<username>.json. These are used only if
                      you do build_cloud_with_hosts()
-dbg, --debugger      Launch java processes with java debug attach
                      mechanisms
-rud, --random_udp_drop
                      Drop 20 pct. of the UDP packets at the receive side

Special cases that we'll simplify soon

A Gentle Reader asks Michal: "What's the right way to terminate this guy if I don't have a json file for him?"

Reservation : r-10952c6d
Instances   : 1
[i-d1d761b1 : m1.xlarge] RUNNING ec2-54-242-71-176.compute-1.amazonaws.com/54.242.71.176/10.191.67.44 <node_kevin>

Michal answers:

 ./ec2_cmd.py dump_reservation --reservation 'r-10952c6d' --hosts tmp.json

(apparently the quotes around the reservation id are needed?) and then:

./ec2_cmd.py terminate --hosts tmp.json