How to run H2O tests on EC2 clusters
Hi, here is an overview of manual EC2 test execution. -michal.
###Required configuration
- EC2 ssh key (mrjenkins_test.pem - accessible via AWS management or ask @michal)
- EC2 API ID and KEY (also in AWS management console or ask @michal)
- ssh access to EC2 machine in same region as the desired cluster (default: us-east-1)
The images you'll get are set up for Oracle Java 1.6: /usr/lib/jvm/java-6-sun-1.6.0.38/jre/lib/jsse.jar
They should have all the required Python packages installed. If not, use 'sudo easy_install'.
These machines (unlike the 0xdata mr_0x* machines) don't have a password assigned to root, so 'su' will never succeed, no matter what you type. This is the default on Ubuntu.
Instead, put sudo in front of every command that needs root permissions. Alternatively, you can use sudo sh to get a root shell. You have passwordless sudo, so you won't need to type in a password. See below for adding a user to the sudo group if you create a new user.
Also, don't shut down the initial ssh target below. Only terminate cluster nodes, preferably using ec2_cmd.py (although it's okay to use any method).
###Review the costs
It's good to look at the costs for node startup (you get charged for an hour minimum), so you're okay with the costs of what you're doing. Normal use doesn't cost much, so don't worry. But you do want to understand the DRAM, cores, and network IO bandwidth you're getting, for H2O test reasons (behavior varies with configuration).
Nice summary: http://www.ec2instances.info/
Amazon descriptive page: http://aws.amazon.com/ec2/
Good detail by Amazon on instance types: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html
M1 Large is only 500Mbps interconnect!! (half of the typical 1GE)
M1 Large (m1.large) has 7.5GB dram, 4 cores and 500 Mbps network. 0.24/hour
M1 Extra Large (m1.xlarge) has 1G network, and 15GB dram and 8 cores. 0.48/hour.
High-Memory Double Extra Large (m2.2xlarge) gets you 4 cores on 34.2GB (similar to our 32 GB servers) for 0.82/hour
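As a rough aid for the cost review above, here is a minimal Python sketch that estimates what a cluster run would cost. The prices and DRAM sizes are copied from the list above and may be out of date. The function name `estimate_cost` is made up for this example, and rounding partial hours up to a whole hour is an assumption based on the one-hour minimum charge.

```python
import math

# Prices (USD/hour) and DRAM sizes copied from the list above; they may be stale.
INSTANCE_INFO = {
    "m1.large": {"usd_per_hour": 0.24, "dram_gb": 7.5},
    "m1.xlarge": {"usd_per_hour": 0.48, "dram_gb": 15.0},
    "m2.2xlarge": {"usd_per_hour": 0.82, "dram_gb": 34.2},
}


def estimate_cost(instance_type, instances, hours):
    """Estimate the cost in USD for `instances` nodes running for `hours`.

    Assumption: partial hours are rounded up and at least one hour is billed.
    """
    billed_hours = max(1, math.ceil(hours))
    return INSTANCE_INFO[instance_type]["usd_per_hour"] * instances * billed_hours


# Example: five m1.xlarge nodes for two and a half hours.
print("%.2f USD" % estimate_cost("m1.xlarge", 5, 2.5))
```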
###Get setup on a single EC2 node first
To run EC2 tests manually, you need to connect to one of the running EC2 instances. I use the following (the key was sent in the 'ec2 info' email from me). This is a us-east-1 machine, which is what you want.
ssh -i ~/.ec2/keys/mrjenkins_test.pem hduser@23.21.237.69
This public address doesn't work anymore but you can use the dns addresses also:
ssh -i ~/.ec2/keys/mrjenkins_test.pem hduser@ec2-184-73-55-110.compute-1.amazonaws.com
hduser has passwordless sudo, so you can create your own user, or ask michal. If you created an Ubuntu instance at EC2, the default username should be ubuntu, so you can try that also.
Another example: (This is a us-west-1 region machine, which will create hassles for you since we default all scripts to us-east-1 because it's cheaper. Better to start with a us-east-1 machine, otherwise you'll be adding '-r us-west-1' for subsequent commands).
ssh -l kevin -i /home/kevin/.ec2/keys/0xdata_Big.pem ec2-50-18-147-48.us-west-1.compute.amazonaws.com
Note you need to know the password for the username, as well as have the pem files.
Our test harness uses private ips and requires the cluster, and the node you're dispatching from, to be in the same region. To use defaults, you want to initially ssh to a us-east-1 machine as shown above. The s3/s3n files are accessible from either, but you don't want your cluster building/test harness to straddle regions.
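To check quickly which region a machine is in before you build a cluster from it, a small sketch like this can guess the region from the public DNS name. It relies on the two hostname shapes shown in the ssh examples above (`compute-1.amazonaws.com` for us-east-1, and `<region>.compute.amazonaws.com` otherwise). The helper name `region_from_ec2_hostname` is hypothetical and not part of the h2o scripts.

```python
import re


def region_from_ec2_hostname(hostname):
    """Guess the EC2 region from a public DNS name, or return None if unknown."""
    # us-east-1 hosts use the legacy "compute-1" suffix.
    if hostname.endswith(".compute-1.amazonaws.com"):
        return "us-east-1"
    # Other regions embed the region name before ".compute.amazonaws.com".
    match = re.search(r"\.([a-z]{2}-[a-z]+-\d+)\.compute\.amazonaws\.com$", hostname)
    return match.group(1) if match else None


def same_region(host_a, host_b):
    """True only if both hostnames resolve to the same known region."""
    region_a = region_from_ec2_hostname(host_a)
    return region_a is not None and region_a == region_from_ec2_hostname(host_b)
```

If `same_region` is false for your dispatch node and a cluster node, expect the private ip ssh to fail.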
Creating your own user is useful because clusters get created with your username for easy identification. On Ubuntu it's best to use
sudo adduser <username>
see http://www.howtogeek.com/howto/ubuntu/add-a-user-on-ubuntu-server
You'll want to setup ~/.ec2 and ~/.ssh also. See below. You want to be able to sudo from this user. So do:
sudo usermod -a -G sudo <username>
The ec2_cmd.py requires two environment variables to be set. Check them. If they are not correct for the username you're using, put the correct values in the .bashrc and source .bashrc. These env variables should be in the environment of the username (ubuntu?) that you used to ssh to the ec2 instance. The keys need to be kept out of anything that's pushed to git. They are equivalent to passwords.
The x's obscure the real values here:
$ printenv | grep AWS
AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxN
AWS_ACCESS_KEY_ID=xxxxxxxxxxxxxxxxxQ
in the .bashrc
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxN
export AWS_ACCESS_KEY_ID=xxxxxxxxxxxxxxxxxQ
or more simply, just get it from the .ec2 dir, as long as you copy it from some existing user with the right stuff:
export AWS_ACCESS_KEY_ID=`head ~/.ec2/aws_id`
export AWS_SECRET_ACCESS_KEY=`head ~/.ec2/aws_key`
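If you'd rather check the environment from Python before running ec2_cmd.py, a minimal sketch could look like the following. The file names `aws_id` and `aws_key` come from the exports above. The function names are made up for this example, and it assumes each file holds just the key on a single line.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def load_aws_env(ec2_dir):
    """Read the two values from <ec2_dir>/aws_id and <ec2_dir>/aws_key."""
    values = {}
    for var, fname in zip(REQUIRED_VARS, ("aws_id", "aws_key")):
        with open(os.path.join(ec2_dir, fname)) as f:
            # Assumption: the file contains only the key, possibly with a newline.
            values[var] = f.read().strip()
    return values


def missing_aws_vars(environ):
    """Return the names of required variables that are unset or empty."""
    return [var for var in REQUIRED_VARS if not environ.get(var)]
```

Calling `missing_aws_vars(os.environ)` gives an empty list when both variables are set. Never print or commit the values themselves.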
You also want these files in the right place for the username in use. You can get these files from michal or copy from an existing username.
~/.ec2/AwsCredentials.properties
~/.ec2/keys/mrjenkins_test.pem
~/.ec2/core-site.xml
See https://github.com/0xdata/h2o/wiki/H2O-and-S3 for the contents of core-site.xml, or you can copy it from an existing user (check it has s3n info or s3 info or both, depending on what you're doing)
Clone the h2o repository (cloning at EC2 is fast). There is not much space on / (do df). But if you go to /tmp and create /tmp/username you can clone there?
git clone https://github.com/0xdata/h2o.git
Go to h2o/py/ and create your 3/4/5... EC2 instances:
python ec2_cmd.py create --instances 5
If you're using a shared username like hduser, it's nice to use --name yourname so running instances from different people can be identified.
python ec2_cmd.py create --instances 5 --name kevin
The ec2_cmd will print some ssh commands for ssh'ing to the created instances with private ip addresses. You should try one. If it doesn't work, there are two possibilities:
- If you are in a us-west-1 or us-west-1c region and the default creates instances in the us-east-1 region, you won't be able to run the test harness. Terminate the instances you created using the created json:
python ec2_cmd.py terminate --hosts ec2-config-r-<some number>.json
and create new ones in the us-west-1 region with:
python ec2_cmd.py create --instances 5 -r us-west-1
or vice-versa if the region mismatch was the opposite case. It's okay to create us-west-1 instances if you are on an ec2 instance in us-west-1c.
- You may not have authorized_keys in your ~/.ssh. Copy authorized_keys from /home/ubuntu/.ssh on the ec2 node you are on. Be sure to chmod correctly:
chmod 700 ~/.ssh
chmod 400 ~/.ssh/authorized_keys
Then try the ssh with private ips (from the ec2_cmd.py create stdout) again. A useful alias in your .bashrc is:
alias essh='ssh -i ~/.ec2/keys/mrjenkins_test.pem'
Then you can do things like this:
essh ubuntu@ec2-54-242-71-176.compute-1.amazonaws.com
The ec2_cmd command will produce a file called ec2-config-r-<some number>.json, which is a description of the EC2 hosts and follows the same structure as the host file used by tests.
So if you want to change the number of JVMs per machine, you can edit that json file and modify this line. Some tests override the config json, so this is just the default if the test doesn't specify.
"h2o_per_host": 2,
Tests can override the json file settings when they call build_cloud_with_hosts. Another entry in the json file that's useful to modify is the java heap size (but don't exceed available DRAM). Again, some tests override the default.
"java_heap_GB": 11,
Watch your commas in the json. The parser isn't user-friendly, and the most common error after hand-editing is a missing comma, or an extra comma at the end of a list of things (json can't tolerate an extra comma after the last entry).
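One way to sidestep comma mistakes is to change the file with Python's json module instead of by hand. The sketch below is only an illustration: `update_hosts_config` is a made-up helper, and the DRAM check assumes that each JVM on a host gets the full `java_heap_GB`, which you should confirm for your setup.

```python
import json


def update_hosts_config(path, h2o_per_host=None, java_heap_gb=None, dram_gb=None):
    """Load the ec2 config json, change the two entries, and write it back."""
    with open(path) as f:
        config = json.load(f)  # raises ValueError if the json is malformed

    if h2o_per_host is not None:
        config["h2o_per_host"] = h2o_per_host
    if java_heap_gb is not None:
        config["java_heap_GB"] = java_heap_gb

    if dram_gb is not None:
        # Assumption: total heap per host = JVMs per host * heap per JVM.
        needed = config.get("h2o_per_host", 1) * config.get("java_heap_GB", 0)
        if needed > dram_gb:
            raise ValueError("%s GB of heap exceeds %s GB of DRAM" % (needed, dram_gb))

    # Only write once the new values have passed the check.
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return config
```

Because json.dump produces the commas, the resulting file always parses.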
You can run tests from testdir_ec2, testdir_single_jvm, testdir_multi_jvm, or testdir_hosts:
python testdir_ec2/test_rf_iris.py -cj ./ec2-config-r...json
or build a cloud for manual testing
python testdir_ec2/test_cloud_static.py -cj ./ec2-config-r...json
If you prefer to use nosetests, so stdout is captured and hidden unless there is an error, then cd into one of the directories above, and cp the ../ec2-config-r...json to pytest_config-<username>.json, replacing username with whatever you're using.
Nosetests doesn't pass args to unittest easily, so we recognize the existence of that json file and use it. You should see that confirmed with a print to stdout.
If you go the nosetests route, you can use the following. Most people won't be bothered about this, but it's useful if you want to use the n0 or n1 list of tests in those directories to run a series of tests! This mimics what jenkins does, although you don't need the xml summary like jenkins does.
n0.doit has:
nosetests $1.py --nologcapture --with-xunit --xunit-file=$1.nosetests.xml
so in n0 I can do:
./n0.doit test_badchars.py
./n0.doit test_billion_rows.py
./n0.doit test_cols_enum_multi_import.py
./n0.doit test_exec_covtype_cols.py
To change the defaults used for new instances, just create a python dictionary in a file with the key/value pairs you want to change. You can use python ec2_cmd.py show_defaults to see what names and values exist for defaults.
For example, you can put this in my_default_conf:
{"instance_type": "m3.2xlarge"}
It seems like we don't have a way of changing the virtualization type for instances that require hvm virtualization. Also note that the java heap size arg for h2o is changed for some instance_types in ec2_cmd.py. Check that you got what you expected.
Then:
python ec2_cmd.py create --config my_default_conf --instances 1
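If you generate the config file from a script, something like this sketch could write it. It is an assumption that ec2_cmd.py accepts the text produced here. For string and number values, json.dumps output is also a valid Python dictionary literal, so the helper refuses other value types. `write_create_config` is a hypothetical name.

```python
import json


def write_create_config(path, overrides):
    """Write a dict of default overrides to `path` and return the text written."""
    for key, value in overrides.items():
        # Booleans and None are spelled differently in json and Python, so reject them.
        if isinstance(value, bool) or not isinstance(value, (str, int, float)):
            raise TypeError("unsupported value for %r: %r" % (key, value))
    text = json.dumps(overrides, indent=2)
    with open(path, "w") as f:
        f.write(text + "\n")
    return text
```

For example, `write_create_config("my_default_conf", {"instance_type": "m3.2xlarge"})` writes the same dictionary as the hand-written file above.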
Once you get used to knowing what files are where, use shell wildcard patterns freely to minimize typing. Just make sure each pattern matches one thing. And end your python patterns with .py so you don't match the .pyc files python creates. Also, do this in your .bashrc:
alias py=python
For example, when I'm in testdir_multi_jvm
py *browser*py -cj ../*1824*json
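To see what a wildcard pattern would match before you run it, a small check like this can help. It uses Python's glob module, which follows the same basic wildcard rules as the shell for `*`. The helper name `match_exactly_one` is made up for this sketch.

```python
import glob


def match_exactly_one(pattern):
    """Return the single file matching `pattern`, or raise if it matches 0 or 2+."""
    matches = sorted(glob.glob(pattern))
    if len(matches) != 1:
        raise ValueError(
            "pattern %r matched %d files: %r" % (pattern, len(matches), matches)
        )
    return matches[0]
```

A pattern like `*browser*` would also pick up a `.pyc` file and fail this check, while `*browser*.py` would not.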
After your manual testing, or before leaving the office, you should terminate the instances
python ec2_cmd.py terminate --hosts ./ec2-config-r...json
$ py ec2_cmd.py -h
usage: ec2_cmd.py [-h] [-c CONFIG] [-i INSTANCES] [-H HOSTS] [-r REGION]
[--reservation RESERVATION] [--name NAME]
[--timeout TIMEOUT] [--instance_type INSTANCE_TYPE]
[--cmd CMD]
{help,demo,create,terminate,stop,reboot,start,distribute_h2o,
start_h2o,show_defaults,dump_reservation,show_reservations,clean_tmp,nexec}
H2O EC2 instances launcher
positional arguments:
{help,demo,create,terminate,stop,reboot,start,distribute_h2o,
start_h2o,show_defaults,dump_reservation,show_reservations,clean_tmp,nexec}
EC2 instances action!
optional arguments:
-h, --help show this help message and exit
-c CONFIG, --config CONFIG
Configuration file to configure NEW EC2 instances (if
not specified default is used - see "show_defaults")
-i INSTANCES, --instances INSTANCES
Number of instances to launch
-H HOSTS, --hosts HOSTS
Hosts file describing existing "EXISTING" EC2
instances
-r REGION, --region REGION
Specifies target create region
--reservation RESERVATION
Reservation ID, for example "r-1824ec65"
--name NAME Name for launched instances
--timeout TIMEOUT Timeout in seconds.
--instance_type INSTANCE_TYPE
Enfore a type of EC2 to launch (e.g., m2.2xlarge).
--cmd CMD Shell command to be executed by nexec.
###S3
Existing buckets (current state):
h2o-airlines (gzipped airlines data per year)
h2o-airlines-unpacked (all unzipped data per year + 12GB 1988-2008 file + 120GB file)
h2o-datasets (more or less the content of /home/0xdiag/datasets/)
h2o-smalldatasets (files from smalldatasets folder)
To upload files into S3, please use the S3 web management console (https://console.aws.amazon.com/s3/) or the installed s3cmd:
s3cmd put iris.csv s3://h2o-datasets/bflmpsvz.csv
More info about S3 via HDFS: https://github.com/0xdata/h2o/wiki/H2O-and-S3
More info about running EC2 tests: https://github.com/0xdata/h2o/wiki/How-to-run-H2O-tests-on-EC2-clusters
More info about our Jenkins tests: https://github.com/0xdata/h2o-test
Sometimes you'll notice an odd "browser" window that comes up if you don't pass -bd. You can exit it; it's a text-based browser, and you'll notice it's pointing to the cloud status. Not all tests pop a browser, but some do.
For instance, this is useful in testdir_multi_jvm
py test_GLM_covtype20x_hosts.py -cj ../ec2-config-r-0023e27d.json -bd
../ec2-config-r-0023e27d.json is the ec2 config json that michal's stuff created for me when I created the instances with
python ec2_cmd.py create --instances 5
Also, you can always do
python test.py -h
to get the available args
Running: python test.py
usage: test.py [-h] [-bd] [-b] [-v] [-ip IP] [-cj CONFIG_JSON] [-dbg] [-rud]
[unittest_args [unittest_args ...]]
optional arguments:
-h, --help show this help message and exit
-bd, --browse_disable
Disable any web browser stuff. Needed for batch.
nosetests and jenkins disable browser through other
means already, so don't need
-b, --browse_json Pops a browser to selected json equivalent urls.
Selective. Also keeps test alive (and H2O alive) till
you ctrl-c. Then should do clean exit
-v, --verbose increased output
-ip IP, --ip IP IP address to use for single host H2O with psutil
control
-cj CONFIG_JSON, --config_json CONFIG_JSON
Use this json format file to provide multi-host
defaults. Overrides the default file
pytest_config-<username>.json. These are used only if
you do build_cloud_with_hosts()
-dbg, --debugger Launch java processes with java debug attach
mechanisms
-rud, --random_udp_drop
Drop 20 pct. of the UDP packets at the receive side
A Gentle Reader asks: Michal: "What's the right way to terminate this guy if I don't have a json file for him?"
Reservation : r-10952c6d
Instances : 1
[i-d1d761b1 : m1.xlarge] RUNNING ec2-54-242-71-176.compute-1.amazonaws.com/54.242.71.176/10.191.67.44 <node_kevin>
Michal answers:
./ec2_cmd.py dump_reservation --reservation 'r-10952c6d' --hosts tmp.json
(apparently the quotes around the reservation name are needed?) and then
./ec2_cmd.py terminate --hosts tmp.json
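The two steps in Michal's answer can be wrapped in a small script. This is only a sketch: the function names are hypothetical, it assumes you run it from h2o/py where ec2_cmd.py lives, and it uses only the dump_reservation and terminate actions and the --reservation and --hosts flags shown in the help output above. Passing the arguments as a list avoids any shell quoting of the reservation id.

```python
import subprocess
import sys


def terminate_by_reservation_cmds(reservation_id, hosts_file="tmp.json"):
    """Build the two ec2_cmd.py command lines: dump the reservation, then terminate."""
    base = [sys.executable, "ec2_cmd.py"]
    dump = base + ["dump_reservation", "--reservation", reservation_id, "--hosts", hosts_file]
    terminate = base + ["terminate", "--hosts", hosts_file]
    return [dump, terminate]


def terminate_by_reservation(reservation_id, hosts_file="tmp.json", cwd="."):
    """Run both commands in order, stopping if the first one fails."""
    for cmd in terminate_by_reservation_cmds(reservation_id, hosts_file):
        subprocess.check_call(cmd, cwd=cwd)
```

For the reservation above, that would be `terminate_by_reservation("r-10952c6d")`.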