mmalohlava edited this page Apr 12, 2013 · 13 revisions

# H2O and S3

S3 general information

  • S3 bucket names should not contain underscores.
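
A minimal sketch of a pre-flight check for the rule above, in Python. The helper name `bucket_name_ok` is hypothetical, and the check covers only the underscore rule stated on this page, not the full set of S3 naming rules.

```python
def bucket_name_ok(name: str) -> bool:
    """Return True if the bucket name is non-empty and contains no underscore.

    Hypothetical helper: only the underscore rule from this page is checked.
    """
    return bool(name) and "_" not in name


if __name__ == "__main__":
    # Example names; the second one violates the rule.
    for candidate in ("h2o-datasets", "h2o_datasets"):
        print(candidate, bucket_name_ok(candidate))
```

Running such a check before starting H2O may catch a badly named bucket earlier than a failed import would.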

S3 command line parameters

  • --aws_credentials - specifies the location of the file containing the AWS access credentials (access key and secret key)

      java -Xms60G -Xmx60G -XX:MaxDirectMemorySize=1g -ea -jar h2o.jar --aws_credentials=~/.ec2/AwsCredentials.properties
    

Note: -XX:MaxDirectMemorySize=1g is a JVM option that Michal uses. It is not yet confirmed whether it is necessary (open question: someone should confirm it or remove it from this page).

Expected format of AwsCredentials

The AwsCredentials.properties file should have the following format:

accessKey=<put here your access key>
secretKey=<put here your secret key>
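
As a rough sketch, the file can be checked for the two expected keys before H2O is launched. The function names `parse_properties` and `missing_keys` are hypothetical, and the parser assumes only the simple `key=value` shape shown above, not the full Java properties syntax.

```python
REQUIRED_KEYS = ("accessKey", "secretKey")


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines; blank lines and '#' or '!' comments are skipped."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' separator
            props[key.strip()] = value.strip()
    return props


def missing_keys(text: str) -> list:
    """Return the required keys that are absent or have an empty value."""
    props = parse_properties(text)
    return [k for k in REQUIRED_KEYS if not props.get(k)]


if __name__ == "__main__":
    # Placeholder values, not real credentials.
    sample = "accessKey=EXAMPLE_ID\nsecretKey=EXAMPLE_SECRET\n"
    print(missing_keys(sample))
```

In practice the text would be read from the file passed to --aws_credentials, and a non-empty result would indicate which key still needs to be filled in.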

Access S3N via Hadoop HDFS

  1. Set up core-site.xml according to the Hadoop documentation at http://wiki.apache.org/hadoop/AmazonS3:
<property>
  <name>fs.default.name</name>
  <value>s3://BUCKET</value>
</property>

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>ID</value>
</property>

<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>SECRET</value>
</property>

For S3N, replace s3 with s3n in both the URI scheme and the property names.

  2. Append the following options to the H2O command line: -hdfs=hdfs://10.78.14.235:9000 -hdfs_version=0.20.2

(Note: hdfs+s3n URIs are tested only with hdfs_version=0.20.2. As far as is known, no other version has been tested.)

The customer's use case is the one that relies on core-site.xml. It is also the simplest one:

      $ java -Xmx4g -jar target/h2o.jar --hdfs_config=core-site.xml

and core-site.xml looks like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
  <name>fs.default.name</name>
  <value>s3n://h2o-datasets</value>
</property>

<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>ID</value>
</property>

<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>Secret</value>
</property>
</configuration>
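
If the file needs to be produced for several buckets or for both schemes, it could be generated rather than edited by hand. The following is a minimal Python sketch, not part of H2O: the function name `build_core_site` is hypothetical, and it emits only the three properties shown on this page (without the stylesheet line or the comment).

```python
import xml.etree.ElementTree as ET


def build_core_site(bucket: str, access_key: str, secret_key: str,
                    scheme: str = "s3n") -> str:
    """Return core-site.xml text with the three properties shown on this page."""
    if scheme not in ("s3", "s3n"):
        raise ValueError("scheme must be 's3' or 's3n'")
    # The scheme appears in the default filesystem URI and in both key names.
    settings = [
        ("fs.default.name", f"{scheme}://{bucket}"),
        (f"fs.{scheme}.awsAccessKeyId", access_key),
        (f"fs.{scheme}.awsSecretAccessKey", secret_key),
    ]
    root = ET.Element("configuration")
    for name, value in settings:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # ElementTree escapes characters such as '&' and '<' in the values.
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0"?>\n' + body + "\n"


if __name__ == "__main__":
    # Placeholder credentials, matching the example above.
    print(build_core_site("h2o-datasets", "ID", "Secret"))
```

The returned text can be written to core-site.xml and passed to H2O with --hdfs_config=core-site.xml as shown earlier. The output is on a single line, which is still valid XML.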

Then, from the Data > Import HDFS tab, type the following in the text box:

s3n:// 

You should see auto-completion of the S3 buckets available over HDFS. Pick one with the mouse, or use the down arrow and Enter to select it.

S3 best practices
