This section includes the steps for setting up the prerequisites for all bigdata-integration examples.
This section includes step-by-step procedures for installing Hadoop 1.2.1 on RHEL 6 and configuring a single-node setup. First, verify the OS and Java versions:
$ uname -a
Linux kylin.xx.com 2.6.32-431.20.3.el6.x86_64 #1 SMP Fri Jun 6 18:30:54 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
$ java -version
java version "1.7.0_60"
Java(TM) SE Runtime Environment (build 1.7.0_60-b19)
Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)
Download and extract Hadoop 1.2.1:
$ wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
$ tar -xvf hadoop-1.2.1.tar.gz
$ cd hadoop-1.2.1
Edit 'conf/hadoop-env.sh', uncomment the JAVA_HOME line, and make sure it points to a valid Java home:
export JAVA_HOME=/usr/java/jdk1.7.0_60
NOTE: Hadoop 1.2.1 requires Java 1.6 or higher.
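To confirm the setting points at a real JDK (the path below is this guide's sample location; adjust it to your machine), the java binary under it can be run directly:
$ /usr/java/jdk1.7.0_60/bin/java -version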
Edit 'conf/core-site.xml' and add the following properties inside the <configuration> element:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/kylin/tmp/hadoop</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/home/kylin/tmp/hadoop/name</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
NOTE: adjust the values to match your own environment.
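Hadoop lays out most of these directories on demand ('dfs.name.dir' is initialized by the format step below), but pre-creating the local tmp directory makes permission problems easier to spot. A minimal sketch, assuming the sample path above:
$ mkdir -p /home/kylin/tmp/hadoop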
Edit 'conf/mapred-site.xml' and add the following property inside the <configuration> element:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
Format a new distributed filesystem by executing:
$ ./bin/hadoop namenode -format
Start all Hadoop services by executing:
$ ./bin/start-all.sh
NOTE: five Java processes should now be running, one for each of the five started services: NameNode, SecondaryNameNode, DataNode, JobTracker, and TaskTracker. Execute 'jps -l' to check the Java processes:
$ jps -l
4056 org.apache.hadoop.hdfs.server.namenode.NameNode
4271 org.apache.hadoop.hdfs.server.datanode.DataNode
4483 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
4568 org.apache.hadoop.mapred.JobTracker
4796 org.apache.hadoop.mapred.TaskTracker
NOTE: the NameNode, JobTracker, and TaskTracker each provide a web console for viewing and monitoring the service. Web access URLs for the services:
http://localhost:50030/ for the JobTracker
http://localhost:50070/ for the NameNode
http://localhost:50060/ for the TaskTracker
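The consoles can also be probed from a shell, for example with curl (assuming curl is installed; any HTTP client will do):
$ curl -s http://localhost:50070/ | head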
Stop all Hadoop services by executing:
$ ./bin/stop-all.sh
This section includes step-by-step procedures for installing Apache Hive 1.2.1 and setting up HiveServer2.
Hadoop is a prerequisite; refer to the steps above to install and start Hadoop.
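The Hive binary tarball can be fetched from any Apache mirror; the archive URL below is one assumed location:
$ wget https://archive.apache.org/dist/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz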
$ tar -xvf apache-hive-1.2.1-bin.tar.gz
$ cd apache-hive-1.2.1-bin
Create a 'hive-env.sh' under 'conf'
$ cd conf/
$ cp hive-env.sh.template hive-env.sh
$ vim hive-env.sh
Uncomment HADOOP_HOME and make sure it points to a valid Hadoop home, for example:
HADOOP_HOME=/home/kylin/server/hadoop-1.2.1
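A quick way to check the value is valid (the path is this guide's sample) is to run 'hadoop version' from it:
$ /home/kylin/server/hadoop-1.2.1/bin/hadoop version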
Navigate to the Hadoop home, then create '/tmp' and '/user/hive/warehouse' in HDFS and make them group-writable before running Hive:
$ ./bin/hadoop fs -mkdir /tmp
$ ./bin/hadoop fs -mkdir /user/hive/warehouse
$ ./bin/hadoop fs -chmod g+w /tmp
$ ./bin/hadoop fs -chmod g+w /user/hive/warehouse
$ ./bin/hadoop fs -chmod 777 /tmp/hive
NOTE: restarting the Hadoop services is needed here; this avoids 'java.io.IOException: Filesystem closed' errors from DFSClient.checkOpen.
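To verify the directories and permissions, they can be listed back from HDFS (run from the Hadoop home):
$ ./bin/hadoop fs -ls /
$ ./bin/hadoop fs -ls /user/hive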
Create a 'hive-site.xml' file under the 'conf' folder:
$ cd apache-hive-1.2.1-bin/conf/
$ touch hive-site.xml
Edit 'hive-site.xml' and add the following content:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.server2.thrift.min.worker.threads</name>
    <value>5</value>
  </property>
  <property>
    <name>hive.server2.thrift.max.worker.threads</name>
    <value>500</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>0.0.0.0</value>
  </property>
</configuration>
NOTE: there are other optional properties; refer to the Hive wiki page 'Setting+Up+HiveServer2' for more details.
Start HiveServer2 from the Hive home by executing:
$ ./bin/hiveserver2
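Once HiveServer2 is up, one way to smoke-test it is Beeline, which ships in the same Hive distribution. The connection URL assumes the port and bind host configured above, with no authentication set up:
$ ./bin/beeline -u jdbc:hive2://localhost:10000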
The following steps show how to install Apache Spark and start the Thrift JDBC/ODBC server.
$ tar -xvf spark-1.4.0-bin-hadoop2.4.tgz
$ cd spark-1.4.0-bin-hadoop2.4
$ ./sbin/start-thriftserver.sh
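The Thrift server speaks the HiveServer2 protocol, so Spark's bundled Beeline can smoke-test it. Note that it also listens on port 10000 by default; if HiveServer2 from the previous section is still running, a different port can be passed at startup (10001 below is an arbitrary choice):
$ ./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001
$ ./bin/beeline -u jdbc:hive2://localhost:10001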