The steps below were tested on Ubuntu Linux 10 as the user 'yarn'; any other user works just as well.
- Download Hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/common/. This document uses the hadoop-2.3.0 release.
- Extract Hadoop into a directory. In this case I created a hadoop directory under my home directory and extracted the archive there (a download-and-extract sketch follows).
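One way to fetch and extract the release (the archive URL below is an assumption; any mirror that still carries the 2.3.0 tarball works):
# download the 2.3.0 tarball (URL assumed; adjust to your mirror)
wget http://archive.apache.org/dist/hadoop/common/hadoop-2.3.0/hadoop-2.3.0.tar.gz
# extract into /home/yarn/hadoop, giving /home/yarn/hadoop/hadoop-2.3.0
mkdir -p /home/yarn/hadoop
tar -xzf hadoop-2.3.0.tar.gz -C /home/yarn/hadoop
- Set the following environment variables, for example in the yarn user's ~/.bashrc (adjust JAVA_HOME and the Hadoop paths to match your installation):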
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk/
export HADOOP_HOME=/home/yarn/hadoop/hadoop-2.3.0
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=/home/yarn/hadoop/hadoop-2.3.0
export HADOOP_HDFS_HOME=/home/yarn/hadoop/hadoop-2.3.0
export HADOOP_CONF_DIR=/home/yarn/hadoop/hadoop-2.3.0/etc/hadoop
export YARN_CONF_DIR=$HADOOP_CONF_DIR
export PATH=$PATH:$HADOOP_HOME/bin
- Check that Hadoop is installed properly by running hadoop version. The output should look similar to this:
Hadoop 2.3.0
Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1567123
Compiled by jenkins on 2014-02-11T13:40Z
Compiled with protoc 2.5.0
From source with checksum dfe46336fbc6a044bc124392ec06b85
This command was run using /home/yarn/hadoop/hadoop-2.3.0/share/hadoop/common/hadoop-common-2.3.0.jar
- Hadoop is configured through the XML files in the etc/hadoop directory of the installation. For a single-node cluster the important ones to configure are listed below.
mapred-site.xml
Go to $HADOOP_HOME/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
Open mapred-site.xml and add the following properties between the <configuration> tags (the local and temp directories must exist at the given locations; see the sketch after the XML):
<property>
<name>mapreduce.cluster.temp.dir</name>
<value>file:/home/yarn/hadoop/temp</value>
<description>No description</description>
<final>true</final>
</property>
<property>
<name>mapreduce.cluster.local.dir</name>
<value>file:/home/yarn/hadoop/local</value>
<description>No description</description>
<final>true</final>
</property>
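The directories referenced above need to exist before jobs run; a minimal sketch for the paths used in this walkthrough:
# create the MapReduce temp and local directories named in mapred-site.xml
mkdir -p /home/yarn/hadoop/temp
mkdir -p /home/yarn/hadoop/local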
yarn-site.xml
Open yarn-site.xml and add the following properties between the <configuration> tags:
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8000</value>
<description>host is the hostname of the resource manager and
port is the port on which the NodeManagers contact the Resource Manager.
</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8001</value>
<description>host is the hostname of the resourcemanager and port is the port
on which the Applications in the cluster talk to the Resource Manager.
</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
<description>In case you do not want to use the default scheduler</description>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8002</value>
<description>the host is the hostname of the ResourceManager and the port is the port on
which the clients can talk to the Resource Manager. </description>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:/home/yarn/hadoop/nodemanager</value>
<description>the local directories used by the nodemanager</description>
</property>
<property>
<name>yarn.nodemanager.address</name>
<value>localhost:8003</value>
<description>the nodemanagers bind to this port</description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>10240</value>
<description>the amount of memory available to the NodeManager, in MB (10 GB here)</description>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>file:/home/yarn/hadoop/app-logs</value>
<description>directory on hdfs where the application logs are moved to </description>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>file:/home/yarn/hadoop/app-logs</value>
<description>the directories used by Nodemanagers as log directories</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>shuffle service that needs to be set for Map Reduce to run </description>
</property>
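As with mapred-site.xml, the NodeManager local and log directories referenced above should exist; a minimal sketch for the paths used here:
# create the NodeManager local and log directories named in yarn-site.xml
mkdir -p /home/yarn/hadoop/nodemanager
mkdir -p /home/yarn/hadoop/app-logs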
capacity-scheduler.xml: make the following changes, which define two queues, unfunded and default, each with 50% of the cluster capacity
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>unfunded,default</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.capacity</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.unfunded.capacity</name>
<value>50</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>50</value>
</property>
- Start the ResourceManager and NodeManager
cd /home/yarn/hadoop/hadoop-2.3.0/sbin
./yarn-daemon.sh start resourcemanager
./yarn-daemon.sh start nodemanager
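A quick sanity check (assuming the JDK's jps tool is on the PATH) is to list the running Java processes; both ResourceManager and NodeManager should appear. If either is missing, look at the .log files under $HADOOP_HOME/logs.
jps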
- Run the example. Go to the Hadoop installation directory and run:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar randomwriter out
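Since this walkthrough does not configure HDFS, the default filesystem is the local one, so the job output should land in a local out directory under the installation folder (an assumption based on the default fs setting); one way to inspect it:
bin/hadoop fs -ls out
The application can also be watched while it runs in the ResourceManager web UI, which listens on port 8088 by default (http://localhost:8088).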