Setting up Hadoop Locally on Mac
Often times we wish to test or scripts locally before pushing our code to production machine and going through 2 factor authentication. This file documents the steps that have worked for me.
Make sure we enable Remote Login
in System Preference -> Sharing
.
Download Hadoop and modify some configurations.
The following steps goes through setting up Hadoop in pseudo-distributed mode.
In this mode, Hadoop runs on a single node and each Hadoop daemon runs in a separate Java process.
Download Hadoop 2.8.1, you don't have to pick this version.
Unpack the tar file and save it to our favorite location. In this case it will be
~/hadoop-2.8.1
.
Next we'll edit the some config files under ~/hadoop-2.8.1/etc/hadoop/
.
hdfs-site.xml
, HDFS's default backup file number is 3, we will change it to 1.
core-site.xml
, Configure HDFS's port number.
mapred-site.xml
, we can configure Hadoop to use Yarn as the resource manager framework.
yarn-site.xml
Format and start HDFS and Yarn, remember to change the part to your user name.
stop HDFS and Yarn after we are done.
None mandatory steps.
We can add HADOOP_HOME
environment variable to .bashrc
or .bash_profile
for future use.
In the future, we can start and close Hadoop under the root directory rather than going under the hadoop directory every time.