Installing Hadoop on Windows 8.1 with Visual Studio 2010 Professional

While looking for a matrix system that works like gensim in Python, I discovered Mahout. I wanted to test it against the Universal Java Matrix Package, so I decided to give the install a try. Unfortunately, that turned into a long, side-tracked road that went well beyond the requirements listed in the install file.

In the end, I found a way that did not require Cygwin and that went smoothly without building any packages in the Visual Studio IDE.

Installation instructions follow.

The Hadoop download is available from a variety of mirrors.

The x64 Platform

Before starting, understand that Hadoop targets an x64 system. x86-64 works as well, and using 32-bit installations of CMake and the JDK will not harm the project. However, Hadoop is a 64-bit program and requires the Visual Studio 10 2010 Win64 generator to compile the HDFS project files.

Uninstall Visual Studio 2010 Express and Redistributables

Visual Studio 2010 Express uses a C++ redistributable that will cause the Windows SDK command prompt to fail and will also conflict with parts of the build when using the Visual Studio Command Prompt.

Requirements

The following are required, in no particular order.

  1. Microsoft Visual Studio 2010 Professional with C++
  2. .NET Framework 4.0
  3. Zlib
  4. Maven (most recent version)
  5. MSBuild
  6. CMake
  7. protoc
  8. Java JDK 1.7

Path Variables

The following must be on your PATH. Order matters only if you have Cygwin installed and wish to keep using it: place Visual Studio ahead of Cygwin so that Cygwin's copy of cmake, which will not work for this task, does not get in the way. If you go that route, it is better to simply delete cmake from Cygwin and use the Windows version of CMake.

  1. MSBuild
  2. CMake
  3. Visual Studio 2010
  4. Zlib
  5. protoc
  6. java
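
Before starting a long build, it can save time to confirm that these tools actually resolve from the PATH. The following is a minimal sketch, not part of the Hadoop build itself. The executable names (msbuild, cmake, cl for the Visual Studio compiler, protoc, java) are my assumptions about what each entry provides, and Zlib is left out because it is a library rather than a command.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
# Zlib is a library rather than a command, so it is not checked here.
REQUIRED_TOOLS = ["msbuild", "cmake", "cl", "protoc", "java"]


def find_missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

Run it from the same command prompt you intend to build in, since the PATH can differ between prompts.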

Environment Variables

The following can be set for an individual Command Prompt session.

  1. JAVA_HOME=path to jdk
  2. M2_HOME=path to maven
  3. VCTargetsPath=path to MSBuild\Microsoft.Cpp\v4.0, or another valid path to the C++ properties files
  4. Platform=x64
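
A quick check of these variables can catch a missing or mistyped value before Maven does. This is only a sketch: the variable names come from the list above, while the function name and the wording of the messages are my own.

```python
import os

# Variable names are taken from the list above; the check itself is a sketch.
REQUIRED_VARS = ["JAVA_HOME", "M2_HOME", "VCTargetsPath", "Platform"]


def check_build_env(env):
    """Return a list of human-readable problems found in `env`."""
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(name + " is not set")
    # The build targets 64-bit, so Platform should be x64 when present.
    platform = env.get("Platform")
    if platform and platform != "x64":
        problems.append("Platform is " + platform + ", expected x64")
    return problems


if __name__ == "__main__":
    for problem in check_build_env(os.environ):
        print(problem)
```

The function takes a plain mapping rather than reading os.environ directly, so it can be tried out with a hand-written dictionary first.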

Run the Build

Open a Visual Studio 2010 Win64 command prompt and run the following command.

mvn package -Pdist,native-win -DskipTests -Dtar
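
If you would rather launch the build from a script, for example to run checks first, the same command can be started from Python. This is a sketch under a few assumptions: the script is run from the top of the unzipped Hadoop directory, inside the Visual Studio 2010 Win64 command prompt, and Maven is on the PATH. The function name is hypothetical.

```python
import shutil
import subprocess

# Arguments copied from the command above.
MAVEN_ARGS = ["package", "-Pdist,native-win", "-DskipTests", "-Dtar"]


def build_command(maven="mvn"):
    """Return the full command line as a list suitable for subprocess."""
    return [maven] + MAVEN_ARGS


if __name__ == "__main__":
    # shutil.which consults PATHEXT on Windows, so it can find the
    # Maven launcher script even though it is not an .exe file.
    maven = shutil.which("mvn")
    if maven is None:
        raise SystemExit("mvn was not found on PATH")
    result = subprocess.run(build_command(maven))
    raise SystemExit(result.returncode)
```

Passing the arguments as a list keeps the comma in -Pdist,native-win from being reinterpreted by a shell.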

Resulting Files

The following files should appear under hadoop-dist/target in your unzipped Hadoop directory.

  1. hadoop-2.6.X.tar
  2. hadoop-dist-2.6.X.jar
  3. dist-layout-stitching.sh
  4. dist-tar-stitching.sh
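
To confirm the build produced everything, the directory listing can be compared against the names above. This is a small sketch that assumes it is run from the top of the unzipped Hadoop directory. The patterns use "2.6.*" in place of 2.6.X, and the function name is my own.

```python
import fnmatch
import os

# Patterns mirror the file names listed above; "2.6.*" stands in for 2.6.X.
EXPECTED_PATTERNS = [
    "hadoop-2.6.*.tar",
    "hadoop-dist-2.6.*.jar",
    "dist-layout-stitching.sh",
    "dist-tar-stitching.sh",
]


def missing_artifacts(names, patterns=EXPECTED_PATTERNS):
    """Return the patterns that match none of the file names in `names`."""
    return [p for p in patterns if not fnmatch.filter(names, p)]


if __name__ == "__main__":
    target = os.path.join("hadoop-dist", "target")
    missing = missing_artifacts(os.listdir(target))
    if missing:
        print("Missing from " + target + ": " + ", ".join(missing))
    else:
        print("All expected files are present in " + target + ".")
```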

Special Thanks

Special thanks to the IT admin and security professional contractor at Hygenics Data LLC for the copy of Microsoft Visual Studio 2010.

Happy hadooping or being a Mahout.
