Looking for a matrix system that works like gensim in Python, I discovered Mahout. Wanting to test it against the Universal Java Matrix Package, I decided to give the install a try. That, unfortunately, was a long, side-tracked road going well beyond the requirements listed in the install file.
In the end, I found a way that did not require Cygwin and went smoothly without building packages in the Visual Studio IDE.
Installation instructions follow.
It is possible to download the hadoop install from a variety of mirrors.
The x64 Platform
Before starting, understand that hadoop targets an x64 system. x86-x64 works as well, and using 32-bit installations of CMake and the JDK will not harm the project. However, it is a 64-bit program and requires a Visual Studio 10 2010 Win 64 generator to compile the hdfs project files.
Uninstall Visual Studio 2010 Express and Redistributables
Visual Studio 2010 Express uses a C++ redistributable that will cause the command prompt for the Windows SDK to fail and will also conflict with parts of the build when using the Visual Studio Command Prompt.
The following requirements are necessary in no particular order.
- Microsoft Visual Studio 2010 Professional with C++
- .NET Framework 4.0
- Most recent Maven
- Java JDK 1.7
The following must be in your path. Order matters only if you have Cygwin and wish to keep using it: place MS Visual Studio before Cygwin so that Cygwin's copy of cmake, which will not work for this task, is not picked up. If this is the path you choose, it is better to just delete cmake from Cygwin and use the Windows version instead.
- Visual Studio 2010
The following can be set for an individual instance of command prompt.
- JAVA_HOME=path to jdk
- M2_HOME=path to maven
- VCTargetsPath=set to MSBuild/Microsoft.CPP/4.0 or other valid path to the CPP properties file
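Before kicking off a long build, it can save time to confirm that the prompt is actually set up. The following is only a minimal sketch of such a check, not part of the hadoop build itself. The three variable names come from the list above; the function name `preflight` is made up for illustration, and checking for `mvn` and `cmake` on the path is my assumption based on the tools mentioned in this post.

```python
import os
import shutil

# Variable names are taken from the list above.
REQUIRED_VARS = ("JAVA_HOME", "M2_HOME", "VCTargetsPath")
# Assumption: these are the command-line tools the build needs to find.
REQUIRED_TOOLS = ("mvn", "cmake")


def preflight(env=None, which=shutil.which, exists=os.path.exists):
    """Return a list of problems found; an empty list means nothing was flagged."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not exists(value):
            problems.append(f"{name} points to a missing path: {value}")
    for tool in REQUIRED_TOOLS:
        # shutil.which searches PATH (and PATHEXT on Windows) for the tool.
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("PROBLEM:", issue)
    if not issues:
        print("Environment looks ready for the build.")
```

Run it from the same command prompt you intend to build in, since the variables above may only be set for that one instance.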
Run the Build
Open up a Visual Studio 2010 Win 64 command prompt and type the following command.
mvn package -Pdist,native-win -DskipTests -Dtar
The build output should appear in your unzipped hadoop directory under hadoop-dist/target.
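To see what the build left behind, a small sketch like the one below lists whatever sits in hadoop-dist/target. The function name `list_build_output` is hypothetical, and the script assumes it is run from the top of the unzipped hadoop source. It makes no claim about which files you should expect to find there.

```python
from pathlib import Path


def list_build_output(source_root):
    """Return sorted entry names under <source_root>/hadoop-dist/target."""
    target = Path(source_root) / "hadoop-dist" / "target"
    if not target.is_dir():
        # The build has not run, or it failed before producing output.
        return []
    return sorted(entry.name for entry in target.iterdir())


if __name__ == "__main__":
    # "." assumes the current directory is the top of the hadoop source tree.
    for name in list_build_output("."):
        print(name)
```

An empty listing usually means the build stopped early, so it is worth scrolling back through the Maven output in that case.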
Special thanks to the IT admin and security professional contractor at Hygenics Data LLC for the copy of Microsoft Visual Studio 2010.
Happy hadooping or being a Mahout.