How-to: Install Vowpal Wabbit on CentOS 6.5+
Vowpal Wabbit (VW) is a very fast machine learning system built on an intrinsically fast learning algorithm. Several optimization algorithms are available, with sparse gradient descent (GD) on a choice of loss functions as the baseline.
It boasts several features:
- Input Format: It has a pretty flexible input format that can consist of free form text (which is somewhat difficult to deal with in other ML libraries), and is interpreted in a bag-of-words way.
- Speed: The learning algorithm is fast, comparable to the few other online algorithm implementations out there.
- Scalability: An important characteristic here is that the memory footprint of the program is bounded independent of data. This means the training set is not loaded into main memory before learning starts. In addition, the size of the set of features is bounded independent of the amount of training data using the hashing trick.
- Feature Pairing: Subsets of features can be internally paired so that the algorithm is linear in the cross-product of the subsets. This is useful for ranking problems.
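To make the hashing trick from the Scalability point concrete, here is a minimal shell sketch. It is an illustration only: feature_index is a hypothetical helper, cksum stands in for whatever hash function VW really uses, and the table size of 2^18 slots is an assumed value picked for the example.

```shell
# Map a feature name to a slot in a fixed-size weight table.
# Assumption: cksum (a POSIX CRC) is only a stand-in for VW's own hash.
feature_index() (
    feature=$1
    bits=${2:-18}   # the table has 2^bits slots; 18 is an assumed example value
    hash=$(printf '%s' "$feature" | cksum | cut -d ' ' -f 1)
    echo $(( hash % (1 << bits) ))
)
```

However many distinct words the training data contains, every one of them lands in one of 2^bits slots, which is why the feature table stays bounded. For example, feature_index "wabbit" 4 always prints the same number between 0 and 15.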
A step-by-step installation guide is available on the project’s GitHub page, but those instructions are for Ubuntu, and the CentOS repositories generally carry older packages than Debian-based distributions. This tutorial explains how to install VW on CentOS 6.5+.
- These installation instructions are tested on a CentOS 6.5 distribution.
- $ uname -a
- Linux node1.madoverdata.com 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
- $ cat /etc/centos-release
- CentOS release 6.5 (Final)
- The distribution should be updated with all the latest packages. Do keep in mind that updating the distribution will upgrade it from v6.5 to v6.6. To update all the packages without updating the distribution & kernels, do the following:
- vi /etc/yum.conf
- Add the below line after the one at which logfile’s path is specified:
- Save & exit the file
sudo yum update
- At the time of writing this, the latest version of VW is 7.10.3
- A cup of coffee
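The release check from the prerequisites can be scripted. The sketch below is one possible way to do it; centos_version and is_supported_release are hypothetical helper names, and the accepted range (6.5 and later 6.x releases) is an assumption taken from the title of this guide.

```shell
# Extract "major.minor" from a release string such as
# "CentOS release 6.5 (Final)". Prints nothing if the string does not match.
centos_version() {
    printf '%s\n' "$1" |
        sed -n 's/^CentOS.* release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed only for 6.5 and later 6.x releases (assumed range, see above).
is_supported_release() {
    case $(centos_version "$1") in
        6.[5-9]|6.[1-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}
```

A typical call would be is_supported_release "$(cat /etc/centos-release)" || echo "untested release".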
VW needs several libraries such as boost, perl, g++ etc. The last version of VW that supported the boost libraries present in CentOS’s repositories was 7.7. In order to install the latest & greatest version of VW, newer versions of boost libraries are needed.
- sudo yum install zlib-devel nc
nc is netcat which is required by one of the unit tests.
Install the tools needed to compile VW:
- sudo yum groupinstall "Development Tools"
The latest version of g++ available in the repositories is 4.4.7 which is incompatible with VW.
- $ g++ --version
- g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11)
Install a later version of g++:
- sudo vi /etc/yum.repos.d/DevToolset.repo
This will add a new repository.
Paste the following lines to the repo:
- name=RedHat DevToolset v2 $releasever - $basearch
Install a newer version of g++, namely, 4.8+
- sudo yum install devtoolset-2-gcc devtoolset-2-binutils devtoolset-2-gcc-c++
Modify your path to point to the new executables installed. It is generally recommended to modify the path temporarily (for the current session only) so that other scripts/programs don’t break.
- $ export PATH="/opt/rh/devtoolset-2/root/usr/bin:$PATH"
- $ g++ --version
- g++ (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15)
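If you want a script to confirm that the right compiler is first on the path, a version comparison along these lines may help. This is a sketch, not part of VW: version_ge is a hypothetical helper, it relies on sort -V (a GNU coreutils extension), and the 4.8 threshold comes from the "4.8+" requirement stated above.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Assumption: sort supports -V (version sort), as GNU coreutils does.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, version_ge "$(g++ -dumpversion)" 4.8 || echo "g++ is too old for VW" prints the warning while the stock 4.4.7 compiler is still the one being picked up.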
The currently installed version of perl is 5.10.1. VW needs a later version.
- $ perl -v
- This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi
Download, unpack & install the latest perl (substitute a newer minor version if one is available):
- $ wget http://www.cpan.org/src/5.0/perl-5.22.0.tar.gz
- $ tar -xzvf perl-5.22.0.tar.gz
- $ cd perl-5.22.0/
- $ ./Configure -des
- $ make
- $ sudo make install
Everything should complete without errors.
The new perl will not be available just yet, though. First, refresh the dynamic linker cache:
- sudo ldconfig
Log out and log back in, then check perl’s version again:
- $ perl -v
- This is perl 5, version 22, subversion 0 (v5.22.0) built for x86_64-linux
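The two perl banners shown in this guide use different wording, so a script that compares versions first has to pull the number out. The sketch below handles both formats seen above; perl_version_from_banner is a hypothetical helper and has not been checked against banners from other perl releases.

```shell
# Print the dotted version number found in a `perl -v` banner line.
# Handles both "This is perl, v5.10.1 ..." and "... (v5.22.0) built for ...".
perl_version_from_banner() {
    printf '%s\n' "$1" |
        sed -n 's/.*[ (]v\([0-9][0-9.]*[0-9]\).*/\1/p'
}
```

Calling perl_version_from_banner "$(perl -v)" before and after the login step is a quick way to see which interpreter the shell is resolving.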
Installing boost on CentOS is the most difficult piece of the puzzle. Don’t worry, we have you covered.
First, make sure that boost doesn’t already exist on your system.
- rpm -qa | grep boost
If it does, uninstall it, and check again. Otherwise, installation of VW will fail.
- sudo yum remove 'boost*'
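Because a leftover system boost makes the VW build fail later, it can be worth turning the check into a guard that stops a script early. The following is a minimal sketch; boost_free is a hypothetical helper that only inspects the package names fed to it on standard input.

```shell
# Read package names on stdin, one per line.
# Succeed only when none of them starts with "boost".
boost_free() {
    ! grep -q '^boost'
}
```

In an install script this might be used as rpm -qa | boost_free || { echo "remove the system boost packages first"; exit 1; }.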
Download, unpack & install the latest version (substitute a newer minor version if one is available):
- $ wget http://liquidtelecom.dl.sourceforge.net/project/boost/boost/1.58.0/boost_1_58_0.tar.gz
- $ tar -xzvf boost_1_58_0.tar.gz
- $ cd boost_1_58_0/
- $ ./bootstrap.sh
- $ sudo ./bjam --layout=system install
Now grab a coffee!! (from prerequisite #3 above)
There’s a bug in VW where the command name netcat is hard-coded. On Debian systems the tool is available under that name, but on RPM systems it’s installed as nc, so a symbolic link named netcat needs to point at nc.
- sudo ln -s /usr/bin/nc /usr/bin/netcat
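Running ln -s a second time fails because the link already exists, which is awkward if you rerun these steps from a script. A guarded version could look like the sketch below; ensure_symlink is a hypothetical helper, not something VW or CentOS ships.

```shell
# Create a symbolic link at $2 pointing to $1, unless $2 already exists.
# Returns 1 without creating anything when the target $1 is missing.
ensure_symlink() {
    if [ -e "$2" ] || [ -L "$2" ]; then
        return 0                # something is already there; leave it alone
    fi
    [ -e "$1" ] || return 1     # refuse to create a dangling link
    ln -s "$1" "$2"
}
```

Run as root, ensure_symlink /usr/bin/nc /usr/bin/netcat has the same effect as the command above on the first run and does nothing on later runs.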
And now for the guest of honor: VW
Clone the master branch of VW’s git repo. It contains all the latest bug fixes, so it is the recommended choice.
- $ git clone https://github.com/JohnLangford/vowpal_wabbit.git
- $ cd vowpal_wabbit/
- $ make
Troubleshooting tip #1:
If you need to start over at any point, run make clean and reset the working directory.
- $ make clean
- $ git checkout .
Troubleshooting tip #2:
If make fails and the last error is related to boost-program-options, double check that boost didn’t already exist on the system prior to our installation.
Also check whether boost libraries are installed in /usr/local/lib
- $ ll /usr/local/lib | grep boost
- $ make test
- $ sudo make install
- $ sudo ldconfig
Log out and log back in.
- $ vw --version