Machine Learning Build Machine Setup for Linux

To ensure everything is consistent for redistributable builds, we build all redistributable components from source with a specific version of gcc.

You will need the following environment variables to be defined:

  • JAVA_HOME - Should point to the JDK you want to use to run Gradle.
  • CPP_SRC_HOME - Only required if building the C++ code directly using make, as Gradle sets it automatically.
  • PATH - Must have /usr/local/gcc73/bin before /usr/bin and /bin.
  • LD_LIBRARY_PATH - Must have /usr/local/gcc73/lib64 and /usr/local/gcc73/lib before /usr/lib and /lib.

For example, you might create a .bashrc file in your home directory containing something like this:

umask 0002
export JAVA_HOME=/usr/local/jdk1.8.0_121
export LD_LIBRARY_PATH=/usr/local/gcc73/lib64:/usr/local/gcc73/lib:/usr/lib:/lib
export PATH=$JAVA_HOME/bin:/usr/local/gcc73/bin:/usr/bin:/bin:/usr/sbin:/sbin:/home/vagrant/bin
# Only required if building the C++ code directly using make - adjust depending on the location of your Git clone
export CPP_SRC_HOME=$HOME/ml-cpp
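
The ordering requirement on PATH can be checked with a small helper. This is only a minimal sketch, not part of the official setup, and the function name path_precedes is hypothetical:

```shell
# Hypothetical helper: succeeds if directory $2 appears before
# directory $3 in the colon-separated list $1.
path_precedes() {
    list=$1 first=$2 second=$3
    old_ifs=$IFS
    IFS=:
    for dir in $list; do
        if [ "$dir" = "$first" ]; then IFS=$old_ifs; return 0; fi
        if [ "$dir" = "$second" ]; then IFS=$old_ifs; return 1; fi
    done
    IFS=$old_ifs
    return 1
}

if path_precedes "$PATH" /usr/local/gcc73/bin /usr/bin; then
    echo "PATH order looks right"
else
    echo "PATH needs /usr/local/gcc73/bin before /usr/bin"
fi
```

The same function can be applied to LD_LIBRARY_PATH, for example to confirm that /usr/local/gcc73/lib64 comes before /usr/lib.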

OS Packages

You need the C++ compiler and the headers for the zlib library that comes with the OS. You also need the archive utilities unzip and bzip2. Finally, the unit tests for date/time parsing require the tzdata package that contains the Linux timezone database. On RHEL/CentOS these can be installed using:

sudo yum install bzip2
sudo yum install gcc-c++
sudo yum install tzdata
sudo yum install unzip
sudo yum install zlib-devel

On other Linux distributions the package names are generally the same and you just need to use the correct package manager to install these packages.
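
Whichever package manager is used, it may be worth confirming afterwards that the expected commands are available. The following is a minimal sketch; missing_tools is a hypothetical helper name, and only packages that install a command can be checked this way:

```shell
# Hypothetical helper: prints the name of each given command that
# cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Commands provided by the packages above (zlib-devel and tzdata
# install no commands, so they are not covered by this check).
missing=$(missing_tools bzip2 g++ unzip)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
fi
```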

General settings for building the tools

Most of the tools are built via a GNU "configure" script. There are some environment variables that affect the behaviour of this. Therefore, when building ANY tool on Linux, set the following environment variables:

export CFLAGS='-g -O3 -fstack-protector -D_FORTIFY_SOURCE=2'
export CXX='g++ -std=gnu++14'
export CXXFLAGS='-g -O3 -fstack-protector -D_FORTIFY_SOURCE=2'
export LDFLAGS='-Wl,-z,relro -Wl,-z,now'
export LDFLAGS_FOR_TARGET='-Wl,-z,relro -Wl,-z,now'
unset LIBRARY_PATH

These environment variables only need to be set when building tools on Linux. They should NOT be set when compiling the Machine Learning source code (as this should pick up all settings from our Makefiles).
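
One way to avoid leaking these settings into a build of the Machine Learning source code is to clear them explicitly once the tools are built. This is a sketch only; the function name clear_tool_build_env is hypothetical:

```shell
# Hypothetical helper: clears the tool-build settings listed above so
# they do not affect a later build of the Machine Learning source code.
clear_tool_build_env() {
    unset CFLAGS CXX CXXFLAGS LDFLAGS LDFLAGS_FOR_TARGET
}
```

Calling clear_tool_build_env in the shell used for the tool builds, before running Gradle or make on the Machine Learning code, leaves our Makefiles in sole control of the compiler settings. Opening a fresh shell has the same effect.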

gcc

We have to build on old Linux versions to enable our software to run on the older versions of Linux that users have. However, this means the default compiler on our Linux build servers is also very old. To enable use of more modern C++ features, we use the default compiler to build a newer version of gcc and then use that to build all our other dependencies.

Download gcc-7.3.0.tar.gz from http://ftpmirror.gnu.org/gcc/gcc-7.3.0/gcc-7.3.0.tar.gz.

Unlike most automake-based tools, gcc must be built in a directory adjacent to the directory containing its source code, so build and install it like this:

tar zxvf gcc-7.3.0.tar.gz
cd gcc-7.3.0
contrib/download_prerequisites
sed -i -e 's/$(SHLIB_LDFLAGS)/$(LDFLAGS) $(SHLIB_LDFLAGS)/' libgcc/config/t-slibgcc
cd ..
mkdir gcc-7.3.0-build
cd gcc-7.3.0-build
unset CXX
unset LD_LIBRARY_PATH
export PATH=/usr/bin:/bin:/usr/sbin:/sbin
../gcc-7.3.0/configure --prefix=/usr/local/gcc73 --enable-languages=c,c++ --enable-vtable-verify --with-system-zlib --disable-multilib
make -j 6
sudo make install

It's important that gcc itself is built using the system compiler in C++98 mode, hence the adjustment to PATH and unsetting of CXX and LD_LIBRARY_PATH.

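After the gcc build is complete, if you are going to work through the rest of these instructions in the same shell, first restore PATH and LD_LIBRARY_PATH to the values given at the start of this document, since they were changed for the gcc build. Then reset the CXX environment variable so that the remaining C++ components get built with C++14: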

export CXX='g++ -std=gnu++14'

To confirm that everything works correctly run:

g++ --version

It should print:

g++ (GCC) 7.3.0

in the first line of the output. If it doesn't then double check that /usr/local/gcc73/bin is near the beginning of your PATH.
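
The check above can also be scripted. This is a minimal sketch, and is_gcc_730 is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds if the first line of the given
# "g++ --version" output reports GCC 7.3.0.
is_gcc_730() {
    first_line=$(printf '%s\n' "$1" | head -n 1)
    case $first_line in
        *"(GCC) 7.3.0"*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_gcc_730 "$(g++ --version 2>/dev/null)"; then
    echo "g++ 7.3.0 is first on PATH"
else
    echo "unexpected g++: $(command -v g++ || echo 'not found')"
fi
```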

Git

Modern versions of Linux come with Git in their package repositories. Because we don't redistribute Git, the exact version is not important, so installing it from the repositories is the easiest option. The command will be:

sudo yum install git

on RHEL clones. However, Jenkins requires at minimum version 1.7.9 of Git, so if the version that yum installs is older you'll still have to build it from scratch. In this case, you may need to uninstall the version that yum installed:

git --version
sudo yum remove git
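
Deciding whether the installed Git is new enough can be automated. The sketch below assumes purely numeric version components; version_at_least is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds if version $1 is at least version $2.
# Compares the first three dot-separated components, which are
# assumed to be plain numbers; missing components count as 0.
version_at_least() {
    have=$1 want=$2
    old_ifs=$IFS
    IFS=.
    set -- $have
    h1=${1:-0} h2=${2:-0} h3=${3:-0}
    set -- $want
    w1=${1:-0} w2=${2:-0} w3=${3:-0}
    IFS=$old_ifs
    [ "$h1" -gt "$w1" ] && return 0
    [ "$h1" -lt "$w1" ] && return 1
    [ "$h2" -gt "$w2" ] && return 0
    [ "$h2" -lt "$w2" ] && return 1
    [ "$h3" -ge "$w3" ]
}

# "git --version" prints "git version X.Y.Z", so take the third word.
installed=$(git --version 2>/dev/null | awk '{ print $3 }' || true)
if version_at_least "$installed" 1.7.9; then
    echo "Git $installed is new enough"
else
    echo "Git is missing or older than 1.7.9"
fi
```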

If you have to build Git from source in order to get version 1.7.9 or above, this is what to do:

Make sure you install the packages python-devel, curl-devel and openssl-devel using yum or similar before you start this.

Start by running:

./configure

as usual.

Then run:

make prefix=/usr all
sudo make prefix=/usr install

Without the prefix=/usr bit, you'll end up with a personal Git build in ~/bin instead of one everyone on the machine can use.

libxml2

Use anonymous FTP to connect to ftp.xmlsoft.org, change directory to libxml2, switch to binary mode, and get libxml2-2.9.4.tar.gz.

Uncompress and untar the resulting file. Then run:

./configure --prefix=/usr/local/gcc73 --without-python --without-readline

This should build an appropriate Makefile. Assuming it does, type:

make
sudo make install

to install.

expat

Download expat from https://github.com/libexpat/libexpat/releases/download/R_2_2_6/expat-2.2.6.tar.bz2.

Extract the tarball to a temporary directory:

tar jxvf expat-2.2.6.tar.bz2

Then build using:

./configure --prefix=/usr/local/gcc73 --without-docbook
make
sudo make install

APR

For Linux, before building log4cxx you must download the Apache Portable Runtime (APR) from http://archive.apache.org/dist/apr/apr-1.7.0.tar.bz2.

Extract the tarball to a temporary directory:

tar jxvf apr-1.7.0.tar.bz2

We want to avoid a dependency on the operating system libcrypt, as this may not be available in all Linux distributions. Therefore, before building, in configure change:

for ac_lib in '' crypt ufc; do

to:

for ac_lib in ''; do

And in include/apr.h.in change:

#define APR_HAVE_CRYPT_H         @crypth@

to:

#define APR_HAVE_CRYPT_H         0
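
The two edits above could also be scripted with sed. This is a sketch under the assumption that the files contain the text exactly as quoted; remove_apr_crypt_dependency is a hypothetical helper name:

```shell
# Hypothetical helper: applies both libcrypt-related edits to an
# extracted APR source tree whose top-level directory is $1.
# Relies on the -i option of GNU sed.
remove_apr_crypt_dependency() {
    src=$1
    sed -i -e "s/for ac_lib in '' crypt ufc; do/for ac_lib in ''; do/" \
        "$src/configure" &&
    sed -i -e '/#define APR_HAVE_CRYPT_H/s/@crypth@/0/' \
        "$src/include/apr.h.in"
}
```

For example, run remove_apr_crypt_dependency apr-1.7.0 from the directory where the tarball was extracted, then inspect both files with grep to confirm that the changes took effect.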

Then build using:

./configure --prefix=/usr/local/gcc73
make
sudo make install

APR utilities

For Linux, before building log4cxx you must download the Apache Portable Runtime (APR) utilities from http://archive.apache.org/dist/apr/apr-util-1.6.1.tar.bz2.

Extract the tarball to a temporary directory:

tar jxvf apr-util-1.6.1.tar.bz2

We want to avoid a dependency on the operating system libcrypt, as this may not be available in all Linux distributions. Therefore, before building, in configure change:

for ac_lib in '' crypt ufc; do

to:

for ac_lib in ''; do

And in crypto/apr_passwd.c change:

#define CRYPT_MISSING 0

to:

#define CRYPT_MISSING 1
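
As a sketch, these edits can be scripted in a way that refuses to continue if the source does not look as expected. The helper name remove_apr_util_crypt_dependency is hypothetical, and the script assumes the files contain the text exactly as quoted above:

```shell
# Hypothetical helper: applies both libcrypt-related edits to an
# extracted APR utilities source tree rooted at $1, and fails if
# crypto/apr_passwd.c does not contain the expected definition.
# Relies on the -i option of GNU sed.
remove_apr_util_crypt_dependency() {
    src=$1
    grep -q '^#define CRYPT_MISSING 0' "$src/crypto/apr_passwd.c" || {
        echo "CRYPT_MISSING definition not found; edit by hand" >&2
        return 1
    }
    sed -i -e "s/for ac_lib in '' crypt ufc; do/for ac_lib in ''; do/" \
        "$src/configure" &&
    sed -i -e 's/^#define CRYPT_MISSING 0/#define CRYPT_MISSING 1/' \
        "$src/crypto/apr_passwd.c"
}
```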

Then build using:

./configure --prefix=/usr/local/gcc73 --with-apr=/usr/local/gcc73/bin/apr-1-config --with-expat=/usr/local/gcc73
make
sudo make install

log4cxx

Download from one of the mirrors listed at http://www.apache.org/dyn/closer.cgi/logging/log4cxx/0.10.0/apache-log4cxx-0.10.0.tar.gz.

Extract using:

tar zxvf apache-log4cxx-0.10.0.tar.gz

Unfortunately one of the log4cxx headers triggers an annoying (but harmless) g++ warning message. This is due to a copy constructor failing to explicitly call a base class constructor - it doesn't matter as the base class has no member variables, but g++ still complains. You can prevent the header causing a warning (without changing its meaning in any way) by making the changes detailed at http://issues.apache.org/jira/browse/LOGCXX-314, i.e. change:

#if LOG4CXX_HELGRIND
#define _LOG4CXX_OBJECTPTR_INIT(x) { exchange(x);
#else
#define _LOG4CXX_OBJECTPTR_INIT(x) : p(x) {
#endif

to:

#if LOG4CXX_HELGRIND
#define _LOG4CXX_OBJECTPTR_INIT(x) : ObjectPtrBase() { exchange(x);
#else
#define _LOG4CXX_OBJECTPTR_INIT(x) : ObjectPtrBase(), p(x) {
#endif

in src/main/include/log4cxx/helpers/objectptr.h.

Also, in src/main/cpp/inputstreamreader.cpp and src/main/cpp/socketoutputstream.cpp, after the last existing #include add:

#include <string.h>

and in src/examples/cpp/console.cpp, after the last existing #include add:

#include <string.h>
#include <stdio.h>
#include <wchar.h>
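
Adding lines after the last #include of a file can be scripted as well. The following is a minimal sketch using only standard tools; add_after_last_include is a hypothetical helper name:

```shell
# Hypothetical helper: inserts the remaining arguments as new lines
# immediately after the last #include line of the file named by $1.
add_after_last_include() {
    file=$1
    shift
    last=$(grep -n '^[[:space:]]*#[[:space:]]*include' "$file" |
        tail -n 1 | cut -d: -f1)
    [ -n "$last" ] || return 1
    {
        head -n "$last" "$file"
        printf '%s\n' "$@"
        tail -n +"$((last + 1))" "$file"
    } > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example calls, to be run from the top of the extracted log4cxx tree:
#   add_after_last_include src/main/cpp/inputstreamreader.cpp '#include <string.h>'
#   add_after_last_include src/main/cpp/socketoutputstream.cpp '#include <string.h>'
#   add_after_last_include src/examples/cpp/console.cpp \
#       '#include <string.h>' '#include <stdio.h>' '#include <wchar.h>'
```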

The five edits described below can all be made using these sed commands:

sed -i -e '152,163s/0x/(char)0x/g' src/main/cpp/locationinfo.cpp
sed -i -e '239,292s/0x/(char)0x/g' src/main/cpp/loggingevent.cpp
sed -i -e '39s/0x/(char)0x/g' src/main/cpp/objectoutputstream.cpp
sed -i -e '84,92s/0x/(char)0x/g' src/main/cpp/objectoutputstream.cpp
sed -i -e '193,214s/0x/(char)0x/g' src/test/cpp/xml/domtestcase.cpp

In src/main/cpp/locationinfo.cpp replace 0x with (char)0x on lines 152 to 163 - this can be done using the vim command:

:152,163s/0x/(char)0x/g

In src/main/cpp/loggingevent.cpp replace 0x with (char)0x on lines 239 to 292 - this can be done using the vim command:

:239,292s/0x/(char)0x/g

In src/main/cpp/objectoutputstream.cpp replace 0x with (char)0x on lines 39 and 84 to 92 - this can be done using the vim commands:

:39s/0x/(char)0x/g
:84,92s/0x/(char)0x/g

In src/test/cpp/xml/domtestcase.cpp replace 0x with (char)0x on lines 193 to 214 - this can be done using the vim command:

:193,214s/0x/(char)0x/g

Once all the changes are made, configure using:

./configure --prefix=/usr/local/gcc73 --with-charset=utf-8 --with-logchar=utf-8 --with-apr=/usr/local/gcc73 --with-apr-util=/usr/local/gcc73

This should build an appropriate Makefile. Assuming it does, type:

make
sudo make install

to install the necessary headers and libraries.

Boost 1.65.1

Download version 1.65.1 of Boost from http://sourceforge.net/projects/boost/files/boost/1.65.1/. You must get this exact version, as the Machine Learning Makefiles expect it.

Assuming you chose the .bz2 version, extract it to a temporary directory:

bzip2 -cd boost_1_65_1.tar.bz2 | tar xvf -

In the resulting boost_1_65_1 directory, run:

./bootstrap.sh --without-libraries=context --without-libraries=coroutine --without-libraries=graph_parallel --without-libraries=log --without-libraries=mpi --without-libraries=python --without-icu

This should build the b2 program, which in turn is used to build Boost.

Edit boost/unordered/detail/implementation.hpp and change line 270 from:

    (17ul)(29ul)(37ul)(53ul)(67ul)(79ul) \

to:

    (3ul)(17ul)(29ul)(37ul)(53ul)(67ul)(79ul) \

Then edit boost/math/tools/config.hpp and change line 380 from:

#if ((defined(__linux__) && !defined(__UCLIBC__) && !defined(BOOST_MATH_HAVE_FIXED_GLIBC)) || defined(__QNX__) || defined(__IBMCPP__)) && !defined(BOOST_NO_FENV_H)

to:

#if ((!defined(BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS) && defined(__linux__) && !defined(__UCLIBC__) && !defined(BOOST_MATH_HAVE_FIXED_GLIBC)) || defined(__QNX__) || defined(__IBMCPP__)) && !defined(BOOST_NO_FENV_H)
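
Both header edits can be applied with line-addressed sed commands. This is a sketch that assumes the lines are numbered exactly as stated above; patch_boost_headers is a hypothetical helper name:

```shell
# Hypothetical helper: applies both header edits to the Boost 1.65.1
# tree rooted at $1. The line numbers 270 and 380 come from the
# instructions above; each edit does nothing if already applied.
# Relies on the -i option of GNU sed.
patch_boost_headers() {
    root=$1
    sed -i -e '270s/^\([[:space:]]*\)(17ul)/\1(3ul)(17ul)/' \
        "$root/boost/unordered/detail/implementation.hpp" &&
    sed -i -e '380s/^#if ((defined(__linux__)/#if ((!defined(BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS) \&\& defined(__linux__)/' \
        "$root/boost/math/tools/config.hpp"
}

# Example call, to be run inside the boost_1_65_1 directory:
#   patch_boost_headers .
```

Because the patterns are anchored to the start of the line, running the function a second time leaves the files unchanged.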

Finally, run:

./b2 -j6 --layout=versioned --disable-icu pch=off optimization=speed inlining=full define=BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS define=_FORTIFY_SOURCE=2 cxxflags=-std=gnu++14 cxxflags=-fstack-protector linkflags=-Wl,-z,relro linkflags=-Wl,-z,now
sudo env PATH="$PATH" ./b2 install --prefix=/usr/local/gcc73 --layout=versioned --disable-icu pch=off optimization=speed inlining=full define=BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS define=_FORTIFY_SOURCE=2 cxxflags=-std=gnu++14 cxxflags=-fstack-protector linkflags=-Wl,-z,relro linkflags=-Wl,-z,now

to install the Boost headers and libraries. Note the env PATH="$PATH" part of the install command: sudo usually resets PATH, which would cause Boost to rebuild everything with the default compiler as part of the install.

cppunit

Download version 1.13.2 of cppunit from http://dev-www.libreoffice.org/src/cppunit-1.13.2.tar.gz (or, if that link no longer exists by the time you read this, find the relevant link on http://dev-www.libreoffice.org/src).

Untar it to a temporary directory and run:

./configure --prefix=/usr/local/gcc73

This should build an appropriate Makefile. Assuming it does, type:

make
sudo make install

to install the cppunit headers, libraries, binaries and documentation.

patchelf

Obtain patchelf from http://nixos.org/releases/patchelf/patchelf-0.9/ - the download file will be patchelf-0.9.tar.bz2.

Extract it to a temporary directory using:

bzip2 -cd patchelf-0.9.tar.bz2 | tar xvf -

In the resulting patchelf-0.9 directory, run:

./configure --prefix=/usr/local/gcc73

This should build an appropriate Makefile. Assuming it does, run:

make
sudo make install

to complete the build.
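
Once all the components are installed, a quick check of the install prefix can catch a step that was skipped. The sketch below is not part of the official procedure: missing_under_prefix is a hypothetical helper name, and the file names passed to it are assumptions based on each project's usual install layout.

```shell
# Hypothetical helper: prints each path under prefix $1 (remaining
# arguments, relative to the prefix) that does not exist.
missing_under_prefix() {
    prefix=$1
    shift
    for rel in "$@"; do
        [ -e "$prefix/$rel" ] || echo "$prefix/$rel"
    done
}

# The file names below are assumptions; adjust them to match what
# each "make install" actually reported.
missing_under_prefix /usr/local/gcc73 \
    bin/g++ bin/patchelf \
    lib/libxml2.so lib/libexpat.so lib/libapr-1.so \
    lib/libaprutil-1.so lib/liblog4cxx.so lib/libcppunit.so
```

No output means every listed file was found.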