Sleipnir is a C++ library enabling efficient analysis, integration, mining, and machine learning over genomic data. This includes a particular focus on microarrays, since they make up the bulk of available data for many organisms, but Sleipnir can also integrate a wide variety of other data types, from pairwise physical interactions to sequence similarity or shared transcription factor binding sites.
Main documentation: https://functionlab.github.io/sleipnir-docs/
The Sleipnir wiki and bug reporting system are at: (TBD)
The file README.developer has notes for Sleipnir developers.
Sleipnir also includes the code to compile SEEK (the human coexpression search engine). See http://seek.princeton.edu/installation.jsp for SEEK installation instructions.
The latest version of Sleipnir software can be obtained by issuing the following command:
git clone https://github.com/FunctionLab/sleipnir.git
- Install g++, cmake
- Install libraries
  - On Mac:
    brew install libsvm
    brew install libomp
    brew install thrift
    brew install gsl
    brew install boost
  - On CentOS Linux:
    sudo yum install libsvm
    sudo yum install libgomp
    sudo yum install thrift-devel
    sudo yum install gsl
    sudo yum install boost
  - On Ubuntu Linux:
    apt-get update
    apt-get install build-essential
    apt-get install libsvm-dev
    apt-get install libomp-dev
    apt-get install libthrift-dev
    apt-get install libgsl-dev
    apt-get install libboost-dev
    apt-get install libboost-graph-dev
    apt-get install libboost-regex-dev
    apt-get install libreadline-dev
- Clone repository
  git clone https://github.com/FunctionLab/sleipnir.git
  cd sleipnir
  git submodule init
  git submodule update
- Prep make files with cmake
  mkdir Debug
  cd Debug/
  cmake -DCMAKE_BUILD_TYPE=Debug ..
  - Alternatively, replace 'Debug' with 'Release' in all the above commands to make the release build
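The Debug/Release substitution described above can be wrapped in a small helper. This is a minimal sketch, not part of Sleipnir: the function name `configure_build` and the `CMAKE` override variable are hypothetical, and the sketch accepts only the two build types named in this README.

```shell
# Hypothetical helper (not shipped with Sleipnir): configure an out-of-source
# build directory named after the build type. Run from the repository root.
configure_build() {
    build_type="${1:-Debug}"
    case "$build_type" in
        Debug|Release) ;;
        *) echo "unsupported build type: $build_type" >&2; return 1 ;;
    esac
    # CMAKE is an assumed override hook, e.g. CMAKE=echo for a dry run.
    mkdir -p "$build_type" &&
        (cd "$build_type" && ${CMAKE:-cmake} -DCMAKE_BUILD_TYPE="$build_type" ..)
}
```

Calling `configure_build Release` from the repository root should leave a configured `Release/` directory. The subshell keeps the caller's working directory unchanged.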
- Build the code
  - (On Mac) - Edit sleipnir/src/libsvm.h
    - Replace: #include <libsvm/svm.h>
    - With: #include <svm.h>
  cd Debug/
  make
  - In case of errors:
    make clean
    make VERBOSE=1
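The Mac-only header edit in the build step can also be scripted rather than done by hand. The following is a sketch under assumptions: the function name `patch_libsvm_include` is hypothetical, and the default path `src/libsvm.h` assumes the command is run from the repository root.

```shell
# Hypothetical helper: rewrite the libsvm include in the given header.
# The -i.bak form works with both BSD (macOS) and GNU sed and leaves a
# backup copy next to the edited file.
patch_libsvm_include() {
    header="${1:-src/libsvm.h}"   # assumed default, relative to the repo root
    sed -i.bak 's|#include <libsvm/svm.h>|#include <svm.h>|' "$header"
}
```

After running `patch_libsvm_include`, the original header remains available as `src/libsvm.h.bak` should the change need to be reverted.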
- [Optional] Install SVM_PERF libraries to build: Data2SVM, SVMperfer, SVMperfing, SVMfe, SVMer
  wget http://download.joachims.org/svm_perf/current/svm_perf.tar.gz
  mkdir svm_perf; cd svm_perf; tar xzvf ../svm_perf.tar.gz
  make
  ar rcs libsvmperf.a *.o */*.o
  cd ..; cp -a svm_perf /usr/local/lib/
  ln -s /usr/local/lib/svm_perf/libsvmperf.a /usr/local/lib
  ln -s /usr/local/lib/svm_perf /usr/local/include
- One-time prep: create the conda environment (by default this will create the 'genomics' conda env)
  conda env create --file scripts/seek/conda_environment.yml
- Run the C++ unit tests
  Debug/tests/unit_tests
- Test the scripts for building and merging SEEK database compendiums
  conda activate genomics
  python -m pytest -s -v scripts/seek/tests
- Run the SEEK system tests (test SeekMiner and SeekRPC)
  conda activate genomics
  python -m pytest -s -v tests/
- Run SEEK DB tests (test that the database gives expected bio-informative results). These tests can only be run where the full SEEK database is installed.
  cd tests/bioinform_tests
  - PREP: Install and init Git LFS (Large File Storage)
    - On Mac:
      brew install git-lfs
    - On CentOS:
      yum install git-lfs
    - On Ubuntu:
      apt-get install git-lfs
    - Initialize git-lfs:
      git lfs install
    - Refresh the gold standard tgz files (should be multiple MB in size)
      rm gold_standard_results/*
      git restore gold_standard_results/*
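One way to confirm that the gold standard files were actually fetched is to check that none of them is still a Git LFS pointer, which is a small text file beginning with a `version https://git-lfs.github.com/spec/v1` line. The sketch below assumes it is run from tests/bioinform_tests. The function names are hypothetical and not part of the repository.

```shell
# Hypothetical helpers: detect gold standard files that are still LFS pointers.
is_lfs_pointer() {
    # A pointer file starts with the LFS spec version line.
    head -n 1 "$1" 2>/dev/null |
        LC_ALL=C grep -q '^version https://git-lfs.github.com/spec/v1'
}

check_gold_standard() {
    dir="${1:-gold_standard_results}"
    rc=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        if is_lfs_pointer "$f"; then
            echo "still an LFS pointer: $f"
            rc=1
        fi
    done
    return "$rc"
}
```

If `check_gold_standard` reports any file, the refresh step above has not yet replaced the pointers with the real archives.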
  - Run the tests:
    (The bioinform test has an option for different lengths of test, i.e. how many queries are run)
    bash run_paramtest.sh -v -s <path_to_seek_db> -b <path_to_seek_binaries>
    bash run_querysize.sh -v -s <path_to_seek_db> -b <path_to_seek_binaries>
    bash run_bioinform.sh -v -s <path_to_seek_db> -b <path_to_seek_binaries> -t [tiny,short,medium,long]
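The three test scripts can be chained so that a failure stops the run early. This is only a sketch: `run_seek_db_tests` and the `RUNNER` override are hypothetical names, the default length of `tiny` is an assumption, and it presumes `-t` takes exactly one of the four listed values.

```shell
# Hypothetical wrapper: run the three SEEK DB test scripts in order.
# Run from tests/bioinform_tests.
run_seek_db_tests() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_seek_db_tests <path_to_seek_db> <path_to_seek_binaries> [length]" >&2
        return 2
    fi
    seek_db="$1"; seek_bin="$2"; length="${3:-tiny}"   # default is an assumption
    case "$length" in
        tiny|short|medium|long) ;;
        *) echo "length must be one of: tiny, short, medium, long" >&2; return 2 ;;
    esac
    runner="${RUNNER:-bash}"   # assumed override hook, e.g. RUNNER=echo for a dry run
    "$runner" run_paramtest.sh -v -s "$seek_db" -b "$seek_bin" &&
    "$runner" run_querysize.sh -v -s "$seek_db" -b "$seek_bin" &&
    "$runner" run_bioinform.sh -v -s "$seek_db" -b "$seek_bin" -t "$length"
}
```

Setting `RUNNER=echo` before the call prints the three command lines instead of executing them, which is a quick way to check the arguments.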