#PGMLab (Probabilistic Graphical Model Laboratory)
PGMLab performs learning and inference in large discrete Bayesian networks. PGMLab is a standalone C library with command line and R interfaces. (M.H. Radfar, et al.)
PGMLab was developed to fulfill three goals:
- To perform learning and inference in extremely large graphs.
- To be usable by both experts and non-experts in the field of machine learning.
- To be as fast and as accurate as possible.
##Authors
- Martin H. Radfar
- Adam J. Wright (Lead Developer)
- Martin D. Pham (Co-op Developer)
##Web Site
- Visit the PGMLab website here
##Wiki
- Please visit the PGMLab wiki here for details and theory.
- Make sure to follow the input file format specifications here.
##System requirements
PGMLab has been tested on OS X and Ubuntu 14.04.
###How to download, install and run PGMLab
####1. Download
- Download the latest version of PGMLab from here
####2. Installation
#####2.1 Dependencies
Linux: `sudo apt-get install texinfo r-base`
Mac OS X: follow the instructions at http://macappstore.org/texinfo/
#####2.2 Install PGMLab Package
- Type the following commands in a terminal:
`cd .../your-download-directory/PGMLab`
`make`
`make install` (this links PGMLab to the system paths, making it accessible from anywhere on the host)
####3 User Interfaces
Two interfaces to the shared object come with this package: a command line interface and an R interface. To build either interface you must have already compiled the PGMLab shared object (see 2.2).
#####3.1 Command line interface
######3.1.2 Interacting with the command line interface
The command line interface can be used in two ways. The first is to supply the file paths in a config file, following the example config files in the config folder, together with flags that customize the parameters. The second is to enter the information through an interactive interface. Further information on both is available in the wiki.
- Run the following command for a description of the PGMLab command line interface:
`pgmlab --help`
- The command prints the following usage summary:
pgmlab [-gliv] [--interactive] [--data-dir=<string>] [--pairwise-interaction-file=<file>] [--logical-factorgraph-file=<file>] [--estimated-parameters-file=<file>] [--learning-observed-data-file=<file>] [--inference-observed-data-file=<file>] [--inference-factorgraph-file=<file>] [--posterior-probability-file=<file>] [--number-of-states=<int>] [--em-max-iterations=<int>] [--training-samples=<int>] [--log-likelihood-change-limit=<double>] [--parameters-change-limit=<double>] [--logging-on] [--maximum-a-posteriori-estimation] [--help] [--version]
###Flag descriptions
-g, --generate-factorgraph Generate factor graph from reaction logic [pairwise-interaction-file, logical-factorgraph-file]
-l, --learning Run learning using training dataset [pairwise-interaction-file, logical-factorgraph-file, learning-observed-data-file, estimated-parameters-file]
-i, --inference Run inference given the states of visible sets [pairwise-interaction-file, inference-factorgraph-file, inference-observed-data-file, posterior-probability-file]
--interactive Interactive mode
--data-dir=<string> Path to folder containing data in specified folder structure and naming conventions
--pairwise-interaction-file=<file> File path to pairwise interaction file
--logical-factorgraph-file=<file> File path to factorgraph file created from pairwise interaction file
--estimated-parameters-file=<file> File path to factorgraph file generated by learning
--learning-observed-data-file=<file> File path to observed data used during learning
--inference-observed-data-file=<file> File path to observed data used during inference
--inference-factorgraph-file=<file> File path to factorgraph used during inference
--posterior-probability-file=<file> File path to where you would like the posterior probabilities to be written
--number-of-states=<int> Number of states for each node (default is 2)
--em-max-iterations=<int> Maximum number of iterations in the EM algorithm - used in learning (default is 4000)
--log-likelihood-change-limit=<double> Stopping criterion: change in the log-likelihood - used in learning (default 1e-5)
--parameters-change-limit=<double> Stopping criterion: change in the parameters - used in learning (default 1e-3)
--logging-on Set this flag to have the learning step write its status to a log file (named after the estimated parameters file with .log appended)
--maximum-a-posteriori-estimation Use this flag to set the MAP flag to 0 (default 1)
-v, --verbose Provide verbose output when using --data-dir
--help Display help and exit
--version Display version information and exit
*To use the interactive interface, pass the `--interactive` flag.
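As a rough illustration of how the action flags combine with the file flags listed above, the sketch below chains the three steps in a shell function. The function name `run_pgmlab_steps` and the file names under `$dir` are placeholders chosen for this example, and it assumes `pgmlab` exits with a non-zero status on failure, which this README does not state.

```shell
# Sketch only: run generate, learning and inference in order using flags from the list above.
# File names under "$dir" are hypothetical placeholders, not required by PGMLab.
run_pgmlab_steps() {
    local dir="$1"

    # 1. Build the logical factorgraph from the pairwise interaction file.
    pgmlab --generate-factorgraph \
        --pairwise-interaction-file="$dir/network.pi" \
        --logical-factorgraph-file="$dir/logical.fg" || return 1

    # 2. Learn parameters from the training observations.
    pgmlab --learning \
        --pairwise-interaction-file="$dir/network.pi" \
        --logical-factorgraph-file="$dir/logical.fg" \
        --learning-observed-data-file="$dir/learning.obs" \
        --estimated-parameters-file="$dir/learnt.fg" || return 1

    # 3. Run inference with the learnt factorgraph and write the posteriors.
    pgmlab --inference \
        --pairwise-interaction-file="$dir/network.pi" \
        --inference-factorgraph-file="$dir/learnt.fg" \
        --inference-observed-data-file="$dir/inference.obs" \
        --posterior-probability-file="$dir/network.pp"
}
```

Once `pgmlab` is on the PATH, the function could be called as `run_pgmlab_steps path/to/network-folder`.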
######3.1.3 Analyzing multiple pathways
To have PGMLab process your data automatically, supply the folder containing the data with the `--data-dir` flag. With this flag you do not need to specify any actions (inference, learning, etc.) or any files. You can still use the other flags to override the default parameter values for the analysis.
######3.1.3.1 Folder structure
When using the `--data-dir` flag, PGMLab expects the specified folder to follow a fixed layout. The folder should contain one subfolder per network, each named after its network. Each subfolder must contain a pairwise interaction file named `<networkname>.pi`, from which PGMLab creates a logical factorgraph called `logical.fg`. If the subfolder contains a `learning.obs` file, PGMLab treats it as the observation data for learning and runs learning on it, writing the result to `learnt.fg`. If the subfolder contains an `inference.obs` file, PGMLab runs inference and writes the posterior probabilities to `<networkname>.pp`. The inference step uses `learnt.fg` instead of `logical.fg` if it exists.
Files in the "example" folder: example.pi, learning.obs, logical.fg, learnt.fg, inference.obs, example.pp
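The layout rules above can be checked before a run with a small shell function. This is a minimal sketch that only mirrors the description in this section; it does not call PGMLab, and the function name `plan_data_dir` is made up for the example.

```shell
# Sketch only: report which steps the folder layout rules above imply for each network subfolder.
plan_data_dir() {
    local data_dir="$1" sub name steps
    for sub in "$data_dir"/*/; do
        [ -d "$sub" ] || continue          # skip when the glob matched nothing
        name="$(basename "$sub")"
        # Each subfolder needs a pairwise interaction file named after the network.
        if [ ! -f "${sub}${name}.pi" ]; then
            echo "$name: missing ${name}.pi"
            continue
        fi
        steps="generate-factorgraph"
        # learning.obs triggers learning; inference.obs triggers inference.
        if [ -f "${sub}learning.obs" ]; then steps="$steps learning"; fi
        if [ -f "${sub}inference.obs" ]; then steps="$steps inference"; fi
        echo "$name: $steps"
    done
}
```

For a data folder containing the "example" subfolder listed above, `plan_data_dir path/to/data` prints `example: generate-factorgraph learning inference`.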
#####3.2 R interface
In order to call pgmlab from the R Console you will need to load the PGMLab R shared object.
######3.2.1 Running R in order to be able to access the PGMLab R shared object
- Run one of the first two commands, depending on your OS, then run one of the two options on the last line.
`cd R/pgmlabR/` (for Linux)
`cd R/` (for OS X)
type `R` or `rstudio`
- You should now be at an R prompt.
*The current working directory must be correct for the shared objects to link to one another.
######3.2.2 Loading the PGMLab shared object within R or RStudio
- Run the following command in order to load the shared object
dyn.load("<path to repo>/PGMLab/R/pgmlabR/lib/pgmlabR.so") (for Linux)
dyn.load("pgmlabR/lib/pgmlabR.so") (for OS X)
######3.2.3 Description of functions available from the PGMLab R library
r_reaction_logic_to_factorgraph(SEXP reaction_logic_pathway_filepath_, SEXP pathway_filepath_, SEXP number_of_states_)
r_learning_discrete_BayNet(SEXP reaction_logic_pathway_filepath_, SEXP pathway_filepath_, SEXP observed_data_filepath_, SEXP estimated_parameters_filepath_, SEXP number_of_states_, SEXP em_max_iterations_, SEXP em_log_likelihood_change_limit_, SEXP map_flag_, SEXP logging_)
r_doLBPinference(SEXP reaction_logic_pathway_filepath_, SEXP pathway_filepath_, SEXP observed_data_filepath_, SEXP posterior_probabilities_filepath_, SEXP number_of_states_)
*All filepaths can be either relative or absolute paths, and the remaining arguments should be supplied as numeric values.
######3.2.4 Example R function calls
*These relative paths are for Linux and should be changed for OS X. On OS X, remove one of the '../' from the beginning of each filepath.
- Reaction Logic to Factorgraph
.Call("r_reaction_logic_to_factorgraph", "../../data/munin-dataset/munin4_pairwise.txt", "../../data/munin-dataset/logical_factorgraph.txt",2)
- Learning
.Call("r_learning_discrete_BayNet", "../../data/munin-dataset/munin4_pairwise.txt", "../../data/munin-dataset/logical_factorgraph.txt", "../../data/munin-dataset/visibleSet_0.5.txt", "../../data/munin-dataset/estimated_parameters_0.5.txt", 2, 4000, 1e-5, 1e-3, 1, 1)
- Inference
.Call("r_doLBPinference", "../../data/munin-dataset/munin4_pairwise.txt", "../../data/munin-dataset/estimated_parameters_0.5.txt", "../../data/munin-dataset/visibleSet_0.7.txt", "../../data/munin-dataset/visibleSet_0.5.txt", 2)
*Functions will return 0 upon success and error codes otherwise.
####4 PGMLab dependencies
All resources are included in the PGMLab package.
#####4.1 Resources
Name | Description | Link |
---|---|---|
Minimal Perfect Hashing | Minimal Perfect Hashing was created by Bob Jenkins and is used to hash the names of the nodes. This allows PGMLab to query nodes very quickly by their unique hash | http://burtleburtle.net/bob/hash/perfect.html |
#####4.2 External Libraries
Name | Description | Link |
---|---|---|
GNU Scientific Library (GSL) | GSL is a numerical library for C and C++ that provides a wide range of mathematical routines | http://www.gnu.org/software/gsl/ |
GNU Readline Library | The GNU Readline library provides a set of functions for use by applications that allow users to edit command lines as they are typed in. This library is used for the interactive command line interface. | https://cnswww.cns.cwru.edu/php/chet/readline/rltop.html |
GNU Termcap Library | Termcap is a library and data base that enables programs to use display terminals in a terminal-independent manner | https://www.gnu.org/software/termutils/manual/termcap-1.3/html_mono/termcap.html |