diff --git a/README.md b/README.md
index 4e4738c..dcc6a36 100644
--- a/README.md
+++ b/README.md
@@ -9,47 +9,61 @@ Correlation
 - Spearman
 
 Clustering
-- K-means
 - Gaussian mixture models
 
 Thresholding
+- Power-law
 - Random matrix theory
 
-# Installation
+KINC is built with [ACE](https://github.com/SystemsGenetics/ACE), a framework which provides mechanisms for large-scale heterogeneous computing and data management. As such, KINC can be run in a variety of compute configurations, including single-core / single-GPU and multi-core / multi-GPU, and KINC uses its own binary file formats to represent the data objects that it produces. Each of these binary formats can be exported to a plain-text format for use in other applications.
 
-This software uses GSL, OpenCL, and [ACE](https://github.com/SystemsGenetics/ACE). For instructions on installing ACE, see the project repository. For all other dependencies, consult your package manager. For example, to install dependencies on Ubuntu:
-```
-sudo apt install libgsl2 ocl-icd-opencl-dev libopenmpi-dev
-```
+## Installation
+
+Refer to the files under `docs` for installation instructions. KINC is currently supported on most flavors of Linux.
 
-To build & install KINC:
+### Palmetto
+
+To use KINC on Palmetto, you must add the following modules in lieu of installing dependencies through a package manager:
+```bash
+module add cuda-toolkit/9.2
+module add gcc/5.4.0
+module add git
+module add gsl/2.3
+module add openmpi/1.10.7
+module add Qt/5.9.2
 ```
-cd build
-qmake ../src/KINC.pro
-make qmake_all
-make
-make qmake_all
-make install
+
+## Usage
+
+KINC provides two executables: `kinc`, the command-line version, and `qkinc`, the GUI version. The command-line version can use MPI while the GUI version can display data object files that are produced by KINC. KINC produces a gene-coexpression network in several steps:
+1. `import-emx`: Import expression matrix text file into binary format
+2. `similarity`: Compute a cluster matrix and correlation matrix from expression matrix
+3. `rmt`: Determine an appropriate correlation threshold for correlation matrix
+4. `extract`: Extract an edge list from a correlation matrix given a threshold
+
+Below is an example usage of `kinc` on the Yeast dataset:
 ```
+# import expression matrix into binary format
+kinc run import-emx --input Yeast-GEM.txt --output Yeast.emx --nan NA
 
-## Using the KINC GUI or Console
+# compute similarity matrix (with GMM clustering)
+mpirun -np 8 kinc run similarity --input Yeast.emx --ccm Yeast.ccm --cmx Yeast.cmx --clusmethod gmm --corrmethod spearman --minclus 1 --maxclus 5
 
-ACE provides two different libraries for GUI and console applications. The `kinc` executable is the console or command line version and the `qkinc` executable is the GUI version.
+# determine correlation threshold
+kinc run rmt --input Yeast.cmx --log Yeast.log
 
-# Usage
+# read threshold from log file
+THRESHOLD=$(tail -n 1 Yeast.log)
 
-To build a GCN involves several steps:
+# extract network file from thresholded similarity matrix
+kinc run extract --emx Yeast.emx --ccm Yeast.ccm --cmx Yeast.cmx --output Yeast-net.txt --mincorr $THRESHOLD
+```
 
-1. Import expression matrix
-2. Compute cluster composition matrix
-3. Compute correlation matrix
-4. Compute thresholded correlation matrix
+A more thorough example usage is provided in `scripts/run-all.sh`.
 
-# Troubleshooting
-## An error occurred in MPI_Init
-KINC requires MPI as a dependency, but on most systems you can execute the command-line KINC as a stand-alone tool without using 'mpirun'.  This is because KINC checks during runtime if MPI is appropriate for execution. However, on a SLURM cluster where MPI jobs must be run using the srun command and where PMI2 is compiled into MPI, then KINC cannot be executed stand-alone.  It must be executed using srun with the --mpi argument set to pmi2.  For example:
+### Running KINC on SLURM
 
+Although KINC is an MPI application, generally you can run `kinc` as a stand-alone application without `mpirun` and achieve normal serial behavior. However, on a SLURM cluster where MPI jobs must be run with the `srun` command and where PMI2 is compiled into MPI, `kinc` cannot be executed stand-alone. It must be executed using `srun` with the additional argument `--mpi=pmi2`. For example:
 ```
 srun --mpi=pmi2 kinc run import_emx --input Yeast-ematrix.txt --output Yeast.emx --nan NA
 ```
-
diff --git a/build-tests/.gitignore b/build-tests/.gitignore
deleted file mode 100644
index a5baada..0000000
--- a/build-tests/.gitignore
+++ /dev/null
@@ -1,3 +0,0 @@
-*
-!.gitignore
-
diff --git a/docs/Ubuntu_16_04_Setup.md b/docs/Ubuntu_16_04_Setup.md
index a9274ae..54d1a45 100644
--- a/docs/Ubuntu_16_04_Setup.md
+++ b/docs/Ubuntu_16_04_Setup.md
@@ -7,7 +7,7 @@ Use the following steps to setup KINC for development on Ubuntu 16.04:
 
 Most of the dependencies are available as packages:
 ```bash
-sudo apt install g++ libgsl-dev libopenblas-dev libopenmpi-dev ocl-icd-opencl-dev
+sudo apt install build-essential libgsl-dev libopenblas-dev libopenmpi-dev ocl-icd-opencl-dev
 ```
 
 For device drivers (AMD, Intel, NVIDIA, etc), refer to the manufacturer's website.
@@ -25,7 +25,7 @@ If you install Qt locally then you must add Qt to the executable path:
 
 ```bash
 # append to ~/.bashrc
-export QTDIR="$HOME/Qt/5.10.1/gcc_64"
+export QTDIR="$HOME/Qt/5.7.1/gcc_64"
 export PATH="$QTDIR/bin:$PATH"
 ```
 
@@ -34,8 +34,8 @@ export PATH="$QTDIR/bin:$PATH"
 Clone the ACE and KINC repositories from Github.
 
 ```bash
-git clone git@github.com:SystemsGenetics/ACE.git
-git clone git@github.com:SystemsGenetics/KINC.git
+git clone https://github.com/SystemsGenetics/ACE.git
+git clone https://github.com/SystemsGenetics/KINC.git
 ```
 
 ## Step 3: Build ACE and KINC
@@ -45,6 +45,9 @@ Follow the ACE instructions to build ACE. If you install ACE locally then you mu
 ```bash
 # append to ~/.bashrc
 export INSTALL_PREFIX="$HOME/software"
+export PATH="$INSTALL_PREFIX/bin:$PATH"
+export CPLUS_INCLUDE_PATH="$INSTALL_PREFIX/include:$CPLUS_INCLUDE_PATH"
+export LIBRARY_PATH="$INSTALL_PREFIX/lib:$LIBRARY_PATH"
 export LD_LIBRARY_PATH="$INSTALL_PREFIX/lib:$LD_LIBRARY_PATH"
 ```
 
@@ -52,7 +55,7 @@ Build & install KINC:
 
 ```bash
 cd build
-qmake ../src/KINC.pro
+qmake ../src/KINC.pro PREFIX=$INSTALL_PREFIX
 make qmake_all
 make
 make qmake_all
@@ -63,4 +66,4 @@ You should now be able to run KINC.
 
 ## (Optional) Use QtCreator
 
-Select **File** > **Open File or Project** and then navigate in the file browser to the ACE directory and select the ACE.pro file. Navigate through configure setup. Repeat for KINC.
+Select __File__ > __Open File or Project__ and then navigate in the file browser to the ACE directory and select the ACE.pro file. Navigate through configure setup. Repeat for KINC.
diff --git a/scripts/extract.py b/scripts/extract.py
new file mode 100644
index 0000000..c571957
--- /dev/null
+++ b/scripts/extract.py
@@ -0,0 +1,58 @@
+import argparse
+import pandas as pd
+
+
+
+if __name__ == "__main__":
+	# parse command-line arguments
+	parser = argparse.ArgumentParser()
+	parser.add_argument("--emx", required=True, help="expression matrix file", dest="EMX")
+	parser.add_argument("--cmx", required=True, help="correlation matrix file", dest="CMX")
+	parser.add_argument("-o", "--output", required=True, help="output net file", dest="OUTPUT")
+	parser.add_argument("--mincorr", type=float, default=0, help="minimum absolute correlation threshold", dest="MINCORR")
+	parser.add_argument("--maxcorr", type=float, default=1, help="maximum absolute correlation threshold", dest="MAXCORR")
+
+	args = parser.parse_args()
+
+	# load data
+	emx = pd.read_table(args.EMX)
+	cmx = pd.read_table(args.CMX, header=None, names=[
+		"x",
+		"y",
+		"Cluster",
+		"Num_Clusters",
+		"Cluster_Samples",
+		"Missing_Samples",
+		"Cluster_Outliers",
+		"Pair_Outliers",
+		"Too_Low",
+		"sc",
+		"Samples"
+	])
+
+	# extract correlations within thresholds
+	cmx = cmx[(args.MINCORR <= abs(cmx["sc"])) & (abs(cmx["sc"]) <= args.MAXCORR)]
+
+	# insert additional columns used in netlist format
+	cmx.insert(len(cmx.columns), "Source", [emx.index[x] for x in cmx["x"]])
+	cmx.insert(len(cmx.columns), "Target", [emx.index[y] for y in cmx["y"]])
+	cmx.insert(len(cmx.columns), "Interaction", ["co" for idx in cmx.index])
+
+	# reorder columns to netlist format
+	cmx = cmx[[
+		"Source",
+		"Target",
+		"sc",
+		"Interaction",
+		"Cluster",
+		"Num_Clusters",
+		"Cluster_Samples",
+		"Missing_Samples",
+		"Cluster_Outliers",
+		"Pair_Outliers",
+		"Too_Low",
+		"Samples"
+	]]
+
+	# save output data
+	cmx.to_csv(args.OUTPUT, sep="\t", index=False)
diff --git a/scripts/run-all-py.sh b/scripts/run-all-py.sh
new file mode 100755
index 0000000..3e21d89
--- /dev/null
+++ b/scripts/run-all-py.sh
@@ -0,0 +1,73 @@
+#!/bin/bash
+
+# parse command-line arguments
+if [[ $# != 1 ]]; then
+	echo "usage: $0 <infile>"
+	exit 1
+fi
+
+# define analytic flags
+DO_SIMILARITY=1
+DO_THRESHOLD=1
+DO_EXTRACT=1
+
+# define input/output files
+DATA="data"
+EMX_FILE="$1"
+CMX_FILE="$DATA/$(basename $EMX_FILE .txt)-cmx-py.txt"
+NET_FILE="$DATA/$(basename $EMX_FILE .txt)-net-py.txt"
+
+# similarity
+if [[ $DO_SIMILARITY = 1 ]]; then
+	CLUSMETHOD="gmm"
+	CORRMETHOD="pearson"
+	MINEXPR="-inf"
+	MINCLUS=1
+	MAXCLUS=5
+	CRITERION="bic"
+	PREOUT="--preout"
+	POSTOUT="--postout"
+	MINCORR=0
+	MAXCORR=1
+
+	python scripts/similarity.py \
+	   -i $EMX_FILE \
+	   -o $CMX_FILE \
+	   --clusmethod $CLUSMETHOD \
+	   --corrmethod $CORRMETHOD \
+	   --minexpr=$MINEXPR \
+	   --minclus $MINCLUS --maxclus $MAXCLUS \
+	   --crit $CRITERION \
+	   $PREOUT $POSTOUT \
+	   --mincorr $MINCORR --maxcorr $MAXCORR
+fi
+
+# threshold
+if [[ $DO_THRESHOLD = 1 ]]; then
+	NUM_GENES=$(expr $(cat $EMX_FILE | wc -l) - 1)
+	METHOD="rmt"
+	TSTART=0.99
+	TSTEP=0.001
+	TSTOP=0.50
+
+	python scripts/threshold.py \
+	   -i $CMX_FILE \
+	   --genes $NUM_GENES \
+	   --method $METHOD \
+	   --tstart $TSTART \
+	   --tstep $TSTEP \
+	   --tstop $TSTOP
+fi
+
+# extract
+if [[ $DO_EXTRACT = 1 ]]; then
+	MINCORR=0
+	MAXCORR=1
+
+	python scripts/extract.py \
+	   --emx $EMX_FILE \
+	   --cmx $CMX_FILE \
+	   --output $NET_FILE \
+	   --mincorr $MINCORR \
+	   --maxcorr $MAXCORR
+fi
diff --git a/scripts/run-all.sh b/scripts/run-all.sh
new file mode 100755
index 0000000..0fc3e0c
--- /dev/null
+++ b/scripts/run-all.sh
@@ -0,0 +1,108 @@
+#!/bin/bash
+
+# parse command-line arguments
+if [[ $# != 1 ]]; then
+	echo "usage: $0 <infile>"
+	exit 1
+fi
+
+GPU=1
+
+# define analytic flags
+DO_IMPORT_EMX=1
+DO_SIMILARITY=1
+DO_EXPORT_CMX=1
+DO_THRESHOLD=1
+DO_EXTRACT=1
+
+# define input/output files
+INFILE="$1"
+DATA="data"
+EMX_FILE="$DATA/$(basename $INFILE .txt).emx"
+CCM_FILE="$DATA/$(basename $EMX_FILE .emx).ccm"
+CMX_FILE="$DATA/$(basename $EMX_FILE .emx).cmx"
+LOGS="logs"
+RMT_FILE="$LOGS/$(basename $CMX_FILE .cmx).txt"
+
+# apply settings
+if [[ $GPU == 1 ]]; then
+   kinc settings set opencl 0:0
+   kinc settings set threads 4
+   kinc settings set logging off
+
+   NP=1
+else
+   kinc settings set opencl none
+   kinc settings set logging off
+
+   NP=$(nproc)
+fi
+
+# import emx
+if [[ $DO_IMPORT_EMX = 1 ]]; then
+	kinc run import-emx \
+		--input $INFILE \
+		--output $EMX_FILE \
+		--nan NA
+fi
+
+# similarity
+if [[ $DO_SIMILARITY = 1 ]]; then
+	CLUSMETHOD="gmm"
+	CORRMETHOD="pearson"
+	MINEXPR="-inf"
+	MINCLUS=1
+	MAXCLUS=5
+	CRITERION="BIC"
+	PREOUT="--preout"
+	POSTOUT="--postout"
+	MINCORR=0.5
+	MAXCORR=1
+
+	mpirun -np $NP kinc run similarity \
+	   --input $EMX_FILE \
+	   --ccm $CCM_FILE \
+	   --cmx $CMX_FILE \
+	   --clusmethod $CLUSMETHOD \
+	   --corrmethod $CORRMETHOD \
+	   --minexpr $MINEXPR \
+	   --minclus $MINCLUS --maxclus $MAXCLUS \
+	   --crit $CRITERION \
+	   $PREOUT $POSTOUT \
+	   --mincorr $MINCORR --maxcorr $MAXCORR
+fi
+
+# export cmx
+if [[ $DO_EXPORT_CMX = 1 ]]; then
+	OUTFILE="$DATA/$(basename $CMX_FILE .cmx)-cmx.txt"
+
+	kinc run export-cmx \
+	   --emx $EMX_FILE \
+	   --ccm $CCM_FILE \
+	   --cmx $CMX_FILE \
+	   --output $OUTFILE
+fi
+
+# threshold
+if [[ $DO_THRESHOLD = 1 ]]; then
+	mkdir -p $LOGS
+
+	kinc run rmt \
+	   --input $CMX_FILE \
+	   --log $RMT_FILE
+fi
+
+# extract
+if [[ $DO_EXTRACT = 1 ]]; then
+	NET_FILE="$DATA/$(basename $EMX_FILE .emx)-net.txt"
+	MINCORR=0
+	MAXCORR=1
+
+	kinc run extract \
+	   --emx $EMX_FILE \
+	   --ccm $CCM_FILE \
+	   --cmx $CMX_FILE \
+	   --output $NET_FILE \
+	   --mincorr $MINCORR \
+	   --maxcorr $MAXCORR
+fi
diff --git a/scripts/similarity.py b/scripts/similarity.py
new file mode 100644
index 0000000..e88f9bf
--- /dev/null
+++ b/scripts/similarity.py
@@ -0,0 +1,230 @@
+import argparse
+import matplotlib.pyplot as plt
+import numpy as np
+import pandas as pd
+import pprint
+import scipy.stats
+import seaborn as sns
+import sklearn.cluster
+import sklearn.mixture
+
+
+
+def create_gmm(n_clusters):
+	return sklearn.mixture.GaussianMixture(n_components=n_clusters)
+
+
+
+def create_kmeans(n_clusters):
+	return sklearn.cluster.KMeans(n_clusters=n_clusters)
+
+
+
+def fetch_pair(emx, i, j, min_expression):
+	# extract pairwise data
+	X = emx.iloc[[i, j]].values.T
+
+	# initialize labels
+	y = np.zeros((X.shape[0],), dtype=int)
+
+	# mark thresholded samples
+	y[(X[:, 0] < min_expression) | (X[:, 1] < min_expression)] = -6
+
+	# mark nan samples
+	y[np.isnan(X[:, 0]) | np.isnan(X[:, 1])] = -9
+
+	return (X, y)
+
+
+
+def mark_outliers(X, labels, k, marker):
+	# extract samples in cluster k
+	mask = (labels == k)
+	x = np.copy(X[mask, 0])
+	y = np.copy(X[mask, 1])
+
+	# make sure cluster is not empty
+	if len(x) == 0 or len(y) == 0:
+		return
+
+	# sort arrays
+	x.sort()
+	y.sort()
+
+	# compute quartiles and thresholds for each axis
+	n = len(x)
+
+	Q1_x = x[n * 1 // 4]
+	Q3_x = x[n * 3 // 4]
+	T_x_min = Q1_x - 1.5 * (Q3_x - Q1_x)
+	T_x_max = Q3_x + 1.5 * (Q3_x - Q1_x)
+
+	Q1_y = y[n * 1 // 4]
+	Q3_y = y[n * 3 // 4]
+	T_y_min = Q1_y - 1.5 * (Q3_y - Q1_y)
+	T_y_max = Q3_y + 1.5 * (Q3_y - Q1_y)
+
+	# mark outliers
+	for i in range(len(labels)):
+		if labels[i] == k:
+			outlier_x = (X[i, 0] < T_x_min or T_x_max < X[i, 0])
+			outlier_y = (X[i, 1] < T_y_min or T_y_max < X[i, 1])
+
+			if outlier_x or outlier_y:
+				labels[i] = marker
+
+
+
+def compute_clustering(X, y, create_model, min_samples, min_clusters, max_clusters, criterion):
+	# extract clean pairwise data
+	mask = (y == 0)
+	X_clean = X[mask]
+	N = X_clean.shape[0]
+
+	# make sure there are enough samples
+	K = 0
+
+	if N >= min_samples:
+		# initialize clustering models
+		models = [create_model(K) for K in range(min_clusters, max_clusters+1)]
+		min_crit = float("inf")
+
+		# identify number of clusters
+		for model in models:
+			# fit model
+			model.fit(X_clean)
+
+			# compute criterion value
+			if criterion == "aic":
+				crit = model.aic(X_clean)
+			elif criterion == "bic":
+				crit = model.bic(X_clean)
+
+			# save the best model
+			if crit < min_crit:
+				min_crit = crit
+				K = len(model.weights_)
+				y[mask] = model.predict(X_clean)
+
+	return K, y
+
+
+
+def compute_correlation(X, y, k, method, min_samples, visualize):
+	# extract samples in cluster k
+	X_k = X[y == k]
+
+	# make sure there are enough samples
+	if X_k.shape[0] < min_samples:
+		return None, None
+
+	# compute correlation
+	corr, p = method(X_k[:, 0], X_k[:, 1])
+
+	# plot results
+	if visualize:
+		sns.jointplot(x=X_k[:, 0], y=X_k[:, 1], kind="reg")
+		plt.show()
+
+	return corr, p
+
+
+
+if __name__ == "__main__":
+	# define clustering methods
+	CLUSTERING_METHODS = {
+		"none": None,
+		"gmm": create_gmm,
+		"kmeans": create_kmeans
+	}
+
+	# define correlation methods
+	CORRELATION_METHODS = {
+		"kendall": scipy.stats.kendalltau,
+		"pearson": scipy.stats.pearsonr,
+		"spearman": scipy.stats.spearmanr
+	}
+
+	# parse command-line arguments
+	parser = argparse.ArgumentParser()
+	parser.add_argument("-i", "--input", required=True, help="expression matrix file", dest="INPUT")
+	parser.add_argument("-o", "--output", required=True, help="correlation file", dest="OUTPUT")
+	parser.add_argument("--clusmethod", default="none", choices=["none", "gmm", "kmeans"], help="clustering method", dest="CLUSMETHOD")
+	parser.add_argument("--corrmethod", default="pearson", choices=["kendall", "pearson", "spearman"], help="correlation method", dest="CORRMETHOD")
+	parser.add_argument("--minexpr", type=float, default=-float("inf"), help="minimum expression threshold", dest="MINEXPR")
+	parser.add_argument("--minsamp", type=int, default=30, help="minimum sample size", dest="MINSAMP")
+	parser.add_argument("--minclus", type=int, default=1, help="minimum clusters", dest="MINCLUS")
+	parser.add_argument("--maxclus", type=int, default=5, help="maximum clusters", dest="MAXCLUS")
+	parser.add_argument("--crit", default="bic", choices=["aic", "bic"], help="model selection criterion", dest="CRITERION")
+	parser.add_argument("--preout", action="store_true", help="whether to remove pre-clustering outliers", dest="PREOUT")
+	parser.add_argument("--postout", action="store_true", help="whether to remove post-clustering outliers", dest="POSTOUT")
+	parser.add_argument("--mincorr", type=float, default=0, help="minimum absolute correlation threshold", dest="MINCORR")
+	parser.add_argument("--maxcorr", type=float, default=1, help="maximum absolute correlation threshold", dest="MAXCORR")
+	parser.add_argument("--pvalue", type=float, default=float("inf"), help="maximum p-value threshold for correlations", dest="MAXPVALUE")
+	parser.add_argument("--visualize", action="store_true", help="whether to visualize results", dest="VISUALIZE")
+
+	args = parser.parse_args()
+
+	# print arguments
+	pprint.pprint(vars(args))
+
+	# load data
+	emx = pd.read_table(args.INPUT)
+	cmx = open(args.OUTPUT, "w")
+
+	# iterate through each pair
+	for i in range(len(emx.index)):
+		for j in range(i):
+			# fetch pairwise input data
+			X, y = fetch_pair(emx, i, j, args.MINEXPR)
+
+			# remove pre-clustering outliers
+			if args.PREOUT:
+				mark_outliers(X, y, 0, -7)
+
+			# perform clustering
+			K = 1
+
+			if args.CLUSMETHOD != "none":
+				K, y = compute_clustering(X, y, CLUSTERING_METHODS[args.CLUSMETHOD], args.MINSAMP, args.MINCLUS, args.MAXCLUS, args.CRITERION)
+
+			print("%4d %4d %d" % (i, j, K))
+
+			# remove post-clustering outliers
+			if K > 1 and args.POSTOUT:
+				for k in range(K):
+					mark_outliers(X, y, k, -8)
+
+			# perform correlation
+			correlations = [compute_correlation(X, y, k, CORRELATION_METHODS[args.CORRMETHOD], args.MINSAMP, args.VISUALIZE) for k in range(K)]
+
+			# save correlation matrix
+			valid = [(corr is not None and args.MINCORR <= abs(corr) and abs(corr) <= args.MAXCORR and p <= args.MAXPVALUE) for corr, p in correlations]
+			num_clusters = sum(valid)
+			cluster_idx = 0
+
+			for k in range(K):
+				corr, p = correlations[k]
+
+				# make sure correlation, p-value meets thresholds
+				if valid[k]:
+					# compute sample mask
+					y_k = np.copy(y)
+					y_k[(y >= 0) & (y != k)] = 0
+					y_k[y == k] = 1
+					y_k[y < 0] *= -1
+
+					sample_mask = "".join([str(y_i) for y_i in y_k])
+
+					# compute summary statistics
+					num_samples = sum(y_k == 1)
+					num_threshold = sum(y_k == 6)
+					num_preout = sum(y_k == 7)
+					num_postout = sum(y_k == 8)
+					num_missing = sum(y_k == 9)
+
+					# write correlation to file
+					cmx.write("%d\t%d\t%d\t%d\t%d\t%d\t%d\t%d\t%d\t%0.8f\t%s\n" % (i, j, cluster_idx, num_clusters, num_samples, num_missing, num_postout, num_preout, num_threshold, corr, sample_mask))
+
+					# increment cluster index
+					cluster_idx += 1
diff --git a/scripts/test-gmm.py b/scripts/test-gmm.py
deleted file mode 100644
index ca00d68..0000000
--- a/scripts/test-gmm.py
+++ /dev/null
@@ -1,59 +0,0 @@
-import matplotlib.pyplot as plt
-import pandas as pd
-import sklearn.mixture
-import sys
-
-
-
-if __name__ == "__main__":
-	if len(sys.argv) != 2:
-		print "usage: python test-gmm.py [infile]"
-		sys.exit(1)
-
-	# load data
-	emx = pd.read_csv(sys.argv[1], sep="\t")
-
-	# iterate through each pair
-	for i in xrange(len(emx.index)):
-		for j in xrange(i):
-			# extract pairwise data
-			X = emx.iloc[[i, j]].dropna(axis=1, how="any")
-			X = X.values.T
-			N = X.shape[0]
-
-			# make sure there are enough samples
-			min_K = 0
-			min_crit = float("inf")
-
-			if N >= 30:
-				# initialize clustering models
-				models = [sklearn.mixture.GaussianMixture(n_components=n+1) for n in xrange(5)]
-
-				# identify number of clusters
-				for k, model in enumerate(models):
-					# fit model
-					model.fit(X)
-
-					# save the best model
-					crit = model.aic(X)
-					if crit < min_crit:
-						min_K = len(model.weights_)
-						min_crit = crit
-
-				# plot clustering results
-				plt.subplots(1, len(models), True, True, figsize=(5 * len(models), 5))
-
-				for k, model in enumerate(models):
-					K = len(model.weights_)
-					crit = model.aic(X)
-					y = model.predict(X)
-
-					plt.subplot(1, len(models), k + 1)
-					plt.scatter(X[:, 0], X[:, 1], s=20, c=y, cmap="brg")
-					plt.title("N = %d, K = %d, crit = %g" % (N, K, crit))
-					plt.xlabel(emx.index[i])
-					plt.ylabel(emx.index[j])
-
-				plt.show()
-
-			print "%4d %4d: N = %4d, K = %4d" % (i, j, N, min_K)
diff --git a/scripts/test-vbgmm.py b/scripts/test-vbgmm.py
index d8ac6ba..743c97f 100644
--- a/scripts/test-vbgmm.py
+++ b/scripts/test-vbgmm.py
@@ -1,47 +1,129 @@
 import matplotlib.pyplot as plt
+import numpy as np
 import pandas as pd
 import sklearn.mixture
 import sys
 
 
 
+def fetch_pair(emx, i, j, min_expression):
+	# extract pairwise data
+	X = emx.iloc[[i, j]].values.T
+
+	# initialize labels
+	y = np.zeros((X.shape[0],), dtype=int)
+
+	# mark thresholded samples
+	y[(X[:, 0] < min_expression) | (X[:, 1] < min_expression)] = -6
+
+	# mark nan samples
+	y[np.isnan(X[:, 0]) | np.isnan(X[:, 1])] = -9
+
+	return X, y
+
+
+
+def compute_gmm(X, n_components):
+	# initialize clustering model
+	model = sklearn.mixture.GaussianMixture(n_components)
+
+	# fit clustering model
+	model.fit(X)
+
+	# save clustering results
+	K = n_components
+	y = model.predict(X)
+
+	# compute criterion value
+	crit = model.bic(X)
+
+	# print results
+	print("%4d %4d: %8s: K = %4d, crit = %g" % (i, j, "GMM", K, crit))
+
+	return K, y, crit
+
+
+
+def compute_vbgmm(X, n_components, weight_concentration_prior, weight_threshold):
+	# initialize clustering model
+	model = sklearn.mixture.BayesianGaussianMixture(n_components, weight_concentration_prior=weight_concentration_prior)
+
+	# fit clustering model
+	model.fit(X)
+
+	print("".join(["%8.3f" % w for w in model.weights_]))
+
+	# compute number of effective components
+	K = sum([(w > weight_threshold) for w in model.weights_])
+
+	# save clustering results
+	y = model.predict(X)
+
+	# print results
+	print("%4d %4d: %8s: y_0 = %g, K = %d" % (i, j, "VBGMM", weight_concentration_prior, K))
+
+	return K, y
+
+
+
 if __name__ == "__main__":
 	if len(sys.argv) != 2:
-		print "usage: python test-vbgmm.py [infile]"
+		print("usage: python test-vbgmm.py [infile]")
 		sys.exit(1)
 
+	# define parameters
+	min_expression = float("-inf")
+	min_samples = 30
+	min_clusters = 1
+	max_clusters = 5
+	weight_concentration_priors = [1e-6, 1e-3, 1e0, 1e3, 1e6]
+	weight_threshold = 0.05
+
 	# load data
-	emx = pd.read_csv(sys.argv[1], sep="\t")
+	emx = pd.read_table(sys.argv[1])
 
 	# iterate through each pair
-	for i in xrange(len(emx.index)):
-		for j in xrange(i):
-			# extract pairwise data
-			X = emx.iloc[[i, j]].dropna(axis=1, how="any")
-			X = X.values.T
-			N = X.shape[0]
+	for i in range(len(emx.index)):
+		for j in range(i):
+			# extract clean pairwise data
+			X, y = fetch_pair(emx, i, j, min_expression)
+			X = X[y == 0]
+
+			if len(X) < min_samples:
+				continue
+
+			# compute Gaussian mixture models
+			gmms = []
+
+			for n_components in range(min_clusters, max_clusters + 1):
+				gmms.append(compute_gmm(X, n_components))
 
-			# make sure there are enough samples
-			K = 0
+			# compute variational Bayesian Gaussian mixture models
+			vbgmms = []
 
-			if N >= 30:
-				# initialize clustering model
-				model = sklearn.mixture.BayesianGaussianMixture(n_components=5, weight_concentration_prior=1e3)
+			for weight_concentration_prior in weight_concentration_priors:
+				vbgmms.append(compute_vbgmm(X, max_clusters, weight_concentration_prior, weight_threshold))
 
-				# fit clustering model
-				model.fit(X)
+			# plot comparison of GMMs and VBGMMs
+			rows, cols = 2, max(len(gmms), len(vbgmms))
+			plt.figure(figsize=(5 * cols, 5 * rows))
 
-				print "".join(["%8.3f" % w for w in model.weights_])
+			for k in range(len(gmms)):
+				K, y, crit = gmms[k]
+
+				plt.subplot(rows, cols, k + 1)
+				plt.scatter(X[:, 0], X[:, 1], s=20, c=y, cmap="brg")
+				plt.title("GMM: K = %d, crit = %g" % (K, crit))
+				plt.xlabel(emx.index[i])
+				plt.ylabel(emx.index[j])
 
-				# compute number of effective components
-				K = sum([1 for w in model.weights_ if w > 0.05])
+			for k in range(len(vbgmms)):
+				K, y = vbgmms[k]
 
-				# plot clustering results
-				y = model.predict(X)
+				plt.subplot(rows, cols, cols + k + 1)
 				plt.scatter(X[:, 0], X[:, 1], s=20, c=y, cmap="brg")
-				plt.title("N = %d, K = %d" % (N, K))
+				plt.title("VBGMM: y_0 = %.0e, K = %d" % (weight_concentration_priors[k], K))
 				plt.xlabel(emx.index[i])
 				plt.ylabel(emx.index[j])
-				plt.show()
 
-			print "%4d %4d: N = %4d, K = %4d" % (i, j, N, K)
+			plt.show()
diff --git a/scripts/threshold.py b/scripts/threshold.py
new file mode 100644
index 0000000..7ba4b0e
--- /dev/null
+++ b/scripts/threshold.py
@@ -0,0 +1,273 @@
+import argparse
+import math
+import matplotlib.pyplot as plt
+import numpy as np
+import pandas as pd
+import pprint
+import scipy.interpolate
+import scipy.stats
+import seaborn as sns
+import sklearn.cluster
+import sklearn.mixture
+import sys
+
+
+
+def load_cmx(filename, num_genes, num_clusters):
+	netlist = pd.read_table(filename, header=None)
+	cmx = np.zeros((num_genes * num_clusters, num_genes * num_clusters), dtype=np.float32)
+
+	for idx in range(len(netlist.index)):
+		i = netlist.iloc[idx, 0]
+		j = netlist.iloc[idx, 1]
+		k = netlist.iloc[idx, 2]
+		r = netlist.iloc[idx, 9]
+
+		cmx[i * num_clusters + k, j * num_clusters + k] = r
+		cmx[j * num_clusters + k, i * num_clusters + k] = r
+
+	return cmx
+
+
+
+def powerlaw(args):
+	# load correlation matrix
+	S = load_cmx(args.INPUT, args.NUM_GENES, args.MAX_CLUSTERS)
+
+	# iterate until network is sufficiently scale-free
+	threshold = args.TSTART
+
+	while True:
+		# compute thresholded adjacency matrix
+		A = (abs(S) >= threshold)
+
+		# compute degree of each node
+		for i in range(A.shape[0]):
+			A[i, i] = 0
+
+		degrees = np.array([sum(A[i]) for i in range(A.shape[0])])
+
+		# compute degree distribution
+		bins = max(5, degrees.max())
+		hist, _ = np.histogram(degrees, bins=bins, range=(1, bins))
+		bin_edges = range(1, len(hist) + 1)
+
+		# modify histogram values to work with loglog plot
+		hist += 1
+
+		# plot degree distribution
+		if args.VISUALIZE:
+			plt.subplots(1, 2, figsize=(10, 5))
+			plt.subplot(121)
+			plt.plot(bin_edges, hist, "ko")
+			plt.subplot(122)
+			plt.loglog(bin_edges, hist, "ko")
+			plt.savefig("plots/powerlaw/%03d.png" % (int(threshold * 100)))
+			plt.close()
+
+		# compute correlation
+		x = np.log(bin_edges)
+		y = np.log(hist)
+
+		r, p = scipy.stats.pearsonr(x, y)
+
+		# output results of threshold test
+		print("%g\t%g\t%g" % (threshold, r, p))
+
+		# break if power law is satisfied
+		if r < 0 and p < 1e-20:
+			break
+
+		# decrement threshold and fail if minimum threshold is reached
+		threshold -= args.TSTEP
+		if threshold < args.TSTOP:
+			print("error: could not find an adequate threshold above stopping threshold")
+			sys.exit(1)
+
+	return threshold
+
+
+
+def compute_pruned_matrix(S, threshold):
+	S_pruned = np.copy(S)
+	S_pruned[abs(S) < threshold] = 0
+	S_pruned = S_pruned[~np.all(S_pruned == 0, axis=1)]
+	S_pruned = S_pruned[:, ~np.all(S_pruned == 0, axis=0)]
+
+	return S_pruned
+
+
+
+def compute_degenerate(eigens):
+	unique = []
+	
+	for i in range(len(eigens)):
+		if len(unique) == 0 or abs(eigens[i] - unique[-1]) > 1e-6:
+			unique.append(eigens[i])
+
+	return unique
+
+
+
+def compute_spacings(eigens, pace):
+	# extract eigenvalues for spline based on pace
+	x = eigens[::pace]
+	y = np.linspace(0, 1, len(x))
+
+	# compute spline
+	spl = scipy.interpolate.splrep(x, y)
+
+	# extract interpolated eigenvalues from spline
+	spline_eigens = scipy.interpolate.splev(eigens, spl)
+
+	# compute spacings between interpolated eigenvalues
+	spacings = np.empty(len(eigens) - 1)
+	
+	for i in range(len(spacings)):
+		spacings[i] = (spline_eigens[i + 1] - spline_eigens[i]) * len(eigens)
+
+	return spacings
+
+
+
+def compute_chi_square_pace(eigens, pace):
+	# compute eigenvalue spacings
+	spacings = compute_spacings(eigens, pace)
+
+	# compute nearest-neighbor spacing distribution
+	hist_min = 0.0
+	hist_max = 3.0
+	num_bins = 60
+	bin_width = (hist_max - hist_min) / num_bins
+
+	hist, _ = np.histogram(spacings, num_bins, (hist_min, hist_max))
+	
+	# compute chi-square value from nnsd
+	chi = 0
+	
+	for i in range(len(hist)):
+		# compute O_i, the number of elements in bin i
+		O_i = hist[i]
+
+		# compute E_i, the expected value of Poisson distribution for bin i
+		E_i = (math.exp(-i * bin_width) - math.exp(-(i + 1) * bin_width)) * len(eigens)
+
+		# update chi-square value based on difference between O_i and E_i
+		chi += (O_i - E_i) * (O_i - E_i) / E_i
+
+	print("pace: %d, chi: %g" % (pace, chi))
+
+	return chi
+
+
+
+def compute_chi_square(eigens):
+	# compute unique eigenvalues
+	unique = compute_degenerate(eigens)
+
+	print("eigenvalues: %d" % len(eigens))
+	print("unique eigenvalues: %d" % len(unique))
+
+	# make sure there are enough eigenvalues
+	if len(unique) < 50:
+		return -1
+
+	# perform several chi-square tests by varying the pace
+	chi = 0
+	num_tests = 0
+
+	for pace in range(10, 41):
+		# make sure there are enough eigenvalues for pace
+		if len(unique) / pace < 5:
+			break
+
+		chi += compute_chi_square_pace(unique, pace)
+		num_tests += 1
+	
+	# compute average of chi-square tests
+	chi /= num_tests
+
+	# return chi value
+	return chi
+
+
+
+def rmt(args):
+	# load correlation matrix
+	S = load_cmx(args.INPUT, args.NUM_GENES, args.MAX_CLUSTERS)
+
+	# iterate until chi value goes below 99.607 then above 200
+	final_threshold = 0
+	final_chi = float("inf")
+	max_chi = -float("inf")
+	threshold = args.TSTART
+
+	while max_chi < 200:
+		# compute pruned matrix
+		S_pruned = compute_pruned_matrix(S, threshold)
+
+		# make sure pruned matrix is not empty
+		chi = -1
+
+		if S_pruned.shape[0] > 0:
+			# compute eigenvalues of pruned matrix
+			eigens, _ = np.linalg.eigh(S_pruned)
+
+			# compute chi-square value from NNSD of eigenvalues
+			chi = compute_chi_square(eigens)
+
+		# make sure chi-square test succeeded
+		if chi != -1:
+			# save most recent chi-square value less than critical value
+			if chi < 99.607:
+				final_chi = chi
+				final_threshold = threshold
+
+			# save largest chi-square value which occurs after final_chi
+			if final_chi < 99.607 and chi > final_chi:
+				max_chi = chi
+
+		# output results of threshold test
+		print("%f\t%d\t%f" % (threshold, S_pruned.shape[0], chi))
+
+		# decrement threshold and fail if minimum threshold is reached
+		threshold -= args.TSTEP
+		if threshold < args.TSTOP:
+			print("error: could not find an adequate threshold above stopping threshold")
+			sys.exit(1)
+
+	return final_threshold
+
+
+
+if __name__ == "__main__":
+	# define threshold methods
+	METHODS = {
+		"powerlaw": powerlaw,
+		"rmt": rmt
+	}
+
+	# parse command-line arguments
+	parser = argparse.ArgumentParser()
+	parser.add_argument("-i", "--input", required=True, help="correlation matrix file", dest="INPUT")
+	parser.add_argument("--genes", type=int, required=True, help="number of genes", dest="NUM_GENES")
+	parser.add_argument("--method", default="rmt", choices=["powerlaw", "rmt"], help="thresholding method", dest="METHOD")
+	parser.add_argument("--tstart", type=float, default=0.99, help="starting threshold", dest="TSTART")
+	parser.add_argument("--tstep", type=float, default=0.001, help="threshold step size", dest="TSTEP")
+	parser.add_argument("--tstop", type=float, default=0.5, help="stopping threshold", dest="TSTOP")
+	parser.add_argument("--minclus", type=int, default=1, help="minimum clusters", dest="MIN_CLUSTERS")
+	parser.add_argument("--maxclus", type=int, default=5, help="maximum clusters", dest="MAX_CLUSTERS")
+	parser.add_argument("--visualize", action="store_true", help="whether to visualize results", dest="VISUALIZE")
+
+	args = parser.parse_args()
+
+	# print arguments
+	pprint.pprint(vars(args))
+
+	# load data
+	cmx = pd.read_table(args.INPUT)
+
+	# initialize method
+	compute_threshold = METHODS[args.METHOD]
+
+	print(compute_threshold(args))
diff --git a/scripts/validate.py b/scripts/validate.py
new file mode 100644
index 0000000..1cef709
--- /dev/null
+++ b/scripts/validate.py
@@ -0,0 +1,108 @@
+import argparse
+import numpy as np
+import pandas as pd
+
+
+
+def pairwise_error(pair_true, pair_test, K, column_idx):
+	error = 0.0
+
+	for k in range(K):
+		x_true = pair_true.iloc[k, column_idx]
+		x_test = pair_test.iloc[k, column_idx]
+
+		error += abs(x_true - x_test) / K
+
+	return error
+
+
+
+if __name__ ==  "__main__":
+	# parse command-line arguments
+	parser = argparse.ArgumentParser()
+	parser.add_argument("--true", required=True, help="true correlation file", dest="CMX_TRUE")
+	parser.add_argument("--test", required=True, help="test correlation file", dest="CMX_TEST")
+
+	args = parser.parse_args()
+
+	# load input data
+	cmx_true = pd.read_table(args.CMX_TRUE, header=None, index_col=False)
+	cmx_test = pd.read_table(args.CMX_TEST, header=None, index_col=False)
+
+	# compare number of pairs
+	print("Number of pairs (true): %d" % len(cmx_true.index))
+	print("Number of pairs (test): %d" % len(cmx_test.index))
+
+	# get list of all pairs
+	pairs_true = [(cmx_true.iloc[idx, 0], cmx_true.iloc[idx, 1]) for idx in cmx_true.index]
+	pairs_test = [(cmx_test.iloc[idx, 0], cmx_test.iloc[idx, 1]) for idx in cmx_test.index]
+	pairs = list(set(pairs_true + pairs_test))
+
+	pairs.sort()
+
+	# compute pairwise statistics
+	error_K = 0.0
+	error_N_c = 0.0
+	error_N_m = 0.0
+	error_N_t = 0.0
+	error_N_o1 = 0.0
+	error_N_o2 = 0.0
+	error_r = 0.0
+	error_S = 0.0
+
+	for idx in pairs:
+		# extract pair from each cmx
+		pair_true = cmx_true.loc[(cmx_true[0] == idx[0]) & (cmx_true[1] == idx[1])]
+		pair_test = cmx_test.loc[(cmx_test[0] == idx[0]) & (cmx_test[1] == idx[1])]
+
+		# compute error in number of clusters
+		K_true = 0 if pair_true.empty else pair_true.iloc[0, 3]
+		K_test = 0 if pair_test.empty else pair_test.iloc[0, 3]
+
+		error_K += abs(K_true - K_test) / len(pairs)
+
+		# report errors
+		if K_true != K_test:
+			print("%4d %4d: %d != %d" % (idx[0], idx[1], K_true, K_test))
+
+		# use smaller K for cluster-wise comparisons
+		K = min(K_true, K_test)
+
+		# compute error in clean sample size
+		error_N_c += pairwise_error(pair_true, pair_test, K, 4) / len(pairs)
+
+		# compute error in missing sample size
+		error_N_m += pairwise_error(pair_true, pair_test, K, 5) / len(pairs)
+
+		# compute error in thresholded sample size
+		error_N_t += pairwise_error(pair_true, pair_test, K, 6) / len(pairs)
+
+		# compute error in pre-outlier sample size
+		error_N_o1 += pairwise_error(pair_true, pair_test, K, 7) / len(pairs)
+
+		# compute error in post-outlier sample size
+		error_N_o2 += pairwise_error(pair_true, pair_test, K, 8) / len(pairs)
+
+		# compute error in correlation
+		error_r += pairwise_error(pair_true, pair_test, K, 9) / len(pairs)
+
+		# compute error in sample mask
+		error_S_pair = 0.0
+
+		for k in range(K):
+			S_true = pair_true.iloc[k, 10]
+			S_test = pair_test.iloc[k, 10]
+
+			error_S_pair += sum([(s_true != s_test) for s_true, s_test in zip(S_true, S_test)]) / len(S_true) / K
+
+		error_S += error_S_pair / len(pairs)
+
+	print("\nError summary:")
+	print("  Number of clusters:       %8.3f" % (error_K))
+	print("  Clean sample size:        %8.3f" % (error_N_c))
+	print("  Missing sample size:      %8.3f" % (error_N_m))
+	print("  Thresholded sample size:  %8.3f" % (error_N_t))
+	print("  Pre-outlier sample size:  %8.3f" % (error_N_o1))
+	print("  Post-outlier sample size: %8.3f" % (error_N_o2))
+	print("  Correlation:              %8.3f" % (error_r))
+	print("  Sample mask:              %8.3f" % (error_S))
diff --git a/scripts/visualize.py b/scripts/visualize.py
new file mode 100644
index 0000000..e018ebd
--- /dev/null
+++ b/scripts/visualize.py
@@ -0,0 +1,83 @@
+import argparse
+import matplotlib.pyplot as plt
+import numpy as np
+import os
+import pandas as pd
+import scipy.stats
+import seaborn as sns
+
+
+
+if __name__ ==  "__main__":
+	# parse command-line arguments
+	parser = argparse.ArgumentParser()
+	parser.add_argument("-e", "--emx", required=True, help="expression matrix file", dest="EMX")
+	parser.add_argument("-n", "--netlist", required=True, help="netlist file", dest="NETLIST")
+	parser.add_argument("-o", "--output", required=True, help="output directory", dest="OUTPUT")
+	parser.add_argument("-s", "--scale", action="store_true", help="use a uniform global scale", dest="SCALE")
+
+	args = parser.parse_args()
+
+	# load input data
+	emx = pd.read_table(args.EMX, index_col=0)
+	netlist = pd.read_table(args.NETLIST)
+
+	print("Loaded expression matrix (%d genes, %d samples)" % emx.shape)
+	print("Loaded netlist (%d edges)" % len(netlist.index))
+
+	# setup plot limits
+	if args.SCALE:
+		limits = (emx.min().min(), emx.max().max())
+	else:
+		limits = None
+
+	# initialize output directory
+	if not os.path.exists(args.OUTPUT):
+		os.mkdir(args.OUTPUT)
+
+	# iterate through each network edge
+	for idx in netlist.index:
+		edge = netlist.iloc[idx]
+		x = edge["Source"]
+		y = edge["Target"]
+		k = edge["Cluster"]
+
+		print(x, y, k)
+
+		# extract pairwise data
+		labels = np.array([int(s) for s in edge["Samples"]])
+		mask1 = (labels != 9)
+
+		labels = labels[mask1]
+		mask2 = (labels == 1)
+
+		data = emx.loc[[x, y]].values[:, mask1]
+
+		# highlight samples in the edge
+		colors = np.array(["k" for _ in labels])
+		colors[mask2] = "r"
+
+		# compute Spearman correlation
+		r, p = scipy.stats.spearmanr(data[0, mask2], data[1, mask2])
+
+		# create figure
+		plt.subplots(1, 2, sharex=True, sharey=True, figsize=(10, 5))
+
+		# create density plot
+		plt.subplot(121)
+		plt.xlim(limits)
+		plt.ylim(limits)
+		sns.kdeplot(data[0], data[1], shade=True, shade_lowest=False)
+
+		# create scatter plot
+		plt.subplot(122)
+		plt.title("k=%d, samples=%d, spearmanr=%0.2f" % (k, edge["Cluster_Samples"], r))
+		plt.xlim(limits)
+		plt.ylim(limits)
+		plt.xlabel(x)
+		plt.ylabel(y)
+		plt.scatter(data[0], data[1], color="w", edgecolors=colors)
+
+		# save plot to file
+		plt.savefig("%s/%s_%s_%d.png" % (args.OUTPUT, x, y, k))
+		plt.close()
diff --git a/src/KINC.pri b/src/KINC.pri
index f4f1a1d..65f1e28 100644
--- a/src/KINC.pri
+++ b/src/KINC.pri
@@ -1,11 +1,9 @@
 
-# Default settings for MPI CXX include
-isEmpty(MPICXX) { MPICXX = "yes" }
-
 # Versions
 MAJOR_VERSION = 3
 MINOR_VERSION = 2
-REVISION = 0
+REVISION = 2
+
 VERSION = $${MAJOR_VERSION}.$${MINOR_VERSION}.$${REVISION}
 
 # Version compiler defines
@@ -17,24 +15,22 @@ DEFINES += \
 
 # Basic settings
 QT += core
-TEMPLATE = app
 QMAKE_CXX = mpic++
 CONFIG += c++11
 
-# Compiler defines
-DEFINES += QT_DEPRECATED_WARNINGS
-
-# External libraries
-LIBS += -lmpi
-equals(MPICXX,"yes") { LIBS += -lmpi_cxx }
-LIBS += -lacecore -lOpenCL -lgsl -lgslcblas -L$${PWD}/../build/libs -lkinccore
-
 # Used to ignore useless warnings with OpenCL
 QMAKE_CXXFLAGS += -Wno-ignored-attributes
 
-# Source files
-SOURCES += \
-    ../main.cpp \
+# Default settings for MPI CXX include
+isEmpty(MPICXX) { MPICXX = "yes" }
+
+# External libraries
+LIBS += \
+    -L$${PWD}/../build/libs -lkinccore \
+    -lacecore \
+    -lgsl -lgslcblas -llapack -llapacke \
+    -lOpenCL -lmpi
+equals(MPICXX,"yes") { LIBS += -lmpi_cxx }
 
 # Resource files
 RESOURCES += \
diff --git a/src/KINC.pro b/src/KINC.pro
index 4dfb103..29395d6 100644
--- a/src/KINC.pro
+++ b/src/KINC.pro
@@ -1,4 +1,8 @@
 
+# Minimum Qt version
+lessThan(QT_MAJOR_VERSION,5): error("Requires Qt 5")
+equals(QT_MAJOR_VERSION,5):lessThan(QT_MINOR_VERSION,7): error("Requires Qt 5.7")
+
 # Default setting for GUI
 isEmpty(GUI) { GUI = "yes" }
 
@@ -8,10 +12,12 @@ TEMPLATE = subdirs
 # Subdir projects
 SUBDIRS += \
     core \
-    cli
+    cli \
+    tests
 
 # Dependencies
 cli.depends = core
+tests.depends = core
 
 # This is if GUI is enabled
 equals(GUI,"yes") {
diff --git a/src/cli/cli.pro b/src/cli/cli.pro
index 27caef9..17610dd 100644
--- a/src/cli/cli.pro
+++ b/src/cli/cli.pro
@@ -4,6 +4,7 @@ include (../KINC.pri)
 
 # Basic settings
 TARGET = kinc
+TEMPLATE = app
 
 # External libraries
 LIBS += -laceconsole
@@ -11,6 +12,10 @@ LIBS += -laceconsole
 # Compiler defines
 DEFINES += GUI=0
 
+# Source files
+SOURCES += \
+    ../main.cpp
+
 # Installation instructions
 isEmpty(PREFIX) { PREFIX = /usr/local }
 program.path = $${PREFIX}/bin
diff --git a/src/core/analyticfactory.cpp b/src/core/analyticfactory.cpp
index 0e409ad..21d4a6c 100644
--- a/src/core/analyticfactory.cpp
+++ b/src/core/analyticfactory.cpp
@@ -4,6 +4,7 @@
 #include "importcorrelationmatrix.h"
 #include "exportcorrelationmatrix.h"
 #include "similarity.h"
+#include "powerlaw.h"
 #include "rmt.h"
 #include "extract.h"
 
@@ -16,8 +17,13 @@ using namespace std;
 
 
 
+/*!
+ * Return the total number of analytic types that this program implements.
+ */
 quint16 AnalyticFactory::size() const
 {
+   EDEBUG_FUNC(this);
+
    return Total;
 }
 
@@ -26,8 +32,15 @@ quint16 AnalyticFactory::size() const
 
 
 
+/*!
+ * Return the display name for the given analytic type.
+ *
+ * @param type
+ */
 QString AnalyticFactory::name(quint16 type) const
 {
+   EDEBUG_FUNC(this,type);
+
    switch (type)
    {
    case ImportExpressionMatrixType: return "Import Expression Matrix";
@@ -35,7 +48,8 @@ QString AnalyticFactory::name(quint16 type) const
    case ImportCorrelationMatrixType: return "Import Correlation Matrix";
    case ExportCorrelationMatrixType: return "Export Correlation Matrix";
    case SimilarityType: return "Similarity";
-   case RMTType: return "RMT Thresholding";
+   case PowerLawType: return "Threshold (Power-law)";
+   case RMTType: return "Threshold (RMT)";
    case ExtractType: return "Extract Network";
    default: return QString();
    }
@@ -46,8 +60,15 @@ QString AnalyticFactory::name(quint16 type) const
 
 
 
+/*!
+ * Return the command line name for the given analytic type.
+ *
+ * @param type
+ */
 QString AnalyticFactory::commandName(quint16 type) const
 {
+   EDEBUG_FUNC(this,type);
+
    switch (type)
    {
    case ImportExpressionMatrixType: return "import-emx";
@@ -55,6 +76,7 @@ QString AnalyticFactory::commandName(quint16 type) const
    case ImportCorrelationMatrixType: return "import-cmx";
    case ExportCorrelationMatrixType: return "export-cmx";
    case SimilarityType: return "similarity";
+   case PowerLawType: return "powerlaw";
    case RMTType: return "rmt";
    case ExtractType: return "extract";
    default: return QString();
@@ -66,8 +88,15 @@ QString AnalyticFactory::commandName(quint16 type) const
 
 
 
+/*!
+ * Make and return a new abstract analytic object of the given type.
+ *
+ * @param type
+ */
 std::unique_ptr<EAbstractAnalytic> AnalyticFactory::make(quint16 type) const
 {
+   EDEBUG_FUNC(this,type);
+
    switch (type)
    {
    case ImportExpressionMatrixType: return unique_ptr<EAbstractAnalytic>(new ImportExpressionMatrix);
@@ -75,6 +104,7 @@ std::unique_ptr<EAbstractAnalytic> AnalyticFactory::make(quint16 type) const
    case ImportCorrelationMatrixType: return unique_ptr<EAbstractAnalytic>(new ImportCorrelationMatrix);
    case ExportCorrelationMatrixType: return unique_ptr<EAbstractAnalytic>(new ExportCorrelationMatrix);
    case SimilarityType: return unique_ptr<EAbstractAnalytic>(new Similarity);
+   case PowerLawType: return unique_ptr<EAbstractAnalytic>(new PowerLaw);
    case RMTType: return unique_ptr<EAbstractAnalytic>(new RMT);
    case ExtractType: return unique_ptr<EAbstractAnalytic>(new Extract);
    default: return nullptr;
diff --git a/src/core/analyticfactory.h b/src/core/analyticfactory.h
index c576795..0df612f 100644
--- a/src/core/analyticfactory.h
+++ b/src/core/analyticfactory.h
@@ -4,9 +4,17 @@
 
 
 
+/*!
+ * This class implements the ACE analytic factory for producing new analytic
+ * objects and giving basic information about all available analytic types.
+ */
 class AnalyticFactory : public EAbstractAnalyticFactory
 {
 public:
+   /*!
+    * Defines all available analytic types this program implements along with the
+    * total size.
+    */
    enum Type
    {
       ImportExpressionMatrixType = 0
@@ -14,6 +22,7 @@ class AnalyticFactory : public EAbstractAnalyticFactory
       ,ImportCorrelationMatrixType
       ,ExportCorrelationMatrixType
       ,SimilarityType
+      ,PowerLawType
       ,RMTType
       ,ExtractType
       ,Total
diff --git a/src/core/ccmatrix.cpp b/src/core/ccmatrix.cpp
index 872496b..7e5959e 100644
--- a/src/core/ccmatrix.cpp
+++ b/src/core/ccmatrix.cpp
@@ -1,60 +1,20 @@
 #include "ccmatrix.h"
+#include "ccmatrix_model.h"
 
 
 
-using namespace std;
-using namespace Pairwise;
-
-
-
-
-
-
+/*!
+ * Return a Qt table model that represents this data object as a table.
+ */
 QAbstractTableModel* CCMatrix::model()
 {
-   return nullptr;
-}
+   EDEBUG_FUNC(this);
 
-
-
-
-
-
-QVariant CCMatrix::headerData(int section, Qt::Orientation orientation, int role) const
-{
-   // orientation is not used
-   Q_UNUSED(orientation);
-
-   // if role is not display return nothing
-   if ( role != Qt::DisplayRole )
-   {
-      return QVariant();
-   }
-
-   // get genes metadata and make sure it is an array
-   const EMetadata& genes {geneNames()};
-   if ( genes.isArray() )
+   if ( !_model )
    {
-      // make sure section is within limits of gene name array
-      if ( section >= 0 && section < genes.toArray().size() )
-      {
-         // return gene name
-         return genes.toArray().at(section).toString();
-      }
+      _model = new Model(this);
    }
-
-   // no gene found return nothing
-   return QVariant();
-}
-
-
-
-
-
-
-int CCMatrix::rowCount(const QModelIndex&) const
-{
-   return geneSize();
+   return _model;
 }
 
 
@@ -62,57 +22,24 @@ int CCMatrix::rowCount(const QModelIndex&) const
 
 
 
-int CCMatrix::columnCount(const QModelIndex&) const
+/*!
+ * Initialize this cluster matrix with a list of gene names, the max cluster
+ * size, and a list of sample names.
+ *
+ * @param geneNames
+ * @param maxClusterSize
+ * @param sampleNames
+ */
+void CCMatrix::initialize(const EMetaArray& geneNames, int maxClusterSize, const EMetaArray& sampleNames)
 {
-   return geneSize();
-}
-
-
-
-
-
+   EDEBUG_FUNC(this,&geneNames,maxClusterSize,&sampleNames);
 
-QVariant CCMatrix::data(const QModelIndex &index, int role) const
-{
-   // if role is not display return nothing
-   if ( role != Qt::DisplayRole )
-   {
-      return QVariant();
-   }
-
-   // if row and column are equal return empty string
-   if ( index.row() == index.column() )
-   {
-      return "";
-   }
-
-   // get constant pair and read in values
-   const Pair pair(this);
-   int x {index.row()};
-   int y {index.column()};
-   if ( y > x )
-   {
-      swap(x,y);
-   }
-   pair.read({x,y});
-
-   // Return value of pair as a string
-   return pair.toString();
-}
-
-
-
-
-
-
-void CCMatrix::initialize(const EMetadata &geneNames, int maxClusterSize, const EMetadata &sampleNames)
-{
-   // make sure sample names is an array and is not empty
-   if ( !sampleNames.isArray() || sampleNames.toArray().isEmpty() )
+   // make sure sample names is not empty
+   if ( sampleNames.isEmpty() )
    {
       E_MAKE_EXCEPTION(e);
       e.setTitle(tr("Domain Error"));
-      e.setDetails(tr("Sample names metadata is not an array or is empty."));
+      e.setDetails(tr("Sample names metadata is empty."));
       throw e;
    }
 
@@ -122,8 +49,8 @@ void CCMatrix::initialize(const EMetadata &geneNames, int maxClusterSize, const
    setMeta(metaObject);
 
    // save sample size and initialize base class
-   _sampleSize = sampleNames.toArray().size();
-   Matrix::initialize(geneNames, maxClusterSize, (_sampleSize + 1) / 2 * sizeof(qint8), DATA_OFFSET);
+   _sampleSize = sampleNames.size();
+   Matrix::initialize(geneNames, maxClusterSize, (_sampleSize + 1) / 2 * sizeof(qint8), SUBHEADER_SIZE);
 }
 
 
@@ -131,141 +58,12 @@ void CCMatrix::initialize(const EMetadata &geneNames, int maxClusterSize, const
 
 
 
-EMetadata CCMatrix::sampleNames() const
+/*!
+ * Return the list of sample names in this cluster matrix.
+ */
+EMetaArray CCMatrix::sampleNames() const
 {
-   return meta().toObject().at("samples");
-}
-
-
-
-
-
-
-void CCMatrix::Pair::addCluster(int amount) const
-{
-   // keep adding a new list of sample masks for given amount
-   while ( amount-- > 0 )
-   {
-      _sampleMasks.append(QVector<qint8>(_cMatrix->_sampleSize, 0));
-   }
-}
-
-
-
-
-
-
-QString CCMatrix::Pair::toString() const
-{
-   // if there are no clusters return empty string
-   if ( _sampleMasks.isEmpty() )
-   {
-      return "";
-   }
-
-   // initialize list of strings and iterate through all clusters
-   QStringList ret;
-   for (const auto& sampleMask : _sampleMasks)
-   {
-      // initialize list of strings for sample mask and iterate through each sample
-      QString clusterString("(");
-      for (const auto& sample : sampleMask)
-      {
-         // add new sample token as hexadecimal allowing 16 different possible values
-         switch (sample)
-         {
-         case 0:
-         case 1:
-         case 2:
-         case 3:
-         case 4:
-         case 5:
-         case 6:
-         case 7:
-         case 8:
-         case 9:
-            clusterString.append(QString::number(sample));
-            break;
-         case 10:
-            clusterString.append("A");
-            break;
-         case 11:
-            clusterString.append("B");
-            break;
-         case 12:
-            clusterString.append("C");
-            break;
-         case 13:
-            clusterString.append("D");
-            break;
-         case 14:
-            clusterString.append("E");
-            break;
-         case 15:
-            clusterString.append("F");
-            break;
-         }
-      }
-
-      // join all cluster string into one string
-      ret << clusterString.append(')');
-   }
-
-   // join all clusters and return as string
-   return ret.join(',');
-}
-
-
-
+   EDEBUG_FUNC(this);
 
-
-
-void CCMatrix::Pair::writeCluster(EDataStream &stream, int cluster)
-{
-   // make sure cluster value is within range
-   if ( cluster >= 0 && cluster < _sampleMasks.size() )
-   {
-      // write each sample to output stream
-      auto& samples {_sampleMasks.at(cluster)};
-
-      for ( int i = 0; i < samples.size(); i += 2 )
-      {
-         qint8 value {(qint8)(samples[i] & 0x0F)};
-
-         if ( i + 1 < samples.size() )
-         {
-            value |= (samples[i + 1] << 4);
-         }
-
-         stream << value;
-      }
-   }
-}
-
-
-
-
-
-
-void CCMatrix::Pair::readCluster(const EDataStream &stream, int cluster) const
-{
-   // make sure cluster value is within range
-   if ( cluster >= 0 && cluster < _sampleMasks.size() )
-   {
-      // read each sample from input stream
-      auto& samples {_sampleMasks[cluster]};
-
-      for ( int i = 0; i < samples.size(); i += 2 )
-      {
-         qint8 value;
-         stream >> value;
-
-         samples[i] = value & 0x0F;
-
-         if ( i + 1 < samples.size() )
-         {
-            samples[i + 1] = (value >> 4) & 0x0F;
-         }
-      }
-   }
+   return meta().toObject().at("samples").toArray();
 }
diff --git a/src/core/ccmatrix.h b/src/core/ccmatrix.h
index a19109d..d9d9e69 100644
--- a/src/core/ccmatrix.h
+++ b/src/core/ccmatrix.h
@@ -4,52 +4,50 @@
 
 
 
+/*!
+ * This class implements the cluster matrix data object. A cluster matrix is a
+ * pairwise matrix where each pair-cluster element is a sample mask denoting
+ * whether a sample belongs in the cluster. The matrix data can be accessed
+ * using the pairwise iterator for this class.
+ */
 class CCMatrix : public Pairwise::Matrix
 {
    Q_OBJECT
 public:
    class Pair;
+public:
    virtual QAbstractTableModel* model() override final;
-   QVariant headerData(int section, Qt::Orientation orientation, int role) const;
-   int rowCount(const QModelIndex&) const;
-   int columnCount(const QModelIndex&) const;
-   QVariant data(const QModelIndex& index, int role) const;
-   void initialize(const EMetadata& geneNames, int maxClusterSize, const EMetadata& sampleNames);
-   EMetadata sampleNames() const;
+public:
+   void initialize(const EMetaArray& geneNames, int maxClusterSize, const EMetaArray& sampleNames);
+   EMetaArray sampleNames() const;
+   /*!
+    * Return the number of samples in the cluster matrix.
+    */
    int sampleSize() const { return _sampleSize; }
 private:
+   class Model;
+private:
+   /*!
+    * Write the sub-header to the data object file.
+    */
    virtual void writeHeader() { stream() << _sampleSize; }
+   /*!
+    * Read the sub-header from the data object file.
+    */
    virtual void readHeader() { stream() >> _sampleSize; }
-   static const int DATA_OFFSET {4};
+   /*!
+    * The size (in bytes) of the sub-header. The sub-header consists of the
+    * sample size.
+    */
+   constexpr static int SUBHEADER_SIZE {4};
+   /*!
+    * The number of samples in each sample mask.
+    */
    qint32 _sampleSize {0};
-};
-
-
-
-class CCMatrix::Pair : public Pairwise::Matrix::Pair
-{
-public:
-   Pair(CCMatrix* matrix):
-      Matrix::Pair(matrix),
-      _cMatrix(matrix)
-      {}
-   Pair(const CCMatrix* matrix):
-      Matrix::Pair(matrix),
-      _cMatrix(matrix)
-      {}
-   Pair() = default;
-   virtual void clearClusters() const { _sampleMasks.clear(); }
-   virtual void addCluster(int amount = 1) const;
-   virtual int clusterSize() const { return _sampleMasks.size(); }
-   virtual bool isEmpty() const { return _sampleMasks.isEmpty(); }
-   QString toString() const;
-   const qint8& at(int cluster, int sample) const { return _sampleMasks.at(cluster).at(sample); }
-   qint8& at(int cluster, int sample) { return _sampleMasks[cluster][sample]; }
-private:
-   virtual void writeCluster(EDataStream& stream, int cluster);
-   virtual void readCluster(const EDataStream& stream, int cluster) const;
-   mutable QVector<QVector<qint8>> _sampleMasks;
-   const CCMatrix* _cMatrix;
+   /*!
+    * Pointer to a Qt table model for this class.
+    */
+   Model* _model {nullptr};
 };
 
 
diff --git a/src/core/ccmatrix_model.cpp b/src/core/ccmatrix_model.cpp
new file mode 100644
index 0000000..866dfc2
--- /dev/null
+++ b/src/core/ccmatrix_model.cpp
@@ -0,0 +1,134 @@
+#include "ccmatrix_model.h"
+#include "ccmatrix_pair.h"
+
+
+
+using namespace std;
+
+
+
+
+
+
+/*!
+ * Construct a table model for a cluster matrix.
+ *
+ * @param matrix
+ */
+CCMatrix::Model::Model(CCMatrix* matrix):
+   _matrix(matrix)
+{
+   EDEBUG_FUNC(this,matrix);
+
+   setParent(matrix);
+}
+
+
+
+
+
+
+/*!
+ * Return a header name for the table model using a given index.
+ *
+ * @param section
+ * @param orientation
+ * @param role
+ */
+QVariant CCMatrix::Model::headerData(int section, Qt::Orientation orientation, int role) const
+{
+   EDEBUG_FUNC(this,section,orientation,role);
+
+   // orientation is not used
+   Q_UNUSED(orientation);
+
+   // if role is not display return nothing
+   if ( role != Qt::DisplayRole )
+   {
+      return QVariant();
+   }
+
+   // get gene names
+   EMetaArray geneNames {_matrix->geneNames()};
+
+   // make sure section is within limits of gene name array
+   if ( section >= 0 && section < geneNames.size() )
+   {
+      // return gene name
+      return geneNames.at(section).toString();
+   }
+
+   // no gene found return nothing
+   return QVariant();
+}
+
+
+
+
+
+
+/*!
+ * Return the number of rows in the table model.
+ */
+int CCMatrix::Model::rowCount(const QModelIndex&) const
+{
+   EDEBUG_FUNC(this);
+
+   return _matrix->geneSize();
+}
+
+
+
+
+
+
+/*!
+ * Return the number of columns in the table model.
+ */
+int CCMatrix::Model::columnCount(const QModelIndex&) const
+{
+   EDEBUG_FUNC(this);
+
+   return _matrix->geneSize();
+}
+
+
+
+
+
+
+/*!
+ * Return a data element in the table model using the given index.
+ *
+ * @param index
+ * @param role
+ */
+QVariant CCMatrix::Model::data(const QModelIndex& index, int role) const
+{
+   EDEBUG_FUNC(this,&index,role);
+
+   // if role is not display return nothing
+   if ( role != Qt::DisplayRole )
+   {
+      return QVariant();
+   }
+
+   // if row and column are equal return empty string
+   if ( index.row() == index.column() )
+   {
+      return "";
+   }
+
+   // get constant pair and read in values
+   const Pair pair(_matrix);
+   int x {index.row()};
+   int y {index.column()};
+   if ( y > x )
+   {
+      swap(x,y);
+   }
+   pair.read({x,y});
+
+   // Return value of pair as a string
+   return pair.toString();
+}
diff --git a/src/core/ccmatrix_model.h b/src/core/ccmatrix_model.h
new file mode 100644
index 0000000..419344d
--- /dev/null
+++ b/src/core/ccmatrix_model.h
@@ -0,0 +1,28 @@
+#ifndef CCMATRIX_MODEL_H
+#define CCMATRIX_MODEL_H
+#include "ccmatrix.h"
+
+
+
+/*!
+ * This class implements the Qt table model for the cluster matrix
+ * data object, which represents the cluster matrix as a table.
+ */
+class CCMatrix::Model : public QAbstractTableModel
+{
+public:
+   Model(CCMatrix* matrix);
+   virtual QVariant headerData(int section, Qt::Orientation orientation, int role) const override final;
+   virtual int rowCount(const QModelIndex& parent) const override final;
+   virtual int columnCount(const QModelIndex& parent) const override final;
+   virtual QVariant data(const QModelIndex& index, int role) const override final;
+private:
+   /*!
+    * Pointer to the data object for this table model.
+    */
+   CCMatrix* _matrix;
+};
+
+
+
+#endif
diff --git a/src/core/ccmatrix_pair.cpp b/src/core/ccmatrix_pair.cpp
new file mode 100644
index 0000000..98ab46a
--- /dev/null
+++ b/src/core/ccmatrix_pair.cpp
@@ -0,0 +1,161 @@
+#include "ccmatrix_pair.h"
+
+
+
+/*!
+ * Add one or more clusters to this pair.
+ *
+ * @param amount
+ */
+void CCMatrix::Pair::addCluster(int amount) const
+{
+   EDEBUG_FUNC(this,amount);
+
+   // keep adding a new list of sample masks for given amount
+   while ( amount-- > 0 )
+   {
+      _sampleMasks.append(QVector<qint8>(_cMatrix->_sampleSize, 0));
+   }
+}
+
+
+
+
+
+
+/*!
+ * Return the string representation of this pair, which is a comma-delimited
+ * string of each sample mask in the pair.
+ */
+QString CCMatrix::Pair::toString() const
+{
+   EDEBUG_FUNC(this);
+
+   // if there are no clusters return empty string
+   if ( _sampleMasks.isEmpty() )
+   {
+      return "";
+   }
+
+   // initialize list of strings and iterate through all clusters
+   QStringList ret;
+   for (const auto& sampleMask : _sampleMasks)
+   {
+      // initialize string for sample mask and iterate through each sample
+      QString clusterString("(");
+      for (const auto& sample : sampleMask)
+      {
+         // add new sample token as hexadecimal allowing 16 different possible values
+         switch (sample)
+         {
+         case 0:
+         case 1:
+         case 2:
+         case 3:
+         case 4:
+         case 5:
+         case 6:
+         case 7:
+         case 8:
+         case 9:
+            clusterString.append(QString::number(sample));
+            break;
+         case 10:
+            clusterString.append("A");
+            break;
+         case 11:
+            clusterString.append("B");
+            break;
+         case 12:
+            clusterString.append("C");
+            break;
+         case 13:
+            clusterString.append("D");
+            break;
+         case 14:
+            clusterString.append("E");
+            break;
+         case 15:
+            clusterString.append("F");
+            break;
+         }
+      }
+
+      // close the cluster string and append it to the list
+      ret << clusterString.append(')');
+   }
+
+   // join all clusters and return as string
+   return ret.join(',');
+}
+
+
+
+
+
+
+/*!
+ * Write a cluster in the iterator's pairwise data to the data object file.
+ *
+ * @param stream
+ * @param cluster
+ */
+void CCMatrix::Pair::writeCluster(EDataStream& stream, int cluster)
+{
+   EDEBUG_FUNC(this,&stream,cluster);
+
+   // make sure cluster value is within range
+   if ( cluster >= 0 && cluster < _sampleMasks.size() )
+   {
+      // write each sample to output stream
+      auto& samples {_sampleMasks.at(cluster)};
+
+      for ( int i = 0; i < samples.size(); i += 2 )
+      {
+         qint8 value {(qint8)(samples[i] & 0x0F)};
+
+         if ( i + 1 < samples.size() )
+         {
+            value |= (samples[i + 1] << 4);
+         }
+
+         stream << value;
+      }
+   }
+}
+
+
+
+
+
+
+/*!
+ * Read a cluster from the data object file into memory.
+ *
+ * @param stream
+ * @param cluster
+ */
+void CCMatrix::Pair::readCluster(const EDataStream& stream, int cluster) const
+{
+   EDEBUG_FUNC(this,&stream,cluster);
+
+   // make sure cluster value is within range
+   if ( cluster >= 0 && cluster < _sampleMasks.size() )
+   {
+      // read each sample from input stream
+      auto& samples {_sampleMasks[cluster]};
+
+      for ( int i = 0; i < samples.size(); i += 2 )
+      {
+         qint8 value;
+         stream >> value;
+
+         samples[i] = value & 0x0F;
+
+         if ( i + 1 < samples.size() )
+         {
+            samples[i + 1] = (value >> 4) & 0x0F;
+         }
+      }
+   }
+}
diff --git a/src/core/ccmatrix_pair.h b/src/core/ccmatrix_pair.h
new file mode 100644
index 0000000..53e6374
--- /dev/null
+++ b/src/core/ccmatrix_pair.h
@@ -0,0 +1,47 @@
+#ifndef CCMATRIX_PAIR_H
+#define CCMATRIX_PAIR_H
+#include "ccmatrix.h"
+#include "pairwise_matrix_pair.h"
+
+
+
+/*!
+ * This class implements the pairwise iterator for the cluster matrix data
+ * object. This class extends the behavior of the base pairwise iterator to read
+ * and write sample masks.
+ */
+class CCMatrix::Pair : public Pairwise::Matrix::Pair
+{
+public:
+   Pair(CCMatrix* matrix):
+      Matrix::Pair(matrix),
+      _cMatrix(matrix)
+      {}
+   Pair(const CCMatrix* matrix):
+      Matrix::Pair(matrix),
+      _cMatrix(matrix)
+      {}
+   Pair() = default;
+   virtual void clearClusters() const { _sampleMasks.clear(); }
+   virtual void addCluster(int amount = 1) const;
+   virtual int clusterSize() const { return _sampleMasks.size(); }
+   virtual bool isEmpty() const { return _sampleMasks.isEmpty(); }
+   QString toString() const;
+   const qint8& at(int cluster, int sample) const { return _sampleMasks.at(cluster).at(sample); }
+   qint8& at(int cluster, int sample) { return _sampleMasks[cluster][sample]; }
+private:
+   virtual void writeCluster(EDataStream& stream, int cluster);
+   virtual void readCluster(const EDataStream& stream, int cluster) const;
+   /*!
+    * Array of sample masks for the current pair.
+    */
+   mutable QVector<QVector<qint8>> _sampleMasks;
+   /*!
+    * Constant pointer to parent cluster matrix.
+    */
+   const CCMatrix* _cMatrix;
+};
+
+
+
+#endif
diff --git a/src/core/core.pro b/src/core/core.pro
index 8209dd6..a70e023 100644
--- a/src/core/core.pro
+++ b/src/core/core.pro
@@ -1,3 +1,7 @@
+
+# Include common settings
+include (../KINC.pri)
+
 # Basic Settings
 TARGET = kinccore
 TEMPLATE = lib
@@ -6,25 +10,22 @@ CONFIG += staticlib
 # Build settings
 DESTDIR = $$PWD/../../build/libs/
 
-# Qt libraries
-QT += core
-
-# Preprocessor defines
-DEFINES += QT_DEPRECATED_WARNINGS
-
-# Used to ignore useless warnings from OpenCL
-QMAKE_CXXFLAGS += -Wno-ignored-attributes
-
 # Source files
 SOURCES += \
    analyticfactory.cpp \
+   ccmatrix_model.cpp \
+   ccmatrix_pair.cpp \
    ccmatrix.cpp \
+   correlationmatrix_model.cpp \
+   correlationmatrix_pair.cpp \
    correlationmatrix.cpp \
    datafactory.cpp \
    exportcorrelationmatrix_input.cpp \
    exportcorrelationmatrix.cpp \
    exportexpressionmatrix_input.cpp \
    exportexpressionmatrix.cpp \
+   expressionmatrix_gene.cpp \
+   expressionmatrix_model.cpp \
    expressionmatrix.cpp \
    extract_input.cpp \
    extract.cpp \
@@ -32,21 +33,23 @@ SOURCES += \
    importcorrelationmatrix.cpp \
    importexpressionmatrix_input.cpp \
    importexpressionmatrix.cpp \
-   pairwise_clustering.cpp \
-   pairwise_correlation.cpp \
+   pairwise_clusteringmodel.cpp \
+   pairwise_correlationmodel.cpp \
    pairwise_gmm.cpp \
    pairwise_index.cpp \
-   pairwise_kmeans.cpp \
    pairwise_linalg.cpp \
+   pairwise_matrix_pair.cpp \
    pairwise_matrix.cpp \
    pairwise_pearson.cpp \
    pairwise_spearman.cpp \
+   powerlaw_input.cpp \
+   powerlaw.cpp \
    rmt_input.cpp \
    rmt.cpp \
    similarity_input.cpp \
    similarity_opencl_fetchpair.cpp \
    similarity_opencl_gmm.cpp \
-   similarity_opencl_kmeans.cpp \
+   similarity_opencl_outlier.cpp \
    similarity_opencl_pearson.cpp \
    similarity_opencl_spearman.cpp \
    similarity_opencl_worker.cpp \
@@ -59,13 +62,19 @@ SOURCES += \
 # Header files
 HEADERS += \
    analyticfactory.h \
+   ccmatrix_model.h \
+   ccmatrix_pair.h \
    ccmatrix.h \
+   correlationmatrix_model.h \
+   correlationmatrix_pair.h \
    correlationmatrix.h \
    datafactory.h \
    exportcorrelationmatrix_input.h \
    exportcorrelationmatrix.h \
    exportexpressionmatrix_input.h \
    exportexpressionmatrix.h \
+   expressionmatrix_gene.h \
+   expressionmatrix_model.h \
    expressionmatrix.h \
    extract_input.h \
    extract.h \
@@ -73,21 +82,23 @@ HEADERS += \
    importcorrelationmatrix.h \
    importexpressionmatrix_input.h \
    importexpressionmatrix.h \
-   pairwise_clustering.h \
-   pairwise_correlation.h \
+   pairwise_clusteringmodel.h \
+   pairwise_correlationmodel.h \
    pairwise_gmm.h \
    pairwise_index.h \
-   pairwise_kmeans.h \
    pairwise_linalg.h \
+   pairwise_matrix_pair.h \
    pairwise_matrix.h \
    pairwise_pearson.h \
    pairwise_spearman.h \
+   powerlaw_input.h \
+   powerlaw.h \
    rmt_input.h \
    rmt.h \
    similarity_input.h \
    similarity_opencl_fetchpair.h \
    similarity_opencl_gmm.h \
-   similarity_opencl_kmeans.h \
+   similarity_opencl_outlier.h \
    similarity_opencl_pearson.h \
    similarity_opencl_spearman.h \
    similarity_opencl_worker.h \
diff --git a/src/core/correlationmatrix.cpp b/src/core/correlationmatrix.cpp
index 87b7029..3c32ecf 100644
--- a/src/core/correlationmatrix.cpp
+++ b/src/core/correlationmatrix.cpp
@@ -1,83 +1,21 @@
 #include "correlationmatrix.h"
+#include "correlationmatrix_model.h"
+#include "correlationmatrix_pair.h"
 
 
 
-using namespace std;
-using namespace Pairwise;
-
-
-
-
-
-
+/*!
+ * Return a Qt table model that represents this data object as a table.
+ */
 QAbstractTableModel* CorrelationMatrix::model()
 {
-   return nullptr;
-}
-
-
-
-
-
-
-QVariant CorrelationMatrix::headerData(int section, Qt::Orientation orientation, int role) const
-{
-   // orientation is not used
-   Q_UNUSED(orientation);
-
-   // if role is not display return nothing
-   if ( role != Qt::DisplayRole )
-   {
-      return QVariant();
-   }
-
-   // get genes metadata and make sure it is an array
-   const EMetadata& genes {geneNames()};
-   if ( genes.isArray() )
-   {
-      // make sure section is within limits of gene name array
-      if ( section >= 0 && section < genes.toArray().size() )
-      {
-         // return gene name
-         return genes.toArray().at(section).toString();
-      }
-   }
-
-   // no gene found return nothing
-   return QVariant();
-}
-
-
-
-
-
-
-QVariant CorrelationMatrix::data(const QModelIndex& index, int role) const
-{
-   // if role is not display return nothing
-   if ( role != Qt::DisplayRole )
-   {
-      return QVariant();
-   }
-
-   // if row and column are equal return one
-   if ( index.row() == index.column() )
-   {
-      return "1";
-   }
+   EDEBUG_FUNC(this);
 
-   // get constant pair and read in values
-   const Pair pair(this);
-   int x {index.row()};
-   int y {index.column()};
-   if ( y > x )
+   if ( !_model )
    {
-      swap(x,y);
+      _model = new Model(this);
    }
-   pair.read({x,y});
-
-   // Return value of pair as a string
-   return pair.toString();
+   return _model;
 }
 
 
@@ -85,34 +23,24 @@ QVariant CorrelationMatrix::data(const QModelIndex& index, int role) const
 
 
 
-int CorrelationMatrix::rowCount(const QModelIndex&) const
+/*!
+ * Initialize this correlation matrix with a list of gene names, the max cluster
+ * size, and a list of correlation names.
+ *
+ * @param geneNames
+ * @param maxClusterSize
+ * @param correlationNames
+ */
+void CorrelationMatrix::initialize(const EMetaArray& geneNames, int maxClusterSize, const EMetaArray& correlationNames)
 {
-   return geneSize();
-}
-
-
-
+   EDEBUG_FUNC(this,&geneNames,maxClusterSize,&correlationNames);
 
-
-
-int CorrelationMatrix::columnCount(const QModelIndex&) const
-{
-   return geneSize();
-}
-
-
-
-
-
-
-void CorrelationMatrix::initialize(const EMetadata &geneNames, int maxClusterSize, const EMetadata &correlationNames)
-{
-   // make sure correlation names is an array and is not empty
-   if ( !correlationNames.isArray() || correlationNames.toArray().isEmpty() )
+   // make sure correlation names is not empty
+   if ( correlationNames.isEmpty() )
    {
       E_MAKE_EXCEPTION(e);
       e.setTitle(tr("Domain Error"));
-      e.setDetails(tr("Correlation names metadata is not an array or is empty."));
+      e.setDetails(tr("Correlation names metadata is empty."));
       throw e;
    }
 
@@ -122,8 +50,8 @@ void CorrelationMatrix::initialize(const EMetadata &geneNames, int maxClusterSiz
    setMeta(metaObject);
 
    // save correlation size and initialize base class
-   _correlationSize = correlationNames.toArray().size();
-   Matrix::initialize(geneNames, maxClusterSize, _correlationSize * sizeof(float), DATA_OFFSET);
+   _correlationSize = correlationNames.size();
+   Matrix::initialize(geneNames, maxClusterSize, _correlationSize * sizeof(float), SUBHEADER_SIZE);
 }
 
 
@@ -131,9 +59,14 @@ void CorrelationMatrix::initialize(const EMetadata &geneNames, int maxClusterSiz
 
 
 
-EMetadata CorrelationMatrix::correlationNames() const
+/*!
+ * Return the list of correlation names in this correlation matrix.
+ */
+EMetaArray CorrelationMatrix::correlationNames() const
 {
-   return meta().toObject().at("correlations");
+   EDEBUG_FUNC(this);
+
+   return meta().toObject().at("correlations").toArray();
 }
 
 
@@ -141,16 +74,16 @@ EMetadata CorrelationMatrix::correlationNames() const
 
 
 
-QVector<float> CorrelationMatrix::dumpRawData() const
+/*!
+ * Return a list of correlation pairs in raw form.
+ */
+QVector<CorrelationMatrix::RawPair> CorrelationMatrix::dumpRawData() const
 {
-   // if there are no genes do nothing
-   if ( geneSize() == 0 )
-   {
-      return QVector<float>();
-   }
+   EDEBUG_FUNC(this);
 
-   // create new correlation matrix
-   QVector<float> data(geneSize() * geneSize() * maxClusterSize());
+   // create list of raw pairs
+   QVector<RawPair> pairs;
+   pairs.reserve(size());
 
    // iterate through all pairs
    Pair pair(this);
@@ -160,103 +93,18 @@ QVector<float> CorrelationMatrix::dumpRawData() const
       // read in next pair
       pair.readNext();
 
-      // load cluster data
-      int i = pair.index().getX();
-      int j = pair.index().getY();
+      // copy pair to raw list
+      RawPair rawPair;
+      rawPair.index = pair.index();
+      rawPair.correlations.resize(pair.clusterSize());
 
       for ( int k = 0; k < pair.clusterSize(); ++k )
       {
-         float correlation = pair.at(k, 0);
-
-         data[i * geneSize() * maxClusterSize() + j * maxClusterSize() + k] = correlation;
-         data[j * geneSize() * maxClusterSize() + i * maxClusterSize() + k] = correlation;
+         rawPair.correlations[k] = pair.at(k, 0);
       }
+
+      pairs.append(rawPair);
    }
 
-   return data;
-}
-
-
-
-
-
-
-void CorrelationMatrix::Pair::addCluster(int amount) const
-{
-   // keep adding a new list of floats for given amount
-   while ( amount-- > 0 )
-   {
-      _correlations.append(QVector<float>(_cMatrix->_correlationSize, NAN));
-   }
-}
-
-
-
-
-
-
-QString CorrelationMatrix::Pair::toString() const
-{
-   // if there are no correlations simply return null
-   if ( _correlations.isEmpty() )
-   {
-      return tr("");
-   }
-
-   // initialize list of strings and iterate through all clusters
-   QStringList ret;
-   for (const auto& cluster : _correlations)
-   {
-      // initialize list of strings for cluster and iterate through each correlation
-      QStringList clusterStrings;
-      for (const auto& correlation : cluster)
-      {
-         // add correlation value as string
-         clusterStrings << QString::number(correlation);
-      }
-
-      // join all cluster strings into one string
-      ret << clusterStrings.join(',');
-   }
-
-   // join all clusters and return as string
-   return ret.join(',');
-}
-
-
-
-
-
-
-void CorrelationMatrix::Pair::writeCluster(EDataStream& stream, int cluster)
-{
-   // make sure cluster value is within range
-   if ( cluster >= 0 && cluster < _correlations.size() )
-   {
-      // write correlations per cluster to output stream
-      for (const auto& correlation : _correlations.at(cluster))
-      {
-         stream << correlation;
-      }
-   }
-}
-
-
-
-
-
-
-void CorrelationMatrix::Pair::readCluster(const EDataStream& stream, int cluster) const
-{
-   // make sure cluster value is within range
-   if ( cluster >= 0 && cluster < _correlations.size() )
-   {
-      // read correlations per cluster from input stream
-      for (int i = 0; i < _cMatrix->_correlationSize ;++i)
-      {
-         float value;
-         stream >> value;
-         _correlations[cluster][i] = value;
-      }
-   }
+   return pairs;
 }
diff --git a/src/core/correlationmatrix.h b/src/core/correlationmatrix.h
index c8a5c6c..718dec3 100644
--- a/src/core/correlationmatrix.h
+++ b/src/core/correlationmatrix.h
@@ -4,53 +4,52 @@
 
 
 
+/*!
+ * This class implements the correlation matrix data object. A correlation matrix
+ * is a pairwise matrix where each pair-cluster element is a correlation value. The
+ * matrix data can be accessed using the pairwise iterator for this class.
+ */
 class CorrelationMatrix : public Pairwise::Matrix
 {
    Q_OBJECT
 public:
    class Pair;
+public:
+   struct RawPair
+   {
+      Pairwise::Index index;
+      QVector<float> correlations;
+   };
+public:
    virtual QAbstractTableModel* model() override final;
-   QVariant headerData(int section, Qt::Orientation orientation, int role) const;
-   int rowCount(const QModelIndex&) const;
-   int columnCount(const QModelIndex&) const;
-   QVariant data(const QModelIndex& index, int role) const;
-   void initialize(const EMetadata& geneNames, int maxClusterSize, const EMetadata& correlationNames);
-   EMetadata correlationNames() const;
-   QVector<float> dumpRawData() const;
+public:
+   void initialize(const EMetaArray& geneNames, int maxClusterSize, const EMetaArray& correlationNames);
+   EMetaArray correlationNames() const;
+   QVector<RawPair> dumpRawData() const;
 private:
+   class Model;
+private:
+   /*!
+    * Write the sub-header to the data object file.
+    */
    virtual void writeHeader() { stream() << _correlationSize; }
+   /*!
+    * Read the sub-header from the data object file.
+    */
    virtual void readHeader() { stream() >> _correlationSize; }
-   static const int DATA_OFFSET {1};
+   /*!
+    * The size (in bytes) of the sub-header. The sub-header consists of the
+    * correlation size.
+    */
+   constexpr static int SUBHEADER_SIZE {1};
+   /*!
+    * The number of correlations in each pair-cluster.
+    */
    qint8 _correlationSize {0};
-};
-
-
-
-class CorrelationMatrix::Pair : public Pairwise::Matrix::Pair
-{
-public:
-   Pair(CorrelationMatrix* matrix):
-      Matrix::Pair(matrix),
-      _cMatrix(matrix)
-      {}
-   Pair(const CorrelationMatrix* matrix):
-      Matrix::Pair(matrix),
-      _cMatrix(matrix)
-      {}
-   Pair() = default;
-   virtual void clearClusters() const { _correlations.clear(); }
-   virtual void addCluster(int amount = 1) const;
-   virtual int clusterSize() const { return _correlations.size(); }
-   virtual bool isEmpty() const { return _correlations.isEmpty(); }
-   QString toString() const;
-   const float& at(int cluster, int correlation) const
-      { return _correlations.at(cluster).at(correlation); }
-   float& at(int cluster, int correlation) { return _correlations[cluster][correlation]; }
-private:
-   virtual void writeCluster(EDataStream& stream, int cluster);
-   virtual void readCluster(const EDataStream& stream, int cluster) const;
-   mutable QVector<QVector<float>> _correlations;
-   const CorrelationMatrix* _cMatrix;
+   /*!
+    * Pointer to a Qt table model for this class.
+    */
+   Model* _model {nullptr};
 };
 
 
diff --git a/src/core/correlationmatrix_model.cpp b/src/core/correlationmatrix_model.cpp
new file mode 100644
index 0000000..696ff6d
--- /dev/null
+++ b/src/core/correlationmatrix_model.cpp
@@ -0,0 +1,134 @@
+#include "correlationmatrix_model.h"
+#include "correlationmatrix_pair.h"
+
+
+
+using namespace std;
+
+
+
+
+
+
+/*!
+ * Construct a table model for a correlation matrix.
+ *
+ * @param matrix
+ */
+CorrelationMatrix::Model::Model(CorrelationMatrix* matrix):
+   _matrix(matrix)
+{
+   EDEBUG_FUNC(this,matrix);
+
+   setParent(matrix);
+}
+
+
+
+
+
+
+/*!
+ * Return a header name for the table model using a given index.
+ *
+ * @param section
+ * @param orientation
+ * @param role
+ */
+QVariant CorrelationMatrix::Model::headerData(int section, Qt::Orientation orientation, int role) const
+{
+   EDEBUG_FUNC(this,section,orientation,role);
+
+   // orientation is not used
+   Q_UNUSED(orientation);
+
+   // if role is not display return nothing
+   if ( role != Qt::DisplayRole )
+   {
+      return QVariant();
+   }
+
+   // get gene names
+   EMetaArray geneNames {_matrix->geneNames()};
+
+   // make sure section is within limits of gene name array
+   if ( section >= 0 && section < geneNames.size() )
+   {
+      // return gene name
+      return geneNames.at(section).toString();
+   }
+
+   // no gene found return nothing
+   return QVariant();
+}
+
+
+
+
+
+
+/*!
+ * Return the number of rows in the table model.
+ */
+int CorrelationMatrix::Model::rowCount(const QModelIndex&) const
+{
+   EDEBUG_FUNC(this);
+
+   return _matrix->geneSize();
+}
+
+
+
+
+
+
+/*!
+ * Return the number of columns in the table model.
+ */
+int CorrelationMatrix::Model::columnCount(const QModelIndex&) const
+{
+   EDEBUG_FUNC(this);
+
+   return _matrix->geneSize();
+}
+
+
+
+
+
+
+/*!
+ * Return a data element in the table model using the given index.
+ *
+ * @param index
+ * @param role
+ */
+QVariant CorrelationMatrix::Model::data(const QModelIndex& index, int role) const
+{
+   EDEBUG_FUNC(this,&index,role);
+
+   // if role is not display return nothing
+   if ( role != Qt::DisplayRole )
+   {
+      return QVariant();
+   }
+
+   // if row and column are equal return empty string
+   if ( index.row() == index.column() )
+   {
+      return "";
+   }
+
+   // get constant pair and read in values
+   const Pair pair(_matrix);
+   int x {index.row()};
+   int y {index.column()};
+   if ( y > x )
+   {
+      swap(x,y);
+   }
+   pair.read({x,y});
+
+   // Return value of pair as a string
+   return pair.toString();
+}
diff --git a/src/core/correlationmatrix_model.h b/src/core/correlationmatrix_model.h
new file mode 100644
index 0000000..4318c1d
--- /dev/null
+++ b/src/core/correlationmatrix_model.h
@@ -0,0 +1,28 @@
+#ifndef CORRELATIONMATRIX_MODEL_H
+#define CORRELATIONMATRIX_MODEL_H
+#include "correlationmatrix.h"
+
+
+
+/*!
+ * This class implements the Qt table model for the correlation matrix
+ * data object, which represents the correlation matrix as a table.
+ */
+class CorrelationMatrix::Model : public QAbstractTableModel
+{
+public:
+   Model(CorrelationMatrix* matrix);
+   virtual QVariant headerData(int section, Qt::Orientation orientation, int role) const override final;
+   virtual int rowCount(const QModelIndex& parent) const override final;
+   virtual int columnCount(const QModelIndex& parent) const override final;
+   virtual QVariant data(const QModelIndex& index, int role) const override final;
+private:
+   /*!
+    * Pointer to the data object for this table model.
+    */
+   CorrelationMatrix* _matrix;
+};
+
+
+
+#endif
diff --git a/src/core/correlationmatrix_pair.cpp b/src/core/correlationmatrix_pair.cpp
new file mode 100644
index 0000000..7c926f1
--- /dev/null
+++ b/src/core/correlationmatrix_pair.cpp
@@ -0,0 +1,112 @@
+#include "correlationmatrix_pair.h"
+
+
+
+/*!
+ * Add one or more clusters to this pair.
+ *
+ * @param amount
+ */
+void CorrelationMatrix::Pair::addCluster(int amount) const
+{
+   EDEBUG_FUNC(this,amount);
+
+   // keep adding a new list of floats for given amount
+   while ( amount-- > 0 )
+   {
+      _correlations.append(QVector<float>(_cMatrix->_correlationSize, NAN));
+   }
+}
+
+
+
+
+
+
+/*!
+ * Return the string representation of this pair, which is a comma-delimited
+ * string of each correlation in the pair.
+ */
+QString CorrelationMatrix::Pair::toString() const
+{
+   EDEBUG_FUNC(this);
+
+   // if there are no correlations simply return an empty string
+   if ( _correlations.isEmpty() )
+   {
+      return tr("");
+   }
+
+   // initialize list of strings and iterate through all clusters
+   QStringList ret;
+   for (const auto& cluster : _correlations)
+   {
+      // initialize list of strings for cluster and iterate through each correlation
+      QStringList clusterStrings;
+      for (const auto& correlation : cluster)
+      {
+         // add correlation value as string
+         clusterStrings << QString::number(correlation);
+      }
+
+      // join all cluster strings into one string
+      ret << clusterStrings.join(',');
+   }
+
+   // join all clusters and return as string
+   return ret.join(',');
+}
+
+
+
+
+
+
+/*!
+ * Write a cluster in the iterator's pairwise data to the data object file.
+ *
+ * @param stream
+ * @param cluster
+ */
+void CorrelationMatrix::Pair::writeCluster(EDataStream& stream, int cluster)
+{
+   EDEBUG_FUNC(this,&stream,cluster);
+
+   // make sure cluster value is within range
+   if ( cluster >= 0 && cluster < _correlations.size() )
+   {
+      // write correlations per cluster to output stream
+      for (const auto& correlation : _correlations.at(cluster))
+      {
+         stream << correlation;
+      }
+   }
+}
+
+
+
+
+
+
+/*!
+ * Read a cluster from the data object file into memory.
+ *
+ * @param stream
+ * @param cluster
+ */
+void CorrelationMatrix::Pair::readCluster(const EDataStream& stream, int cluster) const
+{
+   EDEBUG_FUNC(this,&stream,cluster);
+
+   // make sure cluster value is within range
+   if ( cluster >= 0 && cluster < _correlations.size() )
+   {
+      // read correlations per cluster from input stream
+      for ( int i = 0; i < _cMatrix->_correlationSize; ++i )
+      {
+         float value;
+         stream >> value;
+         _correlations[cluster][i] = value;
+      }
+   }
+}
diff --git a/src/core/correlationmatrix_pair.h b/src/core/correlationmatrix_pair.h
new file mode 100644
index 0000000..85bce3c
--- /dev/null
+++ b/src/core/correlationmatrix_pair.h
@@ -0,0 +1,48 @@
+#ifndef CORRELATIONMATRIX_PAIR_H
+#define CORRELATIONMATRIX_PAIR_H
+#include "correlationmatrix.h"
+#include "pairwise_matrix_pair.h"
+
+
+
+/*!
+ * This class implements the pairwise iterator for the correlation matrix data
+ * object. This class extends the behavior of the base pairwise iterator to read
+ * and write correlations.
+ */
+class CorrelationMatrix::Pair : public Pairwise::Matrix::Pair
+{
+public:
+   Pair(CorrelationMatrix* matrix):
+      Matrix::Pair(matrix),
+      _cMatrix(matrix)
+      {}
+   Pair(const CorrelationMatrix* matrix):
+      Matrix::Pair(matrix),
+      _cMatrix(matrix)
+      {}
+   Pair() = default;
+   virtual void clearClusters() const { _correlations.clear(); }
+   virtual void addCluster(int amount = 1) const;
+   virtual int clusterSize() const { return _correlations.size(); }
+   virtual bool isEmpty() const { return _correlations.isEmpty(); }
+   QString toString() const;
+   const float& at(int cluster, int correlation) const
+      { return _correlations.at(cluster).at(correlation); }
+   float& at(int cluster, int correlation) { return _correlations[cluster][correlation]; }
+private:
+   virtual void writeCluster(EDataStream& stream, int cluster);
+   virtual void readCluster(const EDataStream& stream, int cluster) const;
+   /*!
+    * Array of correlations for the current pair.
+    */
+   mutable QVector<QVector<float>> _correlations;
+   /*!
+    * Constant pointer to parent correlation matrix.
+    */
+   const CorrelationMatrix* _cMatrix;
+};
+
+
+
+#endif
diff --git a/src/core/datafactory.cpp b/src/core/datafactory.cpp
index c5812fa..60506f0 100644
--- a/src/core/datafactory.cpp
+++ b/src/core/datafactory.cpp
@@ -12,8 +12,13 @@ using namespace std;
 
 
 
+/*!
+ * Return the total number of data types this program implements.
+ */
 quint16 DataFactory::size() const
 {
+   EDEBUG_FUNC(this);
+
    return Total;
 }
 
@@ -22,8 +27,15 @@ quint16 DataFactory::size() const
 
 
 
+/*!
+ * Return the display name for the given data type.
+ *
+ * @param type
+ */
 QString DataFactory::name(quint16 type) const
 {
+   EDEBUG_FUNC(this,type);
+
    switch (type)
    {
    case ExpressionMatrixType: return "Expression Matrix";
@@ -38,8 +50,15 @@ QString DataFactory::name(quint16 type) const
 
 
 
+/*!
+ * Return the file extension for the given data type as a string.
+ *
+ * @param type
+ */
 QString DataFactory::fileExtension(quint16 type) const
 {
+   EDEBUG_FUNC(this,type);
+
    switch (type)
    {
    case ExpressionMatrixType: return "emx";
@@ -54,8 +73,15 @@ QString DataFactory::fileExtension(quint16 type) const
 
 
 
+/*!
+ * Make and return a new abstract data object of the given type.
+ *
+ * @param type
+ */
 unique_ptr<EAbstractData> DataFactory::make(quint16 type) const
 {
+   EDEBUG_FUNC(this,type);
+
    switch (type)
    {
    case ExpressionMatrixType: return unique_ptr<EAbstractData>(new ExpressionMatrix);
diff --git a/src/core/datafactory.h b/src/core/datafactory.h
index d7d39b6..88c921c 100644
--- a/src/core/datafactory.h
+++ b/src/core/datafactory.h
@@ -4,9 +4,17 @@
 
 
 
+/*!
+ * This class implements the ACE data factory for producing new data objects
+ * and giving basic information about all available data types.
+ */
 class DataFactory : public EAbstractDataFactory
 {
 public:
+   /*!
+    * Defines all available data types this program implements along with the total
+    * size.
+    */
    enum Type
    {
       ExpressionMatrixType = 0
diff --git a/src/core/exportcorrelationmatrix.cpp b/src/core/exportcorrelationmatrix.cpp
index 518d406..504513d 100644
--- a/src/core/exportcorrelationmatrix.cpp
+++ b/src/core/exportcorrelationmatrix.cpp
@@ -1,12 +1,27 @@
 #include "exportcorrelationmatrix.h"
 #include "exportcorrelationmatrix_input.h"
 #include "datafactory.h"
+#include "expressionmatrix_gene.h"
 
 
 
+using namespace std;
+
+
+
+
+
+
+/*!
+ * Return the total number of blocks this analytic must process as steps
+ * or blocks of work. This implementation uses a work block for writing
+ * each pair to the output file.
+ */
 int ExportCorrelationMatrix::size() const
 {
-   return 1;
+   EDEBUG_FUNC(this);
+
+   return _cmx->size();
 }
 
 
@@ -14,100 +29,111 @@ int ExportCorrelationMatrix::size() const
 
 
 
-void ExportCorrelationMatrix::process(const EAbstractAnalytic::Block* result)
+/*!
+ * Process the given index with a possible block of results if this analytic
+ * produces work blocks. This implementation uses only the index of the result
+ * block to determine which piece of work to do.
+ *
+ * @param result
+ */
+void ExportCorrelationMatrix::process(const EAbstractAnalytic::Block*)
 {
-   Q_UNUSED(result);
-
-   // initialize pair iterators
-   CorrelationMatrix::Pair cmxPair(_cmx);
-   CCMatrix::Pair ccmPair(_ccm);
+   EDEBUG_FUNC(this);
 
    // initialize workspace
    QString sampleMask(_ccm->sampleSize(), '0');
 
-   // create text stream to output file and write until end reached
-   QTextStream stream(_output);
-   stream.setRealNumberPrecision(6);
+   // read next pair
+   _cmxPair.readNext();
+   _ccmPair.read(_cmxPair.index());
 
-   // iterate through all pairs
-   while ( cmxPair.hasNext() )
+   // write pairwise data to output file
+   for ( int k = 0; k < _cmxPair.clusterSize(); k++ )
    {
-      // read next pair
-      cmxPair.readNext();
-
-      if ( cmxPair.clusterSize() > 1 )
+      float correlation = _cmxPair.at(k, 0);
+      int numSamples = 0;
+      int numMissing = 0;
+      int numPostOutliers = 0;
+      int numPreOutliers = 0;
+      int numThreshold = 0;
+
+      // if cluster data exists then use it
+      if ( _ccmPair.clusterSize() > 0 )
       {
-         ccmPair.read(cmxPair.index());
+         // compute summary statistics
+         for ( int i = 0; i < _ccm->sampleSize(); i++ )
+         {
+            switch ( _ccmPair.at(k, i) )
+            {
+            case 1:
+               numSamples++;
+               break;
+            case 6:
+               numThreshold++;
+               break;
+            case 7:
+               numPreOutliers++;
+               break;
+            case 8:
+               numPostOutliers++;
+               break;
+            case 9:
+               numMissing++;
+               break;
+            }
+         }
+
+         // write sample mask to string
+         for ( int i = 0; i < _ccm->sampleSize(); i++ )
+         {
+            sampleMask[i] = '0' + _ccmPair.at(k, i);
+         }
       }
 
-      // write pairwise data to output file
-      for ( int k = 0; k < cmxPair.clusterSize(); k++ )
+      // otherwise use expression data
+      else
       {
-         float correlation = cmxPair.at(k, 0);
-         int numSamples = 0;
-         int numMissing = 0;
-         int numPostOutliers = 0;
-         int numPreOutliers = 0;
-         int numThreshold = 0;
-
-         // if there are multiple clusters then use cluster data
-         if ( cmxPair.clusterSize() > 1 )
+         // read in gene expressions
+         ExpressionMatrix::Gene gene1(_emx);
+         ExpressionMatrix::Gene gene2(_emx);
+
+         gene1.read(_cmxPair.index().getX());
+         gene2.read(_cmxPair.index().getY());
+
+         // determine sample mask, summary statistics from expression data
+         for ( int i = 0; i < _emx->sampleSize(); ++i )
          {
-            // compute summary statistics
-            for ( int i = 0; i < _ccm->sampleSize(); i++ )
+            if ( isnan(gene1.at(i)) || isnan(gene2.at(i)) )
             {
-               switch ( ccmPair.at(k, i) )
-               {
-               case 1:
-                  numSamples++;
-                  break;
-               case 6:
-                  numThreshold++;
-                  break;
-               case 7:
-                  numPreOutliers++;
-                  break;
-               case 8:
-                  numPostOutliers++;
-                  break;
-               case 9:
-                  numMissing++;
-                  break;
-               }
+               sampleMask[i] = '9';
+               numMissing++;
             }
-
-            // write sample mask to string
-            for ( int i = 0; i < _ccm->sampleSize(); i++ )
+            else
             {
-               sampleMask[i] = '0' + ccmPair.at(k, i);
+               sampleMask[i] = '1';
+               numSamples++;
             }
          }
-
-         // else just initialize empty sample mask
-         else
-         {
-            sampleMask.fill('0');
-         }
-
-         // write cluster to output file
-         stream
-            << cmxPair.index().getX()
-            << "\t" << cmxPair.index().getY()
-            << "\t" << k
-            << "\t" << cmxPair.clusterSize()
-            << "\t" << numSamples
-            << "\t" << numMissing
-            << "\t" << numPostOutliers
-            << "\t" << numPreOutliers
-            << "\t" << numThreshold
-            << "\t" << correlation
-            << "\t" << sampleMask
-            << "\n";
       }
+
+      // write cluster to output file
+      _stream
+         << _cmxPair.index().getX()
+         << "\t" << _cmxPair.index().getY()
+         << "\t" << k
+         << "\t" << _cmxPair.clusterSize()
+         << "\t" << numSamples
+         << "\t" << numMissing
+         << "\t" << numPostOutliers
+         << "\t" << numPreOutliers
+         << "\t" << numThreshold
+         << "\t" << correlation
+         << "\t" << sampleMask
+         << "\n";
    }
 
    // make sure writing output file worked
-   if ( stream.status() != QTextStream::Ok )
+   if ( _stream.status() != QTextStream::Ok )
    {
       E_MAKE_EXCEPTION(e);
       e.setTitle(tr("File IO Error"));
@@ -121,8 +147,13 @@ void ExportCorrelationMatrix::process(const EAbstractAnalytic::Block* result)
 
 
 
+/*!
+ * Make a new input object and return its pointer.
+ */
 EAbstractAnalytic::Input* ExportCorrelationMatrix::makeInput()
 {
+   EDEBUG_FUNC(this);
+
    return new Input(this);
 }
 
@@ -131,13 +162,28 @@ EAbstractAnalytic::Input* ExportCorrelationMatrix::makeInput()
 
 
 
+/*!
+ * Initialize this analytic. This implementation checks to make sure the input
+ * data objects and output file have been set.
+ */
 void ExportCorrelationMatrix::initialize()
 {
-   if ( !_ccm || !_cmx || !_output )
+   EDEBUG_FUNC(this);
+
+   // make sure input/output arguments are valid
+   if ( !_emx || !_ccm || !_cmx || !_output )
    {
       E_MAKE_EXCEPTION(e);
       e.setTitle(tr("Invalid Argument"));
       e.setDetails(tr("Did not get valid input and/or output arguments."));
       throw e;
    }
+
+   // initialize pairwise iterators
+   _ccmPair = CCMatrix::Pair(_ccm);
+   _cmxPair = CorrelationMatrix::Pair(_cmx);
+
+   // initialize output file stream
+   _stream.setDevice(_output);
+   _stream.setRealNumberPrecision(8);
 }
diff --git a/src/core/exportcorrelationmatrix.h b/src/core/exportcorrelationmatrix.h
index efa16a7..9f965ca 100644
--- a/src/core/exportcorrelationmatrix.h
+++ b/src/core/exportcorrelationmatrix.h
@@ -2,11 +2,25 @@
 #define EXPORTCORRELATIONMATRIX_H
 #include <ace/core/core.h>
 
+#include "ccmatrix_pair.h"
 #include "ccmatrix.h"
+#include "correlationmatrix_pair.h"
 #include "correlationmatrix.h"
+#include "expressionmatrix.h"
 
 
 
+/*!
+ * This class implements the export correlation matrix analytic. This analytic
+ * takes three data objects (an expression matrix, a cluster matrix, and a
+ * correlation matrix) and writes a text file of correlations. Each line is a
+ * correlation that includes the pairwise index, correlation value, and sample
+ * mask, as well as several other fields which are not used but are required
+ * for this format; the analytic recreates these fields as far as possible. The
+ * expression matrix that produced the correlation matrix is required in order
+ * to recreate sample masks for pairs with only one cluster, as these sample
+ * masks are not stored in the cluster matrix.
+ */
 class ExportCorrelationMatrix : public EAbstractAnalytic
 {
    Q_OBJECT
@@ -17,8 +31,27 @@ class ExportCorrelationMatrix : public EAbstractAnalytic
    virtual EAbstractAnalytic::Input* makeInput() override final;
    virtual void initialize();
 private:
+   /*!
+    * Workspace variables to write to the output file
+    */
+   QTextStream _stream;
+   CCMatrix::Pair _ccmPair;
+   CorrelationMatrix::Pair _cmxPair;
+   /*!
+    * Pointer to the input expression matrix.
+    */
+   ExpressionMatrix* _emx {nullptr};
+   /*!
+    * Pointer to the input cluster matrix.
+    */
    CCMatrix* _ccm {nullptr};
+   /*!
+    * Pointer to the input correlation matrix.
+    */
    CorrelationMatrix* _cmx {nullptr};
+   /*!
+    * Pointer to the output text file.
+    */
    QFile* _output {nullptr};
 };
 
diff --git a/src/core/exportcorrelationmatrix_input.cpp b/src/core/exportcorrelationmatrix_input.cpp
index f2270a6..ad8af5f 100644
--- a/src/core/exportcorrelationmatrix_input.cpp
+++ b/src/core/exportcorrelationmatrix_input.cpp
@@ -3,18 +3,30 @@
 
 
 
+/*!
+ * Construct a new input object with the given analytic as its parent.
+ *
+ * @param parent
+ */
 ExportCorrelationMatrix::Input::Input(ExportCorrelationMatrix* parent):
    EAbstractAnalytic::Input(parent),
    _base(parent)
-{}
+{
+   EDEBUG_FUNC(this,parent);
+}
 
 
 
 
 
 
+/*!
+ * Return the total number of arguments this analytic type contains.
+ */
 int ExportCorrelationMatrix::Input::size() const
 {
+   EDEBUG_FUNC(this);
+
    return Total;
 }
 
@@ -23,10 +35,18 @@ int ExportCorrelationMatrix::Input::size() const
 
 
 
+/*!
+ * Return the argument type for a given index.
+ *
+ * @param index
+ */
 EAbstractAnalytic::Input::Type ExportCorrelationMatrix::Input::type(int index) const
 {
+   EDEBUG_FUNC(this,index);
+
    switch (index)
    {
+   case ExpressionData: return Type::DataIn;
    case ClusterData: return Type::DataIn;
    case CorrelationData: return Type::DataIn;
    case OutputFile: return Type::FileOut;
@@ -39,10 +59,27 @@ EAbstractAnalytic::Input::Type ExportCorrelationMatrix::Input::type(int index) c
 
 
 
+/*!
+ * Return data for a given role on an argument with the given index.
+ *
+ * @param index
+ * @param role
+ */
 QVariant ExportCorrelationMatrix::Input::data(int index, Role role) const
 {
+   EDEBUG_FUNC(this,index,role);
+
    switch (index)
    {
+   case ExpressionData:
+      switch (role)
+      {
+      case Role::CommandLineName: return QString("emx");
+      case Role::Title: return tr("Expression Matrix:");
+      case Role::WhatsThis: return tr("Input expression matrix containing gene expression data.");
+      case Role::DataType: return DataFactory::ExpressionMatrixType;
+      default: return QVariant();
+      }
    case ClusterData:
       switch (role)
       {
@@ -79,10 +116,16 @@ QVariant ExportCorrelationMatrix::Input::data(int index, Role role) const
 
 
 
-void ExportCorrelationMatrix::Input::set(int index, const QVariant& value)
+/*!
+ * Set an argument with the given index to the given value. This analytic has
+ * no basic arguments so this function does nothing.
+ *
+ * @param index
+ * @param value
+ */
+void ExportCorrelationMatrix::Input::set(int, const QVariant&)
 {
-   Q_UNUSED(index);
-   Q_UNUSED(value);
+   EDEBUG_FUNC(this);
 }
 
 
@@ -90,9 +133,21 @@ void ExportCorrelationMatrix::Input::set(int index, const QVariant& value)
 
 
 
+/*!
+ * Set a data argument with the given index to the given data object pointer.
+ *
+ * @param index
+ * @param data
+ */
 void ExportCorrelationMatrix::Input::set(int index, EAbstractData* data)
 {
-   if ( index == ClusterData )
+   EDEBUG_FUNC(this,index,data);
+
+   if ( index == ExpressionData )
+   {
+      _base->_emx = data->cast<ExpressionMatrix>();
+   }
+   else if ( index == ClusterData )
    {
       _base->_ccm = data->cast<CCMatrix>();
    }
@@ -107,8 +162,16 @@ void ExportCorrelationMatrix::Input::set(int index, EAbstractData* data)
 
 
 
+/*!
+ * Set a file argument with the given index to the given Qt file pointer.
+ *
+ * @param index
+ * @param file
+ */
 void ExportCorrelationMatrix::Input::set(int index, QFile* file)
 {
+   EDEBUG_FUNC(this,index,file);
+
    if ( index == OutputFile )
    {
       _base->_output = file;
diff --git a/src/core/exportcorrelationmatrix_input.h b/src/core/exportcorrelationmatrix_input.h
index 074dd98..7b46222 100644
--- a/src/core/exportcorrelationmatrix_input.h
+++ b/src/core/exportcorrelationmatrix_input.h
@@ -4,13 +4,20 @@
 
 
 
+/*!
+ * This class implements the abstract input of the export correlation matrix analytic.
+ */
 class ExportCorrelationMatrix::Input : public EAbstractAnalytic::Input
 {
    Q_OBJECT
 public:
+   /*!
+    * Defines all input arguments for this analytic.
+    */
    enum Argument
    {
-      ClusterData = 0
+      ExpressionData = 0
+      ,ClusterData
       ,CorrelationData
       ,OutputFile
       ,Total
@@ -23,6 +30,9 @@ class ExportCorrelationMatrix::Input : public EAbstractAnalytic::Input
    virtual void set(int index, EAbstractData* data) override final;
    virtual void set(int index, QFile* file) override final;
 private:
+   /*!
+    * Pointer to the base analytic for this object.
+    */
    ExportCorrelationMatrix* _base;
 };
 
diff --git a/src/core/exportexpressionmatrix.cpp b/src/core/exportexpressionmatrix.cpp
index f6bbb11..2d962e1 100644
--- a/src/core/exportexpressionmatrix.cpp
+++ b/src/core/exportexpressionmatrix.cpp
@@ -1,15 +1,23 @@
 #include "exportexpressionmatrix.h"
 #include "exportexpressionmatrix_input.h"
 #include "datafactory.h"
+#include "expressionmatrix_gene.h"
 
 
 
 
 
 
+/*!
+ * Return the total number of blocks this analytic must process as steps
+ * or blocks of work. This implementation uses a work block for writing the
+ * sample names and a work block for writing each gene.
+ */
 int ExportExpressionMatrix::size() const
 {
-   return 1;
+   EDEBUG_FUNC(this);
+
+   return 1 + _input->geneSize();
 }
 
 
@@ -17,79 +25,74 @@ int ExportExpressionMatrix::size() const
 
 
 
+/*!
+ * Process the given index with a possible block of results if this analytic
+ * produces work blocks. This implementation uses only the index of the result
+ * block to determine which piece of work to do.
+ *
+ * @param result
+ */
 void ExportExpressionMatrix::process(const EAbstractAnalytic::Block* result)
 {
-   Q_UNUSED(result);
-
-   // use expression declaration
-   using Expression = ExpressionMatrix::Expression;
-   using Transform = ExpressionMatrix::Transform;
+   EDEBUG_FUNC(this,result);
 
-   // get gene names, sample names, and transform
-   EMetaArray geneNames = _input->getGeneNames().toArray();
-   EMetaArray sampleNames = _input->getSampleNames().toArray();
-   Transform transform = _input->getTransform();
+   // write the sample names in the first step
+   if ( result->index() == 0 )
+   {
+      // get sample names
+      EMetaArray sampleNames {_input->sampleNames()};
 
-   // create text stream to output file
-   QTextStream stream(_output);
-   stream.setRealNumberPrecision(12);
+      // initialize output file stream
+      _stream.setDevice(_output);
+      _stream.setRealNumberPrecision(8);
 
-   // write sample names
-   for ( int i = 0; i < _input->getSampleSize(); i++ )
-   {
-      stream << sampleNames.at(i).toString() << "\t";
+      // write sample names
+      for ( int i = 0; i < _input->sampleSize(); i++ )
+      {
+         _stream << sampleNames.at(i).toString() << "\t";
+      }
+      _stream << "\n";
    }
-   stream << "\n";
 
-   // write each gene to a line in output file
-   ExpressionMatrix::Gene gene(_input);
-   for ( int i = 0; i < _input->getGeneSize(); i++ )
+   // write each gene to the output file in a separate step
+   else
    {
+      // get gene index
+      int i = result->index() - 1;
+
+      // get gene name
+      QString geneName {_input->geneNames().at(i).toString()};
+
       // load gene from expression matrix
+      ExpressionMatrix::Gene gene(_input);
       gene.read(i);
 
       // write gene name
-      stream << geneNames.at(i).toString();
+      _stream << geneName;
 
       // write expression values
-      for ( int j = 0; j < _input->getSampleSize(); j++ )
+      for ( int j = 0; j < _input->sampleSize(); j++ )
       {
-         Expression value {gene.at(j)};
+         float value {gene.at(j)};
 
          // if value is NAN use the no sample token
          if ( std::isnan(value) )
          {
-            stream << "\t" << _noSampleToken;
+            _stream << "\t" << _nanToken;
          }
 
          // else this is a normal floating point expression
          else
          {
-            // apply transform and write value
-            switch (transform)
-            {
-            case Transform::None:
-               break;
-            case Transform::NLog:
-               value = exp(value);
-               break;
-            case Transform::Log2:
-               value = pow(2, value);
-               break;
-            case Transform::Log10:
-               value = pow(10, value);
-               break;
-            }
-
-            stream << "\t" << value;
+            _stream << "\t" << value;
          }
       }
 
-      stream << "\n";
+      _stream << "\n";
    }
 
    // make sure writing output file worked
-   if ( stream.status() != QTextStream::Ok )
+   if ( _stream.status() != QTextStream::Ok )
    {
       E_MAKE_EXCEPTION(e);
       e.setTitle(tr("File IO Error"));
@@ -103,8 +106,13 @@ void ExportExpressionMatrix::process(const EAbstractAnalytic::Block* result)
 
 
 
+/*!
+ * Make a new input object and return its pointer.
+ */
 EAbstractAnalytic::Input* ExportExpressionMatrix::makeInput()
 {
+   EDEBUG_FUNC(this);
+
    return new Input(this);
 }
 
@@ -113,8 +121,14 @@ EAbstractAnalytic::Input* ExportExpressionMatrix::makeInput()
 
 
 
+/*!
+ * Initialize this analytic. This implementation checks to make sure the input
+ * data object and output file have been set.
+ */
 void ExportExpressionMatrix::initialize()
 {
+   EDEBUG_FUNC(this);
+
    if ( !_input || !_output )
    {
       E_MAKE_EXCEPTION(e);
diff --git a/src/core/exportexpressionmatrix.h b/src/core/exportexpressionmatrix.h
index ca2923e..8e2d451 100644
--- a/src/core/exportexpressionmatrix.h
+++ b/src/core/exportexpressionmatrix.h
@@ -6,6 +6,13 @@
 
 
 
+/*!
+ * This class implements the export expression matrix analytic. This analytic
+ * writes an expression matrix to a text file as a table; that is, with each row
+ * on a line, each value separated by whitespace, and the first row and column
+ * containing the column names and row names, respectively. Elements which are
+ * NAN in the expression matrix are written as the given NAN token.
+ */
 class ExportExpressionMatrix : public EAbstractAnalytic
 {
    Q_OBJECT
@@ -16,9 +23,22 @@ class ExportExpressionMatrix : public EAbstractAnalytic
    virtual EAbstractAnalytic::Input* makeInput() override final;
    virtual void initialize();
 private:
+   /*!
+    * Workspace variables to write to the output file
+    */
+   QTextStream _stream;
+   /*!
+    * Pointer to the input expression matrix.
+    */
    ExpressionMatrix* _input {nullptr};
+   /*!
+    * Pointer to the output text file.
+    */
    QFile* _output {nullptr};
-   QString _noSampleToken;
+   /*!
+    * The string token used to represent NAN values.
+    */
+   QString _nanToken {"NA"};
 };
 
 
diff --git a/src/core/exportexpressionmatrix_input.cpp b/src/core/exportexpressionmatrix_input.cpp
index bf38a67..02b7c58 100644
--- a/src/core/exportexpressionmatrix_input.cpp
+++ b/src/core/exportexpressionmatrix_input.cpp
@@ -6,18 +6,30 @@
 
 
 
+/*!
+ * Construct a new input object with the given analytic as its parent.
+ *
+ * @param parent
+ */
 ExportExpressionMatrix::Input::Input(ExportExpressionMatrix* parent):
    EAbstractAnalytic::Input(parent),
    _base(parent)
-{}
+{
+   EDEBUG_FUNC(this,parent);
+}
 
 
 
 
 
 
+/*!
+ * Return the total number of arguments this analytic type contains.
+ */
 int ExportExpressionMatrix::Input::size() const
 {
+   EDEBUG_FUNC(this);
+
    return Total;
 }
 
@@ -26,13 +38,20 @@ int ExportExpressionMatrix::Input::size() const
 
 
 
+/*!
+ * Return the argument type for a given index.
+ *
+ * @param index
+ */
 EAbstractAnalytic::Input::Type ExportExpressionMatrix::Input::type(int index) const
 {
+   EDEBUG_FUNC(this,index);
+
    switch (index)
    {
    case InputData: return Type::DataIn;
    case OutputFile: return Type::FileOut;
-   case NoSampleToken: return Type::String;
+   case NANToken: return Type::String;
    default: return Type::Boolean;
    }
 }
@@ -42,8 +61,16 @@ EAbstractAnalytic::Input::Type ExportExpressionMatrix::Input::type(int index) co
 
 
 
+/*!
+ * Return data for a given role on an argument with the given index.
+ *
+ * @param index
+ * @param role
+ */
 QVariant ExportExpressionMatrix::Input::data(int index, Role role) const
 {
+   EDEBUG_FUNC(this,index,role);
+
    switch (index)
    {
    case InputData:
@@ -64,12 +91,13 @@ QVariant ExportExpressionMatrix::Input::data(int index, Role role) const
       case Role::FileFilters: return tr("Text file %1").arg("(*.txt)");
       default: return QVariant();
       }
-   case NoSampleToken:
+   case NANToken:
       switch (role)
       {
       case Role::CommandLineName: return QString("nan");
-      case Role::Title: return tr("No Sample Token:");
+      case Role::Title: return tr("NAN Token:");
       case Role::WhatsThis: return tr("Expected token for expressions that have no value.");
+      case Role::Default: return "NA";
       default: return QVariant();
       }
    default: return QVariant();
@@ -81,12 +109,20 @@ QVariant ExportExpressionMatrix::Input::data(int index, Role role) const
 
 
 
+/*!
+ * Set an argument with the given index to the given value.
+ *
+ * @param index
+ * @param value
+ */
 void ExportExpressionMatrix::Input::set(int index, const QVariant& value)
 {
+   EDEBUG_FUNC(this,index,&value);
+
    switch (index)
    {
-   case NoSampleToken:
-      _base->_noSampleToken = value.toString();
+   case NANToken:
+      _base->_nanToken = value.toString();
       break;
    }
 }
@@ -96,8 +132,16 @@ void ExportExpressionMatrix::Input::set(int index, const QVariant& value)
 
 
 
+/*!
+ * Set a data argument with the given index to the given data object pointer.
+ *
+ * @param index
+ * @param data
+ */
 void ExportExpressionMatrix::Input::set(int index, EAbstractData* data)
 {
+   EDEBUG_FUNC(this,index,data);
+
    if ( index == InputData )
    {
       _base->_input = data->cast<ExpressionMatrix>();
@@ -109,8 +153,16 @@ void ExportExpressionMatrix::Input::set(int index, EAbstractData* data)
 
 
 
+/*!
+ * Set a file argument with the given index to the given Qt file pointer.
+ *
+ * @param index
+ * @param file
+ */
 void ExportExpressionMatrix::Input::set(int index, QFile* file)
 {
+   EDEBUG_FUNC(this,index,file);
+
    if ( index == OutputFile )
    {
       _base->_output = file;
diff --git a/src/core/exportexpressionmatrix_input.h b/src/core/exportexpressionmatrix_input.h
index 9c2dacb..420c6e1 100644
--- a/src/core/exportexpressionmatrix_input.h
+++ b/src/core/exportexpressionmatrix_input.h
@@ -4,15 +4,21 @@
 
 
 
+/*!
+ * This class implements the abstract input of the export expression matrix analytic.
+ */
 class ExportExpressionMatrix::Input : public EAbstractAnalytic::Input
 {
    Q_OBJECT
 public:
+   /*!
+    * Defines all input arguments for this analytic.
+    */
    enum Argument
    {
       InputData = 0
       ,OutputFile
-      ,NoSampleToken
+      ,NANToken
       ,Total
    };
    explicit Input(ExportExpressionMatrix* parent);
@@ -23,6 +29,9 @@ class ExportExpressionMatrix::Input : public EAbstractAnalytic::Input
    virtual void set(int index, EAbstractData* data) override final;
    virtual void set(int index, QFile* file) override final;
 private:
+   /*!
+    * Pointer to the base analytic for this object.
+    */
    ExportExpressionMatrix* _base;
 };
 
diff --git a/src/core/expressionmatrix.cpp b/src/core/expressionmatrix.cpp
index 188b5d6..a25bd2f 100644
--- a/src/core/expressionmatrix.cpp
+++ b/src/core/expressionmatrix.cpp
@@ -1,27 +1,18 @@
 #include "expressionmatrix.h"
+#include "expressionmatrix_model.h"
+//
 
 
 
-
-
-
-const QStringList ExpressionMatrix::TRANSFORM_NAMES
-{
-   "none"
-   ,"natural logarithm"
-   ,"logarithm base 2"
-   ,"logarithm base 10"
-};
-
-
-
-
-
-
+/*!
+ * Return the index of the first byte in this data object after the end of
+ * the data section. Defined as the header size plus the size of the matrix data.
+ */
 qint64 ExpressionMatrix::dataEnd() const
 {
-   // calculate and return end of data
-   return DATA_OFFSET + ((qint64)_geneSize * (qint64)_sampleSize * sizeof(Expression));
+   EDEBUG_FUNC(this);
+
+   return _headerSize + (qint64)_geneSize * (qint64)_sampleSize * sizeof(float);
 }
 
 
@@ -29,10 +20,17 @@ qint64 ExpressionMatrix::dataEnd() const
 
 
 
+/*!
+ * Read in the data of an existing data object that was just opened.
+ */
 void ExpressionMatrix::readData()
 {
-   // read header
+   EDEBUG_FUNC(this);
+
+   // seek to the beginning of the data
    seek(0);
+
+   // read the header
    stream() >> _geneSize >> _sampleSize;
 }
 
@@ -41,24 +39,21 @@ void ExpressionMatrix::readData()
 
 
 
+/*!
+ * Initialize this data object's data to a null state.
+ */
 void ExpressionMatrix::writeNewData()
 {
-   // initialize metadata
-   setMeta(EMetadata(EMetadata::Object));
-
-   // initialize header
-   seek(0);
-   stream() << _geneSize << _sampleSize;
-}
-
-
-
+   EDEBUG_FUNC(this);
 
+   // initialize metadata object
+   setMeta(EMetaObject());
 
+   // seek to the beginning of the data
+   seek(0);
 
-QAbstractTableModel* ExpressionMatrix::model()
-{
-   return nullptr;
+   // write the header
+   stream() << _geneSize << _sampleSize;
 }
 
 
@@ -66,10 +61,18 @@ QAbstractTableModel* ExpressionMatrix::model()
 
 
 
+/*!
+ * Finalize this data object's data after the analytic that created it has
+ * finished giving it new data.
+ */
 void ExpressionMatrix::finish()
 {
-   // write header
+   EDEBUG_FUNC(this);
+
+   // seek to the beginning of the data
    seek(0);
+
+   // write the header
    stream() << _geneSize << _sampleSize;
 }
 
@@ -78,55 +81,18 @@ void ExpressionMatrix::finish()
 
 
 
-QVariant ExpressionMatrix::headerData(int section, Qt::Orientation orientation, int role) const
+/*!
+ * Return a Qt table model that represents this data object as a table.
+ */
+QAbstractTableModel* ExpressionMatrix::model()
 {
-   // if this is not display role return nothing
-   if ( role != Qt::DisplayRole )
-   {
-      return QVariant();
-   }
+   EDEBUG_FUNC(this);
 
-   // get metadata root and figure out orientation
-   switch (orientation)
-   {
-   case Qt::Vertical:
-   {
-      // get gene names and make sure it is array
-      EMetadata genes {meta().toObject().at("genes")};
-      if ( genes.isArray() )
-      {
-         // make sure section is within limits of array
-         if ( section >= 0 && section < genes.toArray().size() )
-         {
-            // return gene name
-            return genes.toArray().at(section).toString();
-         }
-      }
-
-      // if no gene name found return nothing
-      return QVariant();
-   }
-   case Qt::Horizontal:
+   if ( !_model )
    {
-      // get sample names and make sure it is array
-      EMetadata samples {meta().toObject().at("samples")};
-      if ( samples.isArray() )
-      {
-         // make sure section is within limits of array
-         if ( section >= 0 && section < samples.toArray().size() )
-         {
-            // return sample name
-            return samples.toArray().at(section).toString();
-         }
-      }
-
-      // if no sample name found return nothing
-      return QVariant();
-   }
-   default:
-      // unknown orientation so return nothing
-      return QVariant();
+      _model = new Model(this);
    }
+   return _model;
 }
 
 
@@ -134,10 +100,13 @@ QVariant ExpressionMatrix::headerData(int section, Qt::Orientation orientation,
 
 
 
-int ExpressionMatrix::rowCount(const QModelIndex& parent) const
+/*!
+ * Return the number of genes (rows) in this expression matrix.
+ */
+qint32 ExpressionMatrix::geneSize() const
 {
-   // return gene size for row count
-   Q_UNUSED(parent);
+   EDEBUG_FUNC(this);
+
    return _geneSize;
 }
 
@@ -146,10 +115,13 @@ int ExpressionMatrix::rowCount(const QModelIndex& parent) const
 
 
 
-int ExpressionMatrix::columnCount(const QModelIndex& parent) const
+/*!
+ * Return the number of samples (columns) in this expression matrix.
+ */
+qint32 ExpressionMatrix::sampleSize() const
 {
-   // return sample size for column count
-   Q_UNUSED(parent);
+   EDEBUG_FUNC(this);
+
    return _sampleSize;
 }
 
@@ -158,29 +130,14 @@ int ExpressionMatrix::columnCount(const QModelIndex& parent) const
 
 
 
-QVariant ExpressionMatrix::data(const QModelIndex& index, int role) const
+/*!
+ * Return the list of gene names in this expression matrix.
+ */
+EMetaArray ExpressionMatrix::geneNames() const
 {
-   // if role is not display return nothing
-   if ( role != Qt::DisplayRole )
-   {
-      return QVariant();
-   }
-
-   // if index is out of range return nothing
-   if ( index.row() >= _geneSize || index.column() >= _sampleSize )
-   {
-      return QVariant();
-   }
-
-   // make input variable and seek to position of queried expression
-   Expression value;
-   seek(DATA_OFFSET+(((index.row()*_sampleSize)+index.column())*sizeof(Expression)));
+   EDEBUG_FUNC(this);
 
-   // read expression from file
-   stream() >> value;
-
-   // return expression
-   return value;
+   return meta().toObject().at("genes").toArray();
 }
 
 
@@ -188,42 +145,14 @@ QVariant ExpressionMatrix::data(const QModelIndex& index, int role) const
 
 
 
-void ExpressionMatrix::initialize(QStringList geneNames, QStringList sampleNames)
+/*!
+ * Return the list of sample names in this expression matrix.
+ */
+EMetaArray ExpressionMatrix::sampleNames() const
 {
-   // create metadata array of gene names
-   EMetaArray metaGeneNames;
-   for ( auto& geneName : geneNames )
-   {
-      metaGeneNames.append(geneName);
-   }
-
-   // create metadata array of sample names
-   EMetaArray metaSampleNames;
-   for ( auto& sampleName : sampleNames )
-   {
-      metaSampleNames.append(sampleName);
-   }
-
-   // insert gene and sample names to data object's metadata
-   EMetaObject metaObject {meta().toObject()};
-   metaObject.insert("genes", metaGeneNames);
-   metaObject.insert("samples", metaSampleNames);
-   setMeta(metaObject);
-
-   // set gene and sample size
-   _geneSize = geneNames.size();
-   _sampleSize = sampleNames.size();
-}
-
-
-
+   EDEBUG_FUNC(this);
 
-
-
-ExpressionMatrix::Transform ExpressionMatrix::getTransform() const
-{
-   QString transformName {meta().toObject().at("transform").toString()};
-   return static_cast<Transform>(TRANSFORM_NAMES.indexOf(transformName));
+   return meta().toObject().at("samples").toArray();
 }
 
 
@@ -231,46 +160,32 @@ ExpressionMatrix::Transform ExpressionMatrix::getTransform() const
 
 
 
-void ExpressionMatrix::setTransform(ExpressionMatrix::Transform transform)
+/*!
+ * Return an array of this expression matrix's data in row-major order.
+ */
+QVector<float> ExpressionMatrix::dumpRawData() const
 {
-   auto& transformName {TRANSFORM_NAMES.at(static_cast<int>(transform))};
-
-   EMetaObject metaObject {meta().toObject()};
-   metaObject.insert("transform", transformName);
-   setMeta(metaObject);
-}
-
-
+   EDEBUG_FUNC(this);
 
-
-
-
-qint64 ExpressionMatrix::getRawSize() const
-{
-   return (qint64)_geneSize * (qint64)_sampleSize;
-}
-
-
-
-
-
-
-ExpressionMatrix::Expression* ExpressionMatrix::dumpRawData() const
-{
-   // if there are no genes do nothing
+   // return empty array if expression matrix is empty
    if ( _geneSize == 0 )
    {
-      return nullptr;
+      return QVector<float>();
    }
 
-   // create new floating point array and populate with all gene expressions
-   Expression* ret {new Expression[getRawSize()]};
-   for (int i = 0; i < _geneSize ;++i)
+   // allocate an array with the same size as the expression matrix
+   QVector<float> ret(_geneSize*_sampleSize);
+
+   // seek to the beginning of the expression data
+   seekExpression(0,0);
+
+   // read each expression from the data stream into the array
+   for (float& sample: ret)
    {
-      readGene(i,&ret[i*_sampleSize]);
+      stream() >> sample;
    }
 
-   // return new float array
+   // return the array
    return ret;
 }
 
@@ -279,74 +194,40 @@ ExpressionMatrix::Expression* ExpressionMatrix::dumpRawData() const
 
 
 
-EMetadata ExpressionMatrix::getGeneNames() const
-{
-   return meta().toObject().at("genes");
-}
-
-
-
-
-
-
-EMetadata ExpressionMatrix::getSampleNames() const
-{
-   return meta().toObject().at("samples");
-}
-
-
-
-
-
-
-void ExpressionMatrix::readGene(int index, Expression* expressions) const
+/*!
+ * Initialize this expression matrix with a list of gene names and a list of
+ * sample names.
+ *
+ * @param geneNames
+ * @param sampleNames
+ */
+void ExpressionMatrix::initialize(const QStringList& geneNames, const QStringList& sampleNames)
 {
-   // seek to position of beginning of gene's expressions
-   seek(DATA_OFFSET + (index * _sampleSize * sizeof(Expression)));
+   EDEBUG_FUNC(this,&geneNames,&sampleNames);
 
-   // read in all expressions for gene as block of floats
-   for ( int i = 0; i < _sampleSize; ++i )
+   // create a metadata array of gene names
+   EMetaArray metaGeneNames;
+   for ( auto& geneName : geneNames )
    {
-      stream() >> expressions[i];
+      metaGeneNames.append(geneName);
    }
-}
-
-
 
-
-
-
-void ExpressionMatrix::writeGene(int index, const Expression* expressions)
-{
-   // seek to position of beginning of gene's expressions
-   seek(DATA_OFFSET + (index * _sampleSize * sizeof(Expression)));
-
-   // overwrite all expressions for gene as block of floats
-   for ( int i = 0; i < _sampleSize; ++i )
+   // create a metadata array of sample names
+   EMetaArray metaSampleNames;
+   for ( auto& sampleName : sampleNames )
    {
-      stream() << expressions[i];
+      metaSampleNames.append(sampleName);
    }
-}
 
+   // save the gene names and sample names to metadata
+   EMetaObject metaObject {meta().toObject()};
+   metaObject.insert("genes",metaGeneNames);
+   metaObject.insert("samples",metaSampleNames);
+   setMeta(metaObject);
 
-
-
-
-
-void ExpressionMatrix::Gene::read(int index) const
-{
-   // make sure given gene index is within range
-   if ( index < 0 || index >= _matrix->_geneSize )
-   {
-      E_MAKE_EXCEPTION(e);
-      e.setTitle(tr("Domain Error"));
-      e.setDetails(tr("Attempting to read gene %1 when maximum is %2.").arg(index)
-                   .arg(_matrix->_geneSize-1));
-      throw e;
-   }
-
-   // read gene expressions from data object
-   _matrix->readGene(index,_expressions);
+   // initialize the gene size and sample size accordingly
+   _geneSize = geneNames.size();
+   _sampleSize = sampleNames.size();
 }
 
 
@@ -354,39 +235,30 @@ void ExpressionMatrix::Gene::read(int index) const
 
 
 
-void ExpressionMatrix::Gene::write(int index)
+/*!
+ * Seek to a particular expression in this expression matrix given a gene index
+ * and a sample index.
+ *
+ * @param gene
+ * @param sample
+ */
+void ExpressionMatrix::seekExpression(int gene, int sample) const
 {
-   // make sure given gene index is within range
-   if ( index < 0 || index >= _matrix->_geneSize )
-   {
-      E_MAKE_EXCEPTION(e);
-      e.setTitle(tr("Domain Error"));
-      e.setDetails(tr("Attempting to write gene %1 when maximum is %2.").arg(index)
-                   .arg(_matrix->_geneSize-1));
-      throw e;
-   }
+   EDEBUG_FUNC(this,gene,sample);
 
-   // write gene expressions to data object
-   _matrix->writeGene(index,_expressions);
-}
-
-
-
-
-
-
-ExpressionMatrix::Expression& ExpressionMatrix::Gene::at(int index)
-{
-   // make sure given sample index is within range
-   if ( index < 0 || index >= _matrix->_sampleSize )
+   // make sure that the indices are valid
+   if ( gene < 0 || gene >= _geneSize || sample < 0 || sample >= _sampleSize )
    {
       E_MAKE_EXCEPTION(e);
-      e.setTitle(tr("Domain Error"));
-      e.setDetails(tr("Attempting to access gene expression %1 when maximum is %2.").arg(index)
-                   .arg(_matrix->_sampleSize-1));
+      e.setTitle(tr("Invalid Argument"));
+      e.setDetails(tr("Invalid (gene,sample) index (%1,%2) with size of (%3,%4).")
+                   .arg(gene)
+                   .arg(sample)
+                   .arg(_geneSize)
+                   .arg(_sampleSize));
       throw e;
    }
 
-   // return gene expression
-   return _expressions[index];
+   // seek to the specified position in the data
+   seek(_headerSize + ((qint64)gene*(qint64)_sampleSize + (qint64)sample)*sizeof(float));
 }
diff --git a/src/core/expressionmatrix.h b/src/core/expressionmatrix.h
index e11d9fc..6a84d99 100644
--- a/src/core/expressionmatrix.h
+++ b/src/core/expressionmatrix.h
@@ -1,68 +1,55 @@
 #ifndef EXPRESSIONMATRIX_H
 #define EXPRESSIONMATRIX_H
 #include <ace/core/core.h>
+//
 
 
 
+/*!
+ * This class implements the expression matrix data object. An expression matrix
+ * is a matrix of real numbers whose rows represent genes and whose columns
+ * represent samples. The matrix data can be accessed using the gene iterator,
+ * which iterates through each gene (row) in the matrix.
+ */
 class ExpressionMatrix : public EAbstractData
 {
    Q_OBJECT
 public:
-   using Expression = float;
-   static const QStringList TRANSFORM_NAMES;
-   enum class Transform
-   {
-      None
-      ,NLog
-      ,Log2
-      ,Log10
-   };
    class Gene;
+public:
    virtual qint64 dataEnd() const override final;
    virtual void readData() override final;
    virtual void writeNewData() override final;
-   virtual QAbstractTableModel* model() override final;
    virtual void finish() override final;
-   QVariant headerData(int section, Qt::Orientation orientation, int role) const;
-   int rowCount(const QModelIndex& parent) const;
-   int columnCount(const QModelIndex& parent) const;
-   QVariant data(const QModelIndex& index, int role) const;
-   void initialize(QStringList geneNames, QStringList sampleNames);
-   Transform getTransform() const;
-   void setTransform(Transform scale);
-   qint32 getGeneSize() const { return _geneSize; }
-   qint32 getSampleSize() const { return _sampleSize; }
-   qint64 getRawSize() const;
-   Expression* dumpRawData() const;
-   EMetadata getGeneNames() const;
-   EMetadata getSampleNames() const;
-private:
-   void readGene(int index, Expression* expressions) const;
-   void writeGene(int index, const Expression* expressions);
-   static const int DATA_OFFSET {8};
-   qint32 _geneSize {0};
-   qint32 _sampleSize {0};
-};
-
-
-
-class ExpressionMatrix::Gene
-{
+   virtual QAbstractTableModel* model() override final;
 public:
-   Gene(ExpressionMatrix* matrix):
-      _expressions(new Expression[matrix->_sampleSize]),
-      _matrix(matrix)
-      {}
-   Gene(const Gene&) = delete;
-   ~Gene() { delete _expressions; }
-   void read(int index) const;
-   void write(int index);
-   Expression& at(int index);
-   const Expression& at(int index) const;
-   Expression& operator[](int index) { return _expressions[index]; }
+   qint32 geneSize() const;
+   qint32 sampleSize() const;
+   EMetaArray geneNames() const;
+   EMetaArray sampleNames() const;
+   QVector<float> dumpRawData() const;
+   void initialize(const QStringList& geneNames, const QStringList& sampleNames);
+private:
+   class Model;
 private:
-   Expression* _expressions;
-   ExpressionMatrix* _matrix;
+   void seekExpression(int gene, int sample) const;
+   /*!
+    * The header size (in bytes) at the beginning of the file. The header
+    * consists of the gene size and the sample size.
+    */
+   constexpr static const qint64 _headerSize {8};
+   /*!
+    * The number of genes (rows) in the expression matrix.
+    */
+   qint32 _geneSize;
+   /*!
+    * The number of samples (columns) in the expression matrix.
+    */
+   qint32 _sampleSize;
+   /*!
+    * Pointer to a Qt table model for this class.
+    */
+   Model* _model {nullptr};
 };
 
 
diff --git a/src/core/expressionmatrix_gene.cpp b/src/core/expressionmatrix_gene.cpp
new file mode 100644
index 0000000..7d11e1b
--- /dev/null
+++ b/src/core/expressionmatrix_gene.cpp
@@ -0,0 +1,222 @@
+#include "expressionmatrix_gene.h"
+//
+
+
+
+
+
+
+/*!
+ * Return the expression value at the given index.
+ *
+ * @param index
+ */
+float& ExpressionMatrix::Gene::operator[](int index)
+{
+   EDEBUG_FUNC(this,index);
+
+   // make sure the index is valid
+   if ( index < 0 || index >= _matrix->_sampleSize )
+   {
+      E_MAKE_EXCEPTION(e);
+      e.setTitle(tr("Domain Error"));
+      e.setDetails(tr("Attempting to access gene expression %1 when maximum is %2.").arg(index)
+                   .arg(_matrix->_sampleSize-1));
+      throw e;
+   }
+
+   // return the specified value
+   return _expressions[index];
+}
+
+
+
+
+
+
+/*!
+ * Construct a gene iterator for an expression matrix. Additionally, if the
+ * matrix is already initialized, read the first gene into memory.
+ *
+ * @param matrix
+ * @param isInitialized
+ */
+ExpressionMatrix::Gene::Gene(ExpressionMatrix* matrix, bool isInitialized):
+   _matrix(matrix),
+   _expressions(new float[matrix->sampleSize()])
+{
+   EDEBUG_FUNC(this,matrix,isInitialized);
+
+   if ( isInitialized )
+   {
+      read(_index);
+   }
+}
+
+
+
+
+
+
+/*!
+ * Destruct a gene iterator.
+ */
+ExpressionMatrix::Gene::~Gene()
+{
+   EDEBUG_FUNC(this);
+
+   delete[] _expressions;
+}
+
+
+
+
+
+
+/*!
+ * Read a row of the expression matrix from the data object file into memory.
+ *
+ * @param index
+ */
+void ExpressionMatrix::Gene::read(int index)
+{
+   EDEBUG_FUNC(this,index);
+
+   // make sure the index is valid
+   if ( index < 0 || index >= _matrix->_geneSize )
+   {
+      E_MAKE_EXCEPTION(e);
+      e.setTitle(tr("Domain Error"));
+      e.setDetails(tr("Attempting to read gene %1 when maximum is %2.").arg(index)
+                   .arg(_matrix->_geneSize-1));
+      throw e;
+   }
+
+   // seek to the beginning of the specified row in the data object file
+   _matrix->seekExpression(index,0);
+
+   // read the entire row into memory
+   for (int i = 0; i < _matrix->sampleSize() ;++i)
+   {
+      _matrix->stream() >> _expressions[i];
+   }
+
+   // set the iterator's current index
+   _index = index;
+}
+
+
+
+
+
+
+/*!
+ * Read the next row of the expression matrix into memory.
+ */
+bool ExpressionMatrix::Gene::readNext()
+{
+   EDEBUG_FUNC(this);
+
+   // make sure that there is another row in the expression matrix
+   if ( (_index + 1) >= _matrix->_geneSize )
+   {
+      return false;
+   }
+
+   // read the next row
+   read(_index + 1);
+
+   // return success
+   return true;
+}
+
+
+
+
+
+
+/*!
+ * Write the iterator's row data to the data object file corresponding to
+ * the given row index.
+ *
+ * @param index
+ */
+void ExpressionMatrix::Gene::write(int index)
+{
+   EDEBUG_FUNC(this,index);
+
+   // make sure the index is valid
+   if ( index < 0 || index >= _matrix->_geneSize )
+   {
+      E_MAKE_EXCEPTION(e);
+      e.setTitle(tr("Domain Error"));
+      e.setDetails(tr("Attempting to write gene %1 when maximum is %2.").arg(index)
+                   .arg(_matrix->_geneSize-1));
+      throw e;
+   }
+
+   // seek to the beginning of the specified row in the data object file
+   _matrix->seekExpression(index,0);
+
+   // write the entire row to the data object
+   for (int i = 0; i < _matrix->sampleSize() ;++i)
+   {
+      _matrix->stream() << _expressions[i];
+   }
+
+   // set the iterator's current index
+   _index = index;
+}
+
+
+
+
+
+
+/*!
+ * Write the iterator's row data to the next row in the data object file.
+ */
+bool ExpressionMatrix::Gene::writeNext()
+{
+   EDEBUG_FUNC(this);
+
+   // make sure there is another row in the expression matrix
+   if ( (_index + 1) >= _matrix->_geneSize )
+   {
+      return false;
+   }
+
+   // write to the next row
+   write(_index + 1);
+
+   // return success
+   return true;
+}
+
+
+
+
+
+
+/*!
+ * Return the expression value at the given index.
+ *
+ * @param index
+ */
+float ExpressionMatrix::Gene::at(int index) const
+{
+   EDEBUG_FUNC(this,index);
+
+   // make sure the index is valid
+   if ( index < 0 || index >= _matrix->_sampleSize )
+   {
+      E_MAKE_EXCEPTION(e);
+      e.setTitle(tr("Domain Error"));
+      e.setDetails(tr("Attempting to access gene expression %1 when maximum is %2.").arg(index)
+                   .arg(_matrix->_sampleSize-1));
+      throw e;
+   }
+
+   // return the specified value
+   return _expressions[index];
+}
diff --git a/src/core/expressionmatrix_gene.h b/src/core/expressionmatrix_gene.h
new file mode 100644
index 0000000..4aeef15
--- /dev/null
+++ b/src/core/expressionmatrix_gene.h
@@ -0,0 +1,43 @@
+#ifndef EXPRESSIONMATRIX_GENE_H
+#define EXPRESSIONMATRIX_GENE_H
+#include "expressionmatrix.h"
+//
+
+
+
+/*!
+ * This class implements the gene iterator for the expression matrix data
+ * object. The gene iterator can read from or write to any gene (row) in the
+ * expression matrix, or simply iterate through each row. The iterator stores
+ * only one row of expression data in memory at a time.
+ */
+class ExpressionMatrix::Gene
+{
+public:
+   float& operator[](int index);
+public:
+   Gene(ExpressionMatrix* matrix, bool isInitialized = false);
+   ~Gene();
+   void read(int index);
+   bool readNext();
+   void write(int index);
+   bool writeNext();
+   float at(int index) const;
+private:
+   /*!
+    * Pointer to the parent expression matrix.
+    */
+   ExpressionMatrix* _matrix;
+   /*!
+    * The iterator's current position in the expression matrix.
+    */
+   int _index {0};
+   /*!
+    * Pointer to the expression data of the current gene.
+    */
+   float* _expressions;
+};
+
+
+
+#endif
diff --git a/src/core/expressionmatrix_model.cpp b/src/core/expressionmatrix_model.cpp
new file mode 100644
index 0000000..2a5ddee
--- /dev/null
+++ b/src/core/expressionmatrix_model.cpp
@@ -0,0 +1,148 @@
+#include "expressionmatrix_model.h"
+//
+
+
+
+
+
+
+/*!
+ * Construct a table model for an expression matrix.
+ *
+ * @param matrix
+ */
+ExpressionMatrix::Model::Model(ExpressionMatrix* matrix):
+   _matrix(matrix)
+{
+   EDEBUG_FUNC(this,matrix);
+
+   setParent(matrix);
+}
+
+
+
+
+
+
+/*!
+ * Return a header name for the table model using a given index and
+ * orientation (row / column).
+ *
+ * @param section
+ * @param orientation
+ * @param role
+ */
+QVariant ExpressionMatrix::Model::headerData(int section, Qt::Orientation orientation, int role) const
+{
+   EDEBUG_FUNC(this,section,orientation,role);
+
+   // make sure the role is valid
+   if ( role != Qt::DisplayRole )
+   {
+      return QVariant();
+   }
+
+   // determine whether to return a row name or column name
+   switch (orientation)
+   {
+   case Qt::Vertical:
+   {
+      // get gene names
+      EMetaArray geneNames {_matrix->geneNames()};
+
+      // make sure the index is valid
+      if ( section >= 0 && section < geneNames.size() )
+      {
+         // return the specified row name
+         return geneNames.at(section).toString();
+      }
+
+      // otherwise return an invalid (empty) variant
+      return QVariant();
+   }
+   case Qt::Horizontal:
+   {
+      // get sample names
+      EMetaArray samples {_matrix->sampleNames()};
+
+      // make sure the index is valid
+      if ( section >= 0 && section < samples.size() )
+      {
+         // return the specified column name
+         return samples.at(section).toString();
+      }
+
+      // otherwise return an invalid (empty) variant
+      return QVariant();
+   }
+   default:
+      // return an invalid (empty) variant if orientation is not valid
+      return QVariant();
+   }
+}
+
+
+
+
+
+
+/*!
+ * Return the number of rows in the table model.
+ */
+int ExpressionMatrix::Model::rowCount(const QModelIndex&) const
+{
+   EDEBUG_FUNC(this);
+
+   return _matrix->_geneSize;
+}
+
+
+
+
+
+
+/*!
+ * Return the number of columns in the table model.
+ */
+int ExpressionMatrix::Model::columnCount(const QModelIndex&) const
+{
+   EDEBUG_FUNC(this);
+
+   return _matrix->_sampleSize;
+}
+
+
+
+
+
+
+/*!
+ * Return a data element in the table model using the given index.
+ *
+ * @param index
+ * @param role
+ */
+QVariant ExpressionMatrix::Model::data(const QModelIndex& index, int role) const
+{
+   EDEBUG_FUNC(this,&index,role);
+
+   // make sure the index and role are valid
+   if ( !index.isValid() || role != Qt::DisplayRole )
+   {
+      return QVariant();
+   }
+
+   // make sure the index is within the bounds of the expression matrix
+   if ( index.row() >= _matrix->_geneSize || index.column() >= _matrix->_sampleSize )
+   {
+      return QVariant();
+   }
+
+   // get the specified value from the expression matrix
+   float value;
+   _matrix->seekExpression(index.row(),index.column());
+   _matrix->stream() >> value;
+
+   // return the specified value
+   return value;
+}
diff --git a/src/core/expressionmatrix_model.h b/src/core/expressionmatrix_model.h
new file mode 100644
index 0000000..a5b29e3
--- /dev/null
+++ b/src/core/expressionmatrix_model.h
@@ -0,0 +1,29 @@
+#ifndef EXPRESSIONMATRIX_MODEL_H
+#define EXPRESSIONMATRIX_MODEL_H
+#include "expressionmatrix.h"
+//
+
+
+
+/*!
+ * This class implements the Qt table model for the expression matrix
+ * data object, which represents the expression matrix as a table.
+ */
+class ExpressionMatrix::Model : public QAbstractTableModel
+{
+public:
+   Model(ExpressionMatrix* matrix);
+   virtual QVariant headerData(int section, Qt::Orientation orientation, int role) const override final;
+   virtual int rowCount(const QModelIndex& parent) const override final;
+   virtual int columnCount(const QModelIndex& parent) const override final;
+   virtual QVariant data(const QModelIndex& index, int role) const override final;
+private:
+   /*!
+    * Pointer to the data object for this table model.
+    */
+   ExpressionMatrix* _matrix;
+};
+
+
+
+#endif
diff --git a/src/core/extract.cpp b/src/core/extract.cpp
index c62fe2f..50684f8 100644
--- a/src/core/extract.cpp
+++ b/src/core/extract.cpp
@@ -1,6 +1,7 @@
 #include "extract.h"
 #include "extract_input.h"
 #include "datafactory.h"
+#include "expressionmatrix_gene.h"
 
 
 
@@ -11,9 +12,16 @@ using namespace std;
 
 
 
+/*!
+ * Return the total number of blocks this analytic must process as steps
+ * or blocks of work. This implementation uses a work block for writing
+ * each pair to the output file.
+ */
 int Extract::size() const
 {
-   return 1;
+   EDEBUG_FUNC(this);
+
+   return _cmx->size();
 }
 
 
@@ -21,251 +29,290 @@ int Extract::size() const
 
 
 
+/*!
+ * Process the given index with a possible block of results if this analytic
+ * produces work blocks. This implementation uses only the index of the result
+ * block to determine which piece of work to do.
+ *
+ * @param result
+ */
 void Extract::process(const EAbstractAnalytic::Block* result)
 {
-   Q_UNUSED(result);
+   EDEBUG_FUNC(this,result);
+
+   // write pair according to the output format
+   switch ( _outputFormat )
+   {
+   case OutputFormat::Text:
+      writeTextFormat(result->index());
+      break;
+   case OutputFormat::GraphML:
+      writeGraphMLFormat(result->index());
+      break;
+   }
+}
+
+
+
+
 
-   // initialize pair iterators
-   CorrelationMatrix::Pair cmxPair(_cmx);
-   CCMatrix::Pair ccmPair(_ccm);
+
+/*!
+ * Write the next pair using the text format.
+ *
+ * @param index
+ */
+void Extract::writeTextFormat(int index)
+{
+   EDEBUG_FUNC(this,index);
 
    // get gene names
-   EMetaArray geneNames {_cmx->geneNames().toArray()};
+   EMetaArray geneNames {_cmx->geneNames()};
 
    // initialize workspace
    QString sampleMask(_ccm->sampleSize(), '0');
 
-   // create text stream to output file and write until end reached
-   QTextStream stream(_output);
-   stream.setRealNumberPrecision(6);
-
    // write header to file
-   stream
-      << "Source"
-      << "\t" << "Target"
-      << "\t" << "sc"
-      << "\t" << "Interaction"
-      << "\t" << "Cluster"
-      << "\t" << "Num_Clusters"
-      << "\t" << "Cluster_Samples"
-      << "\t" << "Missing_Samples"
-      << "\t" << "Cluster_Outliers"
-      << "\t" << "Pair_Outliers"
-      << "\t" << "Too_Low"
-      << "\t" << "Samples"
-      << "\n";
-
-   // increment through all gene pairs
-   while ( cmxPair.hasNext() )
+   if ( index == 0 )
    {
-      // read next gene pair
-      cmxPair.readNext();
+      _stream
+         << "Source"
+         << "\t" << "Target"
+         << "\t" << "sc"
+         << "\t" << "Interaction"
+         << "\t" << "Cluster"
+         << "\t" << "Num_Clusters"
+         << "\t" << "Cluster_Samples"
+         << "\t" << "Missing_Samples"
+         << "\t" << "Cluster_Outliers"
+         << "\t" << "Pair_Outliers"
+         << "\t" << "Too_Low"
+         << "\t" << "Samples"
+         << "\n";
+   }
 
-      if ( cmxPair.clusterSize() > 1 )
+   // read next pair
+   _cmxPair.readNext();
+   _ccmPair.read(_cmxPair.index());
+
+   // write pairwise data to output file
+   for ( int k = 0; k < _cmxPair.clusterSize(); k++ )
+   {
+      QString source {geneNames.at(_cmxPair.index().getX()).toString()};
+      QString target {geneNames.at(_cmxPair.index().getY()).toString()};
+      float correlation {_cmxPair.at(k, 0)};
+      QString interaction {"co"};
+      int numSamples {0};
+      int numMissing {0};
+      int numPostOutliers {0};
+      int numPreOutliers {0};
+      int numThreshold {0};
+
+      // exclude cluster if correlation is not within thresholds
+      if ( fabs(correlation) < _minCorrelation || _maxCorrelation < fabs(correlation) )
       {
-         ccmPair.read(cmxPair.index());
+         continue;
       }
 
-      // write gene pair data to output file
-      for ( int k = 0; k < cmxPair.clusterSize(); k++ )
+      // if cluster data exists then use it
+      if ( _ccmPair.clusterSize() > 0 )
       {
-         auto& source {geneNames.at(cmxPair.index().getX()).toString()};
-         auto& target {geneNames.at(cmxPair.index().getY()).toString()};
-         float correlation {cmxPair.at(k, 0)};
-         QString interaction {"co"};
-         int numSamples {0};
-         int numMissing {0};
-         int numPostOutliers {0};
-         int numPreOutliers {0};
-         int numThreshold {0};
-
-         // exclude cluster if correlation is not within thresholds
-         if ( fabs(correlation) < _minCorrelation || _maxCorrelation < fabs(correlation) )
-         {
-            continue;
-         }
-
-         // if there are multiple clusters then use cluster data
-         if ( cmxPair.clusterSize() > 1 )
+         // compute summary statistics
+         for ( int i = 0; i < _ccm->sampleSize(); i++ )
          {
-            // compute summary statistics
-            for ( int i = 0; i < _ccm->sampleSize(); i++ )
+            switch ( _ccmPair.at(k, i) )
             {
-               switch ( ccmPair.at(k, i) )
-               {
-               case 1:
-                  numSamples++;
-                  break;
-               case 6:
-                  numThreshold++;
-                  break;
-               case 7:
-                  numPreOutliers++;
-                  break;
-               case 8:
-                  numPostOutliers++;
-                  break;
-               case 9:
-                  numMissing++;
-                  break;
-               }
-            }
-
-            // write sample mask to string
-            for ( int i = 0; i < _ccm->sampleSize(); i++ )
-            {
-               sampleMask[i] = '0' + ccmPair.at(k, i);
+            case 1:
+               numSamples++;
+               break;
+            case 6:
+               numThreshold++;
+               break;
+            case 7:
+               numPreOutliers++;
+               break;
+            case 8:
+               numPostOutliers++;
+               break;
+            case 9:
+               numMissing++;
+               break;
             }
          }
 
-         // otherwise use expression data
-         else
+         // write sample mask to string
+         for ( int i = 0; i < _ccm->sampleSize(); i++ )
          {
-            // read in gene expressions
-            ExpressionMatrix::Gene gene1(_emx);
-            ExpressionMatrix::Gene gene2(_emx);
+            sampleMask[i] = '0' + _ccmPair.at(k, i);
+         }
+      }
+
+      // otherwise use expression data
+      else
+      {
+         // read in gene expressions
+         ExpressionMatrix::Gene gene1(_emx);
+         ExpressionMatrix::Gene gene2(_emx);
 
-            gene1.read(cmxPair.index().getX());
-            gene2.read(cmxPair.index().getY());
+         gene1.read(_cmxPair.index().getX());
+         gene2.read(_cmxPair.index().getY());
 
-            // determine sample mask from expression data
-            for ( int i = 0; i < _emx->getSampleSize(); ++i )
+         // determine sample mask and summary statistics from expression data
+         for ( int i = 0; i < _emx->sampleSize(); ++i )
+         {
+            if ( isnan(gene1.at(i)) || isnan(gene2.at(i)) )
             {
-               if ( isnan(gene1.at(i)) || isnan(gene2.at(i)) )
-               {
-                  sampleMask[i] = '9';
-               }
-               else
-               {
-                  sampleMask[i] = '1';
-               }
+               sampleMask[i] = '9';
+               numMissing++;
+            }
+            else
+            {
+               sampleMask[i] = '1';
+               numSamples++;
             }
          }
-
-         // write cluster to output file
-         stream
-            << source
-            << "\t" << target
-            << "\t" << correlation
-            << "\t" << interaction
-            << "\t" << k
-            << "\t" << cmxPair.clusterSize()
-            << "\t" << numSamples
-            << "\t" << numMissing
-            << "\t" << numPostOutliers
-            << "\t" << numPreOutliers
-            << "\t" << numThreshold
-            << "\t" << sampleMask
-            << "\n";
       }
+
+      // write cluster to output file
+      _stream
+         << source
+         << "\t" << target
+         << "\t" << correlation
+         << "\t" << interaction
+         << "\t" << k
+         << "\t" << _cmxPair.clusterSize()
+         << "\t" << numSamples
+         << "\t" << numMissing
+         << "\t" << numPostOutliers
+         << "\t" << numPreOutliers
+         << "\t" << numThreshold
+         << "\t" << sampleMask
+         << "\n";
    }
 
    // make sure writing output file worked
-   if ( stream.status() != QTextStream::Ok )
+   if ( _stream.status() != QTextStream::Ok )
    {
       E_MAKE_EXCEPTION(e);
       e.setTitle(tr("File IO Error"));
       e.setDetails(tr("Qt Text Stream encountered an unknown error."));
       throw e;
    }
+}
 
-   // reset gene pair iterator
-   cmxPair.reset();
 
-   // create text stream to graphml file and write until end reached
-   stream.setDevice(_graphml);
 
-   // write header to file
-   stream
-      << "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
-      << "<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\"\n"
-      << "    xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n"
-      << "    xsi:schemaLocation=\"http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd\">\n"
-      << "  <graph id=\"G\" edgedefault=\"undirected\">\n";
-
-   // write each node to file
-   for ( int i = 0; i < _cmx->geneSize(); i++ )
+
+
+
+/*!
+ * Write the next pair using the GraphML format.
+ *
+ * @param index
+ */
+void Extract::writeGraphMLFormat(int index)
+{
+   EDEBUG_FUNC(this,index);
+
+   // get gene names
+   EMetaArray geneNames {_cmx->geneNames()};
+
+   // initialize workspace
+   QString sampleMask(_ccm->sampleSize(), '0');
+
+   if ( index == 0 )
    {
-      auto& id {geneNames.at(i).toString()};
+      // write header to file
+      _stream
+         << "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
+         << "<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\"\n"
+         << "    xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n"
+         << "    xsi:schemaLocation=\"http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd\">\n"
+         << "  <graph id=\"G\" edgedefault=\"undirected\">\n";
+
+      // write node list to file
+      for ( int i = 0; i < _cmx->geneSize(); i++ )
+      {
+         QString id {geneNames.at(i).toString()};
 
-      stream << "    <node id=\"" << id << "\"/>\n";
+         _stream << "    <node id=\"" << id << "\"/>\n";
+      }
+   }
+
+   // read next pair
+   _cmxPair.readNext();
+
+   if ( _cmxPair.clusterSize() > 1 )
+   {
+      _ccmPair.read(_cmxPair.index());
    }
 
-   // increment through all gene pairs
-   while ( cmxPair.hasNext() )
+   // write pairwise data to output file
+   for ( int k = 0; k < _cmxPair.clusterSize(); k++ )
    {
-      // read next gene pair
-      cmxPair.readNext();
+      QString source {geneNames.at(_cmxPair.index().getX()).toString()};
+      QString target {geneNames.at(_cmxPair.index().getY()).toString()};
+      float correlation {_cmxPair.at(k, 0)};
 
-      if ( cmxPair.clusterSize() > 1 )
+      // exclude edge if correlation is not within thresholds
+      if ( fabs(correlation) < _minCorrelation || _maxCorrelation < fabs(correlation) )
       {
-         ccmPair.read(cmxPair.index());
+         continue;
       }
 
-      // write gene pair edges to file
-      for ( int k = 0; k < cmxPair.clusterSize(); k++ )
+      // if there are multiple clusters then use cluster data
+      if ( _cmxPair.clusterSize() > 1 )
       {
-         auto& source {geneNames.at(cmxPair.index().getX()).toString()};
-         auto& target {geneNames.at(cmxPair.index().getY()).toString()};
-         float correlation {cmxPair.at(k, 0)};
-
-         // exclude edge if correlation is not within thresholds
-         if ( fabs(correlation) < _minCorrelation || _maxCorrelation < fabs(correlation) )
+         // write sample mask to string
+         for ( int i = 0; i < _ccm->sampleSize(); i++ )
          {
-            continue;
+            sampleMask[i] = '0' + _ccmPair.at(k, i);
          }
+      }
 
-         // if there are multiple clusters then use cluster data
-         if ( cmxPair.clusterSize() > 1 )
+      // otherwise use expression data
+      else
+      {
+         // read in gene expressions
+         ExpressionMatrix::Gene gene1(_emx);
+         ExpressionMatrix::Gene gene2(_emx);
+
+         gene1.read(_cmxPair.index().getX());
+         gene2.read(_cmxPair.index().getY());
+
+         // determine sample mask from expression data
+         for ( int i = 0; i < _emx->sampleSize(); ++i )
          {
-            // write sample mask to string
-            for ( int i = 0; i < _ccm->sampleSize(); i++ )
+            if ( isnan(gene1.at(i)) || isnan(gene2.at(i)) )
             {
-               sampleMask[i] = '0' + ccmPair.at(k, i);
+               sampleMask[i] = '9';
             }
-         }
-
-         // otherwise use expression data
-         else
-         {
-            // read in gene expressions
-            ExpressionMatrix::Gene gene1(_emx);
-            ExpressionMatrix::Gene gene2(_emx);
-
-            gene1.read(cmxPair.index().getX());
-            gene2.read(cmxPair.index().getY());
-
-            // determine sample mask from expression data
-            for ( int i = 0; i < _emx->getSampleSize(); ++i )
+            else
             {
-               if ( isnan(gene1.at(i)) || isnan(gene2.at(i)) )
-               {
-                  sampleMask[i] = '9';
-               }
-               else
-               {
-                  sampleMask[i] = '1';
-               }
+               sampleMask[i] = '1';
             }
          }
-
-         // write edge to file
-         stream
-            << "    <edge"
-            << " source=\"" << source << "\""
-            << " target=\"" << target << "\""
-            << " samples=\"" << sampleMask << "\""
-            << "/>\n";
       }
+
+      // write edge to file
+      _stream
+         << "    <edge"
+         << " source=\"" << source << "\""
+         << " target=\"" << target << "\""
+         << " samples=\"" << sampleMask << "\""
+         << "/>\n";
    }
 
    // write footer to file
-   stream
-      << "  </graph>\n"
-      << "</graphml>\n";
+   if ( index == size() - 1 )
+   {
+      _stream
+         << "  </graph>\n"
+         << "</graphml>\n";
+   }
 
-   // make sure writing graphml file worked
-   if ( stream.status() != QTextStream::Ok )
+   // make sure writing output file worked
+   if ( _stream.status() != QTextStream::Ok )
    {
       E_MAKE_EXCEPTION(e);
       e.setTitle(tr("File IO Error"));
@@ -279,8 +326,13 @@ void Extract::process(const EAbstractAnalytic::Block* result)
 
 
 
+/*!
+ * Make a new input object and return its pointer.
+ */
 EAbstractAnalytic::Input* Extract::makeInput()
 {
+   EDEBUG_FUNC(this);
+
    return new Input(this);
 }
 
@@ -289,8 +341,15 @@ EAbstractAnalytic::Input* Extract::makeInput()
 
 
 
+/*!
+ * Initialize this analytic. This implementation checks to make sure the input
+ * data objects and output file have been set.
+ */
 void Extract::initialize()
 {
+   EDEBUG_FUNC(this);
+
+   // make sure input/output arguments are valid
    if ( !_emx || !_ccm || !_cmx || !_output )
    {
       E_MAKE_EXCEPTION(e);
@@ -298,4 +357,12 @@ void Extract::initialize()
       e.setDetails(tr("Did not get valid input and/or output arguments."));
       throw e;
    }
+
+   // initialize pairwise iterators
+   _ccmPair = CCMatrix::Pair(_ccm);
+   _cmxPair = CorrelationMatrix::Pair(_cmx);
+
+   // initialize output file stream
+   _stream.setDevice(_output);
+   _stream.setRealNumberPrecision(8);
 }
diff --git a/src/core/extract.h b/src/core/extract.h
index d36e0b8..ed3f7e9 100644
--- a/src/core/extract.h
+++ b/src/core/extract.h
@@ -2,12 +2,23 @@
 #define EXTRACT_H
 #include <ace/core/core.h>
 
+#include "ccmatrix_pair.h"
 #include "ccmatrix.h"
+#include "correlationmatrix_pair.h"
 #include "correlationmatrix.h"
 #include "expressionmatrix.h"
 
 
 
+/*!
+ * This class implements the extract analytic. This analytic is very similar to
+ * the export correlation matrix analytic, except for a few differences: (1) this
+ * analytic uses a slightly different format for the text file, (2) this analytic
+ * can apply a correlation threshold, and (3) this analytic can optionally write
+ * a GraphML file. The key difference is that this analytic "extracts" a network
+ * from the correlation matrix and writes an edge list rather than a correlation
+ * list.
+ */
 class Extract : public EAbstractAnalytic
 {
    Q_OBJECT
@@ -18,12 +29,55 @@ class Extract : public EAbstractAnalytic
    virtual EAbstractAnalytic::Input* makeInput() override final;
    virtual void initialize();
 private:
+   /*!
+    * Defines the output formats this analytic supports.
+    */
+   enum class OutputFormat
+   {
+      /*!
+       * Text format
+       */
+      Text
+      /*!
+       * GraphML format
+       */
+      ,GraphML
+   };
+   void writeTextFormat(int index);
+   void writeGraphMLFormat(int index);
+   /*!
+    * Workspace variables used to write to the output file.
+    */
+   QTextStream _stream;
+   CCMatrix::Pair _ccmPair;
+   CorrelationMatrix::Pair _cmxPair;
+   /*!
+    * Pointer to the input expression matrix.
+    */
    ExpressionMatrix* _emx {nullptr};
+   /*!
+    * Pointer to the input cluster matrix.
+    */
    CCMatrix* _ccm {nullptr};
+   /*!
+    * Pointer to the input correlation matrix.
+    */
    CorrelationMatrix* _cmx {nullptr};
+   /*!
+    * The output format to use.
+    */
+   OutputFormat _outputFormat {OutputFormat::Text};
+   /*!
+    * Pointer to the output text file.
+    */
    QFile* _output {nullptr};
-   QFile* _graphml {nullptr};
+   /*!
+    * The minimum (absolute) correlation threshold.
+    */
    float _minCorrelation {0.85};
+   /*!
+    * The maximum (absolute) correlation threshold.
+    */
    float _maxCorrelation {1.00};
 };
 
diff --git a/src/core/extract_input.cpp b/src/core/extract_input.cpp
index 8be4809..5d264de 100644
--- a/src/core/extract_input.cpp
+++ b/src/core/extract_input.cpp
@@ -3,25 +3,46 @@
 
 
 
-using namespace std;
+/*!
+ * String list of output formats for this analytic that correspond exactly
+ * to its enumeration. Used for handling the output format argument for this
+ * input object.
+ */
+const QStringList Extract::Input::FORMAT_NAMES
+{
+   "text"
+   ,"graphml"
+};
 
 
 
 
 
 
+/*!
+ * Construct a new input object with the given analytic as its parent.
+ *
+ * @param parent
+ */
 Extract::Input::Input(Extract* parent):
    EAbstractAnalytic::Input(parent),
    _base(parent)
-{}
+{
+   EDEBUG_FUNC(this,parent);
+}
 
 
 
 
 
 
+/*!
+ * Return the total number of arguments this analytic type contains.
+ */
 int Extract::Input::size() const
 {
+   EDEBUG_FUNC(this);
+
    return Total;
 }
 
@@ -30,15 +51,22 @@ int Extract::Input::size() const
 
 
 
+/*!
+ * Return the argument type for a given index.
+ *
+ * @param index
+ */
 EAbstractAnalytic::Input::Type Extract::Input::type(int index) const
 {
+   EDEBUG_FUNC(this,index);
+
    switch (index)
    {
    case ExpressionData: return Type::DataIn;
    case ClusterData: return Type::DataIn;
    case CorrelationData: return Type::DataIn;
+   case OutputFormatArg: return Type::Selection;
    case OutputFile: return Type::FileOut;
-   case GraphMLFile: return Type::FileOut;
    case MinCorrelation: return Type::Double;
    case MaxCorrelation: return Type::Double;
    default: return Type::Boolean;
@@ -50,8 +78,16 @@ EAbstractAnalytic::Input::Type Extract::Input::type(int index) const
 
 
 
+/*!
+ * Return data for a given role on an argument with the given index.
+ *
+ * @param index
+ * @param role
+ */
 QVariant Extract::Input::data(int index, Role role) const
 {
+   EDEBUG_FUNC(this,index,role);
+
    switch (index)
    {
    case ExpressionData:
@@ -81,22 +117,22 @@ QVariant Extract::Input::data(int index, Role role) const
       case Role::DataType: return DataFactory::CorrelationMatrixType;
       default: return QVariant();
       }
-   case OutputFile:
+   case OutputFormatArg:
       switch (role)
       {
-      case Role::CommandLineName: return QString("output");
-      case Role::Title: return tr("Output File:");
-      case Role::WhatsThis: return tr("Output text file that will contain network edges.");
-      case Role::FileFilters: return tr("Text file %1").arg("(*.txt)");
+      case Role::CommandLineName: return QString("format");
+      case Role::Title: return tr("Output Format:");
+      case Role::WhatsThis: return tr("Format to use for the output file.");
+      case Role::SelectionValues: return FORMAT_NAMES;
+      case Role::Default: return "text";
       default: return QVariant();
       }
-   case GraphMLFile:
+   case OutputFile:
       switch (role)
       {
-      case Role::CommandLineName: return QString("graphml");
-      case Role::Title: return tr("GraphML File:");
-      case Role::WhatsThis: return tr("Output text file that will contain network in GraphML format.");
-      case Role::FileFilters: return tr("GraphML file %1").arg("(*.graphml)");
+      case Role::CommandLineName: return QString("output");
+      case Role::Title: return tr("Output File:");
+      case Role::WhatsThis: return tr("Output file that will contain network in the specified format.");
       default: return QVariant();
       }
    case MinCorrelation:
@@ -130,10 +166,21 @@ QVariant Extract::Input::data(int index, Role role) const
 
 
 
+/*!
+ * Set an argument with the given index to the given value.
+ *
+ * @param index
+ * @param value
+ */
 void Extract::Input::set(int index, const QVariant& value)
 {
+   EDEBUG_FUNC(this,index,&value);
+
    switch (index)
    {
+   case OutputFormatArg:
+      _base->_outputFormat = static_cast<OutputFormat>(FORMAT_NAMES.indexOf(value.toString()));
+      break;
    case MinCorrelation:
       _base->_minCorrelation = value.toFloat();
       break;
@@ -148,8 +195,16 @@ void Extract::Input::set(int index, const QVariant& value)
 
 
 
+/*!
+ * Set a data argument with the given index to the given data object pointer.
+ *
+ * @param index
+ * @param data
+ */
 void Extract::Input::set(int index, EAbstractData* data)
 {
+   EDEBUG_FUNC(this,index,data);
+
    if ( index == ExpressionData )
    {
       _base->_emx = data->cast<ExpressionMatrix>();
@@ -169,14 +224,18 @@ void Extract::Input::set(int index, EAbstractData* data)
 
 
 
+/*!
+ * Set a file argument with the given index to the given Qt file pointer.
+ *
+ * @param index
+ * @param file
+ */
 void Extract::Input::set(int index, QFile* file)
 {
+   EDEBUG_FUNC(this,index,file);
+
    if ( index == OutputFile )
    {
       _base->_output = file;
    }
-   else if ( index == GraphMLFile )
-   {
-      _base->_graphml = file;
-   }
 }
diff --git a/src/core/extract_input.h b/src/core/extract_input.h
index 45803cd..eefaf0f 100644
--- a/src/core/extract_input.h
+++ b/src/core/extract_input.h
@@ -4,17 +4,23 @@
 
 
 
+/*!
+ * This class implements the abstract input of the extract analytic.
+ */
 class Extract::Input : public EAbstractAnalytic::Input
 {
    Q_OBJECT
 public:
+   /*!
+    * Defines all input arguments for this analytic.
+    */
    enum Argument
    {
       ExpressionData = 0
       ,ClusterData
       ,CorrelationData
+      ,OutputFormatArg
       ,OutputFile
-      ,GraphMLFile
       ,MinCorrelation
       ,MaxCorrelation
       ,Total
@@ -27,6 +33,10 @@ class Extract::Input : public EAbstractAnalytic::Input
    virtual void set(int index, EAbstractData* data) override final;
    virtual void set(int index, QFile* file) override final;
 private:
+   static const QStringList FORMAT_NAMES;
+   /*!
+    * Pointer to the base analytic for this object.
+    */
    Extract* _base;
 };
 
diff --git a/src/core/importcorrelationmatrix.cpp b/src/core/importcorrelationmatrix.cpp
index d905090..e437a88 100644
--- a/src/core/importcorrelationmatrix.cpp
+++ b/src/core/importcorrelationmatrix.cpp
@@ -1,16 +1,22 @@
 #include "importcorrelationmatrix.h"
 #include "importcorrelationmatrix_input.h"
 #include "datafactory.h"
-#include "pairwise_index.h"
 
 
 
 
 
 
+/*!
+ * Return the total number of blocks this analytic must process as steps
+ * or blocks of work. This implementation uses a work block for each line
+ * of the input file.
+ */
 int ImportCorrelationMatrix::size() const
 {
-   return 1;
+   EDEBUG_FUNC(this);
+
+   return _numLines;
 }
 
 
@@ -19,132 +25,106 @@ int ImportCorrelationMatrix::size() const
 
 
 
+/*!
+ * Process the given index with a possible block of results if this analytic
+ * produces work blocks. This implementation uses only the index of the result
+ * block to determine which piece of work to do.
+ *
+ * @param result
+ */
 void ImportCorrelationMatrix::process(const EAbstractAnalytic::Block* result)
 {
-   Q_UNUSED(result);
+   EDEBUG_FUNC(this,result);
 
-   // build gene name metadata
-   EMetaArray metaGeneNames;
-   for ( int i = 0; i < _geneSize; ++i )
-   {
-      metaGeneNames.append(QString::number(i));
-   }
+   // read a line from input file
+   QString line = _stream.readLine();
+   auto words = line.splitRef(QRegExp("\\s+"), QString::SkipEmptyParts);
 
-   // build sample name metadata
-   EMetaArray metaSampleNames;
-   for ( int i = 0; i < _sampleSize; ++i )
+   // make sure the line is valid
+   if ( words.size() == 11 )
    {
-      metaSampleNames.append(QString::number(i));
-   }
-
-   // build correlation name metadata
-   EMetaArray metaCorrelationNames;
-   metaCorrelationNames.append(_correlationName);
+      int geneX = words[0].toInt();
+      int geneY = words[1].toInt();
+      float correlation = words[9].toFloat();
+      QStringRef sampleMask = words[10];
 
-   // initialize output data
-   _ccm->initialize(metaGeneNames, _maxClusterSize, metaSampleNames);
-   _cmx->initialize(metaGeneNames, _maxClusterSize, metaCorrelationNames);
-
-   Pairwise::Index index;
-   CCMatrix::Pair ccmPair(_ccm);
-   CorrelationMatrix::Pair cmxPair(_cmx);
+      // make sure sample mask has correct length
+      if ( sampleMask.size() != _sampleSize )
+      {
+         E_MAKE_EXCEPTION(e);
+         e.setTitle(tr("Parsing Error"));
+         e.setDetails(tr("Encountered sample mask with invalid length %1. Sample size is %2.")
+            .arg(sampleMask.size())
+            .arg(_sampleSize));
+         throw e;
+      }
 
-   // create text stream from input file and read until end reached
-   QTextStream stream(_input);
-   while ( !stream.atEnd() )
-   {
-      // read a line from text file
-      QString line = stream.readLine();
-      auto words = line.splitRef(QRegExp("\\s+"), QString::SkipEmptyParts);
+      // save previous pair when new pair is read
+      Pairwise::Index nextIndex(geneX, geneY);
 
-      // make sure the line is valid
-      if ( words.size() == 11 )
+      if ( _index != nextIndex )
       {
-         int geneX = words[0].toInt();
-         int geneY = words[1].toInt();
-         float correlation = words[9].toFloat();
-         QStringRef sampleMask = words[10];
-
-         // make sure sample mask has correct length
-         if ( sampleMask.size() != _sampleSize )
+         // save pairs
+         if ( _ccmPair.clusterSize() > 1 )
          {
-            E_MAKE_EXCEPTION(e);
-            e.setTitle(tr("Parsing Error"));
-            e.setDetails(tr("Encountered sample mask with invalid length %1. "
-                            "Sample size is %2.")
-                         .arg(sampleMask.size()).arg(_sampleSize));
-            throw e;
+            _ccmPair.write(_index);
          }
 
-         // save previous pair when new pair is read
-         Pairwise::Index nextIndex(geneX, geneY);
-
-         if ( index != nextIndex )
+         if ( _cmxPair.clusterSize() > 0 )
          {
-            // save pairs
-            if ( ccmPair.clusterSize() > 1 )
-            {
-               ccmPair.write(index);
-            }
-
-            if ( cmxPair.clusterSize() > 0 )
-            {
-               cmxPair.write(index);
-            }
-
-            // reset pairs
-            ccmPair.clearClusters();
-            cmxPair.clearClusters();
-
-            // update index
-            index = nextIndex;
+            _cmxPair.write(_index);
          }
 
-         // append data to ccm pair and cmx pair
-         int cluster = ccmPair.clusterSize();
+         // reset pairs
+         _ccmPair.clearClusters();
+         _cmxPair.clearClusters();
 
-         ccmPair.addCluster();
-         cmxPair.addCluster();
+         // update index
+         _index = nextIndex;
+      }
 
-         for ( int i = 0; i < sampleMask.size(); ++i )
-         {
-            ccmPair.at(cluster, i) = sampleMask[i].digitValue();
-         }
+      // append data to ccm pair and cmx pair
+      int cluster = _ccmPair.clusterSize();
 
-         cmxPair.at(cluster, 0) = correlation;
-      }
+      _ccmPair.addCluster();
+      _cmxPair.addCluster();
 
-      // save last pair
-      if ( ccmPair.clusterSize() > 1 )
+      for ( int i = 0; i < sampleMask.size(); ++i )
       {
-         ccmPair.write(index);
+         _ccmPair.at(cluster, i) = sampleMask[i].digitValue();
       }
 
-      if ( cmxPair.clusterSize() > 0 )
-      {
-         cmxPair.write(index);
-      }
+      _cmxPair.at(cluster, 0) = correlation;
+   }
+
+   // otherwise throw an error
+   else
+   {
+      E_MAKE_EXCEPTION(e);
+      e.setTitle(tr("Parsing Error"));
+      e.setDetails(tr("Encountered line with incorrect amount of fields. "
+                      "Read %1 fields when there should have been %2.")
+         .arg(words.size())
+         .arg(11));
+      throw e;
+   }
 
-      // skip empty lines and lines with '#' markers
-      else if ( words.size() != 1 && words.size() != 0 )
+   // save last pair
+   if ( result->index() == _numLines - 1 )
+   {
+      if ( _ccmPair.clusterSize() > 1 )
       {
-         continue;
+         _ccmPair.write(_index);
       }
 
-      // otherwise throw an error
-      else
+      if ( _cmxPair.clusterSize() > 0 )
       {
-         E_MAKE_EXCEPTION(e);
-         e.setTitle(tr("Parsing Error"));
-         e.setDetails(tr("Encountered line with incorrect amount of fields. "
-                         "Read %1 fields when there should have been %2.")
-                      .arg(words.size()).arg(11));
-         throw e;
+         _cmxPair.write(_index);
       }
    }
 
    // make sure reading input file worked
-   if ( stream.status() != QTextStream::Ok )
+   if ( _stream.status() != QTextStream::Ok )
    {
       E_MAKE_EXCEPTION(e);
       e.setTitle(tr("File IO Error"));
@@ -158,8 +138,13 @@ void ImportCorrelationMatrix::process(const EAbstractAnalytic::Block* result)
 
 
 
+/*!
+ * Make a new input object and return its pointer.
+ */
 EAbstractAnalytic::Input* ImportCorrelationMatrix::makeInput()
 {
+   EDEBUG_FUNC(this);
+
    return new Input(this);
 }
 
@@ -168,8 +153,16 @@ EAbstractAnalytic::Input* ImportCorrelationMatrix::makeInput()
 
 
 
+/*!
+ * Initialize this analytic. This implementation validates the input file,
+ * output data objects, and correlation name, counts the lines of the input
+ * file, and initializes the output data objects and pairwise iterators.
+ */
 void ImportCorrelationMatrix::initialize()
 {
+   EDEBUG_FUNC(this);
+
+   // make sure input/output arguments are valid
    if ( !_input || !_ccm || !_cmx )
    {
       E_MAKE_EXCEPTION(e);
@@ -178,6 +171,7 @@ void ImportCorrelationMatrix::initialize()
       throw e;
    }
 
+   // make sure correlation name is valid
    if ( _correlationName.isEmpty() )
    {
       E_MAKE_EXCEPTION(e);
@@ -185,4 +179,54 @@ void ImportCorrelationMatrix::initialize()
       e.setDetails(tr("Correlation name is required."));
       throw e;
    }
+
+   // initialize input file stream
+   _stream.setDevice(_input);
+
+   // count the number of lines in the input file
+   _numLines = 0;
+
+   while ( !_stream.atEnd() )
+   {
+      _stream.readLine();
+      _numLines++;
+   }
+
+   // return stream to beginning of the input file
+   _stream.seek(0);
+
+   // make sure reading input file worked
+   if ( _stream.status() != QTextStream::Ok )
+   {
+      E_MAKE_EXCEPTION(e);
+      e.setTitle(tr("File IO Error"));
+      e.setDetails(tr("Qt Text Stream encountered an unknown error."));
+      throw e;
+   }
+
+   // build gene name metadata
+   EMetaArray metaGeneNames;
+   for ( int i = 0; i < _geneSize; ++i )
+   {
+      metaGeneNames.append(QString::number(i));
+   }
+
+   // build sample name metadata
+   EMetaArray metaSampleNames;
+   for ( int i = 0; i < _sampleSize; ++i )
+   {
+      metaSampleNames.append(QString::number(i));
+   }
+
+   // build correlation name metadata
+   EMetaArray metaCorrelationNames;
+   metaCorrelationNames.append(_correlationName);
+
+   // initialize output data
+   _ccm->initialize(metaGeneNames, _maxClusterSize, metaSampleNames);
+   _cmx->initialize(metaGeneNames, _maxClusterSize, metaCorrelationNames);
+
+   // initialize pairwise iterators
+   _ccmPair = CCMatrix::Pair(_ccm);
+   _cmxPair = CorrelationMatrix::Pair(_cmx);
 }
diff --git a/src/core/importcorrelationmatrix.h b/src/core/importcorrelationmatrix.h
index c62aaf2..271ac52 100644
--- a/src/core/importcorrelationmatrix.h
+++ b/src/core/importcorrelationmatrix.h
@@ -2,11 +2,24 @@
 #define IMPORTCORRELATIONMATRIX_H
 #include <ace/core/core.h>
 
+#include "ccmatrix_pair.h"
 #include "ccmatrix.h"
+#include "correlationmatrix_pair.h"
 #include "correlationmatrix.h"
 
 
 
+/*!
+ * This class implements the import correlation matrix analytic. This analytic
+ * reads in a text file of correlations, where each line is a correlation that
+ * includes the pairwise index, correlation value, and sample mask, as well as
+ * several other fields which are not used. This analytic produces two data
+ * objects: a correlation matrix containing the pairwise correlations, and a
+ * cluster matrix containing the sample masks for each pairwise cluster. There
+ * are several fields which are not represented in the text file and therefore
+ * must be specified manually, including the gene size, sample size, max cluster
+ * size, and correlation name.
+ */
 class ImportCorrelationMatrix : public EAbstractAnalytic
 {
    Q_OBJECT
@@ -17,12 +30,42 @@ class ImportCorrelationMatrix : public EAbstractAnalytic
    virtual EAbstractAnalytic::Input* makeInput() override final;
    virtual void initialize();
 private:
+   /*!
+    * Workspace variables to read from the input file.
+    */
+   QTextStream _stream;
+   int _numLines {0};
+   Pairwise::Index _index {0};
+   CCMatrix::Pair _ccmPair;
+   CorrelationMatrix::Pair _cmxPair;
+   /*!
+    * Pointer to the input text file.
+    */
    QFile* _input {nullptr};
+   /*!
+    * Pointer to the output cluster matrix.
+    */
    CCMatrix* _ccm {nullptr};
+   /*!
+    * Pointer to the output correlation matrix.
+    */
    CorrelationMatrix* _cmx {nullptr};
+   /*!
+    * The number of genes in the correlation matrix.
+    */
    qint32 _geneSize {0};
+   /*!
+    * The maximum number of clusters allowed in a single pair of the
+    * correlation matrix.
+    */
    qint32 _maxClusterSize {1};
+   /*!
+    * The number of samples in the sample masks of the cluster matrix.
+    */
    qint32 _sampleSize {0};
+   /*!
+    * The name of the correlation used in the correlation matrix.
+    */
    QString _correlationName;
 };
 
diff --git a/src/core/importcorrelationmatrix_input.cpp b/src/core/importcorrelationmatrix_input.cpp
index 1601bf4..985ab9e 100644
--- a/src/core/importcorrelationmatrix_input.cpp
+++ b/src/core/importcorrelationmatrix_input.cpp
@@ -4,18 +4,30 @@
 
 
 
+/*!
+ * Construct a new input object with the given analytic as its parent.
+ *
+ * @param parent
+ */
 ImportCorrelationMatrix::Input::Input(ImportCorrelationMatrix* parent):
    EAbstractAnalytic::Input(parent),
    _base(parent)
-{}
+{
+   EDEBUG_FUNC(this,parent);
+}
 
 
 
 
 
 
+/*!
+ * Return the total number of arguments this analytic type contains.
+ */
 int ImportCorrelationMatrix::Input::size() const
 {
+   EDEBUG_FUNC(this);
+
    return Total;
 }
 
@@ -24,8 +36,15 @@ int ImportCorrelationMatrix::Input::size() const
 
 
 
+/*!
+ * Return the argument type for a given index.
+ *
+ * @param index
+ */
 EAbstractAnalytic::Input::Type ImportCorrelationMatrix::Input::type(int index) const
 {
+   EDEBUG_FUNC(this,index);
+
    switch (index)
    {
    case InputFile: return Type::FileIn;
@@ -44,8 +63,16 @@ EAbstractAnalytic::Input::Type ImportCorrelationMatrix::Input::type(int index) c
 
 
 
+/*!
+ * Return data for a given role on an argument with the given index.
+ *
+ * @param index
+ * @param role
+ */
 QVariant ImportCorrelationMatrix::Input::data(int index, Role role) const
 {
+   EDEBUG_FUNC(this,index,role);
+
    switch (index)
    {
    case InputFile:
@@ -122,8 +149,16 @@ QVariant ImportCorrelationMatrix::Input::data(int index, Role role) const
 
 
 
+/*!
+ * Set an argument with the given index to the given value.
+ *
+ * @param index
+ * @param value
+ */
 void ImportCorrelationMatrix::Input::set(int index, const QVariant& value)
 {
+   EDEBUG_FUNC(this,index,&value);
+
    switch (index)
    {
    case GeneSize:
@@ -146,8 +181,16 @@ void ImportCorrelationMatrix::Input::set(int index, const QVariant& value)
 
 
 
+/*!
+ * Set a file argument with the given index to the given Qt file pointer.
+ *
+ * @param index
+ * @param file
+ */
 void ImportCorrelationMatrix::Input::set(int index, QFile* file)
 {
+   EDEBUG_FUNC(this,index,file);
+
    if ( index == InputFile )
    {
       _base->_input = file;
@@ -159,8 +202,16 @@ void ImportCorrelationMatrix::Input::set(int index, QFile* file)
 
 
 
+/*!
+ * Set a data argument with the given index to the given data object pointer.
+ *
+ * @param index
+ * @param data
+ */
 void ImportCorrelationMatrix::Input::set(int index, EAbstractData* data)
 {
+   EDEBUG_FUNC(this,index,data);
+
    if ( index == ClusterData )
    {
       _base->_ccm = data->cast<CCMatrix>();
diff --git a/src/core/importcorrelationmatrix_input.h b/src/core/importcorrelationmatrix_input.h
index cfd2ed3..0ae567d 100644
--- a/src/core/importcorrelationmatrix_input.h
+++ b/src/core/importcorrelationmatrix_input.h
@@ -4,10 +4,16 @@
 
 
 
+/*!
+ * This class implements the abstract input of the import correlation matrix analytic.
+ */
 class ImportCorrelationMatrix::Input : public EAbstractAnalytic::Input
 {
    Q_OBJECT
 public:
+   /*!
+    * Defines all input arguments for this analytic.
+    */
    enum Argument
    {
       InputFile = 0
@@ -27,6 +33,9 @@ class ImportCorrelationMatrix::Input : public EAbstractAnalytic::Input
    virtual void set(int index, QFile* file) override final;
    virtual void set(int index, EAbstractData* data) override final;
 private:
+   /*!
+    * Pointer to the base analytic for this object.
+    */
    ImportCorrelationMatrix* _base;
 };
 
diff --git a/src/core/importexpressionmatrix.cpp b/src/core/importexpressionmatrix.cpp
index 3c6d30c..3472ea2 100644
--- a/src/core/importexpressionmatrix.cpp
+++ b/src/core/importexpressionmatrix.cpp
@@ -1,15 +1,24 @@
 #include "importexpressionmatrix.h"
 #include "importexpressionmatrix_input.h"
 #include "datafactory.h"
+#include "expressionmatrix_gene.h"
 
 
 
 
 
 
+/*!
+ * Return the total number of blocks this analytic must process as steps
+ * or blocks of work. This implementation uses a work block for reading
+ * each line of the input file, plus one work block to create the output
+ * data object.
+ */
 int ImportExpressionMatrix::size() const
 {
-   return 1;
+   EDEBUG_FUNC(this);
+
+   return _numLines + 1;
 }
 
 
@@ -17,74 +26,77 @@ int ImportExpressionMatrix::size() const
 
 
 
+/*!
+ * Process the given index with a possible block of results if this analytic
+ * produces work blocks. This implementation uses only the index of the result
+ * block to determine which piece of work to do.
+ *
+ * @param result
+ */
 void ImportExpressionMatrix::process(const EAbstractAnalytic::Block* result)
 {
-   Q_UNUSED(result);
-
-   // use expression declaration
-   using Expression = ExpressionMatrix::Expression;
+   EDEBUG_FUNC(this, result);
 
-   // structure for building list of genes
-   struct Gene
+   // read or create the sample names in the first step
+   if ( result->index() == 0 )
    {
-      Gene(int size)
-      {
-         expressions = new Expression[size];
-      }
-      ~Gene()
+      // seek to the beginning of the input file
+      _stream.seek(0);
+
+      // if sample size is not zero then build sample name list
+      if ( _sampleSize != 0 )
       {
-         delete[] expressions;
+         for ( int i = 0; i < _sampleSize; ++i )
+         {
+            _sampleNames.append(QString::number(i));
+         }
       }
 
-      Expression* expressions;
-   };
+      // otherwise read sample names from first line
+      else
+      {
+         // read a line from the input file
+         QString line = _stream.readLine();
+         auto words = line.splitRef(QRegExp("\\s+"), QString::SkipEmptyParts);
 
-   // initialize gene expression linked list
-   QList<Gene *> genes;
+         // parse the sample names
+         _sampleSize = words.size();
 
-   // initialize gene and sample name lists
-   QStringList geneNames;
-   QStringList sampleNames;
+         for ( auto& word : words )
+         {
+            _sampleNames.append(word.toString());
+         }
 
-   // if sample size is not zero then build sample name list
-   if ( _sampleSize != 0 )
-   {
-      for (int i = 0; i < _sampleSize ;++i)
-      {
-         sampleNames.append(QString::number(i));
+         // make sure reading input file worked
+         if ( _stream.status() != QTextStream::Ok )
+         {
+            E_MAKE_EXCEPTION(e);
+            e.setTitle(tr("File IO Error"));
+            e.setDetails(tr("Qt Text Stream encountered an unknown error."));
+            throw e;
+         }
       }
    }
 
-   // create text stream from input file and read until end reached
-   QTextStream stream(_input);
-   while ( !stream.atEnd() )
+   // read each gene from the input file in a separate step
+   else if ( result->index() < _numLines )
    {
-      // read a line from text file
-      QString line = stream.readLine();
+      // read a line from the input file
+      QString line = _stream.readLine();
       auto words = line.splitRef(QRegExp("\\s+"), QString::SkipEmptyParts);
 
-      // read sample names from first line
-      if ( _sampleSize == 0 )
-      {
-         _sampleSize = words.size();
-         for ( auto& word : words )
-         {
-            sampleNames.append(word.toString());
-         }
-      }
-
       // make sure the number of words matches expected sample size
-      else if ( words.size() == _sampleSize + 1 )
+      if ( words.size() == _sampleSize + 1 )
       {
          // read row from text file into gene
-         Gene* gene {new Gene(_sampleSize)};
+         Gene gene(_sampleSize);
 
          for ( int i = 1; i < words.size(); ++i )
          {
-            // if word matches no sample token string set it as such
-            if ( words.at(i) == _noSampleToken )
+            // if word matches the nan token then set it as such
+            if ( words.at(i) == _nanToken )
             {
-               gene->expressions[i-1] = NAN;
+               gene.expressions[i-1] = NAN;
             }
 
             // else this is a normal floating point expression
@@ -92,7 +104,7 @@ void ImportExpressionMatrix::process(const EAbstractAnalytic::Block* result)
             {
                // read in the floating point value
                bool ok;
-               Expression value = words.at(i).toDouble(&ok);
+               gene.expressions[i-1] = words.at(i).toDouble(&ok);
 
                // make sure reading worked
                if ( !ok )
@@ -100,32 +112,16 @@ void ImportExpressionMatrix::process(const EAbstractAnalytic::Block* result)
                   E_MAKE_EXCEPTION(e);
                   e.setTitle(tr("Parsing Error"));
                   e.setDetails(tr("Failed to read expression value \"%1\" for gene %2.")
-                               .arg(words.at(i).toString()).arg(words.at(0).toString()));
+                     .arg(words.at(i).toString())
+                     .arg(words.at(0).toString()));
                   throw e;
                }
-
-               // apply transform and append expression to gene
-               switch (_transform)
-               {
-               case Transform::None:
-                  gene->expressions[i-1] = value;
-                  break;
-               case Transform::NLog:
-                  gene->expressions[i-1] = log(value);
-                  break;
-               case Transform::Log2:
-                  gene->expressions[i-1] = log2(value);
-                  break;
-               case Transform::Log10:
-                  gene->expressions[i-1] = log10(value);
-                  break;
-               }
             }
          }
 
          // append gene data and gene name
-         genes.append(gene);
-         geneNames.append(words.at(0).toString());
+         _genes.append(gene);
+         _geneNames.append(words.at(0).toString());
       }
 
       // otherwise throw an error
@@ -135,38 +131,42 @@ void ImportExpressionMatrix::process(const EAbstractAnalytic::Block* result)
          e.setTitle(tr("Parsing Error"));
          e.setDetails(tr("Encountered gene expression line with incorrect amount of fields. "
                          "Read in %1 fields when it should have been %2. Gene name is %3.")
-                      .arg(words.size()-1).arg(_sampleSize).arg(words.at(0).toString()));
+            .arg(words.size()-1)
+            .arg(_sampleSize)
+            .arg(words.at(0).toString()));
+         throw e;
+      }
+
+      // make sure reading input file worked
+      if ( _stream.status() != QTextStream::Ok )
+      {
+         E_MAKE_EXCEPTION(e);
+         e.setTitle(tr("File IO Error"));
+         e.setDetails(tr("Qt Text Stream encountered an unknown error."));
          throw e;
       }
    }
 
-   // make sure reading input file worked
-   if ( stream.status() != QTextStream::Ok )
+   // create the output data object in the final step
+   else if ( result->index() == _numLines )
    {
-      E_MAKE_EXCEPTION(e);
-      e.setTitle(tr("File IO Error"));
-      e.setDetails(tr("Qt Text Stream encountered an unknown error."));
-      throw e;
-   }
+      // initialize expression matrix
+      _output->initialize(_geneNames, _sampleNames);
 
-   // initialize expression matrix
-   _output->initialize(geneNames, sampleNames);
+      // iterate through each gene
+      ExpressionMatrix::Gene gene(_output);
 
-   // iterate through each gene
-   ExpressionMatrix::Gene gene(_output);
-   for ( int i = 0; i < _output->getGeneSize(); ++i )
-   {
-      // save each gene to expression matrix
-      for ( int j = 0; j < _output->getSampleSize(); ++j )
+      for ( int i = 0; i < _output->geneSize(); ++i )
       {
-         gene[j] = genes[i]->expressions[j];
-      }
+         // save each gene to expression matrix
+         for ( int j = 0; j < _output->sampleSize(); ++j )
+         {
+            gene[j] = _genes[i].expressions[j];
+         }
 
-      gene.write(i);
+         gene.write(i);
+      }
    }
-
-   // set transform used in expression matrix
-   _output->setTransform(_transform);
 }
 
 
@@ -174,8 +174,13 @@ void ImportExpressionMatrix::process(const EAbstractAnalytic::Block* result)
 
 
 
+/*!
+ * Make a new input object and return its pointer.
+ */
 EAbstractAnalytic::Input* ImportExpressionMatrix::makeInput()
 {
+   EDEBUG_FUNC(this);
+
    return new Input(this);
 }
 
@@ -184,8 +189,15 @@ EAbstractAnalytic::Input* ImportExpressionMatrix::makeInput()
 
 
 
+/*!
+ * Initialize this analytic. This implementation checks that the input file and
+ * output data object have been set, then counts the lines of the input file.
+ */
 void ImportExpressionMatrix::initialize()
 {
+   EDEBUG_FUNC(this);
+
+   // make sure input/output arguments are valid
    if ( !_input || !_output )
    {
       E_MAKE_EXCEPTION(e);
@@ -193,4 +205,25 @@ void ImportExpressionMatrix::initialize()
       e.setDetails(tr("Did not get valid input and/or output arguments."));
       throw e;
    }
+
+   // initialize input file stream
+   _stream.setDevice(_input);
+
+   // count the number of lines in the input file
+   _numLines = 0;
+
+   while ( !_stream.atEnd() )
+   {
+      _stream.readLine();
+      _numLines++;
+   }
+
+   // make sure reading input file worked
+   if ( _stream.status() != QTextStream::Ok )
+   {
+      E_MAKE_EXCEPTION(e);
+      e.setTitle(tr("File IO Error"));
+      e.setDetails(tr("Qt Text Stream encountered an unknown error."));
+      throw e;
+   }
 }
diff --git a/src/core/importexpressionmatrix.h b/src/core/importexpressionmatrix.h
index 0b8e34f..b368635 100644
--- a/src/core/importexpressionmatrix.h
+++ b/src/core/importexpressionmatrix.h
@@ -6,6 +6,15 @@
 
 
 
+/*!
+ * This class implements the import expression matrix analytic. This analytic
+ * reads in a text file which contains a matrix as a table; that is, with each row
+ * on a line, each value separated by whitespace, and the first row and column
+ * containing the column names and row names, respectively. Elements which have
+ * the given NAN token are read in as NAN. If the sample names are not in the
+ * input file, the user must provide the number of samples to the analytic, and
+ * the samples will be given integer names.
+ */
 class ImportExpressionMatrix : public EAbstractAnalytic
 {
    Q_OBJECT
@@ -16,12 +25,43 @@ class ImportExpressionMatrix : public EAbstractAnalytic
    virtual EAbstractAnalytic::Input* makeInput() override final;
    virtual void initialize();
 private:
-   using Transform = ExpressionMatrix::Transform;
+   /*!
+    * Structure used to load gene expression data.
+    */
+   struct Gene
+   {
+      Gene() = default;
+      Gene(int size)
+      {
+         expressions.resize(size);
+      }
+
+      QVector<float> expressions;
+   };
+   /*!
+    * Workspace variables to read from the input file.
+    */
+   QTextStream _stream;
+   int _numLines {0};
+   QVector<Gene> _genes;
+   QStringList _geneNames;
+   QStringList _sampleNames;
+   /*!
+    * Pointer to the input text file.
+    */
    QFile* _input {nullptr};
+   /*!
+    * Pointer to the output expression matrix.
+    */
    ExpressionMatrix* _output {nullptr};
-   QString _noSampleToken;
+   /*!
+    * The string token used to represent NAN values.
+    */
+   QString _nanToken {"NA"};
+   /*!
+    * The number of samples to read.
+    */
    qint32 _sampleSize {0};
-   Transform _transform {Transform::None};
 };
 
 
diff --git a/src/core/importexpressionmatrix_input.cpp b/src/core/importexpressionmatrix_input.cpp
index 91b1f9f..3cb59c7 100644
--- a/src/core/importexpressionmatrix_input.cpp
+++ b/src/core/importexpressionmatrix_input.cpp
@@ -6,18 +6,30 @@
 
 
 
+/*!
+ * Construct a new input object with the given analytic as its parent.
+ *
+ * @param parent
+ */
 ImportExpressionMatrix::Input::Input(ImportExpressionMatrix* parent):
    EAbstractAnalytic::Input(parent),
    _base(parent)
-{}
+{
+   EDEBUG_FUNC(this,parent);
+}
 
 
 
 
 
 
+/*!
+ * Return the total number of arguments this analytic type contains.
+ */
 int ImportExpressionMatrix::Input::size() const
 {
+   EDEBUG_FUNC(this);
+
    return Total;
 }
 
@@ -26,15 +38,21 @@ int ImportExpressionMatrix::Input::size() const
 
 
 
+/*!
+ * Return the argument type for a given index.
+ *
+ * @param index
+ */
 EAbstractAnalytic::Input::Type ImportExpressionMatrix::Input::type(int index) const
 {
+   EDEBUG_FUNC(this,index);
+
    switch (index)
    {
    case InputFile: return Type::FileIn;
    case OutputData: return Type::DataOut;
-   case NoSampleToken: return Type::String;
+   case NANToken: return Type::String;
    case SampleSize: return Type::Integer;
-   case TransformType: return Type::Selection;
    default: return Type::Boolean;
    }
 }
@@ -44,8 +62,16 @@ EAbstractAnalytic::Input::Type ImportExpressionMatrix::Input::type(int index) co
 
 
 
+/*!
+ * Return data for a given role on an argument with the given index.
+ *
+ * @param index
+ * @param role
+ */
 QVariant ImportExpressionMatrix::Input::data(int index, Role role) const
 {
+   EDEBUG_FUNC(this,index,role);
+
    switch (index)
    {
    case InputFile:
@@ -66,12 +92,13 @@ QVariant ImportExpressionMatrix::Input::data(int index, Role role) const
       case Role::DataType: return DataFactory::ExpressionMatrixType;
       default: return QVariant();
       }
-   case NoSampleToken:
+   case NANToken:
       switch (role)
       {
       case Role::CommandLineName: return QString("nan");
-      case Role::Title: return tr("No Sample Token:");
+      case Role::Title: return tr("NAN Token:");
       case Role::WhatsThis: return tr("Expected token for expressions that have no value.");
+      case Role::Default: return "NA";
       default: return QVariant();
       }
    case SampleSize:
@@ -85,16 +112,6 @@ QVariant ImportExpressionMatrix::Input::data(int index, Role role) const
       case Role::Maximum: return std::numeric_limits<int>::max();
       default: return QVariant();
       }
-   case TransformType:
-      switch (role)
-      {
-      case Role::CommandLineName: return QString("transform");
-      case Role::Title: return tr("Transform:");
-      case Role::WhatsThis: return tr("Element-wise transformation to apply to expression data.");
-      case Role::Default: return ExpressionMatrix::TRANSFORM_NAMES.first();
-      case Role::SelectionValues: return ExpressionMatrix::TRANSFORM_NAMES;
-      default: return QVariant();
-      }
    default: return QVariant();
    }
 }
@@ -104,18 +121,23 @@ QVariant ImportExpressionMatrix::Input::data(int index, Role role) const
 
 
 
+/*!
+ * Set an argument with the given index to the given value.
+ *
+ * @param index
+ * @param value
+ */
 void ImportExpressionMatrix::Input::set(int index, const QVariant& value)
 {
+   EDEBUG_FUNC(this,index,&value);
+
    switch (index)
    {
    case SampleSize:
       _base->_sampleSize = value.toInt();
       break;
-   case NoSampleToken:
-      _base->_noSampleToken = value.toString();
-      break;
-   case TransformType:
-      _base->_transform = static_cast<Transform>(ExpressionMatrix::TRANSFORM_NAMES.indexOf(value.toString()));
+   case NANToken:
+      _base->_nanToken = value.toString();
       break;
    }
 }
@@ -125,8 +147,16 @@ void ImportExpressionMatrix::Input::set(int index, const QVariant& value)
 
 
 
+/*!
+ * Set a file argument with the given index to the given Qt file pointer.
+ *
+ * @param index
+ * @param file
+ */
 void ImportExpressionMatrix::Input::set(int index, QFile* file)
 {
+   EDEBUG_FUNC(this,index,file);
+
    if ( index == InputFile )
    {
       _base->_input = file;
@@ -138,8 +168,16 @@ void ImportExpressionMatrix::Input::set(int index, QFile* file)
 
 
 
+/*!
+ * Set a data argument with the given index to the given data object pointer.
+ *
+ * @param index
+ * @param data
+ */
 void ImportExpressionMatrix::Input::set(int index, EAbstractData* data)
 {
+   EDEBUG_FUNC(this,index,data);
+
    if ( index == OutputData )
    {
       _base->_output = data->cast<ExpressionMatrix>();
diff --git a/src/core/importexpressionmatrix_input.h b/src/core/importexpressionmatrix_input.h
index 45f56b4..6bae85b 100644
--- a/src/core/importexpressionmatrix_input.h
+++ b/src/core/importexpressionmatrix_input.h
@@ -4,17 +4,22 @@
 
 
 
+/*!
+ * This class implements the abstract input of the import expression matrix analytic.
+ */
 class ImportExpressionMatrix::Input : public EAbstractAnalytic::Input
 {
    Q_OBJECT
 public:
+   /*!
+    * Defines all input arguments for this analytic.
+    */
    enum Argument
    {
       InputFile = 0
       ,OutputData
-      ,NoSampleToken
+      ,NANToken
       ,SampleSize
-      ,TransformType
       ,Total
    };
    explicit Input(ImportExpressionMatrix* parent);
@@ -25,6 +30,9 @@ class ImportExpressionMatrix::Input : public EAbstractAnalytic::Input
    virtual void set(int index, QFile* file) override final;
    virtual void set(int index, EAbstractData* data) override final;
 private:
+   /*!
+    * Pointer to the base analytic for this object.
+    */
    ImportExpressionMatrix* _base;
 };
 
diff --git a/src/core/pairwise_clustering.cpp b/src/core/pairwise_clustering.cpp
deleted file mode 100644
index ef26772..0000000
--- a/src/core/pairwise_clustering.cpp
+++ /dev/null
@@ -1,172 +0,0 @@
-#include "pairwise_clustering.h"
-
-
-
-using namespace Pairwise;
-
-
-
-
-
-
-void Clustering::initialize(ExpressionMatrix* input)
-{
-   // pre-allocate workspace
-   _workLabels.resize(input->getSampleSize());
-}
-
-
-
-
-
-
-qint8 Clustering::compute(
-   const QVector<Vector2>& X,
-   int numSamples,
-   QVector<qint8>& labels,
-   int minSamples,
-   qint8 minClusters,
-   qint8 maxClusters,
-   Criterion criterion,
-   bool removePreOutliers,
-   bool removePostOutliers)
-{
-   // remove pre-clustering outliers
-   if ( removePreOutliers )
-   {
-      markOutliers(X, numSamples, 0, labels, 0, -7);
-      markOutliers(X, numSamples, 1, labels, 0, -7);
-   }
-
-   // perform clustering only if there are enough samples
-   qint8 bestK = 0;
-
-   if ( numSamples >= minSamples )
-   {
-      float bestValue = INFINITY;
-
-      for ( qint8 K = minClusters; K <= maxClusters; ++K )
-      {
-         // run each clustering model
-         bool success = fit(X, numSamples, K, _workLabels);
-
-         if ( !success )
-         {
-            continue;
-         }
-
-         // evaluate model
-         float value = INFINITY;
-
-         switch (criterion)
-         {
-         case Criterion::BIC:
-            value = computeBIC(K, logLikelihood(), numSamples, 2);
-            break;
-         case Criterion::ICL:
-            value = computeICL(K, logLikelihood(), numSamples, 2, entropy());
-            break;
-         }
-
-         // save the best model
-         if ( value < bestValue )
-         {
-            bestK = K;
-            bestValue = value;
-
-            for ( int i = 0, j = 0; i < numSamples; ++i )
-            {
-               if ( labels[i] >= 0 )
-               {
-                  labels[i] = _workLabels[j];
-                  ++j;
-               }
-            }
-         }
-      }
-   }
-
-   if ( bestK > 1 )
-   {
-      // remove post-clustering outliers
-      if ( removePostOutliers )
-      {
-         for ( qint8 k = 0; k < bestK; ++k )
-         {
-            markOutliers(X, numSamples, 0, labels, k, -8);
-            markOutliers(X, numSamples, 1, labels, k, -8);
-         }
-      }
-   }
-
-   return bestK;
-}
-
-
-
-
-
-
-void Clustering::markOutliers(const QVector<Vector2>& X, int N, int j, QVector<qint8>& labels, qint8 cluster, qint8 marker)
-{
-   // compute x_sorted = X[:, j], filtered and sorted
-   QVector<float> x_sorted;
-   x_sorted.reserve(N);
-
-   for ( int i = 0; i < N; i++ )
-   {
-      if ( labels[i] == cluster || labels[i] == marker )
-      {
-         x_sorted.append(X[i].s[j]);
-      }
-   }
-
-   if ( x_sorted.size() == 0 )
-   {
-      return;
-   }
-
-   std::sort(x_sorted.begin(), x_sorted.end());
-
-   // compute quartiles, interquartile range, upper and lower bounds
-   const int n = x_sorted.size();
-
-   float Q1 = x_sorted[n * 1 / 4];
-   float Q3 = x_sorted[n * 3 / 4];
-
-   float T_min = Q1 - 1.5f * (Q3 - Q1);
-   float T_max = Q3 + 1.5f * (Q3 - Q1);
-
-   // mark outliers
-   for ( int i = 0; i < N; ++i )
-   {
-      if ( labels[i] == cluster && (X[i].s[j] < T_min || T_max < X[i].s[j]) )
-      {
-         labels[i] = marker;
-      }
-   }
-}
-
-
-
-
-
-
-float Clustering::computeBIC(int K, float logL, int N, int D)
-{
-   int p = K * (1 + D + D * D);
-
-   return log(N) * p - 2 * logL;
-}
-
-
-
-
-
-
-float Clustering::computeICL(int K, float logL, int N, int D, float E)
-{
-   int p = K * (1 + D + D * D);
-
-   return log(N) * p - 2 * logL - 2 * E;
-}
diff --git a/src/core/pairwise_clustering.h b/src/core/pairwise_clustering.h
deleted file mode 100644
index 00aa417..0000000
--- a/src/core/pairwise_clustering.h
+++ /dev/null
@@ -1,48 +0,0 @@
-#ifndef PAIRWISE_CLUSTERING_H
-#define PAIRWISE_CLUSTERING_H
-#include <ace/core/core.h>
-
-#include "ccmatrix.h"
-#include "expressionmatrix.h"
-#include "pairwise_linalg.h"
-#include "pairwise_index.h"
-
-namespace Pairwise
-{
-   enum class Criterion
-   {
-      BIC
-      ,ICL
-   };
-
-   class Clustering
-   {
-   public:
-      void initialize(ExpressionMatrix* input);
-      qint8 compute(
-         const QVector<Vector2>& X,
-         int numSamples,
-         QVector<qint8>& labels,
-         int minSamples,
-         qint8 minClusters,
-         qint8 maxClusters,
-         Criterion criterion,
-         bool removePreOutliers,
-         bool removePostOutliers
-      );
-
-   protected:
-      virtual bool fit(const QVector<Vector2>& X, int N, int K, QVector<qint8>& labels) = 0;
-      virtual float logLikelihood() const = 0;
-      virtual float entropy() const = 0;
-
-   private:
-      void markOutliers(const QVector<Vector2>& X, int N, int j, QVector<qint8>& labels, qint8 cluster, qint8 marker);
-      float computeBIC(int K, float logL, int N, int D);
-      float computeICL(int K, float logL, int N, int D, float E);
-
-      QVector<qint8> _workLabels;
-   };
-}
-
-#endif
diff --git a/src/core/pairwise_clusteringmodel.cpp b/src/core/pairwise_clusteringmodel.cpp
new file mode 100644
index 0000000..cd0104a
--- /dev/null
+++ b/src/core/pairwise_clusteringmodel.cpp
@@ -0,0 +1,103 @@
+#include "pairwise_clusteringmodel.h"
+
+
+
+using namespace Pairwise;
+
+
+
+
+
+
+/*!
+ * Construct an abstract pairwise clustering model.
+ *
+ * @param emx
+ */
+ClusteringModel::ClusteringModel(ExpressionMatrix* emx)
+{
+   // pre-allocate workspace
+   _workLabels.resize(emx->sampleSize());
+}
+
+
+
+
+
+
+/*!
+ * Determine the number of clusters in a pairwise data array. Several sub-models,
+ * each one having a different number of clusters, are fit to the data and the
+ * sub-model with the best criterion value is selected. The data array should
+ * only contain samples that have a non-negative label.
+ *
+ * @param data
+ * @param numSamples
+ * @param labels
+ * @param minSamples
+ * @param minClusters
+ * @param maxClusters
+ * @param criterion
+ */
+qint8 ClusteringModel::compute(
+   const QVector<Vector2>& data,
+   int numSamples,
+   QVector<qint8>& labels,
+   int minSamples,
+   qint8 minClusters,
+   qint8 maxClusters,
+   Criterion criterion)
+{
+   // perform clustering only if there are enough samples
+   qint8 bestK = 0;
+
+   if ( numSamples >= minSamples )
+   {
+      float bestValue = INFINITY;
+
+      for ( qint8 K = minClusters; K <= maxClusters; ++K )
+      {
+         // run each clustering sub-model
+         bool success = fit(data, numSamples, K, _workLabels);
+
+         if ( !success )
+         {
+            continue;
+         }
+
+         // compute the criterion value of the sub-model
+         float value = INFINITY;
+
+         switch (criterion)
+         {
+         case Criterion::AIC:
+            value = computeAIC(K, 2, logLikelihood());
+            break;
+         case Criterion::BIC:
+            value = computeBIC(K, 2, logLikelihood(), numSamples);
+            break;
+         case Criterion::ICL:
+            value = computeICL(K, 2, logLikelihood(), numSamples, entropy());
+            break;
+         }
+
+         // keep the sub-model with the lowest criterion value
+         if ( value < bestValue )
+         {
+            bestK = K;
+            bestValue = value;
+
+            for ( int i = 0, j = 0; i < labels.size(); ++i )
+            {
+               if ( labels[i] >= 0 )
+               {
+                  labels[i] = _workLabels[j];
+                  ++j;
+               }
+            }
+         }
+      }
+   }
+
+   return bestK;
+}
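Review note: the selection loop in `ClusteringModel::compute` keeps the sub-model with the lowest criterion value and returns 0 when no sub-model fits. The sketch below isolates that logic, assuming BIC with the parameter count `p = K * (1 + D + D * D)` that this change uses for Gaussian mixture models. `FitResult`, `bic`, and `selectBestK` are hypothetical names for illustration and are not part of KINC.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical record of one fitted sub-model (not a KINC type).
struct FitResult
{
   int K;          // number of clusters assumed by the sub-model
   bool success;   // false if the fit failed
   float logL;     // log-likelihood of the fitted sub-model
};

// BIC using the parameter count p = K * (1 + D + D * D) from this change.
inline float bic(int K, int D, float logL, int N)
{
   int p = K * (1 + D + D * D);

   return std::log(static_cast<float>(N)) * p - 2 * logL;
}

// Return the K of the successful sub-model with the lowest BIC,
// or 0 if no sub-model was fit successfully.
inline int selectBestK(const std::vector<FitResult>& fits, int D, int N)
{
   assert(N > 0);

   int bestK = 0;
   float bestValue = std::numeric_limits<float>::infinity();

   for ( const FitResult& fit : fits )
   {
      // skip sub-models that failed to fit
      if ( !fit.success )
      {
         continue;
      }

      // keep the sub-model with the lowest criterion value
      float value = bic(fit.K, D, fit.logL, N);

      if ( value < bestValue )
      {
         bestK = fit.K;
         bestValue = value;
      }
   }

   return bestK;
}
```

A higher log-likelihood only wins if it outweighs the `log(N) * p` penalty. In this sketch a failed fit is skipped even if its reported log-likelihood is high.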
diff --git a/src/core/pairwise_clusteringmodel.h b/src/core/pairwise_clusteringmodel.h
new file mode 100644
index 0000000..467a296
--- /dev/null
+++ b/src/core/pairwise_clusteringmodel.h
@@ -0,0 +1,68 @@
+#ifndef PAIRWISE_CLUSTERINGMODEL_H
+#define PAIRWISE_CLUSTERINGMODEL_H
+#include <ace/core/core.h>
+
+#include "ccmatrix.h"
+#include "expressionmatrix.h"
+#include "pairwise_linalg.h"
+#include "pairwise_index.h"
+
+namespace Pairwise
+{
+   /*!
+    * Defines the criterion types used by the abstract clustering model.
+    */
+   enum class Criterion
+   {
+      /*!
+       * Akaike information criterion
+       */
+      AIC
+      /*!
+       * Bayesian information criterion
+       */
+      ,BIC
+      /*!
+       * Integrated completed likelihood
+       */
+      ,ICL
+   };
+
+   /*!
+    * This class implements the abstract pairwise clustering model, which takes
+    * a pairwise data array and determines the number of clusters, as well as the
+    * cluster label for each sample in the data array. The number of clusters is
+    * determined by creating several sub-models, each with a different assumption
+    * of the number of clusters, and selecting the sub-model which best fits the
+    * data according to a criterion. The clustering sub-model must be implemented
+    * by the inheriting class.
+    */
+   class ClusteringModel
+   {
+   public:
+      ClusteringModel(ExpressionMatrix* emx);
+      qint8 compute(
+         const QVector<Vector2>& data,
+         int numSamples,
+         QVector<qint8>& labels,
+         int minSamples,
+         qint8 minClusters,
+         qint8 maxClusters,
+         Criterion criterion
+      );
+   protected:
+      virtual bool fit(const QVector<Vector2>& X, int N, int K, QVector<qint8>& labels) = 0;
+      virtual float logLikelihood() const = 0;
+      virtual float entropy() const = 0;
+      virtual float computeAIC(int K, int D, float logL) = 0;
+      virtual float computeBIC(int K, int D, float logL, int N) = 0;
+      virtual float computeICL(int K, int D, float logL, int N, float E) = 0;
+   private:
+      /*!
+       * Workspace for the cluster labels.
+       */
+      QVector<qint8> _workLabels;
+   };
+}
+
+#endif
diff --git a/src/core/pairwise_correlation.cpp b/src/core/pairwise_correlation.cpp
deleted file mode 100644
index 1edb6fa..0000000
--- a/src/core/pairwise_correlation.cpp
+++ /dev/null
@@ -1,26 +0,0 @@
-#include "pairwise_correlation.h"
-
-
-
-using namespace Pairwise;
-
-
-
-
-
-
-QVector<float> Correlation::compute(
-   const QVector<Vector2>& data,
-   int K,
-   const QVector<qint8>& labels,
-   int minSamples)
-{
-   QVector<float> correlations(K);
-
-   for ( qint8 k = 0; k < K; ++k )
-   {
-      correlations[k] = computeCluster(data, labels, k, minSamples);
-   }
-
-   return correlations;
-}
diff --git a/src/core/pairwise_correlationmodel.cpp b/src/core/pairwise_correlationmodel.cpp
new file mode 100644
index 0000000..fd0ee85
--- /dev/null
+++ b/src/core/pairwise_correlationmodel.cpp
@@ -0,0 +1,36 @@
+#include "pairwise_correlationmodel.h"
+
+
+
+using namespace Pairwise;
+
+
+
+
+
+
+/*!
+ * Compute the correlation of each cluster in a pairwise data array. The data array
+ * should only contain the clean samples that were extracted from the expression
+ * matrix, while the labels should contain all samples.
+ *
+ * @param data
+ * @param K
+ * @param labels
+ * @param minSamples
+ */
+QVector<float> CorrelationModel::compute(
+   const QVector<Vector2>& data,
+   int K,
+   const QVector<qint8>& labels,
+   int minSamples)
+{
+   QVector<float> correlations(K);
+
+   for ( qint8 k = 0; k < K; ++k )
+   {
+      correlations[k] = computeCluster(data, labels, k, minSamples);
+   }
+
+   return correlations;
+}
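Review note: `CorrelationModel::compute` delegates the metric to `computeCluster`, which the inheriting class must implement. The sketch below shows what a Pearson-style implementation could look like. It rests on several assumptions that the diff does not confirm: `data` holds only the samples whose label is non-negative, in the same order as `labels`; a cluster with fewer than `minSamples` samples yields NaN; and `Point2` and `pearsonForCluster` are hypothetical stand-ins, not KINC types or functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for the Vector2 type used in the diff.
struct Point2
{
   float x;
   float y;
};

// Pearson correlation of the samples assigned to one cluster.
// ASSUMPTION: data contains only samples with a non-negative label, in label order.
// ASSUMPTION: NaN is returned when the cluster has fewer than minSamples samples.
inline float pearsonForCluster(
   const std::vector<Point2>& data,
   const std::vector<std::int8_t>& labels,
   std::int8_t cluster,
   int minSamples)
{
   int n = 0;
   double sumx = 0, sumy = 0, sumx2 = 0, sumy2 = 0, sumxy = 0;

   for ( std::size_t i = 0, j = 0; i < labels.size(); ++i )
   {
      // samples with a negative label are assumed to be absent from data
      if ( labels[i] < 0 )
      {
         continue;
      }

      assert(j < data.size());

      // accumulate the sums for samples in the requested cluster
      if ( labels[i] == cluster )
      {
         double x = data[j].x;
         double y = data[j].y;

         sumx += x;
         sumy += y;
         sumx2 += x * x;
         sumy2 += y * y;
         sumxy += x * y;
         ++n;
      }

      ++j;
   }

   if ( n < minSamples )
   {
      return std::numeric_limits<float>::quiet_NaN();
   }

   double num = n * sumxy - sumx * sumy;
   double den = std::sqrt((n * sumx2 - sumx * sumx) * (n * sumy2 - sumy * sumy));

   return static_cast<float>(num / den);
}
```

The two-index loop mirrors the `i, j` pattern that `ClusteringModel::compute` uses when it copies work labels back into `labels`. If the real mapping between `data` and `labels` differs, the indexing here would need to change.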
diff --git a/src/core/pairwise_correlation.h b/src/core/pairwise_correlationmodel.h
similarity index 54%
rename from src/core/pairwise_correlation.h
rename to src/core/pairwise_correlationmodel.h
index 965246a..5025d10 100644
--- a/src/core/pairwise_correlation.h
+++ b/src/core/pairwise_correlationmodel.h
@@ -1,26 +1,26 @@
-#ifndef PAIRWISE_CORRELATION_H
-#define PAIRWISE_CORRELATION_H
+#ifndef PAIRWISE_CORRELATIONMODEL_H
+#define PAIRWISE_CORRELATIONMODEL_H
 #include <ace/core/core.h>
 
-#include "correlationmatrix.h"
-#include "expressionmatrix.h"
 #include "pairwise_linalg.h"
 
 namespace Pairwise
 {
-   class Correlation
+   /*!
+    * This class implements the abstract pairwise correlation model, which
+    * takes a pairwise data array (with cluster labels) and computes a correlation
+    * for each cluster in the data. The correlation metric must be implemented by
+    * the inheriting class.
+    */
+   class CorrelationModel
    {
    public:
-      virtual void initialize(ExpressionMatrix* input) = 0;
-      virtual QString getName() const = 0;
-
       QVector<float> compute(
          const QVector<Vector2>& data,
          int K,
          const QVector<qint8>& labels,
          int minSamples
       );
-
    protected:
       virtual float computeCluster(
          const QVector<Vector2>& data,
diff --git a/src/core/pairwise_gmm.cpp b/src/core/pairwise_gmm.cpp
index bce9bef..bd8d2c1 100644
--- a/src/core/pairwise_gmm.cpp
+++ b/src/core/pairwise_gmm.cpp
@@ -8,18 +8,40 @@ using namespace Pairwise;
 
 
 
+/*!
+ * Construct a Gaussian mixture model.
+ *
+ * @param emx
+ */
+GMM::GMM(ExpressionMatrix* emx):
+   ClusteringModel(emx)
+{
+}
+
+
+
+
+
+
+/*!
+ * Initialize a mixture component with the given mixture weight and mean.
+ *
+ * @param pi
+ * @param mu
+ */
 void GMM::Component::initialize(float pi, const Vector2& mu)
 {
-   // initialize pi and mu as given
+   // initialize mixture weight and mean
    _pi = pi;
    _mu = mu;
 
-   // Use identity covariance- assume dimensions are independent
+   // initialize covariance to identity matrix
    matrixInitIdentity(_sigma);
 
-   // Initialize zero artifacts
+   // initialize precision to zero matrix
    matrixInitZero(_sigmaInv);
 
+   // initialize normalizer term to 0
    _normalizer = 0;
 }
 
@@ -28,12 +50,14 @@ void GMM::Component::initialize(float pi, const Vector2& mu)
 
 
 
-void GMM::Component::prepareCovariance()
+/*!
+ * Pre-compute the precision matrix and normalizer term for a mixture component.
+ */
+void GMM::Component::prepare()
 {
    const int D = 2;
 
-   // Compute inverse of Sigma once each iteration instead of
-   // repeatedly for each calcLogMvNorm execution.
+   // compute precision (inverse of covariance)
    float det;
    matrixInverse(_sigma, _sigmaInv, &det);
 
@@ -42,7 +66,7 @@ void GMM::Component::prepareCovariance()
       throw std::runtime_error("matrix inverse failed");
    }
 
-   // Compute normalizer for multivariate normal distribution
+   // compute normalizer term for multivariate normal distribution
    _normalizer = -0.5f * (D * log(2.0f * M_PI) + log(det));
 }
 
@@ -51,27 +75,36 @@ void GMM::Component::prepareCovariance()
 
 
 
-void GMM::Component::calcLogMvNorm(const QVector<Vector2>& X, int N, float *logP)
+/*!
+ * Compute the log of the probability density function of the multivariate normal
+ * distribution conditioned on a single component for each point in X:
+ *
+ *   P(x|k) = exp(-0.5 * (x - mu)^T Sigma^-1 (x - mu)) / sqrt((2pi)^d det(Sigma))
+ *
+ * Therefore the log-probability is:
+ *
+ *   log(P(x|k)) = -0.5 * (x - mu)^T Sigma^-1 (x - mu) - 0.5 * (d * log(2pi) + log(det(Sigma)))
+ *
+ * @param X
+ * @param N
+ * @param logP
+ */
+void GMM::Component::computeLogProbNorm(const QVector<Vector2>& X, int N, float *logP)
 {
-   // Here we are computing the probability density function of the multivariate
-   // normal distribution conditioned on a single component for the set of points
-   // given by X.
-   //
-   // P(x|k) = exp{ -0.5 * (x - mu)^T Sigma^{-} (x - mu) } / sqrt{ (2pi)^d det(Sigma) }
-
    for (int i = 0; i < N; ++i)
    {
-      // Let xm = (x - mu)
+      // compute xm = (x - mu)
       Vector2 xm = X[i];
       vectorSubtract(xm, _mu);
 
-      // Compute xm^T Sxm = xm^T S^-1 xm
+      // compute Sxm = Sigma^-1 xm
       Vector2 Sxm;
       matrixProduct(_sigmaInv, xm, Sxm);
 
+      // compute xmSxm = xm^T Sigma^-1 xm
       float xmSxm = vectorDot(xm, Sxm);
 
-      // Compute log(P) = normalizer - 0.5 * xm^T * S^-1 * xm
+      // compute log(P) = normalizer - 0.5 * xm^T * Sigma^-1 * xm
       logP[i] = _normalizer - 0.5f * xmSxm;
    }
 }
@@ -81,7 +114,14 @@ void GMM::Component::calcLogMvNorm(const QVector<Vector2>& X, int N, float *logP
 
 
 
-void GMM::kmeans(const QVector<Vector2>& X, int N)
+/*!
+ * Initialize the mean of each component in the mixture model using k-means
+ * clustering.
+ *
+ * @param X
+ * @param N
+ */
+void GMM::initializeMeans(const QVector<Vector2>& X, int N)
 {
    const int K = _components.size();
 
@@ -89,48 +129,54 @@ void GMM::kmeans(const QVector<Vector2>& X, int N)
    const float TOLERANCE = 1e-3;
    float diff = 0;
 
-   Vector2 MP[K];
+   // initialize workspace
+   Vector2 Mu[K];
    int counts[K];
 
    for (int t = 0; t < MAX_ITERATIONS && diff > TOLERANCE; ++t)
    {
-      memset(MP, 0, K * sizeof(Vector2));
+      // compute mean and sample count for each component
+      memset(Mu, 0, K * sizeof(Vector2));
       memset(counts, 0, K * sizeof(int));
 
       for (int i = 0; i < N; ++i)
       {
-         // arg min
-         float minD = INFINITY;
-         int minDk = 0;
+         // determine the component mean which is nearest to x_i
+         float min_dist = INFINITY;
+         int min_k = 0;
          for (int k = 0; k < K; ++k)
          {
             float dist = vectorDiffNorm(X[i], _components[k]._mu);
-            if (minD > dist)
+            if (min_dist > dist)
             {
-               minD = dist;
-               minDk = k;
+               min_dist = dist;
+               min_k = k;
             }
          }
 
-         vectorAdd(MP[minDk], X[i]);
-         ++counts[minDk];
+         // update mean and sample count
+         vectorAdd(Mu[min_k], X[i]);
+         ++counts[min_k];
       }
 
+      // scale each mean by its sample count
       for (int k = 0; k < K; ++k)
       {
-         vectorScale(MP[k], 1.0f / counts[k]);
+         vectorScale(Mu[k], 1.0f / counts[k]);
       }
 
+      // compute the total change of all means
       diff = 0;
       for (int k = 0; k < K; ++k)
       {
-         diff += vectorDiffNorm(MP[k], _components[k]._mu);
+         diff += vectorDiffNorm(Mu[k], _components[k]._mu);
       }
       diff /= K;
 
+      // update component means
       for (int k = 0; k < K; ++k)
       {
-         _components[k]._mu = MP[k];
+         _components[k]._mu = Mu[k];
       }
    }
 }
@@ -140,111 +186,79 @@ void GMM::kmeans(const QVector<Vector2>& X, int N)
 
 
 
-void GMM::calcLogMvNorm(const QVector<Vector2>& X, int N, float *loggamma)
+/*!
+ * Perform the expectation step of the EM algorithm. In this step we compute
+ * gamma, the posterior probabilities for each component in the mixture model
+ * and each sample in X, as well as the log-likelihood of the model:
+ *
+ *   log(p(x_i)) = a + log(sum(exp(log(pi_k) + log(P(x_i|k)) - a)))
+ *
+ *   gamma_ki = exp(log(pi_k) + log(P(x_i|k)) - log(p(x_i)))
+ *
+ *   log(L) = sum(log(p(x_i)))
+ *
+ * @param X
+ * @param N
+ * @param gamma
+ */
+float GMM::computeEStep(const QVector<Vector2>& X, int N, float *gamma)
 {
    const int K = _components.size();
 
-   for ( int k = 0; k < K; ++k )
+   // compute logpi
+   float logpi[K];
+
+   for (int k = 0; k < K; ++k)
    {
-      _components[k].calcLogMvNorm(X, N, &loggamma[k * N]);
+      logpi[k] = log(_components[k]._pi);
    }
-}
-
-
 
+   // compute the log-probability for each component and each point in X
+   float *logProb = gamma;
 
+   for ( int k = 0; k < K; ++k )
+   {
+      _components[k].computeLogProbNorm(X, N, &logProb[k * N]);
+   }
 
+   // compute gamma and log-likelihood
+   float logL = 0.0;
 
-void GMM::calcLogLikelihoodAndGammaNK(const float *logpi, int K, float *loggamma, int N, float *logL)
-{
-   *logL = 0.0;
    for (int i = 0; i < N; ++i)
    {
+      // compute a = max_k(logpi_k + logProb_ki)
       float maxArg = -INFINITY;
       for (int k = 0; k < K; ++k)
       {
-         const float logProbK = logpi[k] + loggamma[k * N + i];
-         if (logProbK > maxArg)
+         float arg = logpi[k] + logProb[k * N + i];
+         if (maxArg < arg)
          {
-            maxArg = logProbK;
+            maxArg = arg;
          }
       }
 
+      // compute logpx
       float sum = 0.0;
       for (int k = 0; k < K; ++k)
       {
-         const float logProbK = logpi[k] + loggamma[k * N + i];
-         sum += exp(logProbK - maxArg);
+         sum += exp(logpi[k] + logProb[k * N + i] - maxArg);
       }
 
-      const float logpx = maxArg + log(sum);
-      *logL += logpx;
-      for (int k = 0; k < K; ++k)
-      {
-         loggamma[k * N + i] += -logpx;
-      }
-   }
-}
-
-
-
-
-
-
-void GMM::calcLogGammaK(const float *loggamma, int N, int K, float *logGamma)
-{
-   memset(logGamma, 0, K * sizeof(float));
-
-   for (int k = 0; k < K; ++k)
-   {
-      const float *loggammak = &loggamma[k * N];
-
-      float maxArg = -INFINITY;
-      for (int i = 0; i < N; ++i)
-      {
-         const float loggammank = loggammak[i];
-         if (loggammank > maxArg)
-         {
-            maxArg = loggammank;
-         }
-      }
-
-      float sum = 0;
-      for (int i = 0; i < N; ++i)
-      {
-         const float loggammank = loggammak[i];
-         sum += exp(loggammank - maxArg);
-      }
-
-      logGamma[k] = maxArg + log(sum);
-   }
-}
-
-
-
-
-
+      float logpx = maxArg + log(sum);
 
-float GMM::calcLogGammaSum(const float *logpi, int K, const float *logGamma)
-{
-   float maxArg = -INFINITY;
-   for (int k = 0; k < K; ++k)
-   {
-      const float arg = logpi[k] + logGamma[k];
-      if (arg > maxArg)
+      // compute gamma_ki
+      for (int k = 0; k < K; ++k)
       {
-         maxArg = arg;
+         gamma[k * N + i] += logpi[k] - logpx;
+         gamma[k * N + i] = exp(gamma[k * N + i]);
       }
-   }
 
-   float sum = 0;
-   for (int k = 0; k < K; ++k)
-   {
-      const float arg = logpi[k] + logGamma[k];
-      sum += exp(arg - maxArg);
+      // update log-likelihood
+      logL += logpx;
    }
 
-   return maxArg + log(sum);
+   // return log-likelihood
+   return logL;
 }
 
 
@@ -252,66 +266,74 @@ float GMM::calcLogGammaSum(const float *logpi, int K, const float *logGamma)
 
 
 
-void GMM::performMStep(float *logpi, int K, float *loggamma, float *logGamma, float logGammaSum, const QVector<Vector2>& X, int N)
+/*!
+ * Perform the maximization step of the EM algorithm. In this step we update the
+ * parameters of the mixture model using gamma, which is computed during the
+ * expectation step:
+ *
+ *   n_k = sum(gamma_ki)
+ *
+ *   pi_k = n_k / N
+ *
+ *   mu_k = sum(gamma_ki * x_i) / n_k
+ *
+ *   Sigma_k = sum(gamma_ki * (x_i - mu_k) * (x_i - mu_k)^T) / n_k
+ *
+ * @param X
+ * @param N
+ * @param gamma
+ */
+void GMM::computeMStep(const QVector<Vector2>& X, int N, const float *gamma)
 {
-   // update pi
-   for (int k = 0; k < K; ++k)
-   {
-      logpi[k] += logGamma[k] - logGammaSum;
-
-      _components[k]._pi = exp(logpi[k]);
-   }
+   const int K = _components.size();
 
-   // convert loggamma / logGamma to gamma / Gamma to avoid duplicate exp(x) calls
    for (int k = 0; k < K; ++k)
    {
+      // compute n_k = sum(gamma_ki)
+      float n_k = 0;
+
       for (int i = 0; i < N; ++i)
       {
-         const int idx = k * N + i;
-         loggamma[idx] = exp(loggamma[idx]);
+         n_k += gamma[k * N + i];
       }
-   }
 
-   for (int k = 0; k < K; ++k)
-   {
-      logGamma[k] = exp(logGamma[k]);
-   }
+      // update mixture weight
+      _components[k]._pi = n_k / N;
 
-   for (int k = 0; k < K; ++k)
-   {
-      // Update mu
+      // update mean
       Vector2& mu = _components[k]._mu;
 
       vectorInitZero(mu);
 
       for (int i = 0; i < N; ++i)
       {
-         vectorAdd(mu, loggamma[k * N + i], X[i]);
+         vectorAdd(mu, gamma[k * N + i], X[i]);
       }
 
-      vectorScale(mu, 1.0f / logGamma[k]);
+      vectorScale(mu, 1.0f / n_k);
 
-      // Update sigma
+      // update covariance matrix
       Matrix2x2& sigma = _components[k]._sigma;
 
       matrixInitZero(sigma);
 
       for (int i = 0; i < N; ++i)
       {
-         // xm = (x - mu)
+         // compute xm = (x_i - mu_k)
          Vector2 xm = X[i];
          vectorSubtract(xm, mu);
 
-         // S_i = gamma_ik * (x - mu) (x - mu)^T
+         // compute Sigma_ki = gamma_ki * (x_i - mu_k) (x_i - mu_k)^T
          Matrix2x2 outerProduct;
          matrixOuterProduct(xm, xm, outerProduct);
 
-         matrixAdd(sigma, loggamma[k * N + i], outerProduct);
+         matrixAdd(sigma, gamma[k * N + i], outerProduct);
       }
 
-      matrixScale(sigma, 1.0f / logGamma[k]);
+      matrixScale(sigma, 1.0f / n_k);
 
-      _components[k].prepareCovariance();
+      // pre-compute precision matrix and normalizer term
+      _components[k].prepare();
    }
 }
 
@@ -320,22 +342,34 @@ void GMM::performMStep(float *logpi, int K, float *loggamma, float *logGamma, fl
 
 
 
-void GMM::calcLabels(float *loggamma, int N, int K, QVector<qint8>& labels)
+/*!
+ * Compute the cluster labels of a dataset using gamma:
+ *
+ *   y_i = argmax(gamma_ki, k)
+ *
+ * @param gamma
+ * @param N
+ * @param K
+ * @param labels
+ */
+void GMM::computeLabels(const float *gamma, int N, int K, QVector<qint8>& labels)
 {
    for ( int i = 0; i < N; ++i )
    {
+      // determine the value k for which gamma_ki is highest
       int max_k = -1;
       float max_gamma = -INFINITY;
 
       for ( int k = 0; k < K; ++k )
       {
-         if ( max_gamma < loggamma[k * N + i] )
+         if ( max_gamma < gamma[k * N + i] )
          {
             max_k = k;
-            max_gamma = loggamma[k * N + i];
+            max_gamma = gamma[k * N + i];
          }
       }
 
+      // assign x_i to cluster k
       labels[i] = max_k;
    }
 }
@@ -345,7 +379,17 @@ void GMM::calcLabels(float *loggamma, int N, int K, QVector<qint8>& labels)
 
 
 
-float GMM::calcEntropy(float *loggamma, int N, const QVector<qint8>& labels)
+/*!
+ * Compute the entropy of the mixture model for a dataset using gamma
+ * and the given cluster labels:
+ *
+ *   E = sum(sum(z_ki * log(gamma_ki))), z_ki = (y_i == k)
+ *
+ * @param gamma
+ * @param N
+ * @param labels
+ */
+float GMM::computeEntropy(const float *gamma, int N, const QVector<qint8>& labels)
 {
    float E = 0;
 
@@ -353,7 +397,7 @@ float GMM::calcEntropy(float *loggamma, int N, const QVector<qint8>& labels)
    {
       int k = labels[i];
 
-      E += log(loggamma[k * N + i]);
+      E += log(gamma[k * N + i]);
    }
 
    return E;
@@ -364,6 +408,15 @@ float GMM::calcEntropy(float *loggamma, int N, const QVector<qint8>& labels)
 
 
 
+/*!
+ * Fit the mixture model to a pairwise data array and compute the output cluster
+ * labels for the data. The data array should only contain clean samples.
+ *
+ * @param X
+ * @param N
+ * @param K
+ * @param labels
+ */
 bool GMM::fit(const QVector<Vector2>& X, int N, int K, QVector<qint8>& labels)
 {
    // initialize components
@@ -371,64 +424,48 @@ bool GMM::fit(const QVector<Vector2>& X, int N, int K, QVector<qint8>& labels)
 
    for ( int k = 0; k < K; ++k )
    {
-      // use uniform mixture proportion and randomly sampled mean
+      // use uniform mixture weight and randomly sampled mean
       int i = rand() % N;
 
       _components[k].initialize(1.0f / K, X[i]);
-      _components[k].prepareCovariance();
+      _components[k].prepare();
    }
 
    // initialize means with k-means
-   kmeans(X, N);
+   initializeMeans(X, N);
 
    // initialize workspace
-   float *logpi = new float[K];
-   float *loggamma = new float[K * N];
-   float *logGamma = new float[K];
-
-   for (int k = 0; k < K; ++k)
-   {
-      logpi[k] = log(_components[k]._pi);
-   }
+   float *gamma = new float[K * N];
 
    // run EM algorithm
    const int MAX_ITERATIONS = 100;
    const float TOLERANCE = 1e-8;
    float prevLogL = -INFINITY;
-   float currentLogL = -INFINITY;
+   float currLogL = -INFINITY;
    bool success;
 
    try
    {
       for ( int t = 0; t < MAX_ITERATIONS; ++t )
       {
-         // E step
-         // compute gamma, log-likelihood
-         calcLogMvNorm(X, N, loggamma);
-
-         prevLogL = currentLogL;
-         calcLogLikelihoodAndGammaNK(logpi, K, loggamma, N, &currentLogL);
+         // perform E step
+         prevLogL = currLogL;
+         currLogL = computeEStep(X, N, gamma);
 
          // check for convergence
-         if ( fabs(currentLogL - prevLogL) < TOLERANCE )
+         if ( fabs(currLogL - prevLogL) < TOLERANCE )
          {
             break;
          }
 
-         // M step
-         // Let Gamma[k] = \Sum_i gamma[k, i]
-         calcLogGammaK(loggamma, N, K, logGamma);
-
-         float logGammaSum = calcLogGammaSum(logpi, K, logGamma);
-
-         // Update parameters
-         performMStep(logpi, K, loggamma, logGamma, logGammaSum, X, N);
+         // perform M step
+         computeMStep(X, N, gamma);
       }
 
       // save outputs
-      _logL = currentLogL;
-      calcLabels(loggamma, N, K, labels);
-      _entropy = calcEntropy(loggamma, N, labels);
+      _logL = currLogL;
+      computeLabels(gamma, N, K, labels);
+      _entropy = computeEntropy(gamma, N, labels);
 
       success = true;
    }
@@ -437,9 +474,67 @@ bool GMM::fit(const QVector<Vector2>& X, int N, int K, QVector<qint8>& labels)
       success = false;
    }
 
-   delete[] logpi;
-   delete[] loggamma;
-   delete[] logGamma;
+   delete[] gamma;
 
    return success;
 }
+
+
+
+
+
+
+/*!
+ * Compute the Akaike Information Criterion of a Gaussian mixture model.
+ *
+ * @param K
+ * @param D
+ * @param logL
+ */
+float GMM::computeAIC(int K, int D, float logL)
+{
+   int p = K * (1 + D + D * D);
+
+   return 2 * p - 2 * logL;
+}
+
+
+
+
+
+
+/*!
+ * Compute the Bayesian Information Criterion of a Gaussian mixture model.
+ *
+ * @param K
+ * @param D
+ * @param logL
+ * @param N
+ */
+float GMM::computeBIC(int K, int D, float logL, int N)
+{
+   int p = K * (1 + D + D * D);
+
+   return log(N) * p - 2 * logL;
+}
+
+
+
+
+
+
+/*!
+ * Compute the Integrated Completed Likelihood of a Gaussian mixture model.
+ *
+ * @param K
+ * @param D
+ * @param logL
+ * @param N
+ * @param E
+ */
+float GMM::computeICL(int K, int D, float logL, int N, float E)
+{
+   int p = K * (1 + D + D * D);
+
+   return log(N) * p - 2 * logL - 2 * E;
+}
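Reviewer sketch (not part of the patch): the three criteria added above can be exercised in isolation as shown below. The free functions `numParameters`, `aic`, `bic`, `icl` and the helper `selectBest` are hypothetical names introduced only for illustration. They mirror the formulas in `computeAIC`, `computeBIC` and `computeICL`, and assume that lower scores indicate a better model and that `E` follows the sign convention of the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Number of free parameters as counted in the patch: each of the K components
// contributes one weight, D mean entries and D * D covariance entries.
inline int numParameters(int K, int D)
{
   return K * (1 + D + D * D);
}

// Akaike Information Criterion: 2p - 2 logL.
inline float aic(int K, int D, float logL)
{
   return 2.0f * numParameters(K, D) - 2.0f * logL;
}

// Bayesian Information Criterion: log(N) p - 2 logL.
inline float bic(int K, int D, float logL, int N)
{
   assert(N > 0);
   return std::log(static_cast<float>(N)) * numParameters(K, D) - 2.0f * logL;
}

// Integrated Completed Likelihood: BIC - 2E, with E signed as in the patch.
inline float icl(int K, int D, float logL, int N, float E)
{
   return bic(K, D, logL, N) - 2.0f * E;
}

// Hypothetical helper: index of the smallest score, or -1 for an empty list.
inline int selectBest(const std::vector<float>& scores)
{
   int best = -1;

   for ( std::size_t i = 0; i < scores.size(); ++i )
   {
      if ( best == -1 || scores[i] < scores[best] )
      {
         best = static_cast<int>(i);
      }
   }

   return best;
}
```

A caller could fit one model per candidate K, collect one score per model, and pass the scores to `selectBest` to pick the number of clusters.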
diff --git a/src/core/pairwise_gmm.h b/src/core/pairwise_gmm.h
index e097b58..61a0de4 100644
--- a/src/core/pairwise_gmm.h
+++ b/src/core/pairwise_gmm.h
@@ -1,50 +1,73 @@
 #ifndef PAIRWISE_GMM_H
 #define PAIRWISE_GMM_H
-#include "pairwise_clustering.h"
+#include "pairwise_clusteringmodel.h"
 
 namespace Pairwise
 {
-   class GMM : public Clustering
+   /*!
+    * This class implements the Gaussian mixture model.
+    */
+   class GMM : public ClusteringModel
    {
    public:
-      GMM() = default;
-
+      GMM(ExpressionMatrix* emx);
+   public:
       class Component
       {
       public:
          Component() = default;
-
          void initialize(float pi, const Vector2& mu);
-         void prepareCovariance();
-         void calcLogMvNorm(const QVector<Vector2>& X, int N, float *logP);
-
+         void prepare();
+         void computeLogProbNorm(const QVector<Vector2>& X, int N, float *logP);
+      public:
+         /*!
+          * The mixture weight.
+          */
          float _pi;
+         /*!
+          * The mean.
+          */
          Vector2 _mu;
+         /*!
+          * The covariance matrix.
+          */
          Matrix2x2 _sigma;
-
       private:
+         /*!
+          * The precision matrix, or inverse of the covariance matrix.
+          */
          Matrix2x2 _sigmaInv;
+         /*!
+          * A normalization term which is pre-computed for the multivariate
+          * normal distribution function.
+          */
          float _normalizer;
       };
-
    protected:
       bool fit(const QVector<Vector2>& X, int N, int K, QVector<qint8>& labels);
-
       float logLikelihood() const { return _logL; }
       float entropy() const { return _entropy; }
-
+      float computeAIC(int K, int D, float logL);
+      float computeBIC(int K, int D, float logL, int N);
+      float computeICL(int K, int D, float logL, int N, float E);
    private:
-      void kmeans(const QVector<Vector2>& X, int N);
-      void calcLogMvNorm(const QVector<Vector2>& X, int N, float *loggamma);
-      void calcLogLikelihoodAndGammaNK(const float *logpi, int K, float *loggamma, int N, float *logL);
-      void calcLogGammaK(const float *loggamma, int N, int K, float *logGamma);
-      float calcLogGammaSum(const float *logpi, int K, const float *logGamma);
-      void performMStep(float *logpi, int K, float *loggamma, float *logGamma, float logGammaSum, const QVector<Vector2>& X, int N);
-      void calcLabels(float *loggamma, int N, int K, QVector<qint8>& labels);
-      float calcEntropy(float *loggamma, int N, const QVector<qint8>& labels);
-
+      void initializeMeans(const QVector<Vector2>& X, int N);
+      float computeEStep(const QVector<Vector2>& X, int N, float *gamma);
+      void computeMStep(const QVector<Vector2>& X, int N, const float *gamma);
+      void computeLabels(const float *gamma, int N, int K, QVector<qint8>& labels);
+      float computeEntropy(const float *gamma, int N, const QVector<qint8>& labels);
+      /*!
+       * The list of mixture components, which define the mean and covariance
+       * of each cluster in the mixture model.
+       */
       QVector<Component> _components;
+      /*!
+       * The log-likelihood of the mixture model.
+       */
       float _logL;
+      /*!
+       * The entropy of the mixture model.
+       */
       float _entropy;
    };
 }
diff --git a/src/core/pairwise_index.cpp b/src/core/pairwise_index.cpp
index 8a45c76..094f03d 100644
--- a/src/core/pairwise_index.cpp
+++ b/src/core/pairwise_index.cpp
@@ -10,10 +10,19 @@ using namespace Pairwise;
 
 
 
+/*!
+ * Construct a pairwise index from a row index and a column index. The row
+ * index must be greater than the column index.
+ *
+ * @param x
+ * @param y
+ */
 Index::Index(qint32 x, qint32 y):
    _x(x),
    _y(y)
 {
+   EDEBUG_FUNC(this,x,y);
+
    // make sure pairwise index is valid
    if ( x < 1 || y < 0 || x <= y )
    {
@@ -29,10 +38,16 @@ Index::Index(qint32 x, qint32 y):
 
 
 
-Index::Index(qint64 index):
-   _x(1),
-   _y(0)
+/*!
+ * Construct a pairwise index from a one-dimensional index, which corresponds
+ * to the i-th element in the lower triangle of a matrix using row-major order.
+ *
+ * @param index
+ */
+Index::Index(qint64 index)
 {
+   EDEBUG_FUNC(this,index);
+
    // make sure index is valid
    if ( index < 0 )
    {
@@ -44,15 +59,15 @@ Index::Index(qint64 index):
 
    // compute pairwise index from scalar index
    qint64 pos {0};
-   while ( pos <= index )
+   qint64 x {0};
+
+   while ( pos + x <= index )
    {
-      ++_x;
-      pos = _x * (_x - 1) / 2;
+      pos += x;
+      ++x;
    }
 
-   --_x;
-   pos = _x * (_x - 1) / 2;
-
+   _x = x;
    _y = index - pos;
 }
 
@@ -61,8 +76,15 @@ Index::Index(qint64 index):
 
 
 
+/*!
+ * Return the indent value of this pairwise index with a given cluster index.
+ *
+ * @param cluster
+ */
 qint64 Index::indent(qint8 cluster) const
 {
+   EDEBUG_FUNC(this,cluster);
+
    // make sure cluster given is valid
    if ( cluster < 0 || cluster >= MAX_CLUSTER_SIZE )
    {
@@ -82,8 +104,13 @@ qint64 Index::indent(qint8 cluster) const
 
 
 
+/*!
+ * Increment a pairwise index to the next element.
+ */
 void Index::operator++()
 {
+   EDEBUG_FUNC(this);
+
    // increment gene y and check if it reaches gene x
    if ( ++_y >= _x )
    {
@@ -92,16 +119,3 @@ void Index::operator++()
       ++_x;
    }
 }
-
-
-
-
-
-
-Index Index::operator++(int)
-{
-   // save index value, increment it, and return previous value
-   Index ret {*this};
-   ++(*this);
-   return ret;
-}
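Reviewer sketch (not part of the patch): the rewritten loop in `Index::Index(qint64)` maps a scalar index to a (row, column) pair in the lower triangle. The stand-alone version below uses a hypothetical `PairIndex` struct and hypothetical `fromScalar` / `toScalar` functions, with `std::int64_t` in place of the Qt integer types, to show that the loop is the inverse of the closed form `x * (x - 1) / 2 + y`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for Pairwise::Index: x is the row, y is the column,
// and y < x always holds (strict lower triangle).
struct PairIndex
{
   std::int64_t x;
   std::int64_t y;
};

// Same loop as the patched constructor. After each iteration pos equals
// x * (x - 1) / 2, the scalar index of the first element in row x.
inline PairIndex fromScalar(std::int64_t index)
{
   assert(index >= 0);

   std::int64_t pos = 0;
   std::int64_t x = 0;

   while ( pos + x <= index )
   {
      pos += x;
      ++x;
   }

   return PairIndex{x, index - pos};
}

// Closed-form inverse: rows 1..x-1 hold x * (x - 1) / 2 elements in total.
inline std::int64_t toScalar(const PairIndex& p)
{
   assert(p.x >= 1 && p.y >= 0 && p.y < p.x);

   return p.x * (p.x - 1) / 2 + p.y;
}
```

The loop runs once per row, so its cost grows with the square root of the index; a closed-form square root would avoid the loop but is not what the patch does.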
diff --git a/src/core/pairwise_index.h b/src/core/pairwise_index.h
index e9ab485..befaca7 100644
--- a/src/core/pairwise_index.h
+++ b/src/core/pairwise_index.h
@@ -6,6 +6,16 @@
 
 namespace Pairwise
 {
+   /*!
+    * This class implements the pairwise index, which provides a way to order
+    * elements in a pairwise matrix and iterate through them. The pairwise index
+    * uses row-major order and uses only the lower triangle of a matrix; that is,
+    * it assumes that the row index is always greater than the column index.
+    * Additionally, the pairwise index provides an "indent" value which can be
+    * used to rank pairs that also have a cluster index; this value requires a
+    * fixed upper bound on the number of clusters, which depends on the data
+    * objects that use this class.
+    */
    class Index
    {
    public:
@@ -20,7 +30,6 @@ namespace Pairwise
       Index& operator=(const Index&) = default;
       Index& operator=(Index&&) = default;
       void operator++();
-      Index operator++(int);
       bool operator==(const Index& object) const
          { return _x == object._x && _y == object._y; }
       bool operator!=(const Index& object)
@@ -33,9 +42,21 @@ namespace Pairwise
          { return !(*this <= object); }
       bool operator>=(const Index& object)
          { return !(*this < object); }
+      /*!
+       * The maximum number of clusters used to compute the indent value
+       * of a pairwise index. Data objects which use the pairwise index should
+       * never attempt to store more than this number of clusters in a single
+       * pair.
+       */
       constexpr static qint8 MAX_CLUSTER_SIZE {64};
    private:
+      /*!
+       * The row index.
+       */
       qint32 _x {1};
+      /*!
+       * The column index.
+       */
       qint32 _y {0};
    };
 }
diff --git a/src/core/pairwise_kmeans.cpp b/src/core/pairwise_kmeans.cpp
deleted file mode 100644
index 34ecb8c..0000000
--- a/src/core/pairwise_kmeans.cpp
+++ /dev/null
@@ -1,127 +0,0 @@
-#include "pairwise_kmeans.h"
-
-
-
-using namespace Pairwise;
-
-
-
-
-
-
-bool KMeans::fit(const QVector<Vector2>& X, int N, int K, QVector<qint8>& labels)
-{
-   const int NUM_INITS = 10;
-   const int MAX_ITERATIONS = 300;
-
-   // repeat with several initializations
-   _logL = -INFINITY;
-
-   for ( int init = 0; init < NUM_INITS; ++init )
-   {
-      // initialize means randomly from X
-      _means.resize(K);
-
-      for ( int k = 0; k < K; ++k )
-      {
-         int i = rand() % N;
-         _means[k] = X[i];
-      }
-
-      // iterate K means until convergence
-      QVector<qint8> y(N);
-      QVector<qint8> y_next(N);
-
-      for ( int t = 0; t < MAX_ITERATIONS; ++t )
-      {
-         // compute new labels
-         for ( int i = 0; i < N; ++i )
-         {
-            // find k that minimizes norm(x_i - mu_k)
-            int min_k = -1;
-            float min_dist;
-
-            for ( int k = 0; k < K; ++k )
-            {
-               float dist = vectorDiffNorm(X[i], _means[k]);
-
-               if ( min_k == -1 || dist < min_dist )
-               {
-                  min_k = k;
-                  min_dist = dist;
-               }
-            }
-
-            y_next[i] = min_k;
-         }
-
-         // check for convergence
-         if ( y == y_next )
-         {
-            break;
-         }
-
-         // update labels
-         std::swap(y, y_next);
-
-         // update means
-         for ( int k = 0; k < K; ++k )
-         {
-            // compute mu_k = mean of all x_i in cluster k
-            int n_k = 0;
-
-            vectorInitZero(_means[k]);
-
-            for ( int i = 0; i < N; ++i )
-            {
-               if ( y[i] == k )
-               {
-                  vectorAdd(_means[k], X[i]);
-                  n_k++;
-               }
-            }
-
-            vectorScale(_means[k], 1.0f / n_k);
-         }
-      }
-
-      // save the run with the greatest log-likelihood
-      float logL = computeLogLikelihood(X, N, y);
-
-      if ( _logL < logL )
-      {
-         _logL = logL;
-         std::swap(labels, y);
-      }
-   }
-
-   return true;
-}
-
-
-
-
-
-
-float KMeans::computeLogLikelihood(const QVector<Vector2>& X, int N, const QVector<qint8>& y)
-{
-   // compute within-class scatter
-   float S = 0;
-
-   for ( int k = 0; k < _means.size(); ++k )
-   {
-      for ( int i = 0; i < N; ++i )
-      {
-         if ( y[i] != k )
-         {
-            continue;
-         }
-
-         float dist = vectorDiffNorm(X[i], _means[k]);
-
-         S += dist * dist;
-      }
-   }
-
-   return -S;
-}
diff --git a/src/core/pairwise_kmeans.h b/src/core/pairwise_kmeans.h
deleted file mode 100644
index 645cea6..0000000
--- a/src/core/pairwise_kmeans.h
+++ /dev/null
@@ -1,26 +0,0 @@
-#ifndef PAIRWISE_KMEANS_H
-#define PAIRWISE_KMEANS_H
-#include "pairwise_clustering.h"
-
-namespace Pairwise
-{
-   class KMeans : public Clustering
-   {
-   public:
-      KMeans() = default;
-
-   protected:
-      bool fit(const QVector<Vector2>& X, int N, int K, QVector<qint8>& labels);
-
-      float logLikelihood() const { return _logL; }
-      float entropy() const { return 0; }
-
-   private:
-      float computeLogLikelihood(const QVector<Vector2>& X, int N, const QVector<qint8>& y);
-
-      QVector<Vector2> _means;
-      float _logL;
-   };
-}
-
-#endif
diff --git a/src/core/pairwise_linalg.cpp b/src/core/pairwise_linalg.cpp
index aaf6bb2..1d63d34 100644
--- a/src/core/pairwise_linalg.cpp
+++ b/src/core/pairwise_linalg.cpp
@@ -9,6 +9,13 @@ namespace Pairwise {
 
 
 
+/*!
+ * Return a const reference to the (i, j) element of a matrix.
+ *
+ * @param M
+ * @param i
+ * @param j
+ */
 inline const float& elem(const Matrix2x2& M, int i, int j)
 {
    return M.s[i * 2 + j];
@@ -19,6 +26,13 @@ inline const float& elem(const Matrix2x2& M, int i, int j)
 
 
 
+/*!
+ * Return a mutable reference to the (i, j) element of a matrix.
+ *
+ * @param M
+ * @param i
+ * @param j
+ */
 inline float& elem(Matrix2x2& M, int i, int j)
 {
    return M.s[i * 2 + j];
@@ -29,6 +43,11 @@ inline float& elem(Matrix2x2& M, int i, int j)
 
 
 
+/*!
+ * Initialize a vector to the zero vector.
+ *
+ * @param a
+ */
 void vectorInitZero(Vector2& a)
 {
    a.s[0] = 0;
@@ -40,6 +59,12 @@ void vectorInitZero(Vector2& a)
 
 
 
+/*!
+ * Add two vectors in-place. The result is stored in a.
+ *
+ * @param a
+ * @param b
+ */
 void vectorAdd(Vector2& a, const Vector2& b)
 {
    a.s[0] += b.s[0];
@@ -51,6 +76,14 @@ void vectorAdd(Vector2& a, const Vector2& b)
 
 
 
+/*!
+ * Add two vectors in-place. The vector b is scaled by a constant c, and the
+ * result is stored in a.
+ *
+ * @param a
+ * @param c
+ * @param b
+ */
 void vectorAdd(Vector2& a, float c, const Vector2& b)
 {
    a.s[0] += c * b.s[0];
@@ -62,6 +95,12 @@ void vectorAdd(Vector2& a, float c, const Vector2& b)
 
 
 
+/*!
+ * Subtract two vectors in-place. The result is stored in a.
+ *
+ * @param a
+ * @param b
+ */
 void vectorSubtract(Vector2& a, const Vector2& b)
 {
    a.s[0] -= b.s[0];
@@ -73,6 +112,12 @@ void vectorSubtract(Vector2& a, const Vector2& b)
 
 
 
+/*!
+ * Scale a vector by a constant.
+ *
+ * @param a
+ * @param c
+ */
 void vectorScale(Vector2& a, float c)
 {
    a.s[0] *= c;
@@ -84,6 +129,12 @@ void vectorScale(Vector2& a, float c)
 
 
 
+/*!
+ * Return the dot product of two vectors.
+ *
+ * @param a
+ * @param b
+ */
 float vectorDot(const Vector2& a, const Vector2& b)
 {
    return a.s[0] * b.s[0] + a.s[1] * b.s[1];
@@ -94,6 +145,12 @@ float vectorDot(const Vector2& a, const Vector2& b)
 
 
 
+/*!
+ * Return the Euclidean distance between two vectors.
+ *
+ * @param a
+ * @param b
+ */
 float vectorDiffNorm(const Vector2& a, const Vector2& b)
 {
    float dist = 0;
@@ -108,6 +165,11 @@ float vectorDiffNorm(const Vector2& a, const Vector2& b)
 
 
 
+/*!
+ * Initialize a matrix to the identity matrix.
+ *
+ * @param M
+ */
 void matrixInitIdentity(Matrix2x2& M)
 {
    elem(M, 0, 0) = 1;
@@ -121,6 +183,11 @@ void matrixInitIdentity(Matrix2x2& M)
 
 
 
+/*!
+ * Initialize a matrix to the zero matrix.
+ *
+ * @param M
+ */
 void matrixInitZero(Matrix2x2& M)
 {
    elem(M, 0, 0) = 0;
@@ -134,6 +201,14 @@ void matrixInitZero(Matrix2x2& M)
 
 
 
+/*!
+ * Add two matrices in-place. The matrix B is scaled by a constant c, and the
+ * result is stored in A.
+ *
+ * @param A
+ * @param c
+ * @param B
+ */
 void matrixAdd(Matrix2x2& A, float c, const Matrix2x2& B)
 {
    elem(A, 0, 0) += c * elem(B, 0, 0);
@@ -147,6 +222,12 @@ void matrixAdd(Matrix2x2& A, float c, const Matrix2x2& B)
 
 
 
+/*!
+ * Scale a matrix by a constant.
+ *
+ * @param A
+ * @param c
+ */
 void matrixScale(Matrix2x2& A, float c)
 {
    elem(A, 0, 0) *= c;
@@ -160,6 +241,14 @@ void matrixScale(Matrix2x2& A, float c)
 
 
 
+/*!
+ * Compute the inverse of A and store the result in B. Additionally, the
+ * determinant of A is written to the location pointed to by p_det.
+ *
+ * @param A
+ * @param B
+ * @param p_det
+ */
 void matrixInverse(const Matrix2x2& A, Matrix2x2& B, float *p_det)
 {
    float det = elem(A, 0, 0) * elem(A, 1, 1) - elem(A, 0, 1) * elem(A, 1, 0);
@@ -177,6 +266,13 @@ void matrixInverse(const Matrix2x2& A, Matrix2x2& B, float *p_det)
 
 
 
+/*!
+ * Compute the matrix-vector product A * x and store the result in b.
+ *
+ * @param A
+ * @param x
+ * @param b
+ */
 void matrixProduct(const Matrix2x2& A, const Vector2& x, Vector2& b)
 {
    b.s[0] = elem(A, 0, 0) * x.s[0] + elem(A, 0, 1) * x.s[1];
@@ -188,6 +284,13 @@ void matrixProduct(const Matrix2x2& A, const Vector2& x, Vector2& b)
 
 
 
+/*!
+ * Compute the outer product a * b^T and store the result in C.
+ *
+ * @param a
+ * @param b
+ * @param C
+ */
 void matrixOuterProduct(const Vector2& a, const Vector2& b, Matrix2x2& C)
 {
    elem(C, 0, 0) = a.s[0] * b.s[0];
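Reviewer sketch (not part of the patch): the diff shows only the determinant line of `matrixInverse`, so the version below is an assumption about how a closed-form 2x2 inverse is usually written, not a copy of the project's code. `Mat2`, `at` and `inverse` are hypothetical names; unlike the patched function, this sketch returns `false` for a singular matrix instead of returning `void`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for Matrix2x2, stored row-major as in elem().
struct Mat2
{
   float s[4];
};

inline float& at(Mat2& M, int i, int j) { return M.s[i * 2 + j]; }
inline const float& at(const Mat2& M, int i, int j) { return M.s[i * 2 + j]; }

// Invert A into B using the adjugate formula and report the determinant
// through p_det. Returns false (leaving B untouched) when A is singular.
inline bool inverse(const Mat2& A, Mat2& B, float* p_det)
{
   assert(p_det != nullptr);

   float det = at(A, 0, 0) * at(A, 1, 1) - at(A, 0, 1) * at(A, 1, 0);
   *p_det = det;

   if ( det == 0.0f )
   {
      return false;
   }

   at(B, 0, 0) = +at(A, 1, 1) / det;
   at(B, 0, 1) = -at(A, 0, 1) / det;
   at(B, 1, 0) = -at(A, 1, 0) / det;
   at(B, 1, 1) = +at(A, 0, 0) / det;

   return true;
}
```

Writing the four entries out directly, without loops, matches the approach that the `pairwise_linalg.h` comment describes for fixed-size types.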
diff --git a/src/core/pairwise_linalg.h b/src/core/pairwise_linalg.h
index b135e87..70109df 100644
--- a/src/core/pairwise_linalg.h
+++ b/src/core/pairwise_linalg.h
@@ -2,6 +2,13 @@
 #define PAIRWISE_LINALG_H
 #include <ace/core/core.h>
 
+/*!
+ * This file provides structure and function definitions for the Vector2 and
+ * Matrix2x2 types, which are vector and matrix types with fixed dimensions.
+ * The operations defined for these types compute outputs directly without the
+ * use of loops. These types are useful for any algorithm that operates on
+ * pairwise data.
+ */
 namespace Pairwise
 {
    typedef union {
diff --git a/src/core/pairwise_matrix.cpp b/src/core/pairwise_matrix.cpp
index ae8d82c..9386730 100644
--- a/src/core/pairwise_matrix.cpp
+++ b/src/core/pairwise_matrix.cpp
@@ -9,9 +9,16 @@ using namespace Pairwise;
 
 
 
+/*!
+ * Return the index of the first byte in this data object after the end of
+ * the data section. Defined as the size of the header and sub-header plus the
+ * total size of all pairs.
+ */
 qint64 Matrix::dataEnd() const
 {
-   return _headerSize + _offset + _clusterSize * (_dataSize + _itemHeaderSize);
+   EDEBUG_FUNC(this);
+
+   return _headerSize + _subHeaderSize + _clusterSize * (_dataSize + _itemHeaderSize);
 }
 
 
@@ -19,11 +26,20 @@ qint64 Matrix::dataEnd() const
 
 
 
+/*!
+ * Read in the data of an existing data object that was just opened.
+ */
 void Matrix::readData()
 {
-   // read header
+   EDEBUG_FUNC(this);
+
+   // seek to the beginning of the data
    seek(0);
-   stream() >> _geneSize >> _maxClusterSize >> _dataSize >> _pairSize >> _clusterSize >> _offset;
+
+   // read the header
+   stream() >> _geneSize >> _maxClusterSize >> _dataSize >> _pairSize >> _clusterSize >> _subHeaderSize;
+
+   // read the sub-header
    readHeader();
 }
 
@@ -32,14 +48,23 @@ void Matrix::readData()
 
 
 
+/*!
+ * Initialize this data object's data to a null state.
+ */
 void Matrix::writeNewData()
 {
+   EDEBUG_FUNC(this);
+
    // initialize metadata
-   setMeta(EMetadata(EMetadata::Object));
+   setMeta(EMetaObject());
 
-   // initialize header
+   // seek to the beginning of the data
    seek(0);
-   stream() << _geneSize << _maxClusterSize << _dataSize << _pairSize << _clusterSize << _offset;
+
+   // write the header
+   stream() << _geneSize << _maxClusterSize << _dataSize << _pairSize << _clusterSize << _subHeaderSize;
+
+   // write the sub-header
    writeHeader();
 }
 
@@ -48,11 +73,21 @@ void Matrix::writeNewData()
 
 
 
+/*!
+ * Finalize this data object's data after the analytic that created it has
+ * finished giving it new data.
+ */
 void Matrix::finish()
 {
-   // initialize header
+   EDEBUG_FUNC(this);
+
+   // seek to the beginning of the data
    seek(0);
-   stream() << _geneSize << _maxClusterSize << _dataSize << _pairSize << _clusterSize << _offset;
+
+   // write the header
+   stream() << _geneSize << _maxClusterSize << _dataSize << _pairSize << _clusterSize << _subHeaderSize;
+
+   // write the sub-header
    writeHeader();
 }
 
@@ -61,9 +96,14 @@ void Matrix::finish()
 
 
 
-EMetadata Matrix::geneNames() const
+/*!
+ * Return the list of gene names in this pairwise matrix.
+ */
+EMetaArray Matrix::geneNames() const
 {
-   return meta().toObject().at("genes");
+   EDEBUG_FUNC(this);
+
+   return meta().toObject().at("genes").toArray();
 }
 
 
@@ -71,19 +111,30 @@ EMetadata Matrix::geneNames() const
 
 
 
-void Matrix::initialize(const EMetadata& geneNames, int maxClusterSize, int dataSize, int offset)
+/*!
+ * Initialize this pairwise matrix with a list of gene names, the max cluster
+ * size, the pairwise data size, and the sub-header size.
+ *
+ * @param geneNames
+ * @param maxClusterSize
+ * @param dataSize
+ * @param subHeaderSize
+ */
+void Matrix::initialize(const EMetaArray& geneNames, int maxClusterSize, int dataSize, int subHeaderSize)
 {
-   // make sure gene names metadata is an array and is not empty
-   if ( !geneNames.isArray() || geneNames.toArray().isEmpty() )
+   EDEBUG_FUNC(this,&geneNames,maxClusterSize,dataSize,subHeaderSize);
+
+   // make sure gene names metadata is not empty
+   if ( geneNames.isEmpty() )
    {
       E_MAKE_EXCEPTION(e);
       e.setTitle(tr("Domain Error"));
-      e.setDetails(tr("Gene names metadata is not an array or is empty."));
+      e.setDetails(tr("Gene names metadata is empty."));
       throw e;
    }
 
    // make sure arguments are valid
-   if ( maxClusterSize < 1 || dataSize < 1 || offset < 0 )
+   if ( maxClusterSize < 1 || dataSize < 1 || subHeaderSize < 0 )
    {
       E_MAKE_EXCEPTION(e);
       e.setTitle(tr("Pairwise Matrix Initialization Error"));
@@ -105,10 +156,10 @@ void Matrix::initialize(const EMetadata& geneNames, int maxClusterSize, int data
    setMeta(metaObject);
 
    // initiailze new data within object
-   _geneSize = geneNames.toArray().size();
+   _geneSize = geneNames.size();
    _maxClusterSize = maxClusterSize;
    _dataSize = dataSize;
-   _offset = offset;
+   _subHeaderSize = subHeaderSize;
    _pairSize = 0;
    _clusterSize = 0;
    _lastWrite = -1;
@@ -119,8 +170,16 @@ void Matrix::initialize(const EMetadata& geneNames, int maxClusterSize, int data
 
 
 
-void Matrix::write(Index index, qint8 cluster)
+/*!
+ * Write the header of a new pair given a pairwise index and cluster index.
+ *
+ * @param index
+ * @param cluster
+ */
+void Matrix::write(const Index& index, qint8 cluster)
 {
+   EDEBUG_FUNC(this,&index,cluster);
+
    // make sure this is new data object that can be written to
    if ( _lastWrite == -2 )
    {
@@ -142,7 +201,7 @@ void Matrix::write(Index index, qint8 cluster)
    }
 
    // seek to position for next pair and write indent value
-   seek(_headerSize + _offset + _clusterSize * (_dataSize + _itemHeaderSize));
+   seek(_headerSize + _subHeaderSize + _clusterSize * (_dataSize + _itemHeaderSize));
    stream() << index.getX() << index.getY() << cluster;
 
    // increment cluster size and set new last index
@@ -155,9 +214,18 @@ void Matrix::write(Index index, qint8 cluster)
 
 
 
+/*!
+ * Get a pair at the given index in the data object file and return the
+ * pairwise index and cluster index of that pair.
+ *
+ * @param index
+ * @param cluster
+ */
 Index Matrix::getPair(qint64 index, qint8* cluster) const
 {
-   // seek to pairwise index and read item header data
+   EDEBUG_FUNC(this,index,cluster);
+
+   // seek to index and read item header data
    seekPair(index);
    qint32 geneX;
    qint32 geneY;
@@ -172,8 +240,17 @@ Index Matrix::getPair(qint64 index, qint8* cluster) const
 
 
 
+/*!
+ * Find a pair with a given indent value using binary search.
+ *
+ * @param indent
+ * @param first
+ * @param last
+ */
 qint64 Matrix::findPair(qint64 indent, qint64 first, qint64 last) const
 {
+   EDEBUG_FUNC(this,indent,first,last);
+
    // calculate the midway pivot point and seek to it
    qint64 pivot {first + (last - first)/2};
    seekPair(pivot);
@@ -227,8 +304,15 @@ qint64 Matrix::findPair(qint64 indent, qint64 first, qint64 last) const
 
 
 
+/*!
+ * Seek to the pair at the given index in the data object file.
+ *
+ * @param index
+ */
 void Matrix::seekPair(qint64 index) const
 {
+   EDEBUG_FUNC(this,index);
+
    // make sure index is within range
    if ( index < 0 || index >= _clusterSize )
    {
@@ -239,119 +323,6 @@ void Matrix::seekPair(qint64 index) const
       throw e;
    }
 
-   // seek to pairwise index requested making sure it worked
-   seek(_headerSize + _offset + index * (_dataSize + _itemHeaderSize));
-}
-
-
-
-
-
-
-void Matrix::Pair::write(Index index)
-{
-   // make sure cluster size of pair does not exceed max
-   if ( clusterSize() > _matrix->_maxClusterSize )
-   {
-      E_MAKE_EXCEPTION(e);
-      e.setTitle(tr("Pairwise Logical Error"));
-      e.setDetails(tr("Cannot write pair with cluster size %1 exceeding the max of %2.")
-         .arg(clusterSize())
-         .arg(_matrix->_maxClusterSize));
-      throw e;
-   }
-
-   // go through each cluster and write it to data object
-   for (int i = 0; i < clusterSize() ;++i)
-   {
-      _matrix->write(index,i);
-      writeCluster(_matrix->stream(),i);
-   }
-
-   // increment pair size of data object
-   ++(_matrix->_pairSize);
-}
-
-
-
-
-
-
-void Matrix::Pair::read(Index index) const
-{
-   // clear any existing clusters
-   clearClusters();
-
-   // attempt to find cluster index within data object
-   qint64 clusterIndex;
-   if ( _cMatrix->_clusterSize > 0
-        && (clusterIndex = _cMatrix->findPair(index.indent(0),0,_cMatrix->_clusterSize - 1)) != -1 )
-   {
-      // pair found, read in all clusters
-      _rawIndex = clusterIndex;
-      readNext();
-   }
-}
-
-
-
-
-
-
-void Matrix::Pair::readNext() const
-{
-   // make sure read next index is not already at end of data object
-   if ( _rawIndex < _cMatrix->_clusterSize )
-   {
-      // clear any existing clusters
-      clearClusters();
-
-      // get to first cluster
-      qint8 cluster;
-      Index index {_cMatrix->getPair(_rawIndex++,&cluster)};
-
-      // make sure this is cluster 0
-      if ( cluster != 0 )
-      {
-         E_MAKE_EXCEPTION(e);
-         e.setTitle(tr("File IO Error"));
-         e.setDetails(tr("Reading pair failed because first cluster is not 0."));
-         throw e;
-      }
-
-      // add first cluster, read it in, and save pairwise index
-      addCluster();
-      readCluster(_cMatrix->stream(),0);
-      _index = index;
-
-      // read in remaining clusters for pair
-      qint8 count {1};
-      while ( _rawIndex < _cMatrix->_clusterSize )
-      {
-         // get next pair cluster
-         _cMatrix->getPair(_rawIndex++,&cluster);
-
-         // if cluster is zero this is the next pair so break from loop
-         if ( cluster == 0 )
-         {
-            --_rawIndex;
-            break;
-         }
-
-         // make sure max cluster size has not been exceeded
-         if ( ++count > _cMatrix->_maxClusterSize )
-         {
-            E_MAKE_EXCEPTION(e);
-            e.setTitle(tr("Pairwise Logical Error"));
-            e.setDetails(tr("Cannot read pair with cluster size %1 exceeding the max of %2.")
-               .arg(count)
-               .arg(_matrix->_maxClusterSize));
-            throw e;
-         }
-
-         // add new cluster and read it in
-         addCluster();
-         readCluster(_cMatrix->stream(),cluster);
-      }
-   }
+   // seek to the specified index
+   seek(_headerSize + _subHeaderSize + index * (_dataSize + _itemHeaderSize));
 }
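Reviewer sketch (not part of the patch): the seek expression used in `write`, `seekPair` and `dataEnd` implies a flat file layout of header, sub-header, then fixed-size records. The snippet below reproduces that arithmetic with hypothetical names (`kHeaderSize`, `kItemHeaderSize`, `pairOffset`). It assumes the data stream writes fixed-width integers back to back with no padding, which is consistent with the constants 30 and 9 in `pairwise_matrix.h` but is not confirmed by the diff itself.

```cpp
#include <cassert>
#include <cstdint>

// Header fields in stream order: gene size, max cluster size, data size
// (32-bit each), pair size, cluster size (64-bit each), sub-header size (16-bit).
constexpr std::int64_t kHeaderSize =
   3 * sizeof(std::int32_t) + 2 * sizeof(std::int64_t) + sizeof(std::int16_t);
static_assert(kHeaderSize == 30, "header should occupy 30 bytes");

// Item header fields: row index, column index (32-bit each), cluster index (8-bit).
constexpr std::int64_t kItemHeaderSize =
   2 * sizeof(std::int32_t) + sizeof(std::int8_t);
static_assert(kItemHeaderSize == 9, "item header should occupy 9 bytes");

// Byte offset of the record at the given position in the data section.
inline std::int64_t pairOffset(std::int64_t index, std::int64_t dataSize, std::int64_t subHeaderSize)
{
   assert(index >= 0 && dataSize >= 1 && subHeaderSize >= 0);

   return kHeaderSize + subHeaderSize + index * (dataSize + kItemHeaderSize);
}
```

Because every record has the same size, `findPair` can binary-search the file by record position, and `dataEnd` is simply `pairOffset` evaluated at the total cluster count.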
diff --git a/src/core/pairwise_matrix.h b/src/core/pairwise_matrix.h
index f8c8ab7..21d0e59 100644
--- a/src/core/pairwise_matrix.h
+++ b/src/core/pairwise_matrix.h
@@ -1,5 +1,5 @@
-#ifndef PAIRWISE_BASE_H
-#define PAIRWISE_BASE_H
+#ifndef PAIRWISE_MATRIX_H
+#define PAIRWISE_MATRIX_H
 #include <ace/core/core.h>
 
 #include "pairwise_index.h"
@@ -8,76 +8,80 @@
 
 namespace Pairwise
 {
+   /*!
+    * This class implements the abstract pairwise matrix data object, which can
+    * be extended to represent any pairwise matrix. Both the rows and columns
+    * correspond to genes, and each element (i, j) in the matrix contains
+    * pairwise data for genes i and j. This pairwise data can have multiple clusters,
+    * and the structure of a "pair-cluster" is defined by the inheriting class.
+    * This class stores matrix data as an ordered list of indexed pairs; therefore,
+    * pairwise data must be written in order and it should be sparse for the
+    * storage format to be efficient.
+    */
    class Matrix : public EAbstractData
    {
    public:
       class Pair;
+   public:
       virtual qint64 dataEnd() const override final;
       virtual void readData() override final;
       virtual void writeNewData() override final;
       virtual void finish() override final;
+   public:
       int geneSize() const { return _geneSize; }
       int maxClusterSize() const { return _maxClusterSize; }
       qint64 size() const { return _pairSize; }
-      EMetadata geneNames() const;
+      EMetaArray geneNames() const;
    protected:
       virtual void writeHeader() = 0;
       virtual void readHeader() = 0;
-      void initialize(const EMetadata& geneNames, int maxClusterSize, int dataSize, int offset);
+      void initialize(const EMetaArray& geneNames, int maxClusterSize, int dataSize, int subHeaderSize);
    private:
-      void write(Index index, qint8 cluster);
+      void write(const Index& index, qint8 cluster);
       Index getPair(qint64 index, qint8* cluster) const;
       qint64 findPair(qint64 indent, qint64 first, qint64 last) const;
       void seekPair(qint64 index) const;
+      /*!
+       * The size (in bytes) of the header at the beginning of the file. The header
+       * consists of the gene size, max cluster size, pairwise data size, total
+       * number of pairs, total number of clusters, and sub-header size.
+       */
       constexpr static int _headerSize {30};
+      /*!
+       * The size (in bytes) of the item header that precedes each pair-cluster. The
+       * item header consists of the row index, column index, and cluster index.
+       */
       constexpr static int _itemHeaderSize {9};
+      /*!
+       * The number of genes in the pairwise matrix.
+       */
       qint32 _geneSize {0};
+      /*!
+       * The maximum number of clusters allowed for each pair in the matrix.
+       */
       qint32 _maxClusterSize {0};
+      /*!
+       * The size (in bytes) of a pairwise data element.
+       */
       qint32 _dataSize {0};
+      /*!
+       * The total number of pairs in the matrix.
+       */
       qint64 _pairSize {0};
+      /*!
+       * The total number of clusters (across all pairs) in the matrix.
+       */
       qint64 _clusterSize {0};
-      qint16 _offset {0};
+      /*!
+       * The size (in bytes) of the sub-header, which occurs after the header
+       * and can be used by an inheriting class.
+       */
+      qint16 _subHeaderSize {0};
+      /*!
+       * The index of the last pair that was written to the matrix.
+       */
       qint64 _lastWrite {-2};
    };
-
-
-
-   class Matrix::Pair
-   {
-   public:
-      Pair(Matrix* matrix):
-         _matrix(matrix),
-         _cMatrix(matrix),
-         _index({matrix->_geneSize,0})
-         {}
-      Pair(const Matrix* matrix):
-         _cMatrix(matrix),
-         _index({matrix->_geneSize,0})
-         {}
-      Pair() = default;
-      Pair(const Pair&) = default;
-      Pair(Pair&&) = default;
-      virtual void clearClusters() const = 0;
-      virtual void addCluster(int amount = 1) const = 0;
-      virtual int clusterSize() const = 0;
-      virtual bool isEmpty() const = 0;
-      void write(Index index);
-      void read(Index index) const;
-      void reset() const { _rawIndex = 0; };
-      void readNext() const;
-      bool hasNext() const { return _rawIndex != _cMatrix->_clusterSize; }
-      const Index& index() const { return _index; }
-      Pair& operator=(const Pair&) = default;
-      Pair& operator=(Pair&&) = default;
-   protected:
-      virtual void writeCluster(EDataStream& stream, int cluster) = 0;
-      virtual void readCluster(const EDataStream& stream, int cluster) const = 0;
-   private:
-      Matrix* _matrix {nullptr};
-      const Matrix* _cMatrix;
-      mutable qint64 _rawIndex {0};
-      mutable Index _index;
-   };
 }
 
 
diff --git a/src/core/pairwise_matrix_pair.cpp b/src/core/pairwise_matrix_pair.cpp
new file mode 100644
index 0000000..823832a
--- /dev/null
+++ b/src/core/pairwise_matrix_pair.cpp
@@ -0,0 +1,138 @@
+#include "pairwise_matrix_pair.h"
+
+
+
+using namespace Pairwise;
+
+
+
+
+
+
+/*!
+ * Write the iterator's pairwise data to the data object file with the given
+ * pairwise index.
+ *
+ * @param index
+ */
+void Matrix::Pair::write(const Index& index)
+{
+   EDEBUG_FUNC(this,&index);
+
+   // make sure cluster size of pair does not exceed max
+   if ( clusterSize() > _matrix->_maxClusterSize )
+   {
+      E_MAKE_EXCEPTION(e);
+      e.setTitle(tr("Pairwise Logical Error"));
+      e.setDetails(tr("Cannot write pair with cluster size %1 exceeding the max of %2.")
+         .arg(clusterSize())
+         .arg(_matrix->_maxClusterSize));
+      throw e;
+   }
+
+   // go through each cluster and write it to data object
+   for (int i = 0; i < clusterSize() ;++i)
+   {
+      _matrix->write(index,i);
+      writeCluster(_matrix->stream(),i);
+   }
+
+   // increment pair size of data object
+   ++(_matrix->_pairSize);
+}
+
+
+
+
+
+
+/*!
+ * Read the pair with the given pairwise index from the data object file.
+ *
+ * @param index
+ */
+void Matrix::Pair::read(const Index& index) const
+{
+   EDEBUG_FUNC(this,&index);
+
+   // clear any existing clusters
+   clearClusters();
+
+   // attempt to find cluster index within data object
+   qint64 clusterIndex;
+   if ( _cMatrix->_clusterSize > 0
+        && (clusterIndex = _cMatrix->findPair(index.indent(0),0,_cMatrix->_clusterSize - 1)) != -1 )
+   {
+      // pair found, read in all clusters
+      _rawIndex = clusterIndex;
+      readNext();
+   }
+}
+
+
+
+
+
+
+/*!
+ * Read the next pair in the data object file.
+ */
+void Matrix::Pair::readNext() const
+{
+   EDEBUG_FUNC(this);
+
+   // make sure read next index is not already at end of data object
+   if ( _rawIndex < _cMatrix->_clusterSize )
+   {
+      // clear any existing clusters
+      clearClusters();
+
+      // get to first cluster
+      qint8 cluster;
+      Index index {_cMatrix->getPair(_rawIndex++,&cluster)};
+
+      // make sure this is cluster 0
+      if ( cluster != 0 )
+      {
+         E_MAKE_EXCEPTION(e);
+         e.setTitle(tr("File IO Error"));
+         e.setDetails(tr("Reading pair failed because first cluster is not 0."));
+         throw e;
+      }
+
+      // add first cluster, read it in, and save pairwise index
+      addCluster();
+      readCluster(_cMatrix->stream(),0);
+      _index = index;
+
+      // read in remaining clusters for pair
+      qint8 count {1};
+      while ( _rawIndex < _cMatrix->_clusterSize )
+      {
+         // get next pair cluster
+         _cMatrix->getPair(_rawIndex++,&cluster);
+
+         // if cluster is zero this is the next pair so break from loop
+         if ( cluster == 0 )
+         {
+            --_rawIndex;
+            break;
+         }
+
+         // make sure max cluster size has not been exceeded
+         if ( ++count > _cMatrix->_maxClusterSize )
+         {
+            E_MAKE_EXCEPTION(e);
+            e.setTitle(tr("Pairwise Logical Error"));
+            e.setDetails(tr("Cannot read pair with cluster size %1 exceeding the max of %2.")
+               .arg(count)
+               .arg(_cMatrix->_maxClusterSize));
+            throw e;
+         }
+
+         // add new cluster and read it in
+         addCluster();
+         readCluster(_cMatrix->stream(),cluster);
+      }
+   }
+}
diff --git a/src/core/pairwise_matrix_pair.h b/src/core/pairwise_matrix_pair.h
new file mode 100644
index 0000000..e116e49
--- /dev/null
+++ b/src/core/pairwise_matrix_pair.h
@@ -0,0 +1,65 @@
+#ifndef PAIRWISE_MATRIX_PAIR_H
+#define PAIRWISE_MATRIX_PAIR_H
+#include "pairwise_matrix.h"
+
+
+
+namespace Pairwise
+{
+   /*!
+    * This class implements the pairwise iterator for the pairwise matrix
+    * data object. The pairwise iterator can read from or write to any pair in
+    * the pairwise matrix, or it can simply iterate through each pair. The
+    * iterator stores only one pair in memory at a time.
+    */
+   class Matrix::Pair
+   {
+   public:
+      Pair(Matrix* matrix):
+         _matrix(matrix),
+         _cMatrix(matrix)
+         {}
+      Pair(const Matrix* matrix):
+         _cMatrix(matrix)
+         {}
+      Pair() = default;
+      Pair(const Pair&) = default;
+      Pair(Pair&&) = default;
+      virtual void clearClusters() const = 0;
+      virtual void addCluster(int amount = 1) const = 0;
+      virtual int clusterSize() const = 0;
+      virtual bool isEmpty() const = 0;
+      void write(const Index& index);
+      void read(const Index& index) const;
+      void reset() const { _rawIndex = 0; }
+      void readNext() const;
+      bool hasNext() const { return _rawIndex != _cMatrix->_clusterSize; }
+      const Index& index() const { return _index; }
+      Pair& operator=(const Pair&) = default;
+      Pair& operator=(Pair&&) = default;
+   protected:
+      virtual void writeCluster(EDataStream& stream, int cluster) = 0;
+      virtual void readCluster(const EDataStream& stream, int cluster) const = 0;
+   private:
+      /*!
+       * Pointer to the parent pairwise matrix.
+       */
+      Matrix* _matrix {nullptr};
+      /*!
+       * Pointer to the parent pairwise matrix for read-only (const) access.
+       */
+      const Matrix* _cMatrix;
+      /*!
+       * The iterator's current position in the pairwise matrix.
+       */
+      mutable qint64 _rawIndex {0};
+      /*!
+       * Pairwise index corresponding to the iterator's position.
+       */
+      mutable Index _index;
+   };
+}
+
+
+
+#endif
diff --git a/src/core/pairwise_pearson.cpp b/src/core/pairwise_pearson.cpp
index 871587b..163569b 100644
--- a/src/core/pairwise_pearson.cpp
+++ b/src/core/pairwise_pearson.cpp
@@ -9,6 +9,15 @@ using namespace Pairwise;
 
 
 
+/*!
+ * Compute the Pearson correlation of a cluster in a pairwise data array. The
+ * data array should only contain samples that have a non-negative label.
+ *
+ * @param data
+ * @param labels
+ * @param cluster
+ * @param minSamples
+ */
 float Pearson::computeCluster(
    const QVector<Vector2>& data,
    const QVector<qint8>& labels,
@@ -23,20 +32,25 @@ float Pearson::computeCluster(
    float sumy2 = 0;
    float sumxy = 0;
 
-   for ( int i = 0; i < labels.size(); ++i )
+   for ( int i = 0, j = 0; i < labels.size(); ++i )
    {
-      if ( labels[i] == cluster )
+      if ( labels[i] >= 0 )
       {
-         float x_i = data[i].s[0];
-         float y_i = data[i].s[1];
+         if ( labels[i] == cluster )
+         {
+            float x_i = data[j].s[0];
+            float y_i = data[j].s[1];
 
-         sumx += x_i;
-         sumy += y_i;
-         sumx2 += x_i * x_i;
-         sumy2 += y_i * y_i;
-         sumxy += x_i * y_i;
+            sumx += x_i;
+            sumy += y_i;
+            sumx2 += x_i * x_i;
+            sumy2 += y_i * y_i;
+            sumxy += x_i * y_i;
 
-         ++n;
+            ++n;
+         }
+
+         ++j;
       }
    }
 
diff --git a/src/core/pairwise_pearson.h b/src/core/pairwise_pearson.h
index e3a4529..72e25bc 100644
--- a/src/core/pairwise_pearson.h
+++ b/src/core/pairwise_pearson.h
@@ -1,15 +1,14 @@
 #ifndef PAIRWISE_PEARSON_H
 #define PAIRWISE_PEARSON_H
-#include "pairwise_correlation.h"
+#include "pairwise_correlationmodel.h"
 
 namespace Pairwise
 {
-   class Pearson : public Correlation
+   /*!
+    * This class implements the Pearson correlation model.
+    */
+   class Pearson : public CorrelationModel
    {
-   public:
-      void initialize(ExpressionMatrix* /*input*/) {}
-      QString getName() const { return "pearson"; }
-
    protected:
       float computeCluster(
          const QVector<Vector2>& data,
diff --git a/src/core/pairwise_spearman.cpp b/src/core/pairwise_spearman.cpp
index a71ddcd..b85387f 100644
--- a/src/core/pairwise_spearman.cpp
+++ b/src/core/pairwise_spearman.cpp
@@ -9,10 +9,36 @@ using namespace Pairwise;
 
 
 
-void Spearman::initialize(ExpressionMatrix* input)
+/*!
+ * Compute the smallest power of 2 (at least 2) that is not less than a number.
+ *
+ * @param n
+ */
+int Spearman::nextPower2(int n)
+{
+   int pow2 = 2;
+   while ( pow2 < n )
+   {
+      pow2 *= 2;
+   }
+
+   return pow2;
+}
+
+
+
+
+
+
+/*!
+ * Construct a Spearman correlation model.
+ *
+ * @param emx
+ */
+Spearman::Spearman(ExpressionMatrix* emx)
 {
    // pre-allocate workspace
-   int workSize = nextPower2(input->getSampleSize());
+   int workSize = nextPower2(emx->sampleSize());
 
    _x.resize(workSize);
    _y.resize(workSize);
@@ -24,13 +50,22 @@ void Spearman::initialize(ExpressionMatrix* input)
 
 
 
+/*!
+ * Compute the Spearman correlation of a cluster in a pairwise data array. The
+ * data array should only contain samples that have a non-negative label.
+ *
+ * @param data
+ * @param labels
+ * @param cluster
+ * @param minSamples
+ */
 float Spearman::computeCluster(
    const QVector<Vector2>& data,
    const QVector<qint8>& labels,
    qint8 cluster,
    int minSamples)
 {
-   // extract samples in gene pair cluster
+   // extract samples in pairwise cluster
    int N_pow2 = nextPower2(labels.size());
    int n = 0;
 
@@ -90,22 +125,15 @@ float Spearman::computeCluster(
 
 
 
-int Spearman::nextPower2(int n)
-{
-   int pow2 = 2;
-   while ( pow2 < n )
-   {
-      pow2 *= 2;
-   }
-
-   return pow2;
-}
-
-
-
-
-
-
+/*!
+ * Sort a list using bitonic sort, while also applying the same swap operations
+ * to a second list of the same size. The lists should have a size which is a
+ * power of two.
+ *
+ * @param size
+ * @param sortList
+ * @param extraList
+ */
 void Spearman::bitonicSort(int size, QVector<float>& sortList, QVector<float>& extraList)
 {
    // initialize all variables
@@ -138,6 +166,15 @@ void Spearman::bitonicSort(int size, QVector<float>& sortList, QVector<float>& e
 
 
 
+/*!
+ * Sort a list using bitonic sort, while also applying the same swap operations
+ * to a second list of the same size. The lists should have a size which is a
+ * power of two.
+ *
+ * @param size
+ * @param sortList
+ * @param extraList
+ */
 void Spearman::bitonicSort(int size, QVector<float>& sortList, QVector<int>& extraList)
 {
    // initialize all variables
diff --git a/src/core/pairwise_spearman.h b/src/core/pairwise_spearman.h
index 3c27581..394046c 100644
--- a/src/core/pairwise_spearman.h
+++ b/src/core/pairwise_spearman.h
@@ -1,15 +1,18 @@
 #ifndef PAIRWISE_SPEARMAN_H
 #define PAIRWISE_SPEARMAN_H
-#include "pairwise_correlation.h"
+#include "pairwise_correlationmodel.h"
+#include "expressionmatrix.h"
 
 namespace Pairwise
 {
-   class Spearman : public Correlation
+   /*!
+    * This class implements the Spearman correlation model.
+    */
+   class Spearman : public CorrelationModel
    {
    public:
-      void initialize(ExpressionMatrix* input);
-      QString getName() const { return "spearman"; }
-
+      static int nextPower2(int n);
+      Spearman(ExpressionMatrix* emx);
    protected:
       float computeCluster(
          const QVector<Vector2>& data,
@@ -17,14 +20,20 @@ namespace Pairwise
          qint8 cluster,
          int minSamples
       );
-
    private:
-      int nextPower2(int n);
       void bitonicSort(int size, QVector<float>& sortList, QVector<float>& extraList);
       void bitonicSort(int size, QVector<float>& sortList, QVector<int>& extraList);
-
+      /*!
+       * Workspace for the x data.
+       */
       QVector<float> _x;
+      /*!
+       * Workspace for the y data.
+       */
       QVector<float> _y;
+      /*!
+       * Workspace for the rank data.
+       */
       QVector<float> _rank;
    };
 }
diff --git a/src/core/powerlaw.cpp b/src/core/powerlaw.cpp
new file mode 100644
index 0000000..d695e97
--- /dev/null
+++ b/src/core/powerlaw.cpp
@@ -0,0 +1,365 @@
+#include "powerlaw.h"
+#include "powerlaw_input.h"
+#include "correlationmatrix.h"
+
+
+
+using namespace std;
+using RawPair = CorrelationMatrix::RawPair;
+
+
+
+
+
+
+/*!
+ * Return the total number of blocks this analytic must process as steps
+ * or blocks of work.
+ */
+int PowerLaw::size() const
+{
+   EDEBUG_FUNC(this);
+
+   return 1;
+}
+
+
+
+
+
+
+/*!
+ * Process the given index with a possible block of results if this analytic
+ * produces work blocks. This analytic implementation has no work blocks.
+ *
+ * @param result
+ */
+void PowerLaw::process(const EAbstractAnalytic::Block*)
+{
+   EDEBUG_FUNC(this);
+
+   // initialize log text stream
+   QTextStream stream(_logfile);
+
+   // load raw correlation data, row-wise maximums
+   QVector<RawPair> pairs {_input->dumpRawData()};
+   QVector<float> maximums {computeMaximums(pairs)};
+
+   // continue until network is sufficiently scale-free
+   float threshold {_thresholdStart};
+
+   while ( true )
+   {
+      qInfo("\n");
+      qInfo("threshold: %8.3f", threshold);
+
+      // compute adjacency matrix based on threshold
+      int size;
+      QVector<bool> adjacencyMatrix {computeAdjacencyMatrix(pairs, maximums, threshold, &size)};
+
+      qInfo("adjacency matrix: %d", size);
+
+      // make sure that adjacency matrix is not empty
+      float correlation {0};
+
+      if ( size > 0 )
+      {
+         // compute degree distribution of matrix
+         QVector<int> histogram {computeDegreeDistribution(adjacencyMatrix, size)};
+
+         // compute correlation of degree distribution
+         correlation = computeCorrelation(histogram);
+
+         qInfo("correlation: %8.3f", correlation);
+      }
+
+      // output to log file
+      stream << threshold << "\t" << size << "\t" << correlation << "\n";
+
+      // TODO: break if network is sufficiently scale-free
+
+      // decrement threshold and fail if minimum threshold is reached
+      threshold -= _thresholdStep;
+      if ( threshold < _thresholdStop )
+      {
+         E_MAKE_EXCEPTION(e);
+         e.setTitle(tr("Power-law Threshold Error"));
+         e.setDetails(tr("Could not find scale-free network above stopping threshold."));
+         throw e;
+      }
+   }
+
+   // write final threshold
+   stream << threshold << "\n";
+}
+
+
+
+
+
+
+/*!
+ * Make a new input object and return its pointer.
+ */
+EAbstractAnalytic::Input* PowerLaw::makeInput()
+{
+   EDEBUG_FUNC(this);
+
+   return new Input(this);
+}
+
+
+
+
+
+
+/*!
+ * Initialize this analytic. This implementation checks to make sure the input
+ * data object and output log file have been set, and that the threshold
+ * arguments are valid.
+ */
+void PowerLaw::initialize()
+{
+   EDEBUG_FUNC(this);
+
+   // make sure input and output were set properly
+   if ( !_input || !_logfile )
+   {
+      E_MAKE_EXCEPTION(e);
+      e.setTitle(tr("Invalid Argument"));
+      e.setDetails(tr("Did not get valid input or logfile arguments."));
+      throw e;
+   }
+
+   // make sure threshold arguments are valid
+   if ( _thresholdStart <= _thresholdStop )
+   {
+      E_MAKE_EXCEPTION(e);
+      e.setTitle(tr("Invalid Argument"));
+      e.setDetails(tr("Starting threshold must be greater than stopping threshold."));
+      throw e;
+   }
+}
+
+
+
+
+
+
+/*!
+ * Compute the row-wise maximums of a correlation matrix.
+ *
+ * @param pairs
+ */
+QVector<float> PowerLaw::computeMaximums(const QVector<RawPair>& pairs)
+{
+   EDEBUG_FUNC(this,&pairs);
+
+   // initialize elements to minimum value
+   QVector<float> maximums(_input->geneSize(), 0);
+
+   // compute maximum correlation of each row
+   for ( auto& pair : pairs )
+   {
+      int i = pair.index.getX();
+
+      for ( int k = 0; k < pair.correlations.size(); ++k )
+      {
+         float correlation = fabs(pair.correlations[k]);
+
+         if ( maximums[i] < correlation )
+         {
+            maximums[i] = correlation;
+         }
+      }
+   }
+
+   // return row-wise maximums
+   return maximums;
+}
+
+
+
+
+
+
+/*!
+ * Compute the adjacency matrix of a correlation matrix with a given threshold.
+ * This function uses the pre-computed row-wise maximums for faster computation.
+ * Additionally, all zero-columns are removed. The number of rows in the adjacency
+ * matrix is returned as a pointer argument.
+ *
+ * @param pairs
+ * @param maximums
+ * @param threshold
+ * @param size
+ */
+QVector<bool> PowerLaw::computeAdjacencyMatrix(const QVector<RawPair>& pairs, const QVector<float>& maximums, float threshold, int* size)
+{
+   EDEBUG_FUNC(this,&pairs,&maximums,threshold,size);
+
+   // generate vector of row indices that have a correlation above threshold
+   QVector<int> indices(_input->geneSize(), -1);
+   int pruneSize = 0;
+
+   for ( int i = 0; i < maximums.size(); ++i )
+   {
+      if ( maximums[i] >= threshold )
+      {
+         indices[i] = pruneSize;
+         pruneSize++;
+      }
+   }
+
+   // extract adjacency matrix from correlation matrix
+   QVector<bool> adjacencyMatrix(pruneSize * pruneSize);
+
+   // initialize diagonal
+   for ( int i = 0; i < pruneSize; ++i )
+   {
+      adjacencyMatrix[i * pruneSize + i] = 1;
+   }
+
+   // iterate through all pairs
+   for ( auto& pair : pairs )
+   {
+      // get indices into pruned matrix
+      int i = indices[pair.index.getX()];
+      int j = indices[pair.index.getY()];
+
+      // skip pair if it was pruned
+      if ( i == -1 || j == -1 )
+      {
+         continue;
+      }
+
+      // select correlation from pair
+      float correlation = pair.correlations[0];
+
+      // save correlation if it is above threshold
+      if ( fabs(correlation) >= threshold )
+      {
+         adjacencyMatrix[i * pruneSize + j] = 1;
+         adjacencyMatrix[j * pruneSize + i] = 1;
+      }
+   }
+
+   // save size of adjacency matrix
+   *size = pruneSize;
+
+   // return adjacency matrix
+   return adjacencyMatrix;
+}
+
+
+
+
+
+
+/*!
+ * Compute the degree distribution of an adjacency matrix.
+ *
+ * @param matrix
+ * @param size
+ */
+QVector<int> PowerLaw::computeDegreeDistribution(const QVector<bool>& matrix, int size)
+{
+   EDEBUG_FUNC(this,&matrix,size);
+
+   // compute degree of each node
+   QVector<int> degrees(size);
+
+   for ( int i = 0; i < size; i++ )
+   {
+      for ( int j = 0; j < size; j++ )
+      {
+         degrees[i] += matrix[i * size + j];
+      }
+   }
+
+   // compute max degree
+   int max {0};
+
+   for ( int i = 0; i < degrees.size(); i++ )
+   {
+      if ( max < degrees[i] )
+      {
+         max = degrees[i];
+      }
+   }
+
+   // compute histogram of degrees
+   QVector<int> histogram(max);
+
+   for ( int i = 0; i < degrees.size(); i++ )
+   {
+      if ( degrees[i] > 0 )
+      {
+         histogram[degrees[i] - 1]++;
+      }
+   }
+
+   return histogram;
+}
+
+
+
+
+
+
+/*!
+ * Compare a degree distribution to a power-law distribution. The goodness-of-fit
+ * is measured by the Pearson correlation of the log-transformed histogram.
+ *
+ * @param histogram
+ */
+float PowerLaw::computeCorrelation(const QVector<int>& histogram)
+{
+   EDEBUG_FUNC(this,&histogram);
+
+   // compute log-log transform of histogram data
+   const int n = histogram.size();
+   QVector<float> x(n);
+   QVector<float> y(n);
+
+   for ( int i = 0; i < n; i++ )
+   {
+      x[i] = log(i + 1);
+      y[i] = log(histogram[i] + 1);
+   }
+
+   // visualize log-log histogram
+   qInfo("histogram:");
+
+   for ( int i = 0; i < 10; i++ )
+   {
+      float sum {0};
+      for ( int j = i * n / 10; j < (i + 1) * n / 10; j++ )
+      {
+         sum += y[j];
+      }
+
+      int len {(int)(sum / log((float) _input->geneSize()))};
+      QString bin(len, '#');
+
+      qInfo(" | %s", bin.toStdString().c_str());
+   }
+
+   // compute Pearson correlation of x, y
+   float sumx = 0;
+   float sumy = 0;
+   float sumx2 = 0;
+   float sumy2 = 0;
+   float sumxy = 0;
+
+   for ( int i = 0; i < n; ++i )
+   {
+      sumx += x[i];
+      sumy += y[i];
+      sumx2 += x[i] * x[i];
+      sumy2 += y[i] * y[i];
+      sumxy += x[i] * y[i];
+   }
+
+   return (n*sumxy - sumx*sumy) / sqrt((n*sumx2 - sumx*sumx) * (n*sumy2 - sumy*sumy));
+}
diff --git a/src/core/powerlaw.h b/src/core/powerlaw.h
new file mode 100644
index 0000000..ec4de3c
--- /dev/null
+++ b/src/core/powerlaw.h
@@ -0,0 +1,55 @@
+#ifndef POWERLAW_H
+#define POWERLAW_H
+#include <ace/core/core.h>
+#include "correlationmatrix.h"
+
+
+
+/*!
+ * This class implements the Power-law thresholding analytic. This analytic takes
+ * a correlation matrix and attempts to find a threshold which, when applied to
+ * the correlation matrix, produces a scale-free network. Each thresholded network
+ * is evaluated by comparing the degree distribution of the network to a power-law
+ * distribution. This process is repeated at each threshold step from the starting
+ * threshold.
+ */
+class PowerLaw : public EAbstractAnalytic
+{
+   Q_OBJECT
+public:
+   class Input;
+   virtual int size() const override final;
+   virtual void process(const EAbstractAnalytic::Block* result) override final;
+   virtual EAbstractAnalytic::Input* makeInput() override final;
+   virtual void initialize();
+private:
+   QVector<float> computeMaximums(const QVector<CorrelationMatrix::RawPair>& pairs);
+   QVector<bool> computeAdjacencyMatrix(const QVector<CorrelationMatrix::RawPair>& pairs, const QVector<float>& maximums, float threshold, int* size);
+   QVector<int> computeDegreeDistribution(const QVector<bool>& matrix, int size);
+   float computeCorrelation(const QVector<int>& histogram);
+   /*!
+    * Pointer to the input correlation matrix.
+    */
+   CorrelationMatrix* _input {nullptr};
+   /*!
+    * Pointer to the output log file.
+    */
+   QFile* _logfile {nullptr};
+   /*!
+    * The starting threshold.
+    */
+   float _thresholdStart {0.99};
+   /*!
+    * The threshold decrement.
+    */
+   float _thresholdStep {0.01};
+   /*!
+    * The stopping threshold. The analytic will fail if it cannot find a
+    * proper threshold before reaching the stopping threshold.
+    */
+   float _thresholdStop {0.5};
+};
+
+
+
+#endif
diff --git a/src/core/powerlaw_input.cpp b/src/core/powerlaw_input.cpp
new file mode 100644
index 0000000..3471f6d
--- /dev/null
+++ b/src/core/powerlaw_input.cpp
@@ -0,0 +1,203 @@
+#include "powerlaw_input.h"
+#include "correlationmatrix.h"
+#include "datafactory.h"
+
+
+
+
+
+
+/*!
+ * Construct a new input object with the given analytic as its parent.
+ *
+ * @param parent
+ */
+PowerLaw::Input::Input(PowerLaw* parent):
+   EAbstractAnalytic::Input(parent),
+   _base(parent)
+{
+   EDEBUG_FUNC(this,parent);
+}
+
+
+
+
+
+
+/*!
+ * Return the total number of arguments this analytic type contains.
+ */
+int PowerLaw::Input::size() const
+{
+   EDEBUG_FUNC(this);
+
+   return Total;
+}
+
+
+
+
+
+
+/*!
+ * Return the argument type for a given index.
+ *
+ * @param index
+ */
+EAbstractAnalytic::Input::Type PowerLaw::Input::type(int index) const
+{
+   EDEBUG_FUNC(this,index);
+
+   switch (index)
+   {
+   case InputData: return Type::DataIn;
+   case LogFile: return Type::FileOut;
+   case ThresholdStart: return Type::Double;
+   case ThresholdStep: return Type::Double;
+   case ThresholdStop: return Type::Double;
+   default: return Type::Boolean;
+   }
+}
+
+
+
+
+
+
+/*!
+ * Return data for a given role on an argument with the given index.
+ *
+ * @param index
+ * @param role
+ */
+QVariant PowerLaw::Input::data(int index, Role role) const
+{
+   EDEBUG_FUNC(this,index,role);
+
+   switch (index)
+   {
+   case InputData:
+      switch (role)
+      {
+      case Role::CommandLineName: return QString("input");
+      case Role::Title: return tr("Input:");
+      case Role::WhatsThis: return tr("Correlation matrix for which an appropriate correlation threshold will be found.");
+      case Role::DataType: return DataFactory::CorrelationMatrixType;
+      default: return QVariant();
+      }
+   case LogFile:
+      switch (role)
+      {
+      case Role::CommandLineName: return QString("log");
+      case Role::Title: return tr("Log File:");
+      case Role::WhatsThis: return tr("Output text file that logs all results.");
+      case Role::FileFilters: return tr("Text file %1").arg("(*.txt)");
+      default: return QVariant();
+      }
+   case ThresholdStart:
+      switch (role)
+      {
+      case Role::CommandLineName: return QString("tstart");
+      case Role::Title: return tr("Threshold Start:");
+      case Role::WhatsThis: return tr("Starting threshold.");
+      case Role::Default: return 0.99;
+      case Role::Minimum: return 0;
+      case Role::Maximum: return 1;
+      default: return QVariant();
+      }
+   case ThresholdStep:
+      switch (role)
+      {
+      case Role::CommandLineName: return QString("tstep");
+      case Role::Title: return tr("Threshold Step:");
+      case Role::WhatsThis: return tr("Threshold step size.");
+      case Role::Default: return 0.01;
+      case Role::Minimum: return 0;
+      case Role::Maximum: return 1;
+      default: return QVariant();
+      }
+   case ThresholdStop:
+      switch (role)
+      {
+      case Role::CommandLineName: return QString("tstop");
+      case Role::Title: return tr("Threshold Stop:");
+      case Role::WhatsThis: return tr("Stopping threshold.");
+      case Role::Default: return 0.5;
+      case Role::Minimum: return 0;
+      case Role::Maximum: return 1;
+      default: return QVariant();
+      }
+   default: return QVariant();
+   }
+}
+
+
+
+
+
+
+/*!
+ * Set an argument with the given index to the given value.
+ *
+ * @param index
+ * @param value
+ */
+void PowerLaw::Input::set(int index, const QVariant& value)
+{
+   EDEBUG_FUNC(this,index,&value);
+
+   switch (index)
+   {
+   case ThresholdStart:
+      _base->_thresholdStart = value.toDouble();
+      break;
+   case ThresholdStep:
+      _base->_thresholdStep = value.toDouble();
+      break;
+   case ThresholdStop:
+      _base->_thresholdStop = value.toDouble();
+      break;
+   }
+}
+
+
+
+
+
+
+/*!
+ * Set a file argument with the given index to the given Qt file pointer.
+ *
+ * @param index
+ * @param file
+ */
+void PowerLaw::Input::set(int index, QFile* file)
+{
+   EDEBUG_FUNC(this,index,file);
+
+   if ( index == LogFile )
+   {
+      _base->_logfile = file;
+   }
+}
+
+
+
+
+
+
+/*!
+ * Set a data argument with the given index to the given data object pointer.
+ *
+ * @param index
+ * @param data
+ */
+void PowerLaw::Input::set(int index, EAbstractData* data)
+{
+   EDEBUG_FUNC(this,index,data);
+
+   if ( index == InputData )
+   {
+      _base->_input = data->cast<CorrelationMatrix>();
+   }
+}
diff --git a/src/core/powerlaw_input.h b/src/core/powerlaw_input.h
new file mode 100644
index 0000000..3516c38
--- /dev/null
+++ b/src/core/powerlaw_input.h
@@ -0,0 +1,42 @@
+#ifndef POWERLAW_INPUT_H
+#define POWERLAW_INPUT_H
+#include "powerlaw.h"
+
+
+
+/*!
+ * This class implements the abstract input of the PowerLaw analytic.
+ */
+class PowerLaw::Input : public EAbstractAnalytic::Input
+{
+   Q_OBJECT
+public:
+   /*!
+    * Defines all input arguments for this analytic.
+    */
+   enum Argument
+   {
+      InputData = 0
+      ,LogFile
+      ,ThresholdStart
+      ,ThresholdStep
+      ,ThresholdStop
+      ,Total
+   };
+   explicit Input(PowerLaw* parent);
+   virtual int size() const override final;
+   virtual EAbstractAnalytic::Input::Type type(int index) const override final;
+   virtual QVariant data(int index, Role role) const override final;
+   virtual void set(int index, const QVariant& value) override final;
+   virtual void set(int index, QFile* file) override final;
+   virtual void set(int index, EAbstractData* data) override final;
+private:
+   /*!
+    * Pointer to the base analytic for this object.
+    */
+   PowerLaw* _base;
+};
+
+
+
+#endif
diff --git a/src/core/rmt.cpp b/src/core/rmt.cpp
index 4ab2eea..88e6817 100644
--- a/src/core/rmt.cpp
+++ b/src/core/rmt.cpp
@@ -1,27 +1,28 @@
-#include <memory>
-#include <random>
 #include <gsl/gsl_interp.h>
 #include <gsl/gsl_spline.h>
-#include <gsl/gsl_vector.h>
-#include <gsl/gsl_matrix.h>
-#include <gsl/gsl_eigen.h>
+#include <lapacke.h>
 
 #include "rmt.h"
 #include "rmt_input.h"
-#include "correlationmatrix.h"
-#include "datafactory.h"
 
 
 
 using namespace std;
+using RawPair = CorrelationMatrix::RawPair;
 
 
 
 
 
 
+/*!
+ * Return the total number of blocks this analytic must process as steps
+ * or blocks of work.
+ */
 int RMT::size() const
 {
+   EDEBUG_FUNC(this);
+
    return 1;
 }
 
@@ -30,9 +31,15 @@ int RMT::size() const
 
 
 
-void RMT::process(const EAbstractAnalytic::Block* result)
+/*!
+ * Process the given index with a possible block of results if this analytic
+ * produces work blocks. This analytic implementation has no work blocks.
+ *
+ * @param result
+ */
+void RMT::process(const EAbstractAnalytic::Block*)
 {
-   Q_UNUSED(result);
+   EDEBUG_FUNC(this);
 
    // initialize log text stream
    QTextStream stream(_logfile);
@@ -45,18 +52,18 @@ void RMT::process(const EAbstractAnalytic::Block* result)
    float threshold {_thresholdStart};
 
    // load raw correlation data, row-wise maximums
-   QVector<float> matrix {_input->dumpRawData()};
-   QVector<float> maximums {computeMaximums(matrix)};
+   QVector<RawPair> pairs {_input->dumpRawData()};
+   QVector<float> maximums {computeMaximums(pairs)};
 
    // continue while max chi is less than final threshold
    while ( maxChi < _chiSquareThreshold2 )
    {
       qInfo("\n");
-      qInfo("threshold: %g", threshold);
+      qInfo("threshold: %8.3f", threshold);
 
       // compute pruned matrix based on threshold
       int size;
-      QVector<float> pruneMatrix {computePruneMatrix(matrix, maximums, threshold, &size)};
+      QVector<float> pruneMatrix {computePruneMatrix(pairs, maximums, threshold, &size)};
 
       qInfo("prune matrix: %d", size);
 
@@ -70,23 +77,28 @@ void RMT::process(const EAbstractAnalytic::Block* result)
 
          qInfo("eigenvalues: %d", eigens.size());
 
-         // compute chi-square value from NNSD of eigenvalues
-         chi = computeChiSquare(eigens);
+         // compute unique eigenvalues
+         QVector<float> unique {computeUnique(eigens)};
+
+         qInfo("unique eigenvalues: %d", unique.size());
 
-         qInfo("chi-square: %g", chi);
+         // compute chi-squared value from NNSD of eigenvalues
+         chi = computeChiSquare(unique);
+
+         qInfo("chi-squared: %g", chi);
       }
 
-      // make sure that chi-square test succeeded
+      // make sure that chi-squared test succeeded
       if ( chi != -1 )
       {
-         // save the most recent chi-square value less than critical value
+         // save the most recent chi-squared value less than critical value
          if ( chi < _chiSquareThreshold1 )
          {
             finalChi = chi;
             finalThreshold = threshold;
          }
 
-         // save the largest chi-square value which occurs after finalChi
+         // save the largest chi-squared value which occurs after finalChi
          if ( finalChi < _chiSquareThreshold1 && chi > finalChi )
          {
             maxChi = chi;
@@ -116,8 +128,13 @@ void RMT::process(const EAbstractAnalytic::Block* result)
 
 
 
+/*!
+ * Make a new input object and return its pointer.
+ */
 EAbstractAnalytic::Input* RMT::makeInput()
 {
+   EDEBUG_FUNC(this);
+
    return new Input(this);
 }
 
@@ -126,8 +143,15 @@ EAbstractAnalytic::Input* RMT::makeInput()
 
 
 
+/*!
+ * Initialize this analytic. This implementation checks to make sure the input
+ * data object and output log file have been set, and that various integer
+ * arguments are valid.
+ */
 void RMT::initialize()
 {
+   EDEBUG_FUNC(this);
+
    // make sure input and output were set properly
    if ( !_input || !_logfile )
    {
@@ -147,11 +171,11 @@ void RMT::initialize()
    }
 
    // make sure pace arguments are valid
-   if ( _minUnfoldingPace >= _maxUnfoldingPace )
+   if ( _minSplinePace >= _maxSplinePace )
    {
       E_MAKE_EXCEPTION(e);
       e.setTitle(tr("Invalid Argument"));
-      e.setDetails(tr("Minimum unfolding pace must be less than maximum unfolding pace."));
+      e.setDetails(tr("Minimum spline pace must be less than maximum spline pace."));
       throw e;
    }
 }
@@ -161,36 +185,35 @@ void RMT::initialize()
 
 
 
-QVector<float> RMT::computeMaximums(const QVector<float>& matrix)
+/*!
+ * Compute the row-wise maximums of a correlation matrix.
+ *
+ * @param pairs
+ */
+QVector<float> RMT::computeMaximums(const QVector<RawPair>& pairs)
 {
-   const int N {_input->geneSize()};
-   const int K {_input->maxClusterSize()};
+   EDEBUG_FUNC(this,&pairs);
 
    // initialize elements to minimum value
-   QVector<float> maximums(N * K, 0);
+   QVector<float> maximums(_input->geneSize(), 0);
 
-   // compute maximum of each row/column
-   for ( int i = 0; i < N; ++i )
+   // compute maximum correlation of each row
+   for ( auto& pair : pairs )
    {
-      for ( int j = 0; j < i; ++j )
-      {
-         for ( int k = 0; k < K; ++k )
-         {
-            float correlation = fabs(matrix[i * N * K + j * K + k]);
+      int i = pair.index.getX();
 
-            if ( maximums[i * K + k] < correlation )
-            {
-               maximums[i * K + k] = correlation;
-            }
+      for ( int k = 0; k < pair.correlations.size(); ++k )
+      {
+         float correlation = fabs(pair.correlations[k]);
 
-            if ( maximums[j * K + k] < correlation )
-            {
-               maximums[j * K + k] = correlation;
-            }
+         if ( maximums[i] < correlation )
+         {
+            maximums[i] = correlation;
          }
       }
    }
 
+   // return row-wise maximums
    return maximums;
 }
 
@@ -199,48 +222,107 @@ QVector<float> RMT::computeMaximums(const QVector<float>& matrix)
 
 
 
-QVector<float> RMT::computePruneMatrix(const QVector<float>& matrix, const QVector<float>& maximums, float threshold, int* size)
+/*!
+ * Compute the pruned matrix of a correlation matrix with a given threshold. This
+ * function uses the pre-computed row-wise maximums for faster computation. The
+ * returned matrix is equivalent to the correlation matrix with all correlations
+ * below the given threshold removed, and all zero-columns removed. Additionally,
+ * the number of rows in the pruned matrix is written to the size pointer argument.
+ *
+ * @param pairs
+ * @param maximums
+ * @param threshold
+ * @param size
+ */
+QVector<float> RMT::computePruneMatrix(const QVector<RawPair>& pairs, const QVector<float>& maximums, float threshold, int* size)
 {
-   const int N {_input->geneSize()};
-   const int K {_input->maxClusterSize()};
+   EDEBUG_FUNC(this,&pairs,&maximums,threshold,size);
 
-   // generate vector of row/column indices that have a correlation above threshold
-   QVector<int> indices;
+   // generate vector of row indices that have a correlation above threshold
+   QVector<int> indices(_input->geneSize(), -1);
+   int pruneSize = 0;
 
    for ( int i = 0; i < maximums.size(); ++i )
    {
       if ( maximums[i] >= threshold )
       {
-         indices.append(i);
+         indices[i] = pruneSize;
+         pruneSize++;
       }
    }
 
    // extract pruned matrix from correlation matrix
-   QVector<float> pruneMatrix(indices.size() * indices.size());
+   QVector<float> pruneMatrix(pruneSize * pruneSize);
 
-   for ( int i = 0; i < indices.size(); ++i )
+   // initialize diagonal
+   for ( int i = 0; i < pruneSize; ++i )
    {
-      for ( int j = 0; j < i; ++j )
+      pruneMatrix[i * pruneSize + i] = 1;
+   }
+
+   // iterate through all pairs
+   for ( auto& pair : pairs )
+   {
+      // get indices into pruned matrix
+      int i = indices[pair.index.getX()];
+      int j = indices[pair.index.getY()];
+
+      // skip pair if it was pruned
+      if ( i == -1 || j == -1 )
       {
-         if ( indices[i] % K != indices[j] % K )
-         {
-            continue;
-         }
+         continue;
+      }
 
-         float correlation = matrix[indices[i]/K * N * K + indices[j]/K * K + indices[i] % K];
+      // select correlation from pair using reduction method
+      float correlation = 0;
 
-         if ( fabs(correlation) >= threshold )
+      switch ( _reductionMethod )
+      {
+      case ReductionMethod::First:
+      {
+         correlation = pair.correlations[0];
+         break;
+      }
+      case ReductionMethod::MaximumCorrelation:
+      {
+         for ( int k = 0; k < pair.correlations.size(); k++ )
          {
-            pruneMatrix[i * indices.size() + j] = correlation;
+            float r = fabs(pair.correlations[k]);
+
+            if ( correlation < r )
+            {
+               correlation = r;
+            }
          }
+         break;
+      }
+      case ReductionMethod::MaximumSize:
+      {
+         E_MAKE_EXCEPTION(e);
+         e.setTitle(tr("Unsupported Option"));
+         e.setDetails(tr("Pairwise reduction by maximum size is not yet supported."));
+         throw e;
       }
+      case ReductionMethod::Random:
+      {
+         int k = qrand() % pair.correlations.size();
+         correlation = pair.correlations[k];
+         break;
+      }
+      }
 
-      pruneMatrix[i * indices.size() + i] = 1;
+      // save correlation if it is above threshold
+      if ( fabs(correlation) >= threshold )
+      {
+         pruneMatrix[i * pruneSize + j] = correlation;
+         pruneMatrix[j * pruneSize + i] = correlation;
+      }
    }
 
    // save size of pruned matrix
-   *size = indices.size();
+   *size = pruneSize;
 
+   // return pruned matrix
    return pruneMatrix;
 }
 
@@ -249,36 +331,35 @@ QVector<float> RMT::computePruneMatrix(const QVector<float>& matrix, const QVect
 
 
 
-QVector<float> RMT::computeEigenvalues(QVector<float>* pruneMatrix, int size)
+/*!
+ * Compute the eigenvalues of a correlation matrix.
+ *
+ * @param matrix
+ * @param size
+ */
+QVector<float> RMT::computeEigenvalues(QVector<float>* matrix, int size)
 {
-   // using declarations for gsl resources
-   using gsl_vector_ptr = unique_ptr<gsl_vector,decltype(&gsl_vector_free)>;
-   using gsl_matrix_ptr = unique_ptr<gsl_matrix,decltype(&gsl_matrix_free)>;
-   using gsl_eigen_symmv_workspace_ptr = unique_ptr<gsl_eigen_symmv_workspace
-      ,decltype(&gsl_eigen_symmv_free)>;
-
-   QVector<double> temp;
-   for (auto val: *pruneMatrix) temp.append(val);
-
-   // make and initialize gsl eigen resources
-   gsl_matrix_view view = gsl_matrix_view_array(temp.data(),size,size);
-   gsl_vector_ptr eval (gsl_vector_alloc(size),&gsl_vector_free);
-   gsl_matrix_ptr evec (gsl_matrix_alloc(size,size),&gsl_matrix_free);
-   gsl_eigen_symmv_workspace_ptr work (gsl_eigen_symmv_alloc(size),&gsl_eigen_symmv_free);
-
-   // have gsl compute eigen values for the pruned matrix
-   gsl_eigen_symmv(&view.matrix,eval.get(),evec.get(),work.get());
-   gsl_eigen_symmv_sort(eval.get(),evec.get(),GSL_EIGEN_SORT_ABS_ASC);
-
-   // create return vector and get eigen values from gsl
-   QVector<float> ret(size);
-   for (int i = 0; i < size ;i++)
+   EDEBUG_FUNC(this,matrix,size);
+
+   // initialize eigenvalues and workspace
+   QVector<float> eigens(size);
+   QVector<float> work(5 * size);
+
+   // compute eigenvalues
+   int info = LAPACKE_ssyev_work(
+      LAPACK_COL_MAJOR, 'N', 'U',
+      size, matrix->data(), size,
+      eigens.data(),
+      work.data(), work.size());
+
+   // print warning if LAPACKE returned error code
+   if ( info != 0 )
    {
-      ret[i] = gsl_vector_get(eval.get(),i);
+      qInfo("warning: LAPACKE ssyev returned %d", info);
    }
 
-   // return eigen values vector
-   return ret;
+   // return eigenvalues
+   return eigens;
 }
 
 
@@ -286,40 +367,93 @@ QVector<float> RMT::computeEigenvalues(QVector<float>* pruneMatrix, int size)
 
 
 
-float RMT::computeChiSquare(const QVector<float>& eigens)
+/*!
+ * Return the unique values of a sorted list of real numbers. Two real numbers
+ * are considered distinct if their absolute difference is greater than some
+ * small value epsilon.
+ *
+ * @param values
+ */
+QVector<float> RMT::computeUnique(const QVector<float>& values)
 {
-   // compute unique eigenvalues
-   QVector<float> unique {degenerate(eigens)};
+   EDEBUG_FUNC(this,&values);
 
-   qInfo("unique eigenvalues: %d", unique.size());
+   const float EPSILON {1e-6};
+   QVector<float> unique;
 
-   // make sure there are enough unique eigenvalues
-   if ( unique.size() < _minEigenvalueSize )
+   for ( int i = 0; i < values.size(); ++i )
    {
-      return -1;
+      if ( unique.isEmpty() || fabs(values.at(i) - unique.last()) > EPSILON )
+      {
+         unique.append(values.at(i));
+      }
    }
 
-   // perform several chi-square tests by varying the pace
-   float chi {0.0};
-   int chiTestCount {0};
+   return unique;
+}
+
+
 
-   for ( int pace = _minUnfoldingPace; pace <= _maxUnfoldingPace; ++pace )
+
+
+
+
+/*!
+ * Compute the chi-squared test for the nearest-neighbor spacing distribution
+ * (NNSD) of a list of eigenvalues. The list should be sorted and should contain
+ * only unique values. If spline interpolation is enabled, the chi-squared value
+ * is an average of several chi-squared tests, in which splines of varying pace
+ * are applied to the eigenvalues. Otherwise, a single chi-squared test is
+ * performed directly on the eigenvalues.
+ *
+ * @param eigens
+ */
+float RMT::computeChiSquare(const QVector<float>& eigens)
+{
+   EDEBUG_FUNC(this,&eigens);
+
+   // make sure there are enough eigenvalues
+   if ( eigens.size() < _minEigenvalueSize )
    {
-      // perform test only if there are enough eigenvalues for pace
-      if ( unique.size() / pace < 5 )
+      return -1;
+   }
+
+   // determine whether spline interpolation is enabled
+   if ( _splineInterpolation )
+   {
+      // perform several chi-squared tests with spline interpolation by varying the pace
+      float chi {0.0};
+      int chiTestCount {0};
+
+      for ( int pace = _minSplinePace; pace <= _maxSplinePace; ++pace )
       {
-         break;
-      }
+         // perform test only if there are enough eigenvalues for pace
+         if ( eigens.size() / pace < 5 )
+         {
+            break;
+         }
 
-      chi += computePaceChiSquare(unique, pace);
-      ++chiTestCount;
-   }
+         // compute spline-interpolated eigenvalues
+         QVector<float> splineEigens {computeSpline(eigens, pace)};
 
-   // compute average of chi-square tests
-   chi /= chiTestCount;
+         // compute chi-squared value
+         float chiPace {computeChiSquareHelper(splineEigens)};
 
-   // return chi value
-   return chi;
+         qInfo("pace: %d, chi-squared: %g", pace, chiPace);
+
+         // add chi-squared value to running sum
+         chi += chiPace;
+         ++chiTestCount;
+      }
+
+      // return average of chi-squared tests
+      return chi / chiTestCount;
+   }
+   else
+   {
+      // perform a single chi-squared test without spline interpolation
+      return computeChiSquareHelper(eigens);
+   }
 }
 
 
@@ -327,12 +461,21 @@ float RMT::computeChiSquare(const QVector<float>& eigens)
 
 
 
-float RMT::computePaceChiSquare(const QVector<float>& eigens, int pace)
+/*!
+ * Compute the chi-squared test for the nearest-neighbor spacing distribution
+ * (NNSD) of a list of values. The list should be sorted and should contain only
+ * unique values.
+ *
+ * @param values
+ */
+float RMT::computeChiSquareHelper(const QVector<float>& values)
 {
-   // compute eigenvalue spacings
-   QVector<float> spacings {unfold(eigens, pace)};
+   EDEBUG_FUNC(this,&values);
+
+   // compute spacings
+   QVector<float> spacings {computeSpacings(values)};
 
-   // compute nearest-neighbor spacing distribution
+   // compute histogram of spacings
    const float histogramMin {0};
    const float histogramMax {3};
    const float histogramBinWidth {(histogramMax - histogramMin) / _histogramBinSize};
@@ -346,7 +489,7 @@ float RMT::computePaceChiSquare(const QVector<float>& eigens, int pace)
       }
    }
 
-   // compute chi-square value from nearest-neighbor spacing distribution
+   // compute chi-squared value from the histogram
    float chi {0.0};
 
    for ( int i = 0; i < histogram.size(); ++i )
@@ -355,14 +498,12 @@ float RMT::computePaceChiSquare(const QVector<float>& eigens, int pace)
       float O_i {histogram[i]};
 
       // compute E_i, the expected value of Poisson distribution for bin i
-      float E_i {(exp(-i * histogramBinWidth) - exp(-(i + 1) * histogramBinWidth)) * eigens.size()};
+      float E_i {(exp(-i * histogramBinWidth) - exp(-(i + 1) * histogramBinWidth)) * values.size()};
 
-      // update chi-square value based on difference between O_i and E_i
+      // update chi-squared value based on difference between O_i and E_i
       chi += (O_i - E_i) * (O_i - E_i) / E_i;
    }
 
-   qInfo("pace: %d, chi: %g", pace, chi);
-
    return chi;
 }
 
@@ -371,44 +512,34 @@ float RMT::computePaceChiSquare(const QVector<float>& eigens, int pace)
 
 
 
-QVector<float> RMT::degenerate(const QVector<float>& eigens)
+/*!
+ * Compute a spline interpolation of a list of values using the given pace. The
+ * list should be sorted and should contain only unique values. The pace determines
+ * the ratio of values which are used as points to create the spline; for example,
+ * a pace of 10 means that every 10th value is used to create the spline.
+ *
+ * @param values
+ * @param pace
+ */
+QVector<float> RMT::computeSpline(const QVector<float>& values, int pace)
 {
-   const float EPSILON {1e-6};
-   QVector<float> unique;
-
-   for ( int i = 1; i < eigens.size(); ++i )
-   {
-      if ( unique.isEmpty() || fabs(eigens.at(i) - unique.last()) > EPSILON )
-      {
-         unique.append(eigens.at(i));
-      }
-   }
-
-   return unique;
-}
-
-
-
+   EDEBUG_FUNC(this,&values,pace);
 
-
-
-QVector<float> RMT::unfold(const QVector<float>& eigens, int pace)
-{
    // using declarations for gsl resource pointers
    using gsl_interp_accel_ptr = unique_ptr<gsl_interp_accel, decltype(&gsl_interp_accel_free)>;
    using gsl_spline_ptr = unique_ptr<gsl_spline, decltype(&gsl_spline_free)>;
 
    // extract eigenvalues for spline based on pace
-   int splineSize {eigens.size() / pace};
+   int splineSize {values.size() / pace};
    unique_ptr<double[]> x(new double[splineSize]);
    unique_ptr<double[]> y(new double[splineSize]);
 
    for ( int i = 0; i < splineSize; ++i )
    {
-      x[i] = (double)eigens.at(i*pace);
-      y[i] = (double)(i*pace + 1) / eigens.size();
+      x[i] = (double)values.at(i*pace);
+      y[i] = (double)(i*pace + 1) / values.size();
    }
-   x[splineSize - 1] = eigens.back();
+   x[splineSize - 1] = values.back();
    y[splineSize - 1] = 1.0;
 
    // initialize gsl spline
@@ -417,22 +548,41 @@ QVector<float> RMT::unfold(const QVector<float>& eigens, int pace)
    gsl_spline_init(spline.get(), x.get(), y.get(), splineSize);
 
    // extract interpolated eigenvalues from spline
-   QVector<float> splineEigens(eigens.size());
+   QVector<float> splineValues(values.size());
 
-   splineEigens[0] = 0.0;
-   splineEigens[eigens.size() - 1] = 1.0;
+   splineValues[0] = 0.0;
+   splineValues[values.size() - 1] = 1.0;
 
-   for ( int i = 1; i < eigens.size() - 1; ++i )
+   for ( int i = 1; i < values.size() - 1; ++i )
    {
-      splineEigens[i] = gsl_spline_eval(spline.get(), eigens.at(i), interp.get());
+      splineValues[i] = gsl_spline_eval(spline.get(), values.at(i), interp.get());
    }
 
+   // return interpolated values
+   return splineValues;
+}
+
+
+
+
+
+
+/*!
+ * Compute the spacings of a list of values. The list should be sorted and should
+ * contain only unique values.
+ *
+ * @param values
+ */
+QVector<float> RMT::computeSpacings(const QVector<float>& values)
+{
+   EDEBUG_FUNC(this,&values);
+
    // compute spacings between interpolated eigenvalues
-   QVector<float> spacings(eigens.size() - 1);
+   QVector<float> spacings(values.size() - 1);
 
    for ( int i = 0; i < spacings.size(); ++i )
    {
-      spacings[i] = (splineEigens.at(i + 1) - splineEigens.at(i)) * eigens.size();
+      spacings[i] = (values.at(i + 1) - values.at(i)) * values.size();
    }
 
    return spacings;
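Review note: the NNSD chi-squared test that this file now splits across `computeUnique`, `computeSpacings`, and `computeChiSquareHelper` can be illustrated without Qt, GSL, or LAPACKE. The block below is a minimal standalone sketch, not the code in this patch. The names `uniqueValues`, `spacings`, and `chiSquare` are hypothetical, the histogram binning on [0, 3) is an assumption because that part of the helper lies outside the hunks shown, and spline interpolation is omitted.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: keep each value that differs from the previously kept
// value by more than epsilon. Assumes the input is sorted in ascending order.
std::vector<float> uniqueValues(const std::vector<float>& values, float epsilon = 1e-6f)
{
   std::vector<float> unique;

   for ( float v : values )
   {
      if ( unique.empty() || std::fabs(v - unique.back()) > epsilon )
      {
         unique.push_back(v);
      }
   }

   return unique;
}

// Hypothetical helper: spacing between consecutive values, scaled by the
// number of values (the same scaling used in the patch).
std::vector<float> spacings(const std::vector<float>& values)
{
   std::vector<float> result;

   for ( std::size_t i = 0; i + 1 < values.size(); ++i )
   {
      result.push_back((values[i + 1] - values[i]) * values.size());
   }

   return result;
}

// Hypothetical helper: chi-squared statistic of the spacing histogram on
// [0, 3) against the exponential bin masses of a Poisson NNSD.
// Assumption: spacings outside [0, 3) are ignored.
float chiSquare(const std::vector<float>& values, int bins)
{
   const float binWidth = 3.0f / bins;
   std::vector<float> histogram(bins, 0.0f);

   for ( float s : spacings(values) )
   {
      if ( s >= 0.0f && s < 3.0f )
      {
         // clamp the index to guard against rounding at the upper edge
         int index = std::min(static_cast<int>(s / binWidth), bins - 1);
         histogram[index] += 1.0f;
      }
   }

   float chi = 0.0f;

   for ( int i = 0; i < bins; ++i )
   {
      // expected count for bin i under the exponential distribution
      float expected = (std::exp(-i * binWidth) - std::exp(-(i + 1) * binWidth)) * values.size();
      chi += (histogram[i] - expected) * (histogram[i] - expected) / expected;
   }

   return chi;
}
```

A caller would typically pass sorted eigenvalues through `uniqueValues` first and then hand the result to `chiSquare` with the configured number of bins.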
diff --git a/src/core/rmt.h b/src/core/rmt.h
index 7ebe305..02d9af6 100644
--- a/src/core/rmt.h
+++ b/src/core/rmt.h
@@ -1,13 +1,24 @@
 #ifndef RMT_H
 #define RMT_H
 #include <ace/core/core.h>
+#include "correlationmatrix.h"
 
 
 
-class CorrelationMatrix;
-
-
-
+/*!
+ * This class implements the RMT analytic. This analytic takes a correlation
+ * matrix and attempts to find a threshold which, when applied to the correlation
+ * matrix, produces a scale-free network. This analytic uses Random Matrix Theory
+ * (RMT), which involves computing the eigenvalues of a thresholded correlation
+ * matrix, computing the nearest-neighbor spacing distribution (NNSD) of the eigenvalues,
+ * and comparing the distribution to a Poisson distribution using a chi-squared
+ * test. This process is repeated at each threshold step from the starting threshold.
+ * As the threshold decreases, the NNSD changes from a Poisson distribution to a
+ * Gaussian orthogonal ensemble (GOE) distribution, so the chi-squared value against
+ * the Poisson distribution eventually increases sharply. The final threshold is
+ * chosen as the lowest threshold which produced a chi-squared value below the
+ * critical value.
+ */
 class RMT : public EAbstractAnalytic
 {
    Q_OBJECT
@@ -18,24 +29,105 @@ class RMT : public EAbstractAnalytic
    virtual EAbstractAnalytic::Input* makeInput() override final;
    virtual void initialize();
 private:
-   QVector<float> computeMaximums(const QVector<float>& matrix);
-   QVector<float> computePruneMatrix(const QVector<float>& matrix, const QVector<float>& maximums, float threshold, int* size);
+   /*!
+    * Defines the reduction methods this analytic supports.
+    */
+   enum class ReductionMethod
+   {
+      /*!
+       * Select the first cluster
+       */
+      First
+      /*!
+       * Select the cluster with the highest absolute correlation
+       */
+      ,MaximumCorrelation
+      /*!
+       * Select the cluster with the largest sample size
+       */
+      ,MaximumSize
+      /*!
+       * Select a random cluster
+       */
+      ,Random
+   };
+private:
+   QVector<float> computeMaximums(const QVector<CorrelationMatrix::RawPair>& pairs);
+   QVector<float> computePruneMatrix(const QVector<CorrelationMatrix::RawPair>& pairs, const QVector<float>& maximums, float threshold, int* size);
    QVector<float> computeEigenvalues(QVector<float>* pruneMatrix, int size);
+   QVector<float> computeUnique(const QVector<float>& values);
    float computeChiSquare(const QVector<float>& eigens);
-   float computePaceChiSquare(const QVector<float>& eigens, int pace);
-   QVector<float> degenerate(const QVector<float>& eigens);
-   QVector<float> unfold(const QVector<float>& eigens, int pace);
-
+   float computeChiSquareHelper(const QVector<float>& values);
+   QVector<float> computeSpline(const QVector<float>& values, int pace);
+   QVector<float> computeSpacings(const QVector<float>& values);
+   /*!
+    * Pointer to the input correlation matrix.
+    */
    CorrelationMatrix* _input {nullptr};
+   /*!
+    * Pointer to the output log file.
+    */
    QFile* _logfile {nullptr};
+   /*!
+    * The reduction method to use. Pairwise reduction is used to select pairwise
+    * correlations when there are multiple correlations per pair. By default, the
+    * first cluster is selected from each pair.
+    */
+   ReductionMethod _reductionMethod {ReductionMethod::First};
+   /*!
+    * The starting threshold.
+    */
    float _thresholdStart {0.99};
+   /*!
+    * The threshold decrement.
+    */
    float _thresholdStep {0.001};
+   /*!
+    * The stopping threshold. The analytic will fail if it cannot find a
+    * proper threshold before reaching the stopping threshold.
+    */
    float _thresholdStop {0.5};
+   /*!
+    * The critical value for the chi-squared test, which is dependent on the
+    * degrees of freedom and the alpha-value of the test. This particular
+    * value is based on df = 60 and alpha = 0.001. Note that since the degrees
+    * of freedom corresponds to the number of histogram bins, this value
+    * must be re-calculated if the number of histogram bins is changed.
+    */
    float _chiSquareThreshold1 {99.607};
+   /*!
+    * The final chi-squared threshold. Once the chi-squared test goes below the
+    * chi-squared critical value, it must go above this value in order for the
+    * analytic to find a proper threshold.
+    */
    float _chiSquareThreshold2 {200};
+   /*!
+    * The minimum number of unique eigenvalues which must exist in a pruned matrix
+    * for the analytic to compute the NNSD of the eigenvalues. If the number of
+    * unique eigenvalues is less, the chi-squared test for that threshold is skipped.
+    */
    int _minEigenvalueSize {50};
-   int _minUnfoldingPace {10};
-   int _maxUnfoldingPace {40};
+   /*!
+    * Whether to perform spline interpolation on each set of eigenvalues before
+    * computing the spacings. If this option is enabled then the chi-squared value
+    * for each set of eigenvalues will be the average of multiple tests in which
+    * the spline pace is varied (according to the minimum and maximum spline pace);
+    * otherwise, only one test is performed for each set of eigenvalues.
+    */
+   bool _splineInterpolation {true};
+   /*!
+    * The minimum pace of the spline interpolation.
+    */
+   int _minSplinePace {10};
+   /*!
+    * The maximum pace of the spline interpolation.
+    */
+   int _maxSplinePace {40};
+   /*!
+    * The number of histogram bins in the NNSD of eigenvalues. This value
+    * corresponds to the degrees of freedom in the chi-squared test, therefore
+    * it affects the setting of the chi-squared critical value.
+    */
    int _histogramBinSize {60};
 };
 
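Review note: the threshold search described in the class comment, together with the two chi-squared thresholds documented above, can be sketched as a small loop. This is a hedged illustration only. The function `searchThreshold`, the struct `ThresholdResult`, and the callback `chiAt` are hypothetical names, the loop bounds are an assumption because the full body of `RMT::process` is not part of the hunks shown, and `step` is assumed to be positive.

```cpp
#include <functional>
#include <limits>

// Hypothetical result type for this sketch; not part of KINC.
struct ThresholdResult
{
   bool found;       // true if chi rose above the upper bound after a passing test
   float threshold;  // lowest threshold whose chi was below the critical value
   float chi;        // chi-squared value at that threshold
};

// Walk the threshold down from start to stop. chiAt(threshold) is assumed to
// return the chi-squared value for that threshold, or -1 if the test was skipped.
ThresholdResult searchThreshold(
   const std::function<float(float)>& chiAt,
   float start, float step, float stop,
   float critical, float upper)
{
   const float inf = std::numeric_limits<float>::infinity();
   float finalThreshold = 0.0f;
   float finalChi = inf;
   float maxChi = -inf;

   for ( int n = 0; maxChi < upper; ++n )
   {
      // compute the threshold from the step count to avoid accumulated drift
      float threshold = start - n * step;

      if ( threshold < stop )
      {
         break;
      }

      float chi = chiAt(threshold);

      // skip thresholds for which the chi-squared test did not run
      if ( chi == -1 )
      {
         continue;
      }

      // save the most recent chi-squared value below the critical value
      if ( chi < critical )
      {
         finalChi = chi;
         finalThreshold = threshold;
      }

      // track the chi-squared value once it rises above the saved value
      if ( finalChi < critical && chi > finalChi )
      {
         maxChi = chi;
      }
   }

   return ThresholdResult{maxChi >= upper, finalThreshold, finalChi};
}
```

With the defaults in this header, a call would pass `0.99`, `0.001`, and `0.5` for the threshold range, and `99.607` and `200` for the two chi-squared bounds.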
diff --git a/src/core/rmt_input.cpp b/src/core/rmt_input.cpp
index 57233cd..87c3f3b 100644
--- a/src/core/rmt_input.cpp
+++ b/src/core/rmt_input.cpp
@@ -7,18 +7,48 @@
 
 
 
+/*!
+ * String list of reduction methods for this analytic that correspond exactly
+ * to its enumeration. Used for handling the reduction method argument for this
+ * input object.
+ */
+const QStringList RMT::Input::REDUCTION_NAMES
+{
+   "first"
+   ,"maxcorr"
+   ,"maxsize"
+   ,"random"
+};
+
+
+
+
+
+
+/*!
+ * Construct a new input object with the given analytic as its parent.
+ *
+ * @param parent
+ */
 RMT::Input::Input(RMT* parent):
    EAbstractAnalytic::Input(parent),
    _base(parent)
-{}
+{
+   EDEBUG_FUNC(this,parent);
+}
 
 
 
 
 
 
+/*!
+ * Return the total number of arguments this analytic type contains.
+ */
 int RMT::Input::size() const
 {
+   EDEBUG_FUNC(this);
+
    return Total;
 }
 
@@ -27,17 +57,26 @@ int RMT::Input::size() const
 
 
 
+/*!
+ * Return the argument type for a given index.
+ *
+ * @param index
+ */
 EAbstractAnalytic::Input::Type RMT::Input::type(int index) const
 {
+   EDEBUG_FUNC(this,index);
+
    switch (index)
    {
    case InputData: return Type::DataIn;
    case LogFile: return Type::FileOut;
+   case ReductionType: return Type::Selection;
    case ThresholdStart: return Type::Double;
    case ThresholdStep: return Type::Double;
    case ThresholdStop: return Type::Double;
-   case MinUnfoldingPace: return Type::Integer;
-   case MaxUnfoldingPace: return Type::Integer;
+   case SplineInterpolation: return Type::Boolean;
+   case MinSplinePace: return Type::Integer;
+   case MaxSplinePace: return Type::Integer;
    case HistogramBinSize: return Type::Integer;
    default: return Type::Boolean;
    }
@@ -48,8 +87,16 @@ EAbstractAnalytic::Input::Type RMT::Input::type(int index) const
 
 
 
+/*!
+ * Return data for a given role on an argument with the given index.
+ *
+ * @param index
+ * @param role
+ */
 QVariant RMT::Input::data(int index, Role role) const
 {
+   EDEBUG_FUNC(this,index,role);
+
    switch (index)
    {
    case InputData:
@@ -70,6 +117,16 @@ QVariant RMT::Input::data(int index, Role role) const
       case Role::FileFilters: return tr("Text file %1").arg("(*.txt)");
       default: return QVariant();
       }
+   case ReductionType:
+      switch (role)
+      {
+      case Role::CommandLineName: return QString("reduction");
+      case Role::Title: return tr("Reduction Method:");
+      case Role::WhatsThis: return tr("Method to use for pairwise reduction.");
+      case Role::SelectionValues: return REDUCTION_NAMES;
+      case Role::Default: return "first";
+      default: return QVariant();
+      }
    case ThresholdStart:
       switch (role)
       {
@@ -103,23 +160,32 @@ QVariant RMT::Input::data(int index, Role role) const
       case Role::Maximum: return 1;
       default: return QVariant();
       }
-   case MinUnfoldingPace:
+   case SplineInterpolation:
+      switch (role)
+      {
+      case Role::CommandLineName: return QString("spline");
+      case Role::Title: return tr("Use Spline Interpolation:");
+      case Role::WhatsThis: return tr("Whether to perform spline interpolation on each set of eigenvalues.");
+      case Role::Default: return true;
+      default: return QVariant();
+      }
+   case MinSplinePace:
       switch (role)
       {
       case Role::CommandLineName: return QString("minpace");
-      case Role::Title: return tr("Minimum Unfolding Pace:");
-      case Role::WhatsThis: return tr("The minimum pace with which to perform unfolding.");
+      case Role::Title: return tr("Minimum Spline Pace:");
+      case Role::WhatsThis: return tr("The minimum pace of the spline interpolation.");
       case Role::Default: return 10;
       case Role::Minimum: return 1;
       case Role::Maximum: return std::numeric_limits<int>::max();
       default: return QVariant();
       }
-   case MaxUnfoldingPace:
+   case MaxSplinePace:
       switch (role)
       {
       case Role::CommandLineName: return QString("maxpace");
-      case Role::Title: return tr("Maximum Unfolding Pace:");
-      case Role::WhatsThis: return tr("The maximum pace with which to perform unfolding.");
+      case Role::Title: return tr("Maximum Spline Pace:");
+      case Role::WhatsThis: return tr("The maximum pace of the spline interpolation.");
       case Role::Default: return 40;
       case Role::Minimum: return 1;
       case Role::Maximum: return std::numeric_limits<int>::max();
@@ -145,10 +211,21 @@ QVariant RMT::Input::data(int index, Role role) const
 
 
 
+/*!
+ * Set an argument with the given index to the given value.
+ *
+ * @param index
+ * @param value
+ */
 void RMT::Input::set(int index, const QVariant& value)
 {
+   EDEBUG_FUNC(this,index,&value);
+
    switch (index)
    {
+   case ReductionType:
+      _base->_reductionMethod = static_cast<ReductionMethod>(REDUCTION_NAMES.indexOf(value.toString()));
+      break;
    case ThresholdStart:
       _base->_thresholdStart = value.toDouble();
       break;
@@ -158,11 +235,14 @@ void RMT::Input::set(int index, const QVariant& value)
    case ThresholdStop:
       _base->_thresholdStop = value.toDouble();
       break;
-   case MinUnfoldingPace:
-      _base->_minUnfoldingPace = value.toInt();
+   case SplineInterpolation:
+      _base->_splineInterpolation = value.toBool();
       break;
-   case MaxUnfoldingPace:
-      _base->_maxUnfoldingPace = value.toInt();
+   case MinSplinePace:
+      _base->_minSplinePace = value.toInt();
+      break;
+   case MaxSplinePace:
+      _base->_maxSplinePace = value.toInt();
       break;
    case HistogramBinSize:
       _base->_histogramBinSize = value.toInt();
@@ -175,8 +255,16 @@ void RMT::Input::set(int index, const QVariant& value)
 
 
 
+/*!
+ * Set a file argument with the given index to the given Qt file pointer.
+ *
+ * @param index
+ * @param file
+ */
 void RMT::Input::set(int index, QFile* file)
 {
+   EDEBUG_FUNC(this,index,file);
+
    if ( index == LogFile )
    {
       _base->_logfile = file;
@@ -188,8 +276,16 @@ void RMT::Input::set(int index, QFile* file)
 
 
 
+/*!
+ * Set a data argument with the given index to the given data object pointer.
+ *
+ * @param index
+ * @param data
+ */
 void RMT::Input::set(int index, EAbstractData* data)
 {
+   EDEBUG_FUNC(this,index,data);
+
    if ( index == InputData )
    {
       _base->_input = data->cast<CorrelationMatrix>();
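Review note: the `ReductionType` setter casts the result of `REDUCTION_NAMES.indexOf(...)` directly to `ReductionMethod`, which relies on the name list and the enumeration staying in the same order and on the value always being one of the listed names. The sketch below shows one way to make that mapping explicit and to reject unknown names. It is standalone and uses the standard library instead of Qt; `parseReduction` and `kReductionNames` are hypothetical names, and the enum is redeclared here only so the example compiles by itself.

```cpp
#include <algorithm>
#include <array>
#include <string>

// Redeclared for this sketch; mirrors the order of the enumeration in rmt.h.
enum class ReductionMethod
{
   First
   ,MaximumCorrelation
   ,MaximumSize
   ,Random
};

// Names listed in the same order as the enumerators above.
const std::array<std::string, 4> kReductionNames {{"first", "maxcorr", "maxsize", "random"}};

// Hypothetical helper: look up a reduction method by name. Returns false and
// leaves *method untouched if the name is not recognized.
bool parseReduction(const std::string& name, ReductionMethod* method)
{
   auto it = std::find(kReductionNames.begin(), kReductionNames.end(), name);

   if ( it == kReductionNames.end() )
   {
      return false;
   }

   // the position in the name list is the enumerator value
   *method = static_cast<ReductionMethod>(it - kReductionNames.begin());
   return true;
}
```

Whether the extra check is needed depends on how ACE validates `Type::Selection` arguments before calling `set`, which this patch does not show.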
diff --git a/src/core/rmt_input.h b/src/core/rmt_input.h
index 4ac2873..ac4e519 100644
--- a/src/core/rmt_input.h
+++ b/src/core/rmt_input.h
@@ -4,19 +4,27 @@
 
 
 
+/*!
+ * This class implements the abstract input of the RMT analytic.
+ */
 class RMT::Input : public EAbstractAnalytic::Input
 {
    Q_OBJECT
 public:
+   /*!
+    * Defines all input arguments for this analytic.
+    */
    enum Argument
    {
       InputData = 0
       ,LogFile
+      ,ReductionType
       ,ThresholdStart
       ,ThresholdStep
       ,ThresholdStop
-      ,MinUnfoldingPace
-      ,MaxUnfoldingPace
+      ,SplineInterpolation
+      ,MinSplinePace
+      ,MaxSplinePace
       ,HistogramBinSize
       ,Total
    };
@@ -28,6 +36,10 @@ class RMT::Input : public EAbstractAnalytic::Input
    virtual void set(int index, QFile* file) override final;
    virtual void set(int index, EAbstractData* data) override final;
 private:
+   static const QStringList REDUCTION_NAMES;
+   /*!
+    * Pointer to the base analytic for this object.
+    */
    RMT* _base;
 };
 
diff --git a/src/core/similarity.cpp b/src/core/similarity.cpp
index 8155fbf..11bf97b 100644
--- a/src/core/similarity.cpp
+++ b/src/core/similarity.cpp
@@ -4,6 +4,10 @@
 #include "similarity_serial.h"
 #include "similarity_workblock.h"
 #include "similarity_opencl.h"
+#include "ccmatrix_pair.h"
+#include "correlationmatrix_pair.h"
+#include <ace/core/ace_qmpi.h>
+#include <ace/core/elog.h>
 
 
 
@@ -14,12 +18,32 @@ using namespace std;
 
 
 
+/*!
+ * Return the total number of pairs that must be processed for a given
+ * expression matrix.
+ *
+ * @param emx
+ */
+qint64 Similarity::totalPairs(const ExpressionMatrix* emx) const
+{
+   EDEBUG_FUNC(this,emx);
+
+   return (qint64) emx->geneSize() * (emx->geneSize() - 1) / 2;
+}
+
+
+
+
+
+
+/*!
+ * Return the total number of work blocks this analytic must process.
+ */
 int Similarity::size() const
 {
-   const qint64 totalPairs {(qint64) _input->getGeneSize() * (_input->getGeneSize() - 1) / 2};
-   const qint64 WORK_BLOCK_SIZE { 32 * 1024 };
+   EDEBUG_FUNC(this);
 
-   return (totalPairs + WORK_BLOCK_SIZE - 1) / WORK_BLOCK_SIZE;
+   return (totalPairs(_input) + _workBlockSize - 1) / _workBlockSize;
 }
 
 
@@ -27,13 +51,24 @@ int Similarity::size() const
 
 
 
+/*!
+ * Create and return a work block for this analytic with the given index. This
+ * implementation creates a work block with a start index and size denoting the
+ * number of pairs to process.
+ *
+ * @param index
+ */
 std::unique_ptr<EAbstractAnalytic::Block> Similarity::makeWork(int index) const
 {
-   const qint64 totalPairs {(qint64) _input->getGeneSize() * (_input->getGeneSize() - 1) / 2};
-   const qint64 WORK_BLOCK_SIZE { 32 * 1024 };
+   EDEBUG_FUNC(this,index);
 
-   qint64 start {index * WORK_BLOCK_SIZE};
-   qint64 size {min(totalPairs - start, WORK_BLOCK_SIZE)};
+   if ( ELog::isActive() )
+   {
+      ELog() << tr("Making work index %1 of %2.\n").arg(index).arg(size());
+   }
+
+   qint64 start {index * (qint64) _workBlockSize};
+   qint64 size {min(totalPairs(_input) - start, (qint64) _workBlockSize)};
 
    return unique_ptr<EAbstractAnalytic::Block>(new WorkBlock(index, start, size));
 }
@@ -43,8 +78,13 @@ std::unique_ptr<EAbstractAnalytic::Block> Similarity::makeWork(int index) const
 
 
 
+/*!
+ * Create an empty and uninitialized work block.
+ */
 std::unique_ptr<EAbstractAnalytic::Block> Similarity::makeWork() const
 {
+   EDEBUG_FUNC(this);
+
    return unique_ptr<EAbstractAnalytic::Block>(new WorkBlock);
 }
 
@@ -53,8 +93,13 @@ std::unique_ptr<EAbstractAnalytic::Block> Similarity::makeWork() const
 
 
 
+/*!
+ * Create an empty and uninitialized result block.
+ */
 std::unique_ptr<EAbstractAnalytic::Block> Similarity::makeResult() const
 {
+   EDEBUG_FUNC(this);
+
    return unique_ptr<EAbstractAnalytic::Block>(new ResultBlock);
 }
 
@@ -63,8 +108,22 @@ std::unique_ptr<EAbstractAnalytic::Block> Similarity::makeResult() const
 
 
 
+/*!
+ * Read in a block of results made from a block of work with the corresponding
+ * index. This implementation takes the Pair objects in the result block and
+ * saves them to the output correlation matrix and cluster matrix.
+ *
+ * @param result
+ */
 void Similarity::process(const EAbstractAnalytic::Block* result)
 {
+   EDEBUG_FUNC(this,result);
+
+   if ( ELog::isActive() )
+   {
+      ELog() << tr("Processing result %1 of %2.\n").arg(result->index()).arg(size());
+   }
+
    const ResultBlock* resultBlock {result->cast<ResultBlock>()};
 
    // iterate through all pairs in result block
@@ -85,7 +144,7 @@ void Similarity::process(const EAbstractAnalytic::Block* result)
             {
                ccmPair.addCluster();
 
-               for ( int i = 0; i < _input->getSampleSize(); ++i )
+               for ( int i = 0; i < _input->sampleSize(); ++i )
                {
                   ccmPair.at(ccmPair.clusterSize() - 1, i) = (pair.labels[i] >= 0)
                      ? (k == pair.labels[i])
@@ -131,8 +190,13 @@ void Similarity::process(const EAbstractAnalytic::Block* result)
 
 
 
+/*!
+ * Make a new input object and return its pointer.
+ */
 EAbstractAnalytic::Input* Similarity::makeInput()
 {
+   EDEBUG_FUNC(this);
+
    return new Input(this);
 }
 
@@ -141,8 +205,13 @@ EAbstractAnalytic::Input* Similarity::makeInput()
 
 
 
+/*!
+ * Make a new serial object and return its pointer.
+ */
 EAbstractAnalytic::Serial* Similarity::makeSerial()
 {
+   EDEBUG_FUNC(this);
+
    return new Serial(this);
 }
 
@@ -151,8 +220,13 @@ EAbstractAnalytic::Serial* Similarity::makeSerial()
 
 
 
+/*!
+ * Make a new OpenCL object and return its pointer.
+ */
 EAbstractAnalytic::OpenCL* Similarity::makeOpenCL()
 {
+   EDEBUG_FUNC(this);
+
    return new OpenCL(this);
 }
 
@@ -161,19 +235,29 @@ EAbstractAnalytic::OpenCL* Similarity::makeOpenCL()
 
 
 
+/*!
+ * Initialize this analytic. This implementation validates the input arguments
+ * and chooses a default work block size if none was provided.
+ */
 void Similarity::initialize()
 {
-   if ( !isMaster() )
+   EDEBUG_FUNC(this);
+
+   // get MPI instance
+   auto& mpi {Ace::QMPI::instance()};
+
+   // only the master process needs to validate arguments
+   if ( !mpi.isMaster() )
    {
       return;
    }
 
-   // make sure input and output are valid
-   if ( !_input || !_ccm || !_cmx )
+   // make sure input data is valid
+   if ( !_input )
    {
       E_MAKE_EXCEPTION(e);
       e.setTitle(tr("Invalid Argument"));
-      e.setDetails(tr("Did not get valid input and/or output arguments."));
+      e.setDetails(tr("Did not get a valid input data object."));
       throw e;
    }
 
@@ -186,12 +270,42 @@ void Similarity::initialize()
       throw e;
    }
 
+   // initialize work block size
+   if ( _workBlockSize == 0 )
+   {
+      int numWorkers = max(1, mpi.size() - 1);
+      // clamp to at least one pair per block so that size() never divides by zero
+      _workBlockSize = max((qint64) 1, min((qint64) 32768, totalPairs(_input) / numWorkers));
+   }
+}
+
+
+
+
+
+
+/*!
+ * Initialize the output data objects of this analytic.
+ */
+void Similarity::initializeOutputs()
+{
+   EDEBUG_FUNC(this);
+
+   // make sure output data is valid
+   if ( !_ccm || !_cmx )
+   {
+      E_MAKE_EXCEPTION(e);
+      e.setTitle(tr("Invalid Argument"));
+      e.setDetails(tr("Did not get valid output data objects."));
+      throw e;
+   }
+
    // initialize cluster matrix
-   _ccm->initialize(_input->getGeneNames(), _maxClusters, _input->getSampleNames());
+   _ccm->initialize(_input->geneNames(), _maxClusters, _input->sampleNames());
 
    // initialize correlation matrix
    EMetaArray correlations;
-   correlations.append(_corrModel->getName());
+   correlations.append(_corrName);
 
-   _cmx->initialize(_input->getGeneNames(), _maxClusters, correlations);
+   _cmx->initialize(_input->geneNames(), _maxClusters, correlations);
 }
diff --git a/src/core/similarity.h b/src/core/similarity.h
index 9543b7e..52fa5be 100644
--- a/src/core/similarity.h
+++ b/src/core/similarity.h
@@ -5,29 +5,55 @@
 #include "ccmatrix.h"
 #include "correlationmatrix.h"
 #include "expressionmatrix.h"
-#include "pairwise_clustering.h"
-#include "pairwise_correlation.h"
-#include "pairwise_gmm.h"
-#include "pairwise_pearson.h"
+#include "pairwise_clusteringmodel.h"
 
 
 
+/*!
+ * This class implements the similarity analytic. This analytic takes an
+ * expression matrix and computes a similarity matrix, where each element is
+ * a similarity measure of two genes in the expression matrix. The similarity
+ * is computed using a correlation measure. The similarity matrix can also have
+ * multiple modes within a pair; these modes can be optionally computed using a
+ * clustering method. This analytic produces two data objects: a correlation
+ * matrix containing the pairwise correlations, and a cluster matrix containing
+ * sample masks of the pairwise clusters. Sample masks for unimodal pairs are not
+ * saved to the cluster matrix. If clustering is not used, an empty cluster matrix
+ * is created. This analytic can also perform pairwise outlier removal before and
+ * after clustering, if clustering is used.
+ *
+ * This analytic can use MPI and it has both CPU and GPU implementations, as the
+ * pairwise clustering significantly increases the number of computations required
+ * for a large expression matrix.
+ */
 class Similarity : public EAbstractAnalytic
 {
    Q_OBJECT
 public:
+   /*!
+    * Defines the pair structure used to send results in result blocks.
+    */
    struct Pair
    {
+      /*!
+       * The number of clusters in a pair.
+       */
       qint8 K;
+      /*!
+       * The cluster labels for a pair.
+       */
       QVector<qint8> labels;
+      /*!
+       * The correlation for each cluster in a pair.
+       */
       QVector<float> correlations;
    };
-
    class Input;
    class WorkBlock;
    class ResultBlock;
    class Serial;
    class OpenCL;
+public:
    virtual int size() const override final;
    virtual std::unique_ptr<EAbstractAnalytic::Block> makeWork(int index) const override final;
    virtual std::unique_ptr<EAbstractAnalytic::Block> makeWork() const override final;
@@ -37,38 +63,110 @@ class Similarity : public EAbstractAnalytic
    virtual EAbstractAnalytic::Serial* makeSerial() override final;
    virtual EAbstractAnalytic::OpenCL* makeOpenCL() override final;
    virtual void initialize() override final;
-
+   virtual void initializeOutputs() override final;
 private:
+   /*!
+    * Defines the clustering methods this analytic supports.
+    */
    enum class ClusteringMethod
    {
+      /*!
+       * No clustering
+       */
       None
+      /*!
+       * Gaussian mixture models
+       */
       ,GMM
-      ,KMeans
    };
-
+   /*!
+    * Defines the correlation methods this analytic supports.
+    */
    enum class CorrelationMethod
    {
+      /*!
+       * Pearson correlation
+       */
       Pearson
+      /*!
+       * Spearman rank correlation
+       */
       ,Spearman
    };
-
+private:
+   qint64 totalPairs(const ExpressionMatrix* emx) const;
+   /*!
+    * Pointer to the input expression matrix.
+    */
    ExpressionMatrix* _input {nullptr};
+   /*!
+    * Pointer to the output cluster matrix.
+    */
    CCMatrix* _ccm {nullptr};
+   /*!
+    * Pointer to the output correlation matrix.
+    */
    CorrelationMatrix* _cmx {nullptr};
+   /*!
+    * The clustering method to use.
+    */
    ClusteringMethod _clusMethod {ClusteringMethod::None};
+   /*!
+    * The correlation method to use.
+    */
    CorrelationMethod _corrMethod {CorrelationMethod::Pearson};
-   Pairwise::Clustering* _clusModel {nullptr};
-   Pairwise::Correlation* _corrModel {new Pairwise::Pearson()};
+   /*!
+    * The name of the correlation method.
+    */
+   QString _corrName;
+   /*!
+    * The minimum number of clean samples required to consider a pair.
+    */
    int _minSamples {30};
+   /*!
+    * The minimum expression value required to include a sample.
+    */
    float _minExpression {-std::numeric_limits<float>::infinity()};
+   /*!
+    * The minimum number of clusters to use in the clustering model.
+    */
    qint8 _minClusters {1};
+   /*!
+    * The maximum number of clusters to use in the clustering model.
+    */
    qint8 _maxClusters {5};
+   /*!
+    * The model selection criterion to use in the clustering model.
+    */
    Pairwise::Criterion _criterion {Pairwise::Criterion::ICL};
+   /*!
+    * Whether to remove outliers before clustering.
+    */
    bool _removePreOutliers {false};
+   /*!
+    * Whether to remove outliers after clustering.
+    */
    bool _removePostOutliers {false};
+   /*!
+    * The minimum (absolute) correlation threshold to save a correlation.
+    */
    float _minCorrelation {0.5};
+   /*!
+    * The maximum (absolute) correlation threshold to save a correlation.
+    */
    float _maxCorrelation {1.0};
-   int _kernelSize {4096};
+   /*!
+    * The number of pairs to process in each work block (0 means choose automatically).
+    */
+   int _workBlockSize {0};
+   /*!
+    * The global work size for each OpenCL worker.
+    */
+   int _globalWorkSize {4096};
+   /*!
+    * The local work size for each OpenCL worker (0 means choose automatically).
+    */
+   int _localWorkSize {0};
 };
 
 
diff --git a/src/core/similarity_input.cpp b/src/core/similarity_input.cpp
index ac2140c..757c270 100644
--- a/src/core/similarity_input.cpp
+++ b/src/core/similarity_input.cpp
@@ -1,20 +1,20 @@
 #include "similarity_input.h"
 #include "datafactory.h"
-#include "pairwise_gmm.h"
-#include "pairwise_kmeans.h"
-#include "pairwise_pearson.h"
-#include "pairwise_spearman.h"
 
 
 
 
 
 
+/*!
+ * String list of clustering methods for this analytic that correspond exactly
+ * to its enumeration. Used for handling the clustering method argument for this
+ * input object.
+ */
 const QStringList Similarity::Input::CLUSTERING_NAMES
 {
    "none"
    ,"gmm"
-   ,"kmeans"
 };
 
 
@@ -22,6 +22,11 @@ const QStringList Similarity::Input::CLUSTERING_NAMES
 
 
 
+/*!
+ * String list of correlation methods for this analytic that correspond exactly
+ * to its enumeration. Used for handling the correlation method argument for this
+ * input object.
+ */
 const QStringList Similarity::Input::CORRELATION_NAMES
 {
    "pearson"
@@ -33,9 +38,15 @@ const QStringList Similarity::Input::CORRELATION_NAMES
 
 
 
+/*!
+ * String list of criterion options for this analytic that correspond exactly
+ * to its enumeration. Used for handling the criterion argument for this input
+ * object.
+ */
 const QStringList Similarity::Input::CRITERION_NAMES
 {
-   "BIC"
+   "AIC"
+   ,"BIC"
    ,"ICL"
 };
 
@@ -44,10 +55,16 @@ const QStringList Similarity::Input::CRITERION_NAMES
 
 
 
+/*!
+ * Construct a new input object with the given analytic as its parent.
+ *
+ * @param parent
+ */
 Similarity::Input::Input(Similarity* parent):
    EAbstractAnalytic::Input(parent),
    _base(parent)
 {
+   EDEBUG_FUNC(this,parent);
 }
 
 
@@ -55,8 +72,13 @@ Similarity::Input::Input(Similarity* parent):
 
 
 
+/*!
+ * Return the total number of arguments this analytic type contains.
+ */
 int Similarity::Input::size() const
 {
+   EDEBUG_FUNC(this);
+
    return Total;
 }
 
@@ -65,8 +87,15 @@ int Similarity::Input::size() const
 
 
 
+/*!
+ * Return the argument type for a given index.
+ *
+ * @param index
+ */
 EAbstractAnalytic::Input::Type Similarity::Input::type(int index) const
 {
+   EDEBUG_FUNC(this,index);
+
    switch (index)
    {
    case InputData: return Type::DataIn;
@@ -83,7 +112,9 @@ EAbstractAnalytic::Input::Type Similarity::Input::type(int index) const
    case RemovePostOutliers: return Type::Boolean;
    case MinCorrelation: return Type::Double;
    case MaxCorrelation: return Type::Double;
-   case KernelSize: return Type::Integer;
+   case WorkBlockSize: return Type::Integer;
+   case GlobalWorkSize: return Type::Integer;
+   case LocalWorkSize: return Type::Integer;
    default: return Type::Boolean;
    }
 }
@@ -93,8 +124,16 @@ EAbstractAnalytic::Input::Type Similarity::Input::type(int index) const
 
 
 
+/*!
+ * Return data for a given role on an argument with the given index.
+ *
+ * @param index
+ * @param role
+ */
 QVariant Similarity::Input::data(int index, Role role) const
 {
+   EDEBUG_FUNC(this,index,role);
+
    switch (index)
    {
    case InputData:
@@ -238,17 +277,39 @@ QVariant Similarity::Input::data(int index, Role role) const
       case Role::Maximum: return 1;
       default: return QVariant();
       }
-   case KernelSize:
+   case WorkBlockSize:
       switch (role)
       {
-      case Role::CommandLineName: return QString("ksize");
-      case Role::Title: return tr("Kernel Size:");
-      case Role::WhatsThis: return tr("(OpenCL) Total number of kernels to run per block.");
+      case Role::CommandLineName: return QString("bsize");
+      case Role::Title: return tr("Work Block Size:");
+      case Role::WhatsThis: return tr("Number of pairs to process in each work block.");
+      case Role::Default: return 0;
+      case Role::Minimum: return 0;
+      case Role::Maximum: return std::numeric_limits<int>::max();
+      default: return QVariant();
+      }
+   case GlobalWorkSize:
+      switch (role)
+      {
+      case Role::CommandLineName: return QString("gsize");
+      case Role::Title: return tr("Global Work Size:");
+      case Role::WhatsThis: return tr("The global work size for each OpenCL worker.");
       case Role::Default: return 4096;
       case Role::Minimum: return 1;
       case Role::Maximum: return std::numeric_limits<int>::max();
       default: return QVariant();
       }
+   case LocalWorkSize:
+      switch (role)
+      {
+      case Role::CommandLineName: return QString("lsize");
+      case Role::Title: return tr("Local Work Size:");
+      case Role::WhatsThis: return tr("The local work size for each OpenCL worker.");
+      case Role::Default: return 0;
+      case Role::Minimum: return 0;
+      case Role::Maximum: return std::numeric_limits<int>::max();
+      default: return QVariant();
+      }
    default: return QVariant();
    }
 }
@@ -258,38 +319,24 @@ QVariant Similarity::Input::data(int index, Role role) const
 
 
 
+/*!
+ * Set an argument with the given index to the given value.
+ *
+ * @param index
+ * @param value
+ */
 void Similarity::Input::set(int index, const QVariant& value)
 {
+   EDEBUG_FUNC(this,index,&value);
+
    switch (index)
    {
    case ClusteringType:
       _base->_clusMethod = static_cast<ClusteringMethod>(CLUSTERING_NAMES.indexOf(value.toString()));
-
-      switch ( _base->_clusMethod )
-      {
-      case ClusteringMethod::None:
-         _base->_clusModel = nullptr;
-         break;
-      case ClusteringMethod::GMM:
-         _base->_clusModel = new Pairwise::GMM();
-         break;
-      case ClusteringMethod::KMeans:
-         _base->_clusModel = new Pairwise::KMeans();
-         break;
-      }
       break;
    case CorrelationType:
       _base->_corrMethod = static_cast<CorrelationMethod>(CORRELATION_NAMES.indexOf(value.toString()));
-
-      switch ( _base->_corrMethod )
-      {
-      case CorrelationMethod::Pearson:
-         _base->_corrModel = new Pairwise::Pearson();
-         break;
-      case CorrelationMethod::Spearman:
-         _base->_corrModel = new Pairwise::Spearman();
-         break;
-      }
+      _base->_corrName = value.toString();
       break;
    case MinExpression:
       _base->_minExpression = value.toDouble();
@@ -318,8 +365,14 @@ void Similarity::Input::set(int index, const QVariant& value)
    case MaxCorrelation:
       _base->_maxCorrelation = value.toDouble();
       break;
-   case KernelSize:
-      _base->_kernelSize = value.toInt();
+   case WorkBlockSize:
+      _base->_workBlockSize = value.toInt();
+      break;
+   case GlobalWorkSize:
+      _base->_globalWorkSize = value.toInt();
+      break;
+   case LocalWorkSize:
+      _base->_localWorkSize = value.toInt();
       break;
    }
 }
@@ -329,10 +382,16 @@ void Similarity::Input::set(int index, const QVariant& value)
 
 
 
-void Similarity::Input::set(int index, QFile* file)
+/*!
+ * Set a file argument with the given index to the given Qt file pointer. This
+ * implementation does nothing because this analytic has no file arguments.
+ *
+ * @param index
+ * @param file
+ */
+void Similarity::Input::set(int, QFile*)
 {
-   Q_UNUSED(index)
-   Q_UNUSED(file)
+   EDEBUG_FUNC(this);
 }
 
 
@@ -340,8 +399,16 @@ void Similarity::Input::set(int index, QFile* file)
 
 
 
+/*!
+ * Set a data argument with the given index to the given data object pointer.
+ *
+ * @param index
+ * @param data
+ */
 void Similarity::Input::set(int index, EAbstractData *data)
 {
+   EDEBUG_FUNC(this,index,data);
+
    switch (index)
    {
    case InputData:
diff --git a/src/core/similarity_input.h b/src/core/similarity_input.h
index f5ea425..90ffe7f 100644
--- a/src/core/similarity_input.h
+++ b/src/core/similarity_input.h
@@ -4,10 +4,16 @@
 
 
 
+/*!
+ * This class implements the abstract input of the similarity analytic.
+ */
 class Similarity::Input : public EAbstractAnalytic::Input
 {
    Q_OBJECT
 public:
+   /*!
+    * Defines all arguments for its parent analytic.
+    */
    enum Argument
    {
       InputData = 0
@@ -24,7 +30,9 @@ class Similarity::Input : public EAbstractAnalytic::Input
       ,RemovePostOutliers
       ,MinCorrelation
       ,MaxCorrelation
-      ,KernelSize
+      ,WorkBlockSize
+      ,GlobalWorkSize
+      ,LocalWorkSize
       ,Total
    };
    explicit Input(Similarity* parent);
@@ -34,12 +42,13 @@ class Similarity::Input : public EAbstractAnalytic::Input
    virtual void set(int index, const QVariant& value) override final;
    virtual void set(int index, QFile* file) override final;
    virtual void set(int index, EAbstractData* data) override final;
-
 private:
    static const QStringList CLUSTERING_NAMES;
    static const QStringList CORRELATION_NAMES;
    static const QStringList CRITERION_NAMES;
-
+   /*!
+    * Pointer to the base analytic for this object.
+    */
    Similarity* _base;
 };
 
diff --git a/src/core/similarity_opencl.cpp b/src/core/similarity_opencl.cpp
index 41a6833..678b466 100644
--- a/src/core/similarity_opencl.cpp
+++ b/src/core/similarity_opencl.cpp
@@ -1,4 +1,5 @@
 #include "similarity_opencl.h"
+#include <QVector>
 #include "similarity_opencl_worker.h"
 
 
@@ -10,10 +11,16 @@ using namespace std;
 
 
 
+/*!
+ * Construct a new OpenCL object with the given analytic as its parent.
+ *
+ * @param parent
+ */
 Similarity::OpenCL::OpenCL(Similarity* parent):
    EAbstractAnalytic::OpenCL(parent),
    _base(parent)
 {
+   EDEBUG_FUNC(this,parent);
 }
 
 
@@ -21,8 +28,13 @@ Similarity::OpenCL::OpenCL(Similarity* parent):
 
 
 
+/*!
+ * Create and return a new OpenCL worker for the analytic.
+ */
 std::unique_ptr<EAbstractAnalytic::OpenCL::Worker> Similarity::OpenCL::makeWorker()
 {
+   EDEBUG_FUNC(this);
+
    return unique_ptr<EAbstractAnalytic::OpenCL::Worker>(new Worker(_base, this, _context, _program));
 }
 
@@ -31,8 +43,15 @@ std::unique_ptr<EAbstractAnalytic::OpenCL::Worker> Similarity::OpenCL::makeWorke
 
 
 
+/*!
+ * Initializes all OpenCL resources used by this object's implementation.
+ *
+ * @param context
+ */
 void Similarity::OpenCL::initialize(::OpenCL::Context* context)
 {
+   EDEBUG_FUNC(this,context);
+
    // create list of opencl source files
    QStringList paths {
       ":/opencl/linalg.cl",
@@ -40,7 +59,6 @@ void Similarity::OpenCL::initialize(::OpenCL::Context* context)
       ":/opencl/sort.cl",
       ":/opencl/outlier.cl",
       ":/opencl/gmm.cl",
-      ":/opencl/kmeans.cl",
       ":/opencl/pearson.cl",
       ":/opencl/spearman.cl"
    };
@@ -53,17 +71,15 @@ void Similarity::OpenCL::initialize(::OpenCL::Context* context)
    _queue = new ::OpenCL::CommandQueue(context, context->devices().first(), this);
 
    // create buffer for expression data
-   _expressions = ::OpenCL::Buffer<cl_float>(context, _base->_input->getRawSize());
-
-   unique_ptr<ExpressionMatrix::Expression> rawData(_base->_input->dumpRawData());
-   ExpressionMatrix::Expression* rawDataRef {rawData.get()};
+   QVector<float> rawData = _base->_input->dumpRawData();
+   _expressions = ::OpenCL::Buffer<cl_float>(context, rawData.size());
 
    // copy expression data to device
    _expressions.mapWrite(_queue).wait();
 
-   for ( int i = 0; i < _base->_input->getRawSize(); ++i )
+   for ( int i = 0; i < rawData.size(); ++i )
    {
-      _expressions[i] = rawDataRef[i];
+      _expressions[i] = rawData[i];
    }
 
    _expressions.unmap(_queue).wait();
diff --git a/src/core/similarity_opencl.h b/src/core/similarity_opencl.h
index f5aab68..ff4e18b 100644
--- a/src/core/similarity_opencl.h
+++ b/src/core/similarity_opencl.h
@@ -6,13 +6,16 @@
 
 
 
+/*!
+ * This class implements the base OpenCL class of the similarity analytic.
+ */
 class Similarity::OpenCL : public EAbstractAnalytic::OpenCL
 {
    Q_OBJECT
 public:
    class FetchPair;
    class GMM;
-   class KMeans;
+   class Outlier;
    class Pearson;
    class Spearman;
    class Worker;
@@ -20,11 +23,25 @@ class Similarity::OpenCL : public EAbstractAnalytic::OpenCL
    virtual std::unique_ptr<EAbstractAnalytic::OpenCL::Worker> makeWorker() override final;
    virtual void initialize(::OpenCL::Context* context) override final;
 private:
+   /*!
+    * Pointer to the base analytic for this object.
+    */
    Similarity* _base;
+   /*!
+    * Pointer to this object's base OpenCL context used to create all other resources.
+    */
    ::OpenCL::Context* _context {nullptr};
+   /*!
+    * Pointer to this object's OpenCL program.
+    */
    ::OpenCL::Program* _program {nullptr};
+   /*!
+    * Pointer to this object's OpenCL command queue.
+    */
    ::OpenCL::CommandQueue* _queue {nullptr};
-
+   /*!
+    * This object's OpenCL buffer for the expression matrix.
+    */
    ::OpenCL::Buffer<cl_float> _expressions;
 };
 
diff --git a/src/core/similarity_opencl_fetchpair.cpp b/src/core/similarity_opencl_fetchpair.cpp
index b752165..4f53068 100644
--- a/src/core/similarity_opencl_fetchpair.cpp
+++ b/src/core/similarity_opencl_fetchpair.cpp
@@ -9,9 +9,17 @@ using namespace std;
 
 
 
+/*!
+ * Construct a new fetch-pair kernel object with the given OpenCL program and
+ * Qt parent.
+ *
+ * @param program
+ * @param parent
+ */
 Similarity::OpenCL::FetchPair::FetchPair(::OpenCL::Program* program, QObject* parent):
    ::OpenCL::Kernel(program, "fetchPair", parent)
 {
+   EDEBUG_FUNC(this,program,parent);
 }
 
 
@@ -19,22 +27,52 @@ Similarity::OpenCL::FetchPair::FetchPair(::OpenCL::Program* program, QObject* pa
 
 
 
+/*!
+ * Execute this kernel object's OpenCL kernel using the given OpenCL command
+ * queue and kernel arguments, returning the OpenCL event associated with the
+ * kernel execution.
+ *
+ * @param queue
+ * @param globalWorkSize
+ * @param localWorkSize
+ * @param expressions
+ * @param sampleSize
+ * @param in_index
+ * @param minExpression
+ * @param out_X
+ * @param out_N
+ * @param out_labels
+ */
 ::OpenCL::Event Similarity::OpenCL::FetchPair::execute(
    ::OpenCL::CommandQueue* queue,
-   int kernelSize,
+   int globalWorkSize,
+   int localWorkSize,
    ::OpenCL::Buffer<cl_float>* expressions,
    cl_int sampleSize,
    ::OpenCL::Buffer<cl_int2>* in_index,
    cl_int minExpression,
-   ::OpenCL::Buffer<Pairwise::Vector2>* out_X,
+   ::OpenCL::Buffer<cl_float2>* out_X,
    ::OpenCL::Buffer<cl_int>* out_N,
    ::OpenCL::Buffer<cl_char>* out_labels
 )
 {
+   EDEBUG_FUNC(this,
+      queue,
+      globalWorkSize,
+      localWorkSize,
+      expressions,
+      sampleSize,
+      in_index,
+      minExpression,
+      out_X,
+      out_N,
+      out_labels);
+
    // acquire lock for this kernel
    Locker locker {lock()};
 
    // set kernel arguments
+   setArgument(GlobalWorkSize, globalWorkSize);
    setBuffer(Expressions, expressions);
    setArgument(SampleSize, sampleSize);
    setBuffer(InIndex, in_index);
@@ -43,8 +81,15 @@ ::OpenCL::Event Similarity::OpenCL::FetchPair::execute(
    setBuffer(OutN, out_N);
    setBuffer(OutLabels, out_labels);
 
-   // set kernel sizes
-   setSizes(0, kernelSize, min(kernelSize, maxWorkGroupSize(queue->device())));
+   // set work sizes
+   if ( localWorkSize == 0 )
+   {
+      localWorkSize = min(globalWorkSize, maxWorkGroupSize(queue->device()));
+   }
+
+   int numWorkgroups = (globalWorkSize + localWorkSize - 1) / localWorkSize;
+
+   setSizes(0, numWorkgroups * localWorkSize, localWorkSize);
 
    // execute kernel
    return ::OpenCL::Kernel::execute(queue);
diff --git a/src/core/similarity_opencl_fetchpair.h b/src/core/similarity_opencl_fetchpair.h
index bdab41e..04de1fd 100644
--- a/src/core/similarity_opencl_fetchpair.h
+++ b/src/core/similarity_opencl_fetchpair.h
@@ -4,13 +4,22 @@
 
 
 
+/*!
+ * This class implements the fetch-pair kernel for the similarity analytic. This
+ * kernel takes a list of pairwise indices and computes the pairwise data, the
+ * number of clean samples, and the initial sample labels for each pair.
+ */
 class Similarity::OpenCL::FetchPair : public ::OpenCL::Kernel
 {
    Q_OBJECT
 public:
+   /*!
+    * Defines the arguments passed to the OpenCL kernel.
+    */
    enum Argument
    {
-      Expressions
+      GlobalWorkSize
+      ,Expressions
       ,SampleSize
       ,InIndex
       ,MinExpression
@@ -21,12 +30,13 @@ class Similarity::OpenCL::FetchPair : public ::OpenCL::Kernel
    explicit FetchPair(::OpenCL::Program* program, QObject* parent = nullptr);
    ::OpenCL::Event execute(
       ::OpenCL::CommandQueue* queue,
-      int kernelSize,
+      int globalWorkSize,
+      int localWorkSize,
       ::OpenCL::Buffer<cl_float>* expressions,
       cl_int sampleSize,
       ::OpenCL::Buffer<cl_int2>* in_index,
       cl_int minExpression,
-      ::OpenCL::Buffer<Pairwise::Vector2>* out_X,
+      ::OpenCL::Buffer<cl_float2>* out_X,
       ::OpenCL::Buffer<cl_int>* out_N,
       ::OpenCL::Buffer<cl_char>* out_labels
    );
diff --git a/src/core/similarity_opencl_gmm.cpp b/src/core/similarity_opencl_gmm.cpp
index 31c330d..e18f562 100644
--- a/src/core/similarity_opencl_gmm.cpp
+++ b/src/core/similarity_opencl_gmm.cpp
@@ -9,9 +9,16 @@ using namespace std;
 
 
 
+/*!
+ * Construct a new GMM kernel object with the given OpenCL program and Qt parent.
+ *
+ * @param program
+ * @param parent
+ */
 Similarity::OpenCL::GMM::GMM(::OpenCL::Program* program, QObject* parent):
    ::OpenCL::Kernel(program, "GMM_compute", parent)
 {
+   EDEBUG_FUNC(this,program,parent);
 }
 
 
@@ -19,42 +26,81 @@ Similarity::OpenCL::GMM::GMM(::OpenCL::Program* program, QObject* parent):
 
 
 
+/*!
+ * Execute this kernel object's OpenCL kernel using the given OpenCL command
+ * queue and kernel arguments, returning the OpenCL event associated with the
+ * kernel execution.
+ *
+ * @param queue
+ * @param globalWorkSize
+ * @param localWorkSize
+ * @param sampleSize
+ * @param minSamples
+ * @param minClusters
+ * @param maxClusters
+ * @param criterion
+ * @param work_X
+ * @param work_N
+ * @param work_labels
+ * @param work_components
+ * @param work_MP
+ * @param work_counts
+ * @param work_logpi
+ * @param work_gamma
+ * @param out_K
+ * @param out_labels
+ */
 ::OpenCL::Event Similarity::OpenCL::GMM::execute(
    ::OpenCL::CommandQueue* queue,
-   int kernelSize,
-   ::OpenCL::Buffer<cl_float>* expressions,
+   int globalWorkSize,
+   int localWorkSize,
    cl_int sampleSize,
    cl_int minSamples,
    cl_char minClusters,
    cl_char maxClusters,
-   Pairwise::Criterion criterion,
-   cl_int removePreOutliers,
-   cl_int removePostOutliers,
-   ::OpenCL::Buffer<Pairwise::Vector2>* work_X,
+   cl_int criterion,
+   ::OpenCL::Buffer<cl_float2>* work_X,
    ::OpenCL::Buffer<cl_int>* work_N,
    ::OpenCL::Buffer<cl_char>* work_labels,
-   ::OpenCL::Buffer<Pairwise::GMM::Component>* work_components,
-   ::OpenCL::Buffer<Pairwise::Vector2>* work_MP,
+   ::OpenCL::Buffer<cl_component>* work_components,
+   ::OpenCL::Buffer<cl_float2>* work_MP,
    ::OpenCL::Buffer<cl_int>* work_counts,
    ::OpenCL::Buffer<cl_float>* work_logpi,
-   ::OpenCL::Buffer<cl_float>* work_loggamma,
-   ::OpenCL::Buffer<cl_float>* work_logGamma,
+   ::OpenCL::Buffer<cl_float>* work_gamma,
    ::OpenCL::Buffer<cl_char>* out_K,
    ::OpenCL::Buffer<cl_char>* out_labels
 )
 {
+   EDEBUG_FUNC(this,
+      queue,
+      globalWorkSize,
+      localWorkSize,
+      sampleSize,
+      minSamples,
+      minClusters,
+      maxClusters,
+      &criterion,
+      work_X,
+      work_N,
+      work_labels,
+      work_components,
+      work_MP,
+      work_counts,
+      work_logpi,
+      work_gamma,
+      out_K,
+      out_labels);
+
    // acquire lock for this kernel
    Locker locker {lock()};
 
    // set kernel arguments
-   setBuffer(Expressions, expressions);
+   setArgument(GlobalWorkSize, globalWorkSize);
    setArgument(SampleSize, sampleSize);
    setArgument(MinSamples, minSamples);
    setArgument(MinClusters, minClusters);
    setArgument(MaxClusters, maxClusters);
    setArgument(Criterion, criterion);
-   setArgument(RemovePreOutliers, removePreOutliers);
-   setArgument(RemovePostOutliers, removePostOutliers);
    setBuffer(WorkX, work_X);
    setBuffer(WorkN, work_N);
    setBuffer(WorkLabels, work_labels);
@@ -62,13 +108,19 @@ ::OpenCL::Event Similarity::OpenCL::GMM::execute(
    setBuffer(WorkMP, work_MP);
    setBuffer(WorkCounts, work_counts);
    setBuffer(WorkLogPi, work_logpi);
-   setBuffer(WorkLoggamma, work_loggamma);
-   setBuffer(WorkLogGamma, work_logGamma);
+   setBuffer(WorkGamma, work_gamma);
    setBuffer(OutK, out_K);
    setBuffer(OutLabels, out_labels);
 
-   // set kernel sizes
-   setSizes(0, kernelSize, min(kernelSize, maxWorkGroupSize(queue->device())));
+   // set work sizes
+   if ( localWorkSize == 0 )
+   {
+      localWorkSize = min(globalWorkSize, maxWorkGroupSize(queue->device()));
+   }
+
+   int numWorkgroups = (globalWorkSize + localWorkSize - 1) / localWorkSize;
+
+   setSizes(0, numWorkgroups * localWorkSize, localWorkSize);
 
    // execute kernel
    return ::OpenCL::Kernel::execute(queue);
diff --git a/src/core/similarity_opencl_gmm.h b/src/core/similarity_opencl_gmm.h
index e9a537c..16a5809 100644
--- a/src/core/similarity_opencl_gmm.h
+++ b/src/core/similarity_opencl_gmm.h
@@ -4,20 +4,40 @@
 
 
 
+typedef struct
+{
+   cl_float pi;
+   cl_float2 mu;
+   cl_float4 sigma;
+   cl_float4 sigmaInv;
+   cl_float normalizer;
+} cl_component;
+
+
+
+
+
+
+/*!
+ * This class implements the GMM kernel for the similarity analytic. This
+ * kernel takes a list of pairwise data arrays and computes the number of
+ * clusters and a list of cluster labels for each pair.
+ */
 class Similarity::OpenCL::GMM : public ::OpenCL::Kernel
 {
    Q_OBJECT
 public:
+   /*!
+    * Defines the arguments passed to the OpenCL kernel.
+    */
    enum Argument
    {
-      Expressions
+      GlobalWorkSize
       ,SampleSize
       ,MinSamples
       ,MinClusters
       ,MaxClusters
       ,Criterion
-      ,RemovePreOutliers
-      ,RemovePostOutliers
       ,WorkX
       ,WorkN
       ,WorkLabels
@@ -25,32 +45,28 @@ class Similarity::OpenCL::GMM : public ::OpenCL::Kernel
       ,WorkMP
       ,WorkCounts
       ,WorkLogPi
-      ,WorkLoggamma
-      ,WorkLogGamma
+      ,WorkGamma
       ,OutK
       ,OutLabels
    };
    explicit GMM(::OpenCL::Program* program, QObject* parent = nullptr);
    ::OpenCL::Event execute(
       ::OpenCL::CommandQueue* queue,
-      int kernelSize,
-      ::OpenCL::Buffer<cl_float>* expressions,
+      int globalWorkSize,
+      int localWorkSize,
       cl_int sampleSize,
       cl_int minSamples,
       cl_char minClusters,
       cl_char maxClusters,
-      Pairwise::Criterion criterion,
-      cl_int removePreOutliers,
-      cl_int removePostOutliers,
-      ::OpenCL::Buffer<Pairwise::Vector2>* work_X,
+      cl_int criterion,
+      ::OpenCL::Buffer<cl_float2>* work_X,
       ::OpenCL::Buffer<cl_int>* work_N,
       ::OpenCL::Buffer<cl_char>* work_labels,
-      ::OpenCL::Buffer<Pairwise::GMM::Component>* work_components,
-      ::OpenCL::Buffer<Pairwise::Vector2>* work_MP,
+      ::OpenCL::Buffer<cl_component>* work_components,
+      ::OpenCL::Buffer<cl_float2>* work_MP,
       ::OpenCL::Buffer<cl_int>* work_counts,
       ::OpenCL::Buffer<cl_float>* work_logpi,
-      ::OpenCL::Buffer<cl_float>* work_loggamma,
-      ::OpenCL::Buffer<cl_float>* work_logGamma,
+      ::OpenCL::Buffer<cl_float>* work_gamma,
       ::OpenCL::Buffer<cl_char>* out_K,
       ::OpenCL::Buffer<cl_char>* out_labels
    );
diff --git a/src/core/similarity_opencl_kmeans.cpp b/src/core/similarity_opencl_kmeans.cpp
deleted file mode 100644
index 7f43642..0000000
--- a/src/core/similarity_opencl_kmeans.cpp
+++ /dev/null
@@ -1,65 +0,0 @@
-#include "similarity_opencl_kmeans.h"
-
-
-
-using namespace std;
-
-
-
-
-
-
-Similarity::OpenCL::KMeans::KMeans(::OpenCL::Program* program, QObject* parent):
-   ::OpenCL::Kernel(program, "KMeans_compute", parent)
-{
-}
-
-
-
-
-
-
-::OpenCL::Event Similarity::OpenCL::KMeans::execute(
-   ::OpenCL::CommandQueue* queue,
-   int kernelSize,
-   ::OpenCL::Buffer<cl_float>* expressions,
-   cl_int sampleSize,
-   cl_int minSamples,
-   cl_char minClusters,
-   cl_char maxClusters,
-   cl_int removePreOutliers,
-   cl_int removePostOutliers,
-   ::OpenCL::Buffer<Pairwise::Vector2>* work_X,
-   ::OpenCL::Buffer<cl_int>* work_N,
-   ::OpenCL::Buffer<cl_float>* work_outlier,
-   ::OpenCL::Buffer<cl_char>* work_labels,
-   ::OpenCL::Buffer<Pairwise::Vector2>* work_means,
-   ::OpenCL::Buffer<cl_char>* out_K,
-   ::OpenCL::Buffer<cl_char>* out_labels
-)
-{
-   // acquire lock for this kernel
-   Locker locker {lock()};
-
-   // set kernel arguments
-   setBuffer(Expressions, expressions);
-   setArgument(SampleSize, sampleSize);
-   setArgument(MinSamples, minSamples);
-   setArgument(MinClusters, minClusters);
-   setArgument(MaxClusters, maxClusters);
-   setArgument(RemovePreOutliers, removePreOutliers);
-   setArgument(RemovePostOutliers, removePostOutliers);
-   setBuffer(WorkX, work_X);
-   setBuffer(WorkN, work_N);
-   setBuffer(WorkOutlier, work_outlier);
-   setBuffer(WorkLabels, work_labels);
-   setBuffer(WorkMeans, work_means);
-   setBuffer(OutK, out_K);
-   setBuffer(OutLabels, out_labels);
-
-   // set kernel sizes
-   setSizes(0, kernelSize, min(kernelSize, maxWorkGroupSize(queue->device())));
-
-   // execute kernel
-   return ::OpenCL::Kernel::execute(queue);
-}
diff --git a/src/core/similarity_opencl_kmeans.h b/src/core/similarity_opencl_kmeans.h
deleted file mode 100644
index 019624c..0000000
--- a/src/core/similarity_opencl_kmeans.h
+++ /dev/null
@@ -1,51 +0,0 @@
-#ifndef SIMILARITY_OPENCL_KMEANS_H
-#define SIMILARITY_OPENCL_KMEANS_H
-#include "similarity_opencl.h"
-
-
-
-class Similarity::OpenCL::KMeans : public ::OpenCL::Kernel
-{
-   Q_OBJECT
-public:
-   enum Argument
-   {
-      Expressions
-      ,SampleSize
-      ,MinSamples
-      ,MinClusters
-      ,MaxClusters
-      ,RemovePreOutliers
-      ,RemovePostOutliers
-      ,WorkX
-      ,WorkN
-      ,WorkOutlier
-      ,WorkLabels
-      ,WorkMeans
-      ,OutK
-      ,OutLabels
-   };
-   explicit KMeans(::OpenCL::Program* program, QObject* parent = nullptr);
-   ::OpenCL::Event execute(
-      ::OpenCL::CommandQueue* queue,
-      int kernelSize,
-      ::OpenCL::Buffer<cl_float>* expressions,
-      cl_int sampleSize,
-      cl_int minSamples,
-      cl_char minClusters,
-      cl_char maxClusters,
-      cl_int removePreOutliers,
-      cl_int removePostOutliers,
-      ::OpenCL::Buffer<Pairwise::Vector2>* work_X,
-      ::OpenCL::Buffer<cl_int>* work_N,
-      ::OpenCL::Buffer<cl_float>* work_outlier,
-      ::OpenCL::Buffer<cl_char>* work_labels,
-      ::OpenCL::Buffer<Pairwise::Vector2>* work_means,
-      ::OpenCL::Buffer<cl_char>* out_K,
-      ::OpenCL::Buffer<cl_char>* out_labels
-   );
-};
-
-
-
-#endif
diff --git a/src/core/similarity_opencl_outlier.cpp b/src/core/similarity_opencl_outlier.cpp
new file mode 100644
index 0000000..908ec62
--- /dev/null
+++ b/src/core/similarity_opencl_outlier.cpp
@@ -0,0 +1,99 @@
+#include "similarity_opencl_outlier.h"
+
+
+
+using namespace std;
+
+
+
+
+
+
+/*!
+ * Construct a new Outlier kernel object with the given OpenCL program and Qt parent.
+ *
+ * @param program
+ * @param parent
+ */
+Similarity::OpenCL::Outlier::Outlier(::OpenCL::Program* program, QObject* parent):
+   ::OpenCL::Kernel(program, "removeOutliers", parent)
+{
+   EDEBUG_FUNC(this,program,parent);
+}
+
+
+
+
+
+
+/*!
+ * Execute this kernel object's OpenCL kernel using the given OpenCL command
+ * queue and kernel arguments, returning the OpenCL event associated with the
+ * kernel execution.
+ *
+ * @param queue
+ * @param globalWorkSize
+ * @param localWorkSize
+ * @param in_data
+ * @param in_N
+ * @param in_labels
+ * @param sampleSize
+ * @param in_K
+ * @param marker
+ * @param work_x
+ * @param work_y
+ */
+::OpenCL::Event Similarity::OpenCL::Outlier::execute(
+   ::OpenCL::CommandQueue* queue,
+   int globalWorkSize,
+   int localWorkSize,
+   ::OpenCL::Buffer<cl_float2>* in_data,
+   ::OpenCL::Buffer<cl_int>* in_N,
+   ::OpenCL::Buffer<cl_char>* in_labels,
+   cl_int sampleSize,
+   ::OpenCL::Buffer<cl_char>* in_K,
+   cl_char marker,
+   ::OpenCL::Buffer<cl_float>* work_x,
+   ::OpenCL::Buffer<cl_float>* work_y
+)
+{
+   EDEBUG_FUNC(this,
+      queue,
+      globalWorkSize,
+      localWorkSize,
+      in_data,
+      in_N,
+      in_labels,
+      sampleSize,
+      in_K,
+      marker,
+      work_x,
+      work_y);
+
+   // acquire lock for this kernel
+   Locker locker {lock()};
+
+   // set kernel arguments
+   setArgument(GlobalWorkSize, globalWorkSize);
+   setBuffer(InData, in_data);
+   setBuffer(InN, in_N);
+   setBuffer(InLabels, in_labels);
+   setArgument(SampleSize, sampleSize);
+   setBuffer(InK, in_K);
+   setArgument(Marker, marker);
+   setBuffer(WorkX, work_x);
+   setBuffer(WorkY, work_y);
+
+   // set work sizes
+   if ( localWorkSize == 0 )
+   {
+      localWorkSize = min(globalWorkSize, maxWorkGroupSize(queue->device()));
+   }
+
+   int numWorkgroups = (globalWorkSize + localWorkSize - 1) / localWorkSize;
+
+   setSizes(0, numWorkgroups * localWorkSize, localWorkSize);
+
+   // execute kernel
+   return ::OpenCL::Kernel::execute(queue);
+}
diff --git a/src/core/similarity_opencl_outlier.h b/src/core/similarity_opencl_outlier.h
new file mode 100644
index 0000000..26894d6
--- /dev/null
+++ b/src/core/similarity_opencl_outlier.h
@@ -0,0 +1,47 @@
+#ifndef SIMILARITY_OPENCL_OUTLIER_H
+#define SIMILARITY_OPENCL_OUTLIER_H
+#include "similarity_opencl.h"
+
+
+
+/*!
+ * This class implements the outlier removal kernel for the similarity analytic.
+ */
+class Similarity::OpenCL::Outlier : public ::OpenCL::Kernel
+{
+   Q_OBJECT
+public:
+   /*!
+    * Defines the arguments passed to the OpenCL kernel.
+    */
+   enum Argument
+   {
+      GlobalWorkSize
+      ,InData
+      ,InN
+      ,InLabels
+      ,SampleSize
+      ,InK
+      ,Marker
+      ,WorkX
+      ,WorkY
+   };
+   explicit Outlier(::OpenCL::Program* program, QObject* parent = nullptr);
+   ::OpenCL::Event execute(
+      ::OpenCL::CommandQueue* queue,
+      int globalWorkSize,
+      int localWorkSize,
+      ::OpenCL::Buffer<cl_float2>* in_data,
+      ::OpenCL::Buffer<cl_int>* in_N,
+      ::OpenCL::Buffer<cl_char>* in_labels,
+      cl_int sampleSize,
+      ::OpenCL::Buffer<cl_char>* in_K,
+      cl_char marker,
+      ::OpenCL::Buffer<cl_float>* work_x,
+      ::OpenCL::Buffer<cl_float>* work_y
+   );
+};
+
+
+
+#endif
diff --git a/src/core/similarity_opencl_pearson.cpp b/src/core/similarity_opencl_pearson.cpp
index 640993c..3caff15 100644
--- a/src/core/similarity_opencl_pearson.cpp
+++ b/src/core/similarity_opencl_pearson.cpp
@@ -9,9 +9,17 @@ using namespace std;
 
 
 
+/*!
+ * Construct a new Pearson kernel object with the given OpenCL program and
+ * Qt parent.
+ *
+ * @param program
+ * @param parent
+ */
 Similarity::OpenCL::Pearson::Pearson(::OpenCL::Program* program, QObject* parent):
    ::OpenCL::Kernel(program, "Pearson_compute", parent)
 {
+   EDEBUG_FUNC(this,program,parent);
 }
 
 
@@ -19,10 +27,26 @@ Similarity::OpenCL::Pearson::Pearson(::OpenCL::Program* program, QObject* parent
 
 
 
+/*!
+ * Execute this kernel object's OpenCL kernel using the given OpenCL command
+ * queue and kernel arguments, returning the OpenCL event associated with the
+ * kernel execution.
+ *
+ * @param queue
+ * @param globalWorkSize
+ * @param localWorkSize
+ * @param in_data
+ * @param clusterSize
+ * @param in_labels
+ * @param sampleSize
+ * @param minSamples
+ * @param out_correlations
+ */
 ::OpenCL::Event Similarity::OpenCL::Pearson::execute(
    ::OpenCL::CommandQueue* queue,
-   int kernelSize,
-   ::OpenCL::Buffer<Pairwise::Vector2>* in_data,
+   int globalWorkSize,
+   int localWorkSize,
+   ::OpenCL::Buffer<cl_float2>* in_data,
    cl_char clusterSize,
    ::OpenCL::Buffer<cl_char>* in_labels,
    cl_int sampleSize,
@@ -30,10 +54,22 @@ ::OpenCL::Event Similarity::OpenCL::Pearson::execute(
    ::OpenCL::Buffer<cl_float>* out_correlations
 )
 {
+   EDEBUG_FUNC(this,
+      queue,
+      globalWorkSize,
+      localWorkSize,
+      in_data,
+      clusterSize,
+      in_labels,
+      sampleSize,
+      minSamples,
+      out_correlations);
+
    // acquire lock for this kernel
    Locker locker {lock()};
 
    // set kernel arguments
+   setArgument(GlobalWorkSize, globalWorkSize);
    setBuffer(InData, in_data);
    setArgument(ClusterSize, clusterSize);
    setBuffer(InLabels, in_labels);
@@ -41,8 +77,15 @@ ::OpenCL::Event Similarity::OpenCL::Pearson::execute(
    setArgument(MinSamples, minSamples);
    setBuffer(OutCorrelations, out_correlations);
 
-   // set kernel sizes
-   setSizes(0, kernelSize, min(kernelSize, maxWorkGroupSize(queue->device())));
+   // set work sizes
+   if ( localWorkSize == 0 )
+   {
+      localWorkSize = min(globalWorkSize, maxWorkGroupSize(queue->device()));
+   }
+
+   int numWorkgroups = (globalWorkSize + localWorkSize - 1) / localWorkSize;
+
+   setSizes(0, numWorkgroups * localWorkSize, localWorkSize);
 
    // execute kernel
    return ::OpenCL::Kernel::execute(queue);
diff --git a/src/core/similarity_opencl_pearson.h b/src/core/similarity_opencl_pearson.h
index f54bc0f..93824c6 100644
--- a/src/core/similarity_opencl_pearson.h
+++ b/src/core/similarity_opencl_pearson.h
@@ -4,13 +4,22 @@
 
 
 
+/*!
+ * This class implements the Pearson kernel for the similarity analytic. This
+ * kernel takes a list of pairwise data arrays (with cluster labels) and computes
+ * the Pearson correlation for each cluster in each pair.
+ */
 class Similarity::OpenCL::Pearson : public ::OpenCL::Kernel
 {
    Q_OBJECT
 public:
+   /*!
+    * Defines the arguments passed to the OpenCL kernel.
+    */
    enum Argument
    {
-      InData
+      GlobalWorkSize
+      ,InData
       ,ClusterSize
       ,InLabels
       ,SampleSize
@@ -20,8 +29,9 @@ class Similarity::OpenCL::Pearson : public ::OpenCL::Kernel
    explicit Pearson(::OpenCL::Program* program, QObject* parent = nullptr);
    ::OpenCL::Event execute(
       ::OpenCL::CommandQueue* queue,
-      int kernelSize,
-      ::OpenCL::Buffer<Pairwise::Vector2>* in_data,
+      int globalWorkSize,
+      int localWorkSize,
+      ::OpenCL::Buffer<cl_float2>* in_data,
       cl_char clusterSize,
       ::OpenCL::Buffer<cl_char>* in_labels,
       cl_int sampleSize,
diff --git a/src/core/similarity_opencl_spearman.cpp b/src/core/similarity_opencl_spearman.cpp
index 44bf263..24c5903 100644
--- a/src/core/similarity_opencl_spearman.cpp
+++ b/src/core/similarity_opencl_spearman.cpp
@@ -9,9 +9,17 @@ using namespace std;
 
 
 
+/*!
+ * Construct a new Spearman kernel object with the given OpenCL program and
+ * Qt parent.
+ *
+ * @param program
+ * @param parent
+ */
 Similarity::OpenCL::Spearman::Spearman(::OpenCL::Program* program, QObject* parent):
    ::OpenCL::Kernel(program, "Spearman_compute", parent)
 {
+   EDEBUG_FUNC(this,program,parent);
 }
 
 
@@ -19,10 +27,29 @@ Similarity::OpenCL::Spearman::Spearman(::OpenCL::Program* program, QObject* pare
 
 
 
+/*!
+ * Execute this kernel object's OpenCL kernel using the given OpenCL command
+ * queue and kernel arguments, returning the OpenCL event associated with the
+ * kernel execution.
+ *
+ * @param queue
+ * @param globalWorkSize
+ * @param localWorkSize
+ * @param in_data
+ * @param clusterSize
+ * @param in_labels
+ * @param sampleSize
+ * @param minSamples
+ * @param work_x
+ * @param work_y
+ * @param work_rank
+ * @param out_correlations
+ */
 ::OpenCL::Event Similarity::OpenCL::Spearman::execute(
    ::OpenCL::CommandQueue* queue,
-   int kernelSize,
-   ::OpenCL::Buffer<Pairwise::Vector2>* in_data,
+   int globalWorkSize,
+   int localWorkSize,
+   ::OpenCL::Buffer<cl_float2>* in_data,
    cl_char clusterSize,
    ::OpenCL::Buffer<cl_char>* in_labels,
    cl_int sampleSize,
@@ -33,10 +60,25 @@ ::OpenCL::Event Similarity::OpenCL::Spearman::execute(
    ::OpenCL::Buffer<cl_float>* out_correlations
 )
 {
+   EDEBUG_FUNC(this,
+      queue,
+      globalWorkSize,
+      localWorkSize,
+      in_data,
+      clusterSize,
+      in_labels,
+      sampleSize,
+      minSamples,
+      work_x,
+      work_y,
+      work_rank,
+      out_correlations);
+
    // acquire lock for this kernel
    Locker locker {lock()};
 
    // set kernel arguments
+   setArgument(GlobalWorkSize, globalWorkSize);
    setBuffer(InData, in_data);
    setArgument(ClusterSize, clusterSize);
    setBuffer(InLabels, in_labels);
@@ -47,8 +89,15 @@ ::OpenCL::Event Similarity::OpenCL::Spearman::execute(
    setBuffer(WorkRank, work_rank);
    setBuffer(OutCorrelations, out_correlations);
 
-   // set kernel sizes
-   setSizes(0, kernelSize, min(kernelSize, maxWorkGroupSize(queue->device())));
+   // set work sizes
+   if ( localWorkSize == 0 )
+   {
+      localWorkSize = min(globalWorkSize, maxWorkGroupSize(queue->device()));
+   }
+
+   int numWorkgroups = (globalWorkSize + localWorkSize - 1) / localWorkSize;
+
+   setSizes(0, numWorkgroups * localWorkSize, localWorkSize);
 
    // execute kernel
    return ::OpenCL::Kernel::execute(queue);
diff --git a/src/core/similarity_opencl_spearman.h b/src/core/similarity_opencl_spearman.h
index d1a198e..e1d6693 100644
--- a/src/core/similarity_opencl_spearman.h
+++ b/src/core/similarity_opencl_spearman.h
@@ -4,13 +4,22 @@
 
 
 
+/*!
+ * This class implements the Spearman kernel for the similarity analytic. This
+ * kernel takes a list of pairwise data arrays (with cluster labels) and computes
+ * the Spearman correlation for each cluster in each pair.
+ */
 class Similarity::OpenCL::Spearman : public ::OpenCL::Kernel
 {
    Q_OBJECT
 public:
+   /*!
+    * Defines the arguments passed to the OpenCL kernel.
+    */
    enum Argument
    {
-      InData
+      GlobalWorkSize
+      ,InData
       ,ClusterSize
       ,InLabels
       ,SampleSize
@@ -23,8 +32,9 @@ class Similarity::OpenCL::Spearman : public ::OpenCL::Kernel
    explicit Spearman(::OpenCL::Program* program, QObject* parent = nullptr);
    ::OpenCL::Event execute(
       ::OpenCL::CommandQueue* queue,
-      int kernelSize,
-      ::OpenCL::Buffer<Pairwise::Vector2>* in_data,
+      int globalWorkSize,
+      int localWorkSize,
+      ::OpenCL::Buffer<cl_float2>* in_data,
       cl_char clusterSize,
       ::OpenCL::Buffer<cl_char>* in_labels,
       cl_int sampleSize,
diff --git a/src/core/similarity_opencl_worker.cpp b/src/core/similarity_opencl_worker.cpp
index 994175f..171f9e3 100644
--- a/src/core/similarity_opencl_worker.cpp
+++ b/src/core/similarity_opencl_worker.cpp
@@ -1,11 +1,8 @@
 #include "similarity_opencl_worker.h"
-#include "similarity_opencl_fetchpair.h"
-#include "similarity_opencl_gmm.h"
-#include "similarity_opencl_kmeans.h"
-#include "similarity_opencl_pearson.h"
-#include "similarity_opencl_spearman.h"
 #include "similarity_resultblock.h"
 #include "similarity_workblock.h"
+#include <ace/core/elog.h>
+#include "pairwise_spearman.h"
 
 
 
@@ -16,73 +13,50 @@ using namespace std;
 
 
 
-
-int nextPower2(int n)
-{
-   int pow2 = 2;
-   while ( pow2 < n )
-   {
-      pow2 *= 2;
-   }
-
-   return pow2;
-}
-
-
-
-
-
-
-template<class T>
-QVector<T> createVector(const T* data, int size)
-{
-   QVector<T> v(size);
-
-   memcpy(v.data(), data, size * sizeof(T));
-   return v;
-}
-
-
-
-
-
-
+/*!
+ * Construct a new OpenCL worker with the given parent analytic, OpenCL object,
+ * OpenCL context, and OpenCL program.
+ *
+ * @param base
+ * @param baseOpenCL
+ * @param context
+ * @param program
+ */
 Similarity::OpenCL::Worker::Worker(Similarity* base, Similarity::OpenCL* baseOpenCL, ::OpenCL::Context* context, ::OpenCL::Program* program):
    _base(base),
    _baseOpenCL(baseOpenCL),
    _queue(new ::OpenCL::CommandQueue(context, context->devices().first(), this))
 {
+   EDEBUG_FUNC(this,base,baseOpenCL,context,program);
+
    // initialize kernels
    _kernels.fetchPair = new OpenCL::FetchPair(program, this);
    _kernels.gmm = new OpenCL::GMM(program, this);
-   _kernels.kmeans = new OpenCL::KMeans(program, this);
+   _kernels.outlier = new OpenCL::Outlier(program, this);
    _kernels.pearson = new OpenCL::Pearson(program, this);
    _kernels.spearman = new OpenCL::Spearman(program, this);
 
    // initialize buffers
-   int kernelSize {_base->_kernelSize};
-   int N {_base->_input->getSampleSize()};
-   int N_pow2 {nextPower2(N)};
+   int W {_base->_globalWorkSize};
+   int N {_base->_input->sampleSize()};
+   int N_pow2 {Pairwise::Spearman::nextPower2(N)};
    int K {_base->_maxClusters};
 
-   _buffers.in_index = ::OpenCL::Buffer<cl_int2>(context, 1 * kernelSize);
-
-   _buffers.work_X = ::OpenCL::Buffer<Pairwise::Vector2>(context, N * kernelSize);
-   _buffers.work_N = ::OpenCL::Buffer<cl_int>(context, 1 * kernelSize);
-   _buffers.work_labels = ::OpenCL::Buffer<cl_char>(context, N * kernelSize);
-   _buffers.work_components = ::OpenCL::Buffer<Pairwise::GMM::Component>(context, K * kernelSize);
-   _buffers.work_MP = ::OpenCL::Buffer<Pairwise::Vector2>(context, K * kernelSize);
-   _buffers.work_counts = ::OpenCL::Buffer<cl_int>(context, K * kernelSize);
-   _buffers.work_logpi = ::OpenCL::Buffer<cl_float>(context, K * kernelSize);
-   _buffers.work_loggamma = ::OpenCL::Buffer<cl_float>(context, N * K * kernelSize);
-   _buffers.work_logGamma = ::OpenCL::Buffer<cl_float>(context, K * kernelSize);
-   _buffers.out_K = ::OpenCL::Buffer<cl_char>(context, 1 * kernelSize);
-   _buffers.out_labels = ::OpenCL::Buffer<cl_char>(context, N * kernelSize);
-
-   _buffers.work_x = ::OpenCL::Buffer<cl_float>(context, N_pow2 * kernelSize);
-   _buffers.work_y = ::OpenCL::Buffer<cl_float>(context, N_pow2 * kernelSize);
-   _buffers.work_rank = ::OpenCL::Buffer<cl_int>(context, N_pow2 * kernelSize);
-   _buffers.out_correlations = ::OpenCL::Buffer<cl_float>(context, K * kernelSize);
+   _buffers.in_index = ::OpenCL::Buffer<cl_int2>(context, 1 * W);
+   _buffers.work_X = ::OpenCL::Buffer<cl_float2>(context, N * W);
+   _buffers.work_N = ::OpenCL::Buffer<cl_int>(context, 1 * W);
+   _buffers.work_x = ::OpenCL::Buffer<cl_float>(context, N_pow2 * W);
+   _buffers.work_y = ::OpenCL::Buffer<cl_float>(context, N_pow2 * W);
+   _buffers.work_labels = ::OpenCL::Buffer<cl_char>(context, N * W);
+   _buffers.work_components = ::OpenCL::Buffer<cl_component>(context, K * W);
+   _buffers.work_MP = ::OpenCL::Buffer<cl_float2>(context, K * W);
+   _buffers.work_counts = ::OpenCL::Buffer<cl_int>(context, K * W);
+   _buffers.work_logpi = ::OpenCL::Buffer<cl_float>(context, K * W);
+   _buffers.work_gamma = ::OpenCL::Buffer<cl_float>(context, N * K * W);
+   _buffers.work_rank = ::OpenCL::Buffer<cl_int>(context, N_pow2 * W);
+   _buffers.out_K = ::OpenCL::Buffer<cl_char>(context, 1 * W);
+   _buffers.out_labels = ::OpenCL::Buffer<cl_char>(context, N * W);
+   _buffers.out_correlations = ::OpenCL::Buffer<cl_float>(context, K * W);
 }
 
 
@@ -90,8 +64,22 @@ Similarity::OpenCL::Worker::Worker(Similarity* base, Similarity::OpenCL* baseOpe
 
 
 
+/*!
+ * Read in the given work block, execute the algorithms necessary to produce
+ * results using OpenCL acceleration, and save those results in a new result
+ * block whose pointer is returned.
+ *
+ * @param block
+ */
 std::unique_ptr<EAbstractAnalytic::Block> Similarity::OpenCL::Worker::execute(const EAbstractAnalytic::Block* block)
 {
+   EDEBUG_FUNC(this,block);
+
+   if ( ELog::isActive() )
+   {
+      ELog() << tr("Executing(OpenCL) work index %1.\n").arg(block->index());
+   }
+
    // cast block to work block
    const WorkBlock* workBlock {block->cast<const WorkBlock>()};
 
@@ -101,32 +89,28 @@ std::unique_ptr<EAbstractAnalytic::Block> Similarity::OpenCL::Worker::execute(co
    // iterate through all pairs
    Pairwise::Index index {workBlock->start()};
 
-   for ( int i = 0; i < workBlock->size(); i += _base->_kernelSize )
+   for ( int i = 0; i < workBlock->size(); i += _base->_globalWorkSize )
    {
       // write input buffers to device
-      int steps {min(_base->_kernelSize, (int)workBlock->size() - i)};
+      int globalWorkSize {(int) min((qint64)_base->_globalWorkSize, workBlock->size() - i)};
 
       _buffers.in_index.mapWrite(_queue).wait();
 
-      for ( int j = 0; j < steps; ++j )
+      for ( int j = 0; j < globalWorkSize; ++j )
       {
          _buffers.in_index[j] = { index.getX(), index.getY() };
          ++index;
       }
 
-      for ( int j = steps; j < _base->_kernelSize; ++j )
-      {
-         _buffers.in_index[j] = { 0, 0 };
-      }
-
       _buffers.in_index.unmap(_queue).wait();
 
       // execute fetch-pair kernel
       _kernels.fetchPair->execute(
          _queue,
-         _base->_kernelSize,
+         globalWorkSize,
+         _base->_localWorkSize,
          &_baseOpenCL->_expressions,
-         _base->_input->getSampleSize(),
+         _base->_input->sampleSize(),
          &_buffers.in_index,
          _base->_minExpression,
          &_buffers.work_X,
@@ -134,50 +118,44 @@ std::unique_ptr<EAbstractAnalytic::Block> Similarity::OpenCL::Worker::execute(co
          &_buffers.out_labels
       ).wait();
 
-      // execute clustering kernel
-      if ( _base->_clusMethod == ClusteringMethod::GMM )
+      // execute outlier kernel (pre-clustering)
+      if ( _base->_removePreOutliers )
       {
-         _kernels.gmm->execute(
+         _kernels.outlier->execute(
             _queue,
-            _base->_kernelSize,
-            &_baseOpenCL->_expressions,
-            _base->_input->getSampleSize(),
-            _base->_minSamples,
-            _base->_minClusters,
-            _base->_maxClusters,
-            _base->_criterion,
-            _base->_removePreOutliers,
-            _base->_removePostOutliers,
+            globalWorkSize,
+            _base->_localWorkSize,
             &_buffers.work_X,
             &_buffers.work_N,
-            &_buffers.work_labels,
-            &_buffers.work_components,
-            &_buffers.work_MP,
-            &_buffers.work_counts,
-            &_buffers.work_logpi,
-            &_buffers.work_loggamma,
-            &_buffers.work_logGamma,
+            &_buffers.out_labels,
+            _base->_input->sampleSize(),
             &_buffers.out_K,
-            &_buffers.out_labels
-         ).wait();
+            -7,
+            &_buffers.work_x,
+            &_buffers.work_y
+         );
       }
-      else if ( _base->_clusMethod == ClusteringMethod::KMeans )
+
+      // execute clustering kernel
+      if ( _base->_clusMethod == ClusteringMethod::GMM )
       {
-         _kernels.kmeans->execute(
+         _kernels.gmm->execute(
             _queue,
-            _base->_kernelSize,
-            &_baseOpenCL->_expressions,
-            _base->_input->getSampleSize(),
+            globalWorkSize,
+            _base->_localWorkSize,
+            _base->_input->sampleSize(),
             _base->_minSamples,
             _base->_minClusters,
             _base->_maxClusters,
-            _base->_removePreOutliers,
-            _base->_removePostOutliers,
+            (cl_int) _base->_criterion,
             &_buffers.work_X,
             &_buffers.work_N,
-            &_buffers.work_loggamma,
             &_buffers.work_labels,
+            &_buffers.work_components,
             &_buffers.work_MP,
+            &_buffers.work_counts,
+            &_buffers.work_logpi,
+            &_buffers.work_gamma,
             &_buffers.out_K,
             &_buffers.out_labels
          ).wait();
@@ -187,7 +165,7 @@ std::unique_ptr<EAbstractAnalytic::Block> Similarity::OpenCL::Worker::execute(co
          // set cluster size to 1 if clustering is disabled
          _buffers.out_K.mapWrite(_queue).wait();
 
-         for ( int i = 0; i < _base->_kernelSize; ++i )
+         for ( int i = 0; i < globalWorkSize; ++i )
          {
             _buffers.out_K[i] = 1;
          }
@@ -195,16 +173,35 @@ std::unique_ptr<EAbstractAnalytic::Block> Similarity::OpenCL::Worker::execute(co
          _buffers.out_K.unmap(_queue).wait();
       }
 
+      // execute outlier kernel (post-clustering)
+      if ( _base->_removePostOutliers )
+      {
+         _kernels.outlier->execute(
+            _queue,
+            globalWorkSize,
+            _base->_localWorkSize,
+            &_buffers.work_X,
+            &_buffers.work_N,
+            &_buffers.out_labels,
+            _base->_input->sampleSize(),
+            &_buffers.out_K,
+            -8,
+            &_buffers.work_x,
+            &_buffers.work_y
+         );
+      }
+
       // execute correlation kernel
       if ( _base->_corrMethod == CorrelationMethod::Pearson )
       {
          _kernels.pearson->execute(
             _queue,
-            _base->_kernelSize,
+            globalWorkSize,
+            _base->_localWorkSize,
             &_buffers.work_X,
             _base->_maxClusters,
             &_buffers.out_labels,
-            _base->_input->getSampleSize(),
+            _base->_input->sampleSize(),
             _base->_minSamples,
             &_buffers.out_correlations
          );
@@ -213,11 +210,12 @@ std::unique_ptr<EAbstractAnalytic::Block> Similarity::OpenCL::Worker::execute(co
       {
          _kernels.spearman->execute(
             _queue,
-            _base->_kernelSize,
+            globalWorkSize,
+            _base->_localWorkSize,
             &_buffers.work_X,
             _base->_maxClusters,
             &_buffers.out_labels,
-            _base->_input->getSampleSize(),
+            _base->_input->sampleSize(),
             _base->_minSamples,
             &_buffers.work_x,
             &_buffers.work_y,
@@ -236,22 +234,27 @@ std::unique_ptr<EAbstractAnalytic::Block> Similarity::OpenCL::Worker::execute(co
       e3.wait();
 
       // save results
-      for ( int j = 0; j < steps; ++j )
+      for ( int j = 0; j < globalWorkSize; ++j )
       {
-         const qint8 *labels = &_buffers.out_labels.at(j * _base->_input->getSampleSize());
+         // get pointers to the cluster labels and correlations for this pair
+         const qint8 *labels = &_buffers.out_labels.at(j * _base->_input->sampleSize());
          const float *correlations = &_buffers.out_correlations.at(j * _base->_maxClusters);
 
          Pair pair;
+
+         // save the number of clusters
          pair.K = _buffers.out_K.at(j);
 
+         // save the cluster labels (if more than one cluster was found)
          if ( pair.K > 1 )
          {
-            pair.labels = createVector(labels, _base->_input->getSampleSize());
+            pair.labels = ResultBlock::makeVector(labels, _base->_input->sampleSize());
          }
 
+         // save the correlations (if the pair could be processed)
          if ( pair.K > 0 )
          {
-            pair.correlations = createVector(correlations, _base->_maxClusters);
+            pair.correlations = ResultBlock::makeVector(correlations, _base->_maxClusters);
          }
 
          resultBlock->append(pair);
diff --git a/src/core/similarity_opencl_worker.h b/src/core/similarity_opencl_worker.h
index 8e68aeb..c084363 100644
--- a/src/core/similarity_opencl_worker.h
+++ b/src/core/similarity_opencl_worker.h
@@ -1,9 +1,17 @@
 #ifndef SIMILARITY_OPENCL_WORKER_H
 #define SIMILARITY_OPENCL_WORKER_H
 #include "similarity_opencl.h"
+#include "similarity_opencl_fetchpair.h"
+#include "similarity_opencl_gmm.h"
+#include "similarity_opencl_outlier.h"
+#include "similarity_opencl_pearson.h"
+#include "similarity_opencl_spearman.h"
 
 
 
+/*!
+ * This class implements the OpenCL worker of the similarity analytic.
+ */
 class Similarity::OpenCL::Worker : public EAbstractAnalytic::OpenCL::Worker
 {
    Q_OBJECT
@@ -11,41 +19,48 @@ class Similarity::OpenCL::Worker : public EAbstractAnalytic::OpenCL::Worker
    explicit Worker(Similarity* base, Similarity::OpenCL* baseOpenCL, ::OpenCL::Context* context, ::OpenCL::Program* program);
    virtual std::unique_ptr<EAbstractAnalytic::Block> execute(const EAbstractAnalytic::Block* block) override final;
 private:
+   /*!
+    * Pointer to the base analytic.
+    */
    Similarity* _base;
+   /*!
+    * Pointer to the base OpenCL object.
+    */
    Similarity::OpenCL* _baseOpenCL;
+   /*!
+    * Pointer to this worker's unique and private command queue.
+    */
    ::OpenCL::CommandQueue* _queue;
-
+   /*!
+    * Structure of this worker's kernels.
+    */
    struct
    {
       OpenCL::FetchPair* fetchPair;
       OpenCL::GMM* gmm;
-      OpenCL::KMeans* kmeans;
+      OpenCL::Outlier* outlier;
       OpenCL::Pearson* pearson;
       OpenCL::Spearman* spearman;
    } _kernels;
-
+   /*!
+    * Structure of this worker's buffers.
+    */
    struct
    {
-      // input buffers
       ::OpenCL::Buffer<cl_int2> in_index;
-
-      // clustering buffers
-      ::OpenCL::Buffer<Pairwise::Vector2> work_X;
+      ::OpenCL::Buffer<cl_float2> work_X;
       ::OpenCL::Buffer<cl_int> work_N;
+      ::OpenCL::Buffer<cl_float> work_x;
+      ::OpenCL::Buffer<cl_float> work_y;
       ::OpenCL::Buffer<cl_char> work_labels;
-      ::OpenCL::Buffer<Pairwise::GMM::Component> work_components;
-      ::OpenCL::Buffer<Pairwise::Vector2> work_MP;
+      ::OpenCL::Buffer<cl_component> work_components;
+      ::OpenCL::Buffer<cl_float2> work_MP;
       ::OpenCL::Buffer<cl_int> work_counts;
       ::OpenCL::Buffer<cl_float> work_logpi;
-      ::OpenCL::Buffer<cl_float> work_loggamma;
-      ::OpenCL::Buffer<cl_float> work_logGamma;
+      ::OpenCL::Buffer<cl_float> work_gamma;
+      ::OpenCL::Buffer<cl_int> work_rank;
       ::OpenCL::Buffer<cl_char> out_K;
       ::OpenCL::Buffer<cl_char> out_labels;
-
-      // correlation buffers
-      ::OpenCL::Buffer<cl_float> work_x;
-      ::OpenCL::Buffer<cl_float> work_y;
-      ::OpenCL::Buffer<cl_int> work_rank;
       ::OpenCL::Buffer<cl_float> out_correlations;
    } _buffers;
 };
diff --git a/src/core/similarity_resultblock.cpp b/src/core/similarity_resultblock.cpp
index b24eca8..c76ede6 100644
--- a/src/core/similarity_resultblock.cpp
+++ b/src/core/similarity_resultblock.cpp
@@ -5,10 +5,17 @@
 
 
 
+/*!
+ * Construct a new block with the given index and starting pairwise index.
+ *
+ * @param index
+ * @param start
+ */
 Similarity::ResultBlock::ResultBlock(int index, qint64 start):
    EAbstractAnalytic::Block(index),
    _start(start)
 {
+   EDEBUG_FUNC(this,index,start);
 }
 
 
@@ -16,8 +23,15 @@ Similarity::ResultBlock::ResultBlock(int index, qint64 start):
 
 
 
+/*!
+ * Append a pair to the result block's list of pairs.
+ *
+ * @param pair
+ */
 void Similarity::ResultBlock::append(const Pair& pair)
 {
+   EDEBUG_FUNC(this,&pair);
+
    _pairs.append(pair);
 }
 
@@ -26,8 +40,15 @@ void Similarity::ResultBlock::append(const Pair& pair)
 
 
 
+/*!
+ * Write this block's data to the given data stream.
+ *
+ * @param stream
+ */
 void Similarity::ResultBlock::write(QDataStream& stream) const
 {
+   EDEBUG_FUNC(this,&stream);
+
    stream << _start;
    stream << _pairs.size();
 
@@ -44,8 +65,15 @@ void Similarity::ResultBlock::write(QDataStream& stream) const
 
 
 
+/*!
+ * Read this block's data from the given data stream.
+ *
+ * @param stream
+ */
 void Similarity::ResultBlock::read(QDataStream& stream)
 {
+   EDEBUG_FUNC(this,&stream);
+
    stream >> _start;
 
    int size;
diff --git a/src/core/similarity_resultblock.h b/src/core/similarity_resultblock.h
index add5811..caf7353 100644
--- a/src/core/similarity_resultblock.h
+++ b/src/core/similarity_resultblock.h
@@ -4,12 +4,19 @@
 
 
 
+/*!
+ * This class implements the result block of the similarity analytic.
+ */
 class Similarity::ResultBlock : public EAbstractAnalytic::Block
 {
    Q_OBJECT
 public:
+   /*!
+    * Construct a new result block in an uninitialized null state.
+    */
    explicit ResultBlock() = default;
    explicit ResultBlock(int index, qint64 start);
+   template<class T> static QVector<T> makeVector(const T* data, int size);
    qint64 start() const { return _start; }
    const QVector<Pair>& pairs() const { return _pairs; }
    QVector<Pair>& pairs() { return _pairs; }
@@ -18,10 +25,37 @@ class Similarity::ResultBlock : public EAbstractAnalytic::Block
    virtual void write(QDataStream& stream) const override final;
    virtual void read(QDataStream& stream) override final;
 private:
+   /*!
+    * The pairwise index of the first pair in the result block.
+    */
    qint64 _start;
+   /*!
+    * The list of pairs that were processed.
+    */
    QVector<Pair> _pairs;
 };
 
 
 
+
+
+
+/*!
+ * Create a vector from the given pointer and size. The elements that the
+ * pointer refers to are copied into the vector.
+ *
+ * @param data
+ * @param size
+ */
+template<class T>
+QVector<T> Similarity::ResultBlock::makeVector(const T* data, int size)
+{
+   QVector<T> v(size);
+
+   memcpy(v.data(), data, size * sizeof(T));
+   return v;
+}
+
+
+
 #endif
diff --git a/src/core/similarity_serial.cpp b/src/core/similarity_serial.cpp
index 92c7349..558a18f 100644
--- a/src/core/similarity_serial.cpp
+++ b/src/core/similarity_serial.cpp
@@ -1,6 +1,11 @@
 #include "similarity_serial.h"
 #include "similarity_resultblock.h"
 #include "similarity_workblock.h"
+#include "expressionmatrix_gene.h"
+#include "pairwise_gmm.h"
+#include "pairwise_pearson.h"
+#include "pairwise_spearman.h"
+#include <ace/core/elog.h>
 
 
 
@@ -11,18 +16,38 @@ using namespace std;
 
 
 
+/*!
+ * Construct a new serial object with the given analytic as its parent.
+ *
+ * @param parent
+ */
 Similarity::Serial::Serial(Similarity* parent):
    EAbstractAnalytic::Serial(parent),
    _base(parent)
 {
+   EDEBUG_FUNC(this,parent);
+
    // initialize clustering model
-   if ( _base->_clusMethod != ClusteringMethod::None )
+   switch ( _base->_clusMethod )
    {
-      _base->_clusModel->initialize(_base->_input);
+   case ClusteringMethod::None:
+      _clusModel = nullptr;
+      break;
+   case ClusteringMethod::GMM:
+      _clusModel = new Pairwise::GMM(_base->_input);
+      break;
    }
 
    // initialize correlation model
-   _base->_corrModel->initialize(_base->_input);
+   switch ( _base->_corrMethod )
+   {
+   case CorrelationMethod::Pearson:
+      _corrModel = new Pairwise::Pearson();
+      break;
+   case CorrelationMethod::Spearman:
+      _corrModel = new Pairwise::Spearman(_base->_input);
+      break;
+   }
 }
 
 
@@ -30,8 +55,22 @@ Similarity::Serial::Serial(Similarity* parent):
 
 
 
+/*!
+ * Read in the given work block and save the results in a new result block. This
+ * implementation takes the starting pairwise index and pair size from the work
+ * block and processes those pairs.
+ *
+ * @param block
+ */
 std::unique_ptr<EAbstractAnalytic::Block> Similarity::Serial::execute(const EAbstractAnalytic::Block* block)
 {
+   EDEBUG_FUNC(this,block);
+
+   if ( ELog::isActive() )
+   {
+      ELog() << tr("Executing(serial) work index %1.\n").arg(block->index());
+   }
+
    // cast block to work block
    const WorkBlock* workBlock {block->cast<WorkBlock>()};
 
@@ -39,8 +78,8 @@ std::unique_ptr<EAbstractAnalytic::Block> Similarity::Serial::execute(const EAbs
    ResultBlock* resultBlock {new ResultBlock(workBlock->index(), workBlock->start())};
 
    // initialize workspace
-   QVector<Pairwise::Vector2> X(_base->_input->getSampleSize());
-   QVector<qint8> labels(_base->_input->getSampleSize());
+   QVector<Pairwise::Vector2> data(_base->_input->sampleSize());
+   QVector<qint8> labels(_base->_input->sampleSize());
 
    // iterate through all pairs
    Pairwise::Index index {workBlock->start()};
@@ -48,29 +87,39 @@ std::unique_ptr<EAbstractAnalytic::Block> Similarity::Serial::execute(const EAbs
    for ( int i = 0; i < workBlock->size(); ++i )
    {
       // fetch pairwise input data
-      int numSamples = fetchPair(index, X, labels);
+      int numSamples = fetchPair(index, data, labels);
+
+      // remove pre-clustering outliers
+      if ( _base->_removePreOutliers )
+      {
+         numSamples = removeOutliers(data, numSamples, labels, 1, -7);
+      }
 
       // compute clusters
       qint8 K {1};
 
       if ( _base->_clusMethod != ClusteringMethod::None )
       {
-         K = _base->_clusModel->compute(
-            X,
+         K = _clusModel->compute(
+            data,
             numSamples,
             labels,
             _base->_minSamples,
             _base->_minClusters,
             _base->_maxClusters,
-            _base->_criterion,
-            _base->_removePreOutliers,
-            _base->_removePostOutliers
+            _base->_criterion
          );
       }
 
+      // remove post-clustering outliers
+      if ( _base->_removePostOutliers )
+      {
+         numSamples = removeOutliers(data, numSamples, labels, K, -8);
+      }
+
       // compute correlations
-      QVector<float> correlations = _base->_corrModel->compute(
-         X,
+      QVector<float> correlations = _corrModel->compute(
+         data,
          K,
          labels,
          _base->_minSamples
@@ -105,8 +154,19 @@ std::unique_ptr<EAbstractAnalytic::Block> Similarity::Serial::execute(const EAbs
 
 
 
-int Similarity::Serial::fetchPair(Pairwise::Index index, QVector<Pairwise::Vector2>& X, QVector<qint8>& labels)
+/*!
+ * Extract pairwise data from an expression matrix given a pairwise index. Samples
+ * with missing values and samples that fall below the expression threshold are
+ * excluded. The number of extracted samples is returned.
+ *
+ * @param index
+ * @param data
+ * @param labels
+ */
+int Similarity::Serial::fetchPair(const Pairwise::Index& index, QVector<Pairwise::Vector2>& data, QVector<qint8>& labels)
 {
+   EDEBUG_FUNC(this,&index,&data,&labels);
+
    // read in gene expressions
    ExpressionMatrix::Gene gene1(_base->_input);
    ExpressionMatrix::Gene gene2(_base->_input);
@@ -114,28 +174,159 @@ int Similarity::Serial::fetchPair(Pairwise::Index index, QVector<Pairwise::Vecto
    gene1.read(index.getX());
    gene2.read(index.getY());
 
-   // populate X with shared expressions of gene pair
+   // extract pairwise samples
    int numSamples = 0;
 
-   for ( int i = 0; i < _base->_input->getSampleSize(); ++i )
+   for ( int i = 0; i < _base->_input->sampleSize(); ++i )
    {
+      // exclude samples with missing values
       if ( std::isnan(gene1.at(i)) || std::isnan(gene2.at(i)) )
       {
          labels[i] = -9;
       }
+
+      // exclude samples which fall below the expression threshold
       else if ( gene1.at(i) < _base->_minExpression || gene2.at(i) < _base->_minExpression )
       {
          labels[i] = -6;
       }
+
+      // include any remaining samples
       else
       {
-         X[numSamples] = { gene1.at(i), gene2.at(i) };
+         data[numSamples] = { gene1.at(i), gene2.at(i) };
          numSamples++;
 
          labels[i] = 0;
       }
    }
 
-   // return size of X
+   // return number of extracted samples
+   return numSamples;
+}
+
+
+
+
+
+
+/*!
+ * Remove outliers from a vector of pairwise data. Outliers are detected independently
+ * on each axis using the Tukey method, and marked with the given marker. Only the
+ * samples in the given cluster are used in outlier detection. For unclustered data,
+ * all samples are labeled as 0, so a cluster value of 0 should be used. The data
+ * array should only contain samples that have a non-negative label.
+ *
+ * @param data
+ * @param labels
+ * @param cluster
+ * @param marker
+ */
+int Similarity::Serial::removeOutliersCluster(QVector<Pairwise::Vector2>& data, QVector<qint8>& labels, qint8 cluster, qint8 marker)
+{
+   EDEBUG_FUNC(this,&data,&labels,cluster,marker);
+
+   // extract univariate data from the given cluster
+   QVector<float> x_sorted;
+   QVector<float> y_sorted;
+
+   x_sorted.reserve(labels.size());
+   y_sorted.reserve(labels.size());
+
+   for ( int i = 0, j = 0; i < labels.size(); i++ )
+   {
+      if ( labels[i] >= 0 )
+      {
+         if ( labels[i] == cluster )
+         {
+            x_sorted.append(data[j].s[0]);
+            y_sorted.append(data[j].s[1]);
+         }
+
+         j++;
+      }
+   }
+
+   // return if the given cluster is empty
+   if ( x_sorted.size() == 0 || y_sorted.size() == 0 )
+   {
+      return 0;
+   }
+
+   // sort samples for each axis
+   std::sort(x_sorted.begin(), x_sorted.end());
+   std::sort(y_sorted.begin(), y_sorted.end());
+
+   // compute quartiles and thresholds for each axis
+   const int n = x_sorted.size();
+
+   float Q1_x = x_sorted[n * 1 / 4];
+   float Q3_x = x_sorted[n * 3 / 4];
+   float T_x_min = Q1_x - 1.5f * (Q3_x - Q1_x);
+   float T_x_max = Q3_x + 1.5f * (Q3_x - Q1_x);
+
+   float Q1_y = y_sorted[n * 1 / 4];
+   float Q3_y = y_sorted[n * 3 / 4];
+   float T_y_min = Q1_y - 1.5f * (Q3_y - Q1_y);
+   float T_y_max = Q3_y + 1.5f * (Q3_y - Q1_y);
+
+   // remove outliers
+   int numSamples = 0;
+
+   for ( int i = 0, j = 0; i < labels.size(); i++ )
+   {
+      if ( labels[i] >= 0 )
+      {
+         // mark samples in the given cluster that are outliers on either axis
+         if ( labels[i] == cluster && (data[j].s[0] < T_x_min || T_x_max < data[j].s[0] || data[j].s[1] < T_y_min || T_y_max < data[j].s[1]) )
+         {
+            labels[i] = marker;
+         }
+
+         // preserve all other non-outlier samples in the data array
+         else
+         {
+            data[numSamples] = data[j];
+            numSamples++;
+         }
+
+         j++;
+      }
+   }
+
+   // return number of remaining samples
+   return numSamples;
+}
+
+
+
+
+
+
+/*!
+ * Perform outlier removal on each cluster in a pairwise data array.
+ *
+ * @param data
+ * @param numSamples
+ * @param labels
+ * @param clusterSize
+ * @param marker
+ */
+int Similarity::Serial::removeOutliers(QVector<Pairwise::Vector2>& data, int numSamples, QVector<qint8>& labels, qint8 clusterSize, qint8 marker)
+{
+   EDEBUG_FUNC(this,&data,numSamples,&labels,clusterSize,marker);
+
+   // do not perform post-clustering outlier removal if there is only one cluster
+   if ( marker == -8 && clusterSize <= 1 )
+   {
+      return numSamples;
+   }
+
+   // perform outlier removal on each cluster
+   for ( qint8 k = 0; k < clusterSize; ++k )
+   {
+      numSamples = removeOutliersCluster(data, labels, k, marker);
+   }
+
    return numSamples;
 }
diff --git a/src/core/similarity_serial.h b/src/core/similarity_serial.h
index d1f2d85..d94f66c 100644
--- a/src/core/similarity_serial.h
+++ b/src/core/similarity_serial.h
@@ -1,9 +1,14 @@
 #ifndef SIMILARITY_SERIAL_H
 #define SIMILARITY_SERIAL_H
 #include "similarity.h"
+#include "pairwise_clusteringmodel.h"
+#include "pairwise_correlationmodel.h"
 
 
 
+/*!
+ * This class implements the serial working class of the similarity analytic.
+ */
 class Similarity::Serial : public EAbstractAnalytic::Serial
 {
    Q_OBJECT
@@ -11,9 +16,21 @@ class Similarity::Serial : public EAbstractAnalytic::Serial
    explicit Serial(Similarity* parent);
    virtual std::unique_ptr<EAbstractAnalytic::Block> execute(const EAbstractAnalytic::Block* block) override final;
 private:
-   int fetchPair(Pairwise::Index index, QVector<Pairwise::Vector2>& X, QVector<qint8>& labels);
-
+   int fetchPair(const Pairwise::Index& index, QVector<Pairwise::Vector2>& data, QVector<qint8>& labels);
+   int removeOutliersCluster(QVector<Pairwise::Vector2>& data, QVector<qint8>& labels, qint8 cluster, qint8 marker);
+   int removeOutliers(QVector<Pairwise::Vector2>& data, int numSamples, QVector<qint8>& labels, qint8 clusterSize, qint8 marker);
+   /*!
+    * Pointer to the base analytic for this object.
+    */
    Similarity* _base;
+   /*!
+    * Pointer to the clustering model to use.
+    */
+   Pairwise::ClusteringModel* _clusModel {nullptr};
+   /*!
+    * Pointer to the correlation model to use.
+    */
+   Pairwise::CorrelationModel* _corrModel {nullptr};
 };
 
 
diff --git a/src/core/similarity_workblock.cpp b/src/core/similarity_workblock.cpp
index de40777..85b80c4 100644
--- a/src/core/similarity_workblock.cpp
+++ b/src/core/similarity_workblock.cpp
@@ -5,11 +5,20 @@
 
 
 
+/*!
+ * Construct a new block with the given index, starting pairwise index,
+ * and pair size.
+ *
+ * @param index
+ * @param start
+ * @param size
+ */
 Similarity::WorkBlock::WorkBlock(int index, qint64 start, qint64 size):
    EAbstractAnalytic::Block(index),
    _start(start),
    _size(size)
 {
+   EDEBUG_FUNC(this,index,start,size);
 }
 
 
@@ -17,8 +26,15 @@ Similarity::WorkBlock::WorkBlock(int index, qint64 start, qint64 size):
 
 
 
+/*!
+ * Write this block's data to the given data stream.
+ *
+ * @param stream
+ */
 void Similarity::WorkBlock::write(QDataStream& stream) const
 {
+   EDEBUG_FUNC(this,&stream);
+
    stream << _start << _size;
 }
 
@@ -27,7 +43,14 @@ void Similarity::WorkBlock::write(QDataStream& stream) const
 
 
 
+/*!
+ * Read this block's data from the given data stream.
+ *
+ * @param stream
+ */
 void Similarity::WorkBlock::read(QDataStream& stream)
 {
+   EDEBUG_FUNC(this,&stream);
+
    stream >> _start >> _size;
 }
diff --git a/src/core/similarity_workblock.h b/src/core/similarity_workblock.h
index 38f885f..4d1aa5b 100644
--- a/src/core/similarity_workblock.h
+++ b/src/core/similarity_workblock.h
@@ -4,10 +4,16 @@
 
 
 
+/*!
+ * This class implements the work block of the similarity analytic.
+ */
 class Similarity::WorkBlock : public EAbstractAnalytic::Block
 {
    Q_OBJECT
 public:
+   /*!
+    * Construct a new work block in an uninitialized null state.
+    */
    explicit WorkBlock() = default;
    explicit WorkBlock(int index, qint64 start, qint64 size);
    qint64 start() const { return _start; }
@@ -16,7 +22,13 @@ class Similarity::WorkBlock : public EAbstractAnalytic::Block
    virtual void write(QDataStream& stream) const override final;
    virtual void read(QDataStream& stream) override final;
 private:
+   /*!
+    * The pairwise index of the first pair to process.
+    */
    qint64 _start;
+   /*!
+    * The number of pairs to process.
+    */
    qint64 _size;
 };
 
diff --git a/src/gui/gui.pro b/src/gui/gui.pro
index ccce99e..185401a 100644
--- a/src/gui/gui.pro
+++ b/src/gui/gui.pro
@@ -5,6 +5,7 @@ include (../KINC.pri)
 # Basic settings
 QT += gui widgets
 TARGET = qkinc
+TEMPLATE = app
 
 # External libraries
 LIBS += -lacegui
@@ -12,6 +13,10 @@ LIBS += -lacegui
 # Compiler defines
 DEFINES += GUI=1
 
+# Source files
+SOURCES += \
+    ../main.cpp
+
 # Installation instructions
 isEmpty(PREFIX) { PREFIX = /usr/local }
 program.path = $${PREFIX}/bin
diff --git a/src/main.cpp b/src/main.cpp
index a905bb0..9b244a2 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -17,7 +17,7 @@ using namespace std;
 
 int main(int argc, char *argv[])
 {
-   EApplication application(""
+   EApplication application("SystemsGenetics"
                             ,"kinc"
                             ,MAJOR_VERSION
                             ,MINOR_VERSION
diff --git a/src/opencl.qrc b/src/opencl.qrc
index c8ffac2..611bd7d 100644
--- a/src/opencl.qrc
+++ b/src/opencl.qrc
@@ -2,7 +2,6 @@
     <qresource prefix="/">
         <file>opencl/fetchpair.cl</file>
         <file>opencl/gmm.cl</file>
-        <file>opencl/kmeans.cl</file>
         <file>opencl/linalg.cl</file>
         <file>opencl/outlier.cl</file>
         <file>opencl/pearson.cl</file>
diff --git a/src/opencl/fetchpair.cl b/src/opencl/fetchpair.cl
index 426926b..4ba2de3 100644
--- a/src/opencl/fetchpair.cl
+++ b/src/opencl/fetchpair.cl
@@ -6,10 +6,12 @@
 
 
 
-/**
- * Fetch pairwise data for a pair of genes. Samples which are nan or are
- * below a threshold are excluded.
+/*!
+ * Extract pairwise data from an expression matrix given a pairwise index. Samples
+ * with missing values and samples that fall below the expression threshold are
+ * excluded. The number of extracted samples is returned.
  *
+ * @param globalWorkSize
  * @param expressions
  * @param sampleSize
  * @param in_index
@@ -19,6 +21,7 @@
  * @param out_labels
  */
 __kernel void fetchPair(
+   int globalWorkSize,
    __global const float *expressions,
    int sampleSize,
    __global const int2 *in_index,
@@ -29,23 +32,23 @@ __kernel void fetchPair(
 {
    int i = get_global_id(0);
 
+   if ( i >= globalWorkSize )
+   {
+      return;
+   }
+
    // initialize variables
    int2 index = in_index[i];
    __global Vector2 *X = &out_X[i * sampleSize];
    __global char *labels = &out_labels[i * sampleSize];
-   __global int *p_N = &out_N[i];
-
-   if ( index.x == 0 && index.y == 0 )
-   {
-      return;
-   }
+   __global int *p_numSamples = &out_N[i];
 
    // index into gene expressions
    __global const float *gene1 = &expressions[index.x * sampleSize];
    __global const float *gene2 = &expressions[index.y * sampleSize];
 
    // populate X with shared expressions of gene pair
-   int N = 0;
+   int numSamples = 0;
 
    for ( int i = 0; i < sampleSize; ++i )
    {
@@ -59,13 +62,13 @@ __kernel void fetchPair(
       }
       else
       {
-         X[N].v2 = (float2) ( gene1[i], gene2[i] );
-         N++;
+         X[numSamples] = (float2) ( gene1[i], gene2[i] );
+         numSamples++;
 
          labels[i] = 0;
       }
    }
 
    // return size of X
-   *p_N = N;
+   *p_numSamples = numSamples;
 }
diff --git a/src/opencl/gmm.cl b/src/opencl/gmm.cl
index 970e549..ca4cfb2 100644
--- a/src/opencl/gmm.cl
+++ b/src/opencl/gmm.cl
@@ -22,21 +22,61 @@ typedef struct
 
 
 
+typedef struct
+{
+   __global Component *components;
+   int K;
+   float logL;
+   float entropy;
+   __global Vector2 *_Mu;
+   __global int *_counts;
+   __global float *_logpi;
+   __global float *_gamma;
+} GMM;
+
+
+
+
+
+
+/*!
+ * Implementation of rand(), taken from POSIX example.
+ *
+ * @param state
+ */
+int rand(ulong *state)
+{
+   *state = (*state) * 1103515245 + 12345;
+   return ((unsigned)((*state)/65536) % 32768);
+}
+
+
+
+
+
+/*!
+ * Initialize a mixture component with the given mixture weight and mean.
+ *
+ * @param component
+ * @param pi
+ * @param mu
+ */
 void GMM_Component_initialize(
    __global Component *component,
    float pi,
    __global const Vector2 *mu)
 {
-   // initialize pi and mu as given
+   // initialize mixture weight and mean
    component->pi = pi;
    component->mu = *mu;
 
-   // Use identity covariance- assume dimensions are independent
+   // initialize covariance to identity matrix
    matrixInitIdentity(&component->sigma);
 
-   // Initialize zero artifacts
+   // initialize precision to zero matrix
    matrixInitZero(&component->sigmaInv);
 
+   // initialize normalizer term to 0
    component->normalizer = 0;
 }
 
@@ -45,23 +85,29 @@ void GMM_Component_initialize(
 
 
 
-bool GMM_Component_prepareCovariance(__global Component *component)
+/*!
+ * Pre-compute the precision matrix and normalizer term for a mixture component.
+ *
+ * @param component
+ */
+bool GMM_Component_prepare(__global Component *component)
 {
    const int D = 2;
 
-   // Compute inverse of Sigma once each iteration instead of
-   // repeatedly for each calcLogMvNorm execution.
+   // compute precision (inverse of covariance)
    float det;
    matrixInverse(&component->sigma, &component->sigmaInv, &det);
 
-   if ( fabs(det) <= 0 )
+   // return failure if the determinant is non-positive, since log(det) below would be undefined
+   if ( det <= 0 )
    {
       return false;
    }
 
-   // Compute normalizer for multivariate normal distribution
+   // compute normalizer term for multivariate normal distribution
    component->normalizer = -0.5f * (D * log(2.0f * M_PI) + log(det));
 
+   // return success
    return true;
 }
 
@@ -70,30 +116,40 @@ bool GMM_Component_prepareCovariance(__global Component *component)
 
 
 
-void GMM_Component_calcLogMvNorm(
+/*!
+ * Compute the log of the probability density function of the multivariate normal
+ * distribution conditioned on a single component for each point in X:
+ *
+ *   P(x|k) = exp(-0.5 * (x - mu)^T Sigma^-1 (x - mu)) / sqrt((2pi)^d det(Sigma))
+ *
+ * Therefore the log-probability is:
+ *
+ *   log(P(x|k)) = -0.5 * (x - mu)^T Sigma^-1 (x - mu) - 0.5 * (d * log(2pi) + log(det(Sigma)))
+ *
+ * @param component
+ * @param X
+ * @param N
+ * @param logP
+ */
+void GMM_Component_computeLogProbNorm(
    __global const Component *component,
    __global const Vector2 *X, int N,
    __global float *logP)
 {
-   // Here we are computing the probability density function of the multivariate
-   // normal distribution conditioned on a single component for the set of points
-   // given by X.
-   //
-   // P(x|k) = exp{ -0.5 * (x - mu)^T Sigma^{-} (x - mu) } / sqrt{ (2pi)^d det(Sigma) }
-
    for (int i = 0; i < N; ++i)
    {
-      // Let xm = (x - mu)
+      // compute xm = (x - mu)
       Vector2 xm = X[i];
       vectorSubtract(&xm, &component->mu);
 
-      // Compute xm^T Sxm = xm^T S^-1 xm
+      // compute Sxm = Sigma^-1 xm
       Vector2 Sxm;
       matrixProduct(&component->sigmaInv, &xm, &Sxm);
 
+      // compute xmSxm = xm^T Sigma^-1 xm
       float xmSxm = vectorDot(&xm, &Sxm);
 
-      // Compute log(P) = normalizer - 0.5 * xm^T * S^-1 * xm
+      // compute log(P) = normalizer - 0.5 * xm^T * Sigma^-1 * xm
       logP[i] = component->normalizer - 0.5f * xmSxm;
    }
 }
@@ -103,61 +159,73 @@ void GMM_Component_calcLogMvNorm(
 
 
 
-void GMM_kmeans(
-   __global Component *components, int K,
-   __global const Vector2 *X, int N,
-   __global Vector2 *MP,
-   __global int *counts)
+/*!
+ * Initialize the mean of each component in the mixture model using k-means
+ * clustering.
+ *
+ * @param gmm
+ * @param X
+ * @param N
+ */
+void GMM_initializeMeans(GMM *gmm, __global const Vector2 *X, int N)
 {
+   const int K = gmm->K;
+
    const int MAX_ITERATIONS = 20;
    const float TOLERANCE = 1e-3;
    float diff = 0;
 
+   // initialize workspace
+   __global Vector2 *Mu = gmm->_Mu;
+   __global int *counts = gmm->_counts;
+
    for (int t = 0; t < MAX_ITERATIONS && diff > TOLERANCE; ++t)
    {
-      // initialize old means
+      // compute mean and sample count for each component
       for (int k = 0; k < K; ++k)
       {
-         vectorInitZero(&MP[k]);
+         vectorInitZero(&Mu[k]);
          counts[k] = 0;
       }
 
-      // compute new means
       for (int i = 0; i < N; ++i)
       {
-         float minD = INFINITY;
-         int minDk = 0;
+         // determine the component mean which is nearest to x_i
+         float min_dist = INFINITY;
+         int min_k = 0;
          for (int k = 0; k < K; ++k)
          {
-            float dist = vectorDiffNorm(&X[i], &components[k].mu);
-            if (minD > dist)
+            float dist = vectorDiffNorm(&X[i], &gmm->components[k].mu);
+            if (min_dist > dist)
             {
-               minD = dist;
-               minDk = k;
+               min_dist = dist;
+               min_k = k;
             }
          }
 
-         vectorAdd(&MP[minDk], &X[i]);
-         ++counts[minDk];
+         // update mean and sample count
+         vectorAdd(&Mu[min_k], &X[i]);
+         ++counts[min_k];
       }
 
+      // scale each mean by its sample count
       for (int k = 0; k < K; ++k)
       {
-         vectorScale(&MP[k], 1.0f / counts[k]);
+         vectorScale(&Mu[k], 1.0f / counts[k]);
       }
 
-      // check for convergence
+      // compute the total change of all means
       diff = 0;
       for (int k = 0; k < K; ++k)
       {
-         diff += vectorDiffNorm(&MP[k], &components[k].mu);
+         diff += vectorDiffNorm(&Mu[k], &gmm->components[k].mu);
       }
       diff /= K;
 
-      // copy new means to components
+      // update component means
       for (int k = 0; k < K; ++k)
       {
-         components[k].mu = MP[k];
+         gmm->components[k].mu = Mu[k];
       }
    }
 }
@@ -167,117 +235,77 @@ void GMM_kmeans(
 
 
 
-void GMM_calcLogMvNorm(
-   __global const Component *components, int K,
-   __global const Vector2 *X, int N,
-   __global float *loggamma)
+/*!
+ * Perform the expectation step of the EM algorithm. In this step we compute
+ * gamma, the posterior probabilities for each component in the mixture model
+ * and each sample in X, as well as the log-likelihood of the model:
+ *
+ *   log(p(x_i)) = a + log(sum(exp(log(pi_k) + log(P(x_i|k)) - a)))
+ *
+ *   gamma_ki = exp(log(pi_k) + log(P(x_i|k)) - log(p(x_i)))
+ *
+ *   log(L) = sum(log(p(x_i)))
+ *
+ * @param gmm
+ * @param X
+ * @param N
+ */
+float GMM_computeEStep(GMM *gmm, __global const Vector2 *X, int N)
 {
-   for ( int k = 0; k < K; ++k )
+   const int K = gmm->K;
+
+   // compute logpi
+   for (int k = 0; k < K; ++k)
    {
-      GMM_Component_calcLogMvNorm(&components[k], X, N, &loggamma[k * N]);
+      gmm->_logpi[k] = log(gmm->components[k].pi);
    }
-}
-
-
 
+   // compute the log-probability for each component and each point in X
+   __global float *logProb = gmm->_gamma;
 
+   for ( int k = 0; k < K; ++k )
+   {
+      GMM_Component_computeLogProbNorm(&gmm->components[k], X, N, &logProb[k * N]);
+   }
 
+   // compute gamma and log-likelihood
+   float logL = 0.0;
 
-void GMM_calcLogLikelihoodAndGammaNK(
-   __global const float *logpi, int K,
-   __global float *loggamma, int N,
-   float *logL)
-{
-   *logL = 0.0;
    for (int i = 0; i < N; ++i)
    {
+      // compute a = max_k(logpi_k + logProb_ki)
       float maxArg = -INFINITY;
       for (int k = 0; k < K; ++k)
       {
-         const float logProbK = logpi[k] + loggamma[k * N + i];
-         if (logProbK > maxArg)
+         float arg = gmm->_logpi[k] + logProb[k * N + i];
+         if (maxArg < arg)
          {
-            maxArg = logProbK;
+            maxArg = arg;
          }
       }
 
+      // compute logpx
       float sum = 0.0;
       for (int k = 0; k < K; ++k)
       {
-         const float logProbK = logpi[k] + loggamma[k * N + i];
-         sum += exp(logProbK - maxArg);
-      }
-
-      const float logpx = maxArg + log(sum);
-      *logL += logpx;
-      for (int k = 0; k < K; ++k)
-      {
-         loggamma[k * N + i] += -logpx;
-      }
-   }
-}
-
-
-
-
-
-
-void GMM_calcLogGammaK(
-   __global const float *loggamma, int N, int K,
-   __global float *logGamma)
-{
-   for (int k = 0; k < K; ++k)
-   {
-      __global const float *loggammak = &loggamma[k * N];
-
-      float maxArg = -INFINITY;
-      for (int i = 0; i < N; ++i)
-      {
-         const float loggammank = loggammak[i];
-         if (loggammank > maxArg)
-         {
-            maxArg = loggammank;
-         }
-      }
-
-      float sum = 0;
-      for (int i = 0; i < N; ++i)
-      {
-         const float loggammank = loggammak[i];
-         sum += exp(loggammank - maxArg);
+         sum += exp(gmm->_logpi[k] + logProb[k * N + i] - maxArg);
       }
 
-      logGamma[k] = maxArg + log(sum);
-   }
-}
-
-
-
+      float logpx = maxArg + log(sum);
 
-
-
-float GMM_calcLogGammaSum(
-   __global const float *logpi, int K,
-   __global const float *logGamma)
-{
-   float maxArg = -INFINITY;
-   for (int k = 0; k < K; ++k)
-   {
-      const float arg = logpi[k] + logGamma[k];
-      if (arg > maxArg)
+      // compute gamma_ki
+      for (int k = 0; k < K; ++k)
       {
-         maxArg = arg;
+         gmm->_gamma[k * N + i] += gmm->_logpi[k] - logpx;
+         gmm->_gamma[k * N + i] = exp(gmm->_gamma[k * N + i]);
       }
-   }
 
-   float sum = 0;
-   for (int k = 0; k < K; ++k)
-   {
-      const float arg = logpi[k] + logGamma[k];
-      sum += exp(arg - maxArg);
+      // update log-likelihood
+      logL += logpx;
    }
 
-   return maxArg + log(sum);
+   // return log-likelihood
+   return logL;
 }
 
 
@@ -285,79 +313,83 @@ float GMM_calcLogGammaSum(
 
 
 
-bool GMM_performMStep(
-   __global Component *components, int K,
-   __global float *logpi,
-   __global float *loggamma,
-   __global float *logGamma,
-   float logGammaSum,
-   __global const Vector2 *X, int N)
+/*!
+ * Perform the maximization step of the EM algorithm. In this step we update the
+ * parameters of the mixture model using gamma, which is computed during the
+ * expectation step:
+ *
+ *   n_k = sum(gamma_ki)
+ *
+ *   pi_k = n_k / N
+ *
+ *   mu_k = sum(gamma_ki * x_i) / n_k
+ *
+ *   Sigma_k = sum(gamma_ki * (x_i - mu_k) * (x_i - mu_k)^T) / n_k
+ *
+ * @param gmm
+ * @param X
+ * @param N
+ */
+bool GMM_computeMStep(GMM *gmm, __global const Vector2 *X, int N)
 {
-   // update pi
-   for (int k = 0; k < K; ++k)
-   {
-      logpi[k] += logGamma[k] - logGammaSum;
-
-      components[k].pi = exp(logpi[k]);
-   }
+   const int K = gmm->K;
 
-   // convert loggamma / logGamma to gamma / Gamma to avoid duplicate exp(x) calls
    for (int k = 0; k < K; ++k)
    {
+      // compute n_k = sum(gamma_ki)
+      float n_k = 0;
+
       for (int i = 0; i < N; ++i)
       {
-         const int idx = k * N + i;
-         loggamma[idx] = exp(loggamma[idx]);
+         n_k += gmm->_gamma[k * N + i];
       }
-   }
 
-   for (int k = 0; k < K; ++k)
-   {
-      logGamma[k] = exp(logGamma[k]);
-   }
+      // update mixture weight
+      gmm->components[k].pi = n_k / N;
 
-   for (int k = 0; k < K; ++k)
-   {
-      // Update mu
-      __global Vector2 *mu = &components[k].mu;
+      // update mean
+      __global Vector2 *mu = &gmm->components[k].mu;
 
       vectorInitZero(mu);
 
       for (int i = 0; i < N; ++i)
       {
-         vectorAddScaled(mu, loggamma[k * N + i], &X[i]);
+         vectorAddScaled(mu, gmm->_gamma[k * N + i], &X[i]);
       }
 
-      vectorScale(mu, 1.0f / logGamma[k]);
+      vectorScale(mu, 1.0f / n_k);
 
-      // Update sigma
-      __global Matrix2x2 *sigma = &components[k].sigma;
+      // update covariance matrix
+      __global Matrix2x2 *sigma = &gmm->components[k].sigma;
 
       matrixInitZero(sigma);
 
       for (int i = 0; i < N; ++i)
       {
-         // xm = (x - mu)
+         // compute xm = (x_i - mu_k)
          Vector2 xm = X[i];
          vectorSubtract(&xm, mu);
 
-         // S_i = gamma_ik * (x - mu) (x - mu)^T
+         // compute Sigma_ki = gamma_ki * (x_i - mu_k) (x_i - mu_k)^T
          Matrix2x2 outerProduct;
          matrixOuterProduct(&xm, &xm, &outerProduct);
 
-         matrixAddScaled(sigma, loggamma[k * N + i], &outerProduct);
+         matrixAddScaled(sigma, gmm->_gamma[k * N + i], &outerProduct);
       }
 
-      matrixScale(sigma, 1.0f / logGamma[k]);
+      matrixScale(sigma, 1.0f / n_k);
 
-      bool success = GMM_Component_prepareCovariance(&components[k]);
+      // pre-compute precision matrix and normalizer term
+      bool success = GMM_Component_prepare(&gmm->components[k]);
 
+      // return failure if matrix inverse failed
       if ( !success )
       {
          return false;
       }
    }
 
+   // return success
    return true;
 }
 
@@ -366,24 +398,36 @@ bool GMM_performMStep(
 
 
 
-void GMM_calcLabels(
-   __global const float *loggamma, int N, int K,
+/*!
+ * Compute the cluster labels of a dataset using gamma:
+ *
+ *   y_i = argmax(gamma_ki, k)
+ *
+ * @param gamma
+ * @param N
+ * @param K
+ * @param labels
+ */
+void GMM_computeLabels(
+   __global const float *gamma, int N, int K,
    __global char *labels)
 {
    for ( int i = 0; i < N; ++i )
    {
+      // determine the value k for which gamma_ki is highest
       int max_k = -1;
       float max_gamma = -INFINITY;
 
       for ( int k = 0; k < K; ++k )
       {
-         if ( max_gamma < loggamma[k * N + i] )
+         if ( max_gamma < gamma[k * N + i] )
          {
             max_k = k;
-            max_gamma = loggamma[k * N + i];
+            max_gamma = gamma[k * N + i];
          }
       }
 
+      // assign x_i to cluster k
       labels[i] = max_k;
    }
 }
@@ -393,8 +437,18 @@ void GMM_calcLabels(
 
 
 
-float GMM_calcEntropy(
-   __global const float *loggamma, int N,
+/*!
+ * Compute the entropy of the mixture model for a dataset using gamma
+ * and the given cluster labels:
+ *
+ *   E = sum(sum(z_ki * log(gamma_ki))), z_ki = (y_i == k)
+ *
+ * @param gamma
+ * @param N
+ * @param labels
+ */
+float GMM_computeEntropy(
+   __global const float *gamma, int N,
    __global const char *labels)
 {
    float E = 0;
@@ -403,7 +457,7 @@ float GMM_calcEntropy(
    {
       int k = labels[i];
 
-      E += log(loggamma[k * N + i]);
+      E += log(gamma[k * N + i]);
    }
 
    return E;
@@ -414,72 +468,60 @@ float GMM_calcEntropy(
 
 
 
-/**
- * Compute a Gaussian mixture model from a dataset.
+/*!
+ * Fit the mixture model to a pairwise data array and compute the output cluster
+ * labels for the data. The data array should only contain clean samples.
+ *
+ * @param gmm
+ * @param X
+ * @param N
+ * @param K
+ * @param labels
  */
 bool GMM_fit(
+   GMM *gmm,
    __global const Vector2 *X, int N, int K,
-   __global char *labels,
-   float *logL,
-   float *entropy,
-   __global Component *components,
-   __global Vector2 *MP,
-   __global int *counts,
-   __global float *logpi,
-   __global float *loggamma,
-   __global float *logGamma)
+   __global char *labels)
 {
    ulong state = 1;
 
    // initialize components
+   gmm->K = K;
+
    for ( int k = 0; k < K; ++k )
    {
-      // use uniform mixture proportion and randomly sampled mean
+      // use uniform mixture weight and randomly sampled mean
       int i = rand(&state) % N;
 
-      GMM_Component_initialize(&components[k], 1.0f / K, &X[i]);
-      GMM_Component_prepareCovariance(&components[k]);
+      GMM_Component_initialize(&gmm->components[k], 1.0f / K, &X[i]);
+      GMM_Component_prepare(&gmm->components[k]);
    }
 
    // initialize means with k-means
-   GMM_kmeans(components, K, X, N, MP, counts);
-
-   // initialize workspace
-   for (int k = 0; k < K; ++k)
-   {
-      logpi[k] = log(components[k].pi);
-   }
+   GMM_initializeMeans(gmm, X, N);
 
    // run EM algorithm
    const int MAX_ITERATIONS = 100;
    const float TOLERANCE = 1e-8;
    float prevLogL = -INFINITY;
-   float currentLogL = -INFINITY;
+   float currLogL = -INFINITY;
 
    for ( int t = 0; t < MAX_ITERATIONS; ++t )
    {
-      // E step
-      // compute gamma, log-likelihood
-      GMM_calcLogMvNorm(components, K, X, N, loggamma);
-
-      prevLogL = currentLogL;
-      GMM_calcLogLikelihoodAndGammaNK(logpi, K, loggamma, N, &currentLogL);
+      // perform E step
+      prevLogL = currLogL;
+      currLogL = GMM_computeEStep(gmm, X, N);
 
       // check for convergence
-      if ( fabs(currentLogL - prevLogL) < TOLERANCE )
+      if ( fabs(currLogL - prevLogL) < TOLERANCE )
       {
          break;
       }
 
-      // M step
-      // Let Gamma[k] = \Sum_i gamma[k, i]
-      GMM_calcLogGammaK(loggamma, N, K, logGamma);
-
-      float logGammaSum = GMM_calcLogGammaSum(logpi, K, logGamma);
-
-      // Update parameters
-      bool success = GMM_performMStep(components, K, logpi, loggamma, logGamma, logGammaSum, X, N);
+      // perform M step
+      bool success = GMM_computeMStep(gmm, X, N);
 
+      // return failure if M-step failed (due to matrix inverse)
       if ( !success )
       {
          return false;
@@ -487,9 +529,9 @@ bool GMM_fit(
    }
 
    // save outputs
-   *logL = currentLogL;
-   GMM_calcLabels(loggamma, N, K, labels);
-   *entropy = GMM_calcEntropy(loggamma, N, labels);
+   gmm->logL = currLogL;
+   GMM_computeLabels(gmm->_gamma, N, K, labels);
+   gmm->entropy = GMM_computeEntropy(gmm->_gamma, N, labels);
 
    return true;
 }
@@ -501,6 +543,7 @@ bool GMM_fit(
 
 typedef enum
 {
+   AIC,
    BIC,
    ICL
 } Criterion;
@@ -510,10 +553,34 @@ typedef enum
 
 
 
-/**
- * Compute the Bayes Information Criterion of a GMM.
+/*!
+ * Compute the Akaike Information Criterion of a Gaussian mixture model.
+ *
+ * @param K
+ * @param D
+ * @param logL
+ */
+float GMM_computeAIC(int K, int D, float logL)
+{
+   int p = K * (1 + D + D * D);
+
+   return 2 * p - 2 * logL;
+}
+
+
+
+
+
+
+/*!
+ * Compute the Bayesian Information Criterion of a Gaussian mixture model.
+ *
+ * @param K
+ * @param D
+ * @param logL
+ * @param N
  */
-float GMM_computeBIC(int K, float logL, int N, int D)
+float GMM_computeBIC(int K, int D, float logL, int N)
 {
    int p = K * (1 + D + D * D);
 
@@ -525,10 +592,16 @@ float GMM_computeBIC(int K, float logL, int N, int D)
 
 
 
-/**
- * Compute the Integrated Completed Likelihood of a GMM.
+/*!
+ * Compute the Integrated Completed Likelihood of a Gaussian mixture model.
+ *
+ * @param K
+ * @param D
+ * @param logL
+ * @param N
+ * @param E
  */
-float GMM_computeICL(int K, float logL, int N, int D, float E)
+float GMM_computeICL(int K, int D, float logL, int N, float E)
 {
    int p = K * (1 + D + D * D);
 
@@ -540,22 +613,28 @@ float GMM_computeICL(int K, float logL, int N, int D, float E)
 
 
 
-/**
- * Compute a block of GMMs given a block of gene pairs.
+/*!
+ * Determine the number of clusters in a pairwise data array. Several sub-models,
+ * each one having a different number of clusters, are fit to the data and the
+ * sub-model with the best criterion value is selected. The data array should
+ * only contain samples that have a non-negative label.
  *
- * For each gene pair, several models are computed and the best model
- * is selected according to a criterion (BIC). The selected K and the
- * resulting sample mask for each pair is returned.
+ * @param globalWorkSize
+ * @param sampleSize
+ * @param minSamples
+ * @param minClusters
+ * @param maxClusters
+ * @param criterion
+ * @param out_K
+ * @param out_labels
  */
 __kernel void GMM_compute(
-   __global const float *expressions,
+   int globalWorkSize,
    int sampleSize,
    int minSamples,
    char minClusters,
    char maxClusters,
    Criterion criterion,
-   int removePreOutliers,
-   int removePostOutliers,
    __global Vector2 *work_X,
    __global int *work_N,
    __global char *work_labels,
@@ -563,71 +642,68 @@ __kernel void GMM_compute(
    __global Vector2 *work_MP,
    __global int *work_counts,
    __global float *work_logpi,
-   __global float *work_loggamma,
-   __global float *work_logGamma,
+   __global float *work_gamma,
    __global char *out_K,
    __global char *out_labels)
 {
    int i = get_global_id(0);
 
+   if ( i >= globalWorkSize )
+   {
+      return;
+   }
+
    // initialize workspace variables
-   __global Vector2 *X = &work_X[i * sampleSize];
-   int N = work_N[i];
+   __global Vector2 *data = &work_X[i * sampleSize];
+   int numSamples = work_N[i];
    __global char *labels = &work_labels[i * sampleSize];
    __global Component *components = &work_components[i * maxClusters];
-   __global Vector2 *MP = &work_MP[i * maxClusters];
+   __global Vector2 *Mu = &work_MP[i * maxClusters];
    __global int *counts = &work_counts[i * maxClusters];
    __global float *logpi = &work_logpi[i * maxClusters];
-   __global float *loggamma = &work_loggamma[i * maxClusters * sampleSize];
-   __global float *logGamma = &work_logGamma[i * maxClusters];
+   __global float *gamma = &work_gamma[i * maxClusters * sampleSize];
    __global char *bestK = &out_K[i];
    __global char *bestLabels = &out_labels[i * sampleSize];
 
-   // remove pre-clustering outliers
-   __global float *work = loggamma;
-
-   if ( removePreOutliers )
-   {
-      markOutliers(X, N, 0, bestLabels, 0, -7, work);
-      markOutliers(X, N, 1, bestLabels, 0, -7, work);
-   }
+   // initialize GMM struct
+   GMM gmm = {
+      .components = components,
+      ._Mu = Mu,
+      ._counts = counts,
+      ._logpi = logpi,
+      ._gamma = gamma
+   };
 
    // perform clustering only if there are enough samples
    *bestK = 0;
 
-   if ( N >= minSamples )
+   if ( numSamples >= minSamples )
    {
       float bestValue = INFINITY;
 
       for ( char K = minClusters; K <= maxClusters; ++K )
       {
-         // run each clustering model
-         float logL;
-         float entropy;
-
-         bool success = GMM_fit(
-            X, N, K,
-            labels, &logL, &entropy,
-            components,
-            MP, counts,
-            logpi, loggamma, logGamma
-         );
+         // run each clustering sub-model
+         bool success = GMM_fit(&gmm, data, numSamples, K, labels);
 
          if ( !success )
          {
             continue;
          }
 
-         // evaluate model
+         // compute the criterion value of the sub-model
          float value = INFINITY;
 
          switch (criterion)
          {
+         case AIC:
+            value = GMM_computeAIC(K, 2, gmm.logL);
+            break;
          case BIC:
-            value = GMM_computeBIC(K, logL, N, 2);
+            value = GMM_computeBIC(K, 2, gmm.logL, numSamples);
             break;
          case ICL:
-            value = GMM_computeICL(K, logL, N, 2, entropy);
+            value = GMM_computeICL(K, 2, gmm.logL, numSamples, gmm.entropy);
             break;
          }
 
@@ -637,7 +713,7 @@ __kernel void GMM_compute(
             *bestK = K;
             bestValue = value;
 
-            for ( int i = 0, j = 0; i < N; ++i )
+            for ( int i = 0, j = 0; i < sampleSize; ++i )
             {
                if ( bestLabels[i] >= 0 )
                {
@@ -648,17 +724,4 @@ __kernel void GMM_compute(
          }
       }
    }
-
-   if ( *bestK > 1 )
-   {
-      // remove post-clustering outliers
-      if ( removePostOutliers )
-      {
-         for ( char k = 0; k < *bestK; ++k )
-         {
-            markOutliers(X, N, 0, bestLabels, k, -8, work);
-            markOutliers(X, N, 1, bestLabels, k, -8, work);
-         }
-      }
-   }
 }
diff --git a/src/opencl/kmeans.cl b/src/opencl/kmeans.cl
deleted file mode 100644
index 305e196..0000000
--- a/src/opencl/kmeans.cl
+++ /dev/null
@@ -1,277 +0,0 @@
-
-// #include "fetchpair.cl"
-// #include "linalg.cl"
-// #include "outlier.cl"
-
-
-
-
-
-
-/**
- * Compute the log-likelihood of a K-means model given data X.
- *
- * @param X
- * @param N
- * @param y
- * @param means
- * @param K
- */
-float KMeans_computeLogLikelihood(
-   __global const Vector2 *X, int N,
-   __global const char *y,
-   __global const Vector2 *means, int K)
-{
-   // compute within-class scatter
-   float S = 0;
-
-   for ( int k = 0; k < K; ++k )
-   {
-      for ( int i = 0; i < N; ++i )
-      {
-         if ( y[i] != k )
-         {
-            continue;
-         }
-
-         float dist = vectorDiffNorm(&X[i], &means[k]);
-
-         S += dist * dist;
-      }
-   }
-
-   return -S;
-}
-
-
-
-
-
-
-/**
- * Compute a K-means clustering model from a dataset.
- */
-void KMeans_fit(
-   __global const Vector2 *X, int N, int K,
-   float *logL,
-   __global char *labels,
-   __global Vector2 *means,
-   __global char *y,
-   __global char *y_next)
-{
-   ulong state = 1;
-
-   const int NUM_INITS = 10;
-   const int MAX_ITERATIONS = 300;
-
-   // repeat with several initializations
-   *logL = -INFINITY;
-
-   for ( int init = 0; init < NUM_INITS; ++init )
-   {
-      // initialize means randomly from X
-      for ( int k = 0; k < K; ++k )
-      {
-         int i = rand(&state) % N;
-         means[k] = X[i];
-      }
-
-      // iterate K means until convergence
-      for ( int t = 0; t < MAX_ITERATIONS; ++t )
-      {
-         // compute new labels
-         for ( int i = 0; i < N; ++i )
-         {
-            // find k that minimizes norm(x_i - mu_k)
-            int min_k = -1;
-            float min_dist;
-
-            for ( int k = 0; k < K; ++k )
-            {
-               float dist = vectorDiffNorm(&X[i], &means[k]);
-
-               if ( min_k == -1 || dist < min_dist )
-               {
-                  min_k = k;
-                  min_dist = dist;
-               }
-            }
-
-            y_next[i] = min_k;
-         }
-
-         // check for convergence
-         bool converged = true;
-
-         for ( int i = 0; i < N; ++i )
-         {
-            if ( y[i] != y_next[i] )
-            {
-               converged = false;
-               break;
-            }
-         }
-
-         if ( converged )
-         {
-            break;
-         }
-
-         // update labels
-         for ( int i = 0; i < N; ++i )
-         {
-            y[i] = y_next[i];
-         }
-
-         // update means
-         for ( int k = 0; k < K; ++k )
-         {
-            // compute mu_k = mean of all x_i in cluster k
-            int n_k = 0;
-
-            vectorInitZero(&means[k]);
-
-            for ( int i = 0; i < N; ++i )
-            {
-               if ( y[i] == k )
-               {
-                  vectorAdd(&means[k], &X[i]);
-                  n_k++;
-               }
-            }
-
-            vectorScale(&means[k], 1.0f / n_k);
-         }
-      }
-
-      // save the run with the greatest log-likelihood
-      float nextLogL = KMeans_computeLogLikelihood(X, N, y, means, K);
-
-      if ( *logL < nextLogL )
-      {
-         *logL = nextLogL;
-
-         for ( int i = 0; i < N; ++i )
-         {
-            labels[i] = y[i];
-         }
-      }
-   }
-}
-
-
-
-
-
-
-/**
- * Compute the Bayes information criterion of a K-means model.
- *
- * @param K
- * @param logL
- * @param N
- * @param D
- */
-float KMeans_computeBIC(int K, float logL, int N, int D)
-{
-   int p = K * D;
-
-   return log((float) N) * p - 2 * logL;
-}
-
-
-
-
-
-
-/**
- * Compute a block of K-means models given a block of gene pairs.
- *
- * For each gene pair, several models are computed and the best model
- * is selected according to a criterion (BIC). The selected K and the
- * resulting sample mask for each pair is returned.
- */
-__kernel void KMeans_compute(
-   __global const float *expressions,
-   int sampleSize,
-   int minSamples,
-   char minClusters,
-   char maxClusters,
-   int removePreOutliers,
-   int removePostOutliers,
-   __global Vector2 *work_X,
-   __global int *work_N,
-   __global float *work_outlier,
-   __global char *work_labels,
-   __global Vector2 *work_means,
-   __global char *out_K,
-   __global char *out_labels)
-{
-   int i = get_global_id(0);
-
-   // initialize workspace variables
-   __global Vector2 *X = &work_X[i * sampleSize];
-   int N = work_N[i];
-   __global char *labels = &work_labels[(3*i+0) * sampleSize];
-   __global Vector2 *means = &work_means[i * maxClusters];
-   __global char *y = &work_labels[(3*i+1) * sampleSize];
-   __global char *y_next = &work_labels[(3*i+2) * sampleSize];
-   __global char *bestK = &out_K[i];
-   __global char *bestLabels = &out_labels[i * sampleSize];
-
-   // remove pre-clustering outliers
-   __global float *work = &work_outlier[i * sampleSize];
-
-   if ( removePreOutliers )
-   {
-      markOutliers(X, N, 0, bestLabels, 0, -7, work);
-      markOutliers(X, N, 1, bestLabels, 0, -7, work);
-   }
-
-   // perform clustering only if there are enough samples
-   *bestK = 0;
-
-   if ( N >= minSamples )
-   {
-      float bestValue = INFINITY;
-
-      for ( char K = minClusters; K <= maxClusters; ++K )
-      {
-         // run each clustering model
-         float logL;
-         KMeans_fit(X, N, K, &logL, labels, means, y, y_next);
-
-         // evaluate model
-         float value = KMeans_computeBIC(K, logL, N, 2);
-
-         // save the best model
-         if ( value < bestValue )
-         {
-            *bestK = K;
-            bestValue = value;
-
-            for ( int i = 0, j = 0; i < N; ++i )
-            {
-               if ( bestLabels[i] >= 0 )
-               {
-                  bestLabels[i] = y[j];
-                  ++j;
-               }
-            }
-         }
-      }
-   }
-
-   if ( *bestK > 1 )
-   {
-      // remove post-clustering outliers
-      if ( removePostOutliers )
-      {
-         for ( char k = 0; k < *bestK; ++k )
-         {
-            markOutliers(X, N, 0, bestLabels, k, -8, work);
-            markOutliers(X, N, 1, bestLabels, k, -8, work);
-         }
-      }
-   }
-}
diff --git a/src/opencl/linalg.cl b/src/opencl/linalg.cl
index ce44046..504b194 100644
--- a/src/opencl/linalg.cl
+++ b/src/opencl/linalg.cl
@@ -1,22 +1,22 @@
 
-typedef union
-{
-   float s[2];
-   float2 v2;
-} Vector2;
-
-typedef union
-{
-   float s[4];
-   float4 v4;
-} Matrix2x2;
-
-
-
-
-
-
-#define ELEM(M, i, j) ((M)->s[(i) * 2 + (j)])
+/*!
+ * This file provides structure and function definitions for the Vector2 and
+ * Matrix2x2 types, which are vector and matrix types with fixed dimensions.
+ * The operations defined for these types compute outputs directly without the
+ * use of loops. These types are useful for any algorithm that operates on
+ * pairwise data.
+ *
+ * Since OpenCL provides built-in vector types, Vector2 and Matrix2x2 are
+ * defined in terms of these types. The following mapping is used to map
+ * indices to xyzw:
+ *
+ *   ELEM(M, 0, 0) = M->x
+ *   ELEM(M, 0, 1) = M->y
+ *   ELEM(M, 1, 0) = M->z
+ *   ELEM(M, 1, 1) = M->w
+ */
+typedef float2 Vector2;
+typedef float4 Matrix2x2;
 
 
 
@@ -24,8 +24,8 @@ typedef union
 
 
 #define vectorInitZero(a) \
-   (a)->s[0] = 0; \
-   (a)->s[1] = 0;
+   (a)->x = 0; \
+   (a)->y = 0;
 
 
 
@@ -33,8 +33,8 @@ typedef union
 
 
 #define vectorAdd(a, b) \
-   (a)->s[0] += (b)->s[0]; \
-   (a)->s[1] += (b)->s[1];
+   (a)->x += (b)->x; \
+   (a)->y += (b)->y;
 
 
 
@@ -42,8 +42,8 @@ typedef union
 
 
 #define vectorAddScaled(a, c, b) \
-   (a)->s[0] += (c) * (b)->s[0]; \
-   (a)->s[1] += (c) * (b)->s[1];
+   (a)->x += (c) * (b)->x; \
+   (a)->y += (c) * (b)->y;
 
 
 
@@ -51,8 +51,8 @@ typedef union
 
 
 #define vectorSubtract(a, b) \
-   (a)->s[0] -= (b)->s[0]; \
-   (a)->s[1] -= (b)->s[1];
+   (a)->x -= (b)->x; \
+   (a)->y -= (b)->y;
 
 
 
@@ -60,8 +60,8 @@ typedef union
 
 
 #define vectorScale(a, c) \
-   (a)->s[0] *= (c); \
-   (a)->s[1] *= (c);
+   (a)->x *= (c); \
+   (a)->y *= (c);
 
 
 
@@ -69,7 +69,7 @@ typedef union
 
 
 #define vectorDot(a, b) \
-   ((a)->s[0] * (b)->s[0] + (a)->s[1] * (b)->s[1])
+   ((a)->x * (b)->x + (a)->y * (b)->y)
 
 
 
@@ -78,7 +78,7 @@ typedef union
 
 #define SQR(x) ((x)*(x))
 #define vectorDiffNorm(a, b) \
-   sqrt(SQR((a)->s[0] - (b)->s[0]) + SQR((a)->s[1] - (b)->s[1]))
+   sqrt(SQR((a)->x - (b)->x) + SQR((a)->y - (b)->y))
 
 
 
@@ -86,10 +86,10 @@ typedef union
 
 
 #define matrixInitIdentity(M) \
-   ELEM(M, 0, 0) = 1; \
-   ELEM(M, 0, 1) = 0; \
-   ELEM(M, 1, 0) = 0; \
-   ELEM(M, 1, 1) = 1;
+   (M)->x = 1; \
+   (M)->y = 0; \
+   (M)->z = 0; \
+   (M)->w = 1;
 
 
 
@@ -97,10 +97,10 @@ typedef union
 
 
 #define matrixInitZero(M) \
-   ELEM(M, 0, 0) = 0; \
-   ELEM(M, 0, 1) = 0; \
-   ELEM(M, 1, 0) = 0; \
-   ELEM(M, 1, 1) = 0;
+   (M)->x = 0; \
+   (M)->y = 0; \
+   (M)->z = 0; \
+   (M)->w = 0;
 
 
 
@@ -108,10 +108,10 @@ typedef union
 
 
 #define matrixAddScaled(A, c, B) \
-   ELEM(A, 0, 0) += (c) * ELEM(B, 0, 0); \
-   ELEM(A, 0, 1) += (c) * ELEM(B, 0, 1); \
-   ELEM(A, 1, 0) += (c) * ELEM(B, 1, 0); \
-   ELEM(A, 1, 1) += (c) * ELEM(B, 1, 1);
+   (A)->x += (c) * (B)->x; \
+   (A)->y += (c) * (B)->y; \
+   (A)->z += (c) * (B)->z; \
+   (A)->w += (c) * (B)->w;
 
 
 
@@ -119,10 +119,10 @@ typedef union
 
 
 #define matrixScale(A, c) \
-   ELEM(A, 0, 0) *= (c); \
-   ELEM(A, 0, 1) *= (c); \
-   ELEM(A, 1, 0) *= (c); \
-   ELEM(A, 1, 1) *= (c);
+   (A)->x *= (c); \
+   (A)->y *= (c); \
+   (A)->z *= (c); \
+   (A)->w *= (c);
 
 
 
@@ -130,20 +130,20 @@ typedef union
 
 
 #define matrixInverse(A, B, det) \
-   *det = ELEM(A, 0, 0) * ELEM(A, 1, 1) - ELEM(A, 0, 1) * ELEM(A, 1, 0); \
-   ELEM(B, 0, 0) = +ELEM(A, 1, 1) / (*det); \
-   ELEM(B, 0, 1) = -ELEM(A, 0, 1) / (*det); \
-   ELEM(B, 1, 0) = -ELEM(A, 1, 0) / (*det); \
-   ELEM(B, 1, 1) = +ELEM(A, 0, 0) / (*det);
+   *det = (A)->x * (A)->w - (A)->y * (A)->z; \
+   (B)->x = +(A)->w / (*det); \
+   (B)->y = -(A)->y / (*det); \
+   (B)->z = -(A)->z / (*det); \
+   (B)->w = +(A)->x / (*det);
 
 
 
 
 
 
-#define matrixProduct(A, x, b) \
-   (b)->s[0] = ELEM(A, 0, 0) * (x)->s[0] + ELEM(A, 0, 1) * (x)->s[1]; \
-   (b)->s[1] = ELEM(A, 1, 0) * (x)->s[0] + ELEM(A, 1, 1) * (x)->s[1];
+#define matrixProduct(A, x_, b) \
+   (b)->x = (A)->x * (x_)->x + (A)->y * (x_)->y; \
+   (b)->y = (A)->z * (x_)->x + (A)->w * (x_)->y;
 
 
 
@@ -151,7 +151,7 @@ typedef union
 
 
 #define matrixOuterProduct(a, b, C) \
-   ELEM(C, 0, 0) = (a)->s[0] * (b)->s[0]; \
-   ELEM(C, 0, 1) = (a)->s[0] * (b)->s[1]; \
-   ELEM(C, 1, 0) = (a)->s[1] * (b)->s[0]; \
-   ELEM(C, 1, 1) = (a)->s[1] * (b)->s[1];
+   (C)->x = (a)->x * (b)->x; \
+   (C)->y = (a)->x * (b)->y; \
+   (C)->z = (a)->y * (b)->x; \
+   (C)->w = (a)->y * (b)->y;
diff --git a/src/opencl/outlier.cl b/src/opencl/outlier.cl
index e8e29e6..761693b 100644
--- a/src/opencl/outlier.cl
+++ b/src/opencl/outlier.cl
@@ -6,69 +6,153 @@
 
 
 
-/**
- * Implementation of rand(), taken from POSIX example.
+/*!
+ * Remove outliers from a vector of pairwise data. Outliers are detected independently
+ * on each axis using the Tukey method, and marked with the given marker. Only the
+ * samples in the given cluster are used in outlier detection. For unclustered data,
+ * all samples are labeled as 0, so a cluster value of 0 should be used. The data
+ * array should only contain samples that have a non-negative label.
  *
- * @param state
- */
-int rand(ulong *state)
-{
-   *state = (*state) * 1103515245 + 12345;
-   return ((unsigned)((*state)/65536) % 32768);
-}
-
-
-
-
-
-/**
- * Remove outliers from a gene in a gene pair.
- *
- * @param X
- * @param N
- * @param j
+ * @param data
  * @param labels
+ * @param sampleSize
  * @param cluster
  * @param marker
+ * @param x_sorted
+ * @param y_sorted
  */
-void markOutliers(
-   __global const Vector2 *X, int N, int j,
-   __global char *labels, char cluster,
+int removeOutliersCluster(
+   __global Vector2 *data,
+   __global char *labels,
+   int sampleSize,
+   char cluster,
    char marker,
-   __global float *x_sorted)
+   __global float *x_sorted,
+   __global float *y_sorted)
 {
-   // compute x_sorted = X[:, j], filtered and sorted
+   // extract univariate data from the given cluster
    int n = 0;
 
-   for ( int i = 0; i < N; i++ )
+   for ( int i = 0, j = 0; i < sampleSize; i++ )
    {
-      if ( labels[i] == cluster || labels[i] == marker )
+      if ( labels[i] >= 0 )
       {
-         x_sorted[n] = X[i].s[j];
-         n++;
+         if ( labels[i] == cluster )
+         {
+            x_sorted[n] = data[j].x;
+            y_sorted[n] = data[j].y;
+            n++;
+         }
+
+         j++;
       }
    }
 
+   // return if the given cluster is empty
    if ( n == 0 )
    {
-      return;
+      return 0;
    }
 
+   // sort samples for each axis
    heapSort(x_sorted, n);
+   heapSort(y_sorted, n);
 
-   // compute quartiles, interquartile range, upper and lower bounds
-   float Q1 = x_sorted[n * 1 / 4];
-   float Q3 = x_sorted[n * 3 / 4];
+   // compute interquartile range and thresholds for each axis
+   float Q1_x = x_sorted[n * 1 / 4];
+   float Q3_x = x_sorted[n * 3 / 4];
+   float T_x_min = Q1_x - 1.5f * (Q3_x - Q1_x);
+   float T_x_max = Q3_x + 1.5f * (Q3_x - Q1_x);
 
-   float T_min = Q1 - 1.5f * (Q3 - Q1);
-   float T_max = Q3 + 1.5f * (Q3 - Q1);
+   float Q1_y = y_sorted[n * 1 / 4];
+   float Q3_y = y_sorted[n * 3 / 4];
+   float T_y_min = Q1_y - 1.5f * (Q3_y - Q1_y);
+   float T_y_max = Q3_y + 1.5f * (Q3_y - Q1_y);
 
-   // mark outliers
-   for ( int i = 0; i < N; ++i )
+   // remove outliers
+   int numSamples = 0;
+
+   for ( int i = 0, j = 0; i < sampleSize; i++ )
    {
-      if ( labels[i] == cluster && (X[i].s[j] < T_min || T_max < X[i].s[j]) )
+      if ( labels[i] >= 0 )
       {
-         labels[i] = marker;
+         // mark samples in the given cluster that are outliers on either axis
+         if ( labels[i] == cluster && (data[j].x < T_x_min || T_x_max < data[j].x || data[j].y < T_y_min || T_y_max < data[j].y) )
+         {
+            labels[i] = marker;
+         }
+
+         // preserve all other non-outlier samples in the data array
+         else
+         {
+            data[numSamples] = data[j];
+            numSamples++;
+         }
+
+         j++;
       }
    }
+
+   // return number of remaining samples
+   return numSamples;
+}
+
+
+
+
+
+
+/*!
+ * Perform outlier removal on each cluster in a pairwise data array.
+ *
+ * @param globalWorkSize
+ * @param in_data
+ * @param in_N
+ * @param in_labels
+ * @param sampleSize
+ * @param in_K
+ * @param marker
+ */
+__kernel void removeOutliers(
+   int globalWorkSize,
+   __global Vector2 *in_data,
+   __global int *in_N,
+   __global char *in_labels,
+   int sampleSize,
+   __global char *in_K,
+   char marker,
+   __global float *work_x,
+   __global float *work_y)
+{
+   int i = get_global_id(0);
+
+   if ( i >= globalWorkSize )
+   {
+      return;
+   }
+
+   // initialize workspace variables
+   __global Vector2 *data = &in_data[i * sampleSize];
+   __global int *numSamples = &in_N[i];
+   __global char *labels = &in_labels[i * sampleSize];
+   char clusterSize = in_K[i];
+   __global float *x_sorted = &work_x[i * sampleSize];
+   __global float *y_sorted = &work_y[i * sampleSize];
+
+   if ( marker == -7 )
+   {
+      clusterSize = 1;
+   }
+
+   // do not perform post-clustering outlier removal if there is only one cluster
+   if ( marker == -8 && clusterSize <= 1 )
+   {
+      return;
+   }
+
+   // perform outlier removal on each cluster
+   for ( char k = 0; k < clusterSize; ++k )
+   {
+      *numSamples = removeOutliersCluster(data, labels, sampleSize, k, marker, x_sorted, y_sorted);
+   }
 }
diff --git a/src/opencl/pearson.cl b/src/opencl/pearson.cl
index 1da3b85..c08f143 100644
--- a/src/opencl/pearson.cl
+++ b/src/opencl/pearson.cl
@@ -4,9 +4,20 @@
 
 
 
+/*!
+ * Compute the Pearson correlation of a cluster in a pairwise data array. The
+ * data array should only contain samples that have a non-negative label.
+ *
+ * @param data
+ * @param labels
+ * @param sampleSize
+ * @param cluster
+ * @param minSamples
+ */
 float Pearson_computeCluster(
    __global const float2 *data,
-   __global const char *labels, int N,
+   __global const char *labels,
+   int sampleSize,
    char cluster,
    int minSamples)
 {
@@ -18,20 +29,25 @@ float Pearson_computeCluster(
    float sumy2 = 0;
    float sumxy = 0;
 
-   for ( int i = 0; i < N; ++i )
+   for ( int i = 0, j = 0; i < sampleSize; ++i )
    {
-      if ( labels[i] == cluster )
+      if ( labels[i] >= 0 )
       {
-         float x_i = data[i].x;
-         float y_i = data[i].y;
+         if ( labels[i] == cluster )
+         {
+            float x_i = data[j].x;
+            float y_i = data[j].y;
+
+            sumx += x_i;
+            sumy += y_i;
+            sumx2 += x_i * x_i;
+            sumy2 += y_i * y_i;
+            sumxy += x_i * y_i;
 
-         sumx += x_i;
-         sumy += y_i;
-         sumx2 += x_i * x_i;
-         sumy2 += y_i * y_i;
-         sumxy += x_i * y_i;
+            ++n;
+         }
 
-         ++n;
+         ++j;
       }
    }
 
@@ -51,7 +67,21 @@ float Pearson_computeCluster(
 
 
 
+/*!
+ * Compute the correlation of each cluster in a pairwise data array. The data array
+ * should only contain the clean samples that were extracted from the expression
+ * matrix, while the labels should contain all samples.
+ *
+ * @param globalWorkSize
+ * @param in_data
+ * @param clusterSize
+ * @param in_labels
+ * @param sampleSize
+ * @param minSamples
+ * @param out_correlations
+ */
 __kernel void Pearson_compute(
+   int globalWorkSize,
    __global const float2 *in_data,
    char clusterSize,
    __global const char *in_labels,
@@ -61,6 +91,12 @@ __kernel void Pearson_compute(
 {
    int i = get_global_id(0);
 
+   if ( i >= globalWorkSize )
+   {
+      return;
+   }
+
+   // initialize workspace variables
    __global const float2 *data = &in_data[i * sampleSize];
    __global const char *labels = &in_labels[i * sampleSize];
    __global float *correlations = &out_correlations[i * clusterSize];
diff --git a/src/opencl/sort.cl b/src/opencl/sort.cl
index 318906b..77584d7 100644
--- a/src/opencl/sort.cl
+++ b/src/opencl/sort.cl
@@ -4,7 +4,7 @@
 
 
 
-/**
+/*!
  * Swap two values
  *
  * @param a
@@ -22,7 +22,7 @@ void swapF(__global float* a, __global float* b)
 
 
 
-/**
+/*!
  * Swap two values
  *
  * @param a
@@ -92,7 +92,7 @@ void heapify(__global float *array, int n)
 
 
 
-/**
+/*!
  * Sort an array using heapsort.
  *
  * @param array
@@ -117,11 +117,10 @@ void heapSort(__global float *array, int n)
 
 
 
-/**
- * Sort a list using the bitonic algorithm. Additionally,
- * rearrange a second list with the same operations that are
- * done to the sorted list. The size of each list must be a
- * power of 2.
+/*!
+ * Sort a list using bitonic sort, while also applying the same swap operations
+ * to a second list of the same size. The lists should have a size which is a
+ * power of two.
  *
  * @param size
  * @param sortList
@@ -160,11 +159,10 @@ void bitonicSortFF(int size, __global float* sortList, __global float* extraList
 
 
 
-/**
- * Sort a list using the bitonic algorithm. Additionally,
- * rearrange a second list with the same operations that are
- * done to the sorted list. The size of each list must be a
- * power of 2.
+/*!
+ * Sort a list using bitonic sort, while also applying the same swap operations
+ * to a second list of the same size. The lists should have a size which is a
+ * power of two.
  *
  * @param size
  * @param sortList
diff --git a/src/opencl/spearman.cl b/src/opencl/spearman.cl
index 7bd05b2..15faf38 100644
--- a/src/opencl/spearman.cl
+++ b/src/opencl/spearman.cl
@@ -6,15 +6,20 @@
 
 
 
+/*!
+ * Compute the smallest power of 2 (at least 2) that is greater than or equal to a number.
+ *
+ * @param n
+ */
 int nextPower2(int n)
 {
-	int pow2 = 2;
-	while ( pow2 < n )
-	{
-		pow2 *= 2;
-	}
+   int pow2 = 2;
+   while ( pow2 < n )
+   {
+      pow2 *= 2;
+   }
 
-	return pow2;
+   return pow2;
 }
 
 
@@ -22,20 +27,34 @@ int nextPower2(int n)
 
 
 
+/*!
+ * Compute the Spearman correlation of a cluster in a pairwise data array. The
+ * data array should only contain samples that have a non-negative label.
+ *
+ * @param data
+ * @param labels
+ * @param sampleSize
+ * @param cluster
+ * @param minSamples
+ * @param x
+ * @param y
+ * @param rank
+ */
 float Spearman_computeCluster(
    __global const float2 *data,
-   __global const char *labels, int N,
+   __global const char *labels,
+   int sampleSize,
    char cluster,
    int minSamples,
    __global float *x,
    __global float *y,
    __global int *rank)
 {
-   // extract samples in gene pair cluster
-   int N_pow2 = nextPower2(N);
-	int n = 0;
+   // extract samples in pairwise cluster
+   int N_pow2 = nextPower2(sampleSize);
+   int n = 0;
 
-	for ( int i = 0, j = 0; i < N; ++i )
+   for ( int i = 0, j = 0; i < sampleSize; ++i )
    {
       if ( labels[i] >= 0 )
       {
@@ -43,7 +62,7 @@ float Spearman_computeCluster(
          {
             x[n] = data[j].x;
             y[n] = data[j].y;
-				rank[n] = n + 1;
+            rank[n] = n + 1;
             ++n;
          }
 
@@ -91,11 +110,25 @@ float Spearman_computeCluster(
 
 
 
+/*!
+ * Compute the correlation of each cluster in a pairwise data array. The data array
+ * should only contain the clean samples that were extracted from the expression
+ * matrix, while the labels should contain all samples.
+ *
+ * @param globalWorkSize
+ * @param in_data
+ * @param clusterSize
+ * @param in_labels
+ * @param sampleSize
+ * @param minSamples
+ * @param out_correlations
+ */
 __kernel void Spearman_compute(
+   int globalWorkSize,
    __global const float2 *in_data,
    char clusterSize,
    __global const char *in_labels,
-	int sampleSize,
+   int sampleSize,
    int minSamples,
    __global float *work_x,
    __global float *work_y,
@@ -103,8 +136,14 @@ __kernel void Spearman_compute(
    __global float *out_correlations)
 {
    int i = get_global_id(0);
-	int N_pow2 = nextPower2(sampleSize);
 
+   if ( i >= globalWorkSize )
+   {
+      return;
+   }
+
+   // initialize workspace variables
+   int N_pow2 = nextPower2(sampleSize);
    __global const float2 *data = &in_data[i * sampleSize];
    __global const char *labels = &in_labels[i * sampleSize];
    __global float *x = &work_x[i * N_pow2];
diff --git a/tests/main.cpp b/src/tests/main.cpp
similarity index 95%
rename from tests/main.cpp
rename to src/tests/main.cpp
index 6e979a8..cc70098 100644
--- a/tests/main.cpp
+++ b/src/tests/main.cpp
@@ -1,5 +1,5 @@
-#include "analyticfactory.h"
-#include "datafactory.h"
+#include "../core/analyticfactory.h"
+#include "../core/datafactory.h"
 #include "testclustermatrix.h"
 #include "testcorrelationmatrix.h"
 #include "testexportcorrelationmatrix.h"
diff --git a/tests/testclustermatrix.cpp b/src/tests/testclustermatrix.cpp
similarity index 93%
rename from tests/testclustermatrix.cpp
rename to src/tests/testclustermatrix.cpp
index f7513c3..7106370 100644
--- a/tests/testclustermatrix.cpp
+++ b/src/tests/testclustermatrix.cpp
@@ -2,8 +2,9 @@
 #include <ace/core/ace_dataobject.h>
 
 #include "testclustermatrix.h"
-#include "ccmatrix.h"
-#include "datafactory.h"
+#include "../core/ccmatrix.h"
+#include "../core/ccmatrix_pair.h"
+#include "../core/datafactory.h"
 
 
 
@@ -56,7 +57,7 @@ void TestClusterMatrix::test()
 	// create data object
 	QString path {QDir::tempPath() + "/test.ccm"};
 
-	std::unique_ptr<Ace::DataObject> dataRef {new Ace::DataObject(path, DataFactory::CCMatrixType, EMetadata(EMetadata::Object))};
+	std::unique_ptr<Ace::DataObject> dataRef {new Ace::DataObject(path, DataFactory::CCMatrixType, EMetaObject())};
 	CCMatrix* matrix {dataRef->data()->cast<CCMatrix>()};
 
 	// write data to file
diff --git a/tests/testclustermatrix.h b/src/tests/testclustermatrix.h
similarity index 88%
rename from tests/testclustermatrix.h
rename to src/tests/testclustermatrix.h
index 5168101..61d9d3c 100644
--- a/tests/testclustermatrix.h
+++ b/src/tests/testclustermatrix.h
@@ -2,7 +2,7 @@
 #define TESTCLUSTERMATRIX_H
 #include <QtTest/QtTest>
 
-#include "pairwise_index.h"
+#include "../core/pairwise_index.h"
 
 
 
diff --git a/tests/testcorrelationmatrix.cpp b/src/tests/testcorrelationmatrix.cpp
similarity index 91%
rename from tests/testcorrelationmatrix.cpp
rename to src/tests/testcorrelationmatrix.cpp
index 10e1f6e..4f5a1ae 100644
--- a/tests/testcorrelationmatrix.cpp
+++ b/src/tests/testcorrelationmatrix.cpp
@@ -2,8 +2,9 @@
 #include <ace/core/ace_dataobject.h>
 
 #include "testcorrelationmatrix.h"
-#include "correlationmatrix.h"
-#include "datafactory.h"
+#include "../core/correlationmatrix.h"
+#include "../core/correlationmatrix_pair.h"
+#include "../core/datafactory.h"
 
 
 
@@ -47,7 +48,7 @@ void TestCorrelationMatrix::test()
 	// create data object
 	QString path {QDir::tempPath() + "/test.cmx"};
 
-	std::unique_ptr<Ace::DataObject> dataRef {new Ace::DataObject(path, DataFactory::CorrelationMatrixType, EMetadata(EMetadata::Object))};
+	std::unique_ptr<Ace::DataObject> dataRef {new Ace::DataObject(path, DataFactory::CorrelationMatrixType, EMetaObject())};
 	CorrelationMatrix* matrix {dataRef->data()->cast<CorrelationMatrix>()};
 
 	// write data to file
diff --git a/tests/testcorrelationmatrix.h b/src/tests/testcorrelationmatrix.h
similarity index 88%
rename from tests/testcorrelationmatrix.h
rename to src/tests/testcorrelationmatrix.h
index 48e0d80..1caac53 100644
--- a/tests/testcorrelationmatrix.h
+++ b/src/tests/testcorrelationmatrix.h
@@ -2,7 +2,7 @@
 #define TESTCORRELATIONMATRIX_H
 #include <QtTest/QtTest>
 
-#include "pairwise_index.h"
+#include "../core/pairwise_index.h"
 
 
 
diff --git a/tests/testexportcorrelationmatrix.cpp b/src/tests/testexportcorrelationmatrix.cpp
similarity index 94%
rename from tests/testexportcorrelationmatrix.cpp
rename to src/tests/testexportcorrelationmatrix.cpp
index e82aacd..cee4337 100644
--- a/tests/testexportcorrelationmatrix.cpp
+++ b/src/tests/testexportcorrelationmatrix.cpp
@@ -3,9 +3,11 @@
 #include <ace/core/ace_dataobject.h>
 
 #include "testexportcorrelationmatrix.h"
-#include "analyticfactory.h"
-#include "datafactory.h"
-#include "exportcorrelationmatrix_input.h"
+#include "../core/analyticfactory.h"
+#include "../core/datafactory.h"
+#include "../core/exportcorrelationmatrix_input.h"
+#include "../core/ccmatrix_pair.h"
+#include "../core/correlationmatrix_pair.h"
 
 
 
@@ -167,7 +169,7 @@ void TestExportCorrelationMatrix::test()
 			{
 				for ( int i = 0; i < sampleMask.size(); ++i )
 				{
-					QCOMPARE(sampleMask[i].digitValue(), testPair.sampleMasks[k][i]);
+					QCOMPARE((qint8) sampleMask[i].digitValue(), testPair.sampleMasks[k][i]);
 				}
 			}
 
diff --git a/tests/testexportcorrelationmatrix.h b/src/tests/testexportcorrelationmatrix.h
similarity index 90%
rename from tests/testexportcorrelationmatrix.h
rename to src/tests/testexportcorrelationmatrix.h
index 9866ad1..23351cb 100644
--- a/tests/testexportcorrelationmatrix.h
+++ b/src/tests/testexportcorrelationmatrix.h
@@ -2,7 +2,7 @@
 #define TESTEXPORTCORRELATIONMATRIX_H
 #include <QtTest/QtTest>
 
-#include "pairwise_index.h"
+#include "../core/pairwise_index.h"
 
 
 
diff --git a/tests/testexportexpressionmatrix.cpp b/src/tests/testexportexpressionmatrix.cpp
similarity index 86%
rename from tests/testexportexpressionmatrix.cpp
rename to src/tests/testexportexpressionmatrix.cpp
index f96fafb..da43371 100644
--- a/tests/testexportexpressionmatrix.cpp
+++ b/src/tests/testexportexpressionmatrix.cpp
@@ -3,9 +3,10 @@
 #include <ace/core/ace_dataobject.h>
 
 #include "testexportexpressionmatrix.h"
-#include "analyticfactory.h"
-#include "datafactory.h"
-#include "exportexpressionmatrix_input.h"
+#include "../core/analyticfactory.h"
+#include "../core/datafactory.h"
+#include "../core/exportexpressionmatrix_input.h"
+#include "../core/expressionmatrix_gene.h"
 
 
 
@@ -24,7 +25,7 @@ void TestExportExpressionMatrix::test()
 	// create metadata
 	QStringList geneNames;
 	QStringList sampleNames;
-	QString noSampleToken {"NA"};
+	QString nanToken {"NA"};
 
 	for ( int i = 0; i < numGenes; ++i )
 	{
@@ -50,9 +51,9 @@ void TestExportExpressionMatrix::test()
 	matrix->initialize(geneNames, sampleNames);
 
 	ExpressionMatrix::Gene gene(matrix);
-	for ( int i = 0; i < matrix->getGeneSize(); ++i )
+	for ( int i = 0; i < matrix->geneSize(); ++i )
 	{
-		for ( int j = 0; j < matrix->getSampleSize(); ++j )
+		for ( int j = 0; j < matrix->sampleSize(); ++j )
 		{
 			gene[j] = testExpressions[i * numSamples + j];
 		}
@@ -60,8 +61,6 @@ void TestExportExpressionMatrix::test()
 		gene.write(i);
 	}
 
-	matrix->setTransform(ExpressionMatrix::Transform::None);
-
 	dataRef->data()->finish();
 	dataRef->finalize();
 
@@ -70,7 +69,7 @@ void TestExportExpressionMatrix::test()
 	auto manager = qobject_cast<Ace::Analytic::Single*>(abstractManager.release());
 	manager->set(ExportExpressionMatrix::Input::InputData, emxPath);
 	manager->set(ExportExpressionMatrix::Input::OutputFile, txtPath);
-	manager->set(ExportExpressionMatrix::Input::NoSampleToken, noSampleToken);
+	manager->set(ExportExpressionMatrix::Input::NANToken, nanToken);
 
 	// run analytic
 	manager->initialize();
@@ -101,7 +100,7 @@ void TestExportExpressionMatrix::test()
 
 			for ( int j = 1; j < words.size(); ++j )
 			{
-				if ( words.at(j) == noSampleToken )
+				if ( words.at(j) == nanToken )
 				{
 					expressions[(i - 1) * numSamples + (j - 1)] = NAN;
 				}
diff --git a/tests/testexportexpressionmatrix.h b/src/tests/testexportexpressionmatrix.h
similarity index 100%
rename from tests/testexportexpressionmatrix.h
rename to src/tests/testexportexpressionmatrix.h
diff --git a/tests/testexpressionmatrix.cpp b/src/tests/testexpressionmatrix.cpp
similarity index 72%
rename from tests/testexpressionmatrix.cpp
rename to src/tests/testexpressionmatrix.cpp
index 411d174..964ea41 100644
--- a/tests/testexpressionmatrix.cpp
+++ b/src/tests/testexpressionmatrix.cpp
@@ -2,8 +2,9 @@
 #include <ace/core/ace_dataobject.h>
 
 #include "testexpressionmatrix.h"
-#include "datafactory.h"
-#include "expressionmatrix.h"
+#include "../core/datafactory.h"
+#include "../core/expressionmatrix.h"
+#include "../core/expressionmatrix_gene.h"
 
 
 
@@ -21,12 +22,13 @@ void TestExpressionMatrix::test()
 
 	// create metadata
 	QStringList geneNames;
+	QStringList sampleNames;
+
 	for ( int i = 0; i < numGenes; ++i )
 	{
 		geneNames.append(QString::number(i));
 	}
 
-	QStringList sampleNames;
 	for ( int i = 0; i < numSamples; ++i )
 	{
 		sampleNames.append(QString::number(i));
@@ -35,16 +37,16 @@ void TestExpressionMatrix::test()
 	// create data object
 	QString path {QDir::tempPath() + "/test.emx"};
 
-	std::unique_ptr<Ace::DataObject> dataRef {new Ace::DataObject(path, DataFactory::ExpressionMatrixType, EMetadata(EMetadata::Object))};
+	std::unique_ptr<Ace::DataObject> dataRef {new Ace::DataObject(path, DataFactory::ExpressionMatrixType, EMetaObject())};
 	ExpressionMatrix* matrix {dataRef->data()->cast<ExpressionMatrix>()};
 
 	// write data to file
 	matrix->initialize(geneNames, sampleNames);
 
 	ExpressionMatrix::Gene gene(matrix);
-	for ( int i = 0; i < matrix->getGeneSize(); ++i )
+	for ( int i = 0; i < matrix->geneSize(); ++i )
 	{
-		for ( int j = 0; j < matrix->getSampleSize(); ++j )
+		for ( int j = 0; j < matrix->sampleSize(); ++j )
 		{
 			gene[j] = testExpressions[i * numSamples + j];
 		}
@@ -55,8 +57,8 @@ void TestExpressionMatrix::test()
 	matrix->finish();
 
 	// read expression data from file
-	std::unique_ptr<float> expressions {matrix->dumpRawData()};
+	QVector<float> expressions {matrix->dumpRawData()};
 
 	// verify expression data
-	QVERIFY(!memcmp(testExpressions.data(), expressions.get(), testExpressions.size() * sizeof(float)));
+	QVERIFY(!memcmp(testExpressions.data(), expressions.data(), testExpressions.size() * sizeof(float)));
 }
diff --git a/tests/testexpressionmatrix.h b/src/tests/testexpressionmatrix.h
similarity index 100%
rename from tests/testexpressionmatrix.h
rename to src/tests/testexpressionmatrix.h
diff --git a/tests/testimportcorrelationmatrix.cpp b/src/tests/testimportcorrelationmatrix.cpp
similarity index 96%
rename from tests/testimportcorrelationmatrix.cpp
rename to src/tests/testimportcorrelationmatrix.cpp
index 5ace99b..bdab41d 100644
--- a/tests/testimportcorrelationmatrix.cpp
+++ b/src/tests/testimportcorrelationmatrix.cpp
@@ -3,9 +3,9 @@
 #include <ace/core/ace_dataobject.h>
 
 #include "testimportcorrelationmatrix.h"
-#include "analyticfactory.h"
-#include "datafactory.h"
-#include "importcorrelationmatrix_input.h"
+#include "../core/analyticfactory.h"
+#include "../core/datafactory.h"
+#include "../core/importcorrelationmatrix_input.h"
 
 
 
diff --git a/tests/testimportcorrelationmatrix.h b/src/tests/testimportcorrelationmatrix.h
similarity index 90%
rename from tests/testimportcorrelationmatrix.h
rename to src/tests/testimportcorrelationmatrix.h
index af0b47f..59851a6 100644
--- a/tests/testimportcorrelationmatrix.h
+++ b/src/tests/testimportcorrelationmatrix.h
@@ -2,7 +2,7 @@
 #define TESTIMPORTCORRELATIONMATRIX_H
 #include <QtTest/QtTest>
 
-#include "pairwise_index.h"
+#include "../core/pairwise_index.h"
 
 
 
diff --git a/tests/testimportexpressionmatrix.cpp b/src/tests/testimportexpressionmatrix.cpp
similarity index 86%
rename from tests/testimportexpressionmatrix.cpp
rename to src/tests/testimportexpressionmatrix.cpp
index 5d2c166..f50a2bd 100644
--- a/tests/testimportexpressionmatrix.cpp
+++ b/src/tests/testimportexpressionmatrix.cpp
@@ -3,9 +3,9 @@
 #include <ace/core/ace_dataobject.h>
 
 #include "testimportexpressionmatrix.h"
-#include "analyticfactory.h"
-#include "datafactory.h"
-#include "importexpressionmatrix_input.h"
+#include "../core/analyticfactory.h"
+#include "../core/datafactory.h"
+#include "../core/importexpressionmatrix_input.h"
 
 
 
@@ -24,7 +24,7 @@ void TestImportExpressionMatrix::test()
 	// create metadata
 	QStringList geneNames;
 	QStringList sampleNames;
-	QString noSampleToken {"NA"};
+	QString nanToken {"NA"};
 
 	for ( int i = 0; i < numGenes; ++i )
 	{
@@ -66,7 +66,7 @@ void TestImportExpressionMatrix::test()
 
 			if ( std::isnan(value) )
 			{
-				stream << "\t" << noSampleToken;
+				stream << "\t" << nanToken;
 			}
 			else
 			{
@@ -84,7 +84,7 @@ void TestImportExpressionMatrix::test()
 	auto manager = qobject_cast<Ace::Analytic::Single*>(abstractManager.release());
 	manager->set(ImportExpressionMatrix::Input::InputFile, txtPath);
 	manager->set(ImportExpressionMatrix::Input::OutputData, emxPath);
-	manager->set(ImportExpressionMatrix::Input::NoSampleToken, noSampleToken);
+	manager->set(ImportExpressionMatrix::Input::NANToken, nanToken);
 
 	// run analytic
 	manager->initialize();
@@ -94,14 +94,14 @@ void TestImportExpressionMatrix::test()
 	// read expression data from file
 	std::unique_ptr<Ace::DataObject> dataRef {new Ace::DataObject(emxPath)};
 	ExpressionMatrix* matrix {dataRef->data()->cast<ExpressionMatrix>()};
-	std::unique_ptr<float> expressions {matrix->dumpRawData()};
+	QVector<float> expressions {matrix->dumpRawData()};
 
 	// verify expression data
 	float error = 0;
 
 	for ( int i = 0; i < testExpressions.size(); ++i )
 	{
-		error += fabs(testExpressions[i] - expressions.get()[i]);
+		error += fabs(testExpressions[i] - expressions[i]);
 	}
 
 	error /= testExpressions.size();
diff --git a/tests/testimportexpressionmatrix.h b/src/tests/testimportexpressionmatrix.h
similarity index 100%
rename from tests/testimportexpressionmatrix.h
rename to src/tests/testimportexpressionmatrix.h
diff --git a/tests/testrmt.cpp b/src/tests/testrmt.cpp
similarity index 91%
rename from tests/testrmt.cpp
rename to src/tests/testrmt.cpp
index b6cbc67..0809fb6 100644
--- a/tests/testrmt.cpp
+++ b/src/tests/testrmt.cpp
@@ -3,10 +3,11 @@
 #include <ace/core/ace_dataobject.h>
 
 #include "testrmt.h"
-#include "analyticfactory.h"
-#include "datafactory.h"
-#include "rmt_input.h"
-#include "correlationmatrix.h"
+#include "../core/analyticfactory.h"
+#include "../core/datafactory.h"
+#include "../core/rmt_input.h"
+#include "../core/correlationmatrix.h"
+#include "../core/correlationmatrix_pair.h"
 
 
 
diff --git a/tests/testrmt.h b/src/tests/testrmt.h
similarity index 86%
rename from tests/testrmt.h
rename to src/tests/testrmt.h
index f9ac81c..e85e020 100644
--- a/tests/testrmt.h
+++ b/src/tests/testrmt.h
@@ -2,7 +2,7 @@
 #define TESTRMT_H
 #include <QtTest/QtTest>
 
-#include "pairwise_index.h"
+#include "../core/pairwise_index.h"
 
 
 
diff --git a/src/tests/tests.pro b/src/tests/tests.pro
new file mode 100644
index 0000000..b725070
--- /dev/null
+++ b/src/tests/tests.pro
@@ -0,0 +1,39 @@
+
+# Include common settings
+include (../KINC.pri)
+
+# Basic settings
+QT += testlib
+TARGET = kinc-tests
+TEMPLATE = app
+CONFIG += debug
+
+# Source files
+SOURCES += \
+	testclustermatrix.cpp \
+	testcorrelationmatrix.cpp \
+	testexportcorrelationmatrix.cpp \
+	testexportexpressionmatrix.cpp \
+	testexpressionmatrix.cpp \
+	testimportcorrelationmatrix.cpp \
+	testimportexpressionmatrix.cpp \
+	testrmt.cpp \
+	testsimilarity.cpp \
+	main.cpp
+
+HEADERS += \
+	testclustermatrix.h \
+	testcorrelationmatrix.h \
+	testexportcorrelationmatrix.h \
+	testexportexpressionmatrix.h \
+	testexpressionmatrix.h \
+	testimportcorrelationmatrix.h \
+	testimportexpressionmatrix.h \
+	testrmt.h \
+	testsimilarity.h
+
+# Installation instructions
+isEmpty(PREFIX) { PREFIX = /usr/local }
+program.path = $${PREFIX}/bin
+program.files = $${PWD}/../../build/tests/$${TARGET}
+INSTALLS += program
diff --git a/tests/testsimilarity.cpp b/src/tests/testsimilarity.cpp
similarity index 88%
rename from tests/testsimilarity.cpp
rename to src/tests/testsimilarity.cpp
index dda376b..beee434 100644
--- a/tests/testsimilarity.cpp
+++ b/src/tests/testsimilarity.cpp
@@ -3,9 +3,10 @@
 #include <ace/core/ace_dataobject.h>
 
 #include "testsimilarity.h"
-#include "analyticfactory.h"
-#include "datafactory.h"
-#include "similarity_input.h"
+#include "../core/analyticfactory.h"
+#include "../core/datafactory.h"
+#include "../core/similarity_input.h"
+#include "../core/expressionmatrix_gene.h"
 
 
 
@@ -24,7 +25,6 @@ void TestSimilarity::test()
 	// create metadata
 	QStringList geneNames;
 	QStringList sampleNames;
-	QString noSampleToken {"NA"};
 
 	for ( int i = 0; i < numGenes; ++i )
 	{
@@ -52,9 +52,9 @@ void TestSimilarity::test()
 	emx->initialize(geneNames, sampleNames);
 
 	ExpressionMatrix::Gene gene(emx);
-	for ( int i = 0; i < emx->getGeneSize(); ++i )
+	for ( int i = 0; i < emx->geneSize(); ++i )
 	{
-		for ( int j = 0; j < emx->getSampleSize(); ++j )
+		for ( int j = 0; j < emx->sampleSize(); ++j )
 		{
 			gene[j] = testExpressions[i * numSamples + j];
 		}
@@ -62,8 +62,6 @@ void TestSimilarity::test()
 		gene.write(i);
 	}
 
-	emx->setTransform(ExpressionMatrix::Transform::None);
-
 	emxDataRef->data()->finish();
 	emxDataRef->finalize();
 
diff --git a/tests/testsimilarity.h b/src/tests/testsimilarity.h
similarity index 89%
rename from tests/testsimilarity.h
rename to src/tests/testsimilarity.h
index 74137b9..9b531c4 100644
--- a/tests/testsimilarity.h
+++ b/src/tests/testsimilarity.h
@@ -2,7 +2,7 @@
 #define TESTSIMILARITY_H
 #include <QtTest/QtTest>
 
-#include "pairwise_index.h"
+#include "../core/pairwise_index.h"
 
 
 
diff --git a/tests/tests.pro b/tests/tests.pro
deleted file mode 100644
index 7d22829..0000000
--- a/tests/tests.pro
+++ /dev/null
@@ -1,118 +0,0 @@
-# General build variables
-TARGET = tests
-TEMPLATE = app
-CONFIG += c++11 debug
-
-# Qt libraries
-QT += core testlib
-
-# external libraries
-LIBS += -lOpenCL -L/usr/local/lib64/ -L$$(HOME)/software/lib -lacecore -lgsl -lgslcblas -llapack -llapacke
-INCLUDEPATH += $$(HOME)/software/include
-INCLUDEPATH += ../src
-
-# HACK
-INCLUDEPATH += $$(HOME)/software/include/ace
-
-# Preprocessor defines
-DEFINES += QT_DEPRECATED_WARNINGS
-
-# Source files
-SOURCES += \
-	../src/analyticfactory.cpp \
-	../src/ccmatrix.cpp \
-	../src/correlationmatrix.cpp \
-	../src/datafactory.cpp \
-	../src/exportcorrelationmatrix_input.cpp \
-	../src/exportcorrelationmatrix.cpp \
-	../src/exportexpressionmatrix_input.cpp \
-	../src/exportexpressionmatrix.cpp \
-	../src/expressionmatrix.cpp \
-	../src/extract_input.cpp \
-	../src/extract.cpp \
-	../src/importcorrelationmatrix_input.cpp \
-	../src/importcorrelationmatrix.cpp \
-	../src/importexpressionmatrix_input.cpp \
-	../src/importexpressionmatrix.cpp \
-	../src/pairwise_clustering.cpp \
-	../src/pairwise_correlation.cpp \
-	../src/pairwise_gmm.cpp \
-	../src/pairwise_index.cpp \
-	../src/pairwise_kmeans.cpp \
-	../src/pairwise_linalg.cpp \
-	../src/pairwise_matrix.cpp \
-	../src/pairwise_pearson.cpp \
-	../src/pairwise_spearman.cpp \
-	../src/rmt_input.cpp \
-	../src/rmt.cpp \
-	../src/similarity_input.cpp \
-	../src/similarity_opencl_fetchpair.cpp \
-   ../src/similarity_opencl_gmm.cpp \
-   ../src/similarity_opencl_kmeans.cpp \
-   ../src/similarity_opencl_pearson.cpp \
-   ../src/similarity_opencl_spearman.cpp \
-   ../src/similarity_opencl_worker.cpp \
-   ../src/similarity_opencl.cpp \
-	../src/similarity_resultblock.cpp \
-	../src/similarity_serial.cpp \
-	../src/similarity_workblock.cpp \
-	../src/similarity.cpp \
-	testclustermatrix.cpp \
-	testcorrelationmatrix.cpp \
-	testexportcorrelationmatrix.cpp \
-	testexportexpressionmatrix.cpp \
-	testexpressionmatrix.cpp \
-	testimportcorrelationmatrix.cpp \
-	testimportexpressionmatrix.cpp \
-	testrmt.cpp \
-	testsimilarity.cpp \
-	main.cpp
-
-HEADERS += \
-	../src/analyticfactory.h \
-	../src/ccmatrix.h \
-	../src/correlationmatrix.h \
-	../src/datafactory.h \
-	../src/expressionmatrix.h \
-	../src/extract_input.h \
-	../src/extract.h \
-	../src/exportcorrelationmatrix_input.h \
-	../src/exportcorrelationmatrix.h \
-	../src/exportexpressionmatrix_input.h \
-	../src/exportexpressionmatrix.h \
-	../src/importcorrelationmatrix_input.h \
-	../src/importcorrelationmatrix.h \
-	../src/importexpressionmatrix_input.h \
-	../src/importexpressionmatrix.h \
-	../src/pairwise_clustering.h \
-	../src/pairwise_correlation.h \
-	../src/pairwise_gmm.h \
-	../src/pairwise_index.h \
-	../src/pairwise_kmeans.h \
-	../src/pairwise_linalg.h \
-	../src/pairwise_matrix.h \
-	../src/pairwise_pearson.h \
-	../src/pairwise_spearman.h \
-	../src/rmt_input.h \
-	../src/rmt.h \
-	../src/similarity_input.h \
-	../src/similarity_opencl_fetchpair.h \
-   ../src/similarity_opencl_gmm.h \
-   ../src/similarity_opencl_kmeans.h \
-   ../src/similarity_opencl_pearson.h \
-   ../src/similarity_opencl_spearman.h \
-   ../src/similarity_opencl_worker.h \
-   ../src/similarity_opencl.h \
-	../src/similarity_resultblock.h \
-	../src/similarity_serial.h \
-	../src/similarity_workblock.h \
-	../src/similarity.h \
-	testclustermatrix.h \
-	testcorrelationmatrix.h \
-	testexportcorrelationmatrix.h \
-	testexportexpressionmatrix.h \
-	testexpressionmatrix.h \
-	testimportcorrelationmatrix.h \
-	testimportexpressionmatrix.h \
-	testrmt.h \
-	testsimilarity.h