pyNBS.pyNBS_core.mixed_netNMF_debug
Justin Huang edited this page Jan 26, 2018
This function performs the same network-regularized NMF as the mixed_netNMF function. Because mixed_netNMF is fairly complex, this function is provided as a bonus in the module so that users can dissect its internal workings. It accepts more input options (the initial factor matrices W_init and H_init) and returns more outputs (the full per-iteration history) than the base mixed_netNMF function; both are described below. For a description of the network-regularized NMF itself, please refer to the mixed_netNMF function documentation.
mixed_netNMF_debug(
data, KNN_glap, W_init=None, H_init=None, k=3, gamma=200, maxiter=250, eps=1e-15, err_tol=1e-4, err_delta_tol=1e-4, verbose=False
)
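The loop that this function exposes can be sketched in NumPy as follows. This is not the pyNBS source. It assumes the standard graph-regularized NMF objective $\lVert \mathrm{data} - WH \rVert_F^2 + \gamma\,\mathrm{tr}(W^\top L W)$ with $L$ = KNN_glap, multiplicative updates in which $L$ is split into nonnegative positive and negative parts, an absolute residual compared against err_tol, and a least-squares fit for generating W from H. The name netnmf_debug_sketch and the extra seed argument are hypothetical.

```python
import time

import numpy as np


def netnmf_debug_sketch(data, KNN_glap, W_init=None, H_init=None, k=3, gamma=200,
                        maxiter=250, eps=1e-15, err_tol=1e-4, err_delta_tol=1e-4,
                        verbose=False, seed=None):
    """Hypothetical graph-regularized NMF loop that records per-step diagnostics.

    data: genes x samples (nonnegative). KNN_glap: genes x genes graph Laplacian.
    Assumed objective: ||data - W H||_F^2 + gamma * tr(W^T L W).
    """
    data = np.asarray(data, dtype=float)
    L = np.asarray(KNN_glap, dtype=float)
    # Split L = L_pos - L_neg into nonnegative parts so the updates stay nonnegative
    L_pos = (np.abs(L) + L) / 2.0
    L_neg = (np.abs(L) - L) / 2.0

    rng = np.random.default_rng(seed)
    n_samples = data.shape[1]
    # Random k x samples H unless the caller supplies one
    H = rng.random((k, n_samples)) if H_init is None else np.array(H_init, dtype=float)
    if W_init is None:
        # Assumption: fit W to data given H by least squares, then clip to >= eps
        W = np.linalg.lstsq(H.T, data.T, rcond=None)[0].T
        W = np.maximum(W, eps)
    else:
        W = np.array(W_init, dtype=float)

    resVal, fitResVect, Wlist, Hlist, timestep = [], [], [], [], []
    prev_resid = np.linalg.norm(data - W @ H)  # Frobenius norm of the residual
    numIter = 0
    for i in range(maxiter):
        start = time.perf_counter()
        # W step: reconstruction term plus the network penalty gamma * L W
        W = W * (data @ H.T + gamma * (L_neg @ W)) / (W @ (H @ H.T) + gamma * (L_pos @ W) + eps)
        # H step: plain NMF multiplicative update (H is not network-regularized)
        H = H * (W.T @ data) / ((W.T @ W) @ H + eps)
        resid = np.linalg.norm(data - W @ H)
        delta = abs(prev_resid - resid)

        timestep.append(time.perf_counter() - start)
        resVal.append(resid)
        fitResVect.append(delta)
        Wlist.append(W.copy())
        Hlist.append(H.copy())
        numIter = i + 1
        if verbose:
            print(f"iter {numIter}: residual={resid:.6g}, delta={delta:.3g}")
        # Stop once the fit is close enough or has stopped improving
        if resid < err_tol or delta < err_delta_tol:
            break
        prev_resid = resid

    finalResidual = resVal[-1] if resVal else prev_resid
    return W, H, numIter, finalResidual, resVal, fitResVect, Wlist, Hlist, timestep
```

Calling the sketch twice with the same seed (or the same H_init) reproduces a run exactly, which is the kind of control W_init and H_init are meant to give.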
Parameters
- data (required, numpy.ndarray): Transposed binary somatic mutation matrix loaded from file. This matrix holds the binary somatic mutation profiles of the cohort on which to run pyNBS. For this function, the input matrix must be transposed to align with the netNMF objective function (see the mixed_netNMF documentation): the rows of data are genes and the columns of data are patients/samples (transposed relative to the output of load_binary_mutation_data). The rows of this matrix must be in the same order as the rows/columns of KNN_glap (see below).
- KNN_glap (required, numpy.ndarray): The graph Laplacian (gene-by-gene) of the KNN influence network constructed by the network_inf_KNN_glap function. This is the regularization matrix (L) for the netNMF step. Its rows and columns must follow the same gene order as the rows of data.
- W_init (optional, numpy.ndarray, default=None): An optional genes-by-k array to use as the initial W basis factor matrix. If W_init is not given, a W factor matrix is generated from the initial H factor matrix. Supply W_init to better control the stochastic output of the gradient descent in the netNMF procedure.
- H_init (optional, numpy.ndarray, default=None): An optional k-by-patients/samples array to use as the initial H patient/sample factor matrix. If H_init is not given, a random H factor matrix of the correct dimensions is generated. Supply H_init to better control the stochastic output of the gradient descent in the netNMF procedure.
- k (optional, int, default=3): Number of components into which the patient mutation data is decomposed during netNMF. This is also the number of clusters into which patients are separated.
- gamma (optional, float, default=200): The regularization constant (λ) that scales the network regularizer (KNN_glap) matrix. The value must be convertible to a Python int; the default is 200. We have found that larger positive integers for this parameter produce better and more robust results, and we suggest a value between 100 and 1000. Setting it to 0 performs netNMF with no network regularization penalty (i.e., a standard, non-network-regularized NMF).
- maxiter (optional, int, default=250): Maximum number of update steps to perform if neither convergence criterion (err_tol or err_delta_tol) is met first.
- eps (optional, float, default=1e-15): Small epsilon value used to adjust zero (or very small) values during the multiplicative matrix updates in netNMF, which prevents division by zero. It acts as a numerical precision floor for the netNMF step.
- err_tol (optional, float, default=1e-4): Error tolerance on the reconstruction of the original data for convergence. Once the decomposition approximates data within this tolerance, the function stops and returns the decomposition from that iteration.
- err_delta_tol (optional, float, default=1e-4): Tolerance on the L2 norm of the difference in matrix reconstructions between successive netNMF iterations. If the reconstruction error is no longer improving significantly, the function stops and returns the decomposition from that iteration.
- verbose (optional, bool, default=False): Whether the function reports intermediate progress at each iteration.
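The ordering requirements above can be met with a few pandas calls. This is a minimal sketch that assumes the mutation matrix is held as a patients-by-genes pandas DataFrame and that the gene order of KNN_glap is available as a list; sm_mat, knn_gene_order, and the gene names are hypothetical. Zero-filling network genes that have no mutation data is an assumption of this sketch, not documented pyNBS behavior.

```python
import numpy as np
import pandas as pd

# Hypothetical patients-by-genes binary mutation table
sm_mat = pd.DataFrame(
    [[1, 0, 1, 0],
     [0, 1, 1, 0],
     [1, 1, 0, 1]],
    index=["P1", "P2", "P3"],
    columns=["TP53", "KRAS", "PIK3CA", "PTEN"],
)
# Hypothetical gene order of the rows/columns of KNN_glap
knn_gene_order = ["KRAS", "PTEN", "EGFR", "TP53", "PIK3CA"]

# Transpose to genes x patients and reorder rows to the KNN_glap gene order;
# network genes with no mutation data (EGFR here) become all-zero rows
data_df = sm_mat.T.reindex(knn_gene_order).fillna(0)
data = data_df.to_numpy(dtype=float)

# Seeded initial factors so repeated runs start from the same point
k = 2
rng = np.random.default_rng(42)
H_init = rng.random((k, data.shape[1]))  # k x patients/samples
W_init = rng.random((data.shape[0], k))  # genes x k

# These arrays would then be passed as, e.g.,
# mixed_netNMF_debug(data, KNN_glap, W_init=W_init, H_init=H_init, k=k)
```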
Returns
- W (numpy.ndarray): The converged (genes-by-k) basis factor matrix.
- H (numpy.ndarray): The converged (k-by-patients/samples) patient/sample factor matrix. Multiple instances of this H matrix are combined during the consensus clustering step of the algorithm by the consensus_hclust_hard function.
- numIter (int): The number of update steps performed before the function converged or reached maxiter.
- finalResidual (float): The residual reconstruction error of the two factor matrices relative to the original data at convergence, i.e. the L2 norm $\lVert \mathrm{data} - WH \rVert$.
- resVal (list): The reconstruction error ($\lVert \mathrm{data} - WH \rVert$, as above) at the end of each update step.
- fitResVect (list): The change in reconstruction error at the end of each update step, i.e. the differences between successive values in resVal.
- Wlist (list): The W basis factor matrix (genes-by-k) at the end of each update step.
- Hlist (list): The H patient/sample factor matrix (k-by-patients/samples) at the end of each update step.
- timestep (list): The time (in seconds) elapsed to perform each update step.
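The per-iteration outputs can be condensed into a quick convergence summary. This sketch assumes fitResVect[i] is the absolute change |resVal[i-1] - resVal[i]| for every step after the first, and uses an assumed hard-assignment rule (each patient goes to the component with its largest H value), which is not necessarily what the pyNBS consensus step does. The helper summarize_debug_run and the toy inputs are hypothetical.

```python
import numpy as np


def summarize_debug_run(H, resVal, fitResVect, timestep):
    """Hypothetical helper: condense mixed_netNMF_debug-style outputs."""
    res = np.asarray(resVal, dtype=float)
    fit = np.asarray(fitResVect, dtype=float)
    # Assumption: fitResVect[i] = |resVal[i-1] - resVal[i]| for i >= 1
    deltas_consistent = bool(np.allclose(np.abs(np.diff(res)), fit[1:]))
    # Assumed hard assignment: each patient (column of H) goes to its largest component
    labels = np.argmax(np.asarray(H), axis=0)
    return {
        "labels": labels,
        "final_residual": float(res[-1]),
        "best_step": int(np.argmin(res)) + 1,  # 1-based step with the lowest error
        "total_time_s": float(np.sum(timestep)),
        "deltas_consistent": deltas_consistent,
    }


# Toy values for illustration only
summary = summarize_debug_run(
    H=np.array([[0.9, 0.1, 0.2], [0.1, 0.8, 0.7]]),
    resVal=[5.0, 3.0, 2.5],
    fitResVect=[1.0, 2.0, 0.5],
    timestep=[0.1, 0.1, 0.2],
)
print(summary["labels"])  # [0 1 1]
```

Because multiplicative updates decrease the regularized objective rather than the raw reconstruction error, best_step is worth checking against numIter when gamma is large.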