pyNBS.pyNBS_core.mixed_netNMF_debug

Justin Huang edited this page Jan 26, 2018 · 3 revisions

This function is essentially the same as the mixed_netNMF function. Because mixed_netNMF is complex, we provide this version as a bonus function in the module so that users can better dissect its internal workings. It accepts additional inputs and returns additional outputs compared with the base mixed_netNMF function (described below). For a description of the actual network-regularized NMF procedure, please refer to the mixed_netNMF function documentation.

Function Call:

mixed_netNMF_debug(data, KNN_glap, W_init=None, H_init=None, k=3, gamma=200, maxiter=250, eps=1e-15, err_tol=1e-4, err_delta_tol=1e-4, verbose=False)

Parameters:

  • data (required, numpy.ndarray): Transposed binary somatic mutation matrix loaded from file. This is the matrix of binary somatic mutation profiles of the cohort on which pyNBS is performed. For this function, the input matrix is transposed to better align with the netNMF objective function (see the mixed_netNMF function documentation). The rows of data are therefore genes and the columns of data are patients/samples (transposed relative to the output of load_binary_mutation_data). The rows of this matrix must be in the same order as the rows/columns of KNN_glap (see below).
  • KNN_glap (required, numpy.ndarray): This is the numpy array of the graph Laplacian (gene-by-gene) of the KNN influence network constructed by the network_inf_KNN_glap function. This is the regularization (L) matrix for the netNMF step. The rows and columns of this array must follow the same gene order as the rows of the data array.
  • W_init (optional, numpy.ndarray, default=None): This is an optional genes-by-k array that can be used as the initial W basis factor matrix. If W_init is not given, a W factor matrix will be generated from the initial H factor matrix. Supply W_init to better control the stochastic output of the gradient descent in the netNMF procedure.
  • H_init (optional, numpy.ndarray, default=None): This is an optional k-by-patients/samples array that can be used as the initial H patient/sample factor matrix. If H_init is not given, a random H factor matrix (of the correct dimensions) will be generated. Supply H_init to better control the stochastic output of the gradient descent in the netNMF procedure.
  • k (optional, int, default=3): Number of components to decompose patient mutation data into during the netNMF. This is also the same as the number of clusters of patients to separate data into.
  • l (optional, float, default=200; this parameter appears as gamma in the function call above): This is the regularization constant (λ) that scales the network regularizer (KNN_glap) matrix. The value must be convertible to a Python int. We have found that larger positive integers for this value produce better and more robust results, and we suggest using a value between 100 and 1000. Setting this value to 0 will perform netNMF with no network regularization penalty (similar to a non-network-regularized NMF).
  • maxiter (optional, int, default=250): Maximum number of update steps to perform during this function if the result does not reach convergence by a different method.
  • eps (optional, float, default=1e-15): Epsilon error value to adjust 0 (or very small) values during multiplicative matrix updates in netNMF. Essentially this is a parameter to define the machine precision for the netNMF step.
  • err_tol (optional, float, default=1e-4): This is the minimum error tolerance for matrix reconstruction of original data for this function to reach convergence. If the decomposition has reached a sufficiently close estimation of data, the function will return the H factor matrix from the decomposition at that time.
  • err_delta_tol (optional, float, default=1e-8; the function call above lists 1e-4): This is the minimum error tolerance for the L2 norm of the difference in matrix reconstructions between iterations of netNMF for convergence. If the reconstruction error of the decomposition is not improving significantly, the function will return the H factor matrix from the decomposition at that time.
  • verbose (optional, bool, default=False): Verbosity flag that determines whether the netNMF function reports intermediate progress at each iteration.
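The shape and ordering requirements above can be sketched with NumPy alone. This is a minimal illustration, not pyNBS code: the toy mutation matrix, the ring-shaped gene network, and the random seed are all assumptions standing in for the outputs of load_binary_mutation_data and network_inf_KNN_glap. The final call is shown only as a comment and uses keyword names taken from the function call above.

```python
import numpy as np

rng = np.random.default_rng(42)          # fixed seed so W_init/H_init are reproducible
n_patients, n_genes, k = 8, 6, 3         # toy sizes (assumption)

# Stand-in for a patients-by-genes binary mutation matrix (assumption).
mutations = rng.integers(0, 2, size=(n_patients, n_genes))

# The function expects genes as rows and patients/samples as columns.
data = mutations.T.astype(float)

# Toy gene-by-gene graph Laplacian (degree matrix minus adjacency) built from
# a ring network; a real run would use the network_inf_KNN_glap output instead.
adjacency = np.zeros((n_genes, n_genes))
for i in range(n_genes):
    j = (i + 1) % n_genes
    adjacency[i, j] = adjacency[j, i] = 1.0
KNN_glap = np.diag(adjacency.sum(axis=1)) - adjacency

# Optional non-negative starting factors with the documented dimensions.
W_init = rng.random((n_genes, k))        # genes-by-k
H_init = rng.random((k, n_patients))     # k-by-patients/samples

# With pyNBS installed, the call would then look roughly like:
# outputs = mixed_netNMF_debug(data, KNN_glap, W_init=W_init, H_init=H_init, k=k)
```

Passing fixed W_init and H_init arrays in this way is what makes repeated runs comparable, since the only other source of randomness described above is the random initial H matrix.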

Returns:

  • W (numpy.ndarray): The converged (genes-by-k) array of the basis factor matrix from this function.
  • H (numpy.ndarray): The converged (k-by-patients) array of the patient/sample factor matrix from this function. Multiple instances of this H matrix will be combined together during the consensus clustering step of the algorithm by the consensus_hclust_hard function.
  • numIter (int): The number of update steps performed by this function before the result converges.
  • finalResidual (float): The residual reconstruction error of the two factor matrices as compared to the original data at convergence of this function. This is the L2 norm of the difference between data and the product of W and H, referred to here as recon_err.
  • resVal (list): A vector of the reconstruction error (L2 norm as above: recon_err) at the end of each update step performed.
  • fitResVect (list): A vector of the difference in reconstruction error at the end of each update step performed. Can also be thought of as a vector of the differences between values in resVal.
  • Wlist (list): A list of the W basis factor matrix (genes-by-k) at the end of each update step.
  • Hlist (list): A list of the H patient/sample factor matrix (k-by-patients/samples) at the end of each update step.
  • timestep (list): A vector of the time (in seconds) elapsed to perform each update step in the function.
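To show how these debug outputs relate to one another, here is a hedged sketch of a network-regularized NMF loop that records the same quantities at every update step. It is not the pyNBS implementation: it swaps in standard graph-regularized multiplicative updates for both W and H, and the function name netnmf_debug_sketch, the seed argument, and the sign convention for fitResVect (previous error minus current error) are assumptions made for illustration.

```python
import time
import numpy as np

def netnmf_debug_sketch(data, glap, k=3, gamma=200, maxiter=250,
                        eps=1e-15, err_tol=1e-4, err_delta_tol=1e-8, seed=0):
    """Illustrative loop only; hypothetical helper, not part of pyNBS."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = data.shape
    W = rng.random((n_genes, k))         # genes-by-k basis factors
    H = rng.random((k, n_samples))       # k-by-samples factors

    # Split the Laplacian into non-negative parts so that glap = pos - neg;
    # this keeps the multiplicative updates non-negative.
    glap_pos = (np.abs(glap) + glap) / 2.0
    glap_neg = (np.abs(glap) - glap) / 2.0

    resVal, fitResVect, Wlist, Hlist, timestep = [], [], [], [], []
    prev_err = float(np.linalg.norm(data - W @ H))
    numIter = 0
    for numIter in range(1, maxiter + 1):
        start = time.time()
        # Multiplicative updates; eps guards against division by zero.
        W = W * (data @ H.T + gamma * (glap_neg @ W)) / (
            W @ (H @ H.T) + gamma * (glap_pos @ W) + eps)
        H = H * (W.T @ data) / ((W.T @ W) @ H + eps)

        recon_err = float(np.linalg.norm(data - W @ H))  # norm of data - WH
        delta = prev_err - recon_err                     # change since last step
        resVal.append(recon_err)
        fitResVect.append(delta)
        Wlist.append(W)
        Hlist.append(H)
        timestep.append(time.time() - start)
        prev_err = recon_err

        # Stop when the fit is close enough or has stopped changing.
        if recon_err < err_tol or abs(delta) < err_delta_tol:
            break

    finalResidual = prev_err
    return (W, H, numIter, finalResidual, resVal, fitResVect,
            Wlist, Hlist, timestep)
```

In this sketch every per-step list has exactly numIter entries, finalResidual equals the last entry of resVal, and consecutive differences of resVal reproduce fitResVect up to sign. Plotting resVal against the step index is a simple way to check whether a run stopped because of err_tol, err_delta_tol, or maxiter.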