Commit 2d9be21

Keefe-Murphy authored and committed
MoE_mahala arg. "identity" now also governs univariate data. Default of FALSE for multivariate data and TRUE for univariate data retains the old behaviour. Prepared CRAN release.
1 parent d3142ae commit 2d9be21

7 files changed, with 30 additions and 22 deletions
DESCRIPTION

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 Package: MoEClust
 Type: Package
-Date: 2021-10-12
+Date: 2021-12-19
 Title: Gaussian Parsimonious Clustering Models with Covariates and a Noise Component
-Version: 1.4.1
+Version: 1.4.2
 Authors@R: c(person("Keefe", "Murphy", email = "keefe.murphy@mu.ie", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-7709-3159")),
 person("Thomas Brendan", "Murphy", email = "brendan.murphy@ucd.ie", role = "ctb", comment = c(ORCID = "0000-0002-5668-7046")))
 Description: Clustering via parsimonious Gaussian Mixtures of Experts using the MoEClust models introduced by Murphy and Murphy (2020) <doi:10.1007/s11634-019-00373-8>. This package fits finite Gaussian mixture models with a formula interface for supplying gating and/or expert network covariates using a range of parsimonious covariance parameterisations from the GPCM family via the EM/CEM algorithm. Visualisation of the results of such models using generalised pairs plots and the inclusion of an additional noise component is also facilitated. A greedy forward stepwise search algorithm is provided for identifying the optimal model in terms of the number of components, the GPCM covariance parameterisation, and the subsets of gating/expert network covariates.

R/Functions.R

Lines changed: 17 additions & 13 deletions
@@ -297,6 +297,7 @@
 }
 sq_maha <- !uni
 low.dim <- !uni && low.dim
+Identity <- ifelse(is.null(Identity), isTRUE(uni), Identity)
 x.names <- colnames(X)
 if(!multi) {
 mNs <- toupper(modelNames)
@@ -1729,7 +1730,7 @@
 #'
 #' Unless \code{init.z="list"}, supplying this argument as \code{TRUE} when the \code{\link[clustMD]{clustMD}} library is loaded has the effect of superseding the \code{init.z} argument: this argument now governs instead how the call to \code{\link[clustMD]{clustMD}} is initialised (unless all \code{\link[clustMD]{clustMD}} model types fail for a given number of components, in which case \code{init.z} is invoked \emph{instead} to initialise for \code{G} values for which all \code{\link[clustMD]{clustMD}} model types failed). Similarly, the arguments \code{hc.args} and \code{km.args} will be ignored (again, unless all \code{\link[clustMD]{clustMD}} model types fail for a given number of components).}
 #' \item{\code{max.init}}{The maximum number of iterations for the Mahalanobis distance-based reallocation procedure when \code{exp.init$mahalanobis} is \code{TRUE}. Defaults to \code{.Machine$integer.max}.}
-#' \item{\code{identity}}{A logical indicating whether the identity matrix (corresponding to the use of the Euclidean distance) is used in place of the covariance matrix of the residuals (corresponding to the use of the Mahalanobis distance). Defaults to \code{FALSE}; only relevant for multivariate response data.}
+#' \item{\code{identity}}{A logical indicating whether the identity matrix (corresponding to the use of the Euclidean distance) is used in place of the covariance matrix of the residuals (corresponding to the use of the Mahalanobis distance). Defaults to \code{FALSE} for multivariate response data but defaults to \code{TRUE} for univariate response data. Setting \code{identity=TRUE} with multivariate data may be advisable when the dimensions of the data are such that the covariance matrix cannot be inverted (otherwise, the pseudo-inverse is used when \code{FALSE}).}
 #' \item{\code{drop.break}}{When \code{isTRUE(exp.init$mahalanobis)} observations will be completely in or out of a component during the initialisation phase. As such, it may occur that constant columns will be present when building a given component's expert regression (particularly for categorical covariates). It may also occur, due to this partitioning, that "unseen" data, when calculating the residuals, will have new factor levels. When \code{isTRUE(exp.init$drop.break)}, the Mahalanobis distance based initialisation phase will explicitly fail in either of these scenarios.
 #'
 #' Otherwise, \code{\link{drop_constants}} and \code{\link{drop_levels}} will be invoked when \code{exp.init$drop.break} is \code{FALSE} (the default) to \emph{try} to remedy the situation. In any case, only a warning that the initialisation step failed will be printed, regardless of the value of \code{exp.init$drop.break}.}
@@ -1896,10 +1897,9 @@
 exp.init$estart <- FALSE
 } else if(length(exp.init$estart) > 1 ||
 !is.logical(exp.init$estart)) stop("'exp.init$estart' must be a single logical indicator", call.=FALSE)
-if(is.null(exp.init$identity)) {
-exp.init$identity <- FALSE
-} else if(length(exp.init$identity) > 1 ||
-!is.logical(exp.init$identity)) stop("'exp.init$identity' must be a single logical indicator", call.=FALSE)
+if(!is.null(exp.init$identity) &&
+(length(exp.init$identity) > 1 ||
+!is.logical(exp.init$identity))) stop("'exp.init$identity' must be a single logical indicator", call.=FALSE)
 if(is.null(exp.init$max.init)) {
 exp.init$max.init <- .Machine$integer.max
 } else if(isTRUE(exp.init$mahalanobis) &&
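
The hunk above stops replacing a missing `exp.init$identity` with `FALSE`, so `NULL` can survive validation and be resolved later, once it is known whether the response is univariate. A minimal sketch of that two-step pattern follows. The helper names `check_identity` and `resolve_identity` are hypothetical and are not part of MoEClust; only the validation message and the `uni`-based default are taken from the diff.

```r
# Hypothetical helper (not in MoEClust): validate a user-supplied flag,
# but let NULL pass through so that the default can be chosen later.
check_identity <- function(identity) {
  if(!is.null(identity) &&
     (length(identity) > 1 || !is.logical(identity))) {
    stop("'identity' must be a single logical indicator", call.=FALSE)
  }
  identity
}

# Hypothetical helper: resolve NULL once the response dimension is known.
# 'uni' is TRUE for univariate response data.
resolve_identity <- function(identity, uni) {
  if(is.null(identity)) isTRUE(uni) else identity
}

resolve_identity(check_identity(NULL),  uni=TRUE)   # TRUE:  univariate default
resolve_identity(check_identity(NULL),  uni=FALSE)  # FALSE: multivariate default
resolve_identity(check_identity(FALSE), uni=TRUE)   # FALSE: explicit choice is kept
```

The sketch uses a scalar `if`/`else` rather than `ifelse()`, because `ifelse()` returns a result shaped like its test, which is a design choice of this example rather than of the commit.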
@@ -3878,7 +3878,7 @@ predict.MoEClust <- function(object, newdata = list(...), resid = FALSE, discar
 #' @param fit A fitted \code{\link[stats]{lm}} model, inheriting either the \code{"mlm"} or \code{"lm"} class.
 #' @param resids The residuals. Can be residuals for observations included in the model, or residuals arising from predictions on unseen data. Must be coercible to a matrix with the number of columns being the number of response variables. Missing values are not allowed.
 #' @param squared A logical. By default (\code{FALSE}), the generalized interpoint distance is computed. Set this flag to \code{TRUE} for the squared value.
-#' @param identity A logical indicating whether the identity matrix is used in in place of the precision matrix in the Mahalanobis distance calculation. Defaults to \code{FALSE}; \code{TRUE} corresponds to the use of the Euclidean distance. Only relevant for multivariate response data.
+#' @param identity A logical indicating whether the identity matrix is used in place of the precision matrix in the Mahalanobis distance calculation. Defaults to \code{FALSE} for multivariate response data but defaults to \code{TRUE} for univariate response data, where \code{TRUE} corresponds to the use of the Euclidean distance. Setting \code{identity=TRUE} with multivariate data may be advisable when the dimensions of the data are such that the covariance matrix cannot be inverted (otherwise, the pseudo-inverse is used when \code{FALSE}).
 #'
 #' @return A vector giving the Mahalanobis distance (or squared Mahalanobis distance) between response(s) and fitted values for each observation.
 #' @author Keefe Murphy - <\email{keefe.murphy@@mu.ie}>
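
To make the documented distinction concrete, the following base-R sketch contrasts the two choices for a multivariate fit. It does not call MoEClust. The `iris` model is purely illustrative, and the degrees-of-freedom-corrected residual covariance is an assumption borrowed from the univariate branch of this commit, since the multivariate covariance computation is not shown in the diff.

```r
# Multivariate linear model on a built-in data set (illustrative only)
fit    <- lm(cbind(Sepal.Length, Sepal.Width) ~ Petal.Length, data=iris)
resids <- residuals(fit)

# Assumption: residual covariance with the usual degrees-of-freedom correction
covar  <- crossprod(resids)/(nrow(resids) - fit$rank)

# identity=FALSE analogue: squared Mahalanobis distance via the precision matrix
icov   <- chol2inv(chol(covar))
d2_mah <- rowSums(resids %*% icov * resids)

# identity=TRUE analogue: the identity matrix replaces the precision matrix,
# leaving the squared Euclidean norm of each residual vector
d2_euc <- rowSums(resids %*% diag(ncol(resids)) * resids)

# Cross-checks against base R
all.equal(unname(d2_mah), unname(mahalanobis(resids, center=FALSE, cov=covar)))
all.equal(unname(d2_euc), unname(rowSums(resids^2)))
```

Taking the square root of either vector gives the unsquared distance, corresponding to `squared = FALSE`.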
@@ -3889,7 +3889,7 @@ predict.MoEClust <- function(object, newdata = list(...), resid = FALSE, discar
 #' MoE_mahala(fit,
 #' resids,
 #' squared = FALSE,
-#' identity = FALSE)
+#' identity = NULL)
 #' @examples
 #' \dontshow{library(matrixStats)}
 #' data(ais)
@@ -3940,7 +3940,7 @@ predict.MoEClust <- function(object, newdata = list(...), resid = FALSE, discar
 #' labels=replace(as.character(CO2data$country), which(min.M <= 1), ""))
 #' }
 #' crit}
-MoE_mahala <- function(fit, resids, squared = FALSE, identity = FALSE) {
+MoE_mahala <- function(fit, resids, squared = FALSE, identity = NULL) {
 if(!inherits(fit, "mlm") &&
 !inherits(fit, "lm")) stop("'fit' must inherit the class \"mlm\" or \"lm\"", call.=FALSE)
 resids <- tryCatch(data.matrix(as.data.frame(resids)), error=function(e) {
@@ -3949,6 +3949,7 @@ predict.MoEClust <- function(object, newdata = list(...), resid = FALSE, discar
 anyNA(resids)) stop("Invalid 'resids': must be numeric and contain no missing values", call.=FALSE)
 if(length(squared) > 1 ||
 !is.logical(squared)) stop("'squared' must be a single logical indicator", call.=FALSE)
+identity <- ifelse(is.null(identity), isFALSE(inherits(fit, "mlm")), identity)
 if(length(identity) > 1 ||
 !is.logical(identity)) stop("'identity' must be a single logical indicator", call.=FALSE)
 
@@ -3965,12 +3966,15 @@ predict.MoEClust <- function(object, newdata = list(...), resid = FALSE, discar
 covsvd$v[,posi, drop=FALSE] %*% (t(covsvd$u[,posi, drop=FALSE])/covsvd$d[posi]) else array(0L, dim(covar)[2L:1L])
 } else icov <- chol2inv(.chol(covar))
 }
-res <- rowSums2(resids %*% icov * resids)
-return(drop(if(isTRUE(squared)) res else sqrt(res)))
+res <- rowSums2(resids %*% icov * resids)
+return(drop(if(isTRUE(squared)) res else sqrt(res)))
 } else {
-#covar <- as.numeric(crossprod(resids)/(nrow(resids) - fit$rank))
-# return(drop(if(isTRUE(squared)) (resids/covar)^2 else abs(resids)/covar))
-return(drop(if(isTRUE(squared)) resids^2 else abs(resids)))
+if(isTRUE(identity)) {
+return(drop(if(isTRUE(squared)) resids^2 else abs(resids)))
+} else {
+covar <- as.numeric(crossprod(resids)/(nrow(resids) - fit$rank))
+return(drop(if(isTRUE(squared)) resids^2/covar else abs(resids)/sqrt(covar)))
+}
 }
 }
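
The new univariate branch can be reproduced with base R alone. The sketch below mirrors the two return statements added above on an illustrative `lm` fit to the built-in `cars` data. It does not call `MoE_mahala` itself, and the variable names `d_identity` and `d_scaled` are introduced here only for illustration.

```r
# Univariate linear model on a built-in data set (illustrative only)
fit    <- lm(dist ~ speed, data=cars)
resids <- as.matrix(residuals(fit))   # one-column matrix of residuals

# identity=TRUE: unscaled absolute residuals (the behaviour before this commit)
d_identity <- drop(abs(resids))

# identity=FALSE: scale by the residual variance estimate, as in the new branch
covar    <- as.numeric(crossprod(resids)/(nrow(resids) - fit$rank))
d_scaled <- drop(abs(resids)/sqrt(covar))

# sqrt(covar) coincides with the residual standard error of the fit,
# so the scaled distances are absolute residuals in standard-error units
all.equal(sqrt(covar), sigma(fit))
all.equal(d_scaled, d_identity/sigma(fit))
```

With `squared = TRUE` the analogous quantities are `resids^2` and `resids^2/covar`, so the squared and unsquared versions stay consistent with each other, which was not the case in the commented-out code that this commit removes.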

R/MoEClust.R

Lines changed: 2 additions & 2 deletions
@@ -24,8 +24,8 @@
 #' \itemize{
 #' \item{Type: }{Package}
 #' \item{Package: }{MoEClust}
-#' \item{Version: }{1.4.1}
-#' \item{Date: }{2021-10-12 (this version), 2017-11-28 (original release)}
+#' \item{Version: }{1.4.2}
+#' \item{Date: }{2021-12-19 (this version), 2017-11-28 (original release)}
 #' \item{Licence: }{GPL (>=2)}
 #' }
 #'

inst/NEWS.md

Lines changed: 4 additions & 0 deletions
@@ -5,7 +5,11 @@ __with Gating and Expert Network Covariates__
 __and a Noise Component__
 =======================================================
 
+## MoEClust v1.4.2 - (_14<sup>th</sup> release [patch update]: 2021-12-19_)
 ### New Features, Improvements, Bug Fixes, & Miscellaneous Edits
+* `MoE_mahala` arg. `identity` (& related `MoE_control` `exp.init$identity` option) is now also
+relevant for univariate data: old behaviour is retained via respective defaults of `FALSE` & `TRUE` for
+multivariate & univariate data (i.e. only ability to set `identity=FALSE` for univariate data is new).
 * Fixed `MoE_clust` bug when `tau0` is specified but `G` is not (introduced in last update).
 * Minor speed-up to `MoE_gpairs(response.type="density")` w/ expert covariates & noise component.
 * `MoE_gpairs` arg. `density.pars$grid.size` now recycled as vector of length 2 if supplied as scalar.

man/MoEClust-package.Rd

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default.

man/MoE_control.Rd

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

man/MoE_mahala.Rd

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default.
