README.Rmd

---
output:
  github_document:
    math_method:
      engine: webtex
      url: https://latex.codecogs.com/gif.image?
    pandoc_args: --webtex=https://ibm.codecogs.com/gif.latex?
editor_options: 
  chunk_output_type: console
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# defm: Fit and simulate discrete binary exponential family models

<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/defm)](https://CRAN.R-project.org/package=defm)
[![Grand total](https://cranlogs.r-pkg.org/badges/grand-total/defm)](https://CRAN.R-project.org/package=defm)
[![R-CMD-check](https://github.com/UofUEpiBio/defm/actions/workflows/r.yml/badge.svg)](https://github.com/UofUEpiBio/defm/actions/workflows/r.yml)
<!-- badges: end -->

Discrete exponential family models (DEFM) have a long tradition with extensive
development rooted in exponential random graph models (ERGMs.) Applicable to 
any form o data that can be represented as binary arrays, DEFMs provide a way
to model jointly distributed binary variables.

This package, built on top of the C++ library [`barry`](https://github.com/USCbiostats/barry),
provides a computationally efficient implementation of this family of models.

# Installation

You can install `defm` using devtools:

``` r
devtools::install_github("UofUEpi/defm")
```

# Examples

## Example 1: Four outcomes

In this example, we will simulate a dataset that contains 1,000 individuals with
four different outcomes. The outcomes, 0/1 vectors, will be modeled
as Markov processes of order one. Future states of this 0/1 vector
are a function of the previous point in time. The following lines
simulate the baseline data:

```{r data-simulation}
library(defm)

# Simulation parameters
set.seed(1231)
n   <- 5000L # Count of individuals
n_y <- 3L    # Number of dependent variables
n_x <- 2L    # Number of independent variables

# Simulating how many observations we will have per individuals
n_reps <- sample(3:10, n, replace = TRUE)

# Final number of rows in the data
n_t <- sum(n_reps)

# Simulating the data
Y <- matrix(as.integer(runif(n_y * n_t) < .1), ncol = n_y)
colnames(Y) <- paste0("y", 1:n_y - 1)
X <- matrix(rnorm(n_x * n_t), ncol = n_x)
colnames(X) <- paste0("x", 1:n_x - 1L)
id <- rep(1L:n, n_reps)
time <- unlist(sapply(n_reps, \(x) 1:x))
```

Here is a brief look at the data structure. Remember, we still have not actually
simulated data **WITH THE MODEL**.

```{r echo = FALSE}
knitr::kable(head(cbind(id, time, Y, X), n = 10), row.names = FALSE, digits = 2)
```


For this example, we will simulate a model with the following features:

- **Ones**: Baseline density (prevalence of ones), $\sum_{itk}y_{itk}$

- **Ones x Attr 2**: Same as before, but weighted by one of the covariates. (simil to fixed effect) $\sum_{itk}y_{itk}x_{it}$

- **Transition** : And a transition structure, in particular `y0 -> (y0, y1)`, $y_{i0}^0y_{i0}^{t+1} y_{i1}^{t+1}$.

In `defm`, transition statistics can be represented using matrices. In this case, the transition can be written as:

$$
\begin{array}{c}t\\t+1\end{array}\left[\begin{array}{ccc}1 & \cdot & \cdot \\ 1 & 1 & \cdot\end{array}\right]
$$

which in `R` is

```r
> matrix(c(1,NA_integer_,NA_integer_,1,1,NA_integer_), nrow = 2, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    1   NA   NA
# [2,]    1    1   NA
```

The `NA` entries in the matrix mean that those can be either zero or one. In other words, only values different from `NA` will be considered for specifying the terms. Here is the factory function:

```{r factory-function}
# Creating the model and adding a couple of terms
build_model <- function(id., Y., X., order. = 1, par. = par.) {
  
  # Mapping the data to the C++ wrapper
  d_model. <- new_defm(id., Y., X., order = order.)

  # Adding the model terms
  term_defm_ones(d_model.)
  term_defm_ones(d_model., "x1")
  
  transition <- matrix(NA_integer_, nrow = order. + 1, ncol = ncol(Y.))
  transition[c(1,2,4)] <- 1
  
  term_defm_transition(d_model., transition)
  
  # Initializing the model
  init_defm(d_model.)
  
  # Returning
  d_model.
  
}
```

With this factory function, we will use it to simulate some data with the same
dimensions of the original dataset. In this case, the parameters used for the simulation
will be:

- **Ones**: -2, i.e., low density,
- **Ones x Attr 2**: 2, yet correlated with covariate \# 2,
- **Transition** : 5, And a high chance of observing the transition `y0 -> (y0, y1)`

```{r simulation}
sim_par <- c(-2, 2, 5)
d_model <- build_model(id, Y, X, order = 1L, par. = sim_par)
simulated_Y <- sim_defm(d_model, sim_par)
head(cbind(id, simulated_Y))
```


Now, let's see if we can recover the parameters using MLE:

```{r mle}
d_model_sim <- build_model(id, simulated_Y, X, order = 1, par. = sim_par)
ans <- defm_mle(d_model_sim)

summary(ans)
```

Or better, we can use `texreg` to generate a pretty output:

```{r mle-out, echo=FALSE, results='asis'}
summary_table(ans, as_texreg=TRUE) |>
  texreg::knitreg(single.row = TRUE)
```

We can also see the counts

```{r example1-counts, echo=FALSE}
obs_stats <- get_stats(d_model_sim)
colnames(obs_stats) <- names(d_model_sim)
colnames(simulated_Y) <- paste0("y", 1:n_y - 1)
cbind(id, simulated_Y, X, obs_stats) |> 
  head(n = 10) |> knitr::kable(row.names = FALSE, digits = 2L)
```

Finally, we can also take a look at the distribution of the log odds. We calculate this by looking at changes in a single entry of the array. For example,
the log-odds of having $y_{12}: 0\to 1$, which are equivalent to

\renewcommand{\Pr}[1]{\mbox{Pr}\left(#1\right)}
\newcommand{\Prcond}[2]{\Pr{\left.#1\vphantom{#2}\;\right|#2\vphantom{#1}}}

$$
\log{\frac{\Prcond{y_{12} = 1}{\mathbf{y}_{-12}}}{\Prcond{y_{12} = 0}{\mathbf{y}_{-12}}}} = \mbox{logit}\left(\Prcond{y_{12} = 1}{\mathbf{y}_{-12}}\right)
$$

We can use the `logodds` function for this:

```{r logodds}
lo <- logodds(d_model_sim, coef(ans), 1,  2) 
hist(lo, main = "Distribution of Log-odds for term Y(1,2)",
     col="tomato", border = "transparent", breaks = 100)
abline(v=0, lwd = 2, lty = 2, col = "steelblue")
```


## Example 2: A fun model

For fun, imagine that we want to describe a process in which an individual
moves sequentially through a set of states. In this example, there are
ten different `y` variables, but the person can only have one of them as
active (equal to one.) We can simulate such data using DEFM.

We first need to generate the baseline data for the simulation.
This involves creating a matrix of size 20x10 (so we have 20 time points,)
filled with zeros in all but the first entry:

```{r example2-sim}
n   <- 20L
n_y <- 10L
id <- rep(1L, n)
Y <- matrix(0L, nrow = n, ncol = n_y)
Y[1] <- 1L
X <- matrix(0.0, nrow = n, ncol = 1)
```

With the data in hand, we can now simulate the process. First, we need
to build the model. The key component of the model will be the transition
matrices:

<!-- $$
\left[\begin{array}{cccc}
1 & 0 & NA & \dots \\
0 & 1 & NA & \dots
\end{array}\right] \dots 
\left[\begin{array}{cccccc}
\dots & NA & 1 & 0 & NA & \dots  \\
\dots & NA & 0 & 1 & NA & \dots  
\end{array}\right] \dots 
\left[\begin{array}{cccc}
\dots & NA & 1 & 0 \\
\dots & NA & 0 & 1 
\end{array}\right]
$$ --> 

<div align="center"><img style="background: white;" src="https://render.githubusercontent.com/render/math?math=%5Cleft%5B%5Cbegin%7Barray%7D%7Bcccc%7D%0A1%20%26%200%20%26%20NA%20%26%20%5Cdots%20%5C%5C%0A0%20%26%201%20%26%20NA%20%26%20%5Cdots%0A%5Cend%7Barray%7D%5Cright%5D%20%5Cdots%20%0A%5Cleft%5B%5Cbegin%7Barray%7D%7Bcccccc%7D%0A%5Cdots%20%26%20NA%20%26%201%20%26%200%20%26%20NA%20%26%20%5Cdots%20%20%5C%5C%0A%5Cdots%20%26%20NA%20%26%200%20%26%201%20%26%20NA%20%26%20%5Cdots%20%20%0A%5Cend%7Barray%7D%5Cright%5D%20%5Cdots%20%0A%5Cleft%5B%5Cbegin%7Barray%7D%7Bcccc%7D%0A%5Cdots%20%26%20NA%20%26%201%20%26%200%20%5C%5C%0A%5Cdots%20%26%20NA%20%26%200%20%26%201%20%0A%5Cend%7Barray%7D%5Cright%5D"></div>

Let's take a look at the process:

```{r}
# Creating a new instance of a DEFM object
d_model <- new_defm(id = id, Y = Y, X = X, order = 1)

# Creating the transition terms, these
for (i in (1:(n_y - 1) - 1)) {
  transition <- matrix(NA_integer_, nrow = 2, ncol = n_y)
  transition[c(1:4) + 2 * i] <- c(1,0,0,1)
  term_defm_transition(d_model, transition)
}

# Here is the last transition term
transition <- matrix(NA_integer_, nrow = 2, ncol = n_y)
transition[c(n_y * 2 - 1, n_y * 2, 1, 2)] <- c(1,0,0,1)
term_defm_transition(d_model, transition)

# Adding a term of ones
term_defm_ones(d_model)

# Initializing and simulating
init_defm(d_model)
Y_sim <-sim_defm(d_model, par = c(rep(100, n_y), -10))
```

The simulation should produce a nice-looking figure:

```{r chain1fig, echo = FALSE}
library(Matrix)
Matrix::image(as(Y_sim, "dgCMatrix"))
```

## Example 3: Using formulas for transitions

In this example, we will redo the previous model, but now use formulas for
specifying the transitions:

```{r}
d_model_formula <- new_defm(id = id, Y = Y, X = X, order = 1)

# We can use text formulas to add transition terms
d_model_formula <- d_model_formula + 
  "{y0, 0y1} > {0y0, y1}" +
  "{y1, 0y2} > {0y1, y2}" +
  "{y2, 0y3} > {0y2, y3}" +
  "{y3, 0y4} > {0y3, y4}" +
  "{y4, 0y5} > {0y4, y5}" +
  "{y5, 0y6} > {0y5, y6}" +
  "{y6, 0y7} > {0y6, y7}" +
  "{y7, 0y8} > {0y7, y8}" +
  "{y8, 0y9} > {0y8, y9}" +
  "{0y0, y9} > {y0, 0y9}" 

d_model_formula |>
  term_defm_ones()
  
init_defm(d_model_formula)

# Inspecting
d_model_formula

# Simulating
(Y_sim_formula <- sim_defm(d_model_formula, par = c(rep(20, n_y), -10)))
```

The new simulation...

```{r chain2fig, echo = FALSE}
image(as(Y_sim_formula, "dgCMatrix"))
```


# Acknowledgement

This work was supported by the Assistant Secretary of Defense for Health Affairs endorsed by the Department of Defense, through the Psychological Health/Traumatic Brain Injury Research Program Long-Term Impact of Military-Relevant Brain Injury Consortium (LIMBIC) Award/W81XWH-18-PH/TBIRP-LIMBIC under Award No. I01 RX003443. The U.S. Army Medical Research Acquisition Activity, 839 Chandler Street, Fort Detrick MD 21702-5014 is the awarding and administering acquisition office. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the Department of Defense. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the U.S. Government, the U.S. Department of Veterans Affairs or the Department of Defense and no official endorsement should be inferred.

# Code of Conduct
  
  Please note that the `defm` project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.