Skip to content

Commit

Permalink
Merge pull request #162 from jr-leary7/dev
Browse files Browse the repository at this point in the history
Dev
  • Loading branch information
jr-leary7 authored Nov 13, 2023
2 parents 4766369 + 6e06515 commit aa613ed
Show file tree
Hide file tree
Showing 6 changed files with 15 additions and 119 deletions.
12 changes: 2 additions & 10 deletions README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -71,7 +71,7 @@ sim_data <- readRDS(url("https://zenodo.org/records/8433077/files/scLANE_sim_dat

The PCA embeddings show us a pretty simple trajectory that's strongly correlated with the first principal component.

```{r plot-sims-pt, results='hold'}
```{r plot-sims-pt, results='hold', message=FALSE}
data.frame(sim_data@int_colData$reducedDims@listData$PCA[, 1:2]) %>%
mutate(pseudotime = sim_data$cell_time_normed) %>%
ggplot(aes(x = PC1, y = PC2, color = pseudotime)) +
Expand All @@ -83,7 +83,7 @@ data.frame(sim_data@int_colData$reducedDims@listData$PCA[, 1:2]) %>%

We also see that the data are not clustered by subject, which indicates that gene dynamics are mostly homogeneous across subjects.

```{r plot-sims-subj, results='hold'}
```{r plot-sims-subj, results='hold', message=FALSE}
data.frame(sim_data@int_colData$reducedDims@listData$PCA[, 1:2]) %>%
mutate(subject = sim_data$subject) %>%
ggplot(aes(x = PC1, y = PC2, color = subject)) +
Expand All @@ -96,12 +96,8 @@ data.frame(sim_data@int_colData$reducedDims@listData$PCA[, 1:2]) %>%

Since we have multi-subject data, we can use any of the three model modes to run our DE testing. We'll start with the simplest model, the GLM, then work our way through the other options in order of increasing complexity. We first prepare our inputs - a dataframe containing our cell ordering, a set of genes to build models for, and a vector of per-cell size factors to be used as offsets during estimation. In reality, it's usually unnecessary to fit a model for every single gene in a dataset, as trajectories are usually estimated using a subset of the entire set of genes (usually a few thousand most highly variable genes). For the purpose of demonstration, we'll select 50 genes each from the dynamic and non-dynamic populations.

{% note %}

**Note:** In this case we're working with a single pseudotime lineage, though in real datasets several lineages often exist; in order to fit models for a subset of lineages simply remove the corresponding columns from the cell ordering dataframe passed as input to `testDynamic()`.

{% endnote %}

```{r prep-data}
set.seed(312)
gene_sample <- c(sample(rownames(sim_data)[rowData(sim_data)$geneStatus_overall == "Dynamic"], size = 50),
Expand Down Expand Up @@ -175,12 +171,8 @@ scLANE_models_glmm <- testDynamic(sim_data,
n.cores = 4)
```

{% note %}

**Note:** The GLMM mode is still under development, as we are working on further reducing runtime and increasing the odds of the underlying optimization process converging successfully. As such, updates will be frequent and functionality / results may shift slightly.

{% endnote %}

Like the GLM mode, the GLMM mode uses a likelihood ratio test to compare the null & alternate models. We fit the two nested models using maximum likelihood estimation instead of [REML](https://en.wikipedia.org/wiki/Restricted_maximum_likelihood) in order to perform this test; the null model in this case is a negative binomial GLMM with a random intercept for each subject.

```{r glmm-results}
Expand Down
114 changes: 9 additions & 105 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -122,94 +122,6 @@ data.frame(sim_data@int_colData$reducedDims@listData$PCA[, 1:2]) %>%
scale_color_gradientn(colors = viridisLite::plasma(n = 20)) +
labs(x = "PC 1", y = "PC 2", color = "Pseudotime") +
theme_scLANE(umap = TRUE)
#> Loading required package: S4Vectors
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:dplyr':
#>
#> combine, intersect, setdiff, union
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#> colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#> get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#> match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
#> table, tapply, union, unique, unsplit, which.max, which.min
#>
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:dplyr':
#>
#> first, rename
#> The following object is masked from 'package:utils':
#>
#> findMatches
#> The following objects are masked from 'package:base':
#>
#> expand.grid, I, unname
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: 'matrixStats'
#> The following object is masked from 'package:dplyr':
#>
#> count
#>
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: IRanges
#>
#> Attaching package: 'IRanges'
#> The following objects are masked from 'package:dplyr':
#>
#> collapse, desc, slice
#> Loading required package: GenomeInfoDb
#>
#> Attaching package: 'GenomicRanges'
#> The following object is masked from 'package:magrittr':
#>
#> subtract
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#>
#> rowMedians
#> The following objects are masked from 'package:matrixStats':
#>
#> anyMissing, rowMedians
#> Registered S3 method overwritten by 'SparseArray':
#> method from
#> rowsum.dgCMatrix DelayedArray
```

<img src="man/figures/README-plot-sims-pt-1.png" width="100%" />
Expand Down Expand Up @@ -242,15 +154,11 @@ the entire set of genes (usually a few thousand most highly variable
genes). For the purpose of demonstration, we’ll select 50 genes each
from the dynamic and non-dynamic populations.

{% note %}

**Note:** In this case we’re working with a single pseudotime lineage,
though in real datasets several lineages often exist; in order to fit
models for a subset of lineages simply remove the corresponding columns
from the cell ordering dataframe passed as input to `testDynamic()`.

{% endnote %}

``` r
set.seed(312)
gene_sample <- c(sample(rownames(sim_data)[rowData(sim_data)$geneStatus_overall == "Dynamic"], size = 50),
Expand All @@ -276,7 +184,7 @@ scLANE_models_glm <- testDynamic(sim_data,
#> Registered S3 method overwritten by 'bit':
#> method from
#> print.ri gamlss
#> scLANE testing completed for 100 genes across 1 lineage in 29.977 secs
#> scLANE testing completed for 100 genes across 1 lineage in 55.872 secs
```

After the function finishes running, we use `getResultsDE()` to generate
Expand All @@ -298,10 +206,10 @@ select(scLANE_res_glm, Gene, Lineage, Test_Stat, P_Val, P_Val_Adj, Gene_Dynamic_
| Gene | Lineage | LRT stat. | P-value | Adj. p-value | Predicted dynamic status |
|:-------|:--------|----------:|--------:|-------------:|-------------------------:|
| MFSD2B | A | 209.755 | 0.000 | 0.000 | 1 |
| SMG1 | A | 5.062 | 0.080 | 0.419 | 0 |
| UAP1L1 | A | 9.118 | 0.010 | 0.199 | 0 |
| CPA3 | A | 4.781 | 0.029 | 0.518 | 0 |
| UAP1L1 | A | 9.436 | 0.009 | 0.188 | 0 |
| TMCO3 | A | 167.582 | 0.000 | 0.000 | 1 |
| MYOF | A | 4.015 | 0.045 | 0.419 | 0 |
| MYOF | A | 3.808 | 0.051 | 0.531 | 0 |

### GEE mode

Expand All @@ -324,7 +232,7 @@ scLANE_models_gee <- testDynamic(sim_data,
id.vec = sim_data$subject,
cor.structure = "ar1",
n.cores = 4)
#> scLANE testing completed for 100 genes across 1 lineage in 3.076 mins
#> scLANE testing completed for 100 genes across 1 lineage in 3.467 mins
```

We again generate the table of DE test results. The variance of the
Expand All @@ -343,11 +251,11 @@ select(scLANE_res_gee, Gene, Lineage, Test_Stat, P_Val, P_Val_Adj, Gene_Dynamic_

| Gene | Lineage | Wald stat. | P-value | Adj. p-value | Predicted dynamic status |
|:---------|:--------|-----------:|--------:|-------------:|-------------------------:|
| CKAP4 | A | 48922.644 | 0 | 0 | 1 |
| DGUOK | A | 64351.893 | 0 | 0 | 1 |
| TBCC | A | 32.151 | 0 | 0 | 1 |
| GOLGA8EP | A | NA | NA | NA | 0 |
| PFDN2 | A | 2131.763 | 0 | 0 | 1 |
| IDH3G | A | 863.479 | 0 | 0 | 1 |
| MPG | A | 849.362 | 0 | 0 | 1 |

### GLMM mode

Expand All @@ -371,18 +279,14 @@ scLANE_models_glmm <- testDynamic(sim_data,
glmm.adaptive = TRUE,
id.vec = sim_data$subject,
n.cores = 4)
#> scLANE testing completed for 100 genes across 1 lineage in 4.863 mins
#> scLANE testing completed for 100 genes across 1 lineage in 4.489 mins
```

{% note %}

**Note:** The GLMM mode is still under development, as we are working on
further reducing runtime and increasing the odds of the underlying
optimization process converging successfully. As such, updates will be
frequent and functionality / results may shift slightly.

{% endnote %}

Like the GLM mode, the GLMM mode uses a likelihood ratio test to compare
the null & alternate models. We fit the two nested models using maximum
likelihood estimation instead of
Expand All @@ -402,7 +306,7 @@ select(scLANE_res_glmm, Gene, Lineage, Test_Stat, P_Val, P_Val_Adj, Gene_Dynamic
| Gene | Lineage | LRT stat. | P-value | Adj. p-value | Predicted dynamic status |
|:--------|:--------|----------:|--------:|-------------:|-------------------------:|
| PGLS | A | 129.086 | 0.000 | 0 | 1 |
| TSPAN1 | A | 85.987 | 0.000 | 0 | 1 |
| TSPAN1 | A | 78.616 | 0.000 | 0 | 1 |
| WDSUB1 | A | NA | NA | NA | 0 |
| FAM135B | A | NA | NA | NA | 0 |
| NLGN4Y | A | 9.878 | 0.627 | 1 | 0 |
Expand Down
Binary file modified man/figures/README-plot-models-gee-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified man/figures/README-plot-models-glmm-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified man/figures/README-unnamed-chunk-2-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
8 changes: 4 additions & 4 deletions tests/testthat/test_scLANE.R
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,7 @@ withr::with_output_sink(tempfile(), {
# get results tables by interval
glm_slope_test <- testSlope(glm_gene_stats)
gee_slope_test <- testSlope(gee_gene_stats)
glmm_slope_test <- testSlope(glmm_gene_stats)
# glmm_slope_test <- testSlope(glmm_gene_stats)
# run NB GAMs of varying structure
gam_mod_bs <- nbGAM(expr = counts_test[, 1],
pt = pt_test,
Expand Down Expand Up @@ -294,13 +294,13 @@ test_that("getResultsDE() output", {
test_that("testSlope() output", {
expect_s3_class(glm_slope_test, "data.frame")
expect_s3_class(gee_slope_test, "data.frame")
expect_s3_class(glmm_slope_test, "data.frame")
# expect_s3_class(glmm_slope_test, "data.frame")
expect_gt(nrow(glm_slope_test), 0)
expect_gt(nrow(gee_slope_test), 0)
expect_gt(nrow(glmm_slope_test), 0)
# expect_gt(nrow(glmm_slope_test), 0)
expect_gt(sum(glm_slope_test$P_Val_Adj < 0.01, na.rm = TRUE), 0)
expect_gt(sum(gee_slope_test$P_Val_Adj < 0.01, na.rm = TRUE), 0)
expect_gt(sum(glmm_slope_test$P_Val_Adj < 0.01, na.rm = TRUE), 0)
# expect_gt(sum(glmm_slope_test$P_Val_Adj < 0.01, na.rm = TRUE), 0)
})

test_that("nbGAM() output", {
Expand Down

0 comments on commit aa613ed

Please sign in to comment.