initial new docs draft
gaustin15 committed May 6, 2024
0 parents commit 38712ec
Showing 595 changed files with 204,554 additions and 0 deletions.
Binary file added .DS_Store
Binary file not shown.
233 changes: 233 additions & 0 deletions AdaptationDebiasMClassifier.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,233 @@
---
title:
output:
  html_document:
    theme: united

---

<style>
.darkgreen {
background-color: #577836;
color: white;
border: 2px solid black;
margin: 20px;
padding: 20px;
}
</style>

<style>
body {
background-color: #16153C;
color: whitesmoke;
font-family: Palatino;
font-size: 12pt;
margin: 10px;
padding: 0px;
}
</style>
```{css, echo=FALSE}
.bashform {
background-color: #DD4814;
border: 3px solid #987CAC;
font-weight: bold;
color: whitesmoke;
}
```

```{r klippy, echo=FALSE, include=TRUE}
klippy::klippy(c('r', 'python', 'bash'),
position = c("top", "right"),
color='brown',
tooltip_message = "Copy",
tooltip_success = "Copied!"
)
```

<style>
.citation {
background-color: #A1DDF2;
color: black;
border: 3px solid #987CAC;
margin: 20px;
padding: 20px;
}
</style>

<br><br>
<center> <h1> debiasm.<span style="color:#DD4814">**AdaptationDebiasMClassifier**</span></h1> </center>
<br>
<style>
.codecell {
background-color: #A1DDF2;
color: black;
border: 3px solid #987CAC;
margin: 10px;
padding: 10px;
font-weight: bold;
}
li {
list-style-type: none
}
</style>
<div class="codecell">
*class* debiasm.<span style="color:#DD4814">**AdaptationDebiasMClassifier**</span>(batch_str = 'infer',<br>
&emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp; learning_rate=0.005, <br>
&emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp; min_epochs=25,<br>
&emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp; l2_strength=0,<br>
&emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp; w_l2=0,<br>
&emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp; random_state=None,<br>
&emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp; )
</div>

<br>
The Adaptation DEBIAS-M Classifier. <br> <br>
This class allows a trained DEBIAS-M model to make predictions on samples from batches that were not observed during training. When `transform` or `predict_proba` is called, it runs an adaptation step that infers the biases of the previously unobserved batches.<br><br>
Like the other DEBIAS-M classes, this class implements multiplicative DEBIAS-M bias correction. It takes a microbiome read count matrix `X` of shape (n_samples × n_taxa), aggregated across multiple batches, along with a binary `y` label. It can handle both read count and relative abundance inputs. <br> <br>
The `batch_str` parameter weights the strength of the enforced cross-batch similarity, `l2_strength` sets the l2 regularization of the predictive parameters, and `w_l2` sets the l2 regularization of the bias-correction parameters.
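
To make "multiplicative bias correction" concrete, here is a minimal NumPy sketch. It is not the package's implementation: `multiplicative_debias` and the `log_bias` array are hypothetical names, and the factors are drawn at random rather than learned. It only illustrates the idea of rescaling each taxon by a per-batch factor and renormalizing each sample, which is also why read counts and relative abundances lead to the same output.

```python
import numpy as np

def multiplicative_debias(X_with_batch, log_bias):
    """Illustrative only: scale each taxon by a per-batch factor, then renormalize.

    `log_bias` is a hypothetical (n_batches, n_taxa) array of log-scale
    factors; it is not a fitted attribute of any debiasm class.
    """
    batches = X_with_batch[:, 0].astype(int)   # first column holds the batch id
    X = X_with_batch[:, 1:].astype(float)      # remaining columns hold abundances
    scaled = X * np.exp(log_bias[batches])     # per-sample, per-taxon rescaling
    return scaled / scaled.sum(axis=1, keepdims=True)  # each row sums to 1

rng = np.random.default_rng(0)
counts = rng.integers(1, 1000, size=(6, 4))    # toy read counts (no empty samples)
batches = np.array([0, 0, 1, 1, 2, 2])
log_bias = rng.normal(size=(3, 4))             # made-up factors, not learned

X_counts = np.hstack((batches[:, np.newaxis], counts))
rel_abund = counts / counts.sum(axis=1, keepdims=True)
X_rel = np.hstack((batches[:, np.newaxis], rel_abund))

out_counts = multiplicative_debias(X_counts, log_bias)
out_rel = multiplicative_debias(X_rel, log_bias)
```

Because each sample is renormalized after rescaling, the per-sample read depth cancels out, so `out_counts` and `out_rel` agree up to floating-point error.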

<br>

Parameters
-----------
<div class="codecell">
* <span style="color:#DD4814">batch_str: {'infer' or float}, default='infer'</span>
* The weight of the enforced cross-batch similarity. Selecting '<span style="color:#DD4814">infer</span>' automatically sets the weight to be inversely proportional to both the number of pairs of batches and the number of taxa in the input matrix. Larger values specify stronger regularization.
* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
* <span style="color:#DD4814">learning_rate: float, default=0.005</span>
* The learning rate used when training the DEBIAS-M model.
* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
* <span style="color:#DD4814">min_epochs: int, default=25</span>
* The minimum number of epochs completed during training.
* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
* <span style="color:#DD4814">l2_strength: float, default=0</span>
* The l2 regularization of the linear predictive layer's parameters. Larger values specify stronger regularization.
* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
* <span style="color:#DD4814">w_l2: float, default=0</span>
* The l2 regularization of the multiplicative bias correction parameters (applied to the logarithm of the multiplicative parameters). Larger values specify stronger regularization.
* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
* <span style="color:#DD4814">random_state: int, default=None</span>
* The random seed used during training, if specified.
</div>
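
As a rough illustration of how the `'infer'` setting for `batch_str` scales, the sketch below computes a weight that is inversely proportional to the number of batch pairs and the number of taxa. The helper name is hypothetical and the proportionality constant is assumed to be 1; the package may use a different constant, so read this as the scaling behaviour only, not the exact value.

```python
from math import comb

def inferred_batch_str_scale(n_batches, n_taxa):
    """Hypothetical helper: 1 / (number of batch pairs * number of taxa).

    The proportionality constant is assumed to be 1 for illustration.
    """
    if n_batches < 2 or n_taxa < 1:
        raise ValueError("need at least two batches and one taxon")
    n_batch_pairs = comb(n_batches, 2)
    return 1.0 / (n_batch_pairs * n_taxa)

scale_small = inferred_batch_str_scale(n_batches=5, n_taxa=100)   # 10 batch pairs
scale_large = inferred_batch_str_scale(n_batches=10, n_taxa=100)  # 45 batch pairs
```

Under this assumption, adding batches or taxa shrinks the weight, so the similarity term does not dominate simply because there are more batch pairs or features to compare.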

<br><br>

Example
-----------
```python
## import packages
import numpy as np
from sklearn.metrics import roc_auc_score
from debiasm import AdaptationDebiasMClassifier

## generate data for the example
np.random.seed(123)
n_samples = 96*5
n_batches = 5
n_features = 100

## the read count matrix
X = ( np.random.rand(n_samples, n_features) * 1000 ).astype(int)

## the labels
y = np.random.rand(n_samples)>0.5

## the batches
batches = ( np.random.rand(n_samples) * n_batches ).astype(int)

## we assume the batches are numbered ints starting at '0',
## and they are in the first column of the input X matrices
X_with_batch = np.hstack((batches[:, np.newaxis], X))
## set the validation batch to '4'
val_inds = batches==4
X_train, X_val = X_with_batch[~val_inds], X_with_batch[val_inds]
y_train, y_val = y[~val_inds], y[val_inds]

### Run DEBIAS-M, using standard sklearn object methods
admc = AdaptationDebiasMClassifier() ## no held-out inputs are needed at training time;
## unseen batches are handled by the adaptation step at prediction time
admc.fit(X_train, y_train)

## Assess results
### should be ~0.5 in this example, since the data is all random
roc_auc_score(y_val, admc.predict_proba(X_val)[:, 1])

## extract the 'DEBIAS-ed' data for other downstream analyses, if applicable
X_debiased = admc.transform(X_with_batch)
```
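
The example above holds out a single batch. A natural extension is to hold out each batch in turn; the sketch below shows one way to generate those splits with NumPy alone. The generator name `leave_one_batch_out` is hypothetical and not part of the package.

```python
import numpy as np

def leave_one_batch_out(X_with_batch, y):
    """Yield (held_out_batch, X_train, y_train, X_val, y_val) for each batch."""
    batches = X_with_batch[:, 0].astype(int)   # batch ids live in the first column
    for held_out in np.unique(batches):
        val_inds = batches == held_out
        yield (int(held_out),
               X_with_batch[~val_inds], y[~val_inds],
               X_with_batch[val_inds], y[val_inds])

## small random dataset, built the same way as in the example above
rng = np.random.default_rng(123)
n_samples, n_batches, n_features = 60, 3, 5
X = rng.integers(0, 1000, size=(n_samples, n_features))
y = rng.random(n_samples) > 0.5
batches = np.arange(n_samples) % n_batches     # guarantees every batch is present
X_with_batch = np.hstack((batches[:, np.newaxis], X))

splits = list(leave_one_batch_out(X_with_batch, y))
```

Inside each split, the classifier would be fit on `X_train, y_train` and evaluated with `predict_proba(X_val)`, exactly as in the single-batch example.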

<br>

Methods
-----------
<div class="codecell">
* <span style="color:#DD4814">fit</span>(X, y)
* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
* Fit the model according to the given training data.
* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
* <span style="color:#DD4814"><u>Parameters</u>:</span>
* <span style="color:#DD4814">X : {array-like, sparse matrix} of shape (n_samples, 1 + n_taxa)</span>
* Training samples, where `n_samples` is the number of samples and `n_taxa` is the number of taxa. The first column of X denotes the batch of each sample, as non-negative integers, while the remaining `n_taxa` columns contain the read counts of each taxon. DEBIAS-M also supports relative abundance inputs.
* <span style="color:#DD4814">y : array-like of shape (n_samples,)</span>
* Target vector relative to X.
* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
* <span style="color:#DD4814"><u>Returns</u>:</span>
* <span style="color:#DD4814">self</span>
* Fitted DEBIAS-M preprocessor and estimator
</div>
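
Batch identifiers often arrive as strings (study names, sequencing runs) rather than integers. The sketch below shows one way to build the `(n_samples, 1 + n_taxa)` input with integer batch ids in the first column, using `np.unique`. The helper `add_batch_column` is a hypothetical name and is not provided by the package.

```python
import numpy as np

def add_batch_column(X, batch_labels):
    """Prepend integer-encoded batch ids (0, 1, 2, ...) as the first column of X."""
    X = np.asarray(X)
    # np.unique sorts the labels; the inverse index is each sample's integer id
    labels, batch_ids = np.unique(np.asarray(batch_labels), return_inverse=True)
    batch_ids = batch_ids.reshape(-1)
    if batch_ids.shape[0] != X.shape[0]:
        raise ValueError("need exactly one batch label per sample")
    return np.hstack((batch_ids[:, np.newaxis], X)), labels

X = np.array([[10, 0, 5],
              [3, 7, 1],
              [8, 2, 2],
              [0, 4, 9]])
study = ["study_B", "study_A", "study_B", "study_C"]

X_with_batch, labels = add_batch_column(X, study)
```

The returned `labels` array maps each integer id back to its original name, which is useful when reporting per-batch results.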

<div class="codecell">
* <span style="color:#DD4814">transform</span>(X)
* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
* Apply DEBIAS-M processing to X.
* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
* <span style="color:#DD4814"><u>Parameters</u>:</span>
* <span style="color:#DD4814">X : {array-like, sparse matrix} of shape (n_samples, 1 + n_taxa)</span>
* Samples to be transformed; `n_samples` is the number of samples and `n_taxa` is the number of taxa. The first column of X denotes the batch of each sample, as non-negative integers, while the remaining `n_taxa` columns contain the read counts of each taxon. DEBIAS-M also supports relative abundance inputs.
* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
* <span style="color:#DD4814"><u>Returns</u>:</span>
* <span style="color:#DD4814">X_debias</span>
* Matrix of shape (n_samples, n_taxa) containing the relative abundances of X after bias correction
</div>

<div class="codecell">
* <span style="color:#DD4814">predict_proba</span>(X)
* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
* Calculate DEBIAS-M classification probability estimates; the returned estimates for all classes are ordered by the label of classes.
* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
* <span style="color:#DD4814"><u>Parameters</u>:</span>
* <span style="color:#DD4814">X : {array-like, sparse matrix} of shape (n_samples, 1 + n_taxa)</span>
* Samples to obtain predictions for; `n_samples` is the number of samples and `n_taxa` is the number of taxa. The first column of X denotes the batch of each sample, as non-negative integers, while the remaining `n_taxa` columns contain the read counts of each taxon. DEBIAS-M also supports relative abundance inputs.
* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
* <span style="color:#DD4814"><u>Returns</u>:</span>
* <span style="color:#DD4814">T</span> : array-like of shape (n_samples, n_classes)
* The probability of the sample for each class in the model
</div>
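
Since the returned columns are ordered by class label, the score for the positive class of a binary `y` sits in column 1. The sketch below uses a made-up probability array as a stand-in for the output of `predict_proba`, so the numbers are illustrative only.

```python
import numpy as np

## stand-in for the (n_samples, n_classes) array returned by predict_proba;
## these values are made up for illustration
proba = np.array([[0.9, 0.1],
                  [0.4, 0.6],
                  [0.2, 0.8],
                  [0.7, 0.3]])

positive_scores = proba[:, 1]        # column 1 = the larger class label (True / 1)
predicted = positive_scores > 0.5    # hard labels at a 0.5 threshold
```

The continuous `positive_scores` are what a ranking metric such as `roc_auc_score` expects; the thresholded `predicted` labels are only needed when a hard decision is required.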


<style>
.seealso {
background-color: #F6B302;
color: black;
border: 2px solid black;
margin: 20px;
padding: 20px;
}

l{color: darkgreen}
</style>


<div class="seealso">
**See also:**<br>
[__Adaptation DEBIAS-M Classifier Demo__](AdaptationDebiasMClassifier-demo.html)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
An example analysis using the Adaptation DEBIAS-M classifier<br>
[__DebiasMClassifierLogAdd__](DebiasMClassifierLogAdd.html)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Implementation of the log-additive DEBIAS-M classifier<br>
<br>
For more background on DEBIAS-M, refer to [our manuscript](https://www.biorxiv.org/content/10.1101/2024.02.09.579716v1).
</div>
