initial new docs draft

korem-lab · May 6, 2024 · commit 38712ec
Showing 595 changed files with 204,554 additions and 0 deletions.
diff --git a/.DS_Store b/.DS_Store
diff --git a/AdaptationDebiasMClassifier.Rmd b/AdaptationDebiasMClassifier.Rmd
@@ -0,0 +1,233 @@
+---
+title: 
+output: 
+    html_document:
+       theme: united
+
+---
+
+<style>
+.darkgreen {
+  background-color: #577836;
+  color: white;
+  border: 2px solid black;
+  margin: 20px;
+  padding: 20px;
+} 
+</style>
+
+<style>
+    body { 
+            background-color: #16153C;
+            text-color: whitesmoke;
+            color: whitesmoke;
+            font-family: Palatino;
+            font-size: 12pt;
+            margin: 10px;
+            padding: 0px;
+            }
+</style>
+            
+```{css, echo=FALSE}
+.bashform {
+  background-color: #DD4814;
+  border: 3px solid #987CAC;
+  font-weight: bold;
+  color: whitesmoke; 
+}
+```
+
+```{r klippy, echo=FALSE, include=TRUE}
+klippy::klippy(c('r', 'python', 'bash'), 
+               position = c("top", "right"),
+               color='brown',
+               tooltip_message = "Copy",
+               tooltip_success = "Copied!"
+               )
+```
+
+<style>
+.citation {
+  background-color: #A1DDF2;
+  color: black;
+  border: 3px solid #987CAC;
+  margin: 20px;
+  padding: 20px;
+} 
+</style>
+
+<br><br>
+<center> <h1> debiasm.<span style="color:#DD4814">**AdaptationDebiasMClassifier**</span></h1> </center>
+<br>
+<style>
+.codecell {
+  background-color: #A1DDF2;
+  color: black;
+  border: 3px solid #987CAC;
+  margin: 10px;
+  padding: 10px;
+  font-weight: bold;
+  }
+  li {
+    list-style-type: none
+  }
+</style>
+<div class="codecell">
+*class* debiasm.<span style="color:#DD4814">**AdaptationDebiasMClassifier**</span>(batch_str = 'infer',<br>
+  &emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp;                learning_rate=0.005, <br>
+  &emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp;                min_epochs=25,<br>
+  &emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp;                l2_strength=0,<br>
+  &emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp;                w_l2=0,<br>
+  &emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp;                random_state=None,<br>
+  &emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp; &emsp;&emsp;&emsp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp;&emsp;&emsp;&ensp;                )
+</div>
+
+<br>
+The Adaptation DEBIAS-M Classifier. <br> <br>
+This class is developed to allow a trained DEBIAS-M model to make predictions on samples from batches that are unobserved during training. This is done by running an adaptation step to infer the biases on previously unobserved data when running the `transform` and `predict_proba` methods.<br><br>
+Like the other DEBIAS-M classes, this class implements multiplicative DEBIAS-M bias correction on an aggregated microbiome read count matrix `X` of shape (n_samples, n_taxa), pooled across multiple batches, along with binary `y` labels. It can handle both read count and relative abundance inputs. <br> <br>
+The `batch_str` parameter sets the strength of the enforced cross-batch similarity, `l2_strength` sets the l2 regularization of the predictive parameters, and `w_l2` sets the l2 regularization of the bias-correction parameters. The constructor takes no `x_val` argument: held-out microbiome inputs without `y` labels are not needed during fitting, since the biases of unobserved batches are inferred when `transform` or `predict_proba` is called.
+
+<br>
+
+Parameters
+-----------
+<div class="codecell">
+* <span style="color:#DD4814">batch_str: {'infer' or float}, default='infer'</span>
+   *  The weight of the enforced cross-batch similarity. Selecting '<span style="color:#DD4814">infer</span>' automatically sets the weight to be inversely proportional to both the number of batch pairs and the number of taxa in the input matrix. Larger values specify stronger regularization.
+* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
+* <span style="color:#DD4814">learning_rate: float, default=0.005</span>
+   *  The learning rate used while training the DEBIAS-M model.
+* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
+* <span style="color:#DD4814">min_epochs: int, default=25</span>
+   *  The minimum number of epochs completed during training.
+* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
+* <span style="color:#DD4814">l2_strength: float, default=0</span>
+   *  The l2 regularization of the linear predictive layer's parameters. Larger values specify stronger regularization.
+* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
+* <span style="color:#DD4814">w_l2: float, default=0</span>
+   *  The l2 regularization of the multiplicative bias correction parameters (applied to the logarithm of the multiplicative parameters). Larger values specify stronger regularization.
+* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
+* <span style="color:#DD4814">random_state: int, default=None</span>
+   *  The random seed used during training, if specified.
+</div>
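
As a rough illustration of the `'infer'` rule described above, the sketch below computes a weight that shrinks with the number of batch pairs and the number of taxa. The helper name `infer_batch_str` and the `scale` constant are hypothetical: the documentation states only the proportionality, so the exact constant used by the library is an assumption here.

```python
from math import comb


def infer_batch_str(n_batches, n_taxa, scale=1.0):
    """Hypothetical helper: a weight inversely proportional to the number
    of batch pairs and to the number of taxa (`scale` is an assumed constant)."""
    n_pairs = comb(n_batches, 2)  # number of unordered pairs of batches
    if n_pairs == 0 or n_taxa <= 0:
        raise ValueError("need at least two batches and one taxon")
    return scale / (n_pairs * n_taxa)


# 5 batches give 10 batch pairs; with 100 taxa the weight is scale / 1000
weight_small = infer_batch_str(n_batches=5, n_taxa=100)
# doubling the number of taxa halves the weight
weight_large = infer_batch_str(n_batches=5, n_taxa=200)
print(weight_small, weight_large)
```

Passing an explicit float as `batch_str` bypasses any such rule and fixes the weight directly.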
+
+<br><br>
+
+Example
+-----------
+```python
+## import packages
+import numpy as np
+from sklearn.metrics import roc_auc_score
+from debiasm import AdaptationDebiasMClassifier
+
+## generate data for the example
+np.random.seed(123)
+n_samples = 96*5
+n_batches = 5
+n_features = 100
+
+## the read count matrix
+X = ( np.random.rand(n_samples, n_features) * 1000 ).astype(int)
+
+## the labels
+y = np.random.rand(n_samples)>0.5
+
+## the batches
+batches = ( np.random.rand(n_samples) * n_batches ).astype(int)
+
+## we assume the batches are numbered ints starting at '0',
+## and they are in the first column of the input X matrices
+X_with_batch = np.hstack((batches[:, np.newaxis], X))
+## set the validation batch to '4'
+val_inds = batches==4
+X_train, X_val = X_with_batch[~val_inds], X_with_batch[val_inds]
+y_train, y_val = y[~val_inds], y[val_inds]
+
+### Run DEBIAS-M, using standard sklearn object methods
+admc = AdaptationDebiasMClassifier() ## no held-out inputs are needed here; biases for
+                                    ## unseen batches are inferred at prediction time
+admc.fit(X_train, y_train)
+
+## Assess results
+### should be ~0.5 in this example, since the data is all random
+roc_auc_score(y_val, admc.predict_proba(X_val)[:, 1]) 
+
+## extract the 'DEBIAS-ed' data for other downstream analyses, if applicable 
+X_debiased = admc.transform(X_with_batch)
+```
+
+<br>
+
+Methods
+-----------
+<div class="codecell">
+* <span style="color:#DD4814">fit</span>(X, y)
+* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
+   *  Fit the model according to the given training data.
+   * <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
+      * <span style="color:#DD4814"><u>Parameters</u>:</span>
+        * <span style="color:#DD4814">X : {array-like, sparse matrix} of shape (n_samples, 1 + n_taxa)</span>
+            * Training samples, where `n_samples` is the number of samples and `n_taxa` is the number of taxa. The first column of X denotes the batch of each sample, as non-negative integers, while the remaining `n_taxa` columns hold the read counts of each taxon. DEBIAS-M also supports relative abundance inputs.
+        * <span style="color:#DD4814">y : array-like of shape (n_samples,)</span>
+            * Target vector relative to X.
+      * <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
+      * <span style="color:#DD4814"><u>Returns</u>:</span>
+        * <span style="color:#DD4814">self</span>
+            * Fitted DEBIAS-M preprocessor and estimator
+</div>
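
The expected layout of `X` (batch labels in the first column, taxa in the remaining columns) can be checked before calling `fit`. The following is a minimal sketch for dense inputs only; `split_batches_and_counts` is a hypothetical helper written for this illustration and is not part of the `debiasm` package.

```python
import numpy as np


def split_batches_and_counts(X):
    """Hypothetical helper: split an (n_samples, 1 + n_taxa) array into
    integer batch labels and the (n_samples, n_taxa) count matrix."""
    X = np.asarray(X)
    if X.ndim != 2 or X.shape[1] < 2:
        raise ValueError("X must have shape (n_samples, 1 + n_taxa)")
    batch_col = X[:, 0]
    # batch labels must be non-negative integers
    if np.any(batch_col < 0) or np.any(batch_col != np.round(batch_col)):
        raise ValueError("first column of X must hold non-negative integer batch labels")
    return batch_col.astype(int), X[:, 1:]


# three samples, two batches, three taxa
X_small = np.array([[0, 10, 5, 0],
                    [1,  3, 7, 2],
                    [1,  0, 4, 9]])
batch_labels, counts = split_batches_and_counts(X_small)
print(batch_labels)   # batch label of each sample
print(counts.shape)   # (n_samples, n_taxa)
```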
+
+<div class="codecell">
+* <span style="color:#DD4814">transform</span>(X)
+* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
+   *  Apply DEBIAS-M processing to X.
+   * <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
+      * <span style="color:#DD4814"><u>Parameters</u>:</span>
+        * <span style="color:#DD4814">X : {array-like, sparse matrix} of shape (n_samples, 1 + n_taxa)</span>
+            * Samples to be transformed; `n_samples` is the number of samples and `n_taxa` is the number of taxa. The first column of X denotes the batch of each sample, as non-negative integers, while the remaining `n_taxa` columns hold the read counts of each taxon. DEBIAS-M also supports relative abundance inputs.
+      * <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
+      * <span style="color:#DD4814"><u>Returns</u>:</span>
+        * <span style="color:#DD4814">X_debias</span>
+            * Array of shape (n_samples, n_taxa) holding the relative abundances of X after bias correction
+</div>
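
To make the idea of a multiplicative correction concrete, here is a small NumPy sketch. It assumes one factor per (batch, taxon) pair, stored on the log scale, and renormalizes each sample to relative abundances afterwards. The function `apply_multiplicative_correction` and the hand-picked `log_factors` are hypothetical: in DEBIAS-M the factors are learned during training (or inferred during adaptation), and this is not the package's actual implementation.

```python
import numpy as np


def apply_multiplicative_correction(X, log_factors):
    """Hypothetical sketch: scale each taxon by a per-batch factor, then
    renormalize every sample so its abundances sum to one."""
    X = np.asarray(X, dtype=float)
    batches = X[:, 0].astype(int)   # first column: batch labels
    counts = X[:, 1:]               # remaining columns: taxa
    # look up each sample's factors by batch and apply them multiplicatively
    scaled = counts * np.exp(log_factors[batches])
    totals = scaled.sum(axis=1, keepdims=True)
    # guard against all-zero samples to avoid division by zero
    return scaled / np.where(totals == 0, 1.0, totals)


# two samples from batches 0 and 1, two taxa
X_demo = np.array([[0, 10, 30],
                   [1, 20, 20]])
# assumed factors: batch 0 is left unchanged, batch 1 triples the first taxon
log_factors = np.log(np.array([[1.0, 1.0],
                               [3.0, 1.0]]))
X_rel = apply_multiplicative_correction(X_demo, log_factors)
print(X_rel)
```

Because the output is renormalized per sample, only the ratios between a batch's factors matter in this sketch.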
+
+<div class="codecell">
+* <span style="color:#DD4814">predict_proba</span>(X)
+* <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
+   *  Calculate DEBIAS-M classification probability estimates; the returned estimates for all classes are ordered by the label of classes.
+   * <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
+      * <span style="color:#DD4814"><u>Parameters</u>:</span>
+        * <span style="color:#DD4814">X : {array-like, sparse matrix} of shape (n_samples, 1 + n_taxa)</span>
+            * Samples to obtain predictions for; `n_samples` is the number of samples and `n_taxa` is the number of taxa. The first column of X denotes the batch of each sample, as non-negative integers, while the remaining `n_taxa` columns hold the read counts of each taxon. DEBIAS-M also supports relative abundance inputs.
+      * <hr style="height: 2.5px; border: 1px solid #987CAC; background-color: #987CAC; margin: 2px">
+      * <span style="color:#DD4814"><u>Returns</u>:</span>
+        * <span style="color:#DD4814">T</span> : array-like of shape (n_samples, n_classes)
+            * The probability of the sample for each class in the model
+</div>
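
The shape and column order of the `predict_proba` output can be illustrated with a toy binary model. The sketch below assumes a linear layer followed by a sigmoid, applied to already-corrected relative abundances; `linear_predict_proba` and its weights are hypothetical and only mirror the (n_samples, n_classes) layout described above, not the trained DEBIAS-M model.

```python
import numpy as np


def linear_predict_proba(X_rel, weights, intercept=0.0):
    """Hypothetical sketch: binary class probabilities from a linear layer.
    Column 0 is the negative class, column 1 the positive class."""
    logits = X_rel @ weights + intercept
    p_pos = 1.0 / (1.0 + np.exp(-logits))   # sigmoid
    return np.column_stack((1.0 - p_pos, p_pos))


# two samples with two taxa each, as relative abundances
X_corrected = np.array([[0.25, 0.75],
                        [0.75, 0.25]])
toy_weights = np.array([2.0, -2.0])          # assumed, not learned
proba = linear_predict_proba(X_corrected, toy_weights)
print(proba.shape)   # (n_samples, n_classes)
print(proba[:, 1])   # positive-class scores, as passed to roc_auc_score
```

With classes ordered by label, the positive-class probability for a binary problem sits in the second column, which is why the example above indexes `[:, 1]`.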
+
+
+<style>
+.seealso {
+  background-color: #F6B302;
+  color: black;
+  border: 2px solid black;
+  margin: 20px;
+  padding: 20px;
+} 
+
+l{color: darkgreen}
+</style>
+
+
+<div class="seealso">
+**See also:**<br>
+[__Adaptation DEBIAS-M Classifier Demo__](AdaptationDebiasMClassifier-demo.html)<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
+An example analysis using the Adaptation DEBIAS-M classifier<br>
+[__DebiasMClassifierLogAdd__](DebiasMClassifierLogAdd.html)<br>
+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
+A log-additive variant of the DEBIAS-M classifier<br>
+<br>
+For more background on DEBIAS-M, refer to [our manuscript](https://www.biorxiv.org/content/10.1101/2024.02.09.579716v1).
+