documentation updates
kishwarshafin committed Mar 16, 2022
1 parent 773fceb commit e77f54f
Showing 10 changed files with 23 additions and 16 deletions.
@@ -34,20 +34,20 @@ run_pepper_margin_deepvariant call_variant \

At all three coverages (30x, 60x, 85x), PEPPER-Margin-DeepVariant r0.8 achieves higher SNP and INDEL F1-scores than r0.7:
<p align="center">
<img src="img/pepper_r8_ont_HG003_wgs.png" alt="PEPPER performance whole genome">
<img src="img/pmdv_r8_ont_HG003_wgs.png" alt="PEPPER performance whole genome">
</p>

##### HG003 30x performance:
<p align="center">
<table><thead><tr><th>Sample</th><th>Version</th><th>Type</th><th>Truth<br>total</th><th>True<br>positives</th><th>False<br>negatives</th><th>False<br>positives</th><th>Recall</th><th>Precision</th><th>F1-Score</th></tr></thead><tbody><tr><td rowspan="4">HG003 30x</td><td rowspan="2">r0.7</td><td>INDEL</td><td>504501</td><td>317621</td><td>186880</td><td>35084</td><td>0.629575</td><td>0.902714</td><td>0.7418</td></tr><tr><td>SNP</td><td>3327495</td><td>3310002</td><td>17493</td><td>11986</td><td>0.994743</td><td>0.996393</td><td>0.995567</td></tr><tr><td rowspan="2">r0.8</td><td>INDEL</td><td>504501</td><td>337206</td><td>167295</td><td>53674</td><td>0.668395</td><td>0.865863</td><td>0.754422</td></tr><tr><td>SNP</td><td>3327495</td><td>3313043</td><td>14452</td><td>12451</td><td>0.995657</td><td>0.996257</td><td>0.995957</td></tr></tbody></table>
<table><thead><tr><th>Sample</th><th>Version</th><th>Type</th><th>Truth<br>total</th><th>True<br>positives</th><th>False<br>negatives</th><th>False<br>positives</th><th>Recall</th><th>Precision</th><th>F1-Score</th></tr></thead><tbody><tr><td rowspan="4">HG003 30x</td><td rowspan="2">r0.7</td><td>INDEL</td><td>504501</td><td>317621</td><td>186880</td><td>35084</td><td>0.629575</td><td>0.902714</td><td>0.7418</td></tr><tr><td>SNP</td><td>3327495</td><td>3310002</td><td>17493</td><td>11986</td><td>0.994743</td><td>0.996393</td><td>0.995567</td></tr><tr><td rowspan="2">r0.8</td><td>INDEL</td><td>504501</td><td>345384</td><td>159117</td><td>51842</td><td>0.684605</td><td>0.872481</td><td>0.767209</td></tr><tr><td>SNP</td><td>3327495</td><td>3309038</td><td>18457</td><td>9173</td><td>0.994453</td><td>0.997236</td><td>0.995843</td></tr></tbody></table>
</p>

##### HG003 60x performance:
<p align="center">
<table><thead><tr><th>Sample</th><th>Version</th><th>Type</th><th>Truth<br>total</th><th>True<br>positives</th><th>False<br>negatives</th><th>False<br>positives</th><th>Recall</th><th>Precision</th><th>F1-Score</th></tr></thead><tbody><tr><td rowspan="4">HG003 60x</td><td rowspan="2">r0.7</td><td>INDEL</td><td>504501</td><td>366144</td><td>138357</td><td>33484</td><td>0.725755</td><td>0.91827</td><td>0.810741</td></tr><tr><td>SNP</td><td>3327495</td><td>3317492</td><td>10003</td><td>8548</td><td>0.996994</td><td>0.99743</td><td>0.997212</td></tr><tr><td rowspan="2">r0.8</td><td>INDEL</td><td>504501</td><td>390595</td><td>113906</td><td>47118</td><td>0.77422</td><td>0.895066</td><td>0.830269</td></tr><tr><td>SNP</td><td>3327495</td><td>3318785</td><td>8710</td><td>9212</td><td>0.997382</td><td>0.997233</td><td>0.997308</td></tr></tbody></table>
<table><thead><tr><th>Sample</th><th>Version</th><th>Type</th><th>Truth<br>total</th><th>True<br>positives</th><th>False<br>negatives</th><th>False<br>positives</th><th>Recall</th><th>Precision</th><th>F1-Score</th></tr></thead><tbody><tr><td rowspan="4">HG003 60x</td><td rowspan="2">r0.7</td><td>INDEL</td><td>504501</td><td>366144</td><td>138357</td><td>33484</td><td>0.725755</td><td>0.91827</td><td>0.810741</td></tr><tr><td>SNP</td><td>3327495</td><td>3317492</td><td>10003</td><td>8548</td><td>0.996994</td><td>0.99743</td><td>0.997212</td></tr><tr><td rowspan="2">r0.8</td><td>INDEL</td><td>504501</td><td>394987</td><td>109514</td><td>44678</td><td>0.782926</td><td>0.90091</td><td>0.837785</td></tr><tr><td>SNP</td><td>3327495</td><td>3317515</td><td>9980</td><td>7120</td><td>0.997001</td><td>0.997859</td><td>0.99743</td></tr></tbody></table>
</p>

##### HG003 85x performance:
<p align="center">
<table><thead><tr><th>Sample</th><th>Version</th><th>Type</th><th>Truth<br>total</th><th>True<br>positives</th><th>False<br>negatives</th><th>False<br>positives</th><th>Recall</th><th>Precision</th><th>F1-Score</th></tr></thead><tbody><tr><td rowspan="4">HG003 85x</td><td rowspan="2">r0.7</td><td>INDEL</td><td>504501</td><td>383384</td><td>121117</td><td>30595</td><td>0.759927</td><td>0.927982</td><td>0.835588</td></tr><tr><td>SNP</td><td>3327495</td><td>3318437</td><td>9058</td><td>8032</td><td>0.997278</td><td>0.997586</td><td>0.997432</td></tr><tr><td rowspan="2">r0.8</td><td>INDEL</td><td>504501</td><td>409096</td><td>95405</td><td>40539</td><td>0.810892</td><td>0.912201</td><td>0.858568</td></tr><tr><td>SNP</td><td>3327495</td><td>3319475</td><td>8020</td><td>8449</td><td>0.99759</td><td>0.997462</td><td>0.997526</td></tr></tbody></table>
<table><thead><tr><th>Sample</th><th>Version</th><th>Type</th><th>Truth<br>total</th><th>True<br>positives</th><th>False<br>negatives</th><th>False<br>positives</th><th>Recall</th><th>Precision</th><th>F1-Score</th></tr></thead><tbody><tr><td rowspan="4">HG003 85x</td><td rowspan="2">r0.7</td><td>INDEL</td><td>504501</td><td>383384</td><td>121117</td><td>30595</td><td>0.759927</td><td>0.927982</td><td>0.835588</td></tr><tr><td>SNP</td><td>3327495</td><td>3318437</td><td>9058</td><td>8032</td><td>0.997278</td><td>0.997586</td><td>0.997432</td></tr><tr><td rowspan="2">r0.8</td><td>INDEL</td><td>504501</td><td>412169</td><td>92332</td><td>38633</td><td>0.816984</td><td>0.91651</td><td>0.86389</td></tr><tr><td>SNP</td><td>3327495</td><td>3318308</td><td>9187</td><td>6733</td><td>0.997239</td><td>0.997976</td><td>0.997607</td></tr></tbody></table>
</p>
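The Recall and F1-Score columns in these tables follow the standard definitions: Recall = TP / (TP + FN), where TP + FN equals the truth total, and F1 = 2 * Precision * Recall / (Precision + Recall). The Python sketch below recomputes both for the r0.8 rows of the updated HG003 30x table. Precision is taken as reported rather than recomputed: assuming these tables were produced with hap.py, its precision uses query-side TP counts that are not shown here, so TP / (TP + FP) computed from these columns does not reproduce the reported INDEL precision.

```python
# Recompute Recall and F1-Score for the r0.8 rows of the updated HG003 30x table.


def recall(tp: int, fn: int) -> float:
    """Fraction of truth variants recovered: TP / (TP + FN)."""
    return tp / (tp + fn)


def f1_score(precision: float, recall_value: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall_value / (precision + recall_value)


# (type, truth total, TP, FN, precision as reported, F1 as reported),
# copied from the r0.8 rows of the updated HG003 30x table.
ROWS = [
    ("INDEL", 504501, 345384, 159117, 0.872481, 0.767209),
    ("SNP", 3327495, 3309038, 18457, 0.997236, 0.995843),
]

for var_type, truth_total, tp, fn, precision, reported_f1 in ROWS:
    # Every truth variant is counted either as a TP or as an FN.
    assert tp + fn == truth_total
    r = recall(tp, fn)
    f1 = f1_score(precision, r)
    print(f"{var_type}: recall={r:.6f} F1={f1:.6f} (reported F1={reported_f1})")
```

Because F1 is a harmonic mean, it stays close to the weaker of the two metrics, which is why the INDEL F1 tracks recall more than precision in these rows.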
Binary file not shown.
4 changes: 2 additions & 2 deletions docs/pipeline_docker/ONT_variant_calling.md
@@ -150,8 +150,8 @@ ${OUTPUT_DIR}/${OUTPUT_VCF} \

| Type | Truth<br>total | True<br>positives | False<br>negatives | False<br>positives | Recall | Precision | F1-Score |
|:-----:|:--------------:|:-----------------:|:------------------:|:------------------:|:--------:|:---------:|:--------:|
| INDEL | 11256 | 8906 | 2350 | 904 | 0.791222 | 0.909942 | 0.846440 |
| SNP | 71333 | 71257 | 67 | 98 | 0.999061 | 0.998627 | 0.998844 |
| INDEL | 11256 | 8981 | 2275 | 837 | 0.797886 | 0.916692 | 0.853172 |
| SNP | 71333 | 71257 | 94 | 68 | 0.998682 | 0.999047 | 0.998864 |

### Authors:
This pipeline is developed in a collaboration between the UCSC Genomics Institute and the genomics team at Google Health.
4 changes: 2 additions & 2 deletions docs/pipeline_docker/ONT_variant_calling_r10_q20.md
@@ -152,8 +152,8 @@ ${OUTPUT_DIR}/${OUTPUT_VCF} \

| Type | Truth<br>total | True<br>positives | False<br>negatives | False<br>positives | Recall | Precision | F1-Score |
|:-----:|:--------------:|:-----------------:|:------------------:|:------------------:|:---------:|:---------:|:--------:|
| INDEL | 11256 | 9442 | 1779 | 821 | 0.841951 | 0.921973 | 0.880147 |
| SNP | 71333 | 71288 | 56 | 56 | 0.999215 | 0.999215 | 0.999215 |
| INDEL | 11256 | 9442 | 1724 | 774 | 0.846837 | 0.926517 | 0.884887 |
| SNP | 71333 | 71288 | 60 | 51 | 0.999159 | 0.999285 | 0.999222 |

### Authors:
This pipeline is developed in a collaboration between the UCSC Genomics Institute and the genomics team at Google Health.
4 changes: 2 additions & 2 deletions docs/pipeline_docker_gpu/ONT_variant_calling_gpu.md
@@ -193,8 +193,8 @@ ${OUTPUT_DIR}/${OUTPUT_VCF} \

| Type | Truth<br>total | True<br>positives | False<br>negatives | False<br>positives | Recall | Precision | F1-Score |
|:-----:|:--------------:|:-----------------:|:------------------:|:------------------:|:--------:|:---------:|:--------:|
| INDEL | 11256 | 8906 | 2350 | 904 | 0.791222 | 0.909942 | 0.846440 |
| SNP | 71333 | 71257 | 67 | 98 | 0.999061 | 0.998627 | 0.998844 |
| INDEL | 11256 | 8981 | 2275 | 837 | 0.797886 | 0.916692 | 0.853172 |
| SNP | 71333 | 71257 | 94 | 68 | 0.998682 | 0.999047 | 0.998864 |

### Authors:
This pipeline is developed in a collaboration between the UCSC Genomics Institute and the genomics team at Google Health.
4 changes: 2 additions & 2 deletions docs/pipeline_docker_gpu/ONT_variant_calling_r10_q20_gpu.md
@@ -193,8 +193,8 @@ ${OUTPUT_DIR}/${OUTPUT_VCF} \

| Type | Truth<br>total | True<br>positives | False<br>negatives | False<br>positives | Recall | Precision | F1-Score |
|:-----:|:--------------:|:-----------------:|:------------------:|:------------------:|:---------:|:---------:|:--------:|
| INDEL | 11256 | 9442 | 1779 | 821 | 0.841951 | 0.921973 | 0.880147 |
| SNP | 71333 | 71288 | 56 | 56 | 0.999215 | 0.999215 | 0.999215 |
| INDEL | 11256 | 9442 | 1724 | 774 | 0.846837 | 0.926517 | 0.884887 |
| SNP | 71333 | 71288 | 60 | 51 | 0.999159 | 0.999285 | 0.999222 |

### Authors:
This pipeline is developed in a collaboration between the UCSC Genomics Institute and the genomics team at Google Health.
4 changes: 2 additions & 2 deletions docs/pipeline_singularity/ONT_variant_calling_singularity.md
@@ -148,8 +148,8 @@ ${OUTPUT_DIR}/${OUTPUT_VCF} \

| Type | Truth<br>total | True<br>positives | False<br>negatives | False<br>positives | Recall | Precision | F1-Score |
|:-----:|:--------------:|:-----------------:|:------------------:|:------------------:|:--------:|:---------:|:--------:|
| INDEL | 11256 | 8906 | 2350 | 904 | 0.791222 | 0.909942 | 0.846440 |
| SNP | 71333 | 71257 | 67 | 98 | 0.999061 | 0.998627 | 0.998844 |
| INDEL | 11256 | 8981 | 2275 | 837 | 0.797886 | 0.916692 | 0.853172 |
| SNP | 71333 | 71257 | 94 | 68 | 0.998682 | 0.999047 | 0.998864 |

### Authors:
This pipeline is developed in a collaboration between the UCSC Genomics Institute and the genomics team at Google Health.
@@ -149,8 +149,8 @@ ${OUTPUT_DIR}/${OUTPUT_VCF} \

| Type | Truth<br>total | True<br>positives | False<br>negatives | False<br>positives | Recall | Precision | F1-Score |
|:-----:|:--------------:|:-----------------:|:------------------:|:------------------:|:---------:|:---------:|:--------:|
| INDEL | 11256 | 9442 | 1779 | 821 | 0.841951 | 0.921973 | 0.880147 |
| SNP | 71333 | 71288 | 56 | 56 | 0.999215 | 0.999215 | 0.999215 |
| INDEL | 11256 | 9442 | 1724 | 774 | 0.846837 | 0.926517 | 0.884887 |
| SNP | 71333 | 71288 | 60 | 51 | 0.999159 | 0.999285 | 0.999222 |

### Authors:
This pipeline is developed in a collaboration between the UCSC Genomics Institute and the genomics team at Google Health.
@@ -6,6 +6,13 @@ In this walkthrough, we will see how to train `DeepVariant` and replace default

**NOTE:** This tutorial covers the basic steps required to train DeepVariant; many more options are available to ease the training process. [This documentation](https://github.com/google/deepvariant/blob/r1.3/docs/deepvariant-training-case-study.md) explains the training process in much more detail. If you have followed the official tutorial, you can use the `make_examples` stage from this tutorial, and the remaining steps should be exactly the same.

## Setup
In this setup, we train a single model with `--alt_aligned_pileup "diff_channels"` that calls both SNPs and INDELs. Alternatively, you can train a `rows` model for INDEL calling and a `none` model for SNP calling.

To use this two-model setup, repeat the DeepVariant training step once with `--alt_aligned_pileup "none"` (SNP model) and once with `--alt_aligned_pileup "rows"` (INDEL model).
Once you have both models, use `--dv_model_snp <model_path>` and `--dv_model_indel <model_path>` to specify the SNP and INDEL model paths
independently. You can also set the alt-aligned pileup for each model with `--dv_alt_aligned_pileup_snp` and `--dv_alt_aligned_pileup_indel`; each accepts `diff_channels`, `none`, or `rows`.
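In the two-model setup, the SNP and INDEL models are passed separately, and each per-type pileup flag should presumably match the `--alt_aligned_pileup` value that model was trained with (`none` for SNPs, `rows` for INDELs). The Python sketch below only assembles the argument list to show that pairing. `two_model_command` is a hypothetical helper, the checkpoint paths are placeholders, and your usual `call_variant` arguments (inputs, reference, output) go in `base_args`, since this sketch does not reproduce them.

```python
import shlex
from typing import List, Sequence

# The three values listed for --dv_alt_aligned_pileup_snp / --dv_alt_aligned_pileup_indel.
PILEUP_CHOICES = ("diff_channels", "none", "rows")


def two_model_command(
    base_args: Sequence[str],
    snp_model: str,
    indel_model: str,
    snp_pileup: str = "none",
    indel_pileup: str = "rows",
) -> List[str]:
    """Build a call_variant argv with separate SNP and INDEL DeepVariant models.

    base_args holds the arguments of your usual single-model run and is passed
    through unchanged. The pileup defaults mirror the two-model setup (SNP model
    trained with "none", INDEL model trained with "rows"); matching them to the
    training setting is an assumption of this sketch.
    """
    for value in (snp_pileup, indel_pileup):
        if value not in PILEUP_CHOICES:
            raise ValueError(f"unsupported alt_aligned_pileup value: {value!r}")
    return [
        "run_pepper_margin_deepvariant", "call_variant",
        *base_args,
        "--dv_model_snp", snp_model,
        "--dv_model_indel", indel_model,
        "--dv_alt_aligned_pileup_snp", snp_pileup,
        "--dv_alt_aligned_pileup_indel", indel_pileup,
    ]


if __name__ == "__main__":
    # Hypothetical checkpoint paths; put your usual call_variant arguments
    # (inputs, reference, output) in place of the empty list.
    argv = two_model_command(
        [],
        snp_model="models/snp_none/model.ckpt",
        indel_model="models/indel_rows/model.ckpt",
    )
    print(shlex.join(argv))
```

Validating the pileup values up front catches a typo before a long variant-calling run starts; everything else is passed through untouched so the sketch does not guess at the rest of the command line.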

## Training DeepVariant
We will now train the `DeepVariant` model that genotypes the candidates proposed by `PEPPER`.
