README.Rmd
32 additions & 8 deletions
@@ -19,7 +19,7 @@ knitr::opts_chunk$set(
 
 ## Overview
 
-[The Covid-19 Genotyping Tool](https://hsmaan.shinyapps.io/CovidGenotyper/) (CGT) is an R-Shiny based web application that allows researchers to upload fasta sequences of Covid-19 viral genomes and compare with public sequence data available on [GISAID](https://www.gisaid.org/). Genomic distance is visualized using manifold projection and network analysis, and genotype information with respective to high-prevalence SNPs is determined.
+[The Covid-19 Genotyping Tool](covidgenotyper.app) (CGT) is an R-Shiny based web application that allows researchers to upload fasta sequences of Covid-19 viral genomes and compare with public sequence data available on [GISAID](https://www.gisaid.org/). Genomic distance is visualized using manifold projection and network analysis, and genotype information with respective to high-prevalence SNPs is determined.
 
 ## Details and methodology
 
@@ -29,11 +29,11 @@ The CGT application was developed using the `shiny` R package and framework. Vis
 
 #### Sequence and metadata retrieval
 
-Processed fasta files of Covid-19 viral genome sequence are retrieved from the [GISAID](https://www.gisaid.org/) EpiCoV database, which is a public database for sharing of viral genome sequence data. Metadata for GISAID viral genomes are obtained from [nextstrain's ncov build](https://github.com/nextstrain/ncov/blob/master/data/metadata.tsv). Viral genome data and metadata are updated on a weekly basis.
+Processed fasta files and metadata of Covid-19 viral genome sequence are retrieved from the [GISAID](https://www.gisaid.org/) EpiCoV database, which is a public database for sharing of viral genome sequence data. Viral genome data and metadata are updated on a weekly basis.
 
 #### Genome sequence alignment
 
-GISAID sequences are subset for those that have metadata from nextstrain. Public sequencing data is pre-aligned before being uploaded to the server. Fasta sequences are read and written using the `Biostrings` package. Gap removal and multiple-sequence alignment is performed using `DECIPHER`. Post alignment processing is done using `ape`. User uploaded fasta sequences are processed similarly, with the exception of complete alignment - the user sequence is aligned to the pre-aligned public data profile using `AlignProfiles` from `DECIPHER`.
+GISAID sequences are subset for those that have corresponding metadata. Public sequencing data is pre-aligned before being uploaded to the server. Fasta sequences are read and written using the `Biostrings` package. Gap removal and multiple-sequence alignment is performed using `DECIPHER`. Post alignment processing is done using `ape`. User uploaded fasta sequences are processed similarly, with the exception of complete alignment - the user sequence is aligned to the pre-aligned public data profile using `AlignProfiles` from `DECIPHER`.
 
 #### DNA distance
 
@@ -66,7 +66,7 @@ Genotype profiles of viral genomes are determined using high prevalence non-syno
 
 CGT can also be installed locally. Application deployment has currently only been tested on Linux systems including Ubuntu 18.04 LTS and Debian 9.0 LTS, thus we only provide installation instructions for Debian/Ubuntu systems.
-CGT relies on pre-processing plot data prior to deployment to ensure visualizations can be loaded quickly. Fasta sequences should be downloaded from [GISAID's EpiCoV database](https://www.gisaid.org/) and saved as `gisaid_cov2020_sequences_[mmm_dd].fasta` in the `data` folder. Metadata from [nextstrain's ncov repository](https://github.com/nextstrain/ncov) should be saved as `gisaid_metadata_[mmm_dd].tsv`, also in the `data` folder.
+CGT relies on pre-processing plot data prior to deployment to ensure visualizations can be loaded quickly. Fasta sequences should be downloaded from [GISAID's EpiCoV database](https://www.gisaid.org/) and saved as `gisaid_cov2020_sequences_[mmm_dd].fasta` in the `data` folder. Metadata from GISAID should be saved as `gisaid_metadata_[mmm_dd].tsv`, also in the `data` folder.
 
 The order for processing scripts is the following:
 Now that the shiny application dependencies have been installed and data has been preloaded, the shiny app can be deployed in a variety of ways, documented [here](https://shiny.rstudio.com/deploy/).
 
@@ -146,6 +146,17 @@ docker run --rm cgt/app
 * ggplot2 v3.3.0
 * ggnetwork v0.5.8
 * plotly v4.9.2.1
+* Cairo v1.5.11
+* intergraph v2.0.2
+* tidyverse v1.3.0
+* data.table v1.12.8
+* stringr v1.4.0
+* reshape2 v1.4.3
+* dplyr v0.8.5
+* parallel v3.6.3
+* ggthemes v4.2.0
+* RColorBrewer v1.1.2
+* GenomicRanges v1.38.0
 
 #### Command-line tools
 
@@ -155,7 +166,6 @@ docker run --rm cgt/app
 
 ## References
 
-* Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, et al. NextStrain: Real-time tracking of pathogen evolution. Bioinformatics. 2018;34(23):4121–3.
 * Elbe S, Buckland-Merrett G. Data, disease and diplomacy: GISAID’s innovative contribution to global health. Glob Challenges. 2017;1(1):33–46.
 * Chang W, Cheng J, Allaire JJ, Xie Y, McPherson J. Shiny: Web Application Framework for R. R package version 1.4.0.2. 2020. Available from: https://CRAN.R-project.org/package=shiny
 * Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
@@ -173,6 +183,20 @@ and shiny. Chapman and Hall/CRC Florida, 2020.
 * Page AJ, Taylor B, Delaney AJ, Soares J, Seemann T, Keane JA, et al. SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments. Microb genomics. 2016;2(4):e000056.
 * Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 2012.
 * Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8.
+* Simon Urbanek and Jeffrey Horner (2020). Cairo: R Graphics Device using Cairo Graphics
+Library for Creating High-Quality Bitmap (PNG, JPEG, TIFF), Vector (PDF, SVG,
+PostScript) and Display (X11 and Win32) Output. R package version 1.5-11.
+https://CRAN.R-project.org/package=Cairo
+* Bojanowski, Michal (2015) intergraph: Coercion Routines for Network Data Objects. R package version 2.0-2. http://mbojan.github.io/intergraph
+* Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686
+* Matt Dowle and Arun Srinivasan (2019). data.table: Extension of `data.frame`. R package version 1.12.8. https://CRAN.R-project.org/package=data.table
+* Hadley Wickham (2019). stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.4.0. https://CRAN.R-project.org/package=stringr
+* Hadley Wickham (2007). Reshaping Data with the reshape Package. Journal of Statistical Software, 21(12), 1-20. URL http://www.jstatsoft.org/v21/i12/.
+* Hadley Wickham, Romain François, Lionel Henry and Kirill Müller (2020). dplyr: A Grammar of Data Manipulation. R package version 0.8.5. https://CRAN.R-project.org/package=dplyr
+* R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
+* Jeffrey B. Arnold (2019). ggthemes: Extra Themes, Scales and Geoms for 'ggplot2'. R package version 4.2.0. https://CRAN.R-project.org/package=ggthemes
+* Erich Neuwirth (2014). RColorBrewer: ColorBrewer Palettes. R package version 1.1-2. https://CRAN.R-project.org/package=RColorBrewer
+* Lawrence M, Huber W, Pages H, Aboyoun P, Carlson M, et al. (2013) Software for Computing and Annotating Genomic Ranges. PLoS Comput Biol 9(8): e1003118. doi:10.1371/journal.pcbi.1003118