From 9699f0b88b11cf91d6079c2353caa4ca20fb9cd8 Mon Sep 17 00:00:00 2001
From: Arya Massarat <23412689+aryarm@users.noreply.github.com>
Date: Tue, 27 Jul 2021 12:29:51 -0700
Subject: [PATCH 1/4] add citation information to the main README

---
 README.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/README.md b/README.md
index 8e06620..c54dd12 100644
--- a/README.md
+++ b/README.md
@@ -86,3 +86,6 @@ Various scripts used by the pipeline. See the [script README](scripts/README.md)
 
 ### [run.bash](run.bash)
 An example bash script for executing the pipeline using `snakemake` and `conda`. Any arguments to this script are passed directly to `snakemake`.
+
+# citation
+> Massarat, A. R., Sen, A., Jaureguy, J., Tyndale, S. T., Fu, Y., Erikson, G., & McVicker, G. (2021). Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq. Nucleic Acids Research, gkab621. https://doi.org/10.1093/nar/gkab621

From 71499d7156ba798505e7d57bda4f857031ec5b8f Mon Sep 17 00:00:00 2001
From: Arya Massarat <23412689+aryarm@users.noreply.github.com>
Date: Wed, 28 Jul 2021 13:53:55 -0700
Subject: [PATCH 2/4] add citation file

---
 CITATION.cff | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)
 create mode 100644 CITATION.cff

diff --git a/CITATION.cff b/CITATION.cff
new file mode 100644
index 0000000..a1ab90a
--- /dev/null
+++ b/CITATION.cff
@@ -0,0 +1,52 @@
+# YAML 1.2
+---
+abstract: "Genetic variants and de novo mutations in regulatory regions of the genome are typically discovered by whole-genome sequencing (WGS), however WGS is expensive and most WGS reads come from non-regulatory regions. The Assay for Transposase-Accessible Chromatin (ATAC-seq) generates reads from regulatory sequences and could potentially be used as a low-cost ‘capture’ method for regulatory variant discovery, but its use for this purpose has not been systematically evaluated. Here we apply seven variant callers to bulk and single-cell ATAC-seq data and evaluate their ability to identify single nucleotide variants (SNVs) and insertions/deletions (indels). In addition, we develop an ensemble classifier, VarCA, which combines features from individual variant callers to predict variants. The Genome Analysis Toolkit (GATK) is the best-performing individual caller with precision/recall on a bulk ATAC test dataset of 0.92/0.97 for SNVs and 0.87/0.82 for indels within ATAC-seq peak regions with at least 10 reads. On bulk ATAC-seq reads, VarCA achieves superior performance with precision/recall of 0.99/0.95 for SNVs and 0.93/0.80 for indels. On single-cell ATAC-seq reads, VarCA attains precision/recall of 0.98/0.94 for SNVs and 0.82/0.82 for indels. In summary, ATAC-seq reads can be used to accurately discover non-coding regulatory variants in the absence of whole-genome sequencing data and our ensemble method, VarCA, has the best overall performance."
+authors:
+  -
+    affiliation: "Bioinformatics and Systems Biology Graduate Program, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA"
+    family-names: Massarat
+    given-names: Arya
+    orcid: "https://orcid.org/0000-0002-3679-0345"
+  -
+    affiliation: "Integrative Biology Laboratory, Salk Institute for Biological Studies, 10010 N. Torrey Pines Road, La Jolla, CA 92037, USA"
+    family-names: Sen
+    given-names: Arko
+    orcid: "https://orcid.org/0000-0001-9876-281X"
+  -
+    affiliation: "Bioinformatics and Systems Biology Graduate Program, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA"
+    family-names: Jaureguy
+    given-names: Jeff
+    orcid: "https://orcid.org/0000-0002-6303-422X"
+  -
+    affiliation: "Integrative Biology Laboratory, Salk Institute for Biological Studies, 10010 N. Torrey Pines Road, La Jolla, CA 92037, USA"
+    family-names: Tyndale
+    given-names: "Sélène"
+    orcid: "https://orcid.org/0000-0001-9805-1049"
+  -
+    affiliation: "Razavi Newman Integrative Genomics and Bioinformatics Core, Salk Institute for Biological Studies, 10010 N. Torrey Pines Road, La Jolla, CA 92037, USA"
+    family-names: Fu
+    given-names: Yi
+  -
+    affiliation: "Razavi Newman Integrative Genomics and Bioinformatics Core, Salk Institute for Biological Studies, 10010 N. Torrey Pines Road, La Jolla, CA 92037, USA"
+    family-names: Erikson
+    given-names: Galina
+  -
+    affiliation: "Integrative Biology Laboratory, Salk Institute for Biological Studies, 10010 N. Torrey Pines Road, La Jolla, CA 92037, USA"
+    family-names: McVicker
+    given-names: Graham
+    orcid: "https://orcid.org/0000-0003-0991-0951"
+cff-version: "1.1.0"
+date-released: 2021-07-21
+doi: "10.1093/nar/gkab621"
+identifiers:
+  -
+    type: doi
+    value: "10.1093/nar/gkab621"
+  -
+    type: url
+    value: "https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkab621/6329114"
+license: MIT
+message: "If you use this software, please cite it using these metadata."
+repository-code: "https://github.com/aryarm/varCA"
+title: "Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq"
+...
\ No newline at end of file

From b5c53c6a7a0346b482c8144f65223184ad4a5e99 Mon Sep 17 00:00:00 2001
From: Arya Massarat <23412689+aryarm@users.noreply.github.com>
Date: Wed, 28 Jul 2021 13:58:36 -0700
Subject: [PATCH 3/4] add version to citation file

---
 CITATION.cff | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/CITATION.cff b/CITATION.cff
index a1ab90a..eeca4ba 100644
--- a/CITATION.cff
+++ b/CITATION.cff
@@ -49,4 +49,5 @@ license: MIT
 message: "If you use this software, please cite it using these metadata."
 repository-code: "https://github.com/aryarm/varCA"
 title: "Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq"
-...
\ No newline at end of file
+version: "v0.3.1"
+...

From c08fb70b332ce9f05a16297d87a30f9ace52e804 Mon Sep 17 00:00:00 2001
From: Arya Massarat <23412689+aryarm@users.noreply.github.com>
Date: Wed, 28 Jul 2021 14:26:33 -0700
Subject: [PATCH 4/4] Update README.md

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index c54dd12..d4f3677 100644
--- a/README.md
+++ b/README.md
@@ -88,4 +88,5 @@ Various scripts used by the pipeline. See the [script README](scripts/README.md)
 An example bash script for executing the pipeline using `snakemake` and `conda`. Any arguments to this script are passed directly to `snakemake`.
 
 # citation
+There is an option to _"Cite this repository"_ on the right sidebar of [the repository homepage](https://github.com/aryarm/varCA).
 > Massarat, A. R., Sen, A., Jaureguy, J., Tyndale, S. T., Fu, Y., Erikson, G., & McVicker, G. (2021). Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq. Nucleic Acids Research, gkab621. https://doi.org/10.1093/nar/gkab621