
Commit

Merge pull request #56 from Cristianetaniguti/joint_EmpiricalReads
Joint empirical reads
Cristianetaniguti authored Nov 29, 2022
2 parents f05da13 + b39a5e7 commit 09cc213
Showing 7 changed files with 105 additions and 11 deletions.
11 changes: 6 additions & 5 deletions README.md
@@ -3,9 +3,9 @@

## Reads2Map

Reads2Map presents [WDL workflows](https://openwdl.org/) a collection of pipelines to build linkage maps from sequencing reads. Each pipeline release is described in the [Read2Map releases page](https://github.com/Cristianetaniguti/Reads2Map/releases).
Reads2Map presents a collection of [WDL workflows](https://openwdl.org/) to build linkage maps from sequencing reads. Each workflow release is described in the [Read2Map releases page](https://github.com/Cristianetaniguti/Reads2Map/releases).

The main workflows are the `EmpiricalSNPCalling.wdl`, the `EmpiricalMaps.wdl`, and the `SimulatedReads.wdl`. `EmpiricalSNPCalling.wdl` performs the SNP calling and `EmpiricalMaps.wdl` performs the genotype calling and map building in empirical reads. The `SimulatedReads.wdl` simulates Illumina reads for RADseq, exome, or WGS data and performs the SNP and genotype calling and genetic map building.
The main workflows are the `EmpiricalReads2Map.wdl` and the `SimulatedReads2Map.wdl`. The `EmpiricalReads2Map.wdl` is composed of the `EmpiricalSNPCalling.wdl`, which performs the SNP calling, and the `EmpiricalMaps.wdl`, which performs the genotype calling and map building on empirical reads. The `SimulatedReads2Map.wdl` simulates Illumina reads for RADseq, exome, or WGS data and performs the SNP and genotype calling and genetic map building.

Currently, [GATK](https://github.com/broadinstitute/gatk) and [Freebayes](https://github.com/ekg/freebayes) are included for SNP calling; [updog](https://github.com/dcgerard/updog), [polyRAD](https://github.com/lvclark/polyRAD), and [SuperMASSA](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030906) for dosage calling; and [OneMap](https://github.com/augusto-garcia/onemap) and [GUSMap](https://github.com/tpbilton/GUSMap) for linkage map building.

@@ -23,19 +23,19 @@ Check the description of the inputs for the pipelines:

* [EmpiricalReads2Map (EmpiricalSNPCalling and EmpiricalMaps)](https://cristianetaniguti.github.io/Tutorials/Reads2Map/EmpiricalReads.html)

* [SimulatedReads](https://cristianetaniguti.github.io/Tutorials/Reads2Map/simulatedreads.html)
* [SimulatedReads2Map](https://cristianetaniguti.github.io/Tutorials/Reads2Map/simulatedreads.html)

Check how to evaluate the workflows' results in the Reads2MapApp Shiny app:

* [Reads2MapApp](https://github.com/Cristianetaniguti/Reads2MapApp)

Once you have selected the best pipeline using a subset of your data, you can build a complete high-density linkage map:

* [Quick Guide to build High-Density Linkage Maps](https://cristianetaniguti.github.io/Tutorials/onemap/Quick_HighDens/High_density_maps.html)
* [A Guide to Build High-Density Linkage Maps](https://cristianetaniguti.github.io/Tutorials/onemap/Quick_HighDens/High_density_maps.html)

Check more information and examples of usage in:

* [Taniguti, C. H., Taniguti, L. M., Amadeu, R. R., Mollinari, M., Da, G., Pereira, S., Riera-Lizarazu, O., Lau, J., Byrne, D., de Siqueira Gesteira, G., De, T., Oliveira, P., Ferreira, G. C., & Franco Garcia, A. A. Developing best practices for genotyping-by-sequencing analysis using linkage maps as benchmarks. BioRxiv. https://doi.org/10.1101/2022.11.24.517847](https://www.biorxiv.org/content/10.1101/2022.11.24.517847v1)
* [Taniguti, C. H., Taniguti, L. M., Amadeu, R. R., Mollinari, M., Da, G., Pereira, S., Riera-Lizarazu, O., Lau, J., Byrne, D., de Siqueira Gesteira, G., De, T., Oliveira, P., Ferreira, G. C., & Franco Garcia, A. A. Developing best practices for genotyping-by-sequencing analysis using linkage maps as benchmarks. BioRxiv. https://doi.org/10.1101/2022.11.24.517847](https://www.biorxiv.org/content/10.1101/2022.11.24.517847v2)

## Third-party software and images

@@ -53,6 +53,7 @@ Check more information and examples of usage in:
- [SuperMASSA](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030906) in [cristaniguti/reads2map:0.0.1](https://hub.docker.com/repository/docker/cristaniguti/reads2map): Efficient Exact Maximum a Posteriori Computation for Bayesian SNP Genotyping in Polyploids;
- [bcftools](https://github.com/samtools/bcftools) in [lifebitai/bcftools:1.10.2](https://hub.docker.com/r/lifebitai/bcftools): utilities for variant calling and manipulating VCFs and BCFs;
- [vcftools](http://vcftools.sourceforge.net/) in [cristaniguti/split_markers:0.0.1](https://hub.docker.com/repository/docker/cristaniguti/split_markers): program package designed for working with VCF files.
- [MCHap](https://github.com/PlantandFoodResearch/MCHap) in [cristaniguti/mchap:0.7.0](https://hub.docker.com/repository/docker/cristaniguti/mchap): Polyploid micro-haplotype assembly using Markov chain Monte Carlo simulation.
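
In the workflows, these containers are referenced from WDL `runtime` blocks. A minimal hedged sketch of how a task might pin one of the images listed above — the task name and the bcftools command are illustrative, not taken from this repository:

```wdl
version 1.0

task BcftoolsStatsSketch {
    input {
        File vcf
    }
    command <<<
        # Summarize the VCF inside the pinned bcftools container
        bcftools stats ~{vcf} > stats.txt
    >>>
    output {
        File stats = "stats.txt"
    }
    runtime {
        docker: "lifebitai/bcftools:1.10.2"
    }
}
```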

### R packages

8 changes: 4 additions & 4 deletions pipelines/EmpiricalMaps/EmpiricalMaps.wdl
@@ -17,10 +17,10 @@ workflow Maps {
Dataset dataset
File? gatk_vcf_multi
String gatk_mchap
File gatk_vcf
File freebayes_vcf
File gatk_vcf_bam_counts
File freebayes_vcf_bam_counts
File? gatk_vcf
File? freebayes_vcf
File? gatk_vcf_bam_counts
File? freebayes_vcf_bam_counts
String? filters
Int max_cores
}
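
Changing these inputs from `File` to `File?` makes them optional in WDL 1.0, which lets the joint workflow pass only the VCFs that were actually produced (e.g. when only one of the SNP callers is run). A hedged sketch of the idiom a consumer of such optional inputs can use — the workflow, task, and variable names here are illustrative, not this repository's code:

```wdl
version 1.0

workflow OptionalVcfSketch {
    input {
        File? gatk_vcf   # may be absent when the GATK branch was skipped
    }

    # Guard the downstream call so it only runs when the optional file exists
    if (defined(gatk_vcf)) {
        call CountVariants {
            input:
                vcf = select_first([gatk_vcf])   # coerce File? to File
        }
    }

    output {
        Int? n_gatk_variants = CountVariants.n
    }
}

task CountVariants {
    input {
        File vcf
    }
    command <<<
        # Count non-header records; prints 0 for an empty VCF body
        zcat -f ~{vcf} | awk '!/^#/ {n++} END {print n+0}'
    >>>
    output {
        Int n = read_int(stdout())
    }
    runtime {
        docker: "ubuntu:20.04"
    }
}
```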
32 changes: 32 additions & 0 deletions pipelines/EmpiricalReads2Map/EmpiricalReads2Map.changelog.md
@@ -0,0 +1,32 @@
# 1.0.0

Initial release

This workflow builds linkage maps from genotyping-by-sequencing (GBS) data. The GBS samples are split into chunks to be run on different nodes and optimize the analyses (a short sketch of this chunking pattern follows the list below). Set the number of samples per chunk in the 'chunk_size' input. Use 'max_cores' to define the number of cores to be used in each node. The workflow runs the following combinations:

* SNP calling: GATK and Freebayes
* Dosage/genotype calling: updog, polyRAD and SuperMASSA
* Linkage map building software: OneMap 3.0 and GUSMap
* Using genotype probabilities from GATK, Freebayes, updog, polyRAD, and SuperMASSA, and global error rates of 5% and 0.001% in the OneMap HMM

Resulting in 34 linkage maps.
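
A minimal sketch of the chunking pattern described above, under the assumption that `samples_info` lists one sample per line — the workflow and task names are hypothetical, not the actual implementation:

```wdl
version 1.0

workflow ChunkSamplesSketch {
    input {
        File samples_info   # one sample per line
        Int chunk_size = 5
        Int max_cores = 4
    }

    call SplitSamples {
        input:
            samples_info = samples_info,
            chunk_size = chunk_size
    }

    # Each chunk becomes an independent shard the backend can place on its own node
    scatter (chunk in SplitSamples.chunks) {
        call ProcessChunk {
            input:
                chunk = chunk,
                max_cores = max_cores
        }
    }
}

task SplitSamples {
    input {
        File samples_info
        Int chunk_size
    }
    command <<<
        # Write one file per group of 'chunk_size' samples
        split -l ~{chunk_size} ~{samples_info} chunk_
    >>>
    output {
        Array[File] chunks = glob("chunk_*")
    }
    runtime {
        docker: "ubuntu:20.04"
    }
}

task ProcessChunk {
    input {
        File chunk
        Int max_cores
    }
    command <<<
        # Placeholder for the per-chunk alignment/SNP-calling work
        awk 'END {print NR}' ~{chunk}
    >>>
    output {
        Int n_samples = read_int(stdout())
    }
    runtime {
        docker: "ubuntu:20.04"
        cpu: max_cores
    }
}
```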

The workflow includes options to:

* Remove read duplicates or keep them
* Perform hard filtering on the GATK results
* Replace the VCF AD format field with counts from BAM files
* Run the MCHap software to build haplotypes based on GATK-called markers
* Include or exclude multiallelic (MNP) markers
* Apply filters using VCFtools

This workflow requires:

* Diploid bi-parental F1 population
* Single-end reads
* A reference genome
* Genomic positions for marker ordering
* Selection of a single chromosome from a reference genome to build the linkage maps



61 changes: 61 additions & 0 deletions pipelines/EmpiricalReads2Map/EmpiricalReads2Map.wdl
@@ -0,0 +1,61 @@
version 1.0

import "../../structs/empirical_maps_structs.wdl"
import "../../structs/dna_seq_structs.wdl"

import "../../pipelines/EmpiricalSNPCalling/EmpiricalSNPCalling.wdl" as snpcalling
import "../../pipelines/EmpiricalMaps/EmpiricalMaps.wdl" as maps

workflow EmpiricalReads {

    input {
        File samples_info
        ReferenceFasta references
        Dataset dataset
        Int max_cores
        Int chunk_size
        Boolean rm_dupli = true
        Boolean gatk_mchap = false
        Boolean hardfilters = true
        Boolean replaceAD = true
        Boolean run_gatk = true
        Boolean run_freebayes = true
        Int ploidy = 2
        Int n_chrom
        String? filters
    }

    call snpcalling.SNPCalling {
        input:
            samples_info = samples_info,
            references = references,
            max_cores = max_cores,
            rm_dupli = rm_dupli,
            P1 = dataset.parent1,
            P2 = dataset.parent2,
            gatk_mchap = gatk_mchap,
            hardfilters = hardfilters,
            replaceAD = replaceAD,
            run_gatk = run_gatk,
            run_freebayes = run_freebayes,
            ploidy = ploidy,
            n_chrom = n_chrom
    }

    call maps.Maps {
        input:
            dataset = dataset,
            gatk_vcf_multi = SNPCalling.gatk_multi_vcf,
            gatk_mchap = gatk_mchap,
            gatk_vcf = SNPCalling.gatk_vcf,
            freebayes_vcf = SNPCalling.freebayes_vcf,
            gatk_vcf_bam_counts = SNPCalling.gatk_vcf_bam_count,
            freebayes_vcf_bam_counts = SNPCalling.freebayes_vcf_bam_count,
            filters = filters,
            max_cores = max_cores
    }

    output {
        File EmpiricalReads_results = Maps.EmpiricalReads_results
    }
}
File renamed without changes.
4 changes: 2 additions & 2 deletions tasks/utils.wdl
@@ -100,8 +100,8 @@ task GenerateSampleNames { # TODO: probably a name like 'ReadSamplesNamesInVcf'
task ApplyRandomFilters {
input{
File gatk_vcf
File freebayes_vcf
File? gatk_vcf
File? freebayes_vcf
File? gatk_vcf_bam_counts
File? freebayes_vcf_bam_counts
String? filters
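
Since the VCF inputs and the `filters` string are now all optional here, the task's command has to tolerate unset values. A hedged sketch of the usual WDL 1.0 idioms for that — the vcftools flags, file names, and task name below are illustrative, not the task's real command:

```wdl
version 1.0

task ApplyFiltersSketch {
    input {
        File? gatk_vcf
        String? filters   # e.g. "--max-missing 0.75 --minQ 30"
    }
    command <<<
        # ~{default="" ...} substitutes an empty string when the optional value is unset
        if [ -n "~{default="" gatk_vcf}" ]; then
            vcftools --vcf ~{default="" gatk_vcf} ~{default="" filters} --recode --stdout > gatk_filtered.vcf
        else
            touch gatk_filtered.vcf   # keep the declared output present even when GATK was skipped
        fi
    >>>
    output {
        File gatk_filtered = "gatk_filtered.vcf"
    }
    runtime {
        docker: "cristaniguti/split_markers:0.0.1"
    }
}
```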
