This repository is part of a research project on the genomic architecture of symbiont-conferred resistance in Lysiphlebus fabarum. Its purpose is to provide a reproducible analysis pipeline for this project. One version is available on GitHub, and another will be available on Dryad.
The scripts of these analyses are optimized to run on the ETH Euler cluster. Reproducing the analysis on Euler should therefore work smoothly, while adaptation for other platforms may require modification of the scripts. All scripts assume that your current directory is Lfab_QTL.
To reproduce these analyses, do the following:
- Copy the Lfab_QTL directory to a system where you want to reproduce the analyses (preferentially Euler or a similar HPC cluster).
- Make sure you have the necessary raw data available in ./data and/or ./results (for an overview of these files, see Required files).
- Run the analyses by following the steps described in the README.md files (for the correct order of README.md files, see Manuals to follow).
The analysis can only be reproduced if the required raw data are available in an additional directory within Lfab_QTL. The following list includes all raw data that can be used to reproduce the analyses. Some of these files are not publicly available, in particular the multiplexed raw reads needed to reproduce the demultiplexing.
- ./data/dominance_first_gen.txt (raw data from the first experiment on dominance relationships)
- ./data/dominance_second_gen.txt (raw data from the first experiment on dominance relationships)
- ./data/barcodes/barcodes_pool_*.txt (12 files with barcodes for each sequenced individual, used for demultiplexing)
- ./data/crossing_data/*.txt (raw data from the second experiment, 4 files with information on wasp crossing for QTL mapping)
- ./results/demultiplexed/*.fq.gz (These 384 files of demultiplexed Illumina raw reads from 384 wasp samples are available under PRJEB39724 in the European Nucleotide Archive)
- ./data/rawreads/20191118.A-GU_ddRAD_ID_*.fastq.gz (These 24 files of undemultiplexed Illumina raw reads from 384 wasp samples are only necessary to reproduce the demultiplexing)
- ./data/genome/regions/split.freebayes.regions.file.pl (script provided by the GDC at ETH, used to generate genomic regions prior to SNP calling)
- ./data/01.Linkage_groups.txt (additional file 3 from Dennis et al. (2020), in tab-separated txt-file format)
- ./data/genome/Lf_genome_V1.0.fa (reference genome described in Dennis et al. (2020) and available on bipaa)
- ./data/waspbase/OGS1.0_20170110.gff3 (annotations from bipaa)
- ./data/genome/regions/fasta_generate_regions.py (script from the freebayes distribution (Garrison and Marth 2012) available on GitHub)
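Before starting, it can help to confirm that the files listed above are in place. The following is a minimal sketch, not part of the original pipeline: the function name `check_inputs` and the `REQUIRED`/`OPTIONAL` tables are hypothetical, and the expected counts are copied from the list above. It assumes it is run from the Lfab_QTL directory.

```python
from pathlib import Path

# Glob patterns (relative to Lfab_QTL) and the number of files expected
# for each, copied from the list of required files in this README.
REQUIRED = {
    "data/dominance_first_gen.txt": 1,
    "data/dominance_second_gen.txt": 1,
    "data/barcodes/barcodes_pool_*.txt": 12,
    "data/crossing_data/*.txt": 4,
    "results/demultiplexed/*.fq.gz": 384,
    "data/genome/regions/split.freebayes.regions.file.pl": 1,
    "data/01.Linkage_groups.txt": 1,
    "data/genome/Lf_genome_V1.0.fa": 1,
    "data/waspbase/OGS1.0_20170110.gff3": 1,
    "data/genome/regions/fasta_generate_regions.py": 1,
}

# Only needed to reproduce the demultiplexing; not publicly available.
OPTIONAL = {
    "data/rawreads/20191118.A-GU_ddRAD_ID_*.fastq.gz": 24,
}


def check_inputs(root=".", patterns=REQUIRED):
    """Return (pattern, expected, found) for every pattern whose
    number of matching files differs from the expected count."""
    root = Path(root)
    problems = []
    for pattern, expected in patterns.items():
        found = len(list(root.glob(pattern)))
        if found != expected:
            problems.append((pattern, expected, found))
    return problems


if __name__ == "__main__":
    for pattern, expected, found in check_inputs():
        print(f"{pattern}: expected {expected} file(s), found {found}")
```

An empty result means every required pattern matched the expected number of files. Calling `check_inputs(patterns=OPTIONAL)` does the same for the undemultiplexed reads.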
The following files (results) are additionally included to aid reproducibility and use of the data.
- ./results/QTLanalysis/linkagemap_cM.txt (linkage map / distance matrix with mapping unit in cM)
- ./results/QTLanalysis/Rqtlin_final.csv (table with genotypes and phenotypes used for the QTL analysis with the script ./analysis/2-8_QTLanalysis/02_QTL_mapping.R)
- ./results/genesearch/candidate_genes.txt (table with genes in the candidate region)
- ./results/SNPcall/raw.vcf.gz (unfiltered vcf file, only included in the Dryad version)
- ./results/geno_error/individual_genotype_comparisons.txt (summary file of genotyping error)
Follow the instructions in the README.md files in this order:
- ./analysis/1-1_dominance/README.md
- ./analysis/2-1_demultiplex/README.md (Can be skipped. Requires undemultiplexed raw reads which are not publicly available)
- ./analysis/2-2_mapping/README.md
- ./analysis/2-3_SNPcall/README.md
- ./analysis/2-4_SNPfilter/README.md (Possible to start here, when the ./results/SNPcall/raw.vcf.gz is available)
- ./analysis/2-5_linkagemap/README.md
- ./analysis/2-6_prepQTL/README.md
- ./analysis/2-7_geno_error/README.md
- ./analysis/2-8_QTLanalysis/README.md (Possible to start here)
- ./analysis/2-9_genesearch/README.md
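The order above, together with the notes on which steps can be skipped, can be sketched as a small helper. This is an illustration only and not part of the pipeline: `readmes_to_follow` is a hypothetical name, and the sketch assumes that the dominance analysis (1-1) is independent of the other steps and that the presence of ./results/SNPcall/raw.vcf.gz is what allows starting at 2-4_SNPfilter.

```python
from pathlib import Path

# Analysis steps in the order given in this README.
STEPS = [
    "1-1_dominance",
    "2-1_demultiplex",
    "2-2_mapping",
    "2-3_SNPcall",
    "2-4_SNPfilter",
    "2-5_linkagemap",
    "2-6_prepQTL",
    "2-7_geno_error",
    "2-8_QTLanalysis",
    "2-9_genesearch",
]


def readmes_to_follow(root="."):
    """List the README.md files to work through, given which inputs exist.

    Assumptions (not confirmed by the pipeline itself):
    - demultiplexing is skipped when no undemultiplexed reads are present;
    - mapping and SNP calling are skipped when raw.vcf.gz already exists.
    """
    root = Path(root)
    steps = list(STEPS)

    # The undemultiplexed raw reads are not publicly available.
    if not any(root.glob("data/rawreads/*.fastq.gz")):
        steps.remove("2-1_demultiplex")

    # With the unfiltered VCF in place, it is possible to start at 2-4.
    if (root / "results/SNPcall/raw.vcf.gz").is_file():
        skip = {"2-1_demultiplex", "2-2_mapping", "2-3_SNPcall"}
        steps = [s for s in steps if s not in skip]

    return [f"./analysis/{s}/README.md" for s in steps]


if __name__ == "__main__":
    for readme in readmes_to_follow():
        print(readme)
```

The helper only prints the order; each README.md still has to be followed by hand.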
Following the descriptions in the README.md files creates the results, which are written to the ./results directory.