5. Filtering For Chimeric Reads
George Pacheco edited this page Jul 21, 2021
We noticed that some GBS reads appeared to be chimeric, meaning that a single read is actually the product of two or more biological cut-sites merged together. Although we cannot fully explain the biochemistry behind this methodological issue, we decided to be conservative and exclude every read showing this signal.
Executes an initial PaleoMix run on the original demultiplexed GBS files so that the chimeric reads can be identified. We used the command below, which takes the .yaml file as its last argument:
xsbatch -c XXX --mem-per-cpu XXX -J PaleoMix --time XXX -- bam_pipeline run --jre-option "-XmxXXXg" --max-threads XXX --bwa-max-threads XXX --adapterremoval-max-threads XXX --destination ~/data/Pigeons/Analysis/PaleoMix_GBS_BEFORE-FILTEREDCHIMERAS/ ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--Final_PaleoMix_GBS_BEFORE-FILTEREDCHIMERAS.yaml
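Generates an ID file for each sample containing the reads that should be excluded. These are reads that carry a second (or further) cut-site and that mapped to two or more different regions. In the command below, this corresponds to reads whose CIGAR string contains hard or soft clipping (H or S) and whose sequence contains the cut-site motif ATGCAT: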
parallel --plus --keep-order --dryrun "samtools view {} | grep -v '^#' | awk '\$6~/[HS]/ && \$10~/ATGCAT/{print \$1}' | sort -u > $TMP_DIR/{/...}.Chimeras.id" ::: ~/data/Projects/PGP/AfterChrGenome/AfterChrGenome_PGP--Analyses/AfterChrGenome_PGP--PaleoMix_BeforeFilteredChimeras/*.bam > /groups/hologenomics/pacheco/PBG--AfterChrGenome_ToCreateChimeraIDs.txt
chmod 755 ./PBG--AfterChrGenome_ToCreateChimeraIDs.txt
xsbatch -c 1 -R --max-array-jobs 15 --mem-per-cpu 2500 --time 1-00 -- ./PBG--AfterChrGenome_ToCreateChimeraIDs.txt
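The selection rule in the awk filter above can be tried on a few hand-written records before submitting the array job. The following is only a minimal sketch: the function name `chimera_ids` and the toy SAM records are made up for illustration, and it reads SAM text from standard input instead of calling samtools.

```shell
# Hypothetical helper (not part of the original pipeline): reads SAM records
# on stdin and prints the unique names of reads that pass the same two tests
# as the awk filter above.
chimera_ids() {
  local motif="${1:-ATGCAT}"   # cut-site motif, taken from the command above
  awk -F '\t' -v motif="$motif" '
    /^@/ { next }                                   # skip SAM header lines
    $6 ~ /[HS]/ && index($10, motif) { print $1 }   # clipped CIGAR and motif in SEQ
  ' | sort -u
}

# Toy records, invented for this example:
#   r1 (twice): clipped and contains ATGCAT -> reported once
#   r2: contains ATGCAT but is not clipped  -> ignored
#   r3: clipped but lacks the motif         -> ignored
toy_sam=$(printf '%s\n' \
  "$(printf 'r1\t0\tchr1\t100\t60\t6M4S\t*\t0\t0\tGGATGCATCC\tIIIIIIIIII')" \
  "$(printf 'r1\t2048\tchr2\t500\t60\t5S5M\t*\t0\t0\tGGATGCATCC\tIIIIIIIIII')" \
  "$(printf 'r2\t0\tchr1\t200\t60\t10M\t*\t0\t0\tGGATGCATCC\tIIIIIIIIII')" \
  "$(printf 'r3\t0\tchr1\t300\t60\t4S6M\t*\t0\t0\tAAAAACCCCC\tIIIIIIIIII')")

printf '%s\n' "$toy_sam" | chimera_ids
```

With real data the function would sit behind `samtools view file.bam`, as in the command above. Note that clipping is used here as a proxy for a read mapping to more than one region; the filter does not inspect supplementary-alignment flags or tags.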
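Excludes the identified reads using the software package QIIME--v1.9.1 (filter_fasta.py with the -n flag, which discards the listed IDs instead of keeping them). A filtered .fastq file is created inside the folder of each original demultiplexed file. Note that the !(*Undetermined) pattern requires bash's extglob option (shopt -s extglob):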
module load blast/v2.2.26
module load qiime/v1.9.1
ls ~/data/Pigeons/GBS/FPGP_*/*_Demultiplexed_GBSX--v1.3/*_!(*Undetermined).fastq.gz | parallel --plus --keep-order --dryrun "zcat {} > {.} && filter_fasta.py -f {.} -o {..}.FilteredChimeras.fastq -s $TMP_DIR/{/...}-GBS.Chimeras.id -n && gzip --best {..}.FilteredChimeras.fastq && rm {.}" | xsbatch --mem-per-cpu XXX -R --max-array-jobs XXX -c 1 --time XXX --
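To make the effect of the negated ID filter concrete, the sketch below drops listed reads from a small FASTQ stream using plain awk. It is an illustration, not a replacement for filter_fasta.py: the function name `filter_fastq_by_ids` and the toy reads are hypothetical, it assumes four-line FASTQ records without wrapped sequences, and it assumes IDs are matched on the first whitespace-delimited token of the header line.

```shell
# Hypothetical helper (not part of the original pipeline): reads FASTQ on
# stdin and prints only the records whose ID is NOT listed in the file $1.
filter_fastq_by_ids() {
  awk -v ids="$1" '
    BEGIN { while ((getline id < ids) > 0) drop[id] = 1 }   # load IDs to discard
    NR % 4 == 1 {                       # header line of each 4-line record
      name = substr($1, 2)              # strip the leading "@"
      keep = !(name in drop)
    }
    keep                                # print all four lines of kept records
  '
}

# Toy input, invented for this example: r1 is listed as chimeric, r2 is not.
ids_file=$(mktemp)
printf 'r1\n' > "$ids_file"
toy_fastq=$(printf '@r1 1:N:0\nGGATGCATCC\n+\nIIIIIIIIII\n@r2 1:N:0\nAAAAACCCCC\n+\nIIIIIIIIII\n')

printf '%s\n' "$toy_fastq" | filter_fastq_by_ids "$ids_file"
rm -f "$ids_file"
```

Only the four lines of the r2 record are printed. Deciding keep or drop on the header line alone avoids misreading quality lines that happen to start with "@".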