-
Notifications
You must be signed in to change notification settings - Fork 1
9. Filtering of Possible Paralogs
George Pacheco edited this page Aug 4, 2021
·
3 revisions
Given the difficulties to remove possible paralogs from GBSed data, we decided to take advantage of our Re-Seqed data for doing so. Thus, we first run ANGSD--v0.929 on the complete Re-Reqed data in order to identify loci that would have a coverage considerably higher than average and then exclude these loci from the downstream analyses.
find ~/data/Pigeons/Analysis/PaleoMix_Re-Sequencing/*.bam > ~/data/Pigeons/PBGP/PBGP--Analyses/Lists/PBGP--AllRe-SeqedSamples--Article--Ultra.list
xsbatch -c 64 --mem-per-cpu 7800 -J ppAllRe-Seqed --time 10-00:00 --force -- $SCRIPTS/scripts/wrapper_angsd.sh -debug 2 -nThreads 64 -ref ~/data/Pigeons/Reference/DanishTumbler_Dovetail_ReRun.fasta -bam ~/data/Pigeons/PBGP/PBGP--Analyses/Lists/PBGP--GoodRe-SeqedSamples--Article.list -sites ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I_Extended_Merged.pos -rf ~/data/Pigeons/Reference/DanishTumbler_Dovetail_ReRun_ChrGreater1kb.id -remove_bads 1 -uniqueOnly 1 -baq 1 -C 50 -minMapQ 30 -minQ 20 -doQsDist 1 -doCounts 1 -doDepth 1 -dumpCounts 2 -maxDepth $((50*1000)) -out ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PossibleParalogTest/PBGP--OnlyRe-Seqed_RemovedBadLoci_100Ind_PossibleParalogs--Article--Ultra.depth
zcat ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PossibleParalogTest/PBGP--OnlyRe-Seqed_RemovedBadLoci_100Ind_PossibleParalogs--Article--Ultra.depth.pos.gz | awk 'NR>1 {print $1"\t"$2-1"\t"$2"\t"$3}' | bedtools intersect -a - -b ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I_Extended_Merged.bed -wb | bedtools groupby -g 8 -c 4 -o mean > ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PossibleParalogTest/PBGP--OnlyRe-Seqed_100Ind_PossibleParalogs_IntersectedWithMerged--Article--Ultra.mean
Fourth, we run the line below to create lists containing the POSSIBLE PARALOG LOCI to be eliminated following different cutoffs:
cat ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PossibleParalogTest/PBGP--OnlyRe-Seqed_100Ind_PossibleParalogs_IntersectedWithMerged--Article--Ultra.mean | awk '$2>800 {print $1}' > ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PossibleParalogTest/PBGP--PossibleParalogLociToBeEliminated-g800--Article--Ultra.list
Extracts from the "FPGP_FinalRun.EcoT22I_Extended_Merged_RemovedBadLoci.bed" file those POSSIBLE PARALOG LOCI, creating an updated .bed
file:
fgrep -v -f ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PossibleParalogTest/PBGP--PossibleParalogLociToBeEliminated-g800--Article--Ultra.list ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I_Extended_Merged.bed > ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I_Extended_Merged_RemovedPossibleParalogs-g800--Article--Ultra.bed
awk '{print $1"\t"($2+1)"\t"$3}' ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I_Extended_Merged_RemovedPossibleParalogs-g800--Article--Ultra.bed > ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I_Extended_Merged_RemovedPossibleParalogs-g800--Article--Ultra.pos
angsd sites index ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I_Extended_Merged_RemovedPossibleParalogs-g800--Article--Ultra.pos
cat ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I_Extended_Merged_RemovedPossibleParalogs-g800--Article--Ultra.pos | awk '{sum+=($3-$2)*100/1111661097} END {print sum"%"}'
- 1. Data Access
- 2. Sequencing Quality Check
- 3. Demultiplexing
- 4. Creation of Mapping Targets
- 5. Filtering For Chimeric Reads
- 6. GBS Sexing
- 7. Read Processing & Mapping
- 8. Running Stats & Filtering of Bad Samples
- 9. Filtering of Possible Paralogs
- 10. Merging of Duplicate Cases
- 11. Investigation of Filtering of Possible Paralogs
- 12. Creation of Specific Datasets
- 13. Loci Information
- 14. Heterozygosity Calculation
- 15. Population Genetics Statistics
- 16. Phylogenetic Reconstruction
- 17. Multidimensional Scaling
- 18. Estimation of Individual Ancestries
- 19. Inference of Population Splits
- 20. Measuring of Linkage Disequilibrium
- 21. GWAS