
11. Investigation of Filtering of Possible Paralogs

George Pacheco edited this page Jul 27, 2021 · 3 revisions

We continued our investigation of the potential paralogs.

Gets list of samples:

ALL GOOD SAMPLES with ALL the WGS_GBS_WGS-GBS Trios (257 SAMPLES: 184 GBS, 50 WGS & 23 WGS-GBS):

find ~/data/Pigeons/Analysis/PaleoMix_GBS/*.bam ~/data/Pigeons/Analysis/PaleoMix_Re-Sequencing/*.bam ~/data/Pigeons/Analysis/Samtools_WGS-GBS/*.bam | grep -f ~/data/Pigeons/Analysis/Lists/ALL_Re-Seqed-GBSBreedPlates--Article.list | grep -v -f ~/data/Pigeons/PBGP/PBGP--Analyses/Lists/PBGP--BadSamples--Article.list > ~/data/Pigeons/PBGP/PBGP--Analyses/Lists/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.list
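
The two-step filter above (keep paths matching a list of wanted names, then drop paths matching a list of bad names) can be sketched on toy data. All file names and sample names below are hypothetical and are not the project's real lists; counting the lines of the result is one simple way to check it against the expected number of samples (257 for the real list).

```shell
# Toy illustration only: paths and sample names are invented.
tmp=$(mktemp -d)
printf '%s\n' /toy/GBS/SampleA.bam /toy/GBS/SampleB.bam \
              /toy/WGS/SampleC.bam /toy/WGS/SampleD.bam > "$tmp/all_bams.txt"
printf '%s\n' SampleA SampleB SampleC > "$tmp/wanted.list"   # names to keep
printf '%s\n' SampleB > "$tmp/bad.list"                      # names to exclude

# Keep paths matching any wanted pattern, then drop paths matching any bad pattern.
grep -f "$tmp/wanted.list" "$tmp/all_bams.txt" | grep -v -f "$tmp/bad.list" > "$tmp/good.list"

cat "$tmp/good.list"
# Count the surviving lines (grep -c '' counts every line).
n_good=$(grep -c '' "$tmp/good.list")
echo "samples kept: $n_good"
```
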
Performs an ANGSD pre-run in order to better investigate the filtering of POSSIBLE PARALOG LOCI:
xsbatch -c 64 --mem-per-cpu 7500 -J pptPBGP --time 10-00 --force -- $SCRIPTS/scripts/wrapper_angsd.sh -debug 2 -nThreads 64 -ref ~/data/Pigeons/Reference/DanishTumbler_Dovetail_ReRun.fasta -bam ~/data/Pigeons/PBGP/PBGP--Analyses/Lists/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.list -sites ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I_Extended_Merged.pos -rf ~/data/Pigeons/Reference/DanishTumbler_Dovetail_ReRun_ChrGreater1kb.id -remove_bads 1 -uniqueOnly 1 -baq 1 -C 50 -minMapQ 30 -minQ 20 -minInd $((257*95/100)) -doCounts 1 -dumpCounts 2 -maxDepth $((257*1000)) -out ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PossibleParalogTest/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.depth
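
The -minInd and -maxDepth values in the command above are computed with shell arithmetic expansion, which uses integer division. A short check of what those expressions evaluate to for 257 samples (the variable names are only for illustration):

```shell
n_samples=257
# 257*95 = 24415; integer division by 100 truncates 244.15 to 244
min_ind=$(( n_samples * 95 / 100 ))
# 1000 reads per sample, summed over all samples
max_depth=$(( n_samples * 1000 ))
echo "minInd=$min_ind maxDepth=$max_depth"
```

So a site is kept only if at least 244 of the 257 individuals have data, and the global depth cap is 257000.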
Creates a .mean file containing the average Global Depth of each output LOCUS:
zcat ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PossibleParalogTest/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.depth.pos.gz | awk 'NR>1 {print $1"\t"$2-1"\t"$2"\t"$3}' | bedtools intersect -a - -b ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I_Extended_Merged.bed -wb | bedtools groupby -g 8 -c 4 -o mean > ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PossibleParalogTest/PBGP--GoodSamples_WithAllWGS-GBSPairs_95Ind_ParalogTest_IntersectedWithMerged--Article--Ultra.mean
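
A minimal sketch of what the pipeline above computes, on toy data. The first awk stage is the same as in the command above (skip the header, turn each 1-based position into a 1-bp BED interval carrying the depth). The second stage is an awk stand-in for the bedtools intersect / groupby steps, used here only so the example runs without bedtools; it is not the command used in the analysis, and all scaffold and locus names are invented.

```shell
tmp=$(mktemp -d)
# Toy ANGSD-style position table: header, then chr / 1-based pos / total depth.
printf 'chr\tpos\ttotDepth\n' > "$tmp/toy.pos"
printf 'scaf1\t101\t300\nscaf1\t102\t500\nscaf2\t11\t1200\n' >> "$tmp/toy.pos"
# Toy loci in BED format (0-based start, end, name).
printf 'scaf1\t90\t200\tLocusA\nscaf2\t0\t50\tLocusB\n' > "$tmp/toy_loci.bed"

# Same first stage as above: one 1-bp BED interval per site, depth in column 4.
awk 'NR>1 {print $1"\t"$2-1"\t"$2"\t"$3}' "$tmp/toy.pos" > "$tmp/toy_sites.bed"

# Stand-in for intersect + groupby: assign each site to the locus that
# contains it, then print the mean depth per locus.
awk -F'\t' 'NR==FNR {chr[FNR]=$1; s[FNR]=$2; e[FNR]=$3; name[FNR]=$4; n=FNR; next}
     { for (i=1; i<=n; i++)
         if ($1==chr[i] && $2>=s[i] && $3<=e[i]) {sum[i]+=$4; cnt[i]++} }
     END { for (i=1; i<=n; i++) if (cnt[i]) print name[i]"\t"sum[i]/cnt[i] }' \
    "$tmp/toy_loci.bed" "$tmp/toy_sites.bed" > "$tmp/toy.mean"

cat "$tmp/toy.mean"
```

In the toy data LocusA has two sites (depths 300 and 500) and LocusB has one (1200), so the resulting table has one line per locus with its mean depth. Note that bedtools groupby expects its input to be grouped by the grouping column, which is worth checking if the intersected output is ever reordered.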
Extracts those LOCI that were flagged as POSSIBLE PARALOGS:
fgrep -f ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PossibleParalogTest/PBGP--PossibleParalogLociToBeEliminated-g800--Article--Ultra.list ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PossibleParalogTest/PBGP--GoodSamples_WithAllWGS-GBSPairs_95Ind_ParalogTest_IntersectedWithMerged--Article--Ultra.mean | awk '{print $1"\t"$2-1"\t"$2"\t"$3}' > ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PossibleParalogTest/PBGP--GoodSamples_WithAllWGS-GBSPairs_95Ind_ParalogTest_IntersectedWithMerged_PossibleParalogs-g800--Article--Ultra.mean
Extracts those LOCI that were NOT flagged as POSSIBLE PARALOGS:
fgrep -v -f ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PossibleParalogTest/PBGP--PossibleParalogLociToBeEliminated-g800--Article--Ultra.list ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PossibleParalogTest/PBGP--GoodSamples_WithAllWGS-GBSPairs_95Ind_ParalogTest_IntersectedWithMerged--Article--Ultra.mean | awk '{print $1"\t"$2-1"\t"$2"\t"$3}' > ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PossibleParalogTest/PBGP--GoodSamples_WithAllWGS-GBSPairs_95Ind_ParalogTest_IntersectedWithMerged_WithoutPossibleParalogs-g800--Article--Ultra.mean
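
The two fgrep commands above split the .mean table into flagged and non-flagged loci. fgrep (equivalent to grep -F) matches fixed strings anywhere in a line, so whether the split is exact depends on the format of the locus IDs. The sketch below uses invented IDs (the real ID format is not shown here) to illustrate the behaviour and a possible whole-word variant with -w; it is an illustration under that assumption, not a correction of the commands above.

```shell
tmp=$(mktemp -d)
# Toy .mean table (locus ID, mean depth) and a toy list of flagged IDs.
printf 'Locus_1\t950\nLocus_10\t120\nLocus_2\t300\n' > "$tmp/toy.mean"
printf 'Locus_1\n' > "$tmp/flagged.list"

# Plain fixed-string matching: Locus_1 is a substring of Locus_10, so both match.
grep -F -f "$tmp/flagged.list" "$tmp/toy.mean" | cut -f1 > "$tmp/flagged_substring.txt"

# Whole-word matching: only the exact ID matches.
grep -F -w -f "$tmp/flagged.list" "$tmp/toy.mean" | cut -f1 > "$tmp/flagged_word.txt"
# The complement with -v gives the loci that are kept.
grep -F -w -v -f "$tmp/flagged.list" "$tmp/toy.mean" | cut -f1 > "$tmp/kept_word.txt"

echo "substring match:"; cat "$tmp/flagged_substring.txt"
echo "whole-word match:"; cat "$tmp/flagged_word.txt"
echo "kept:"; cat "$tmp/kept_word.txt"
```

If every locus ID in the real list is unambiguous (no ID is a prefix or substring of another), plain fgrep and the -w variant give the same split.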
These results were plotted using the Rscript below:
