13. Loci Information
George Pacheco edited this page Aug 4, 2021 · 3 revisions
We calculated some statistics based on both SITES (Dataset I) and SNPs (Dataset II).
Number of scaffolds carrying at least one site:
zcat /groups/hologenomics/pacheco/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.mafs.gz | tail -n +2 | sort -u -k 1,1 | wc -l
Number of sites per scaffold, sorted by count:
zcat /groups/hologenomics/pacheco/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.mafs.gz | tail -n +2 | cut -f1 | sort | uniq -c | awk '{print $2"\t"$1}' | sort -n -k 2,2 > ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--Miscellaneous/SNPInfo/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.SITESDensity.txt
Length and site count for every scaffold longer than 1,000 bp (0 when a scaffold has no sites), sorted by length:
awk 'BEGIN{OFS="\t"} NR==FNR{x[$1]=$2} NR!=FNR && $2>1000{if(!x[$1])x[$1]=0; print $1,$2,x[$1]}' ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--Miscellaneous/SNPInfo/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.SITESDensity.txt ~/data/Pigeons/Reference/DanishTumbler_Dovetail_ReRun.fasta.fai | sort -n -k 2,2 > ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--Miscellaneous/SNPInfo/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.ScaffoldInfo.txt
The same table restricted to scaffolds with at least one site:
awk '{if ($3!=0) print;}' ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--Miscellaneous/SNPInfo/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.ScaffoldInfo.txt > ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--Miscellaneous/SNPInfo/PBGP--GoodSamples_WithAllWGS-GBSPairs--Article--Ultra.ScaffoldInfo_OnlyWithSites.txt
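The two-file awk idiom used for the scaffold table can be tried on toy data. The sketch below is only an illustration: the scaffold names, lengths and counts are made up, and the temporary files stand in for the SITESDensity and .fai files.

```shell
# Hypothetical toy inputs (not real data): per-scaffold site counts and a .fai-like index.
tmpdir=$(mktemp -d)
printf 'scaffoldA\t12\nscaffoldC\t3\n' > "$tmpdir/density.txt"
printf 'scaffoldA\t5000\nscaffoldB\t2000\nscaffoldC\t800\n' > "$tmpdir/toy.fai"

# While reading the first file (NR==FNR) store the counts; while reading the
# second file print name, length and count for scaffolds longer than 1000 bp.
scaffold_info=$(awk 'BEGIN{OFS="\t"} NR==FNR{x[$1]=$2} NR!=FNR && $2>1000{if(!x[$1])x[$1]=0; print $1,$2,x[$1]}' \
    "$tmpdir/density.txt" "$tmpdir/toy.fai" | sort -n -k 2,2)

printf '%s\n' "$scaffold_info"
rm -r "$tmpdir"
```

In this toy run scaffoldC is dropped because its length is below the threshold, and scaffoldB is kept with a count of 0 because it has no entry in the density file.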
Average length of the SNP-free intervals returned by bedtools complement, scaffold ends included:
zcat ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PBGP--GoodSamples_WithWGSs_NoCrupestris_SNPCalling--Article--Ultra.mafs.gz | cut -f1,2 | tail -n +2 | awk '{print $1"\t"$2-1"\t"$2}' | bedtools merge -i - | bedtools complement -i - -g ~/data/Pigeons/Reference/SamToolsIndex/DanishTumbler_Dovetail_ReRun.Cut.fasta.fai | sort -k 1,1r -k 2,2nr | awk '{sum+=($3-$2)} END {print "Average SNP Distance: " sum/NR}'
SNP-free interval lengths, one per line, skipping the first and last interval reported for each scaffold (the averaging step and the output redirection are left commented out):
zcat ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PBGP--GoodSamples_WithWGSs_NoCrupestris_SNPCalling--Article--Ultra.mafs.gz | cut -f1,2 | tail -n +2 | awk '{print $1"\t"$2-1"\t"$2}' | bedtools merge -i - | bedtools complement -i - -g ~/data/Pigeons/Reference/SamToolsIndex/DanishTumbler_Dovetail_ReRun.Cut.fasta.fai | sort -k 1,1r -k 2,2nr | awk 'BEGIN{pre=""; safe=""}{if($1!=pre){safe=""}else{if(safe!=""){print safe}safe=$3-$2}pre=$1}' # | awk '{sum+=$1} END { print "Average = ",sum/NR}' # > ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--Miscellaneous/SNPInfo/PBGP--GoodSamples_WithWGSs_NoCrupestris_SNPCalling--Article--Ultra.SNPDistances.txt
Number of bases separating each pair of consecutive SNPs on the same scaffold:
zcat ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--ANGSDRuns/PBGP--GoodSamples_WithWGSs_NoCrupestris_SNPCalling--Article--Ultra.mafs.gz | tail -n +2 | awk '$1 == pc{print $1,$2-pp-1} {pc=$1; pp=$2}' > ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--Miscellaneous/SNPInfo/PBGP--GoodSamples_WithWGSs_NoCrupestris_SNPCalling--Article--Ultra.SNPDistances.txt
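To see what the consecutive-SNP awk program emits, it can be run on a few invented positions. This is a minimal sketch: the header and the five records below are hypothetical and only mimic the first two columns of a .mafs file.

```shell
# Hypothetical SNP table (scaffold name, position) with a header line, as in a .mafs file.
snp_distances=$(printf 'chromo\tposition\nscaffoldA\t100\nscaffoldA\t105\nscaffoldA\t300\nscaffoldB\t50\nscaffoldB\t51\n' |
    tail -n +2 |
    awk '$1 == pc{print $1,$2-pp-1} {pc=$1; pp=$2}')

# pc and pp hold the previous scaffold and position; a distance is printed only
# when the current record is on the same scaffold as the previous one.
printf '%s\n' "$snp_distances"
```

The first SNP of each scaffold produces no line, and two adjacent SNPs (positions 50 and 51) give a distance of 0 because the program counts the bases lying between them.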
Distance from the end of each EcoT22I cut site to the start of the next one on the same scaffold:
cat ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I.bed | awk '$1 == pc{print $1,$2-pp} {pc=$1; pp=$3}' > ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I--Article--Ultra.CutSiteDistances.txt
Number of distances:
wc -l ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I--Article--Ultra.CutSiteDistances.txt
Average distance:
awk '{sum+=($2)} END {print "Average: " sum/NR}' ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I--Article--Ultra.CutSiteDistances.txt
Number of distances greater than 500 bp:
awk '$2 > 500' ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I--Article--Ultra.CutSiteDistances.txt | wc -l
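The cut-site summary can be checked the same way on a small made-up BED table. The coordinates below are hypothetical, and the variable names are chosen here for illustration only.

```shell
# Hypothetical BED records (scaffold, start, end) standing in for the EcoT22I cut sites.
bed=$(printf 'scaffoldA\t100\t106\nscaffoldA\t400\t406\nscaffoldA\t1500\t1506\nscaffoldB\t20\t26\nscaffoldB\t90\t96\n')

# Distance from the end of the previous site to the start of the current one.
cut_distances=$(printf '%s\n' "$bed" | awk '$1 == pc{print $1,$2-pp} {pc=$1; pp=$3}')

# The three summaries: how many distances, their mean, and how many exceed 500 bp.
n_distances=$(printf '%s\n' "$cut_distances" | wc -l | tr -d ' ')
mean_distance=$(printf '%s\n' "$cut_distances" | awk '{sum+=$2} END {print sum/NR}')
n_over_500=$(printf '%s\n' "$cut_distances" | awk '$2 > 500' | wc -l | tr -d ' ')

printf 'distances: %s, mean: %s, over 500 bp: %s\n' "$n_distances" "$mean_distance" "$n_over_500"
```

With these toy coordinates the distances are 294, 1094 and 64, so the mean is 484 and one distance exceeds 500 bp.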
Per-column coverage totals, after dropping rows that match WGS or Blank, the header line and the first column:
grep -v "WGS" Loci_Merged.coverage.tsv | grep -v "Blank" | tail -n +2 | cut -f 2- | awk '{for(i=1; i<=NF; i++)x[i]+=$i} END{for(i in x)print x[i]}' > ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--CoverageHeatMap/Loci_Merged.coverage.cutsitesmath
Number of columns whose total coverage is 0:
awk '$1==0{cnt++} END{print cnt}' ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--CoverageHeatMap/Loci_Merged.coverage.cutsitesmath
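The column-sum step can be sketched on an invented coverage table. The sample and locus names below are hypothetical. Unlike the command above, this version prints the sums in column order and uses cnt+0 so that 0 is printed when no column is empty; both are choices made for this example.

```shell
# Hypothetical coverage table: one row per sample, one column per locus.
coverage=$(printf 'Sample\tlocus1\tlocus2\tlocus3\nGBS_01\t4\t0\t2\nWGS_01\t9\t9\t9\nBlank_01\t0\t1\t0\nGBS_02\t1\t0\t0\n')

# Drop WGS and Blank rows, the header and the sample column, then sum each column.
column_sums=$(printf '%s\n' "$coverage" | grep -v "WGS" | grep -v "Blank" | tail -n +2 | cut -f 2- |
    awk '{n=NF; for(i=1; i<=NF; i++) x[i]+=$i} END{for(i=1; i<=n; i++) print x[i]}')

# Count the columns whose total is 0.
zero_columns=$(printf '%s\n' "$column_sums" | awk '$1==0{cnt++} END{print cnt+0}')

printf 'zero-coverage columns: %s\n' "$zero_columns"
```

Only the two GBS rows are summed here, which gives totals of 5, 0 and 2, so one column has zero coverage.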