8. Running Stats & Filtering of Bad Samples
George Pacheco edited this page Aug 4, 2021
We used the outputs from PaleoMix--v1.2.5 to create a summary file containing the mapping statistics of each sample. We also used scripts to create heatmaps that help identify bad samples, and to create auxiliary files based on these plots.
xsbatch -c 30 --mem-per-cpu 13000 -J HeatMap --time 5-00 -- "$SCRIPTS/scripts/paleomix_summary2tsv.sh -t 30 -n 10 -k 300 -i ~/data/Pigeons/PBGP/PBGP--Analyses/Lists/PBGP--AllSamples--Article.labels ~/data/Pigeons/Analysis/PaleoMix_Re-Sequencing/ ~/data/Pigeons/Analysis/PaleoMix_GBS/ > ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--CoverageHeatMap/Stats_PBGP--Article--Ultra.txt"
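Next, we summed the coverage of each locus across all GBS samples (excluding WGS samples and blanks) and counted the loci with zero total coverage, i.e. loci not covered by any GBS sample:
grep -v "WGS" ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--CoverageHeatMap/Loci_Merged.coverage.tsv | grep -v "Blank" | tail -n +2 | cut -f 2- | awk '{for(i=1; i<=NF; i++)x[i]+=$i; if(NF>n)n=NF} END{for(i=1; i<=n; i++)print x[i]}' > ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--CoverageHeatMap/Loci_Merged.coverage.cutsitesmath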
awk '$1==0{cnt++} END{print cnt+0}' ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--CoverageHeatMap/Loci_Merged.coverage.cutsitesmath
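As a sanity check of the column-summing logic, here is a minimal sketch on a small hypothetical coverage table (sample names, locus names and values are invented for illustration). It assumes the layout implied by the commands above: a header row, then one row per sample with the sample name in column 1 and one locus per further column. The sums are printed with an explicit loop over column indices so they come out in column order, and the zero count falls back to 0 when no locus is uncovered.

```shell
#!/usr/bin/env bash
# Toy re-run of the locus-summing and zero-counting steps on a HYPOTHETICAL
# coverage table. Assumed layout: header row, then one row per sample;
# column 1 = sample name, each further column = coverage of one locus.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cov="$tmp/toy.coverage.tsv"

{
  printf 'Sample\tL1\tL2\tL3\tL4\n'
  printf 'GBS_A\t5\t0\t2\t0\n'
  printf 'GBS_B\t3\t0\t0\t0\n'
  printf 'WGS_C\t9\t9\t9\t9\n'    # dropped by grep -v "WGS"
  printf 'Blank_1\t0\t1\t0\t0\n'  # dropped by grep -v "Blank"
} > "$cov"

# Sum each locus column over the remaining GBS samples and print the sums in
# column order (a plain "for (i in x)" loop does not guarantee that order).
grep -v "WGS" "$cov" | grep -v "Blank" | tail -n +2 | cut -f 2- |
  awk '{ for (i = 1; i <= NF; i++) x[i] += $i; if (NF > n) n = NF }
       END { for (i = 1; i <= n; i++) print x[i] }' > "$tmp/toy.cutsitesmath"

# Count loci whose total coverage is zero; "cnt + 0" prints 0 rather than an
# empty line when there are none.
zero_loci=$(awk '$1 == 0 { cnt++ } END { print cnt + 0 }' "$tmp/toy.cutsitesmath")

echo "Per-locus totals: $(paste -s -d , "$tmp/toy.cutsitesmath")"
echo "Loci with zero coverage: $zero_loci"
```

Only GBS_A and GBS_B are summed here, so the script should report per-locus totals of 8,0,2,0 and 2 loci with zero coverage.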
We manually created a list of samples to be excluded (6 bad GBS samples and 2 blanks, all highlighted on the coverage heatmap):
~/data/Pigeons/PBGP/PBGP--Analyses/Lists/PBGP--BadSamples--Article.list
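As a hedged illustration only, assuming the exclusion list and a sample-label file both hold one sample name per line, listed samples could be dropped with a whole-line, fixed-string `grep -v`. The file names and sample names below are hypothetical.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL example: remove the samples named in an exclusion list from a
# sample-label file. Both files are assumed to hold one sample name per line.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' GBS_A GBS_B GBS_C Blank_1 > "$tmp/all.labels"  # all samples
printf '%s\n' GBS_B Blank_1 > "$tmp/bad.list"                 # samples to exclude

# -F: patterns are fixed strings, -x: match whole lines only,
# -f: read patterns from the exclusion list, -v: keep non-matching lines.
grep -v -x -F -f "$tmp/bad.list" "$tmp/all.labels" > "$tmp/kept.labels"

cat "$tmp/kept.labels"
```

With these toy files only GBS_A and GBS_C are kept. The `-x` and `-F` flags avoid partial matches, e.g. a bad sample named GBS_1 would otherwise also remove GBS_10.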
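We also listed the reference sequences longer than 1 kb, taking the sequence lengths from column 2 of the reference's FASTA index (.fai) and appending a colon to each sequence name:
awk '$2 > 1000 {print $1":"}' ~/data/Pigeons/Reference/DanishTumbler_Dovetail_ReRun.fasta.fai > ~/data/Pigeons/Reference/DanishTumbler_Dovetail_ReRun_ChrGreater1kb.id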
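A toy run of the same length filter on a hypothetical four-sequence index shows which names are kept; the threshold is strict, so a sequence of exactly 1,000 bp is dropped. The sequence names are invented, and the offset columns are only there to mimic the .fai layout.

```shell
#!/usr/bin/env bash
# Toy check of the scaffold-length filter on a HYPOTHETICAL .fai index.
# A samtools faidx index is tab-separated: NAME, LENGTH, OFFSET, LINEBASES,
# LINEWIDTH; only NAME ($1) and LENGTH ($2) are used here.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

{
  printf 'chr1\t250000\t6\t60\t61\n'
  printf 'scaf_a\t1000\t254181\t60\t61\n'   # exactly 1 kb: dropped (> is strict)
  printf 'scaf_b\t1001\t255206\t60\t61\n'
  printf 'scaf_c\t800\t256232\t60\t61\n'
} > "$tmp/toy.fasta.fai"

# Keep sequences longer than 1,000 bp and append ":" to each name,
# exactly as the command above does.
awk '$2 > 1000 {print $1":"}' "$tmp/toy.fasta.fai" > "$tmp/toy_ChrGreater1kb.id"

cat "$tmp/toy_ChrGreater1kb.id"
```

This should print `chr1:` and `scaf_b:`.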