Skip to content

6. GBS Sexing

George Pacheco edited this page Jul 27, 2021 · 2 revisions

We took advantage of the improved Pigeon Genome Cliv_2.0 to try to sex our pigeons based on differences of coverage regarding the Z chromosome and an autosomal chromosome of similar size.

Generates a .fasta file including just the Z chromosome:
samtools faidx GCA_001887795.1_colLiv2_genomic.fasta CM007524.1 > GCA_001887795.1_colLiv2_genomic_Z.fasta
Creates a .bed fie with our enzyme cutsites from the above .fasta file:
$SCRIPTS/appz/p5-bpwrapper/bin/bioseq --restrict-coord EcoT22I ~/data/Pigeons/Reference/GCA_001887795.1_colLiv2_genomic_Z.fasta > ~/data/Pigeons/Reference/GCA_001887795.1_colLiv2_genomic_Z.bed
Number of CUTSITES: 14,927
Maps this .fasta file against our reference genome:
xsbatch -c 15 --mem-per-cpu 2000 -J Sexing --time 1-00 -- "blastn -num_threads 15 -query ~/data/Pigeons/Reference/GCA_001887795.1_colLiv2_genomic_Z.fasta -db ~/data/Pigeons/Reference/DanishTumbler_Dovetail_ReRun.fasta -evalue 1e-5 -perc_identity 95 -outfmt '7 std slen sstrand' > ~/data/Pigeons/Reference/GCA_001887795.1_colLiv2_genomic_Z--DanishTumbler_Dovetail_ReRun.tsv"
Selects those regions that mapped well enough and create a new .bed file out of this selection:
awk '!/#/ && $13>10000 && $4>=1000 && $5/$4<0.01' ~/data/Pigeons/Reference/GCA_001887795.1_colLiv2_genomic_Z--DanishTumbler_Dovetail_ReRun.tsv | sort -k 2,2 -k 9,9g -k 10,10g | awk '{start=$9; end=$10} $14=="minus"{start=$10; end=$9} {print $2"\t"start"\t"end"\t"$13}' | awk '{x[$1]+=$3-$2+1; s[$1]=$4} END{for(i in x) print i"\t"x[i]"\t"x[i]/s[i]}' | sort -k 3,3gr | awk 'NR>1 && $3>0.9{print $1}' | fgrep -w -f - ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I_Extended_Merged_RemovedBadLoci-PossibleParalogs-g650.bed > ~/data/Pigeons/Reference/GCA_001887795.1_colLiv2_genomic_Z--DanishTumbler_Dovetail_ReRun.bed
Generates a .fasta file including just the 6 chromosome:
samtools faidx GCA_001887795.1_colLiv2_genomic.fasta CM007530.1 > GCA_001887795.1_colLiv2_genomic_6.fasta
Creates a .bed fie with our enzyme cutsites from the .fasta file above:
$SCRIPTS/appz/p5-bpwrapper/bin/bioseq --restrict-coord EcoT22I ~/data/Pigeons/Reference/GCA_001887795.1_colLiv2_genomic_6.fasta > ~/data/Pigeons/Reference/GCA_001887795.1_colLiv2_genomic_6.bed
Number of CUTSITES: 13.348
Maps this .fasta file against our reference genome:
xsbatch -c 8 --mem-per-cpu 1024 -J Chr-6 --time 1-00 -- "blastn -num_threads 8 -query ~/data/Pigeons/Reference/GCA_001887795.1_colLiv2_genomic_6.fasta -db ~/data/Pigeons/Reference/DanishTumbler_Dovetail_ReRun.fasta -evalue 1e-5 -perc_identity 95 -outfmt '7 std slen sstrand' > ~/data/Pigeons/Reference/GCA_001887795.1_colLiv2_genomic_6--DanishTumbler_Dovetail_ReRun.tsv"
Selects those regions that mapped well enough and create a new .bed file out of this selection:
awk '!/#/ && $13>10000 && $4>=1000 && $5/$4<0.01' ~/data/Pigeons/Reference/GCA_001887795.1_colLiv2_genomic_6--DanishTumbler_Dovetail_ReRun.tsv | sort -k 2,2 -k 9,9g -k 10,10g | awk '{start=$9; end=$10} $14=="minus"{start=$10; end=$9} {print $2"\t"start"\t"end"\t"$13}' | awk '{x[$1]+=$3-$2+1; s[$1]=$4} END{for(i in x) print i"\t"x[i]"\t"x[i]/s[i]}' | sort -k 3,3gr | awk 'NR>1 && $3>0.9{print $1}' | fgrep -w -f - ~/data/Pigeons/Reference/PBGP_FinalRun.EcoT22I_Extended_Merged_RemovedBadLoci-PossibleParalogs-g650.bed > ~/data/Pigeons/Reference/GCA_001887795.1_colLiv2_genomic_6--DanishTumbler_Dovetail_ReRun.bed
Finally, we used these two .bed files to compare the coverage values obtained.

Clone this wiki locally