
5. Filtering For Chimeric Reads

George Pacheco edited this page Jul 21, 2021 · 1 revision

We noticed that some GBS reads appeared to be chimeric, meaning that a single read was actually the product of two or more biological cut-sites merged together. Although we cannot fully explain the biochemistry behind this methodological issue, we decided to be conservative and exclude all reads showing this signal.

Executes an initial PaleoMix run on the original demultiplexed GBS files so that the chimeric reads can be identified. We used the .yaml file and the command below:
xsbatch -c XXX --mem-per-cpu XXX -J PaleoMix --time XXX -- bam_pipeline run --jre-option "-XmxXXXg" --max-threads XXX --bwa-max-threads XXX --adapterremoval-max-threads XXX --destination ~/data/Pigeons/Analysis/PaleoMix_GBS_BEFORE-FILTEREDCHIMERAS/ ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--Final_PaleoMix_GBS_BEFORE-FILTEREDCHIMERAS.yaml
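Generates an ID file for each sample containing the reads that should be excluded. These are reads that carry a second (or further) cut-site and that mapped to two or more different regions. The command selects alignments whose CIGAR string shows soft or hard clipping (S or H) and whose sequence contains the motif ATGCAT. Because of --dryrun, parallel only prints the commands; they are saved to a file that is then passed to xsbatch: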
parallel --plus --keep-order --dryrun "samtools view {} | grep -v '^#' | awk '\$6~/[HS]/ && \$10~/ATGCAT/{print \$1}' | sort -u > $TMP_DIR/{/...}.Chimeras.id" ::: ~/data/Projects/PGP/AfterChrGenome/AfterChrGenome_PGP--Analyses/AfterChrGenome_PGP--PaleoMix_BeforeFilteredChimeras/*.bam > /groups/hologenomics/pacheco/PBG--AfterChrGenome_ToCreateChimeraIDs.txt
chmod 755 ./PBG--AfterChrGenome_ToCreateChimeraIDs.txt
xsbatch -c 1 -R --max-array-jobs 15 --mem-per-cpu 2500 --time 1-00 -- ./PBG--AfterChrGenome_ToCreateChimeraIDs.txt
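To make the selection rule concrete, the sketch below applies the same awk condition to a few made-up SAM records rather than to real BAM output. The function names chimera_ids and toy_sam, and every record in the toy data, are hypothetical and exist only for illustration; the actual pipeline reads from samtools view.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): reads SAM records on stdin and
# prints the unique names of reads whose CIGAR (column 6) contains soft/hard
# clipping and whose sequence (column 10) contains the motif ATGCAT.
chimera_ids() {
  awk -F'\t' '$6 ~ /[HS]/ && $10 ~ /ATGCAT/ {print $1}' | sort -u
}

# Made-up SAM records: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
toy_sam() {
  printf 'readA\t0\tchr1\t100\t60\t6M4S\t*\t0\t0\tCCATGCATGG\tIIIIIIIIII\n'    # clipped + motif
  printf 'readA\t2048\tchr2\t500\t60\t5S5M\t*\t0\t0\tATGCATCCGG\tIIIIIIIIII\n' # same read, second hit
  printf 'readB\t0\tchr1\t200\t60\t10M\t*\t0\t0\tCCATGCATGG\tIIIIIIIIII\n'     # motif but no clipping
  printf 'readC\t0\tchr3\t300\t60\t4S6M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n'    # clipping but no motif
}

toy_sam | chimera_ids
```

Only readA satisfies both conditions, and sort -u collapses its two alignments into a single ID, which is why each .Chimeras.id file lists every flagged read once.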
Excludes the identified reads using the software package QIIME--v1.9.1. A filtered .fastq file is created inside the folder of each original demultiplexed file. Note that the !(*Undetermined) pattern in the ls command relies on bash's extended globbing (shopt -s extglob).
module load blast/v2.2.26
module load qiime/v1.9.1
ls ~/data/Pigeons/GBS/FPGP_*/*_Demultiplexed_GBSX--v1.3/*_!(*Undetermined).fastq.gz | parallel --plus --keep-order --dryrun "zcat {} > {.} && filter_fasta.py -f {.} -o {..}.FilteredChimeras.fastq -s $TMP_DIR/{/...}-GBS.Chimeras.id -n && gzip --best {..}.FilteredChimeras.fastq && rm {.}" | xsbatch --mem-per-cpu XXX -R --max-array-jobs XXX -c 1 --time XXX --
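As a rough illustration of what negated ID filtering does to a FASTQ file, the sketch below drops every four-line record whose read name appears in an ID list. It is not QIIME's implementation and was not used in the pipeline; the function name drop_reads_by_id is hypothetical, and the sketch assumes strict four-line FASTQ records and one read ID per line in the ID file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not QIIME code):
#   $1    = file with one read ID per line (IDs to exclude)
#   stdin = uncompressed FASTQ with exactly four lines per record
drop_reads_by_id() {
  awk -v idfile="$1" '
    # Load the IDs to exclude before any FASTQ line is read.
    BEGIN { while ((getline line < idfile) > 0) drop[line] = 1 }
    # Header line of each record: strip the leading "@" from the first token
    # and decide whether the whole record is kept.
    NR % 4 == 1 { id = substr($1, 2); keep = !(id in drop) }
    # Print the current line only while inside a kept record.
    keep
  '
}

# Example call (paths are placeholders):
#   zcat sample.fastq.gz | drop_reads_by_id sample.Chimeras.id | gzip > sample.FilteredChimeras.fastq.gz
```

Counting lines by position (NR % 4) instead of matching on a leading "@" avoids misreading quality strings that happen to start with that character.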
