5. Filtering For Chimeric Reads

We noticed that some GBS reads appeared to be chimeric, meaning that a single read was the product of two or more biological cut-sites merged together. Although we cannot fully explain the biochemistry behind this artifact, we took a conservative approach and excluded every read that showed this signal.

Executes an initial PaleoMix run on the original demultiplexed GBS files so that the chimeric reads can be identified. We used the following command, which takes the `.yaml` file as its last argument:

xsbatch -c XXX --mem-per-cpu XXX -J PaleoMix --time XXX -- bam_pipeline run --jre-option "-XmxXXXg" --max-threads XXX --bwa-max-threads XXX --adapterremoval-max-threads XXX --destination ~/data/Pigeons/Analysis/PaleoMix_GBS_BEFORE-FILTEREDCHIMERAS/ ~/data/Pigeons/PBGP/PBGP--Analyses/PBGP--Final_PaleoMix_GBS_BEFORE-FILTEREDCHIMERAS.yaml

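Generates an ID file for each sample containing the reads that should be excluded. These are reads that carry a second (or further) cut-site and that mapped to two or more different regions. The command below selects them as reads whose CIGAR string shows clipping (`H` or `S`) and whose sequence contains `ATGCAT`: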

parallel --plus --keep-order --dryrun "samtools view {} | grep -v '^#' | awk '\$6~/[HS]/ && \$10~/ATGCAT/{print \$1}' | sort -u > $TMP_DIR/{/...}.Chimeras.id" ::: ~/data/Projects/PGP/AfterChrGenome/AfterChrGenome_PGP--Analyses/AfterChrGenome_PGP--PaleoMix_BeforeFilteredChimeras/*.bam > /groups/hologenomics/pacheco/PBG--AfterChrGenome_ToCreateChimeraIDs.txt

chmod 755 ./PBG--AfterChrGenome_ToCreateChimeraIDs.txt
xsbatch -c 1 -R --max-array-jobs 15 --mem-per-cpu 2500 --time 1-00 -- ./PBG--AfterChrGenome_ToCreateChimeraIDs.txt
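To make the selection rule concrete, here is a minimal sketch that applies the same `awk` test to a handful of toy SAM records instead of real `samtools view` output. The read names, positions and sequences are invented for illustration; only the filter expression is taken from the command above.

```shell
# Toy SAM records (tab-separated); all values are invented for illustration.
# Columns: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
toy_sam=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  readA 0    chr1 100 60 20M    '*' 0 0 GGCTAGCTAGGATCCGATTA IIIIIIIIIIIIIIIIIIII \
  readB 0    chr1 200 60 12M8S  '*' 0 0 GGCTAGCTAGCCATGCATTA IIIIIIIIIIIIIIIIIIII \
  readB 2048 chr2 500 60 12H8M  '*' 0 0 ATGCATTA             IIIIIIII \
  readC 0    chr1 300 60 20M    '*' 0 0 TTATGCATGGCCTAGGCTAA IIIIIIIIIIIIIIIIIIII \
  readD 0    chr1 400 60 10S10M '*' 0 0 GGCTAGCTAGGATCCGATTA IIIIIIIIIIIIIIIIIIII)

# Same test as in the pipeline command: clipped CIGAR (field 6) AND the
# cut-site motif in the sequence (field 10). sort -u collapses reads that
# appear in more than one alignment record.
chimera_ids=$(printf '%s\n' "$toy_sam" \
  | awk '$6 ~ /[HS]/ && $10 ~ /ATGCAT/ { print $1 }' \
  | sort -u)

echo "$chimera_ids"   # prints: readB
```

Only `readB` is reported: `readC` contains the motif but is not clipped, and `readD` is clipped but lacks the motif. Note that the test matches `ATGCAT` anywhere in the stored sequence, not only in the clipped portion.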

Excludes the identified reads using the software package QIIME--v1.9.1. A filtered `.fastq` file (gzipped at the end of the command) is created in the same folder as each original demultiplexed file.

module load blast/v2.2.26
module load qiime/v1.9.1

shopt -s extglob  # bash only: enables the !(...) negation pattern used in the next command
ls ~/data/Pigeons/GBS/FPGP_*/*_Demultiplexed_GBSX--v1.3/*_!(*Undetermined).fastq.gz | parallel --plus --keep-order --dryrun "zcat {} > {.} && filter_fasta.py -f {.} -o {..}.FilteredChimeras.fastq -s $TMP_DIR/{/...}-GBS.Chimeras.id -n && gzip --best {..}.FilteredChimeras.fastq && rm {.}" | xsbatch --mem-per-cpu XXX -R --max-array-jobs XXX -c 1 --time XXX --
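The effect of the exclusion step can be illustrated on toy data. The sketch below is not the QIIME command: it is a plain `awk` stand-in that drops every 4-line FASTQ record whose ID appears in an ID file. It assumes 4-line records and that IDs are compared on the header text before the first space. The file names and reads are hypothetical.

```shell
# Hypothetical toy inputs in a scratch directory; names are illustrative only.
work=$(mktemp -d)
printf '%s\n' readB > "$work/toy.Chimeras.id"
printf '%s\n' \
  '@readA extra' GGCT + IIII \
  '@readB extra' ATGC + IIII \
  '@readC'       TTAA + IIII > "$work/toy.fastq"

# First file: load the IDs to drop. Second file: at each header line
# (every 4th line starting from line 1) decide whether to keep the record,
# then print all four lines of kept records.
awk 'NR == FNR    { drop[$1]; next }
     FNR % 4 == 1 { id = substr($1, 2); keep = !(id in drop) }
     keep' "$work/toy.Chimeras.id" "$work/toy.fastq" \
  > "$work/toy.FilteredChimeras.fastq"

# List the IDs that survived the filter.
kept_ids=$(awk 'NR % 4 == 1 { print substr($1, 2) }' \
  "$work/toy.FilteredChimeras.fastq" | paste -sd' ' -)

echo "$kept_ids"   # prints: readA readC
```

A similar count of header lines before and after filtering can serve as a quick sanity check on the real output files. The sketch relies on the ID file being non-empty, since an empty first file would defeat the `NR == FNR` test.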
