SamBuckberry/pysickle
Folders and files
| Name | Name | Last commit date | ||
|---|---|---|---|---|
Repository files navigation
Pysickle is a python implementation of sickle (https://github.com/najoshi/sickle) NB. PLEASE NOTE THAT THIS CODE IS CURRENTLY NOT WORKING. THERE IS A BUG IN THE MOVING-AVERAGE FILTER THAT REJECTS EVERY SEQUENCE. Pysickle takes files in the FASTQ format and uses a sliding windows and a predetermined threshold quality value to determine 5' and 3' trimming. Pysickle takes an input fastq file and outputs a trimmed version of that file. It also has options to change the length and quality thresholds for trimming, as well as disabling 5'-trimming and enabling removal of sequences with Ns. Pysickle currently supports three types of quality values: Illumina, Solexa, and Sanger. 1) "fastq" means to Sanger style FASTQ files using PHRED scores and an ASCII offset of 33 (e.g. from the NCBI Short Read Archive and Illumina 1.8+). These can potentially hold PHRED scores from 0 to 93. 2) "fastq-sanger" is an alias for "fastq". 3) "fastq-solexa" means old Solexa (and also very early Illumina) style FASTQ files, using Solexa scores with an ASCII offset 64. These can hold Solexa scores from -5 to 62. 4) "fastq-illumina" means newer Illumina 1.3 to 1.7 style FASTQ files, using PHRED scores but with an ASCII offset 64, allowing PHRED scores from 0 to 62. To run Pysickle from the working directory $ ./pysickle.py ./test/test.fastq