This tool provides fast all-in-one preprocessing for FastQ files. It is written in C++ and supports multithreading for high performance. It has the following features:
- filter out bad reads (too low quality, too short, or too many N...)
- trim all reads at the front and tail
- cut low-quality bases from each read at its 5' and 3' ends by evaluating the mean quality in a sliding window (like Trimmomatic but faster).
- cut adapters (automatic for paired-end data; for single-end data the adapter sequence must be provided).
- report results in JSON format for further interpretation.
- visualize quality control and filtering results on a single HTML page (like FASTQC but faster and more informative).
- split the output into multiple files (0001.R1.gz, 0002.R1.gz...) to support parallel processing.
- ...
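The sliding-window cutting listed above can be sketched as follows. This is a minimal illustration, not fastp's actual C++ implementation: the function names are hypothetical, the window is assumed to advance one base at a time, and reads are assumed to use phred33 quality encoding. The defaults mirror the `--cut_window_size` (4) and `--cut_mean_quality` (20) options.

```python
def phred33_scores(quality_string):
    """Decode a phred33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def cut_front_by_quality(seq, qual, window_size=4, mean_quality=20):
    """Drop 5' bases until a window with mean quality >= mean_quality is found.

    Hypothetical helper: stepping by one base is an assumption of this sketch.
    """
    scores = phred33_scores(qual)
    start = 0
    while start + window_size <= len(scores):
        window = scores[start:start + window_size]
        if sum(window) / window_size >= mean_quality:
            break  # first acceptable window: keep everything from here on
        start += 1
    else:
        # No full window reached the threshold, so the whole read is cut.
        start = len(scores)
    return seq[start:], qual[start:]


def cut_tail_by_quality(seq, qual, window_size=4, mean_quality=20):
    """Mirror of cut_front_by_quality for the 3' end (reverse, cut, reverse back)."""
    kept_seq, kept_qual = cut_front_by_quality(
        seq[::-1], qual[::-1], window_size, mean_quality
    )
    return kept_seq[::-1], kept_qual[::-1]
```

Because the decision is based on the window mean, a few low-quality bases can survive next to high-quality neighbours; in this sketch a read shorter than the window is cut entirely, which is a simplification.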
This tool is under active development, and new features can be implemented soon if they are considered useful. If you have any additional requirements for fastp, please file an issue: https://github.com/OpenGene/fastp/issues/new
fastp creates reports in both HTML and JSON format.
- HTML report: http://opengene.org/fastp/fastp.html
- JSON report: http://opengene.org/fastp/fastp.json
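Since the JSON report is meant for further interpretation, a script can load it with a standard JSON parser. The sketch below assumes only that the report is a JSON object at the top level; it does not assume any particular field names, and the helper names are hypothetical. The default path `fastp.json` is the default of the `--json` option.

```python
import json


def load_report(path="fastp.json"):
    """Load a fastp JSON report from disk and return the parsed object."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def top_level_sections(report):
    """Return the sorted top-level keys (assumes the report is a JSON object)."""
    return sorted(report.keys())
```

Calling `top_level_sections(load_report())` after a run is a quick way to see which sections a report contains before writing code against specific fields.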
Get the latest version
# download by http
https://github.com/OpenGene/fastp/archive/master.zip
# or clone by git
git clone https://github.com/OpenGene/fastp.git
Get the stable releases
https://github.com/OpenGene/fastp/releases/latest
fastp only depends on libz, which is usually available on Linux or Mac systems. If your system has no libz, install it first.
cd fastp
make
After the build is done, run
sudo make install
usage: fastp -i <in1> -o <out1> [-I <in2> -O <out2>] [options...]
options:
# I/O options
-i, --in1 read1 input file name (string)
-o, --out1 read1 output file name (string [=])
-I, --in2 read2 input file name (string [=])
-O, --out2 read2 output file name (string [=])
-6, --phred64 indicates the input is using phred64 scoring (it'll be converted to phred33, so the output will still be phred33)
-z, --compression compression level for gzip output (1 ~ 9). 1 is fastest, 9 is smallest, default is 2. (int [=2])
# adapter trimming options
-A, --disable_adapter_trimming adapter trimming is enabled by default. If this option is specified, adapter trimming is disabled
-a, --adapter_sequence for single end data, adapter sequence is required for adapter trimming (string [=])
# global trimming options
-f, --trim_front1 trimming how many bases in front for read1, default is 0 (int [=0])
-t, --trim_tail1 trimming how many bases in tail for read1, default is 0 (int [=0])
-F, --trim_front2 trimming how many bases in front for read2. If it's not specified, it will follow read1's settings (int [=0])
-T, --trim_tail2 trimming how many bases in tail for read2. If it's not specified, it will follow read1's settings (int [=0])
# per read cutting by quality options
-5, --cut_by_quality5 enable per read cutting by quality in front (5'), default is disabled (WARNING: this will interfere with deduplication for both PE/SE data)
-3, --cut_by_quality3 enable per read cutting by quality in tail (3'), default is disabled (WARNING: this will interfere with deduplication for SE data)
-W, --cut_window_size the size of the sliding window for sliding window trimming, default is 4 (int [=4])
-M, --cut_mean_quality the bases in the sliding window with mean quality below cut_mean_quality will be cut, default is Q20 (int [=20])
# quality filtering options
-Q, --disable_quality_filtering quality filtering is enabled by default. If this option is specified, quality filtering is disabled
-q, --qualified_quality_phred the quality value at which a base is qualified. Default 15 means phred quality >=Q15 is qualified. (int [=15])
-u, --unqualified_percent_limit what percentage of bases is allowed to be unqualified (0~100). Default 40 means 40% (int [=40])
-n, --n_base_limit if one read's number of N base is >n_base_limit, then this read/pair is discarded. Default is 5 (int [=5])
# length filtering options
-L, --disable_length_filtering length filtering is enabled by default. If this option is specified, length filtering is disabled
-l, --length_required reads shorter than length_required will be discarded. (int [=30])
# reporting options
-j, --json the json format report file name (string [=fastp.json])
-h, --html the html format report file name (string [=fastp.html])
# thread options
-w, --thread worker thread number, default is 3 (int [=3])
# output splitting options
-s, --split if this option is specified, the output will be split into multiple (--split) files (e.g. 0001.out.fq, 0002.out.fq...). (int [=0])
-d, --split_prefix_digits the digits for the slice number padding (1~10), default is 4, so the filename will be padded as 0001.xxx, 0 to disable padding (int [=4])
# help
-?, --help print this message
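The quality and length filtering options above can be read as a per-read rule, sketched below. This is an illustration under stated assumptions rather than fastp's actual code: the function name is hypothetical, quality strings are assumed to be phred33, the order of the checks is arbitrary, and a read with exactly `unqualified_percent_limit` percent unqualified bases is assumed to pass. Parameter names and defaults follow the option list.

```python
def passes_filters(seq, qual,
                   qualified_quality_phred=15,
                   unqualified_percent_limit=40,
                   n_base_limit=5,
                   length_required=30):
    """Return True if a read would be kept under the documented defaults.

    Hypothetical helper; check order and boundary handling are assumptions.
    """
    # Length filter: reads shorter than length_required are discarded.
    if len(seq) < length_required:
        return False
    # N filter: discard if the number of N bases is > n_base_limit.
    if seq.upper().count("N") > n_base_limit:
        return False
    # Quality filter: a base is qualified if its phred score is >= the threshold.
    scores = [ord(ch) - 33 for ch in qual]
    unqualified = sum(1 for score in scores if score < qualified_quality_phred)
    # Integer comparison of percentages avoids floating-point rounding.
    if unqualified * 100 > unqualified_percent_limit * len(scores):
        return False
    return True
```

For paired-end data the option text says the read/pair is discarded, so a caller would presumably apply such a rule to both mates and drop the pair if either fails; that pairing logic is not shown here.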