order_fastx.pl
order_fastx.pl is a script to order sequences in FASTA or FASTQ files.
- Synopsis
- Description
- Usage
- Options
- Output
- Run environment
- Author - contact
- Citation, installation, and license
- Changelog
perl order_fastx.pl -i infile.fasta -l order_id_list.txt > ordered.fasta
Order sequence entries in FASTA or FASTQ sequence files according to an ID list with a given order. Note that each ID in the order list has to be identical to the entire ID in the sequence file; partial IDs will not match.
However, the leading ">" (FASTA) or "@" (FASTQ) ID identifier can be omitted in the ID list.
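The ordering logic described above can be sketched in a few lines of Python. This is only an illustration, not the script's Perl implementation: the function names `read_fasta` and `order_records` are hypothetical, and the sketch assumes FASTA input in which the complete header line (minus the ">") serves as the ID.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict: full header (without '>') -> sequence."""
    records = {}
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None and line:
            records[header].append(line)
    # Join multi-line sequences into a single string per record
    return {h: "".join(parts) for h, parts in records.items()}


def order_records(records, id_list):
    """Return (header, sequence) pairs in the order given by id_list.

    A leading '>' or '@' in a list entry is ignored. IDs without a matching
    record are silently skipped in this sketch.
    """
    ordered = []
    for entry in id_list:
        seq_id = entry.strip()
        if seq_id[:1] in (">", "@"):
            seq_id = seq_id[1:]
        if seq_id in records:
            ordered.append((seq_id, records[seq_id]))
    return ordered


fasta = [">seq1 first entry", "ACGT", "TTGA", ">seq2", "GGCC"]
records = read_fasta(fasta)
for header, seq in order_records(records, [">seq2", "seq1 first entry"]):
    print(f">{header}\n{seq}")
```

Note that a list entry of just `seq1` would not match the record `seq1 first entry` here, which mirrors the requirement that IDs be identical to the entire ID.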
The file type is detected automatically, but you can set it
manually with option -f. FASTQ format assumes four lines
per read; if this is not the case, run the FASTQ file through
fastx_fix.pl or use Heng Li's seqtk seq:
seqtk seq -l 0 infile.fq > outfile.fq
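To illustrate why the four-line layout matters, here is a minimal Python sketch that guesses the file type from the first character and reads FASTQ in fixed blocks of four lines. It is not the script's actual Perl code: `detect_file_type` and `read_fastq` are hypothetical names, and detection by first character is an assumption about how automatic detection could work.

```python
def detect_file_type(first_line):
    """Guess the format from the first character of the first line (assumption)."""
    if first_line.startswith(">"):
        return "fasta"
    if first_line.startswith("@"):
        return "fastq"
    raise ValueError("Unrecognized format: expected '>' or '@' at file start")


def read_fastq(lines):
    """Parse FASTQ lines, assuming exactly four lines per read.

    Returns a dict: full header (without '@') -> (sequence, quality).
    """
    lines = [line.rstrip("\n") for line in lines]
    # Ignore trailing empty lines at the end of the file
    while lines and lines[-1] == "":
        lines.pop()
    if len(lines) % 4 != 0:
        raise ValueError("Line count is not a multiple of four; reformat the file first")
    reads = {}
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed FASTQ record starting at line {i + 1}")
        reads[header[1:]] = (seq, qual)
    return reads


print(detect_file_type("@read1"))
print(read_fastq(["@read1", "ACGT", "+", "IIII"]))
```

A FASTQ file with sequences wrapped over several lines breaks the fixed four-line stride, which is why such files need to be reformatted before ordering.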
The script can also be used to pull a subset of sequences in the ID
list from the sequence file. In this case it is probably best to set
option flag -s (see Options below). However, filter_fastx.pl is the
better choice for this task.
perl order_fastx.pl -i infile.fq -l order_id_list.txt -s -f fastq > ordered.fq
perl order_fastx.pl -i infile.fasta -l order_id_list.txt -e > ordered.fasta
- -i, -input: Input FASTA or FASTQ file
- -l, -list: List with sequence IDs in specified order
- -h, -help: Help (perldoc POD)
- -f, -file_type: Set the file type manually [fasta|fastq]
- -e, -error_files: Write IDs that are present in only one of the two inputs (sequence file or order ID list) to error files instead of STDERR (see Output below)
- -s, -skip_errors: Skip missing ID error statements, excludes option -e
- -v, -version: Print version number to STDERR
- STDOUT: The newly ordered sequences are printed to STDOUT. Redirect or pipe into another tool as needed.
- (order_ids_missing.txt): Written with option -e if IDs in the order list are missing in the sequence file
- (seq_ids_missing.txt): Written with option -e if IDs in the sequence file are missing in the order ID list
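The two error files correspond to the two directions of the ID comparison. A minimal Python sketch of that comparison follows; `find_missing_ids` is a hypothetical helper, not part of the script, and the example IDs are made up.

```python
def find_missing_ids(seq_ids, order_ids):
    """Compare sequence-file IDs with order-list IDs in both directions.

    Returns (order_ids_missing, seq_ids_missing), each in input order:
    - order_ids_missing: IDs in the order list that are absent from the sequence file
    - seq_ids_missing: IDs in the sequence file that are absent from the order list
    """
    seq_set = set(seq_ids)
    order_set = set(order_ids)
    order_ids_missing = [i for i in order_ids if i not in seq_set]
    seq_ids_missing = [i for i in seq_ids if i not in order_set]
    return order_ids_missing, seq_ids_missing


order_missing, seq_missing = find_missing_ids(
    seq_ids=["seq1", "seq2", "seq3"],
    order_ids=["seq3", "seq9", "seq1"],
)
print(order_missing)  # candidates for order_ids_missing.txt
print(seq_missing)    # candidates for seq_ids_missing.txt
```

In this example `seq9` appears only in the order list and `seq2` only in the sequence file, so each would end up in its respective error file.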
The Perl script runs under Windows and UNIX flavors.
Andreas Leimbach (aleimba[at]gmx[dot]de; Microbial Genome Plasticity, Institute of Hygiene, University of Muenster)
For citation, installation, and license information please see the repository main README.md.
- v0.1 (20.11.2014)