trunc_seq.pl is a script to truncate sequence files.
- Synopsis
- Description
- Usage
- Options
- Output
- Run environment
- Dependencies
- Author - contact
- Citation, installation, and license
- Changelog
Synopsis
perl trunc_seq.pl 20 3500 seq-file.embl > seq-file_trunc_20_3500.embl
or
perl trunc_seq.pl file_of_filenames_and_coords.tsv
Description
This script truncates sequence files according to the given coordinates. The features/annotations in RichSeq files (e.g. EMBL or GenBank format) are adjusted accordingly. Use option -o to specify a different output sequence format. Input can be given directly as truncation coordinates plus a sequence file: the start position as the first argument, the stop position as the second, and (the path to) the sequence file as the third. In this case, the truncated sequence entry is printed to STDOUT. Input sequence files should contain only one sequence entry; if a multi-sequence file is used as input, only the first sequence entry is truncated.
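The coordinate handling described above can be illustrated with a minimal pure-Perl sketch. It assumes 1-based, inclusive start and stop positions (the EMBL/GenBank convention) and uses two hypothetical helpers, `trunc_seq_string` and `shift_feature`, which are not part of trunc_seq.pl. The script itself relies on BioPerl, so the way it treats features that cross a truncation boundary may differ from the simple clipping shown here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of trunc_seq.pl): return the subsequence
# between $start and $stop, using 1-based, inclusive coordinates.
sub trunc_seq_string {
    my ($seq, $start, $stop) = @_;
    die "Invalid coordinates: $start..$stop\n"
        if $start < 1 || $stop < $start || $stop > length($seq);
    return substr($seq, $start - 1, $stop - $start + 1);
}

# Hypothetical helper: map a feature location onto the truncated sequence.
# Returns an empty list if the feature lies completely outside the window;
# features overlapping a boundary are simply clipped in this sketch.
sub shift_feature {
    my ($f_start, $f_end, $start, $stop) = @_;
    return () if $f_end < $start || $f_start > $stop;
    my $new_start = ($f_start < $start ? $start : $f_start) - $start + 1;
    my $new_end   = ($f_end   > $stop  ? $stop  : $f_end)   - $start + 1;
    return ($new_start, $new_end);
}

my $trunc = trunc_seq_string('AACCGGTTAA', 3, 8);
print "$trunc\n";                                     # CCGGTT
print join('..', shift_feature(5, 12, 3, 8)), "\n";   # 3..6
```

Truncating to 3..8 keeps 'CCGGTT', and a feature spanning 5..12 of the original sequence ends up at 3..6 of the truncated one, clipped at the new end.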
Alternatively, a file of filenames (fof) listing the coordinates and sequence files in the following tab-separated format can be given to the script (the header line is optional):
#start stop seq-file
300 9000 (path/to/)seq-file
50 1300 (path/to/)seq-file2
With a fof, the truncated sequence files are written into a results directory. Use option -r to specify a results directory other than the default.
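Reading such a fof might look like the following sketch. The helper `parse_fof` is hypothetical and not taken from trunc_seq.pl: it skips empty lines and lines starting with '#' (which covers the optional header), drops the input path from the sequence file name, and builds output names following the `seq-file_trunc_start_stop.format` pattern. Stripping the input file extension before appending the output format is an assumption of this sketch.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Hypothetical helper (not part of trunc_seq.pl): read a tab-separated fof
# (start, stop, seq-file) and plan one truncation job per entry.
sub parse_fof {
    my ($fh, $result_dir, $format) = @_;
    my @jobs;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/ || $line =~ /^#/;    # skip empty lines and the optional '#' header
        my ($start, $stop, $seq_file) = split /\t/, $line;
        # Drop the input path; stripping the extension is an assumption of this sketch
        my ($base) = fileparse($seq_file, qr/\.[^.]*/);
        my $out = File::Spec->catfile($result_dir, "${base}_trunc_${start}_${stop}.$format");
        push @jobs, { start => $start, stop => $stop, in => $seq_file, out => $out };
    }
    return @jobs;
}

# In-memory example fof with a header, a path, and an empty line
my $fof = "#start\tstop\tseq-file\n300\t9000\tpath/to/seq-file.embl\n\n50\t1300\tseq-file2.embl\n";
open my $fh, '<', \$fof or die "Cannot open in-memory fof: $!\n";
print "$_->{in} -> $_->{out}\n" for parse_fof($fh, './trunc_seq_results', 'embl');
```

With './trunc_seq_results' as the result directory, the first entry maps to a file named seq-file_trunc_300_9000.embl inside that directory, without the original path/to/ prefix.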
It is also possible to truncate a RichSeq sequence file loaded into the Artemis genome browser from the Sanger Institute: select a subsequence and then go to 'Edit -> Subsequence (and Features)'.
Usage
perl trunc_seq.pl -o gbk 120 30000 seq-file.embl > seq-file_trunc_120_30000.gbk
or
perl trunc_seq.pl -o fasta 5300 18500 seq-file.gbk | perl revcom_seq.pl -i fasta > seq-file_trunc_revcom.fasta
or
perl trunc_seq.pl -r path/to/trunc_embl_dir -o embl file_of_filenames_and_coords.tsv
Options
- -h, -help
  Help (perldoc POD)
- -o=str, -outformat=str
  Specify a different sequence format for the output (files) [fasta, embl, or gbk]
- -r=str, -result_dir=str
  Path to the result folder for fof input [default = './trunc_seq_results']
- -v, -version
  Print version number to STDOUT
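The changelog notes that options are handled with Getopt::Long. The sketch below shows one way the documented options and the two input modes (three positional arguments versus a single fof) could be parsed. `parse_args` is a hypothetical helper; the option names and the default result directory come from the list above, while the error messages and the returned hash layout are assumptions.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper (not part of trunc_seq.pl): parse the documented options
# and decide between single-file and fof input. The returned hash layout and
# the error messages are assumptions made for this sketch.
sub parse_args {
    my @args = @_;
    my %opt = (result_dir => './trunc_seq_results');    # documented default

    # Getopt::Long accepts single-dash long options such as -outformat=gbk
    GetOptionsFromArray(\@args,
        'help|h'         => \$opt{help},
        'outformat|o=s'  => \$opt{outformat},
        'result_dir|r=s' => \$opt{result_dir},
        'version|v'      => \$opt{version},
    ) or die "Invalid option(s)\n";

    return \%opt if $opt{help} || $opt{version};    # no input needed

    if (defined $opt{outformat} && $opt{outformat} !~ /^(?:fasta|embl|gbk)$/) {
        die "Unsupported output format '$opt{outformat}'\n";
    }

    if (@args == 3) {        # start, stop, seq-file -> output to STDOUT
        @opt{qw(mode start stop seq_file)} = ('single', @args);
    } elsif (@args == 1) {   # fof -> output into the result directory
        @opt{qw(mode fof)} = ('fof', $args[0]);
    } else {
        die "Usage: trunc_seq.pl [options] <start> <stop> <seq-file> | <fof>\n";
    }
    return \%opt;
}

my $opt = parse_args('-o', 'gbk', '120', '30000', 'seq-file.embl');
print "$opt->{mode}: $opt->{start}..$opt->{stop} of $opt->{seq_file} as $opt->{outformat}\n";
```

Similarly, `parse_args('-r', 'path/to/trunc_embl_dir', 'fof.tsv')` would select fof mode with the given result directory, mirroring the last usage example.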
Output
- STDOUT
  If a single sequence file is given to the script, the truncated sequence is printed to STDOUT. Redirect or pipe it into another tool as needed.
or
- ./trunc_seq_results
  If a fof is given to the script, all output files are stored in a results folder.
- ./trunc_seq_results/seq-file_trunc_start_stop.format
  Truncated output sequence files are named after the input file, with 'trunc' and the corresponding start and stop positions appended.
Run environment
The Perl script runs under Windows and UNIX flavors.
Dependencies
- BioPerl (tested version 1.007001)
Author - contact
Andreas Leimbach (aleimba[at]gmx[dot]de; Microbial Genome Plasticity, Institute of Hygiene, University of Muenster)
Citation, installation, and license
For citation, installation, and license information please see the repository main README.md.
Changelog
- v0.2 (2015-12-07)
  - Merged functionality of trunc_seq.pl and run_trunc_seq.pl into one single script
  - Now allows single file and file of filenames (fof) with coordinates input
  - output for single file input now printed to STDOUT
  - output for fof input printed into files in a result directory, new option -r to specify result directory
  - included a POD instead of a simple usage text
  - included pod2usage with Pod::Usage
  - included 'use autodie' pragma
  - options with Getopt::Long
  - output format now specified with option -o
  - included version switch, -v
  - fixed bug to remove input file paths from fof input for the output files
  - skip empty or comment lines (/^#/) in fof input
  - check and warn if input seq file has more than one seq entry
- v0.1 (2013-02-08)
  - In v0.1, trunc_seq.pl was only for single sequence input, but an additional wrapper script, run_trunc_seq.pl, was included for fof input