TAMA Merge
TAMA Merge is a tool that allows you to merge multiple transcriptomes while maintaining source information.
Detailed explanation of TAMA Merge:
Are you interested in:
1. Combining your Iso-Seq data from different tissue types/library preps into a single transcriptome.
2. Comparing your Iso-Seq data to the reference annotation (or short read RNAseq annotation).
3. Combining your Iso-Seq data with a short read RNAseq annotation and with the reference annotation.
4. Doing any of the above while still maintaining source information.
5. Doing any of the above with the power to define merging parameters.
If so, TAMA Merge is probably what you are looking for.
TAMA Merge takes multiple transcriptomes in bed12 format as input. It then compares the transcript models from each transcriptome and merges models based on the similarity of their features (transcription start/end sites and exon start/end sites). The output is a merged transcriptome in bed12 format, along with other files containing source information.
Manual
usage: tama_merge.py [-h] [-f F] [-p P] [-e E] [-a A] [-m M] [-z Z]
This script merges transcriptomes.
optional arguments:
-h, --help  show this help message and exit
-f F        File list
-p P        Output prefix
-e E        Collapse exon ends flag: common_ends or longest_ends (Default is common_ends)
-a A        5 prime threshold (Default is 10)
-m M        Exon ends threshold/ splice junction threshold (Default is 10)
-z Z        3 prime threshold (Default is 10)
Default command would look like this:
python tama_merge.py -f filelist.txt -p merged_annos
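When the merge is launched from a pipeline script, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of TAMA itself: `build_merge_command` is a hypothetical helper name, and it only uses the flags listed in the manual above, with the documented defaults spelled out explicitly.

```python
def build_merge_command(filelist, prefix, exon_ends="common_ends",
                        five_prime=10, splice_junction=10, three_prime=10):
    """Assemble an argument list for tama_merge.py (hypothetical helper).

    Defaults mirror the values documented in the manual.
    """
    if exon_ends not in ("common_ends", "longest_ends"):
        raise ValueError("exon_ends must be 'common_ends' or 'longest_ends'")
    return [
        "python", "tama_merge.py",
        "-f", filelist,              # file list
        "-p", prefix,                # output prefix
        "-e", exon_ends,             # collapse exon ends flag
        "-a", str(five_prime),       # 5 prime threshold
        "-m", str(splice_junction),  # exon ends / splice junction threshold
        "-z", str(three_prime),      # 3 prime threshold
    ]


default_command = build_merge_command("filelist.txt", "merged_annos")
```

The resulting list can be passed to `subprocess.run(default_command, check=True)` from the directory that contains tama_merge.py.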
Detailed explanation of arguments:
-f filelist.txt
The filelist file contains the name of the files you want to merge as well as some additional information. The format for the file should be like this (tab separated, do not include header):
file_name	cap_flag	merge_priority(start,junctions,end)	source_name
annotation_capped.bed	capped	1,1,1	cap_lib
annotation_nocap.bed	no_cap	2,1,1	nocap_lib
"cap_flag" can be one of two options: "capped" or "no_cap". It indicates whether the transcriptome's start sites should be trusted ("capped") or whether its transcripts may be merged into longer matching transcripts ("no_cap").
"merge_priority" designates the rank of the information from each source with respect to start sites, splice junctions, and end sites, in that order. "1" is the highest rank. In the example above, the "capped" transcriptome has start site priority over the "no_cap" transcriptome.
"source_name" is used for the source information files to show where each prediction comes from.
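The filelist can be written by hand, but generating it avoids stray spaces where tabs are expected. The sketch below is one way to do it; `format_filelist` is a hypothetical helper, and the entries are the two example rows from above.

```python
VALID_CAP_FLAGS = {"capped", "no_cap"}


def format_filelist(entries):
    """Return tab-separated filelist text (no header) for the -f argument.

    Each entry is (file_name, cap_flag, (start, junctions, end), source_name).
    """
    lines = []
    for file_name, cap_flag, priority, source_name in entries:
        if cap_flag not in VALID_CAP_FLAGS:
            raise ValueError(f"cap_flag must be 'capped' or 'no_cap', got {cap_flag!r}")
        if len(priority) != 3:
            raise ValueError("merge_priority needs three ranks: start, junctions, end")
        priority_text = ",".join(str(rank) for rank in priority)
        lines.append("\t".join([file_name, cap_flag, priority_text, source_name]))
    return "\n".join(lines) + "\n"


entries = [
    ("annotation_capped.bed", "capped", (1, 1, 1), "cap_lib"),
    ("annotation_nocap.bed", "no_cap", (2, 1, 1), "nocap_lib"),
]
filelist_text = format_filelist(entries)

# Write the text out so it can be passed with -f filelist.txt
with open("filelist.txt", "w") as handle:
    handle.write(filelist_text)
```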
-p P Output prefix
The output prefix is the prefix that will be used to name the output files.
-e E Collapse exon ends flag: common_ends or longest_ends
The collapse exon ends flag is used to determine whether an exon end feature should be chosen based on how common it is (common_ends) or if it makes the longest exon (longest_ends). Default is common_ends.
-a A 5 prime threshold
The 5 prime threshold is the amount of tolerance at the 5' end of the transcript for grouping reads to be collapsed.
-m M Exon ends threshold/ splice junction threshold
The Exon/Splice junction threshold is the amount of tolerance for the splice junctions of the transcript for grouping reads to be collapsed.
-z Z 3 prime threshold
The 3 prime threshold is the amount of tolerance for the 3' end of the transcript for grouping reads to be collapsed.
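To make the idea of a tolerance concrete, here is a simplified illustration of threshold-based matching. It is not TAMA Merge's actual implementation: it assumes thresholds are in bases and inclusive, assumes plus-strand models, and ignores cap_flag, merge priority, and the collapse exon ends flag. The function names and the dictionary layout are hypothetical.

```python
def within_threshold(pos_a, pos_b, threshold=10):
    """True if two coordinates differ by at most `threshold` (assumed inclusive)."""
    return abs(pos_a - pos_b) <= threshold


def features_match(model_a, model_b, five_prime=10, junction=10, three_prime=10):
    """Compare two simplified transcript models feature by feature.

    Each model is a dict with 'start', 'end', and 'junctions' (list of ints).
    """
    if len(model_a["junctions"]) != len(model_b["junctions"]):
        return False
    junctions_ok = all(
        within_threshold(a, b, junction)
        for a, b in zip(model_a["junctions"], model_b["junctions"])
    )
    return (
        within_threshold(model_a["start"], model_b["start"], five_prime)
        and junctions_ok
        and within_threshold(model_a["end"], model_b["end"], three_prime)
    )


model_a = {"start": 100, "junctions": [200, 300], "end": 500}
model_b = {"start": 105, "junctions": [200, 300], "end": 492}  # within 10 at both ends
model_c = {"start": 130, "junctions": [200, 300], "end": 500}  # 5' end differs by 30

a_matches_b = features_match(model_a, model_b)
a_matches_c = features_match(model_a, model_c)
```

Under these assumptions, raising the 5 prime threshold to 30 or more would let `model_a` and `model_c` match as well.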
Outputs:
prefix.bed
prefix_gene_report.txt
prefix_merge.txt
prefix_trans_report.txt
Detailed explanation:
prefix.bed
This is the main merged annotation file.
prefix_gene_report.txt
This contains a report of the genes from the merged file. The format is as follows:
gene_id	num_clusters	num_final_trans	sources	chrom	start	end
G1	2	2	tissue1,tissue2	1	225	3214
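A gene report line can be turned into a typed record for downstream filtering. This is a minimal sketch that assumes the columns are whitespace-separated and that multiple sources are comma-separated, as the example row suggests; `GeneReportRow` and `parse_gene_report_line` are hypothetical names.

```python
from typing import List, NamedTuple


class GeneReportRow(NamedTuple):
    gene_id: str
    num_clusters: int
    num_final_trans: int
    sources: List[str]
    chrom: str
    start: int
    end: int


def parse_gene_report_line(line):
    """Parse one data line of prefix_gene_report.txt (header line excluded)."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}")
    gene_id, num_clusters, num_final_trans, sources, chrom, start, end = fields
    return GeneReportRow(
        gene_id=gene_id,
        num_clusters=int(num_clusters),
        num_final_trans=int(num_final_trans),
        sources=sources.split(","),
        chrom=chrom,  # kept as text, since chromosome names are not always numeric
        start=int(start),
        end=int(end),
    )


row = parse_gene_report_line("G1\t2\t2\ttissue1,tissue2\t1\t225\t3214")
```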
prefix_merge.txt
This contains a bed12 format file which shows the coordinates of each input transcript matched to the merged transcript ID.
prefix_trans_report.txt
This contains the source information for each merged transcript. The format is as follows:
transcript_id	num_clusters	sources	start_wobble_list	end_wobble_list	exon_start_support	exon_end_support
G2.1	1	newnormbrain	0	0	newnormbrain_G2.1	newnormbrain_G2.1
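One common use of the transcript report is counting how many merged transcripts each source contributes to. The sketch below assumes the file is tab-delimited with the header row shown above, and that multiple sources in the "sources" column are comma-separated; `count_transcripts_per_source` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def count_transcripts_per_source(report_handle):
    """Count merged transcripts supported by each source in a trans report."""
    reader = csv.DictReader(report_handle, delimiter="\t")
    counts = Counter()
    for record in reader:
        for source in record["sources"].split(","):
            counts[source] += 1
    return counts


# The example row from above, used in place of a real prefix_trans_report.txt
sample_report = (
    "transcript_id\tnum_clusters\tsources\tstart_wobble_list\tend_wobble_list\t"
    "exon_start_support\texon_end_support\n"
    "G2.1\t1\tnewnormbrain\t0\t0\tnewnormbrain_G2.1\tnewnormbrain_G2.1\n"
)
source_counts = count_transcripts_per_source(io.StringIO(sample_report))
```

For a real run, open the report file and pass the handle in place of the `io.StringIO` object.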