
Source code for my Bioinformatics paper "MCScanX-transposed: detecting transposed gene duplications based on multiple colinearity scans".


dengyang111/MCScanX-transposed

 
 

1. Overview

MCScanX-transposed is a software package that detects transposed gene duplications that occurred within different epochs and integratively analyzes gene duplication modes. MCScanX-transposed can also be used to annotate a gene family of interest with gene duplication modes.

All programs are executed using command line options on either Mac OS or Linux systems. Usage information is built into the programs. To show usage on the screen, run the program without giving any options:

"./program_name" for executable binary files;
"perl program_name.pl" for perl scripts;
"java program_name" for java classes
 
All code is copiable, distributable, modifiable, and usable without any restrictions.
Contact: Yupeng Wang, wyp1125@gmail.com

2. Installation

On Mac OS, Xcode (http://developer.apple.com/xcode/) should be installed before installing the MCScanX-transposed package. On Linux systems, the Java SE Development Kit (JDK) and "libpng" should be installed before installing the MCScanX-transposed package.
Then simply put MCscanX-mode.zip into a directory and run:

"
unzip MCscanX-mode.zip
cd MCScanX-transposed
make
"
  
The following is the list of executable programs:
Core program:
        MCScanX-transposed.pl

Downstream analysis programs:
        Tool 1.  add_ka_ks.pl
        Tool 2.  detect_dup_modes_for_a_gene.pl
        Tool 3.  detect_dup_modes_for_a_family.pl
        Tool 4.  annotate_tree_with_dup_modes
        Tool 5.  annotate_tree_with_tra_dup

3. Core program

MCScanX-transposed
This program detects transposed gene duplications within different epochs and classifies gene duplication modes.

Usage: "perl MCScanX-transposed.pl -i data_directory -t target_species -c outgroup_species(comma_delimited) -o output_directory"
Optional:
-x number_of_different_epochs (default: 1, i.e. only consider the transposed duplications that occurred after the divergence between the target species and all outgroups; if specified, outgroup species must be provided in order of divergence from the target species, most recent first)
-a 1 or 0 (are segmental duplicates ancestral loci or not? default: 1, yes)
-d number_of_genes (maximum distance to call proximal, default: 10)
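
For scripted runs, the options above can be assembled programmatically. The following is a minimal Python sketch, not part of the package: the function name build_command and its keyword arguments are hypothetical, and only the flags documented above (-i, -t, -c, -o, -x, -a, -d) are used.

```python
def build_command(data_dir, target, outgroups, out_dir,
                  epochs=None, segmental_ancestral=None, proximal_distance=None):
    """Return the argument list for one MCScanX-transposed.pl run.

    Hypothetical helper: it only mirrors the documented command line.
    Optional flags are omitted unless a value is given, so the
    program's own defaults apply.
    """
    cmd = [
        "perl", "MCScanX-transposed.pl",
        "-i", data_dir,
        "-t", target,
        "-c", ",".join(outgroups),  # outgroups are comma-delimited
        "-o", out_dir,
    ]
    if epochs is not None:
        cmd += ["-x", str(epochs)]
    if segmental_ancestral is not None:
        cmd += ["-a", "1" if segmental_ancestral else "0"]
    if proximal_distance is not None:
        cmd += ["-d", str(proximal_distance)]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) from the directory that contains MCScanX-transposed.pl.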

IMPORTANT: Users must prepare the input files by carefully reading the following instructions (1-4).

1) All input files should be stored under ONE folder (the "data_directory" parameter).
2) For the target genome in which gene duplication modes will be classified, please prepare two input files:
a) "[target_species].gff", a gene position file for the target species, following a tab-delimited format: "sp&chr_NO      gene    starting_position       ending_position"
b) "[target_species].blast", a blastp output file (m8 format) for the target species (self-genome comparison).
3) For each outgroup genome, please prepare two input files:
a) "[target_species]_[outgroup_species].gff", a gene position file for the target_species and outgroup_species, following a tab-delimited format: "sp&chr_NO      gene    starting_position       ending_position"
b) "[target_species]_[outgroup_species].blast", a blastp output file (m8 format) between the target and outgroup species (cross-genome comparison).
4) For example, assuming that you are going to classify gene duplication modes in Arabidopsis thaliana (ID: at), using Brassica rapa (ID: br) and Carica papaya (ID: cp) as outgroups, you need to prepare 6 input files: "at.gff", "at.blast", "at_br.gff", "at_br.blast", "at_cp.gff" and "at_cp.blast".
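
The naming rules in instructions 1-4 can be checked before a run. The following Python sketch is an illustration only: the function names are hypothetical, and check_gff_line assumes that the two position columns are non-negative integers, which the instructions do not state explicitly.

```python
from pathlib import Path


def expected_input_files(target, outgroups):
    """List the file names required for one target and its outgroups."""
    names = [f"{target}.gff", f"{target}.blast"]
    for outgroup in outgroups:
        names.append(f"{target}_{outgroup}.gff")
        names.append(f"{target}_{outgroup}.blast")
    return names


def missing_input_files(data_dir, target, outgroups):
    """Return the expected file names that are absent from data_dir."""
    return [name for name in expected_input_files(target, outgroups)
            if not (Path(data_dir) / name).is_file()]


def check_gff_line(line):
    """Check one line against 'sp&chr_NO<TAB>gene<TAB>start<TAB>end'.

    Assumption: start and end are non-negative integers.
    """
    fields = line.rstrip("\n").split("\t")
    return len(fields) == 4 and fields[2].isdigit() and fields[3].isdigit()
```

For the Arabidopsis example, expected_input_files("at", ["br", "cp"]) yields the six file names listed in instruction 4.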

Examples: 
"perl MCScanX-transposed.pl -i data -t at -c al,br,cp,pt,vv -o result/test1 -x 3"
"perl MCScanX-transposed.pl -i data -t at -c br,cp,pt,vv -o result/test2"

The different modes of gene duplication (segmental, tandem, proximal and transposed) are output as separate files (".pairs"), with each line containing one gene duplication. In transposed duplications, the first duplicated gene is the transposed locus. Unique duplicates belonging to each mode are also output as separate files (".genes"). Interim collinearity files generated by MCScanX are available under the output directory.
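
Because each line of a ".pairs" file holds one gene duplication, a quick per-mode summary can be obtained by counting lines. The Python sketch below is an assumption-laden illustration: count_pairs is a hypothetical helper, it assumes the ".pairs" files sit directly in the given directory, and it assumes the files have no header line (if they do, subtract one from each count).

```python
from pathlib import Path


def count_pairs(output_dir):
    """Count non-empty lines in every '.pairs' file of output_dir.

    Assumption: one duplication per line and no header line.
    Returns a dict mapping file name to number of lines.
    """
    counts = {}
    for path in sorted(Path(output_dir).glob("*.pairs")):
        with path.open() as handle:
            counts[path.name] = sum(1 for line in handle if line.strip())
    return counts
```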

4. Downstream analysis programs

1) add_ka_ks.pl
This program calculates the Ka and Ks values of each gene duplication listed in a MCScanX-transposed ".pairs" output file. ClustalW and BioPerl are required to run this program.

Usage:"perl add_ka_ks.pl -i gene_duplication('.pairs')_file -d cds_file -o output_file"

Example:"perl add_ka_ks.pl -i result/test1/at.transposed.pairs -d data/at.cds -o result/at.transposed.pairs.kaks"

2) detect_dup_modes_for_a_gene.pl
This program detects gene duplication modes for a gene from the MCScanX-transposed output.

Usage:"perl detect_dup_modes_for_a_gene.pl -i gene_ID -d directory_for_gene_duplication_modes/target_species -o output_file"

Example:"perl detect_dup_modes_for_a_gene.pl -i AT1G11520 -d result/test1/at -o result/gene_query.result"


3) detect_dup_modes_for_a_family.pl
This program detects gene duplication modes for a gene family from the MCScanX-transposed output. In the gene family file, genes should be separated by space or tab.

Usage:"perl detect_dup_modes_for_a_family.pl -i gene_family_file -d directory_for_gene_duplication_modes/target_species -o output_file"

Example:"perl detect_dup_modes_for_a_family.pl -i data/mads.genes -d result/test1/at -o result/mads.dup"
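
Before running the tool, the gene family file can be sanity-checked. The Python sketch below is illustrative: read_gene_family and duplicated_ids are hypothetical helpers, and the reader splits on any whitespace, so it also accepts newlines in addition to the documented space and tab separators.

```python
from collections import Counter


def read_gene_family(path):
    """Read gene IDs separated by spaces or tabs (any whitespace is accepted)."""
    with open(path) as handle:
        return handle.read().split()


def duplicated_ids(genes):
    """Return gene IDs that occur more than once, in first-seen order."""
    counts = Counter(genes)
    return [gene for gene in dict.fromkeys(genes) if counts[gene] > 1]
```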

4) annotate_tree_with_dup_modes
This java program displays a gene family tree on which duplicate gene pairs of different modes are connected with curves of different colors respectively.

Usage: "java annotate_tree_with_dup_modes -t tree_file -s duplication_file(the output of detect_dup_modes_for_a_family.pl) -o output_PNG_file"
Optional: -x plot_width -y plot_height -f font_size

Example:"java annotate_tree_with_dup_modes -t data/mads.nwk -s result/mads.dup -o result/mads.png -x 800 -y 1600"

5) annotate_tree_with_tra_dup
This java program displays a gene family tree linked to a chromosome ideogram on which each pair of transposed and parental duplicates is demonstrated by a curve of unique color.

Usage: "java annotate_tree_with_tra_dup -t tree_file -g gff_file -s gene_duplication_file(the output of detect_dup_modes_for_a_family.pl) -o output_PNG_file"
Optional: -e epoch -x plot_width -y plot_height -f font_size
Note: epoch should be a word after 'transposed_' in the gene duplication file

Example:"java annotate_tree_with_tra_dup -t data/mads.nwk -s result/mads.dup -g data/at.gff -o temp.PNG -e between_al_br -x 800 -y 1600"
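
To find valid values for the -e option, the gene duplication file can be scanned for labels that follow 'transposed_'. The Python sketch below is a guess at a convenient helper, not part of the package: list_epochs is a hypothetical name, and since the layout of the duplication file is not described here, the function simply searches every line for the pattern 'transposed_' followed by word characters.

```python
import re


def list_epochs(dup_file):
    """Return the sorted, unique labels found after 'transposed_' in dup_file.

    Assumption: epoch labels consist of letters, digits and underscores,
    as in 'between_al_br'.
    """
    pattern = re.compile(r"transposed_(\w+)")
    epochs = set()
    with open(dup_file) as handle:
        for line in handle:
            epochs.update(pattern.findall(line))
    return sorted(epochs)
```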
