Source code for my Bioinformatics paper "MCScanX-transposed: detecting transposed gene duplications based on multiple colinearity scans".
wyp1125/MCScanX-transposed
1. Overview

MCScanX-transposed is a software package that detects transposed gene duplications that occurred within different epochs and integratively analyzes gene duplication modes. MCScanX-transposed can also be used to annotate a gene family of interest with gene duplication modes.

All programs are executed using command line options on either Mac OS or Linux systems. Usage information is built into the programs. To show usage on the screen, run the program without giving any options:

    "./program_name" for executable binary files
    "perl program_name.pl" for Perl scripts
    "java program_name" for Java classes

All code is copiable, distributable, modifiable, and usable without any restrictions.

Contact: Yupeng Wang, wyp1125@gmail.com

2. Installation

On Mac OS, Xcode (http://developer.apple.com/xcode/) should be installed before installing the MCScanX-transposed package. On Linux systems, the Java SE Development Kit (JDK) and "libpng" should be installed before installing the MCScanX-transposed package. Then simply put MCscanX-mode.zip into a directory and run:

    unzip MCscanX-mode.zip
    cd MCScanX-transposed
    make

The following is the list of executable programs.

Core program: MCScanX-transposed.pl

Downstream analysis programs:

    Tool 1. add_ka_ks.pl
    Tool 2. detect_dup_modes_for_a_gene.pl
    Tool 3. detect_dup_modes_for_a_family.pl
    Tool 4. annotate_tree_with_dup_modes
    Tool 5. annotate_tree_with_tra_dup

3. Core program MCScanX-transposed

This program detects transposed gene duplications within different epochs and classifies gene duplication modes.
Usage:

    perl MCScanX-transposed.pl -i data_directory -t target_species -c outgroup_species(comma_delimited) -o output_directory

Optional:

    -x number_of_different_epochs (if specified, outgroup species must be provided in order of divergence from the target species, most recent first; default: 1, which considers only the transposed duplications that occurred after the divergence between the target species and all outgroups)
    -a 1 or 0 (are segmental duplicates ancestral loci or not? default: 1, yes)
    -d number_of_genes (maximum distance to call proximal; default: 10)

IMPORTANT: Users must prepare the input files by carefully reading the following instructions (1-4).

1) All input files should be stored under ONE folder (the "data_directory" parameter).

2) For the target genome in which gene duplication modes will be classified, please prepare two input files:
    a) "[target_species].gff", a gene position file for the target species, following a tab-delimited format: "sp&chr_NO gene starting_position ending_position"
    b) "[target_species].blast", a blastp output file (m8 format) for the target species (self-genome comparison).

3) For each outgroup genome, please prepare two input files:
    a) "[target_species]_[outgroup_species].gff", a gene position file for the target_species and outgroup_species, following a tab-delimited format: "sp&chr_NO gene starting_position ending_position"
    b) "[target_species]_[outgroup_species].blast", a blastp output file (m8 format) between the target and outgroup species (cross-genome comparison).

4) For example, assuming that you are going to classify gene duplication modes in Arabidopsis thaliana (ID: at), using Brassica rapa (ID: br) and Carica papaya (ID: cp) as outgroups, you need to prepare 6 input files: "at.gff", "at.blast", "at_br.gff", "at_br.blast", "at_cp.gff" and "at_cp.blast".
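The naming rules in instructions 2) and 3) can be checked before running the core program. The following Python sketch is not part of the MCScanX-transposed package; the function names are hypothetical, and it assumes only the file naming scheme and the four-column tab-delimited gene position format described above.

```python
from pathlib import Path


def expected_input_files(target, outgroups):
    """Return the input file names described in instructions 2) and 3)."""
    names = [f"{target}.gff", f"{target}.blast"]
    for outgroup in outgroups:
        names.append(f"{target}_{outgroup}.gff")
        names.append(f"{target}_{outgroup}.blast")
    return names


def missing_input_files(data_directory, target, outgroups):
    """List the expected files that are not present in data_directory."""
    folder = Path(data_directory)
    return [name for name in expected_input_files(target, outgroups)
            if not (folder / name).is_file()]


def parse_gff_line(line):
    """Split one gene position line into (chromosome, gene, start, end).

    Assumes the tab-delimited layout
    "sp&chr_NO gene starting_position ending_position".
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-delimited fields, got {len(fields)}")
    chromosome, gene, start, end = fields
    return chromosome, gene, int(start), int(end)
```

For the example in instruction 4), expected_input_files("at", ["br", "cp"]) returns the six file names listed there, and missing_input_files("data", "at", ["br", "cp"]) reports any of them that are absent from the "data" folder.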
Examples:

    perl MCScanX-transposed.pl -i data -t at -c al,br,cp,pt,vv -o result/test1 -x 3
    perl MCScanX-transposed.pl -i data -t at -c br,cp,pt,vv -o result/test2

The different modes of gene duplication (segmental, tandem, proximal and transposed) are output as separate files (".pairs"), with each line containing one gene duplication. In transposed duplications, the first duplicated gene is the transposed locus. The unique duplicates belonging to each mode are also output as separate files (".genes"). Interim collinearity files generated by MCScanX are available under the output directory.

4. Downstream analysis programs

1) add_ka_ks.pl

This program calculates the Ka and Ks values of each gene duplication listed in a ".pairs" file output by MCScanX-transposed. ClustalW and BioPerl are needed to run this program.

Usage:

    perl add_ka_ks.pl -i gene_duplication('.pairs')_file -d cds_file -o output_file

Example:

    perl add_ka_ks.pl -i result/test1/at.transposed.pairs -d data/at.cds -o result/at.transposed.pairs.kaks

2) detect_dup_modes_for_a_gene.pl

This program detects gene duplication modes for a gene from the MCScanX-transposed output.

Usage:

    perl detect_dup_modes_for_a_gene.pl -i gene_ID -d directory_for_gene_duplication_modes/target_species -o output_file

Example:

    perl detect_dup_modes_for_a_gene.pl -i AT1G11520 -d result/test1/at -o result/gene_query.result

3) detect_dup_modes_for_a_family.pl

This program detects gene duplication modes for a gene family from the MCScanX-transposed output. In the gene family file, genes should be separated by spaces or tabs.
Usage:

    perl detect_dup_modes_for_a_family.pl -i gene_family_file -d directory_for_gene_duplication_modes/target_species -o output_file

Example:

    perl detect_dup_modes_for_a_family.pl -i data/mads.genes -d result/test1/at -o result/mads.dup

4) annotate_tree_with_dup_modes

This Java program displays a gene family tree on which duplicate gene pairs are connected by curves, with a different color for each duplication mode.

Usage:

    java annotate_tree_with_dup_modes -t tree_file -s duplication_file(the output of detect_dup_modes_for_a_family.pl) -o output_PNG_file

Optional:

    -x plot_width -y plot_height -f font_size

Example:

    java annotate_tree_with_dup_modes -t data/mads.nwk -s result/mads.dup -o result/mads.png -x 800 -y 1600

5) annotate_tree_with_tra_dup

This Java program displays a gene family tree linked to a chromosome ideogram, on which each pair of transposed and parental duplicates is shown by a curve of a unique color.

Usage:

    java annotate_tree_with_tra_dup -t tree_file -g gff_file -s gene_duplication_file(the output of detect_dup_modes_for_a_family.pl) -o output_PNG_file

Optional:

    -e epoch -x plot_width -y plot_height -f font_size

Note: epoch should be a word after 'transposed_' in the gene duplication file.

Example:

    java annotate_tree_with_tra_dup -t data/mads.nwk -s result/mads.dup -g data/at.gff -o temp.PNG -e between_al_br -x 800 -y 1600
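To find valid values for the -e option, one can scan the gene duplication file for words that follow 'transposed_'. The Python sketch below is not part of the package, and list_epochs is a hypothetical helper. Because the exact layout of the duplication file is not described here, it assumes only that each mode label appears somewhere in a line as a token of the form 'transposed_' followed by letters, digits or underscores.

```python
import re


def list_epochs(lines):
    """Collect the distinct words that follow 'transposed_', in order of first appearance."""
    epochs = []
    for line in lines:
        # Assumption: the epoch name consists of letters, digits and underscores.
        for epoch in re.findall(r"transposed_(\w+)", line):
            if epoch not in epochs:
                epochs.append(epoch)
    return epochs


def list_epochs_in_file(path):
    """Apply list_epochs to every line of a gene duplication file."""
    with open(path, encoding="utf-8") as handle:
        return list_epochs(handle)
```

For instance, list_epochs_in_file("result/mads.dup") would return the epoch names found in the family file from the example above, any of which could then be passed to -e.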