For Species Tree estimation using similarity matrix, distance matrix and NJ (FastME)

Mahim1997/STreeEstimation-SisterMatrix-NJ

Latest commit: b2c9977 · Sep 16, 2021

Species Tree Estimation using Sister Matrices from weighted quartets & triplets with Neighbor Joining algorithm

Estimates species trees by running Neighbor Joining (NJ, via FastME) on a distance matrix derived from a sister matrix.

Pipeline (with quartets):

  1. Generate all embedded weighted quartets from a set of gene trees
  2. Generate the most dominant (i.e. best weighted) quartets from all combinations of quartets
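
Step 2 can be sketched as follows. This is a minimal illustration, assuming each weighted quartet is held in memory as two sister pairs plus a numeric weight; the names `canonical` and `dominant_quartets` and this representation are hypothetical and are not the interface of the repository's scripts.

```python
from collections import defaultdict

def canonical(quartet):
    """Return an order-independent form of a quartet ((a, b), (c, d))."""
    (a, b), (c, d) = quartet
    return tuple(sorted((tuple(sorted((a, b))), tuple(sorted((c, d))))))

def dominant_quartets(weighted_quartets):
    """For every set of four taxa, keep the topology with the largest total weight.

    Assumption: weights of identical topologies are summed before comparing.
    On a tie, the topology seen first is kept.
    """
    totals = defaultdict(float)
    for quartet, weight in weighted_quartets:
        totals[canonical(quartet)] += weight

    best = {}
    for quartet, weight in totals.items():
        taxa = frozenset(quartet[0] + quartet[1])
        if taxa not in best or weight > best[taxa][1]:
            best[taxa] = (quartet, weight)
    return sorted(best.values())

example = [
    ((("A", "B"), ("C", "D")), 7),
    ((("A", "C"), ("B", "D")), 2),
    ((("B", "A"), ("D", "C")), 1),  # same topology as the first entry
    ((("A", "D"), ("B", "C")), 3),
]
result = dominant_quartets(example)
```

Here the three possible topologies on {A, B, C, D} compete, and only AB|CD (total weight 8) survives.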

Pipeline (with triplets):

  1. Generate all embedded weighted triplets from a set of gene trees
  2. Generate the most dominant (i.e. best weighted) triplets from all combinations of triplets
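
Step 1 can be illustrated with a small, dependency-free sketch. It assumes rooted gene trees given as nested Python tuples and takes a triplet's weight to be the number of gene trees that display it; both choices, and the function names, are assumptions made for illustration rather than the behaviour of triplet_count.sh.

```python
from collections import Counter
from itertools import combinations

def clusters(tree):
    """Return (leaf set of tree, leaf sets of all internal nodes in tree)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clusters(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found

def embedded_triplets(tree):
    """Yield ((x, y), z) for every resolved triplet xy|z displayed by a rooted tree."""
    leaves, internal = clusters(tree)
    for trio in combinations(sorted(leaves), 3):
        for x, y in combinations(trio, 2):
            (z,) = set(trio) - {x, y}
            # xy|z holds when some clade contains x and y but excludes z.
            if any(x in c and y in c and z not in c for c in internal):
                yield (x, y), z

def weighted_triplets(gene_trees):
    """Count how many gene trees display each triplet."""
    counts = Counter()
    for tree in gene_trees:
        counts.update(embedded_triplets(tree))
    return counts

gene_trees = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
weights = weighted_triplets(gene_trees)
```

With these three toy gene trees, AB|C receives weight 2 and AC|B receives weight 1. Enumerating all triples is cubic in the number of taxa, so this form is only suitable for small inputs.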

Pipeline (common steps):

  1. Form a sister matrix (S: sister/similarity matrix) from the weighted quartets or triplets produced above.
  2. Form a difference (distance) matrix D from S, i.e. D = 1 - S (element-wise, normalized).
  3. Run NJ on this D matrix.
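
The common steps can be sketched in plain Python. The README does not spell out how S is filled or normalized, so this sketch makes two loudly labeled assumptions: every quartet adds its weight to the S entries of its two sister pairs, and S is normalized by its largest entry before computing D = 1 - S. All function names are hypothetical.

```python
def sister_matrix(taxa, weighted_quartets):
    """Accumulate quartet weights for each pair of taxa that appear as sisters.

    Assumption: a quartet ((a, b), (c, d)) with weight w adds w to S[a][b] and S[c][d].
    """
    index = {name: i for i, name in enumerate(taxa)}
    n = len(taxa)
    S = [[0.0] * n for _ in range(n)]
    for (pair1, pair2), weight in weighted_quartets:
        for a, b in (pair1, pair2):
            i, j = index[a], index[b]
            S[i][j] += weight
            S[j][i] += weight
    return S

def distance_matrix(S):
    """Return D = 1 - S / max(S) element-wise, with a zero diagonal (assumed normalization)."""
    peak = max(max(row) for row in S) or 1.0
    n = len(S)
    return [[0.0 if i == j else 1.0 - S[i][j] / peak for j in range(n)]
            for i in range(n)]

def to_phylip(taxa, D):
    """Serialize D as a square PHYLIP-style distance matrix."""
    lines = [str(len(taxa))]
    for name, row in zip(taxa, D):
        lines.append(name + " " + " ".join(f"{value:.4f}" for value in row))
    return "\n".join(lines) + "\n"

taxa = ["A", "B", "C", "D", "E"]
quartets = [
    ((("A", "B"), ("C", "D")), 8),
    ((("A", "B"), ("C", "E")), 6),
    ((("A", "B"), ("D", "E")), 4),
    ((("A", "C"), ("D", "E")), 2),
    ((("B", "C"), ("D", "E")), 2),
]
S = sister_matrix(taxa, quartets)
D = distance_matrix(S)
matrix_text = to_phylip(taxa, D)
```

The resulting text is a square distance matrix in PHYLIP layout, the kind of input a distance-based tool such as FastME reads; the exact file produced by the repository's scripts may differ.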

To remove branch/edge lengths:

  • Use the DendroPy library:
import dendropy

# input_file: path to a Newick tree file
taxa = dendropy.TaxonNamespace()
tree = dendropy.Tree.get_from_path(input_file, "newick", taxon_namespace=taxa, rooting="force-rooted")

# Clear the length of every edge (see https://dendropy.org/primer/trees.html)
for edge in tree.postorder_edge_iter():
    edge.length = None

output_tree = tree.as_string("newick").strip()
output_tree = output_tree.replace("[&R] ", "")  # drop the rooting annotation that DendroPy prepends
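
For quick checks where DendroPy is not installed, a regular expression can strip numeric branch lengths from a Newick string. This is a swapped-in technique, not what the repository does, and the helper name `strip_branch_lengths` is hypothetical. It assumes labels are unquoted and contain no colons; it does not parse the tree.

```python
import re

# ":" followed by a plain or scientific-notation number (assumed branch-length syntax).
_BRANCH_LENGTH = re.compile(r":\s*[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")

def strip_branch_lengths(newick):
    """Remove every ':<number>' branch length, leaving labels and support values intact."""
    return _BRANCH_LENGTH.sub("", newick)

stripped = strip_branch_lengths("((A:0.1,B:2e-3)0.9:0.05,C:1);")
print(stripped)  # ((A,B)0.9,C);
```

Internal node labels such as the support value 0.9 are kept because they are not preceded by a colon.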

Dependencies:

  1. FastME must be set up, with the fastme-2.1.5.2-linux64 binary in the same directory as the required Python scripts.
  2. For quartets, the quartet-controller.sh, summarize_quartets.py and numeric_form_matrix_quartets.py scripts are needed.
  3. For triplets, the triplet_count.sh, triplet-encoding-controller.sh and numeric_form_matrix_quartets.py scripts are needed.

Running:

For Quartets:

  python3 SCRIPTS_For_NJ_quartets/get_NJ_Tree_using_quartets.py "best-wqrts-file" "output-file-name"

For Triplets:

  python3 SCRIPTS_For_NJ_triplets/compute_NJ_Tree_using_triplets.py "best-wtriplets-file" "output-file-name"

Acknowledgements

  • Neighbor Joining is computed by the FastME tool.

    Lefort, Vincent et al. “FastME 2.0: A Comprehensive, Accurate, and Fast Distance-Based Phylogeny Inference Program.” Molecular biology and evolution vol. 32,10 (2015): 2798-800. doi:10.1093/molbev/msv150

  • SisterEstimation uses some methods of the PhyloNet package for RF (Robinson-Foulds) distance computations.

    C. Than, D. Ruths, L. Nakhleh (2008) PhyloNet: A software package for analyzing and reconstructing reticulate evolutionary histories, BMC Bioinformatics 9:322.
