BioNick

BioNick provides a set of modular functions for manipulating Newick strings in Python: extracting leaves, swapping roots, removing node labels, flipping the order of nodes, removing leaves, extracting subtrees, drawing cladograms with matplotlib, and more. It can also represent trees as collections of node objects and build Neighbor-Joining trees from distance matrices.

If you want new functions, feel free to open an issue.

Install

pip install BioNick

Requirements

Python (tested with 3.9.19)
│
│─── numpy (tested with 2.0.2)
│─── pandas (tested with 2.2.3)
└─── matplotlib (tested with 3.9.4)

Documentation and example use-cases

The current version is designed to work with unrooted trees without node labels. For example, tips A, B, C, D and E will be recognized in the following string:

(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F

But only tips B, C, D and E will be recognized in this one, which is the same tree but explicitly rooted on A:

((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A

I ran all my tests without the trailing semi-colon that is conventional in Newick files.
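Because the examples here were tested without the trailing semicolon, a string read from a standard Newick file may be safest to pass in with the semicolon stripped. A minimal sketch, assuming a hypothetical helper named `strip_semicolon` (it is not part of BioNick):

```python
def strip_semicolon(newick: str) -> str:
    """Return the Newick string without surrounding whitespace or a trailing ';'.

    Hypothetical helper for illustration only; not part of BioNick.
    """
    return newick.strip().rstrip(';').rstrip()


# A tree as it might appear in a Newick file, with semicolon and newline.
file_tree = '(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;\n'
print(strip_semicolon(file_tree))  # (A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F
```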

Run import BioNick as bn to load the package.

Load a tree string as wiki_tree = '(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F'

  1. Remove node labels

    tree = bn.remove_node_labels(wiki_tree)
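To make concrete what removing node labels means for a Newick string, here is a standalone sketch that does not use BioNick. The function name `strip_node_labels` and the regular expression are assumptions made for illustration; `bn.remove_node_labels` may be implemented differently and may format its output differently.

```python
import re


def strip_node_labels(newick: str) -> str:
    """Drop any label that directly follows a closing parenthesis.

    Illustrative stand-in only; not BioNick's implementation.
    """
    # An internal-node label is a run of non-punctuation characters after ')'.
    return re.sub(r"\)[^(),:;]+", ")", newick)


wiki_tree = '(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F'
print(strip_node_labels(wiki_tree))  # (A:0.1,B:0.2,(C:0.3,D:0.4):0.5)
```

Branch lengths are left untouched because the pattern stops at the first colon.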

All following examples call functions from bn and assume that node labels have been removed.

  1. Extract leaves of trees

    bn.leaves(tree)

  2. Extract leaves with branches

    bn.leaves_wb(tree)

  3. Convert to a list of [node, child, branch-length]s

    bn.nw_pd(tree)

  4. Root at taxon (taxon 'C' here for example)

    taxon = 'C' 
    bn.root_at(tree, taxon) 
    
  5. Root at node. Nodes are expected to be numbered sequentially from 0, starting from the leaves.

    node = 5
    bn.root_at_node(tree, node)
    
  6. Flip all edges

    bn.flip_all_edges(tree)

  7. Flip leaves at an internal node

    node = 4
    bn.flip_leaves_at_node(tree,node)   
    
  8. Export all possible rooted trees

    bn.all_trees(tree)

  9. Export nodes with all descendants. Internal nodes begin with a "__" prefix and descendants are stored as a set.

    bn.nodes_w_all_descendants(tree)

  10. Extract subtree. Remove all leaves except those listed. In this example, ['A','B','D'] are kept.

    bn.extract_subtree(tree, ['A','B','D'])

  11. Remove leaf

    bn.remove_leaf(tree, 'A')

  12. Create Neighbor-Joining tree from distance matrix. Assumes a symmetric distance matrix. Built on pandas.

    import pandas as pd

    # A test distance matrix from Wikipedia
    test = pd.DataFrame([[0,5,9,9,8],[5,0,10,10,9],[9,10,0,8,7],[9,10,8,0,3],[8,9,7,3,0]])
    
    # Indices and columns must be str objects. A prefix 't' is also added for clarity.
    test.index = 't'+test.index.astype(str)
    test.columns = 't'+test.columns.astype(str)
    
    # The neighbor-joining function is called. A second function converts the output dataframe to a BioNick tree object. 
    tt = bn.njtr(pd.DataFrame(bn.nj(test.copy(),[])))
    
    # A root must be specified to allow the nodes to begin expanding recursively. Tree objects can be rooted using the root_at_node or root_at_tip methods.
    tt.root_at_node(0)
    tt.export_nw('','')
    
  13. Draw a cladogram. Negative branch lengths are currently not supported and will create messy lines. Dashes and node labels can be specified if needed.

    # A phylogeny of the genus Oryza
    
    twn = '((((((((A_O.sativa:0.1,A_O.glaberrima:0.1):0.1,(A_O.barthii:0.1,A_O.glumipatula:0.1):0.1):0.1,(A_O.meridionalis:0.1,A_O.nivara:0.1,A_O.rufipogon:0.1):0.1):0.1,B_O.punctata:0.1):0.1,((C_O.officinalis:0.1,C_O.alta:0.1):0.1,D_O.alta:0.1):0.1):0.1,E_O.australiensis:0.1):0.1,F_O.brachyantha:0.1):0.1,(K_O.coarctata:0.1,L_O.coarctata:0.1):0.1,OG_L.perrieri:0.1)'
    
    # Import matplotlib and derive figure dimensions from the number of leaves.
    import matplotlib.pyplot as plt
    n_leaves = len(bn.leaves(twn))
    figsize = (max(5, n_leaves/12), max(10, n_leaves/5))
    
    # Draw cladogram with dashes and labels.
    plt.figure(figsize=figsize, dpi=100)
    bn.draw_clad(bn.remove_node_labels(twn), dash = True, labels = True)
    plt.ylim(-1, n_leaves+1)
    plt.gca().spines[['left','right', 'top']].set_visible(False)
    plt.gca().get_yaxis().set_visible(False)
    plt.xlabel('Substitutions/Site')
    
    # Export before plt.show(), which may clear the figure. The bbox_inches = 'tight' argument makes sure the figure doesn't cut off.
    plt.savefig('BioNick_Example_Oryza_with_dashes.pdf', format = 'pdf', bbox_inches='tight')
    plt.show()
    
    # Draw cladogram without dashes, on a new figure of the same size.
    plt.figure(figsize=figsize, dpi=100)
    bn.draw_clad(bn.remove_node_labels(twn), dash = False, labels = True)
    plt.ylim(-1, n_leaves+1)
    plt.gca().spines[['left','right', 'top']].set_visible(False)
    plt.gca().get_yaxis().set_visible(False)
    plt.xlabel('Substitutions/Site')
    plt.show()
    
    

    Example output: two cladograms of the Oryza tree, one dashed and one not dashed (images not shown).
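Returning to the Neighbor-Joining example in item 12, the first joining step on that distance matrix can be checked by hand. The sketch below uses the textbook Neighbor-Joining formulas in plain Python; the function name `first_nj_join` is hypothetical, and nothing here is taken from the internals of `bn.nj`, which may organize the computation differently.

```python
# Distance matrix from item 12 (the Wikipedia Neighbor-Joining test case).
names = ['t0', 't1', 't2', 't3', 't4']
d = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]


def first_nj_join(d):
    """Return (i, j, q_ij, len_i, len_j) for the first Neighbor-Joining step.

    Hypothetical helper for illustration only; not part of BioNick.
    """
    n = len(d)
    row_sums = [sum(row) for row in d]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[2]:
                best = (i, j, q)
    i, j, q = best
    # Branch lengths from the joined pair to their new parent node.
    len_i = d[i][j] / 2 + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    len_j = d[i][j] - len_i
    return i, j, q, len_i, len_j


i, j, q, len_i, len_j = first_nj_join(d)
print(names[i], names[j], q, len_i, len_j)  # t0 t1 -50 2.0 3.0
```

The pair with the smallest Q value (t0 and t1) is joined first, with branch lengths 2 and 3 to the new internal node. A full run would then shrink the matrix and repeat until the tree is resolved.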