Vocabulary

Neo4j objects

There are objects that are specific to the neo4j implementation.

Node "types" and properties

While there aren't node type specifications in neo4j, in our implementation we use nodes in a number of ways.

Database root node

There is a node with id 0 that contains metadata about the graph as a whole. It contains the property

newTaxUIDCurIter: a long that represents the highest taxonomic ID. The node will only contain this property if names have been added to the taxonomy in the graph.

Metadata node

For each of the source trees loaded in the graph there will be a metadata node. This node stores information about the source tree and points to the root(s) of the source tree in the graph. Each metadata node has the properties

newick: the newick string that was originally imported for the file
source: the name of the source that was added
original_taxa_map: the list of the taxa (with the list being the original mapping to the Long ids of the nodes in the graph)
and study-specific NexSON fields (see this reference for that information)

Default node

These are the typical nodes in the graph. They have the properties

mrca: list of the taxa Long database ids included in this node
nested_mrca: list of the nested taxa Long database ids included in this node

Taxonomy node

Taxonomy nodes are a special form of graph node. References to them are stored in the graphNamedNodes index (under the "name" key). If they have a non-empty tax_uid, they will also be stored in the graphTaxUIDNodes index (under the "tax_uid" key). The nodes from the same taxonomy will be connected via TAXCHILDOF relationships. Taxonomy nodes have the following properties (in addition to the general properties listed in the "Default node" section):

name: name for the node
uniqname: unique name for the node (if there are homonyms)
tax_rank: rank from the source taxonomy
tax_source: source for the name such as "ncbi"
tax_sourceid: original unique id from the source
tax_sourcepid: original unique id for the parent from the source
tax_parent_uid: ottol id for the parent
tax_uid: ottol id for the name
Question: Won't there be a TAXCHILDOF relationship to the parent? If so, it seems like we don't need to store "tax_sourcepid" and "tax_parent_uid".
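
The indexing rule for taxonomy nodes can be sketched in plain Python. This is only a model of the behaviour described above, not the Neo4j API: the dictionaries stand in for the graphNamedNodes and graphTaxUIDNodes indices, the function name index_taxonomy_node is hypothetical, and the property values are made up for illustration.

```python
# Plain-Python stand-ins for the two indices; each maps (key1, key2) -> list of nodes.
graph_named_nodes = {}    # models graphNamedNodes
graph_tax_uid_nodes = {}  # models graphTaxUIDNodes


def index_taxonomy_node(node):
    """Register a taxonomy node (a dict of properties) in the model indices."""
    # Every taxonomy node is indexed under the "name" key.
    graph_named_nodes.setdefault(("name", node["name"]), []).append(node)
    # Only a node with a non-empty tax_uid is indexed under the "tax_uid" key.
    if node.get("tax_uid"):
        graph_tax_uid_nodes.setdefault(("tax_uid", node["tax_uid"]), []).append(node)


# Illustrative property values only; the ids below are made up.
with_uid = {"name": "Panda", "tax_source": "ncbi", "tax_sourceid": "1", "tax_uid": "100"}
without_uid = {"name": "Unnamed example", "tax_source": "ncbi", "tax_sourceid": "2", "tax_uid": ""}

index_taxonomy_node(with_uid)
index_taxonomy_node(without_uid)

print(sorted(graph_named_nodes))    # both nodes appear under "name"
print(sorted(graph_tax_uid_nodes))  # only the node with a non-empty tax_uid appears
```

Both nodes can be found by name, but only the node with a non-empty tax_uid can be found by id.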

Synonym nodes

If an input taxonomy has an associated synonym file, then nodes will be created for each synonym. Each synonym node will have a SYNONYMOF relationship pointing to the taxonomy node that corresponds to the accepted name. The synonym node will be indexed in the graphNamedNodesSyns index (by the "name" key), and in the "synTaxUIDNodes" index (with key tax_uid) if it has a tax_uid. Each synonym node will have the following properties:

name: a string
tax_uid: a string representation of the ID
nametype: a string such as "acronym", "anamorph", "authority", "blast name", "common name", "equivalent name", "genbank acronym", "genbank anamorph", "genbank common name", "genbank synonym", "includes", "in-part", "misnomer", "misspelling", "synonym", "teleomorph", "type material"
source: a string such as "ncbi"
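
As a rough illustration of how a synonym leads to its accepted name, the sketch below models nodes as dictionaries and relationships as (start, type, end) triples in plain Python. It does not use the Neo4j API; the node ids, the names and the helper accepted_node_id are assumptions made for the example.

```python
# Nodes keyed by a made-up id; property values are illustrative only.
nodes = {
    1: {"name": "Accepted name"},
    2: {"name": "Old name", "nametype": "synonym", "source": "ncbi"},
}

# Each relationship is (start node id, relationship type, end node id).
# A SYNONYMOF relationship starts at the synonym and ends at the taxonomy node.
relationships = [(2, "SYNONYMOF", 1)]


def accepted_node_id(synonym_id):
    """Return the id of the taxonomy node a synonym points to, or None."""
    for start, rel_type, end in relationships:
        if start == synonym_id and rel_type == "SYNONYMOF":
            return end
    return None


target = accepted_node_id(2)
print(nodes[target]["name"])
```

A node that is not a synonym has no outgoing SYNONYMOF relationship, so the helper returns None for it.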

Relationship types and properties of relationships

MRCACHILDOF: the standard relationship in the graph that is used for traversing. There will only be one of these between two nodes. They have no relationship to the source trees that are input and are just used for easy traversing. The start node is the child and the end node is the parent. There are no properties for these relationships currently.
TAXCHILDOF: the standard relationship for connecting nodes with names. The start node is the child and the end node is the parent. Has the properties
- source: the name of the taxonomy source (usually "ottol")
- childid: the uid of the original child as ingested from the source
- parentid: the uid of the original parent as ingested from the source
- Question: Are childid and parentid identical to the "tax_uid" property of the start and end nodes for the relationship? If so why do we store them? If not what is the distinction?
SYNONYMOF: the relationship for connecting synonyms (if there are any) of a node to the node.
STREECHILDOF: the relationship connecting nodes that are the result of source tree ingest. There can be many of these connecting two nodes if there are nodes that are seen in many source trees. The start node is the child and the end node is the parent. These have the properties
- branch_length: branch lengths as read from the original source trees.
- licas: the references to the least inclusive common ancestor nodes. There is at least one, and there can be as many as there are ambiguous mappings.
- exclusive_mrca: the list of the mrcas that are exclusive to this node
- root_exclusive_mrca: the list of the taxa (with the list being the original mapping to the Long ids of the nodes in the graph). NOTE: this is currently the same as the metadata node's original_taxa_map and can be deleted once the references are corrected in the code.
- source: the name of the source from which this relationship came
- inclusive_relids: the list of STREECHILDOF relationships that are involved with this mapping.
SYNTHCHILDOF: the relationship created from synthesis analysis. The start node is the child and the end node is the parent.
- name: name for the synthesis stored tree
- supporting_sources: an array of strings containing the names of the sources that exhibit (i.e. support) this relationship. Source names correspond exactly to the source property of the supporting relationships and can be used to access the corresponding source metadata nodes via the sourceMetaNodes index.
METADATAFOR: the relationship pointing from metadata node to the root node(s). It currently has no additional properties.
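
The direction convention (start node is the child, end node is the parent) and the difference between MRCACHILDOF and STREECHILDOF can be sketched with a small plain-Python model. This is not Neo4j code; the node ids, the source names "tree_A" and "tree_B", and the helper functions are hypothetical.

```python
# Each relationship is (start/child id, type, end/parent id, properties).
relationships = [
    (3, "MRCACHILDOF", 2, {}),
    (2, "MRCACHILDOF", 1, {}),
    # Two source trees contribute the same child -> parent edge.
    (3, "STREECHILDOF", 2, {"source": "tree_A"}),
    (3, "STREECHILDOF", 2, {"source": "tree_B"}),
]


def parents_of(node_id, rel_type):
    """Follow relationships of one type from a child to its parent(s)."""
    return [end for start, t, end, _ in relationships
            if start == node_id and t == rel_type]


def ancestors(node_id, rel_type="MRCACHILDOF"):
    """Collect every node reachable by repeatedly stepping child -> parent."""
    seen = []
    frontier = [node_id]
    while frontier:
        current = frontier.pop()
        for parent in parents_of(current, rel_type):
            if parent not in seen:
                seen.append(parent)
                frontier.append(parent)
    return seen


def sources_between(child, parent):
    """List the source of every STREECHILDOF relationship between two nodes."""
    return sorted(props["source"] for start, t, end, props in relationships
                  if (start, t, end) == (child, "STREECHILDOF", parent))


print(ancestors(3))
print(sources_between(3, 2))
```

In this model there is a single MRCACHILDOF relationship between nodes 3 and 2, while the same pair carries one STREECHILDOF relationship per source tree.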

Indices

Indices work with a key1, a key2, and a value. For example, key1 will often just be "name", key2 will be a name like "Panda", and the value will be the node in the graph for Panda. For node indices the value is a node, and for relationship indices the value is a relationship. There is also the ability to have

Node indices

sourceMetaNodes: key1=source, key2=the source name or id, the nodes are the metadata nodes for the input source trees
graphNamedNodes: key1=name, key2=the name you are looking for, the nodes are nodes with names (probably from the taxonomy)
synTaxUIDNodes: key1=tax_uid, key2=the ottol id of the synonym, the node will be the taxonomy node not the synonym
sourceRootNodes: key1=rootnode or rootnodeForID, key2=the source name or id, the node will be the source root node
graphTaxNewNodes: key1=tax_uid or phylografter_study, key2=ottol id or phylografter study id, the node will be the taxonomy node
graphNamedNodesSyns: key1=name, key2=synonym, the node will be the taxonomy node
graphTaxUIDNodes: key1=tax_uid, key2=the ottol id of the node, the node will be the taxonomy node

Relationship indices

sourceRels: key1=source, key2=source name or id, the relationships in this source will all be returned
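
The key1/key2/value lookup pattern used by all of these indices can be modelled in a few lines of Python. The class TwoKeyIndex below is a hypothetical stand-in written for this page, not the Neo4j index API, and the stored values are placeholder strings rather than real nodes or relationships.

```python
class TwoKeyIndex:
    """Minimal model of an index addressed by key1 and key2."""

    def __init__(self):
        self._entries = {}

    def add(self, key1, key2, value):
        # Several values may be stored under the same (key1, key2) pair.
        self._entries.setdefault((key1, key2), []).append(value)

    def get(self, key1, key2):
        # Return a copy so callers cannot change the index by accident.
        return list(self._entries.get((key1, key2), []))


# Node index: the value is a node (a placeholder string here).
graph_named_nodes = TwoKeyIndex()
graph_named_nodes.add("name", "Panda", "node for Panda")

# Relationship index: the values are relationships from one source.
source_rels = TwoKeyIndex()
source_rels.add("source", "example_source", "relationship 1")
source_rels.add("source", "example_source", "relationship 2")

print(graph_named_nodes.get("name", "Panda"))
print(source_rels.get("source", "example_source"))
print(graph_named_nodes.get("name", "Missing"))
```

A lookup with a key2 that was never added returns an empty list in this model.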

Jade objects

There are objects that are present in the jade package.
