Vocabulary
There are objects that are specific to the neo4j implementation.
Neo4j has no node type specifications, but our implementation uses nodes in several distinct roles.
There is a node with id 0 that contains metadata about the graph as a whole. It contains the properties:
- newTaxUIDCurIter: a long that represents the highest taxonomic ID. The node contains this property only if names have been added to the taxonomy in the graph.
For each of the source trees loaded in the graph there will be a metadata node. This node stores information about the source tree and points to the root(s) of the source tree in the graph. Each metadata node has the properties:
- newick: the newick string that was originally imported for the file
- source: the name of the source that was added
- original_taxa_map: the list of the taxa, given as the original mapping to the Long ids of the nodes in the graph
- study-specific NexSON fields (see this reference for that information)
Default nodes are the typical nodes in the graph. They have the properties:
- mrca: list of the taxa Long database ids included in this node
- nested_mrca: list of the nested taxa Long database ids included in this node
Taxonomy nodes are a special form of graph node. References to them are stored in the graphNamedNodes index (under the "name" key). If they have a non-empty tax_uid, they will also be stored in the graphTaxUIDNodes index (under the "tax_uid" key). Nodes from the same taxonomy are connected via TAXCHILDOF relationships. Taxonomy nodes have the following properties (in addition to the general properties listed in the "Default node" section):
- name: name for the node
- uniqname: unique name for the node (if there are homonyms)
- tax_rank: rank from the source taxonomy
- tax_source: source for the name such as "ncbi"
- tax_sourceid: original unique id from the source
- tax_sourcepid: original unique id for the parent from the source
- tax_parent_uid: ottol id for the parent
- tax_uid: ottol id for the name
- Question: Won't there be a TAXCHILDOF relationship to the parent? If so, it seems like we don't need to store "tax_sourcepid" and "tax_parent_uid".
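The indexing rule for taxonomy nodes can be sketched in plain Python. This is an in-memory illustration, not the neo4j API: the `add_taxonomy_node` function, the use of dicts as stand-ins for the indices, and all sample property values are assumptions made for this sketch. Only the index names, index keys, and property names come from the description above.

```python
def add_taxonomy_node(props, graph_named_nodes, graph_tax_uid_nodes):
    """Store a taxonomy node and index it as described above.

    The two index arguments are plain dicts standing in for the
    graphNamedNodes and graphTaxUIDNodes indices (an assumption made
    for this sketch; the real indices live in neo4j).
    """
    node = dict(props)
    # Every taxonomy node is indexed under the "name" key.
    graph_named_nodes.setdefault(("name", node["name"]), []).append(node)
    # Only nodes with a non-empty tax_uid go into graphTaxUIDNodes.
    if node.get("tax_uid"):
        graph_tax_uid_nodes.setdefault(("tax_uid", node["tax_uid"]), []).append(node)
    return node


graph_named_nodes = {}
graph_tax_uid_nodes = {}

# Property values below are made up for illustration.
with_uid = add_taxonomy_node(
    {"name": "Ursidae", "tax_rank": "family", "tax_source": "ncbi", "tax_uid": "101"},
    graph_named_nodes, graph_tax_uid_nodes)
without_uid = add_taxonomy_node(
    {"name": "Example unplaced taxon", "tax_source": "ncbi", "tax_uid": ""},
    graph_named_nodes, graph_tax_uid_nodes)

print(sorted(graph_named_nodes))    # both nodes are indexed by name
print(sorted(graph_tax_uid_nodes))  # only the node with a tax_uid appears
```

Both nodes can be found by name, but only the node with a non-empty tax_uid can be found by id.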
If an input taxonomy has an associated synonym file, then nodes will be created for each synonym. Each synonym node will have a SYNONYMOF relationship pointing to the taxonomy node that corresponds to the accepted name. The synonym node will be indexed in the graphNamedNodesSyns index (by the "name" key), and in the "synTaxUIDNodes" index (with key tax_uid) if it has a tax_uid. Each synonym node will have the following properties:
- name: a string
- tax_uid: a string representation of the ID
- nametype: a string such as: "acronym", "anamorph", "authority", "blast name", "common name", "equivalent name", "genbank acronym", "genbank anamorph", "genbank common name", "genbank synonym", "includes", "in-part", "misnomer", "misspelling", "synonym", "teleomorph", "type material"
- source: a string such as "ncbi"
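Resolving a name through its synonyms can be sketched as follows. This is an in-memory illustration rather than neo4j code: `make_synonym` and `resolve` are hypothetical helpers, dicts stand in for the graphNamedNodes and graphNamedNodesSyns indices, and the lookup order (accepted names first, then synonyms) is an assumption. The sketch follows the description above, in which the synonym node is indexed and its SYNONYMOF relationship leads to the accepted taxonomy node.

```python
def make_synonym(name, nametype, source, accepted, tax_uid=None):
    """Create a synonym node with a SYNONYMOF link to the accepted taxon."""
    node = {"name": name, "nametype": nametype, "source": source,
            "SYNONYMOF": accepted}
    if tax_uid:
        node["tax_uid"] = tax_uid
    return node


def resolve(name, graph_named_nodes, graph_named_nodes_syns):
    """Return the accepted taxonomy nodes matching a name or a synonym."""
    hits = list(graph_named_nodes.get(("name", name), []))
    for syn in graph_named_nodes_syns.get(("name", name), []):
        accepted = syn["SYNONYMOF"]  # follow SYNONYMOF to the accepted name
        if accepted not in hits:
            hits.append(accepted)
    return hits


# Sample data; the tax_uid value is made up for illustration.
taxon = {"name": "Ailuropoda melanoleuca", "tax_uid": "1"}
synonym = make_synonym("giant panda", "common name", "ncbi", taxon)

graph_named_nodes = {("name", taxon["name"]): [taxon]}
graph_named_nodes_syns = {("name", synonym["name"]): [synonym]}

print(resolve("giant panda", graph_named_nodes, graph_named_nodes_syns))
```

Looking up either the accepted name or the synonym returns the same taxonomy node.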
- MRCACHILDOF: the standard relationship in the graph that is used for traversing. There will only be one of these between two nodes. They have no relationship to the source trees that are input and are just used for easy traversing. The start node is the child and the end node is the parent. There are no properties for these relationships currently.
- TAXCHILDOF: the standard relationship for connecting nodes with names. The start node is the child and the end node is the parent. It has the properties:
- source: the name of the taxonomy source (usually "ottol")
- childid: the uid of the original child as ingested from the source
- parentid: the uid of the original parent as ingested from the source
- Question: Are childid and parentid identical to the "tax_uid" property of the start and end nodes for the relationship? If so why do we store them? If not what is the distinction?
- SYNONYMOF: the relationship for connecting synonyms (if there are any) of a node to the node.
- STREECHILDOF: the relationship connecting nodes that are the result of source tree ingest. There can be many of these connecting two nodes if the nodes are seen in many source trees. The start node is the child and the end node is the parent. These have the properties:
- branch_length: branch lengths as read from the original source trees.
- licas: the reference to the least inclusive common ancestor nodes. There is at least one, and there can be as many as there are ambiguous mappings.
- exclusive_mrca: the list of the mrcas that are exclusive to this node
- root_exclusive_mrca: the list of the taxa, given as the original mapping to the Long ids of the nodes in the graph. NOTE: this is currently the same as the metadata node's original_taxa_map and can be deleted once the references are corrected in the code.
- source: the name of the source from which this relationship came
- inclusive_relids: the list of STREECHILDOF relationships that are involved with this mapping.
- SYNTHCHILDOF: the relationship created from synthesis analysis. The start node is the child and the end node is the parent. It has the properties:
- name: name for the synthesis stored tree
- supporting_sources: an array of strings containing the names of the sources that exhibit (i.e. support) this relationship. Source names correspond exactly to the source property of the supporting relationships and can be used to access the corresponding source metadata nodes via the sourceMetaNodes index.
- METADATAFOR: the relationship pointing from metadata node to the root node(s). It currently has no additional properties.
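The child-to-parent direction shared by these relationships, and the fact that several STREECHILDOF relationships can join the same pair of nodes, can be sketched with a small in-memory model. Nothing here is the neo4j API: the `Rel` class and the `parents`, `path_to_root`, and `rels_between` helpers are hypothetical, node names stand in for nodes, and the source names and branch lengths are made up. The sketch also assumes a single taxonomy, so each node has at most one TAXCHILDOF parent.

```python
from dataclasses import dataclass, field


@dataclass
class Rel:
    rel_type: str
    start: str  # child node
    end: str    # parent node
    props: dict = field(default_factory=dict)


def parents(rels, node, rel_type, source=None):
    """Parents of `node` along `rel_type`, optionally limited to one source."""
    found = []
    for r in rels:
        if r.rel_type != rel_type or r.start != node:
            continue
        if source is not None and r.props.get("source") != source:
            continue
        if r.end not in found:
            found.append(r.end)
    return found


def path_to_root(rels, node, rel_type="TAXCHILDOF"):
    """Follow child -> parent links until a node has no parent."""
    path = [node]
    while True:
        ups = parents(rels, path[-1], rel_type)
        if not ups or ups[0] in path:  # stop at the root (or on a cycle)
            return path
        path.append(ups[0])


def rels_between(rels, child, parent, rel_type):
    """All relationships of one type joining a child to a parent."""
    return [r for r in rels
            if r.rel_type == rel_type and r.start == child and r.end == parent]


rels = [
    Rel("TAXCHILDOF", "Ailuropoda", "Ursidae", {"source": "ottol"}),
    Rel("TAXCHILDOF", "Ursidae", "Carnivora", {"source": "ottol"}),
    # Two source trees support the same child -> parent edge.
    Rel("STREECHILDOF", "Ailuropoda", "Ursidae", {"source": "tree_A", "branch_length": 0.1}),
    Rel("STREECHILDOF", "Ailuropoda", "Ursidae", {"source": "tree_B", "branch_length": 0.3}),
]

print(path_to_root(rels, "Ailuropoda"))
print(len(rels_between(rels, "Ailuropoda", "Ursidae", "STREECHILDOF")))
print(parents(rels, "Ailuropoda", "STREECHILDOF", source="tree_B"))
```

Filtering on the source property is what separates one source tree's edges from another's when both connect the same two nodes.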
Indices work with a key1, a key2, and a value. For example, key1 will often just be "name", key2 will be a name like "Panda", and the value will be the node in the graph for Panda. For node indices the value is a node, and for relationship indices the value is a relationship. There is also the ability to have
- sourceMetaNodes: key1=source, key2=the source name or id, the nodes are the metadata nodes for the input source trees
- graphNamedNodes: key1=name, key2=the name you are looking for, the nodes are nodes with names (probably from the taxonomy)
- synTaxUIDNodes: key1=tax_uid, key2=the ottol id of the synonym, the node will be the taxonomy node not the synonym
- sourceRootNodes: key1=rootnode or rootnodeForID, key2=the source name or id, the node will be the source root node
- graphTaxNewNodes: key1=tax_uid or phylografter_study, key2=ottol id or phylografter study id, the node will be the taxonomy node
- graphNamedNodesSyns: key1=name, key2=synonym, the node will be the taxonomy node
- graphTaxUIDNodes: key1=tax_uid, key2=the ottol id of the node, the node will be the taxonomy node
- sourceRels: key1=source, key2=source name or id, the relationships in this source will all be returned
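The key1/key2/value lookup pattern can be sketched with a toy index. The `Index` class below is a hypothetical in-memory stand-in, not the neo4j index API, and the node and relationship dicts, source names, and newick strings are made up. It shows a node index, a relationship index, and how the supporting_sources of a SYNTHCHILDOF relationship could be used to reach source metadata nodes through sourceMetaNodes.

```python
class Index:
    """Toy two-key index: (key1, key2) -> list of values."""

    def __init__(self):
        self._entries = {}

    def add(self, key1, key2, value):
        self._entries.setdefault((key1, key2), []).append(value)

    def get(self, key1, key2):
        # Return a copy so callers cannot modify the index by accident.
        return list(self._entries.get((key1, key2), []))


graph_named_nodes = Index()   # node index
source_meta_nodes = Index()   # node index
source_rels = Index()         # relationship index

panda = {"name": "Panda"}
graph_named_nodes.add("name", "Panda", panda)

meta_a = {"source": "tree_A", "newick": "(A,B);"}
meta_b = {"source": "tree_B", "newick": "(A,C);"}
source_meta_nodes.add("source", "tree_A", meta_a)
source_meta_nodes.add("source", "tree_B", meta_b)

rel_1 = {"type": "STREECHILDOF", "source": "tree_A", "branch_length": 0.1}
rel_2 = {"type": "STREECHILDOF", "source": "tree_A", "branch_length": 0.2}
source_rels.add("source", "tree_A", rel_1)
source_rels.add("source", "tree_A", rel_2)

# Use supporting_sources to find the metadata node of each supporting source.
synth_rel = {"type": "SYNTHCHILDOF", "supporting_sources": ["tree_A", "tree_B"]}
supporting_meta = [
    meta
    for source_name in synth_rel["supporting_sources"]
    for meta in source_meta_nodes.get("source", source_name)
]

print(graph_named_nodes.get("name", "Panda"))
print(len(source_rels.get("source", "tree_A")))
print([m["source"] for m in supporting_meta])
```

A lookup with a key pair that was never added returns an empty list in this sketch.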
There are objects that are present in the jade package.