-
Notifications
You must be signed in to change notification settings - Fork 0
Neo4j::Core Traverse
Traversing nodes and relationships are one of the key benefits of using Neo4j.
In neo4j.rb there are two different ways of finding nodes.
- Use Lucene, see Neo4j::Wrapper-Lucene or Neo4j::Core-Lucene
- Use Neo4j cypher query language.
- Use Traversals
Neo4j is very efficient and can easily traverse 100.000 nodes/relationships of any depth.
Ruby is slower then Java, so you will always get better performance implementing
traverser filters in Java. However, compared to traversing relationships in SQL Neo4j.rb
is still several maginutes faster (e.g. >1000 ggr times faster for large data set, see here ).
For example, on my laptop I can traverse 500.000 nodes in 0.5 secondes with Java but in JRuby it takes about 4 seconds (with neo4j.rb version < 2.0 and old JRuby 1.5 !). For more information check the neo4j-perf github project
It could be very expensive to create Ruby wrappers around every native Neo4j java node.
You can avoid this by
- Use Neo4j::Node instead of Neo4j::NodeMixin or the Neo4j::Rails::Model (e.g. use the _rels and _node methods).
- Use Neo4j::IdentityMap (avoid loading the same Ruby wrapper more then once).
- Use the pacer gem, see below.
- Use the Neo4j::Core Cypher language
A Neo4j traversal is created using methods like #incoming and #outgoing method on the Neo4j::Node
. These methods are also available on the Neo4j::NodeMixin
and Neo4j::Rails::Model
. All the traversal methods such as #outgoing, #filter, #depth can be combined and chained. By default the start node is not included in the result. If you want to included the start node: node.outgoing(:foo).include_start_node
The traversal methods like :outgoing
returns a Neo4j::Core::Traversal::Traverser
which by default returns a an Ruby Enumerable of Neo4j::Node objects (or wrapper ruby classes if using the Neo4j::NodeMixin and Neo4j::Rails::Model). If you instead want to return path objects or relationships from the traversal you use the #path method. See this example
Check the Java Documentation for a more detailed info – http://docs.neo4j.org/chunked/snapshot/tutorials-java-embedded-traversal.html or the RDoc API YARD Neo4j::Core::Traversal
TIP: In all the code examples here, I have skipped creating Transactions. If you want to try these example, wrap the write operations (such as creating nodes and relationships) in a Neo4j::Transaction.run{}
block, or use the (Rails) Neo4j::Model instead which will create transactions automatically for you.
The result of a traversal returns an object which includes the Ruby Enumerable mixin.
This means that you can use any of the Enumerable method on traversals.
Example:
# find all nodes with property name == 'foo' from node a
# with outgoing relationship 'friends'
a.outgoing(:friends).find{|node| node[:name} == 'foo'}
# return an array names of all nodes from node a with
# outgoing relationship 'friends'
a.outgoing(:friends).collect{|node| node[:name}}
As shown in the example above the traversal returns Neo4j nodes. If you instead want to get the path objects.
a.outgoing(:frineds).paths.to_a
The following code:
a = Neo4j::Node.new
b = Neo4j::Node.new
c = Neo4j::Node.new
a.outgoing(:friends) << b << c
Creates the following nodes and relationships:
To find b
and c
:
a.outgoing(:friends)
Let say we have the following node space:
which is created with
a = Neo4j::Node.new :name => 'A'
b = Neo4j::Node.new :name => 'B'
c = Neo4j::Node.new :name => 'C'
d = Neo4j::Node.new :name => 'D'
e = Neo4j::Node.new :name => 'E'
a.outgoing(:friends) << b << c
b.outgoing(:friends) << d << e
c.outgoing(:friends) << b
To find A’s friends friends and his friends
a.outgoing(:friends).depth(2).each {|node| puts node[:name]}
The above example prints: B, C, D, E
In the example above let say we only want to include the friends friends nodes (D and E) and
not nodes at depth one.
a.outgoing(:friends).depth(2).
filter{|path| path.length == 2}.
each {|node| puts node[:name]}
The above example prints: D, E
In the following examples we are going to use a more complex graph which is created by the following code snippet:
a = Neo4j::Node.new :name => 'A'
b = Neo4j::Node.new :name => 'B'
c = Neo4j::Node.new :name => 'C'
d = Neo4j::Node.new :name => 'D'
e = Neo4j::Node.new :name => 'E'
f = Neo4j::Node.new :name => 'F'
g = Neo4j::Node.new :name => 'G'
Neo4j::Relationship.new(:friends, a, b)[:since] = 2008
Neo4j::Relationship.new(:friends, a, c)[:since] = 2005
Neo4j::Relationship.new(:friends, a, d)[:since] = 2003
Neo4j::Relationship.new(:friends, c, b)[:since] = 2004
Neo4j::Relationship.new(:friends, b, d)[:since] = 2001
Neo4j::Relationship.new(:friends, b, e)[:since] = 2010
Neo4j::Relationship.new(:friends, e, f)[:since] = 1998
Neo4j::Relationship.new(:friends, e, g)[:since] = 2010
Neo4j::Relationship.new(:recommend, a, d)
Neo4j::Relationship.new(:recommend, a, c)
Neo4j::Relationship.new(:recommend, c, g)
Creates this graph:
Several traversal methods uses a path parmeter for guiding the traversal.
The following methods are defined on the path object.
- #end_node – Returns the end node of this path.
- #last_relationship – Returns the last Relationship in this path.
- #length – Returns the length of this path.
- #nodes – Returns all the nodes in this path.
- #relationships – Returns all the relationships in between the nodes which this path consists of.
- #start_node – Returns the start node of this path.
You can traverse several relationship types at the same time:
a.outgoing(:friends).outgoing(:recommend).
each{|node| puts node[:name]}
The example above prints B, C and D.
You can traverse both incoming and outgoing relationships.
a.outgoing(:recommend).incoming(:friends).depth(:all).
each{|node| puts node[:name]}
The example above prints: D, C, B, G and E (not F).
This value specifies which paths will be traversed and if the same path should be traversed more then once.
The following values are possible:
- :node_global :: A node cannot be traversed more than once (default)
- :node_path :: For each returned node there ’s a unique path from the start node to it.
- :node_recent :: This is like :node_global, but only guarantees uniqueness among the most recent visited nodes, with a configurable count.
- :none :: No restriction (the user will have to manage it).
- :rel_global :: A relationship cannot be traversed more than once, whereas nodes can.
- :rel_path :: No restriction (the user will have to manage it).
- :rel_recent :: Same as for :node_recent, but for relationships.
See http://docs.neo4j.org/chunked/snapshot/examples-uniqueness-of-paths-in-traversals.html or the example below.
See also the example of generating a HTML view of a node space below or this gist
Instead of specifying which relationships should be traversed and which nodes should be returned you can supply the traversal with a code block.
The code block gets an Path argument (see above) and should return one of the following values:
- :exclude_and_continue
- :exclude_and_prune
- :include_and_continue
- :include_and_prune
Example:
b.eval_paths{|path| puts path; :exclude_and_continue}.depth(2).to_a
will print:
(2) (2)--[friends,4]-->(4) (2)--[friends,5]-->(5) (2)<--[friends,0]--(1) (2)<--[friends,3]--(3) (2)--[friends,5]-->(5)--[friends,6]-->(6) (2)--[friends,5]-->(5)--[friends,7]-->(7)
and return zero nodes since we always return :exclude_and_continue.
If we instead change the uniqueness to :node_path (:node_global is default)
b.eval_paths{|path| puts path; :exclude_and_continue}.depth(2).unique(:node_path).to_a.size
It will print the following paths:
(2)
(2)—[friends,4]—>(4)
(2)—[friends,5]—>(5)
(2)<—[friends,0]—(1)
(2)<—[friends,3]—(3)
(2)—[friends,4]—>(4)<—[friends,2]—(1)
(2)—[friends,4]—>(4)<—[recommend,8]—(1)
(2)—[friends,5]—>(5)—[friends,6]—>(6)
(2)—[friends,5]—>(5)—[friends,7]—>(7)
(2)<—[friends,0]—(1)—[friends,1]—>(3)
(2)<—[friends,0]—(1)—[friends,2]—>(4)
(2)<—[friends,0]—(1)—[recommend,8]—>(4)
(2)<—[friends,0]—(1)—[recommend,9]—>(3)
(2)<—[friends,3]—(3)<—[friends,1]—(1)
(2)<—[friends,3]—(3)—[recommend,10]—>(7)
(2)<—[friends,3]—(3)<—[recommend,9]—(1)
Notice that we same thing can be accomplished using the #filter and #prune methods, see below.
Instead of using the #eval_paths method you can just specify which nodes should be included in the traversal result.
The path end_node method returns the node it has traversed to.
Example: Find all your friends friends friends that are recommended by someone (uses the node space from the example above).
a.outgoing(:friends).outgoing(:recommend).depth(3).
filter{|path| path.end_node.rel?(:recommend, :incoming)}.
each{|node| puts node[:name]}
This prints C, D and G. There is also a start_node
method on the path
paramenter.
To only include nodes which has been friends before 2005 or is recommended by someone in my network (any depth).
a.outgoing(:friends).outgoing(:recommend).depth(:all).filter do |path|
path.last_relationship.rel_type == 'recommend' ||
path.last_relationship[:since] < 2005
end.each {|node| puts node[:name]}
The following prints D, G and F.
You can ‘cut of’ parts of the traversals.
Let say you don’t want to traverse past node B.
a.outgoing(:friends).outgoing(:recommend).depth(:all).
prune{|path| path.end_node[:name] == 'B'}.
each{|node| puts node[:name]}
The example above prints: B, C, D and G.
You can also implement this using :exclude_and_prune or :include_and_prune in the :expand_paths block.
Instead of specifying which relationship should be traversed with outgoing
and incoming
you can use the expand
method
to specify which relationship should be traversed.
Example, traverse all relationship with property age
above 5:
some_node.expand { |n| n._rels.find_all { |r| r[:age] > 5 } }.
depth(:all).to_a
# use _rels since it can be perform a little bit better since it does not wrap the Java Relationships
# with your own Ruby classes (if you have a Ruby class for that relationship).
You can set which order you should traverse.
- traverse depth first, visiting each node before visiting its child nodes, example:
node.outgoing(:foo).depth_first(:pre)
- traverse depth first, visiting each node after visiting its child nodes, example
node.outgoing(:foo).depth_first(:post)
- traverse breadth first, visiting each node before visiting its child nodes, example
node.outgoing(:foo).breadth_first(:pre)
- traverse breadth first, visiting each node after visiting its child nodes, example
node.outgoing(:foo).breadth_first(:post)
Here is an example of producing a HTML tree from the node space above.
It traverse the graph depth first.
def show_tree(parent)
s = ""
prev_path = nil
prev_level = -1
parent._java_node.outgoing(:friends).outgoing(:recommend).unique(:node_path).include_start_node.raw.paths.depth_first(:pre).depth(:all).to_a.each do |path|
n = path.end_node
level = path.length
# same level then close the previous HTML element
curr = prev_level
while curr >= level
curr -= 1
s << "#{space_indent(curr)}</ul>\n"
end
s << "#{space_indent(level)}<ul><li>name: #{n[:name]}</li>\n"
prev_level = level
end
prev_level.downto(0) {|level| s << "#{space_indent(level)}</ul>\n"}
s
end
def space_indent(level)
level.times.inject(""){|s, _| s + " "}
end
puts show_tree(a)
Notice that we get all the paths instead of the nodes when traversing in the example above.
It might be both faster and easier to express search traversals using the pacer gem, see pacer
require 'neo4j'
require 'pacer-neo4j'
Neo4j.start
g = Pacer.neo4j('/home/andreas/myrails_project/db/neo4j-development')
g.v.count #=> 1175734
g.e.count #=> 8269064
g.search_manual_indices = true
g.v('email'=>'arikan@gmail.com') # using lucene
friends.out_e(:friend).in_v(:type => 'person').except(friends).except(person).most_frequent(0...10)