Neo4j::Core Lucene
Neo4j ships with Lucene, a full-text indexing and search engine library.
A common use case for Lucene is to look up a single node and then, starting from that node, traverse the graph or run a Cypher query.
The Lucene integration uses the beforeCommit hook; see http://docs.neo4j.org/chunked/1.7/transactions-events.html. The method Neo4j::Node.trigger_on tells neo4j.rb which nodes or relationships it will index.
The Neo4j::Node.index method declares which properties are going to be indexed and which type of index should be used.
Example:
Neo4j::Node.trigger_on(:typex => 'MyTypeX')
Neo4j::Node.index :name
Neo4j::Node.index :description, :type => :fulltext
Neo4j::Node.index :age, :field_type => Fixnum
Notice: only properties that have been set are added to the index. A property with no value and no default value will therefore NOT be matched by a wildcard query (*).
The declaration above tells neo4j.rb that only nodes whose property typex has the value MyTypeX will be indexed. When the before-commit hook finds a node with that property, it indexes each property declared with the Neo4j::Node.index method. You can declare several such properties, and each property can take several values. This is used, for example, when the same property (_classname) can be triggered by different values (the name of each subclass); see the use of trigger_on in the Neo4j::NodeMixin implementation.
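The trigger rule described above can be pictured with a small plain-Ruby sketch. The TRIGGERS table and the trigger_on? helper are hypothetical names made up for this illustration; this is not how neo4j.rb implements the hook, only a model of "several properties, each with several values".

```ruby
# Hypothetical stand-in for the table built by trigger_on declarations:
# property name => list of values that trigger indexing.
TRIGGERS = {
  :typex      => ['MyTypeX'],
  :_classname => ['Person', 'Employee']
}

# Returns true if any trigger property of the node has one of the declared values.
def trigger_on?(props, triggers = TRIGGERS)
  triggers.any? { |key, values| values.include?(props[key]) }
end

puts trigger_on?({ :typex => 'MyTypeX', :name => 'andreas' })
puts trigger_on?({ :typex => 'SomethingElse' })
```

In this sketch a node is indexed as soon as one of the trigger properties matches; nodes without any trigger property are ignored.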
The default index type is exact. The example above declares indexes on the properties name and description with different index types (exact and fulltext).
All values are indexed as Strings in Lucene by default. To index a value as a number, set :field_type to Fixnum or Float. This allows you to use Lucene range searches.
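Why the field type matters for range searches can be seen without Lucene at all. The following is only a plain-Ruby analogy (it does not call Lucene or neo4j.rb): values compared as Strings are ordered character by character, so a range over them does not behave like a numeric range.

```ruby
ages = [4, 10, 25, 9]

# Compared as Strings, "10" and "25" fall between "1" and "5" because
# the comparison is lexicographic.
string_hits = ages.map(&:to_s).select { |a| a.between?('1', '5') }

# Compared as numbers, only 4 falls between 1 and 5.
numeric_hits = ages.select { |a| a.between?(1, 5) }

p string_hits
p numeric_hits
```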
Notice: if you are using Neo4j::NodeMixin or Neo4j::Rails::Model, you define an index using the property method instead.
Notice: you can't search with a String query on non-String field_types. Instead you must use hash queries, e.g. Neo4j::Node.find(:age => 4).
Instead of using the Neo4j::Node.index or Neo4j::Relationship.index methods, you can define your own class for indexing. This allows you to have different Lucene index files and configurations.
Example:
class MyIndex
  extend Neo4j::Core::Index::ClassMethods
  include Neo4j::Core::Index

  self.node_indexer do
    index_names :exact => 'myindex_exact', :fulltext => 'myindex_fulltext'
    trigger_on :myindex => true # trigger on all nodes having property myindex == true
  end

  index :name
end
This class is also used when searching, e.g. MyIndex.find(:name => 'andreas').first
For an index on relationships, use the rel_indexer method instead of the node_indexer method.
You can use the Lucene query language.
Let's say you have defined the index on the Neo4j::Node class like this:
Neo4j::Node.trigger_on(:typex => 'MyTypeX')
Neo4j::Node.index(:name)
And you have created a node like this:
a = Neo4j::Node.new(:name => 'andreas', :typex => 'MyTypeX')
Then you can find it using the Lucene query syntax like this:
Neo4j::Node.find('name: andreas') {|result| puts result.first[:name] }
# or, equivalently, if you want to close the Lucene connection yourself
result = Neo4j::Node.find('name: andreas')
puts result.first[:name]
result.close
Notice: it is important that you close the Lucene connection. Either use the block approach or call the close method on the query result.
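The difference between the two approaches is the usual Ruby resource pattern. The sketch below uses a made-up FakeHits class and fake_find method (neither is part of neo4j.rb) to show why the block form is safer: the result is closed even if the block raises.

```ruby
# Hypothetical stand-in for a query result that holds an open resource.
class FakeHits
  include Enumerable
  attr_reader :closed

  def initialize(items)
    @items = items
    @closed = false
  end

  def each(&block)
    @items.each(&block)
  end

  def close
    @closed = true
  end
end

# Hypothetical finder: with a block it closes the result for the caller,
# without a block the caller is responsible for calling close.
def fake_find(items)
  hits = FakeHits.new(items)
  return hits unless block_given?
  begin
    yield hits
  ensure
    hits.close
  end
  hits
end

auto = fake_find(['andreas']) { |result| puts result.first }

manual = fake_find(['andreas'])
puts manual.first
manual.close
```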
Notice: the Neo4j::Rails::Model.find and Neo4j::Rails::Relationship.find methods close the Lucene connection automatically using Rack middleware.
A property in Neo4j can be an array of (primitive) values.
Lucene also supports searching in arrays of values.
Example:
# don't forget to declare an index on things: Neo4j::Node.index :things
Neo4j::Transaction.new do
  Neo4j::Node.new(:things => ['aaa', 'bbb', 'ccc'])
end
Neo4j::Node.find("things: bbb"){|r| puts r.first}
You can also search with an array of values. For example, let's find a node whose property :things has the value "bbb" or "ccc":
Neo4j::Node.find(:things => ["bbb", "ccc"]){|r| puts r.first}
The search results in a Lucene OR query for that property.
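To make the OR behaviour concrete, here is a minimal sketch of how such a hash could be written out in Lucene query syntax. The to_lucene helper is hypothetical and does no escaping of special characters; it illustrates the resulting query shape, not what neo4j.rb does internally.

```ruby
# Hypothetical helper: turns a hash query into a Lucene query string.
# Array values become an OR group; separate keys are joined with AND.
def to_lucene(query)
  query.map do |field, value|
    if value.is_a?(Array)
      "#{field}:(#{value.join(' OR ')})"
    else
      "#{field}:#{value}"
    end
  end.join(' AND ')
end

puts to_lucene({ :things => ['bbb', 'ccc'] })
```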
By default, indexes are of type :exact, which is great for indexing keywords etc.
To index each word in a text you should use a fulltext index. Fulltext uses a white-space tokenizer in its analyzer. Add the type :fulltext (:exact is the default) when you declare the index and in the find method.
Example:
MyIndex.index :name, :type => :fulltext
MyIndex.find('name: andreas', :type => :fulltext).first #=> andreas
Notice: you must specify the :type in the find method unless you are using an :exact index.
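The practical difference between the two index types can be sketched with a toy inverted index in plain Ruby. The build_index function is an illustration only and is far simpler than a real Lucene analyzer: the :exact variant stores the whole value as a single term, the :fulltext variant splits it on white space.

```ruby
# Toy inverted index: term => list of document ids.
def build_index(docs, type)
  index = Hash.new { |h, k| h[k] = [] }
  docs.each do |id, text|
    # :exact keeps the whole value as one term; :fulltext splits on white space.
    terms = type == :fulltext ? text.split : [text]
    terms.each { |term| index[term] << id }
  end
  index
end

docs = { 1 => 'andreas andersson', 2 => 'andreas' }
exact    = build_index(docs, :exact)
fulltext = build_index(docs, :fulltext)

p exact.fetch('andreas', [])     # only the document whose whole value is 'andreas'
p fulltext.fetch('andreas', [])  # every document containing the word
```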
By using a Hash instead of a String in the find method, you can search on several properties at once (a compound Lucene AND query).
MyIndex.find(:name => 'asd', :wheels => 8).first
You can make compound queries using the and, or, and not methods on the query result.
Example:
MyIndex.find(:name => 'asd').or(:wheels => 8).first.should == thing3
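The chaining style can be modelled with a small query object that collects clauses and returns itself. SketchQuery is a hypothetical class written for this page; the string it produces is only meant to show how the chained calls compose, and it is not the query object neo4j.rb returns.

```ruby
# Hypothetical query object; the class and its string output are illustrative only.
class SketchQuery
  def initialize(hash)
    @expr = clause(hash)
  end

  def and(hash)
    combine('AND', hash)
  end

  def or(hash)
    combine('OR', hash)
  end

  def not(hash)
    combine('AND NOT', hash)
  end

  def to_s
    @expr
  end

  private

  # Several keys in one hash are joined with AND.
  def clause(hash)
    hash.map { |field, value| "#{field}:#{value}" }.join(' AND ')
  end

  # Wraps what has been built so far and returns self so calls can be chained.
  def combine(operator, hash)
    @expr = "(#{@expr}) #{operator} (#{clause(hash)})"
    self
  end
end

puts SketchQuery.new(:name => 'asd').or(:wheels => 8)
```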
If you want to do a numerical range search, you must declare the index field_type as Fixnum or Float.
For Neo4j::Rails::Model and Neo4j::NodeMixin you do that with the :type config on the property instead of using the field_type option on the index method.
There are two ways of doing a range search: using the between method or using the Ruby Range class.
Example, using between:
MyIndex.find(:age).between(2, 5)
Example, using the Ruby Range class:
MyIndex.find(:name => 'thing').and(:wheels => (9..15)).should be_empty
Notice: range queries on non-String fields (e.g. :field_type => Fixnum) are not possible using a String Lucene query; instead you must use a hash query, as shown above.
To sort, use the asc or desc method on the query result. Example:
MyIndex.find('name: *@gmail.com').asc(:name).desc(:age)
Notice: you must have an index on the sorted fields.
Instead of waiting for the transaction to finish, you can manually index a node or relationship.
Example (from RSpec):
new_node = Neo4j::Node.new
new_node[:name] = 'Kalle Kula'
new_node.add_index(:name)
new_node.rm_index(:name)
new_node[:name] = 'lala'
new_node.add_index(:name)
Neo4j::Node.find('name: lala').first.should == new_node
Neo4j::Node.find('name: "Kalle Kula"').first.should_not == new_node
If you are looping through a lot of nodes, you might get better performance by not loading the Ruby wrappers around the Java nodes.
MyIndex.find('name: andreas', :type => :fulltext, :wrapped => false)
When using the :wrapped => false parameter, the find method returns a Java org.neo4j.graphdb.index.IndexHits instance (which works like a Ruby Enumerable, so you can use the normal each, collect, etc. methods).
It is possible to create your own Lucene configuration.
For an example, see the configuration for fulltext and exact indexing in Neo4j::Config[:lucene].
You can add your own Lucene indexing configuration to Neo4j::Config and use it with the index keyword.
Neo4j::Config[:lucene][:my_index_type] = ...
class Person
  index :name, :type => :my_index_type
end
I have not tested this :-)
Nil values will never be indexed!
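The consequence of this rule, and of the earlier note that only set properties are indexed, can be sketched with a toy index in plain Ruby. The index_node and wildcard helpers are hypothetical names used for illustration only; they model the behaviour and do not reflect the neo4j.rb internals.

```ruby
# Toy indexer: only properties that are actually set (non-nil) get an entry.
def index_node(index, id, props, fields)
  fields.each do |field|
    value = props[field]
    next if value.nil?
    (index[field] ||= {})[id] = value.to_s
  end
end

# Toy wildcard lookup: every node that has any indexed value for the field.
def wildcard(index, field)
  (index[field] || {}).keys
end

index = {}
index_node(index, 1, { :name => 'andreas', :description => 'a person' }, [:name, :description])
index_node(index, 2, { :name => 'kalle' }, [:name, :description])

p wildcard(index, :name)         # both nodes have a name
p wildcard(index, :description)  # node 2 never set :description, so it is absent
```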