Usage with Gremlin
Gremlin is a graph traversal machine and language developed by Apache TinkerPop of the Apache Software Foundation.
As an analogy, Apache TinkerPop and Gremlin are to graph databases what JDBC and SQL are to relational databases. Likewise, the Gremlin traversal machine is to graph computing what the Java virtual machine is to general-purpose computing.
Gremlin is easy to use and is supported by many graph systems, including Titan, Neo4j, OrientDB, Hadoop (Giraph), Hadoop (Spark), InfiniteGraph, BlazeGraph, DEX, Sqlg, and Bitsy.
Caution: Each release of TinkerPop introduces some breaking changes to the Gremlin query language. To support Titan DB 1.0.0, Toomba is implemented against the Gremlin version shipped with TinkerPop 3.0.1-incubating.
Toomba creates Concept, Keyword, Entity, Taxonomy, CIConcept (ConceptInsights Concept), and REMention vertices that connect to your input vertex, which is looked up with the following Gremlin statement:
def n = g.V().hasLabel(nodeType).has("id", nodeId).next();
The edges are labeled HAS_CONCEPT, HAS_KEYWORD, etc., and all have score properties. HAS_KEYWORD and HAS_ENTITY edges also have sentiment properties, which range from -1 to 1. Concept, Keyword, and Entity vertices are connected to Type vertices through an IS_A relationship, which continues as Type vertices connect to other Type vertices. Taxonomy has its own chain of IS_A relationships, but they connect to other Taxonomy vertices, as does CIConcept.
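To make the graph layout above concrete, here is a minimal sketch of how the Keyword vertices attached to an input vertex might be read back. The helper name buildKeywordQuery is hypothetical and not part of Toomba. The sketch assumes that Keyword vertices carry an id property and that your Gremlin server accepts a "bindings" field next to "gremlin" in the JSON request body.

```javascript
// Hypothetical helper (not part of Toomba): builds a request body for a Gremlin
// server that lists the ids of the Keyword vertices attached to an input vertex.
// Assumption: Keyword vertices expose an 'id' property.
function buildKeywordQuery(nodeType, nodeId) {
  return {
    // The script only refers to the bound names nodeType and nodeId, so the
    // caller's values are never concatenated into the Gremlin string.
    gremlin:
      "def g = graph.traversal(); " +
      "g.V().hasLabel(nodeType).has('id', nodeId)" +
      ".out('HAS_KEYWORD').values('id')",
    // Assumption: the server resolves these bindings as script variables.
    bindings: { nodeType: nodeType, nodeId: nodeId }
  };
}
```

The returned object could then be posted to your Gremlin server, for example with axios.post(url, buildKeywordQuery('test', 'Barack Obama')), where url is your own server endpoint.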
No setup is required beforehand, because Titan, as a graph database, does not require you to predefine any types. We recommend setting indexes and unique constraints on the id field of all the Toomba types to prevent duplicates and to speed up toombatization (most of the transactions check whether the data exists before inserting it).
At a high level, you make a POST request to the Toomba service with { nodeType, nodeId, content, contentType } JSON-encoded in the request body, and get back a result of the form { message: '', status: 'OK | ERROR', transactions: [] }. The transactions field is an array of insert statements that you execute on your Titan database. An example in JavaScript (ES7):
'use strict';

// Import the axios library (a promise-based HTTP client).
import axios from 'axios';

// Placeholder (not given in the original example): the HTTP endpoint of your Gremlin server.
const url = 'http://<your gremlin server>';

// First, create the node. The transactions returned by toomba assume that a node with this
// type and this id already exists, and will attach concepts, entities, keywords, and taxonomies to it.
await axios.post(url, {
  "gremlin": "def g = graph.traversal(); g.V().hasLabel('test').has('id', 'Barack Obama').tryNext().orElseGet({g.addV(T.label, 'test', 'id', 'Barack Obama')})"
});

// Only nodeType, nodeId, and content need to be supplied. contentType is text by default;
// URL and HTML are also supported.
// The response is a JSON object with the fields 'status', 'message', and 'transactions' (an array of transactions).
const { data } = await axios.post('http://<your ip>/gremlin/roombamatize', {
  "nodeType": "test",
  "nodeId": "Barack Obama",
  "content": "Barack Obama is the president of the united states.",
  "contentType": "text",
  "annotators": { "alchemy": { "apikey": "" }, "relationship-extraction": { "username": "", "password": "" } }
});

// Loop through the transactions and execute them against the database.
for (const transaction of data.transactions) {
  await axios.post(url, transaction);
}
console.log("done");
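The example above does not inspect the status field of the response. A minimal sketch of one way to do that before running the transactions follows. The helper name applyTransactions and its post parameter are hypothetical; post stands for any async function that sends one transaction to your database, such as (t) => axios.post(url, t).

```javascript
// Hypothetical helper (not part of Toomba): checks the status of a Toomba
// response and then executes its transactions one at a time, in order.
async function applyTransactions(result, post) {
  // Stop early when the service reports an error instead of writing to the graph.
  if (result.status !== 'OK') {
    throw new Error('Toomba returned ' + result.status + ': ' + result.message);
  }
  let applied = 0;
  for (const transaction of result.transactions) {
    // Kept sequential, mirroring the loop in the example above.
    await post(transaction);
    applied += 1;
  }
  // Number of transactions that were sent.
  return applied;
}
```

With the variables from the example above, this could be called as await applyTransactions(data, (t) => axios.post(url, t)).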