Usage with Gremlin

Taimur Shah edited this page Oct 6, 2016 · 1 revision

Gremlin is a graph traversal machine and language developed by Apache TinkerPop of the Apache Software Foundation.

As an explanatory analogy, Apache TinkerPop and Gremlin are to graph databases what JDBC and SQL are to relational databases. Likewise, the Gremlin traversal machine is to graph computing what the Java virtual machine is to general-purpose computing.

Gremlin is easy to use and is supported by multiple graph systems, such as Titan, Neo4j, OrientDB, Hadoop (Giraph), Hadoop (Spark), InfiniteGraph, BlazeGraph, DEX, Sqlg, and Bitsy.

Caution: Each TinkerPop release introduces some breaking changes to the Gremlin query language. To support Titan DB 1.0.0, Toomba targets the Gremlin version that ships with TinkerPop 3.0.1-incubating.

Schema

Toomba creates Concept, Keyword, Entity, Taxonomy, CIConcept (ConceptInsights Concept), and REMention vertices and connects them to your input vertex. The input vertex is looked up as def n = g.V().hasLabel(nodeType).has("id", nodeId).next(); The edges are labeled HAS_CONCEPT, HAS_KEYWORD, and so on, and every edge has a score property. HAS_KEYWORD and HAS_ENTITY edges also have a sentiment property, which ranges over [-1, 1]. Concept, Keyword, and Entity vertices are connected to Type vertices through an IS_A relationship, and the chain continues as Type vertices connect to other Type vertices. Taxonomy and CIConcept each have their own chain of IS_A relationships, but those connect to other Taxonomy and CIConcept vertices respectively rather than to Type vertices.
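To make the schema concrete, here is a minimal sketch of a request body that reads back the vertices attached to an input vertex together with the score on each edge. The function name buildEdgeScoreQuery is hypothetical, and two things are assumptions rather than documented behavior: that the edges point from the input vertex to the Toomba vertex, and that your Gremlin endpoint accepts a bindings field next to gremlin. The traversal has not been checked against every TinkerPop release, so adjust it if your version differs.

```javascript
// Hypothetical helper: builds a request body that lists, for one input vertex,
// the id of every vertex reached over `edgeLabel` plus the score on that edge.
// Assumption: edges such as HAS_CONCEPT point from the input vertex outward.
function buildEdgeScoreQuery(nodeType, nodeId, edgeLabel) {
  const gremlin = [
    'def g = graph.traversal();',
    "g.V().hasLabel(nodeType).has('id', nodeId)",
    ".outE(edgeLabel).as('edge').inV().as('target')",
    ".select('edge', 'target').by('score').by('id')"
  ].join(' ');

  // Values travel as bindings, so they are never spliced into the script text.
  return { gremlin, bindings: { nodeType, nodeId, edgeLabel } };
}
```

For example, buildEdgeScoreQuery('test', 'Barack Obama', 'HAS_CONCEPT') produces a body you can post to the same endpoint that executes the Toomba transactions.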

Nothing needs to be set up beforehand, because Titan, as a graph database, does not require you to predefine any types. We do recommend creating indexes and unique constraints on the id field of all of the Toomba types. This prevents duplicates and speeds up toombatization, because most of the transactions check whether the data already exists before inserting it.
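The check-before-insert pattern mentioned above can be sketched as a small request builder. It is modeled on the node-creation query used in the ingestion example on this page; whether Toomba's own transactions look exactly like this is an assumption. The name buildGetOrCreate is hypothetical, and so is the use of a bindings field, which your Gremlin endpoint may or may not accept.

```javascript
// Hypothetical helper: builds a request body that returns the vertex with the
// given label and id, creating it only when the lookup finds nothing.
// The has('id', ...) lookup is the step that an index on id would speed up.
function buildGetOrCreate(nodeType, nodeId) {
  const gremlin =
    'def g = graph.traversal(); ' +
    "g.V().hasLabel(nodeType).has('id', nodeId).tryNext()" +
    ".orElseGet({ g.addV(T.label, nodeType, 'id', nodeId) })";

  // nodeType and nodeId are passed as bindings instead of being quoted by hand.
  return { gremlin, bindings: { nodeType, nodeId } };
}
```

Passing the values as bindings keeps quotes and other special characters in an id from breaking the script.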

[Figure: schema]

Ingestion

At a high level, you make a POST request to the Toomba service with { nodeType, nodeId, content, contentType } JSON-encoded in the request body, and you get back a result of the form { message: '', status: 'OK | ERROR', transactions: [] }. The transactions field is an array of insert statements that you then execute against your Titan database. An example in JavaScript (ES7):

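// import the axios library (a promise-based HTTP client)
import axios from 'axios';

// endpoint that executes Gremlin scripts against your Titan database
// (placeholder; replace with your own address)
const url = 'http://<your ip>:<gremlin port>';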

// first, you need to create the node. The transactions returned by toomba assume that a node with this
// type and this id exists already, and will attach concepts, entities, keywords, and taxonomies to this node.
await axios.post(url, {
  "gremlin": "def g = graph.traversal(); g.V().hasLabel('test').has('id', 'Barack Obama').tryNext().orElseGet({g.addV(T.label, 'test', 'id', 'Barack Obama')})"
});

// only need to supply nodeType, nodeId, and content. contentType is text by default
// contentType URL and HTML are also supported.
// response is given as a json object with fields 'status' 'message' and 'transactions' which is an array of transactions.
const { data } = await axios.post('http://<your ip>/gremlin/roombamatize', {
  "nodeType": "test",
  "nodeId": "Barack Obama",
  "content": "Barack Obama is the president of the united states.",
  "contentType": "text",
  "annotators": { "alchemy": { "apikey": "" }, "relationship-extraction": { "username": "", "password": "" } }
});

// loop through transactions and execute them against the db.
for (let transaction of data.transactions) {
  await axios.post(url, transaction);
}

console.log("done");
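The loop above executes whatever comes back, including the result of a failed request. A slightly more defensive variant, sketched below, checks the status field first and stops with the service's message when it is not 'OK'. The function name runTransactions is hypothetical. The post parameter is any function of the shape post(url, body) that returns a promise, for example axios.post.

```javascript
// Hypothetical helper: executes the transactions from a Toomba result in order.
// Throws before touching the database if the result does not report status 'OK'.
async function runTransactions(post, url, result) {
  if (!result || result.status !== 'OK') {
    const reason = result && result.message ? result.message : 'no message';
    throw new Error('Toomba returned an error: ' + reason);
  }

  // Treat a missing transactions field as an empty list.
  const transactions = Array.isArray(result.transactions) ? result.transactions : [];

  let executed = 0;
  for (const transaction of transactions) {
    // Run sequentially, so each statement finishes before the next one starts.
    await post(url, transaction);
    executed += 1;
  }
  return executed;
}
```

With the variables from the example above, the call would be runTransactions(axios.post, url, data). It resolves to the number of transactions executed.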