
GraphSON Format

okram edited this page Aug 18, 2012 · 11 revisions

GraphSON is a JSON-based graph format developed by TinkerPop, with readers and writers provided by Blueprints. Faunus uses a slight variation of the typical form. The Faunus variant is:

  • vertex-centric: a row in a Faunus GraphSON file is a vertex, its properties, and its incident edges (and their respective properties).
  • long id based: vertex and edge ids must be longs as Faunus uses the long address space for all its graph computing operations.
  • less verbose: a row does not include _type information, and an edge does not need to carry both its _inV and _outV long ids, because one of the two can be inferred from the incident vertex.
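
The inference described in the last point can be sketched in a few lines of Python. This is a minimal illustration and not part of Faunus or Blueprints: the function name expand_edges is hypothetical, and treating every key that starts with an underscore as reserved (rather than as a vertex property) is an assumption made for this example.

```python
import json


def expand_edges(row):
    """Parse one Faunus GraphSON row and fill in the endpoint id each edge omits.

    Hypothetical helper for illustration only.
    """
    vertex = json.loads(row)
    vertex_id = vertex["_id"]
    edges = []
    for edge in vertex.get("_outE", []):
        # Outgoing edge: the row's vertex is the tail, so only _inV is stored.
        edges.append({**edge, "_outV": vertex_id})
    for edge in vertex.get("_inE", []):
        # Incoming edge: the row's vertex is the head, so only _outV is stored.
        edges.append({**edge, "_inV": vertex_id})
    # Assumption: keys beginning with "_" are reserved, everything else is a property.
    properties = {k: v for k, v in vertex.items() if not k.startswith("_")}
    return vertex_id, properties, edges


row = '{"name":"sky","type":"location","_id":4,"_inE":[{"_label":"lives","_id":13,"_outV":1}]}'
vertex_id, properties, edges = expand_edges(row)
print(vertex_id, properties, edges)
```

Each returned edge carries both _outV and _inV, which is the fuller form that the vertex-centric row leaves implicit.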

The Graph of the Gods dataset deployed with all Aurelius products is represented below in Faunus GraphSON.

{"name":"saturn","type":"titan","_id":0,"_inE":[{"_label":"father","_id":12,"_outV":1}]}
{"name":"jupiter","type":"god","_id":1,"_outE":[{"_label":"lives","_id":13,"_inV":4},{"_label":"brother","_id":16,"_inV":3},{"_label":"brother","_id":14,"_inV":2},{"_label":"father","_id":12,"_inV":0}],"_inE":[{"_label":"brother","_id":17,"_outV":3},{"_label":"brother","_id":15,"_outV":2},{"_label":"father","_id":24,"_outV":7}]}
{"name":"neptune","type":"god","_id":2,"_outE":[{"_label":"lives","_id":20,"_inV":5},{"_label":"brother","_id":19,"_inV":3},{"_label":"brother","_id":15,"_inV":1}],"_inE":[{"_label":"brother","_id":18,"_outV":3},{"_label":"brother","_id":14,"_outV":1}]}
{"name":"pluto","type":"god","_id":3,"_outE":[{"_label":"pet","_id":23,"_inV":11},{"_label":"lives","_id":21,"_inV":6},{"_label":"brother","_id":17,"_inV":1},{"_label":"brother","_id":18,"_inV":2}],"_inE":[{"_label":"brother","_id":19,"_outV":2},{"_label":"brother","_id":16,"_outV":1}]}
{"name":"sky","type":"location","_id":4,"_inE":[{"_label":"lives","_id":13,"_outV":1}]}
{"name":"sea","type":"location","_id":5,"_inE":[{"_label":"lives","_id":20,"_outV":2}]}
{"name":"tartarus","type":"location","_id":6,"_inE":[{"_label":"lives","_id":21,"_outV":3},{"_label":"lives","_id":22,"_outV":11}]}
{"name":"hercules","type":"demigod","_id":7,"_outE":[{"_label":"mother","_id":25,"_inV":8},{"time":1,"_label":"battled","_id":26,"_inV":9},{"time":2,"_label":"battled","_id":27,"_inV":10},{"time":12,"_label":"battled","_id":28,"_inV":11},{"_label":"father","_id":24,"_inV":1}]}
{"name":"alcmene","type":"human","_id":8,"_inE":[{"_label":"mother","_id":25,"_outV":7}]}
{"name":"nemean","type":"monster","_id":9,"_inE":[{"time":1,"_label":"battled","_id":26,"_outV":7}]}
{"name":"hydra","type":"monster","_id":10,"_inE":[{"time":2,"_label":"battled","_id":27,"_outV":7}]}
{"name":"cerberus","type":"monster","_id":11,"_outE":[{"_label":"lives","_id":22,"_inV":6}],"_inE":[{"_label":"pet","_id":23,"_outV":3},{"time":12,"_label":"battled","_id":28,"_outV":7}]}

GraphSON is a space-expensive graph format because it is text-based. However, it is convenient for many developers because its structure is simple: it is easy to create and easy to parse. Note that GraphSON is only one of the formats Faunus supports for reading and writing a graph. Within a larger MapReduce job chain, Faunus makes use of binary Hadoop sequence files.
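
To illustrate how simple the format is to create, the following Python sketch turns a plain vertex map and edge list into vertex-centric rows. It is an assumption-laden example, not a Faunus or Blueprints writer: the function name to_faunus_graphson and the tuple layout of the edge list are hypothetical, and the key order inside each row simply mirrors the Graph of the Gods listing above.

```python
import json
from collections import defaultdict


def to_faunus_graphson(vertices, edges):
    """Build one JSON row per vertex from a vertex map and an edge list.

    vertices: {vertex_id: properties}
    edges: list of (edge_id, out_vertex_id, label, in_vertex_id, properties)
    Hypothetical helper for illustration only.
    """
    out_edges = defaultdict(list)
    in_edges = defaultdict(list)
    for edge_id, out_v, label, in_v, props in edges:
        # Each edge is written twice: once on its tail vertex, once on its head vertex.
        out_edges[out_v].append({**props, "_label": label, "_id": edge_id, "_inV": in_v})
        in_edges[in_v].append({**props, "_label": label, "_id": edge_id, "_outV": out_v})

    rows = []
    for vertex_id in sorted(vertices):
        row = {**vertices[vertex_id], "_id": vertex_id}
        if vertex_id in out_edges:
            row["_outE"] = out_edges[vertex_id]
        if vertex_id in in_edges:
            row["_inE"] = in_edges[vertex_id]
        # Compact separators keep each vertex on a single line without extra spaces.
        rows.append(json.dumps(row, separators=(",", ":")))
    return rows


vertices = {
    7: {"name": "hercules", "type": "demigod"},
    9: {"name": "nemean", "type": "monster"},
}
edges = [(26, 7, "battled", 9, {"time": 1})]
rows = to_faunus_graphson(vertices, edges)
print("\n".join(rows))
```

Because every edge is stored on both of its endpoints, the file roughly doubles the edge data, which is one reason the text form is space-expensive.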

An Example with GraphSON DBpedia

Below is a small snippet of the DBpedia vertex that represents albedo. When DBpedia is represented in Faunus GraphSON, the resulting file is approximately 23 GB.

{"_id":1,"name":"albedo","_outE":[{"_id":39812795,"_inV":23533115,"_label":"wikipagewikilink"} ... }

This GraphSON file can be pushed to HDFS.

faunus$ hadoop fs -put dbpedia.json dbpedia.json
faunus$ hadoop fs -ls
Found 1 items
-rw-r--r--   3 ubuntu supergroup 24687402417 2012-08-02 02:27 /user/ubuntu/dbpedia.json

Once in HDFS, the file can be processed with Faunus. The Faunus job below runs on nine small-instance Amazon EC2 machines with the following Hadoop configuration.

mapred.child.java.opts=-Xmx750m
mapred.tasktracker.map.tasks.maximum=1

faunus$ bin/faunus.sh 'g.V.degree("name",REVERSE,IN)'
12/08/03 07:31:23 INFO faunus.FaunusGraph: Generating job chain: g.V.vertexDegree("name",REVERSE,IN)
12/08/03 07:31:23 INFO faunus.FaunusGraph: Compiled to 1 MapReduce job(s)
12/08/03 07:31:23 INFO faunus.FaunusGraph: Executing job 1 out of 1: com.thinkaurelius.faunus.mapreduce.operators.SortedVertexDegree
12/08/03 07:31:28 INFO input.FileInputFormat: Total input paths to process : 1
12/08/03 07:31:28 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/08/03 07:31:29 INFO mapred.JobClient: Running job: job_201208022033_0018
12/08/03 07:31:30 INFO mapred.JobClient:  map 0% reduce 0%
12/08/03 07:32:18 INFO mapred.JobClient:  map 1% reduce 0%
...
faunus$ hadoop fs -cat output.txt/part-r-00000 | sed -n 1,11p
race and ethnicity in the united states census	243508
england	195539
france	170667
united kingdom	169357
germany	150264
canada	147741
animal	131959
list of sovereign states	129870
world war ii	128366
japan	118342
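
For a small file, a similar ranking can be approximated locally without Hadoop. The Python sketch below is not how Faunus implements the job. It assumes, judging only from the output above, that the job emits each vertex's "name" property together with its in-degree, sorted in descending order. The function name sorted_in_degree is hypothetical.

```python
import json


def sorted_in_degree(rows, key="name"):
    """Return (property value, in-degree) pairs, highest in-degree first.

    Assumed semantics, inferred from the job output; hypothetical helper.
    """
    counts = []
    for row in rows:
        if not row.strip():
            continue  # skip blank lines
        vertex = json.loads(row)
        # In a vertex-centric row, the in-degree is the length of the _inE list.
        counts.append((vertex.get(key), len(vertex.get("_inE", []))))
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return counts


rows = [
    '{"name":"saturn","type":"titan","_id":0,"_inE":[{"_label":"father","_id":12,"_outV":1}]}',
    '{"name":"tartarus","type":"location","_id":6,"_inE":[{"_label":"lives","_id":21,"_outV":3},{"_label":"lives","_id":22,"_outV":11}]}',
    '{"name":"alcmene","type":"human","_id":8}',
]
ranking = sorted_in_degree(rows)
for name, degree in ranking:
    print(f"{name}\t{degree}")
```

Since each row already lists all incident edges of its vertex, the degree is available from a single line, which is why this computation fits in one MapReduce job.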