GraphSON Format
GraphSON is a JSON-based graph format developed by TinkerPop, with readers and writers provided by Blueprints. Faunus uses a slight variation of the typical form. The Faunus variation is:
- vertex-centric: a row in a Faunus GraphSON file is a vertex, its properties, and its incident edges (and their respective properties).
- long id based: vertex and edge ids must be longs as Faunus uses the long address space for all its graph computing operations.
- less verbose: a row does not include _type information, nor must an edge carry both its _inV and _outV long ids, as one of the ids can be inferred from the incident vertex.
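The inference described in the last point can be sketched in Python. This is only an illustration, not the Faunus reader: the function name expand_edges is hypothetical, and treating every key that does not start with an underscore as a vertex property is a simplifying assumption.

```python
import json


def expand_edges(row):
    """Parse one Faunus GraphSON row and fill in the vertex id each edge omits.

    Returns (vertex_id, properties, edges). Hypothetical helper for illustration.
    """
    vertex = json.loads(row)
    vertex_id = vertex["_id"]
    # Assumption: keys without a leading underscore are user properties.
    properties = {k: v for k, v in vertex.items() if not k.startswith("_")}
    edges = []
    # An outgoing edge only stores _inV; its _outV is the row's own vertex.
    for edge in vertex.get("_outE", []):
        edges.append({**edge, "_outV": vertex_id})
    # An incoming edge only stores _outV; its _inV is the row's own vertex.
    for edge in vertex.get("_inE", []):
        edges.append({**edge, "_inV": vertex_id})
    return vertex_id, properties, edges


row = '{"name":"saturn","type":"titan","_id":0,"_inE":[{"_label":"father","_id":12,"_outV":1}]}'
vertex_id, properties, edges = expand_edges(row)
print(vertex_id, properties, edges)
```

Each edge dictionary returned this way carries both endpoint ids, which is the information a standard GraphSON edge would state explicitly.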
The Graph of the Gods dataset deployed with all Aurelius products is represented below in Faunus GraphSON.
{"name":"saturn","type":"titan","_id":0,"_inE":[{"_label":"father","_id":12,"_outV":1}]}
{"name":"jupiter","type":"god","_id":1,"_outE":[{"_label":"lives","_id":13,"_inV":4},{"_label":"brother","_id":16,"_inV":3},{"_label":"brother","_id":14,"_inV":2},{"_label":"father","_id":12,"_inV":0}],"_inE":[{"_label":"brother","_id":17,"_outV":3},{"_label":"brother","_id":15,"_outV":2},{"_label":"father","_id":24,"_outV":7}]}
{"name":"neptune","type":"god","_id":2,"_outE":[{"_label":"lives","_id":20,"_inV":5},{"_label":"brother","_id":19,"_inV":3},{"_label":"brother","_id":15,"_inV":1}],"_inE":[{"_label":"brother","_id":18,"_outV":3},{"_label":"brother","_id":14,"_outV":1}]}
{"name":"pluto","type":"god","_id":3,"_outE":[{"_label":"pet","_id":23,"_inV":11},{"_label":"lives","_id":21,"_inV":6},{"_label":"brother","_id":17,"_inV":1},{"_label":"brother","_id":18,"_inV":2}],"_inE":[{"_label":"brother","_id":19,"_outV":2},{"_label":"brother","_id":16,"_outV":1}]}
{"name":"sky","type":"location","_id":4,"_inE":[{"_label":"lives","_id":13,"_outV":1}]}
{"name":"sea","type":"location","_id":5,"_inE":[{"_label":"lives","_id":20,"_outV":2}]}
{"name":"tartarus","type":"location","_id":6,"_inE":[{"_label":"lives","_id":21,"_outV":3},{"_label":"lives","_id":22,"_outV":11}]}
{"name":"hercules","type":"demigod","_id":7,"_outE":[{"_label":"mother","_id":25,"_inV":8},{"time":1,"_label":"battled","_id":26,"_inV":9},{"time":2,"_label":"battled","_id":27,"_inV":10},{"time":12,"_label":"battled","_id":28,"_inV":11},{"_label":"father","_id":24,"_inV":1}]}
{"name":"alcmene","type":"human","_id":8,"_inE":[{"_label":"mother","_id":25,"_outV":7}]}
{"name":"nemean","type":"monster","_id":9,"_inE":[{"time":1,"_label":"battled","_id":26,"_outV":7}]}
{"name":"hydra","type":"monster","_id":10,"_inE":[{"time":2,"_label":"battled","_id":27,"_outV":7}]}
{"name":"cerberus","type":"monster","_id":11,"_outE":[{"_label":"lives","_id":22,"_inV":6}],"_inE":[{"_label":"pet","_id":23,"_outV":3},{"time":12,"_label":"battled","_id":28,"_outV":7}]}
GraphSON is a space-expensive graph format because it is text-based. However, it is convenient for many developers to work with because its structure is simple: it is easy to create and easy to parse. Note that GraphSON is only one of the formats Faunus supports for reading and writing a graph. Within a larger MapReduce job chain, Faunus makes use of binary Hadoop sequence files.
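To illustrate how little is needed to create a row, the following Python sketch serializes one vertex in the vertex-centric form shown above. It is a minimal sketch under stated assumptions, not a Faunus or Blueprints API: the names to_row and check_long, and the (edge id, label, other vertex id, properties) tuple layout, are hypothetical. The range check reflects the requirement that ids be longs, assumed here to mean signed 64-bit integers.

```python
import json

# Assumption: "long" means a signed 64-bit integer, as in Java.
LONG_MIN, LONG_MAX = -(2 ** 63), 2 ** 63 - 1


def check_long(value):
    """Reject ids that are not integers in the signed 64-bit range."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"id must be an integer: {value!r}")
    if not LONG_MIN <= value <= LONG_MAX:
        raise ValueError(f"id out of long range: {value!r}")
    return value


def to_row(vertex_id, properties, out_edges=(), in_edges=()):
    """Build one GraphSON row; edges are (edge_id, label, other_vertex_id, props)."""
    row = dict(properties)
    row["_id"] = check_long(vertex_id)
    if out_edges:
        # Outgoing edges only record the vertex they point to (_inV).
        row["_outE"] = [
            {**props, "_label": label, "_id": check_long(eid), "_inV": check_long(other)}
            for eid, label, other, props in out_edges
        ]
    if in_edges:
        # Incoming edges only record the vertex they come from (_outV).
        row["_inE"] = [
            {**props, "_label": label, "_id": check_long(eid), "_outV": check_long(other)}
            for eid, label, other, props in in_edges
        ]
    # Compact separators keep the row on one line without extra spaces.
    return json.dumps(row, separators=(",", ":"))


sky = to_row(4, {"name": "sky", "type": "location"}, in_edges=[(13, "lives", 1, {})])
print(sky)
```

With these inputs the printed line has the same content as the "sky" row of the Graph of the Gods listing.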
Below is a small snippet of a single DBpedia vertex, the one that represents albedo. When DBpedia is represented in Faunus GraphSON, the resulting file is approximately 23 GB.
{"_id":1,"name":"albedo","_outE":[{"_id":39812795,"_inV":23533115,"_label":"wikipagewikilink"} ... }
This GraphSON file can be pushed to HDFS.
faunus$ hadoop fs -put dbpedia.json dbpedia.json
faunus$ hadoop fs -ls
Found 1 items
-rw-r--r-- 3 ubuntu supergroup 24687402417 2012-08-02 02:27 /user/ubuntu/dbpedia.json
Once in HDFS, it can be processed with Faunus. The Faunus job below runs on 9 small-instance Amazon EC2 machines with the following Hadoop configuration.
mapred.child.java.opts=-Xmx750m
mapred.tasktracker.map.tasks.maximum=1
faunus$ bin/faunus.sh 'g.V.degree("name",REVERSE,IN)'
12/08/03 07:31:23 INFO faunus.FaunusGraph: Generating job chain: g.V.vertexDegree("name",REVERSE,IN)
12/08/03 07:31:23 INFO faunus.FaunusGraph: Compiled to 1 MapReduce job(s)
12/08/03 07:31:23 INFO faunus.FaunusGraph: Executing job 1 out of 1: com.thinkaurelius.faunus.mapreduce.operators.SortedVertexDegree
12/08/03 07:31:28 INFO input.FileInputFormat: Total input paths to process : 1
12/08/03 07:31:28 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/08/03 07:31:29 INFO mapred.JobClient: Running job: job_201208022033_0018
12/08/03 07:31:30 INFO mapred.JobClient: map 0% reduce 0%
12/08/03 07:32:18 INFO mapred.JobClient: map 1% reduce 0%
...
faunus$ hadoop fs -cat output.txt/part-r-00000 | sed -n 1,11p
race and ethnicity in the united states census 243508
england 195539
france 170667
united kingdom 169357
germany 150264
canada 147741
animal 131959
list of sovereign states 129870
world war ii 128366
japan 118342
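For small files, the same kind of result can be reproduced locally. The Python sketch below assumes, based on the output above, that the job reports each vertex name with its number of incoming edges, sorted from highest to lowest. It is not the Faunus MapReduce implementation, and the function name in_degree_by_name is hypothetical. The sample rows are taken from the Graph of the Gods listing.

```python
import json


def in_degree_by_name(rows):
    """Return (name, in-degree) pairs sorted by in-degree, highest first."""
    counts = []
    for row in rows:
        vertex = json.loads(row)
        # In the vertex-centric format, the in-degree is the length of _inE.
        counts.append((vertex["name"], len(vertex.get("_inE", []))))
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


rows = [
    '{"name":"saturn","type":"titan","_id":0,"_inE":[{"_label":"father","_id":12,"_outV":1}]}',
    '{"name":"sky","type":"location","_id":4,"_inE":[{"_label":"lives","_id":13,"_outV":1}]}',
    '{"name":"tartarus","type":"location","_id":6,"_inE":[{"_label":"lives","_id":21,"_outV":3},{"_label":"lives","_id":22,"_outV":11}]}',
]

ranking = in_degree_by_name(rows)
for name, degree in ranking:
    print(name, degree)
```

Because each row already lists its incident edges, no join across rows is needed to compute a degree; only the final sort looks at more than one vertex.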