Aerospike Graph Synth

Graph Synth is a tool for synthesizing graph structured datasets.

You can run Graph Synth either through a traditional command-line interface (CLI) or through a Gremlin call() step.

Getting Started

Clone the repository:

 git clone --recursive git@github.com:aerospike/graph-synth.git

To get started, download the latest jar release here.

Usage

$ java -jar GraphSynth-1.0.0.jar --help
Graph Synth, by Aerospike.
Usage: GraphSynth [--help] [--scale-factor=<scaleFactor>]
                  [--input-uri=<inputUri>] [--output-uri=<outputUri>]
                  [--list-sample-schemas] [--load-sample=<loadSample>]
                  [--dump-sample=<dumpSample>] [--export-schema]
                  [--load-schema] [--clear] [--set=<String=String>]... [--debug]
      --help            Help
      --scale-factor=<scaleFactor>
                        Comma delimited list of scale factors
      --input-uri=<inputUri>
                        File or Gremlin Server URI for schema, supported
                          schemes:
                         file://
                         ws://
                         wss://
      --output-uri=<outputUri>
                        File or Gremlin Server URI for output, supported
                          schemes:
                         file://
                         ws://
                         wss://
      --list-sample-schemas
                        List Sample Schemas
      --load-sample=<loadSample>
                        Load Sample to Gremlin Server
      --dump-sample=<dumpSample>
                        Dump Sample Schema to YAML
      --export-schema   Export Schema from Gremlin Server to YAML file
      --load-schema     Load YAML Schema to Gremlin Server
      --clear           Delete and overwrite existing remote graph
      --set=<String=String>
                        Set or override configuration key
      --debug           Show Debug Output
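
Since --input-uri and --output-uri accept only a few schemes, it can help to check a URI before starting a long run. The following is a minimal sketch, not part of Graph Synth: is_supported_uri is a hypothetical helper name, and it accepts the bare file: prefix because the examples in this README use that form.

```shell
# Hypothetical helper (not part of Graph Synth): succeed only if the URI
# starts with one of the schemes listed in the help text.
is_supported_uri() {
  case "$1" in
    file:*|ws://*|wss://*) return 0 ;;   # file, ws and wss are accepted
    *) return 1 ;;                       # anything else is rejected
  esac
}
```

For example, is_supported_uri "ws://localhost:8182/g" succeeds, while is_supported_uri "http://localhost:8182/g" fails.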

Example Usage

Using a YAML schema file to write out CSV data:

$ java -jar GraphSynth-1.0.0.jar \
  --input-uri=file:$(pwd)/conf/schema/gdemo_schema.yaml \
  --output-uri=file:/tmp/output-data \
  --scale-factor=10

You can list the built-in sample schemas with the --list-sample-schemas option:

$ java -jar GraphSynth-1.0.0-SNAPSHOT.jar --list-sample-schemas
Synthetic
Simplest
Benchmark2024
GDemoSchema

If you have a Gremlin Server available, you can load a sample schema into it. This can be useful for exploring and modifying the schema. Be careful: the --clear option erases your existing graph.

$ java -jar graph-synth/target/GraphSynth-1.0.0-SNAPSHOT.jar --load-sample GDemoSchema --output-uri=ws://localhost:8182/g --clear
$ 
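
Because --clear is destructive, one option is to put a confirmation prompt in front of any command line that contains it. The sketch below is an illustration only: confirm_clear is a hypothetical wrapper, not something Graph Synth provides, and it simply runs whatever command it is given once the user agrees.

```shell
# Hypothetical wrapper (not part of Graph Synth): ask for confirmation
# before running a command line that contains the --clear option.
confirm_clear() {
  needs_confirm=0
  for arg in "$@"; do
    if [ "$arg" = "--clear" ]; then
      needs_confirm=1
    fi
  done
  if [ "$needs_confirm" -eq 1 ]; then
    printf 'This will erase the existing remote graph. Continue? [y/N] ' >&2
    # Treat end-of-input as "no"
    read -r reply || reply=""
    case "$reply" in
      y|Y|yes|YES) ;;
      *) echo "Aborted." >&2; return 1 ;;
    esac
  fi
  # Run the original command unchanged
  "$@"
}
```

Usage would look like: confirm_clear java -jar graph-synth/target/GraphSynth-1.0.0-SNAPSHOT.jar --load-sample GDemoSchema --output-uri=ws://localhost:8182/g --clear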

Once you have it loaded, you can use that remote schema to generate data.

Here is an example of using a remote schema graph to write out CSV data at three different scale factors:

$ java -jar GraphSynth-1.0.0.jar \
  --input-uri=ws://localhost:8182/g \
  --output-uri=file:/tmp/output-data \
  --scale-factor=10,100,1000
...

Files generated at Scale Factor: 10 26
Files generated at Scale Factor: 1000 260
Files generated at Scale Factor: 100 26
$
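
To cross-check the reported file counts, you could count the files under the output directory yourself. This is a rough sketch: count_files is a hypothetical helper, and because the on-disk layout of the output is not described here, it only counts regular files recursively (assuming no file names contain newlines).

```shell
# Hypothetical helper: print the number of regular files below a directory.
count_files() {
  # find prints one path per line; wc -l counts them; tr strips padding
  find "$1" -type f | wc -l | tr -d ' '
}
```

For the run above, count_files /tmp/output-data would report the total across everything written to that directory.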

You can also generate data directly into a remote Gremlin Server.

Either from YAML:

$ java -jar GraphSynth-1.0.0.jar \
   --input-uri=file:$(pwd)/conf/schema/gdemo_schema.yaml \
   --output-uri=ws://localhost:8182/g  \
   --scale-factor=1000 \
   --clear

Or directly from a schema graph into another graph:

$ java -jar GraphSynth-1.0.0.jar \
  --input-uri=ws://my-gremlin-schema-server:8182/schema \
  --output-uri=ws://my-gremlin-schema-server:8182/g \
  --scale-factor=77

Configuration and Schema

Most configuration options can be provided on the command line. You will, however, need to provide a graph schema.

You can declare a schema in YAML or in Gremlin.

Sample schema YAML files are provided in the conf directory.

Read more about schema declarations in the docs section.

Building

Maven is used as the build tool for this project. A simple script is provided to build the project.

$ script/build.sh

...

[INFO] Movement 1.0.0-SNAPSHOT ............................ SUCCESS [  0.073 s]
[INFO] core 1.0.0-SNAPSHOT ................................ SUCCESS [  1.513 s]
[INFO] cluster 1.0.0-SNAPSHOT ............................. SUCCESS [  0.160 s]
[INFO] plugin 1.0.0-SNAPSHOT .............................. SUCCESS [  0.068 s]
[INFO] extensions 1.0.0-SNAPSHOT .......................... SUCCESS [  0.001 s]
[INFO] tinkerpop 1.0.0-SNAPSHOT ........................... SUCCESS [  2.249 s]
[INFO] files 1.0.0-SNAPSHOT ............................... SUCCESS [  0.182 s]
[INFO] cli 1.0.0-SNAPSHOT ................................. SUCCESS [  1.754 s]
[INFO] GraphSynth 1.1.0-SNAPSHOT .......................... SUCCESS [  0.005 s]
[INFO] GraphSynth 1.1.0-SNAPSHOT .......................... SUCCESS [  2.312 s]
[INFO] integration 1.0.0-SNAPSHOT ......................... SUCCESS [  0.112 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  8.486 s
[INFO] Finished at: 2024-09-19T13:24:04-07:00
[INFO] ------------------------------------------------------------------------


$ ls graph-synth/target/GraphSynth-1.1.0-SNAPSHOT.jar
graph-synth/target/GraphSynth-1.1.0-SNAPSHOT.jar
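
The jar name includes the version, which differs between the examples in this README. A small helper can pick up whichever jar the build produced. This is a sketch under assumptions: find_graphsynth_jar is a hypothetical name, and the default directory is taken from the build output shown above.

```shell
# Hypothetical helper: print the most recently modified GraphSynth jar
# in a target directory (default: graph-synth/target).
find_graphsynth_jar() {
  target_dir="${1:-graph-synth/target}"
  # ls -t sorts by modification time, newest first; prints nothing if no jar exists
  ls -t "$target_dir"/GraphSynth-*.jar 2>/dev/null | head -n 1
}
```

It could then be used as: java -jar "$(find_graphsynth_jar)" --help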

License

This project is provided under the Apache 2.0 software license.

No Warranty

Graph Synth is provided without warranty and is intended for testing and pre-production environments. It is not recommended for production operations.