ETL service that converts relational blockchain data into a native graph format for storage in ArangoDB.
Helium's Blockchain API is an effective way to view historical data stored on-chain, but the ledger-based format is less useful for feeding directly into network models. In this project, we propose to build a framework for a graph-based representation of blockchain activity, including Proof of Coverage and Token Flow. By capturing the natural adjacency between hotspots and accounts, we will be able to build machine learning models to, for instance, identify likely "gaming" behavior and predict coverage maps based on hotspot placement.
More details are available in the full project proposal.
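To make the relational-to-graph idea concrete, here is a minimal sketch in Python of how rows from a payments table might be mapped to ArangoDB-style documents. The row layout (payer, payee, amount, block) and the collection name `accounts` are assumptions for illustration, not the project's actual schema. The only ArangoDB convention relied on is that edge documents reference their endpoints through `_from` and `_to` in `collection/key` form.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical relational rows: (payer, payee, amount, block).
# These column names are illustrative, not the blockchain-etl schema.
Row = Tuple[str, str, int, int]


def rows_to_graph(rows: Iterable[Row]) -> Tuple[List[Dict], List[Dict]]:
    """Turn payment rows into vertex and edge documents."""
    vertices: Dict[str, Dict] = {}
    edges: List[Dict] = []
    for payer, payee, amount, block in rows:
        # Each distinct address becomes one vertex, keyed by the address.
        for address in (payer, payee):
            vertices.setdefault(address, {"_key": address})
        # Each payment becomes a directed edge from payer to payee.
        edges.append({
            "_from": f"accounts/{payer}",
            "_to": f"accounts/{payee}",
            "amount": amount,
            "block": block,
        })
    return list(vertices.values()), edges


rows = [("alice", "bob", 10, 100), ("bob", "carol", 4, 101), ("alice", "carol", 1, 102)]
vertices, edges = rows_to_graph(rows)
print(len(vertices), "vertices,", len(edges), "edges")
```

Once payments are stored as edges, "who paid whom" becomes a traversal over adjacent documents rather than a self-join on a ledger table.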
To run helium-arango-etl, you will need:
- Read/write access to a running ArangoDB instance.
  - e.g.
    docker run -d --name arango -p 8529:8529 -e ARANGO_ROOT_PASSWORD=openSesame arangodb/arangodb:3.8.2
  - If running locally, you can view the Arango WebUI at http://localhost:8529/
- Read access to a PostgreSQL database populated by a blockchain-etl node.
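Before starting the ETL, it can help to confirm that both databases accept connections. The following is a rough sketch using only the Python standard library. It checks TCP reachability only, not credentials or permissions. The hosts are assumptions for a local setup: port 8529 matches the docker example above, and 5432 is the PostgreSQL default, which may differ from your blockchain-etl deployment.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed local endpoints; adjust to match your own deployment.
    targets = [("ArangoDB", "localhost", 8529), ("PostgreSQL", "localhost", 5432)]
    for name, host, port in targets:
        status = "reachable" if is_reachable(host, port) else "NOT reachable"
        print(f"{name} at {host}:{port} is {status}")
```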
- Make a copy of .env.template called .env and include the URLs and credentials to access both databases.
- Build the docker image with:
  docker build -t helium-arango-etl:latest .
- Run the container with:
  docker run -d --name etl helium-arango-etl
- To view logs:
  docker exec etl tail -f logs/etl.log
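A missing or empty entry in .env is an easy mistake to make before building the image. The sketch below parses simple KEY=VALUE lines and reports which required entries are absent. The variable names in `REQUIRED_KEYS` are hypothetical placeholders; the real names are the ones listed in .env.template.

```python
from typing import Dict, List

# Hypothetical variable names; check .env.template for the real ones.
REQUIRED_KEYS = ["ARANGO_URL", "ARANGO_USERNAME", "ARANGO_PASSWORD", "POSTGRES_CONNECTION_STRING"]


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Dict[str, str], required: List[str]) -> List[str]:
    """Return required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


sample = """
# example only
ARANGO_URL=http://localhost:8529
ARANGO_USERNAME=root
ARANGO_PASSWORD=
"""
print(missing_keys(parse_env(sample), REQUIRED_KEYS))
```

To check a real file, read it with `open(".env").read()` and pass the text to `parse_env`.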
- Exploring the Helium Network with Graph Theory: Blog post inspiring much of this work.
- evandiewald/helium-arango-http: an HTTP API to run queries on the data stored in the ArangoDB database populated by this ETL.
- evandiewald/helium-arango-analysis: (coming soon) methods and models for running Graph Theory- and Graph Neural Network-based analyses of the Arango graphs using Python-friendly formats, such as networkx and torch-geometric.
Pull requests are welcome, especially those that add interesting new queries. The focus of this project is to leverage the native graph format of ArangoDB to run analyses that are not already covered by the Blockchain API, such as token flow and coverage mapping. If you are not familiar with ArangoDB, its query language, AQL, allows for powerful extraction of adjacencies in the dataset.
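As a rough illustration of what an adjacency query can look like, the sketch below builds an AQL graph traversal that follows outbound edges from one account. The collection names `accounts` and `payments` and the helper `token_flow_query` are hypothetical and may not match the collections this ETL creates. The function only assembles the query text and bind variables; it does not contact a database.

```python
from typing import Dict, Tuple


def token_flow_query(start_account: str, max_depth: int = 2) -> Tuple[str, Dict[str, str]]:
    """Build an AQL traversal following outbound payment edges."""
    if not isinstance(max_depth, int) or max_depth < 1:
        raise ValueError("max_depth must be a positive integer")
    # 'payments' (edges) and 'accounts' (vertices) are hypothetical names.
    # @start is a value bind parameter; @@edges is a collection bind parameter.
    query = (
        f"FOR v, e IN 1..{max_depth} OUTBOUND @start @@edges\n"
        "  RETURN {account: v._key, source: e._from, target: e._to}"
    )
    bind_vars = {"start": f"accounts/{start_account}", "@edges": "payments"}
    return query, bind_vars


query, bind_vars = token_flow_query("alice", max_depth=3)
print(query)
print(bind_vars)
```

With a driver such as python-arango, the pair could be passed along as `db.aql.execute(query, bind_vars=bind_vars)`, assuming `db` is a connected database handle.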
This project is supported by a grant from the Decentralized Wireless Alliance.