Chain Integration Document

Concepts

Blockchain data extraction is performed by two processes running in conjunction: a Firehose and a Reader. We run an instrumented version of a process (usually a node) that syncs the chain, referred to as the Firehose process. The Firehose process instruments the blockchain and outputs logs over the standard output pipe, which are subsequently read and processed by the Reader process.
The Reader process reads and stitches together the output of the Firehose to create rich blockchain data models, which it then writes to files. The data models in question are Google Protobuf structures.

Data Modeling

Designing the Google Protobuf structures for your given blockchain is one of the most important steps in an integrator's journey. The data structures need to represent the on-chain data and concepts as precisely as possible. By carefully crafting the Protobuf structures, the next steps will be a lot simpler.

As a reference, here is Ethereum's Protobuf Structure: https://github.com/streamingfast/proto-ethereum/blob/develop/sf/ethereum/codec/v1/codec.proto
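To make this concrete, here is a minimal, hypothetical sketch of what a Protobuf block model for our fictional Acme chain could look like. Every message and field name below is illustrative only and not part of any StreamingFast repository; your own model should mirror your chain's actual concepts (headers, transactions, events, and so on).

// Illustrative sketch only; not an actual firehose-acme definition.
syntax = "proto3";

package sf.acme.type.v1;

// Block is the root data model the Reader assembles and writes to files.
message Block {
  uint64 height = 1;                     // block number
  bytes hash = 2;                        // hash of this block
  bytes prev_hash = 3;                   // hash of the parent block
  uint64 timestamp = 4;                  // unix timestamp of block production
  repeated Transaction transactions = 5; // all transactions included in the block
}

// Transaction captures a single on-chain transaction and its outcome.
message Transaction {
  bytes hash = 1;      // transaction identifier
  bytes sender = 2;    // originating account
  bytes receiver = 3;  // destination account
  uint64 amount = 4;   // value transferred, in the chain's smallest unit
  bool success = 5;    // whether execution succeeded
}

The richer and more precise this model is, the less work downstream consumers have to do to interpret the data.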

Running the Demo Chain

We have built an end-to-end template to start the on-boarding process for new chains. This solution consists of:

firehose-acme As mentioned above, the Reader process consumes the data that is extracted and streamed from Deepmind (the instrumented node). In actuality, the Reader is one of multiple processes that make up the Firehose. These processes are launched by one application. This application is chain specific and, by convention, we name it "firehose-<chain-name>". Though the application is chain specific, its structure is standardized and quite similar from chain to chain. For convenience, we have created a boilerplate app to help you get started. We named our demo chain Acme, thus the app is firehose-acme.

Firehose Firehose consists of an instrumented syncing node. We have created a "dummy" chain to simulate a syncing node process; it can be found at https://github.com/streamingfast/dummy-blockchain.

Setting up the dummy chain

Clone the repository:

git clone https://github.com/streamingfast/dummy-blockchain.git
cd dummy-blockchain

Install dependencies:

go mod download

Then build the binary:

make build

Ensure the build was successful:

./dchain --version

Take note of the location of the built dchain binary; you will need it to configure firehose-acme.

Setting up firehose-acme

Clone the repository:

git clone git@github.com:streamingfast/firehose-acme.git
cd firehose-acme

Configure the Firehose test setup:

cd devel/standard/
vi standard.yaml

Modify the flag reader-node-path: "dchain" to point to the path of the dchain binary you compiled above, as sketched below.
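For illustration only, the relevant portion of standard.yaml might look roughly like the snippet below; the surrounding layout is an assumption here, and only the reader-node-path value needs to change:

start:
  flags:
    # Assumed layout: point reader-node-path at the dchain binary built earlier.
    reader-node-path: "/absolute/path/to/dummy-blockchain/dchain"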

Starting and testing Firehose

All subsequent commands are to be run from the devel/standard/ directory.

Start fireacme

./start.sh

This will launch the fireacme application. Behind the scenes it starts three sub-processes: reader-node, relayer, and firehose.

reader-node

The reader-node is a process that runs and manages the blockchain node (in this demo, the instrumented dchain binary). It consumes the blockchain data that is extracted from the instrumented node, which outputs individual block data. The reader-node process will either write each block's data into a separate file called a one-block file, or merge the data of 100 blocks together and write it into a file called a 100-block file.

This behaviour is configurable with the reader-node-merge-and-store-directly flag. When the reader-node process runs with the reader-node-merge-and-store-directly flag enabled, we say the "reader is running in merged mode". When the flag is disabled, we refer to the reader as running in its normal mode of operation.
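As an illustration, and assuming the same configuration layout as the sketch above, merged mode could be toggled from standard.yaml roughly like this:

start:
  flags:
    # Assumed layout: enable merged mode so the reader writes 100-block files directly.
    reader-node-merge-and-store-directly: true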

In the scenario where the reader-node process stores one-block files, we can run a merger process on the side, which merges the one-block files into 100-block files. While syncing the chain, we run the reader-node process in merged mode. Once synced, we run the reader-node in its regular mode of operation (storing one-block files).

The one-block files and 100-block files are stored in data-dir/storage/one-blocks and data-dir/storage/merged-blocks respectively. The naming convention of a file is the number of the first block it contains.

As the instrumented node process outputs blocks, you can see the merged block files appear in the working directory:

ls -las ./fire-data/storage/merged-blocks

We have also built tools that allow you to introspect block files:

go install ../../cmd/fireacme && fireacme tools print blocks --store ./fire-data/storage/merged-blocks 100

At this point we have the reader-node process running, as well as a relayer and a firehose process. These processes work together to provide the Firehose data stream. Once the Firehose process is running, it listens on port 16042. At its core, the Firehose is a gRPC stream. We can list the available gRPC services:

grpcurl -plaintext localhost:16042 list

We can start streaming blocks with the sf.firehose.v1.Stream service:

grpcurl -plaintext -d '{"start_block_num": 10}' localhost:16042 sf.firehose.v1.Stream.Blocks
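To inspect the Stream service definition itself (its methods and the fields of its request message), grpcurl's describe command can be pointed at the same endpoint, since server reflection is already available for the list command above:

grpcurl -plaintext localhost:16042 describe sf.firehose.v1.Stream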

License

Apache 2.0