Skip to content

Commit

Permalink
Data node docs (#492)
Browse files Browse the repository at this point in the history
* WIP data node docs

* Formatting fixes

* Add info about releases

* Tidy up

* Update repo urls

* Move files

* Update docs

---------

Co-authored-by: James Bayly <james@bayly.xyz>
Co-authored-by: James Bayly <bayly.james@gmail.com>
  • Loading branch information
3 people authored Apr 4, 2024
1 parent caf5a21 commit 9b16687
Show file tree
Hide file tree
Showing 5 changed files with 417 additions and 0 deletions.
2 changes: 2 additions & 0 deletions crowdin.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -15,3 +15,5 @@ files:
translation: /docs/%two_letters_code%/run_publish/%original_file_name%
- source: /docs/subquery_network/*.md
translation: /docs/%two_letters_code%/subquery_network/%original_file_name%
- source: /docs/subquery_data_node/*.md
translation: /docs/%two_letters_code%/subquery_data_node/%original_file_name%
10 changes: 10 additions & 0 deletions docs/.vuepress/sidebar.ts
Original file line number Diff line number Diff line change
Expand Up @@ -583,6 +583,16 @@ export const getSidebar = (locale: string) =>
`${locale}/subquery_network/architects/publish.md`,
],
},
{
text: "SubQuery Data Node",
link: `${locale}/subquery_network/data_node/introduction.md`,
collapsible: true,
children: [
`${locale}/subquery_network/data_node/introduction.md`,
`${locale}/subquery_network/data_node/run.md`,
`${locale}/subquery_network/data_node/rpc.md`,
],
},
`${locale}/subquery_network/parameters.md`,
`${locale}/subquery_network/foundation.md`,
`${locale}/subquery_network/community.md`,
Expand Down
53 changes: 53 additions & 0 deletions docs/subquery_network/data_node/introduction.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
# Introduction

The performance limitations of data indexers and many other decentralised applications have long been limited by the RPC endpoint. Developers have long focused on building nodes to ensure efficient validation, ensuring the safety of the network. As a result of this sacrifice, RPCs are extremely costly, and they're not at all optimized for querying.

This is compounded by the rise of L2 chains, where, with higher throughput, the query performance and cost are major outstanding issues to address. With limited querying capabilities, many don’t support complex querying languages or operations, making it inefficient to retrieve specific data.

The **SubQuery Data Node** solves this; it's an enhanced RPC node that is perfectly optimized for querying, especially on endpoints like `eth_getLogs`, and provides the ability to filter transactions in a single API call. The Data Node will be open source, allowing people to contribute, extend, or fork the implementation in any way. It will support leading layer-2s and other EVM networks initially, and they will be optimised to run on the SubQuery Network in a decentralised way.

Working together, the SubQuery Indexer and SubQuery Data Node will provide the most performant indexing performance possible and in a completely decentralised way thanks to the SubQuery Network. Our node runners will be able to run these data nodes on the network, bringing powerful and cost-effective RPC access to all, cementing SubQuery’s position as the fastest decentralised data indexer in web3.

## Networks Supported

SubQuery currently provides these Data Nodes:

- [data-node-go-ethereum](https://github.com/subquery/data-node-go-ethereum) (forked from [ethereum/go-ethereum](https://github.com/ethereum/go-ethereum))
- Ethereum and testnets
- [data-node-op-geth](https://github.com/subquery/data-node-op-geth) (forked from [ethereum-optimism/op-geth](https://github.com/ethereum-optimism/op-geth))
- Optimism and testnets
- Base and testnets

### Releases

We aim to make releases in line with the official releases. These releases include Linux binaries on the releases page for each repo as well as Docker images. If you wish to run these on other platforms please checkout the source code and build for yourself.

## Differences from the original nodes

### Additional Data

In order to efficiently query transactions data, the nodes will now build bloom filters for each block. This bloom data is based on transaction `to`, `from` and the first 4 bytes of `input`. On top of this, there is also a bloombits indexer to increase the lookup performance of the bloom filters. This is effectively the same approach as what is used to filter logs, but applied to transactions.

::: note

To give an indication of the amount of extra data this will store. At [Sepolia Block 4909273](https://sepolia.etherscan.io/block/0x52b5a4e702041b63d8e0b5ab9377bb6372921bd530825991dfc82e23c2af98bf) the ancients data added an additional 1.2Gb of storage for the bloom filters.

:::

### New RPCs

In order to make use of the new transaction data, new RPC methods have also been added. These allow querying transaction data as well as a combination of transaction and logs. [Learn more about new RPCs](./rpc.md).

### Differences while running the Data Node

Running SubQuery data nodes is the same as running the unforked nodes but there is an additional option for exposing the new RPCS. [Read more here](./run.md)

## Roadmap

### Sharded Data Nodes

Later, SubQuery will work to democratise RPCs (and [solve EIP-4444](https://blog.subquery.network/using-the-subquery-sharded-data-node-to-solve-eip-4444/?lng=en)) in the process by delivering the Sharded Data Node, which will make RPCs cheaper to run and operate by node providers. EIP-4444 focuses on the sheer size of the node - an Ethereum archive requires about ~12 TB on Geth. SubQuery believes that in order to drive decentralisation of RPCs, you need to be able to make running these nodes easier and more accessible to everyday users.

SubQuery will extend its Data Node to support sharding, that is making each Data Node smaller by splitting up block ranges between node operators. Since SubQuery's Data Node only runs within the boundaries of a specific block range, it does not need to constantly sync new data, allowing it to optimize further for query performance rather than validation and verification.

Besides the standard RPC interface a normal RPC node and Data Node supports, A Sharded Data Node will also extend the Data Node to add endpoints that expose the block range of data/state it maintains. Significantly, the Sharded Data Node need not encompass historical chain state when functioning as a full node. This article, however, will exclusively discuss the Sharded Data Node in the context of an archive node.
Loading

0 comments on commit 9b16687

Please sign in to comment.