This repo contains info, config, and migrations for our analytics pipeline.
We extract data via Xatu and load it into ClickHouse. The pipeline uses Kafka and Vector, and raw data is also stored in S3 for a short retention period.
```mermaid
graph LR
  subgraph Pipeline
    XatuServer -->|HTTP sink > HTTP source| Vector1[Vector]
    Vector1 -->|Kafka sink| Kafka
    Kafka -->|Kafka source| Vector2[Vector]
    Kafka -->|Kafka source| Vector3[Vector]
  end
  subgraph Data Warehouse
    Vector2 -->|Clickhouse sink| Clickhouse[Clickhouse]
    Clickhouse --> dbt[dbt]
    dbt --> Clickhouse
  end
  subgraph Cloud object storage
    Vector3 -->|s3 sink| S3Events[r2 raw event backup bucket]
    S3Public[r2 Public bucket data.ethpandaops.io]
    S3GCS[GCS bucket]
  end
  subgraph Clients
    XatuSentry[Xatu Sentry] -->|beaconAPI| ConsensusClient[Consensus Client]
    XatuCannon[Xatu Cannon] -->|beaconAPI| ConsensusClient
    XatuSentry -->|gRPC| XatuServer[Xatu Server]
    XatuMimicryEL[Xatu Mimicry EL] -->|gRPC| XatuServer
    XatuMimicryCL[Xatu Mimicry CL] -->|gRPC| XatuServer
    XatuCannon -->|gRPC| XatuServer
    XatuDiscovery[Xatu Discovery] -->|gRPC| XatuServer
  end
  subgraph Parquet Exporter
    dailyexport[Daily export] -->|http| S3Public
    dailyexport -->|http| S3GCS
    dailyexport -->|native| Clickhouse
    hourlyexport[Hourly export] -->|http| S3Public
    hourlyexport -->|http| S3GCS
    hourlyexport -->|native| Clickhouse
    chunkexport[Chunk export] -->|http| S3Public
    chunkexport -->|native| Clickhouse
  end
  subgraph BigQuery
    S3GCS -->|Data transfer service| EUMainnet[EU Mainnet dataset]
    S3GCS -->|Data transfer service| USMainnet[US Mainnet dataset]
    S3GCS -->|Data transfer service| EUHolesky[EU Holesky dataset]
    S3GCS -->|Data transfer service| USHolesky[US Holesky dataset]
    S3GCS -->|Data transfer service| EUSepolia[EU Sepolia dataset]
    S3GCS -->|Data transfer service| USSepolia[US Sepolia dataset]
  end
  subgraph Cryo
    cryocronjob[Cryo extraction jobs] -->|http| Clickhouse
    cryocronjob -->|execution JSON RPC| ExecutionClient[Execution Client]
  end
```
We use Kubernetes for orchestration and Helm for deployment. Setting up Kubernetes, Helm, and other tooling is out of scope for this repo.
vector/http
- Vector config for Xatu server -> Kafka

vector/clickhouse
- Vector config for Kafka -> ClickHouse

vector/s3
- Vector config for Kafka -> S3

clickhouse/migrations
- ClickHouse migrations

clickhouse/helm/clickhouse
- ClickHouse helm chart

clickhouse/helm/zookeeper
- Zookeeper helm chart

kafka
- Kafka config

xatu/helm/server
- Xatu server helm chart

xatu/helm/sentry
- Xatu sentry helm chart

xatu/helm/mimicry
- Xatu mimicry helm chart

xatu/helm/cannon
- Xatu cannon helm chart

xatu/helm/discovery
- Xatu discovery helm chart
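
To show the shape of these configs, the Kafka -> ClickHouse leg (vector/clickhouse) can be sketched roughly as follows. This is a minimal, hypothetical example: the topic, table, host, and consumer-group names are placeholders, not this repo's actual settings.

```yaml
# Hypothetical sketch of a Vector Kafka source -> ClickHouse sink.
sources:
  kafka_in:
    type: kafka
    bootstrap_servers: "kafka:9092"   # placeholder broker address
    group_id: "vector-clickhouse"     # placeholder consumer group
    topics: ["xatu-events"]           # placeholder topic name
    decoding:
      codec: json                     # events arrive as JSON

sinks:
  clickhouse_out:
    type: clickhouse
    inputs: ["kafka_in"]
    endpoint: "http://clickhouse:8123"  # placeholder ClickHouse HTTP endpoint
    database: "default"
    table: "events"                     # placeholder table
    skip_unknown_fields: true           # tolerate fields not in the schema
```

The real configs in vector/ additionally handle routing per event type, batching, and credentials.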
We use the strimzi-kafka-operator for the Kafka cluster.
We use the Altinity clickhouse-operator for the ClickHouse cluster and golang-migrate for migrations. The cluster is distributed with 3 shards and 2 replicas per shard.
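
For reference, a 3-shard / 2-replica topology corresponds to a remote_servers layout like the sketch below. Host names are placeholders; in practice the Altinity operator generates the equivalent config from the ClickHouseInstallation resource rather than it being written by hand.

```xml
<!-- Hypothetical sketch: 3 shards, 2 replicas per shard -->
<remote_servers>
  <cluster>
    <shard>
      <replica><host>chi-shard0-replica0</host><port>9000</port></replica>
      <replica><host>chi-shard0-replica1</host><port>9000</port></replica>
    </shard>
    <shard>
      <replica><host>chi-shard1-replica0</host><port>9000</port></replica>
      <replica><host>chi-shard1-replica1</host><port>9000</port></replica>
    </shard>
    <shard>
      <replica><host>chi-shard2-replica0</host><port>9000</port></replica>
      <replica><host>chi-shard2-replica1</host><port>9000</port></replica>
    </shard>
  </cluster>
</remote_servers>
```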
Example golang-migrate command to run migrations:

```bash
# ! replace username/password/database with your own
migrate -database "clickhouse://127.0.0.1:9000?username=admin&password=XYZ&database=default&x-multi-statement=true&x-cluster-name='{cluster}'&x-migrations-table-engine=ReplicatedMergeTree" -path ./clickhouse/migrations up
```
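
golang-migrate reads paired `NNN_name.up.sql` / `NNN_name.down.sql` files from the `-path` directory. Given the `x-cluster-name` and replicated migrations-table flags above, an up migration would typically use on-cluster DDL with a replicated engine. The table below is a hypothetical illustration, not one of this repo's migrations:

```sql
-- 000001_create_example.up.sql (hypothetical)
CREATE TABLE IF NOT EXISTS example ON CLUSTER '{cluster}'
(
    event_time DateTime,
    payload    String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/example', '{replica}')
ORDER BY event_time;
```

The matching `000001_create_example.down.sql` would `DROP TABLE IF EXISTS example ON CLUSTER '{cluster}'`.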
We use the ethereum-node helm chart to deploy Xatu sentries alongside consensus layer (CL) nodes.
For the execution layer we typically deploy one execution client (Nethermind) per region, then use eleel to share that single execution client across multiple consensus clients as a cost-saving measure.