
# analytics-pipeline

This repo contains the info, config, and migrations for our analytics pipeline.

We currently extract data via Xatu and load it into ClickHouse. We use Kafka and Vector for the pipeline, and also store raw data in S3 for a short period of time.

```mermaid
graph LR
    subgraph Pipeline
        XatuServer -->|HTTP sink > HTTP source| Vector1[Vector]
        Vector1 -->|Kafka sink| Kafka
        Kafka -->|Kafka source| Vector2[Vector]
        Kafka -->|Kafka source| Vector3[Vector]
    end
    subgraph Data Warehouse
        Vector2 -->|Clickhouse sink| Clickhouse[Clickhouse]
        Clickhouse --> dbt[dbt]
        dbt --> Clickhouse
    end
    subgraph Cloud object storage
        Vector3 -->|s3 sink| S3Events[r2 raw event backup bucket]
        S3Public[r2 Public bucket data.ethpandaops.io]
        S3GCS[GCS bucket]
    end
    subgraph Clients
        XatuSentry[Xatu Sentry] -->|beaconAPI| ConsensusClient[Consensus Client]
        XatuCannon[Xatu Cannon] -->|beaconAPI| ConsensusClient
        XatuSentry -->|gRPC| XatuServer[Xatu Server]
        XatuMimicryEL[Xatu Mimicry EL] -->|gRPC| XatuServer
        XatuMimicryCL[Xatu Mimicry CL] -->|gRPC| XatuServer
        XatuCannon -->|gRPC| XatuServer
        XatuDiscovery[Xatu Discovery] -->|gRPC| XatuServer
    end
    subgraph Parquet Exporter
        dailyexport[Daily export] -->|http| S3Public
        dailyexport -->|http| S3GCS
        dailyexport -->|native| Clickhouse
        hourlyexport[Hourly export] -->|http| S3Public
        hourlyexport -->|http| S3GCS
        hourlyexport -->|native| Clickhouse
        chunkexport[Chunk export] -->|http| S3Public
        chunkexport -->|native| Clickhouse
    end
    subgraph BigQuery
        S3GCS -->|Data transfer service| EUMainnet[EU Mainnet dataset]
        S3GCS -->|Data transfer service| USMainnet[US Mainnet dataset]
        S3GCS -->|Data transfer service| EUHolesky[EU Holesky dataset]
        S3GCS -->|Data transfer service| USHolesky[US Holesky dataset]
        S3GCS -->|Data transfer service| EUSepolia[EU Sepolia dataset]
        S3GCS -->|Data transfer service| USSepolia[US Sepolia dataset]
    end
    subgraph Cryo
        cryocronjob[Cryo extraction jobs] -->|http| Clickhouse
        cryocronjob-->|execution JSON RPC| ExecutionClient[Execution Client]
    end
```
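
To make the Kafka-to-ClickHouse leg of the diagram concrete, here is a minimal Vector config sketch. The component names, addresses, topic, and table are assumptions for illustration, not our production values:

```yaml
# Sketch only: component names, addresses, topic, and table are assumptions.
sources:
  xatu_kafka:
    type: kafka
    bootstrap_servers: "kafka:9092"    # assumed in-cluster bootstrap address
    group_id: "vector-clickhouse"
    topics: ["xatu-events"]            # hypothetical topic name

sinks:
  clickhouse:
    type: clickhouse
    inputs: ["xatu_kafka"]
    endpoint: "http://clickhouse:8123" # ClickHouse HTTP interface
    database: "default"
    table: "events_raw"                # hypothetical table name
```

A second Vector instance with the same Kafka source and an `aws_s3` sink covers the raw-event backup path shown in the diagram.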

We use Kubernetes for orchestration and Helm for deployment. Setting up Kubernetes, Helm, and other tooling is out of scope for this repo.

## Directory layout

### Kafka

We use the strimzi-kafka-operator for the Kafka cluster.
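
For reference, a minimal Strimzi `Kafka` resource sketch; the cluster name, replica counts, and storage sizes are assumptions:

```yaml
# Minimal Strimzi Kafka CR sketch - name, replicas, and sizes are assumptions.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: analytics              # hypothetical cluster name
spec:
  kafka:
    replicas: 3                # assumed broker count
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: persistent-claim
      size: 100Gi              # assumed volume size
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
  entityOperator: {}
```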

### ClickHouse

We use the Altinity clickhouse-operator for the ClickHouse cluster and golang-migrate for migrations. The cluster is distributed, with 3 shards and 2 replicas per shard.
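
A sketch of what the 3-shard, 2-replica layout looks like as an Altinity `ClickHouseInstallation`; the installation and cluster names are assumptions:

```yaml
# Sketch of the 3x2 cluster layout - names are assumptions.
apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  name: analytics            # hypothetical installation name
spec:
  configuration:
    clusters:
      - name: cluster        # referenced via the '{cluster}' macro in migrations
        layout:
          shardsCount: 3     # 3 shards
          replicasCount: 2   # 2 replicas per shard
```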

Example golang-migrate command to run migrations:

```bash
# ! replace username/password/database with your own
migrate -database "clickhouse://127.0.0.1:9000?username=admin&password=XYZ&database=default&x-multi-statement=true&x-cluster-name='{cluster}'&x-migrations-table-engine=ReplicatedMergeTree" -path ./clickhouse/migrations up
```

### Xatu sentries

We use the ethereum-node Helm chart to deploy Xatu sentries alongside CL nodes.

For the execution layer we typically deploy one execution client (Nethermind) per region, then use eleel to multiplex multiple consensus layer clients onto that single execution layer client as a cost-saving measure.
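
To make the topology concrete, here is a purely illustrative sketch of one region; the keys below are hypothetical and do not reflect the actual ethereum-node chart values schema:

```yaml
# Purely illustrative region layout - keys are hypothetical, not the chart's schema.
region: eu-west
executionLayer:
  client: nethermind            # one EL client per region
  endpoint: http://eleel:8551   # CL clients reach the EL via the eleel multiplexer
consensusNodes:
  - client: lighthouse          # each CL node runs a Xatu sentry alongside it
    xatuSentry: enabled
  - client: teku
    xatuSentry: enabled
  - client: nimbus
    xatuSentry: enabled
```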
