Analytics/NLP engine for decentralised social networks.
Data on decentralised social networks is sharded and distributed across multiple nodes. Unlike blockchains, there is no single source of truth. Decentralised analytics aims to provide a robust architecture that facilitates analytics for decentralised networks such as Nostr and Farcaster.
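Because there is no single source of truth, the same Nostr event can be served by several relays, so an aggregation step has to reconcile overlapping copies. The following is a minimal sketch of one way to do that, assuming events arrive as dictionaries carrying the standard Nostr `id` and `created_at` fields. The function name `merge_relay_events` and the relay URLs are hypothetical and are not taken from this repo.

```python
from typing import Dict, Iterable, List


def merge_relay_events(batches: Dict[str, Iterable[dict]]) -> List[dict]:
    """Merge event batches from several relays, keeping one copy per event id.

    Each merged event gains a `seen_on` list naming the relays that served it.
    """
    merged: Dict[str, dict] = {}
    for relay_url, events in batches.items():
        for event in events:
            event_id = event["id"]
            if event_id not in merged:
                # First sighting: copy so the caller's dict is not mutated.
                merged[event_id] = {**event, "seen_on": []}
            merged[event_id]["seen_on"].append(relay_url)
    # Sort by creation time, then id, so the output order is deterministic.
    return sorted(merged.values(), key=lambda e: (e["created_at"], e["id"]))


if __name__ == "__main__":
    # Hypothetical sample data: two relays that overlap on event "a1".
    sample = {
        "wss://relay-a.example": [{"id": "a1", "kind": 1, "created_at": 10}],
        "wss://relay-b.example": [
            {"id": "a1", "kind": 1, "created_at": 10},
            {"id": "b2", "kind": 7, "created_at": 5},
        ],
    }
    for event in merge_relay_events(sample):
        print(event["id"], event["seen_on"])
```

Keeping the `seen_on` list rather than discarding duplicates preserves how widely each event has propagated, which may be useful for relay-level analytics.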
- Python Engine
- Prefect for Orchestration
- Apache Kafka for data streaming
- BigQuery for Data Warehousing
- DBT for data modelling
- Looker Studio for dashboards
- Number of active relays distributed on a geographic map
- Real-time dashboard of events of kinds 1, 7 and 30023
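In the Nostr protocol, kind 1 is a short text note, kind 7 is a reaction and kind 30023 is long-form content. As a rough illustration of how a client could ask a relay for only these kinds and tally what comes back, here is a sketch based on the NIP-01 `REQ` message format. The names `DASHBOARD_KINDS`, `build_req` and `count_by_kind` are hypothetical, and this is not necessarily how the producers in this repo subscribe.

```python
import json
from collections import Counter
from typing import Iterable, List

# Kinds shown on the dashboard: 1 = short text note, 7 = reaction,
# 30023 = long-form content.
DASHBOARD_KINDS = [1, 7, 30023]


def build_req(subscription_id: str, kinds: List[int], limit: int = 100) -> str:
    """Serialise a NIP-01 REQ message asking a relay for the given kinds."""
    return json.dumps(["REQ", subscription_id, {"kinds": kinds, "limit": limit}])


def count_by_kind(events: Iterable[dict]) -> Counter:
    """Count events per kind, ignoring kinds the dashboard does not track."""
    return Counter(e["kind"] for e in events if e["kind"] in DASHBOARD_KINDS)


if __name__ == "__main__":
    # "dashboard" is an arbitrary, hypothetical subscription id.
    print(build_req("dashboard", DASHBOARD_KINDS))
    print(count_by_kind([{"kind": 1}, {"kind": 7}, {"kind": 1}, {"kind": 0}]))
```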
- Clone this repo.
- Create free-tier accounts for the following:
- Generate a GCP key that has admin rights to BigQuery and set it as the default credentials.
- Install Terraform and [dbt](https://docs.getdbt.com/docs/core/pip-install).
- Configure `.env.prd` with API keys, etc.
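Before provisioning, it can help to check that the env file has no empty entries. The sketch below assumes `.env.prd` uses plain `KEY=VALUE` lines. The key names `EXAMPLE_API_KEY` and `EXAMPLE_PROJECT_ID` are placeholders only, since the real variable names are defined by this repo and are not listed here.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent or set to an empty string."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    # Placeholder key names: replace with the variables your setup needs.
    required = ["EXAMPLE_API_KEY", "EXAMPLE_PROJECT_ID"]
    env_path = Path(".env.prd")
    if env_path.exists():
        print("Missing:", missing_keys(parse_env_file(env_path.read_text()), required))
    else:
        print(f"{env_path} not found")
```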
cd infrastructure/terraform/ && terraform apply
Run commands:
Nostr Events producer:
docker compose -f docker-compose.yml up produce_events
Nostr Events consumer:
docker compose -f docker-compose.yml up process_events
Nostr Relays producer:
docker compose -f docker-compose.yml up produce_relays
Nostr Relays consumer:
docker compose -f docker-compose.yml up process_relays
Run DBT Transformation for dashboards after producers/consumers are running:
cd transformation/ && dbt run
pdm install
cd tests && pytest
- Installation / deployment needs streamlining.
- Remove dependency on IP Geolocation service. Import data into BigQuery from free resources.
- Prefect Orchestration could be refactored without Click and retain similar functionality.
- More tests
- Better dashboards
- NLP analytics on content
Thanks to @jessthibault, author of python-nostr, from which the Nostr base models were largely taken and modified.