This is a work in progress...
TODO
- AWS EKS + Terraform
- Staging environment with production-like characteristics
flowchart TD
Postgres(Postgres Database) -->|CDC| Kafka(Kafka Strimzi)
SQLServer(SQL Server Database) -->|CDC| Kafka
Kafka -->|Avro Data Stream| ConsumerMinio(MinIO S3)
ConsumerMinio -->|Avro Data Stream| ConsumerSpark(Apache Spark)
ConsumerSpark --> |CDC Replication using Scala Engine - TODO| ConsumerDelta(Delta Lake)
ConsumerSpark --> |Data catalog, lineage| ConsumerDatahub(Datahub)
ConsumerSpark --> HiveMetastore(Hive metastore)
Kafka -->|Schema Management| SchemaRegistry(Confluent Schema Registry)
Kafka --> RedpandaConsole(Redpanda Console)
SchemaRegistry -->|Schema Use - API| ConsumerSpark
ConsumerDelta -->|Data Query| Trino(Trino)
click ConsumerDelta href "https://github.com/rogeriomm/debezium-cdc-replication-delta" "Visit GitHub repository"
Airflow(Apache Airflow) -->|Orchestrate| ConsumerSpark
Trino --> Zeppelin(Zeppelin)
Trino --> Jupyter(Jupyter)
Trino --> Metabase(Metabase)
class Postgres,SQLServer database;
class Kafka,SchemaRegistry kafka;
class ConsumerMinio,ConsumerSpark,ConsumerDelta consumers;
class ConsumerDatahub datahub;
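
The `Postgres -->|CDC| Kafka` edge in the diagram can be illustrated with a minimal sketch of a connector definition. It assumes CDC is done with a Debezium Postgres source connector (as the linked repository name suggests) registered through the Kafka Connect REST API, with Avro serialization through the Confluent Schema Registry. The connector name, host names, credentials, table list, and topic prefix are hypothetical placeholders, not values from this project. With Strimzi the same `config` map is often declared in a `KafkaConnector` custom resource instead of being posted over REST.

```python
import json


def build_postgres_cdc_connector(name, db_host, db_name, schema_registry_url, tables):
    """Build a Kafka Connect REST payload for a Debezium Postgres source.

    All argument values are supplied by the caller; nothing here is taken
    from the real deployment.
    """
    config = {
        # Debezium Postgres source connector class
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        # Logical decoding plugin built into Postgres 10+
        "plugin.name": "pgoutput",
        "database.hostname": db_host,
        "database.port": "5432",
        "database.user": "cdc_user",        # hypothetical account
        "database.password": "change-me",   # hypothetical, use a secret in practice
        "database.dbname": db_name,
        # Prefix for the topics that receive change events (Debezium 2.x naming)
        "topic.prefix": name,
        "table.include.list": ",".join(tables),
        # Avro serialization backed by the Schema Registry
        "key.converter": "io.confluent.connect.avro.AvroConverter",
        "key.converter.schema.registry.url": schema_registry_url,
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": schema_registry_url,
    }
    return {"name": name, "config": config}


# Hypothetical values for illustration only
payload = build_postgres_cdc_connector(
    name="pg",
    db_host="postgres.example.svc",
    db_name="appdb",
    schema_registry_url="http://schema-registry.example.svc:8081",
    tables=["public.customers", "public.orders"],
)

# The JSON below is what would be sent as the body of POST /connectors
print(json.dumps(payload, indent=2))
```

A SQL Server source would follow the same shape with `io.debezium.connector.sqlserver.SqlServerConnector` as the connector class and its own database properties.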
Internet Web (Protected by Firewall)
Public URL | Description
---|---
https://world-zeppelin.duckdns.org | Zeppelin
https://world-jupyter.duckdns.org/jupyter | Jupyter notebook: Python, Scala, Rust