This repository contains the PostgresSync function, a Pulsar function designed to synchronize data from a Debezium source to a PostgreSQL database.
PostgresSync listens to events produced by Debezium, processes the incoming records, and writes the transformed data to a PostgreSQL database. It's designed to be efficient, robust, and scalable.
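As a rough illustration of the "process and write" step, the sketch below turns one Debezium change event into a parameterized SQL statement. It is not the code in this repository. It assumes the standard Debezium envelope for relational sources (`op`, `before`, `after`, `source.table`), a primary key column named `id`, and psycopg2-style `%s` placeholders. The function name `event_to_sql` is hypothetical.

```python
import json


def event_to_sql(raw_event, key_column="id"):
    """Translate one Debezium change event into (sql, params), or None to skip.

    Assumptions (not taken from this repository): the event uses the standard
    Debezium envelope for a relational source, and `key_column` is the primary key.
    """
    event = json.loads(raw_event)
    # With schemas enabled, Debezium nests the change under "payload".
    payload = event.get("payload", event)
    op = payload.get("op")
    table = payload["source"]["table"]

    if op in ("c", "r", "u"):  # create, snapshot read, update
        row = payload["after"]
        columns = list(row)
        col_list = ", ".join(f'"{c}"' for c in columns)
        placeholders = ", ".join(["%s"] * len(columns))
        updates = ", ".join(
            f'"{c}" = EXCLUDED."{c}"' for c in columns if c != key_column
        )
        # An upsert keeps replayed events idempotent.
        conflict = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
        sql = (
            f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders}) '
            f'ON CONFLICT ("{key_column}") {conflict}'
        )
        return sql, [row[c] for c in columns]

    if op == "d":  # delete: only the "before" image carries the key
        row = payload["before"]
        return f'DELETE FROM "{table}" WHERE "{key_column}" = %s', [row[key_column]]

    return None  # unknown or missing op: skip the event


# Illustrative event; the table and column names are made up.
sample = json.dumps({
    "payload": {
        "op": "c",
        "before": None,
        "after": {"id": 1, "name": "a"},
        "source": {"table": "users"},
    }
})
print(event_to_sql(sample))
```

In a Pulsar Python function, a helper like this would typically be called from the `process(self, input, context)` method, with the returned statement passed to a database cursor. Identifiers are quoted naively here. For untrusted table or column names, psycopg2's `sql.Identifier` is the safer choice.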
- Apache Pulsar set up and running
- PostgreSQL database set up and running
- Debezium connector set up with a source (e.g., MySQL or MongoDB)
- Clone the Repository:

      git clone <repository-url>
      cd <repository-directory>
- Install Dependencies: Ensure you have pip installed, then run:

      pip install -r requirements.txt
- Configure Debezium: Ensure your Debezium connector is correctly configured and is publishing events to a Pulsar topic.
- Run the PostgresSync Function:

      ./bin/pulsar-admin functions localrun \
        --classname PostgresSync \
        --py test_postgres_sync.py \
        --inputs <YOUR-DEBEZIUM-TOPIC> \
        --output <YOUR-OUTPUT-TOPIC> \
        --tenant public \
        --namespace default \
        --name PostgresSyncFunction

- Monitor Logs: Monitor the function logs to ensure data is being processed and inserted into PostgreSQL.
- Connection Issues: Ensure PostgreSQL and Debezium are both running and accessible.
- Schema Issues: Make sure the schema of the incoming data matches the expected schema.
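The two checks above can be scripted. The sketch below is a minimal diagnostic aid, not part of this repository: `port_is_open` tests whether a TCP port accepts connections, and `schema_mismatch` compares the keys of an incoming record against the columns you expect. Both function names, and the example columns, are hypothetical.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


def schema_mismatch(record, expected_columns):
    """Return (missing, unexpected) column names, each as a sorted list."""
    actual = set(record)
    expected = set(expected_columns)
    return sorted(expected - actual), sorted(actual - expected)


# Illustrative record with a misspelled column name.
missing, unexpected = schema_mismatch({"id": 1, "nmae": "x"}, ["id", "name"])
print("missing:", missing, "unexpected:", unexpected)
```

For example, `port_is_open("localhost", 5432)` checks PostgreSQL on its default port and `port_is_open("localhost", 6650)` checks a Pulsar broker on its default port, assuming both run locally. Adjust the hosts and ports to match your deployment.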
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.