MySQL to BigQuery CDC is REALLY slow #28967
Unanswered
ravenscroftj
asked this question in
Connector Questions
Replies: 0 comments
Hi there,
I've got a sync config that uses CDC mode in MySQL with binary logs and syncs to BigQuery using Incremental+Deduped.
I'm copying about 10GB of data spread over 25 tables - the biggest of the tables has about 8 million rows in it. The smallest has a few hundred.
The MySQL machine and the VM running the Airbyte instance are both inside the same Google Cloud VPC. The BigQuery bucket and instance are in the same region. I'm using the GCS buffering strategy for importing into BigQuery.
The first time I run a sync, or when I do a reset, it takes about 50 minutes. Subsequent runs take anywhere from 2 to 6 hours to transfer on the order of 50k records (~25MB of data), which is representative of our daily changes to the MySQL dataset.
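To put those numbers side by side, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes decimal units (1 GB = 1000 MB), treats the approximate figures above as exact, and the `throughput` helper is just an illustrative name, not anything from Airbyte.

```python
def throughput(megabytes: float, minutes: float) -> float:
    """Average throughput in kilobytes per second (decimal units)."""
    return megabytes * 1000 / (minutes * 60)


# Figures taken from the description above (all approximate).
initial = throughput(10_000, 50)      # full sync: ~10 GB in ~50 minutes
incr_fast = throughput(25, 2 * 60)    # incremental: ~25 MB in 2 hours
incr_slow = throughput(25, 6 * 60)    # incremental: ~25 MB in 6 hours

# Records per second for the incremental runs (~50k records).
rps_fast = 50_000 / (2 * 60 * 60)
rps_slow = 50_000 / (6 * 60 * 60)

print(f"initial sync:       {initial:8.1f} KB/s")
print(f"incremental (2 h):  {incr_fast:8.2f} KB/s, {rps_fast:.1f} records/s")
print(f"incremental (6 h):  {incr_slow:8.2f} KB/s, {rps_slow:.1f} records/s")
print(f"slowdown: {initial / incr_fast:.0f}x to {initial / incr_slow:.0f}x")
```

In other words, the full sync moves data at roughly 3.3 MB/s, while the incremental runs manage only about 1 to 3.5 KB/s, which is around three orders of magnitude slower per byte transferred.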
I just can't get my head around why it would be so slow. I thought the point of using the binlogs was that we only need to care about the transactions that happened in the period between syncs, so it shouldn't be doing any huge scans of the dataset.
I have two theories:
If anyone can offer advice or suggest some reading material that would help me get my head around this problem, I'd be very grateful.
Thanks in advance