How to Improve Speed and Performance for Large Data Sync #28726
Unanswered
jenisha0512 asked this question in Connector Questions
Replies: 1 comment 1 reply
-
Hey everyone,

I've been using Airbyte for data syncing, and while it's been great for most cases, I've hit a performance wall with large datasets: it currently takes about 1 hour and 15 minutes to sync just 2 GB of data.

To tackle this, I've tried a couple of adjustments. First, I increased MAX_CONCURRENT_STREAM_IN_BUFFER to 5000, hoping it would speed things up. I also decreased the batch size from 25 MB to 5 MB, but neither change made a significant difference. Finally, I tried increasing DEFAULT_FETCH_SIZE, which controls the in-memory buffer size for fetched rows. A sketch of these overrides is below.

I'd love to hear your suggestions for further optimizing the sync process for large datasets. Are there other configurations or settings that could be adjusted to improve performance? Any tips or best practices from those who have dealt with similar challenges?

Looking forward to your insights and solutions!
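For reference, this is roughly what my overrides look like. It's a sketch of a Docker-deployment `.env`; the exact variable names, their defaults, and whether a given connector honors them depend on the Airbyte version and connector, and the values shown are just the ones from my experiments (the DEFAULT_FETCH_SIZE number is an example, not a recommendation):

```
# .env overrides for a Docker-based Airbyte deployment (example values only).
# Whether these are read, and their defaults, vary by Airbyte version/connector.
MAX_CONCURRENT_STREAM_IN_BUFFER=5000   # raised in the hope of more parallelism
DEFAULT_FETCH_SIZE=10000               # rows buffered in memory per fetch (example value)
# The 25 MB -> 5 MB batch-size change was made in the connector's own settings;
# it isn't listed here because that knob's name differs per connector.
```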
-
If you're pulling from a source API, the service provider may impose limits on how many requests you can make and how much data you can retrieve per call, so that can cap throughput regardless of Airbyte settings. If you're syncing database to database, the source connector uses a dynamic batch size that depends on how much memory is available to the connector. One way to increase it is to give more resources to the connector pod/container, as sketched below.
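On Docker or Kubernetes deployments, connector resources can be raised through the worker's JOB_MAIN_CONTAINER_* environment variables. A rough sketch with example values follows; check the resource-configuration docs for your Airbyte version, since the names and their scoping have changed over time:

```
# Worker environment (.env or Helm values) -- example numbers, tune to your host.
JOB_MAIN_CONTAINER_MEMORY_REQUEST=2Gi   # memory the connector container asks for
JOB_MAIN_CONTAINER_MEMORY_LIMIT=4Gi     # hard cap; more memory -> larger dynamic batches
JOB_MAIN_CONTAINER_CPU_REQUEST=1
JOB_MAIN_CONTAINER_CPU_LIMIT=2
```

After changing these, restart the platform so new sync jobs pick up the updated limits.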