How to Improve Speed and Performance for Large Data Sync #28726
Unanswered
jenisha0512 asked this question in Connector Questions
Replies: 1 comment 1 reply
-
Hey everyone,

I've been using Airbyte for data syncing, and while it's been great for most cases, I've hit a performance wall with large datasets: it currently takes about 1 hour and 15 minutes to sync just 2 GB of data.

To tackle this, I've tried a couple of adjustments. First, I increased MAX_CONCURRENT_STREAM_IN_BUFFER to 5000, hoping it would speed things up. I also decreased the batch size from 25 MB to 5 MB, but neither change made a significant difference. Finally, I tried increasing DEFAULT_FETCH_SIZE, which controls the in-memory buffer size for fetched rows. A sketch of these overrides is below.

I'd love to hear your suggestions for further optimizing the sync process for large datasets. Are there other configurations or settings that could be adjusted to improve performance? Any tips or best practices from those who have dealt with similar challenges?

Looking forward to your insights and solutions!
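For reference, this is roughly what my overrides look like. It's a sketch of a Docker-deployment `.env`; the exact variable names, their defaults, and whether a given connector honors them depend on the Airbyte version and connector, and the values shown are just the ones from my experiments (the DEFAULT_FETCH_SIZE number is an example, not a recommendation):

```
# .env overrides for a Docker-based Airbyte deployment (example values only).
# Whether these are read, and their defaults, vary by Airbyte version/connector.
MAX_CONCURRENT_STREAM_IN_BUFFER=5000   # raised in the hope of more parallelism
DEFAULT_FETCH_SIZE=10000               # rows buffered in memory per fetch (example value)
# The 25 MB -> 5 MB batch-size change was made in the connector's own settings;
# it isn't listed here because that knob's name differs per connector.
```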
-
If you're pulling from a source API, the service provider may impose limits on how many requests you can make and how much data you can retrieve per call, so that can cap throughput regardless of Airbyte settings. If you're syncing database to database, the source connector uses a dynamic batch size that depends on how much memory is available to the connector. One way to increase it is to give more resources to the connector pod/container, as sketched below.
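On Docker or Kubernetes deployments, connector resources can be raised through the worker's JOB_MAIN_CONTAINER_* environment variables. A rough sketch with example values follows; check the resource-configuration docs for your Airbyte version, since the names and their scoping have changed over time:

```
# Worker environment (.env or Helm values) -- example numbers, tune to your host.
JOB_MAIN_CONTAINER_MEMORY_REQUEST=2Gi   # memory the connector container asks for
JOB_MAIN_CONTAINER_MEMORY_LIMIT=4Gi     # hard cap; more memory -> larger dynamic batches
JOB_MAIN_CONTAINER_CPU_REQUEST=1
JOB_MAIN_CONTAINER_CPU_LIMIT=2
```

After changing these, restart the platform so new sync jobs pick up the updated limits.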