Redshift inserts taking longer #160

rangan-anand · 2023-05-21T13:01:35Z

Hi All,

I am encountering an issue where we are trying to insert close to 500k records into Redshift per day. The data is coming from a streaming source that has varying levels of data influx, with high activity during certain times and low volume during others.

Currently, we are using the Redshift connector for this task, but we are facing several challenges. The main issue we are encountering is that the connector is only processing 30k records per hour. This is far below our current needs, and we need to find a way to increase the processing speed.

Furthermore, we have observed that when multiple AWS Lambdas are triggered to insert data, a deadlock occurs. This leads to all remaining processes/lambdas failing, resulting in disruption of our workflow.

The cursor uses the settings below to load the data:

cursor.paramstyle = 'named'

cursor.executemany(sql, [values])

Please let me know if the connector has any functionality that can help us achieve this goal.
Thank you for your help.

Answered by Brooke-white

Brooke-white · 2023-05-22T19:39:18Z

Brooke-white
May 22, 2023
Maintainer

Hi @rangan-anand ,

Thank you for reaching out. To summarize, it sounds like there are 2 separate issues you are facing: slow performance when inserting data and deadlocks.

Regarding the first issue, performance is an area of redshift-connector we are looking to improve. We anticipate a minor performance improvement in our next release for customers who are not using bind parameters. It seems you are using bind parameters, but I figured I'd mention it anyway.
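One common way to cut per-row overhead, independent of any connector release, is to send many rows in a single INSERT statement instead of one statement per row. The sketch below is only an illustration of that idea, not something the connector provides: `chunked` and `build_multirow_insert` are hypothetical helpers, the table and column names are made up, and it assumes the `format` paramstyle (`%s` placeholders) rather than the `named` style used in the question.

```python
from typing import Iterator, List, Sequence, Tuple


def chunked(rows: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `rows`, each holding at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_multirow_insert(table: str, columns: Sequence[str],
                          rows: Sequence[Sequence]) -> Tuple[str, List]:
    """Return one INSERT covering all `rows`, plus its flat parameter list.

    `table` and `columns` are interpolated directly into the SQL text, so
    they must come from trusted code, never from user input.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    # One "(%s, %s, ...)" group per row, one placeholder per column.
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([row_placeholder] * len(rows)))
    # Flatten the rows so the values line up with the placeholders in order.
    params = [value for row in rows for value in row]
    return sql, params


# Hypothetical usage with an open connection `conn` and cursor `cursor`:
# for batch in chunked(records, 1000):
#     sql, params = build_multirow_insert("events", ["id", "payload"], batch)
#     cursor.execute(sql, params)
# conn.commit()
```

The batch size of 1000 is an arbitrary starting point. Very large batches can run into limits on statement size or on the number of bind parameters, so it is worth measuring a few sizes against your own data.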

For the second issue, could you share the exception/error messages you are seeing when the deadlock occurs? Are these multiple Lambdas modifying the same table? Until then, I'm going to share some resources about locks and Redshift that may help in narrowing down this issue further:

https://repost.aws/knowledge-center/prevent-locks-blocking-queries-redshift
https://docs.aws.amazon.com/redshift/latest/dg/c_write_readwrite.html#c_write_readwrite-potential-deadlock
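While the root cause is being tracked down, one general mitigation is to retry a failed transaction with backoff so that a single deadlock does not take down every remaining Lambda. The sketch below is not from the linked resources and is not a connector feature: `run_with_retry` and `looks_like_deadlock` are hypothetical helpers, and matching on the word "deadlock" in the error message is an assumption that should be replaced once the actual exception is known.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def looks_like_deadlock(exc: Exception) -> bool:
    # Assumption: the error text mentions "deadlock". Adjust this check to
    # match the real exception type or message seen in your logs.
    return "deadlock" in str(exc).lower()


def run_with_retry(operation: Callable[[], T],
                   is_retryable: Callable[[Exception], bool],
                   max_attempts: int = 5,
                   base_delay: float = 0.5,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation`, retrying retryable failures with jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Give up on the last attempt or on errors we should not retry.
            if attempt == max_attempts or not is_retryable(exc):
                raise
            # Exponential backoff plus random jitter, so concurrent writers
            # do not all retry at the same moment.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))
    raise RuntimeError("unreachable")
```

For this to be safe, `operation` should cover one whole transaction: roll back on failure and rerun every statement, rather than resuming halfway through.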

1 reply

rangan-anand May 24, 2023
Author

Hi @Brooke-white, thanks for explaining. We are currently looking at an approach similar to aws-sdk-pandas, which uses the Redshift COPY command.
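For readers unfamiliar with that pattern, the general shape is: write a batch of records to a file in S3, then have Redshift load the file with a single COPY statement. The sketch below only illustrates that shape under stated assumptions and is not how aws-sdk-pandas is implemented: `rows_to_csv` and `build_copy_sql` are hypothetical helpers, it stages CSV for simplicity, and the bucket, key, table, and IAM role are placeholders.

```python
import csv
import io
from typing import Iterable, Sequence


def rows_to_csv(rows: Iterable[Sequence]) -> str:
    """Serialize rows to CSV text, quoting fields only where needed."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def build_copy_sql(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Build a COPY statement that loads one staged CSV object.

    All three arguments are interpolated directly into the SQL text, so
    they must come from trusted configuration, never from user input.
    """
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' FORMAT AS CSV"
    )


# Hypothetical usage (placeholder bucket, key, and role; needs boto3 and an
# open redshift_connector cursor):
# boto3.client("s3").put_object(
#     Bucket="my-bucket", Key="staging/batch.csv",
#     Body=rows_to_csv(records).encode("utf-8"))
# cursor.execute(build_copy_sql(
#     "events", "s3://my-bucket/staging/batch.csv",
#     "arn:aws:iam::123456789012:role/MyRedshiftRole"))
# conn.commit()
```

Because each batch becomes one COPY rather than thousands of INSERT statements, this pattern tends to suit bursty streaming input: records can be buffered during quiet periods and loaded in larger files.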
