Timeout copy batches that take too long and copy tables in parallel by lutter · Pull Request #5918 · graphprotocol/graph-node

lutter · 2025-03-28T16:50:51Z

Our estimation of batch sizes is generally good and stays within the prescribed bounds, but there are cases where proper estimation of the batch size is nearly impossible since the size of the rows in the table jumps sharply at some point that is hard to predict. This mechanism ensures that if our estimation is wrong, the consequences aren't too severe.

That's what the first three commits do; the rest of this PR changes how we copy so that we can copy the tables for a deployment in parallel. The copying parallelizes opportunistically, i.e., it will always copy at least one table, and more if there are database connections available and the configuration allows copying more than one table.
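To make "opportunistically" concrete, here is a minimal sketch of the scheduling decision only. It is not the code from this PR: `Pool`, `Conn`, `Worker`, and `start_workers` are hypothetical names, and the pool is reduced to a counter of idle connections. The one property it tries to capture is that the first worker reuses the connection the copy already holds, while any additional worker needs an idle connection and must fit under the configured limit.

```rust
/// Hypothetical connection handle; carries no state in this sketch.
struct Conn;

/// Hypothetical pool that only tracks how many connections are idle.
struct Pool {
    idle: usize,
}

impl Pool {
    /// Non-blocking: hand out a connection only if one is idle right now.
    fn try_acquire(&mut self) -> Option<Conn> {
        if self.idle > 0 {
            self.idle -= 1;
            Some(Conn)
        } else {
            None
        }
    }
}

/// One table being copied on one connection.
struct Worker {
    table: String,
    _conn: Conn,
}

/// Start as many workers as the pool and `max_workers` allow, but always
/// at least one as long as a table is pending.
fn start_workers(
    own_conn: Conn,
    pool: &mut Pool,
    pending: &mut Vec<String>,
    max_workers: usize,
) -> Vec<Worker> {
    let mut workers = Vec::new();
    // The copy always makes progress on one table with its own connection.
    if let Some(table) = pending.pop() {
        workers.push(Worker { table, _conn: own_conn });
    }
    // Extra workers are optional: stop as soon as no connection is idle.
    while workers.len() < max_workers && !pending.is_empty() {
        match pool.try_acquire() {
            Some(conn) => {
                let table = pending.pop().expect("checked non-empty above");
                workers.push(Worker { table, _conn: conn });
            }
            None => break,
        }
    }
    workers
}
```

With four pending tables, a limit of three workers, and a single idle connection, this starts two workers and leaves two tables pending for a later round.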

Copilot

Pull Request Overview

This PR implements a mechanism to handle long-running copy batch operations by timing out queries and retrying with a smaller batch size if necessary. Key changes include:

Adding a setter method for batch size in VidBatcher.
Wrapping batch copy transactions in a loop that sets a local statement timeout and resets the batch size on a timeout.
Introducing a new environment variable and error variant (StatementTimeout) to support batch timeout functionality.
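The three changes listed above fit together roughly as in the sketch below. It is an illustration under assumptions rather than the code in copy.rs: `Batcher` stands in for `VidBatcher`, the closure `run_batch` stands in for "open a transaction, set a local statement timeout, copy one batch", and resetting the batch size to 1 after a timeout is assumed here because the summary only says the size is reset.

```rust
use std::time::Duration;

/// Errors a batch copy can produce in this sketch.
#[derive(Debug, PartialEq)]
enum StoreError {
    /// The database cancelled the statement because it ran past the timeout.
    StatementTimeout,
    /// Anything else; never retried here.
    Other(String),
}

/// Hypothetical stand-in for `VidBatcher` that tracks only the batch size.
struct Batcher {
    batch_size: i64,
}

impl Batcher {
    /// Setter comparable in spirit to the one the PR adds.
    fn set_batch_size(&mut self, size: i64) {
        self.batch_size = size.max(1);
    }
}

/// Copy one batch; after a timeout, shrink the batch and try again.
fn copy_batch_with_timeout<F>(
    batcher: &mut Batcher,
    timeout: Duration,
    mut run_batch: F,
) -> Result<i64, StoreError>
where
    F: FnMut(i64, Duration) -> Result<i64, StoreError>,
{
    loop {
        match run_batch(batcher.batch_size, timeout) {
            // Assumption: fall back to the smallest possible batch.
            Err(StoreError::StatementTimeout) if batcher.batch_size > 1 => {
                batcher.set_batch_size(1);
            }
            // Success, a non-timeout error, or a timeout at batch size 1:
            // shrinking further cannot help, so hand the result back.
            other => return other,
        }
    }
}
```

Because a timeout at batch size 1 is returned as an error, this particular loop makes at most two attempts per batch.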

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
store/postgres/src/vid_batcher.rs	Added a new method to adjust the batch size dynamically.
store/postgres/src/copy.rs	Implemented timeout checking for copy batches with retry logic and a timeout.
graph/src/env/store.rs	Updated EnvVarsStore to include batch_timeout with a constraint check.
graph/src/env/mod.rs	Modified from_env to use try_into for proper error handling.
graph/src/components/store/err.rs	Added a new StatementTimeout error variant and refactored error conversion logic.
docs/environment-variables.md	Updated documentation to describe the new GRAPH_STORE_BATCH_TIMEOUT variable.
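The "constraint check" and the move to try_into can be pictured with a small, self-contained sketch. Everything here is hypothetical: the struct names, the idea that GRAPH_STORE_BATCH_TIMEOUT is given in seconds, and the specific rule that the timeout must be at least twice a separate batch target duration are assumptions made for illustration, not details confirmed by the summary above.

```rust
use std::convert::TryFrom;
use std::time::Duration;

/// Raw values as read from the environment (hypothetical, trimmed down).
struct RawStoreVars {
    /// Value of GRAPH_STORE_BATCH_TIMEOUT, assumed to be in seconds.
    batch_timeout_secs: Option<u64>,
    /// Assumed separate setting: how long a batch should ideally take.
    batch_target_secs: u64,
}

/// Validated settings, comparable in role to `EnvVarsStore`.
#[derive(Debug)]
struct StoreVars {
    batch_timeout: Option<Duration>,
    batch_target: Duration,
}

impl TryFrom<RawStoreVars> for StoreVars {
    type Error = String;

    fn try_from(raw: RawStoreVars) -> Result<Self, Self::Error> {
        let batch_target = Duration::from_secs(raw.batch_target_secs);
        let batch_timeout = raw.batch_timeout_secs.map(Duration::from_secs);
        // Assumed constraint: a timeout close to the target duration would
        // cancel batches that are merely a little slow.
        if let Some(timeout) = batch_timeout {
            if timeout < 2 * batch_target {
                return Err(format!(
                    "batch timeout ({}s) must be at least twice the batch target ({}s)",
                    timeout.as_secs(),
                    batch_target.as_secs()
                ));
            }
        }
        Ok(StoreVars { batch_timeout, batch_target })
    }
}
```

Returning a `Result` from the conversion lets the caller report a misconfiguration as an ordinary error at startup instead of panicking, which is the kind of error handling a fallible conversion makes possible.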

Copilot · 2025-03-28T16:51:31Z

store/postgres/src/copy.rs


-                let status = self.transaction(|conn| table.copy_batch(conn))?;
+                let status = {


Consider introducing a maximum retry limit in this loop to avoid potential infinite retries in the event of persistent statement timeouts.
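For reference, a retry limit of the kind suggested here could look like the sketch below. It is a generic pattern, not something taken from this PR: `Attempt`, `retry_bounded`, and the choice to halve the batch size after each timeout are all hypothetical.

```rust
/// Outcome of one attempt in this sketch.
#[derive(Debug, PartialEq)]
enum Attempt {
    /// The batch finished and copied this many rows.
    Done(usize),
    /// The statement was cancelled by the timeout.
    TimedOut,
}

/// Retry `attempt` with a halved batch size after each timeout, giving up
/// once `max_retries` retries have been used.
fn retry_bounded<F>(
    mut batch_size: usize,
    max_retries: u32,
    mut attempt: F,
) -> Result<usize, String>
where
    F: FnMut(usize) -> Attempt,
{
    let mut retries = 0;
    loop {
        match attempt(batch_size) {
            Attempt::Done(rows) => return Ok(rows),
            Attempt::TimedOut if retries < max_retries => {
                retries += 1;
                // Never go below one row per batch.
                batch_size = (batch_size / 2).max(1);
            }
            Attempt::TimedOut => {
                return Err(format!("batch still timed out after {retries} retries"));
            }
        }
    }
}
```

Starting at 800 rows with batches above 100 rows timing out, this tries 800, 400, 200, and then succeeds at 100. A loop that only retries while the batch size can still shrink is bounded as well, so an explicit counter is one option rather than a requirement.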

zorancv · 2025-03-31T13:05:07Z

graph/src/env/store.rs

+    pub batch_timeout: Option<Duration>,
+
+    /// The number of workers to use for batch operations. If there are idle
+    /// connectiosn, each subgraph copy operation will use up to this many


Suggested change

-    /// connectiosn, each subgraph copy operation will use up to this many
+    /// connections, each subgraph copy operation will use up to this many

Fixed the typo

zorancv · 2025-03-31T21:06:01Z

store/postgres/src/copy.rs

+            if let Some(worker) = self.default_worker(&mut state, &progress)? {
+                workers.push(worker);
+            }
+            loop {


Probably a question of taste, but I would move the loop inside the if above. The current way is correct too.

That wouldn't be correct - we can only call self.default_worker when self.conn.is_some(), i.e., once per while loop. The loop { .. } is about trying to get more workers than just the one we always have.

Just noticed that I misread your comment - yes, putting the loop inside the if would also have been possible, but as you said, it's more a matter of taste.

zorancv

Hope it helps the copies.

lutter · 2025-03-31T23:43:27Z

Messed up my git commands; this was merged at b3543bb.

lutter requested review from Copilot and zorancv March 28, 2025 16:50

Copilot AI reviewed Mar 28, 2025

lutter changed the title ~~Timeout copy batches that take too long~~ Timeout copy batches that take too long and copy tables in parallel Mar 29, 2025

lutter force-pushed the lutter/copy-timeout branch from c456c0a to 417ee08 Compare March 29, 2025 22:30

zorancv reviewed Mar 31, 2025

zorancv approved these changes Mar 31, 2025

lutter closed this Mar 31, 2025

lutter force-pushed the lutter/copy-timeout branch from a39eaf1 to b3543bb Compare March 31, 2025 23:42

lutter deleted the lutter/copy-timeout branch April 9, 2025 21:35

lutter wants to merge 0 commits into master from lutter/copy-timeout

2 participants

