`fatxpool`: invalid transaction can stuck on RPC nodes #7783

michalkucharczyk · 2025-03-03T17:38:26Z

Currently the transactions are not executed on the RPC nodes, they are only executed on collators during block building.

In some cases it may happen that transaction will be found invalid during execution. Such transaction will be reported to the transaction pool as invalid, and will be removed from the local pool. However this only can happen on collator meaning that such transaction will never be removed from the pool on RPC node side.

Here are some high-level ideas:

drop transaction that were not included in blocks for some period of time,
re-execute transactions on RPC nodes (some throttling can be applied - e.g. only re-execute transaction that were un-included in pool for some specific time),
new transaction protocol could report invalid transactions.

bkchr · 2025-03-03T20:32:11Z

In some cases it may happen that transaction will be found invalid during execution. Such transaction will be reported to the transaction pool as invalid, and will be removed from the local pool.

This should never happen or there is a bug in the runtime.

Most of the transactions should be mortal and will be removed after some time. If someone is sending immortal transactions, they will stick in the pool until they are removed because there are other transactions with higher priority and the pool being full. Not sure we need any of the proposed solutions.

michalkucharczyk · 2025-03-04T07:15:39Z

This should never happen or there is a bug in the runtime.

If that situation can be caused only by bug in the runtime - I am also not sure anymore if we shall handle this. What you say makes sense, but I am still a bit hesitant: open question is if a transaction pool should be resilient to the buggy runtime (to some extent - because there are some possible behaviour when we can do nothing - e.g. providing invalid tags)?

The problem is also that in this situation user is not receiving any feedback - so it is quite hard to detect that something wrong happened. There is no explicit error about execution failure.

Maybe we should have an eye on number of stuck transactions - if that is very rare event, than we can just keep failed transactions in the pool. Otherwise - if this affect user-experience or txpool performance - then we can re-open the discussion.

michalkucharczyk · 2025-03-04T08:04:13Z

From offline discussion (with @skunert): we need to improve metrics which allow us to detect stuck transactions. report_inavlid count from collator is not enough, because we cannot correlate removed transactions with transaction which were removed during revalidation on rpc node.

So we need an explicit metric which provides some insight. It could be the age of transactions in the the pool, maybe some kind of histogram. We could also try do write some logic that detects potentially stuck transactions - but this requires some thinking about how to do it. The naive approach is to use some fixed period (e.g. 1 hour) - but this can lead to false reports during periods of high loads.

bkchr · 2025-03-04T09:59:37Z

Maybe we should have an eye on number of stuck transactions - if that is very rare event, than we can just keep failed transactions in the pool. Otherwise - if this affect user-experience or txpool performance - then we can re-open the discussion.

If this affects user experience, the runtime is broken and not the pool :) Also these assumptions that normal users will send bare transactions is not really correct. Users are using DApps and they construct the transactions. So, developers in the end should check if the transactions they are constructing are invalid or whatever. They could also dry run them before etc. (which some are doing, but I generally would also not do in production)

From offline discussion (with @skunert): we need to improve metrics which allow us to detect stuck transactions. report_inavlid count from collator is not enough, because we cannot correlate removed transactions with transaction which were removed during revalidation on rpc node.

Did we detect any stuck transactions or did this idea just came up in some discussion?

michalkucharczyk · 2025-03-04T11:01:47Z

Did we detect any stuck transactions or did this idea just came up in some discussion?

Currently we don't have a clear metric to say if anything like this is happening in the field. So it is all speculative - entire idea came up in a discussion.

bkchr · 2025-03-04T23:02:39Z

Basically this can only happen for immortal transactions, all other transactions sooner or later are invalid. However, I could also send some valid immortal transaction with a future nonce and it would stay in the pool forever as well.

michalkucharczyk · 2025-03-05T07:02:02Z

I would say main problem here is NOT transaction being stuck in the pool. As you said it will be evicted under certain conditions.

The main problem is lack of feedback that something is wrong with the transaction. The stuck transaction will be reported as bug, we will spent time debugging and discussing the problem, to finally learn that transaction failed on collator (assuming we are lucky enough to have access to logs and logs being enabled).

michalkucharczyk · 2025-03-05T07:03:18Z

Question - maybe you know this: what is typical transaction sent to chain? Is it mortal? How wallets are configuring it?

I mean if we consider immortal transaction as something not recommended, then it becomes a much smaller problem, as timeout feedback will be provided.

bkchr · 2025-03-05T10:05:53Z

The main problem is lack of feedback that something is wrong with the transaction. The stuck transaction will be reported as bug, we will spent time debugging and discussing the problem, to finally learn that transaction failed on collator (assuming we are lucky enough to have access to logs and logs being enabled).

You can just dry run them to find this out. Which would be the first thing I would do if something is fishy.

I mean if we consider immortal transaction as something not recommended, then it becomes a much smaller problem, as timeout feedback will be provided.

Yes immortal transactions are not recommended in general.

michalkucharczyk added this to fork-aware-txpool Mar 3, 2025

michalkucharczyk moved this to Todo in fork-aware-txpool Mar 3, 2025

michalkucharczyk changed the title ~~Invalid transaction can stuck on RPC nodes~~ fatxpool: invalid transaction can stuck on RPC nodes Mar 3, 2025

michalkucharczyk mentioned this issue Mar 4, 2025

fatxpool: metrics for detecting stuck transactions #7791

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`fatxpool`: invalid transaction can stuck on RPC nodes #7783

`fatxpool`: invalid transaction can stuck on RPC nodes #7783

michalkucharczyk commented Mar 3, 2025

bkchr commented Mar 3, 2025

michalkucharczyk commented Mar 4, 2025

michalkucharczyk commented Mar 4, 2025

bkchr commented Mar 4, 2025

michalkucharczyk commented Mar 4, 2025

bkchr commented Mar 4, 2025

michalkucharczyk commented Mar 5, 2025 •

edited

Loading

michalkucharczyk commented Mar 5, 2025 •

edited

Loading

bkchr commented Mar 5, 2025

fatxpool: invalid transaction can stuck on RPC nodes #7783

fatxpool: invalid transaction can stuck on RPC nodes #7783

Comments

michalkucharczyk commented Mar 3, 2025

bkchr commented Mar 3, 2025

michalkucharczyk commented Mar 4, 2025

michalkucharczyk commented Mar 4, 2025

bkchr commented Mar 4, 2025

michalkucharczyk commented Mar 4, 2025

bkchr commented Mar 4, 2025

michalkucharczyk commented Mar 5, 2025 • edited Loading

michalkucharczyk commented Mar 5, 2025 • edited Loading

bkchr commented Mar 5, 2025

`fatxpool`: invalid transaction can stuck on RPC nodes #7783

`fatxpool`: invalid transaction can stuck on RPC nodes #7783

michalkucharczyk commented Mar 5, 2025 •

edited

Loading

michalkucharczyk commented Mar 5, 2025 •

edited

Loading