Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fatxpool: invalid transaction can stuck on RPC nodes #7783

Open
michalkucharczyk opened this issue Mar 3, 2025 · 9 comments
Open

fatxpool: invalid transaction can stuck on RPC nodes #7783

michalkucharczyk opened this issue Mar 3, 2025 · 9 comments

Comments

@michalkucharczyk
Copy link
Contributor

Currently the transactions are not executed on the RPC nodes, they are only executed on collators during block building.

In some cases it may happen that transaction will be found invalid during execution. Such transaction will be reported to the transaction pool as invalid, and will be removed from the local pool. However this only can happen on collator meaning that such transaction will never be removed from the pool on RPC node side.

Here are some high-level ideas:

  • drop transaction that were not included in blocks for some period of time,
  • re-execute transactions on RPC nodes (some throttling can be applied - e.g. only re-execute transaction that were un-included in pool for some specific time),
  • new transaction protocol could report invalid transactions.
@michalkucharczyk michalkucharczyk changed the title Invalid transaction can stuck on RPC nodes fatxpool: invalid transaction can stuck on RPC nodes Mar 3, 2025
@bkchr
Copy link
Member

bkchr commented Mar 3, 2025

In some cases it may happen that transaction will be found invalid during execution. Such transaction will be reported to the transaction pool as invalid, and will be removed from the local pool.

This should never happen or there is a bug in the runtime.

Most of the transactions should be mortal and will be removed after some time. If someone is sending immortal transactions, they will stick in the pool until they are removed because there are other transactions with higher priority and the pool being full. Not sure we need any of the proposed solutions.

@michalkucharczyk
Copy link
Contributor Author

This should never happen or there is a bug in the runtime.

If that situation can be caused only by bug in the runtime - I am also not sure anymore if we shall handle this. What you say makes sense, but I am still a bit hesitant: open question is if a transaction pool should be resilient to the buggy runtime (to some extent - because there are some possible behaviour when we can do nothing - e.g. providing invalid tags)?

The problem is also that in this situation user is not receiving any feedback - so it is quite hard to detect that something wrong happened. There is no explicit error about execution failure.

Maybe we should have an eye on number of stuck transactions - if that is very rare event, than we can just keep failed transactions in the pool. Otherwise - if this affect user-experience or txpool performance - then we can re-open the discussion.

@michalkucharczyk
Copy link
Contributor Author

From offline discussion (with @skunert): we need to improve metrics which allow us to detect stuck transactions. report_inavlid count from collator is not enough, because we cannot correlate removed transactions with transaction which were removed during revalidation on rpc node.

So we need an explicit metric which provides some insight. It could be the age of transactions in the the pool, maybe some kind of histogram. We could also try do write some logic that detects potentially stuck transactions - but this requires some thinking about how to do it. The naive approach is to use some fixed period (e.g. 1 hour) - but this can lead to false reports during periods of high loads.

@bkchr
Copy link
Member

bkchr commented Mar 4, 2025

Maybe we should have an eye on number of stuck transactions - if that is very rare event, than we can just keep failed transactions in the pool. Otherwise - if this affect user-experience or txpool performance - then we can re-open the discussion.

If this affects user experience, the runtime is broken and not the pool :) Also these assumptions that normal users will send bare transactions is not really correct. Users are using DApps and they construct the transactions. So, developers in the end should check if the transactions they are constructing are invalid or whatever. They could also dry run them before etc. (which some are doing, but I generally would also not do in production)

From offline discussion (with @skunert): we need to improve metrics which allow us to detect stuck transactions. report_inavlid count from collator is not enough, because we cannot correlate removed transactions with transaction which were removed during revalidation on rpc node.

Did we detect any stuck transactions or did this idea just came up in some discussion?

@michalkucharczyk
Copy link
Contributor Author

Did we detect any stuck transactions or did this idea just came up in some discussion?

Currently we don't have a clear metric to say if anything like this is happening in the field. So it is all speculative - entire idea came up in a discussion.

@bkchr
Copy link
Member

bkchr commented Mar 4, 2025

Basically this can only happen for immortal transactions, all other transactions sooner or later are invalid. However, I could also send some valid immortal transaction with a future nonce and it would stay in the pool forever as well.

@michalkucharczyk
Copy link
Contributor Author

michalkucharczyk commented Mar 5, 2025

I would say main problem here is NOT transaction being stuck in the pool. As you said it will be evicted under certain conditions.

The main problem is lack of feedback that something is wrong with the transaction. The stuck transaction will be reported as bug, we will spent time debugging and discussing the problem, to finally learn that transaction failed on collator (assuming we are lucky enough to have access to logs and logs being enabled).

@michalkucharczyk
Copy link
Contributor Author

michalkucharczyk commented Mar 5, 2025

Question - maybe you know this: what is typical transaction sent to chain? Is it mortal? How wallets are configuring it?

I mean if we consider immortal transaction as something not recommended, then it becomes a much smaller problem, as timeout feedback will be provided.

@bkchr
Copy link
Member

bkchr commented Mar 5, 2025

The main problem is lack of feedback that something is wrong with the transaction. The stuck transaction will be reported as bug, we will spent time debugging and discussing the problem, to finally learn that transaction failed on collator (assuming we are lucky enough to have access to logs and logs being enabled).

You can just dry run them to find this out. Which would be the first thing I would do if something is fishy.

I mean if we consider immortal transaction as something not recommended, then it becomes a much smaller problem, as timeout feedback will be provided.

Yes immortal transactions are not recommended in general.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

2 participants