[ottersec] Remove tx from cache when canAddPendingTx fails #228

yzang2019 · 2024-04-16T08:38:12Z

Describe your changes and provide context

When PendingTxs is full, canAddPendingTx fails and the removeHandler does not execute, so the transaction stays in the cache. This prevents an identical transaction from being added again, because resubmitting it fails with a "tx already exists in cache" error.

Testing performed to validate your change

* Perf: Increase buffer size for pubsub server to boost performance (#167)
* Increase buffer size for pubsub server
* Add more timeout for test failure
* Add more timeout
* Fix test split scripts
* Fix test split
* Fix unit test
* Unit test
* Unit test
* [P2P] Optimize block pool requester retry and peer pick up logic (#170)
* P2P Improvements: Fix block sync reactor and block pool retry logic
* Revert "Add event data to result event (#165)" (#176) This reverts commit 72bb29c.
* Fix block sync auto restart not working as expected (#175)
* Fix edge case for blocksync (#178)
* fix evm pending nonce
* fix test
* deflake a test
* de-flake test
* Revert "merge main" This reverts commit 58b9424, reversing changes made to 02d1478.
* consider keep-in-cache logic when removing from cache
* undo test tweaks

---------

Co-authored-by: Yiming Zang <50607998+yzang2019@users.noreply.github.com>
Co-authored-by: Jeremy Wei <jeremy.t.wei@gmail.com>

* debug duplicate evm tx
* add more logs
* add some \ns
* more logs
* fix swap check
* add-lockable-reap-by-gas
* add invariant checks
* fix invariant parenthesis
* fix log
* remove invalid invariant
* fix nonce ordering pain
* handle ordering of insert
* fix remove
* cleanup
* fix imports
* cleanup
* avoid getTransactionByHash(hash) panic due to index
* use Key() to compare instead of pointer

* add heapIndex with safety check
* cleanup
* comment out for perf test
* add back perf improvement
* fix nil test
* Use write-lock in (*TxPriorityQueue).ReapMax funcs (#209)

ReapMaxBytesMaxGas and ReapMaxTxs funcs in TxPriorityQueue claim

> Transactions returned are not removed from the mempool transaction
> store or indexes.

However, they use a priority queue to accomplish the claim

> Transaction are retrieved in priority order.

This is accomplished by popping all items out of the whole heap, and then pushing them back in sequentially. A copy of the heap cannot be obtained otherwise. Both of the mentioned functions use a read-lock (RLock) when doing this. This results in a potential scenario where multiple executions of the ReapMax can be started in parallel, and both would be popping items out of the priority queue.

In practice, this can be abused by executing the `unconfirmed_txs` RPC call repeatedly. Based on our observations, running it multiple times per millisecond results in multiple threads picking it up at the same time. Such a scenario can be obtained via the WebSocket interface, and spamming `unconfirmed_txs` calls there.

The behavior that happens is a `Panic in WSJSONRPC handler` when a queue item unexpectedly disappears for `mempool.(*TxPriorityQueue).Swap`. (`runtime error: index out of range [0] with length 0`)

This can additionally lead to a `CONSENSUS FAILURE!!!` if the race condition occurs for `internal/consensus.(*State).finalizeCommit` when it tries to do `mempool.(*TxPriorityQueue).RemoveTx`, but the ReapMax has already removed all elements from the underlying heap. (`runtime error: index out of range [-1]`)

This commit switches the lock type to a write-lock (Lock) to ensure no parallel modifications take place. This commit additionally updates the tests to allow parallel execution of the func calls in testing, as to prevent regressions (in case someone wants to downgrade the locks without considering the implications from the underlying heap usage).

---------

Co-authored-by: Valters Jansons <sigv@users.noreply.github.com>

* reformat logs to use simple concatenation with separators (#207)
* Use write-lock in (*TxPriorityQueue).ReapMax funcs (#209)
* Fix root dir for tendermint reindex command (#210)
* Replay events during restart to avoid tx missing (#211)

---------

Co-authored-by: Denys S <150304777+dssei@users.noreply.github.com>
Co-authored-by: Valters Jansons <sigv@users.noreply.github.com>
Co-authored-by: Yiming Zang <50607998+yzang2019@users.noreply.github.com>

codchen and others added 28 commits February 27, 2024 22:06

Make ReadMaxTxs atomic (#166)

a81be2f

Support pending transaction in mempool (#169)

e1640c3

fix unconfirmed tx to consider pending txs (#172)

56bd634

fix pending pop (#173)

e475a94

add TTL for pending txs (#174)

20b4b85

Fix bug when popping pending TXs (#188)

c95208e

Add mempool metrics for number of pending tx and expired txs (#189)

8528f3a

* Add metrics for mempool pending transaction size
* Add expired tx count metrics

[EVM] Allow multiple txs from same account in a block (#190)

7ec8f94

* add mempool prioritization with evm nonce
* fix priority stability
* index fixes
* replace with binary search insert
* impl binary search

fix removeTx to push next queued evm tx (#191)

4aa38c0

fix expire metric (#193)

b730c59

[EVM] prevent duplicate txs from getting inserted (#196)

cf41dab

* prevent duplicates in mempool
* use timestamp in priority queue

[EVM] Add logging for expiration (#198)

9f115c7

* add logging for expired txs
* cleanup

[EVM] Avoid returning nil transactions on ForEach (#197)

585bed3

* remove heapIndex to avoid nil scenario
* avoid returning nil in loop (mimic Peek)

call callback from mempool (#200)

80279d4

separate limit for pending tx (#202)

e4415a8

Add EVM txs eviction logic (#204)

c9e2944

Fix debug log (#205)

b4190aa

EVM transaction replacement (#206) (#208)

77ad5bd

Pending Txs Update Condition (#214)

fbc1bc0

Add metrics for mempool size changes (#220)

e250648

Add more tendermint metrics for block processing (#223)

124468c

* Add more tendermint metrics for block processing
* Fix
* Fix metric

Fix metric labeling issue (#226)

dd96411

[EVM] Adjust locking for replacement (#224)

faadb1d

Remove tx from cache when mempool is full

be07d81

yzang2019 requested review from stevenlanders and codchen April 16, 2024 08:39

yzang2019 changed the base branch from main to seiv2 April 16, 2024 08:39

codchen approved these changes Apr 16, 2024

udpatil force-pushed the seiv2 branch from faadb1d to a214d48 April 16, 2024 23:53

Add unit test

554558c

yzang2019 changed the title ~~Remove tx from cache when canAddPendingTx fails~~ [ottersec] Remove tx from cache when canAddPendingTx fails Apr 17, 2024

yzang2019 changed the base branch from seiv2 to main April 17, 2024 07:02

yzang2019 changed the base branch from main to seiv2 April 17, 2024 07:02

Merge branch 'seiv2' into yzang/SEI-7114

d1ea226

yzang2019 closed this Apr 17, 2024

yzang2019 reopened this Apr 17, 2024

yzang2019 closed this Apr 17, 2024
