Closing too many channels simultaneously can kill the node #2702
Try increasing `eclair.db.postgres.pool.size`.
For the record, attempting 100 million payments that probably don't fulfill is not only nonsensical, it is a class of DoS attack on the network that protocol developers, including us, actively work towards preventing.
I believe that's a duplicate of #2610.
The number doesn't seem that nonsensical if you rebalance channels a lot.
Yes, I rebalance a lot and that is how my node reaches these numbers. I believe a simple fix could be to make the function removeChannel() synchronized, so only one channel gets removed at a time.
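For illustration, a minimal sketch of what that suggestion could look like, assuming a hypothetical `ChannelsDb` wrapper around a JDBC connection (the class name and the exact SQL are assumptions, not eclair's actual code):

```scala
import java.sql.Connection

// Hypothetical sketch of the suggestion above, not eclair's actual code:
// serialize channel removals so at most one long-running DELETE holds a
// pooled connection at any given time.
class ChannelsDb(connection: Connection) {
  private val removeLock = new Object

  def removeChannel(channelId: String): Unit = removeLock.synchronized {
    // Concurrent closes queue up on the lock instead of each one grabbing
    // its own pooled connection for a minutes-long DELETE.
    val statement = connection.prepareStatement("DELETE FROM local.htlc_infos WHERE channel_id = ?")
    try {
      statement.setString(1, channelId)
      statement.executeUpdate()
    } finally {
      statement.close()
    }
  }
}
```

This would keep the pool from being drained, at the cost of closes waiting on each other.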
When a channel is closed, we can forget the data from historical HTLCs sent and received through that channel (which is otherwise required to punish cheating attempts by our peer). We previously removed that data from the DB synchronously when the closing transaction confirmed. However, this could create performance issues, as the `htlc_infos` table can be very large for busy nodes and many concurrent writes may be happening at the same time.

We don't need to get rid of this data immediately: we only want to remove it to avoid degrading the performance of active channels that read and write to the `htlc_infos` table. We now mark channels as closed in a dedicated table, and run a background actor that deletes batches of obsolete htlc data at regular intervals. This ensures that the table is eventually cleaned up, without impacting the performance of active channels.

When a splice transaction confirms, all the revoked commitment transactions that only applied to the previous funding transaction cannot be published anymore, because the previous funding output has already been spent. We can thus forget all the historical HTLCs that were included in those commitments, because we will never need to generate the corresponding penalty transactions. This ensures that the growth of our DB is bounded, and that it will shrink every time a splice transaction confirms.

Fixes #2610, #2702 and #2740
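A minimal sketch of that batched-cleanup design, assuming a hypothetical `ClosedChannelsDb` interface and made-up names throughout (eclair's actual implementation differs):

```scala
import akka.actor.typed.Behavior
import akka.actor.typed.scaladsl.Behaviors
import scala.concurrent.duration.FiniteDuration

// Assumed interface: deletes up to `batchSize` htlc_infos rows belonging to
// channels that were marked closed, and returns the number of rows deleted.
trait ClosedChannelsDb {
  def removeBatchOfHtlcInfos(batchSize: Int): Int
}

object HtlcInfoCleaner {
  sealed trait Command
  private case object Tick extends Command

  def apply(db: ClosedChannelsDb, batchSize: Int, interval: FiniteDuration): Behavior[Command] =
    Behaviors.withTimers { timers =>
      // Each tick deletes one bounded batch, so a DB connection is only held
      // briefly and the table is drained gradually in the background.
      timers.startTimerWithFixedDelay(Tick, interval)
      Behaviors.receiveMessage { case Tick =>
        db.removeBatchOfHtlcInfos(batchSize)
        Behaviors.same
      }
    }
}
```

Because each run deletes only a bounded batch, closing many channels at once can no longer monopolize the connection pool.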
Fixed by #2705.
My `local.htlc_infos` table has grown to over 100 million entries. The average channel has around 2 million entries in the table.
Deleting one of the channels from the table can take up to 20 minutes.
This is usually not a problem.
The delete job just runs in the DB, and during this time the node reacts a bit more slowly.
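To see whether a node is in this situation, a query along these lines (assuming eclair's Postgres schema, where the data lives in `local.htlc_infos` keyed by `channel_id`) lists the row counts per channel:

```sql
-- Count htlc_infos rows per channel; channels near the top will take the
-- longest to delete when they are closed.
SELECT channel_id, COUNT(*) AS htlc_rows
FROM local.htlc_infos
GROUP BY channel_id
ORDER BY htlc_rows DESC;
```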
Yesterday I decided to clean out some channels and closed 6 channels in short succession.
All transactions from the 6 closes were mined within 2 minutes, and shortly after that my node entered a restart cycle.
This error would happen:
2023-06-26 22:35:59,593 ERROR - eclair is shutting down because of a fatal database error
java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30002ms.
The node shuts down, restarts, runs for 2 minutes, then shuts down again...
This would happen over and over.
The problem was this:
I had `eclair.db.postgres.pool.size` = 4. When the node started up, it would take a moment to realize that the 6 channel close transactions were already mined, and it then started the deletes.
Those deletes would run for several minutes and block all 4 connections to the DB.
This would then lead to the shutdown of the node.
Because the node was shut down, the delete transactions would not get committed, and the cycle would start over on the next start of the node.
Increasing `eclair.db.postgres.pool.size` to 8 allowed the node to use 6 connections for the deletes, while 2 connections were enough to handle the rest of the DB load.
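For reference, the workaround is a one-line change in `eclair.conf` (8 is the value that worked for this node, not a recommendation for every setup):

```hocon
// was 4 on this node; 6 connections served the deletes, 2 the regular DB load
eclair.db.postgres.pool.size = 8
```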
How to reproduce:
- Grow your `local.htlc_infos` table to the point where deleting one channel takes more than 5 minutes.
- Close twice as many channels as your `eclair.db.postgres.pool.size`.
- Shut down your node and wait for all closing transactions to be mined (this makes sure the deletes all happen at once).
- Start your node (the node will then fill all DB connections with deletes and run into a DB timeout).
Version:
version=0.9.0 commit=623f7e4 with postgres DB