Closing too many channels simultaneously can kill the node #2702

Closed
DerEwige opened this issue Jun 27, 2023 · 5 comments

@DerEwige (Contributor)

My local.htlc_infos table has grown to over 100 million entries.
The average channel has around 2 million entries in the table.
Deleting the entries of one channel from the table can take up to 20 minutes.

This is usually not a problem.
The delete just runs in the DB, and during that time the node only reacts a bit more slowly.

Yesterday I decided to clean out some channels and closed 6 of them in short succession.
All transactions from the 6 closes were mined within 2 minutes, and shortly after that my node entered a restart loop.

This error kept appearing:
2023-06-26 22:35:59,593 ERROR - eclair is shutting down because of a fatal database error java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30002ms.
The node would shut down, restart, run for 2 minutes, then shut down again, over and over.

The problem was the following:

I had eclair.db.postgres.pool.size=4.
When the node started up, it took a moment to realize that the 6 channel closing transactions had already been mined, and then it started the deletes.
Those deletes would run for several minutes and block all 4 connections to the DB.
This would then lead to the shutdown of the node.
Because the node was shut down, the delete transactions were never committed, and the cycle would start over on the next start of the node.

Increasing eclair.db.postgres.pool.size to 8 allowed the node to use 6 connections for the deletes, while the remaining 2 connections were enough to handle the rest of the DB load.
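
For reference, this is roughly how that workaround could look in eclair.conf (HOCON syntax assumed; the value 8 is just the number that happened to work here, not a recommended default):

```hocon
eclair.db.postgres.pool {
  // 6 connections for the concurrent channel deletes,
  // 2 left over for the regular DB load (workaround described above)
  size = 8
}
```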

How to reproduce:
- Grow your local.htlc_infos table to the point where deleting one channel takes more than 5 minutes.
- Close twice as many channels as your eclair.db.postgres.pool.size.
- Shut down your node and wait for all closing transactions to be mined (this makes sure the deletes all start at once).
- Start your node (it will then fill up the DB connections with deletes and run into a DB timeout).

Version:
version=0.9.0 commit=623f7e4 with postgres DB

@thomash-acinq changed the title from "Closing to many channels simultaneously can kill the node" to "Closing too many channels simultaneously can kill the node" on Jun 27, 2023
@pm47 (Member) commented Jun 27, 2023

Try increasing eclair.db.postgres.pool.connection-timeout; it defaults to 30s.
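
For illustration, a minimal sketch of that override in eclair.conf (the 10-minute value is an arbitrary example, not a recommendation from this thread):

```hocon
eclair.db.postgres.pool {
  // let requests wait longer for a free connection while the channel
  // deletes hold the pool, before HikariCP times out (default: 30 seconds)
  connection-timeout = 10 minutes
}
```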

For the record, attempting 100 million payments that probably won't be fulfilled is not only nonsensical, it is a class of DoS attack on the network that protocol developers, including us, actively work towards preventing.

@rorp (Contributor) commented Jun 27, 2023

I believe that's a duplicate of #2610

@rorp (Contributor) commented Jun 27, 2023

> For the record, attempting 100 million payments that probably won't be fulfilled is not only nonsensical, it is a class of DoS attack on the network that protocol developers, including us, actively work towards preventing.

The number doesn't seem that nonsensical if you rebalance channels a lot.

@DerEwige (Contributor, Author)

Yes, I rebalance a lot, and that is how my node reached these numbers.
But a big zero-fee routing node would have similar numbers.

I believe a simple fix could be to make the function removeChannel() synchronized, so that only one channel gets removed at a time.
There might be other SQL statements that run for a long time, but so far that is the only one that has given me problems.
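
As a rough sketch of that suggestion (not eclair's actual code), channel removals could be funneled through a single-threaded executor so that at most one long-running delete occupies a DB connection at a time; `removeChannel` below is a stand-in for the real DB call:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

object SerializedChannelCleanup {
  // One dedicated thread: deletes queue up here instead of grabbing
  // several pool connections at once.
  private val singleThreadEc: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor())

  // Wrap the (hypothetical) removeChannel call so invocations run one
  // after another, regardless of how many closes confirm at once.
  def removeChannelSerialized(channelId: String)(removeChannel: String => Unit): Future[Unit] =
    Future(removeChannel(channelId))(singleThreadEc)
}
```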

t-bast added a commit that referenced this issue Jul 4, 2023
When a channel is closed, we can forget the data from historical HTLCs
sent and received through that channel (which is otherwise required to
punish cheating attempts by our peer).

We previously synchronously removed that data from the DB when the closing
transaction confirmed. However, this could create performance issues as
the `htlc_infos` table can be very large for busy nodes and many concurrent
writes may be happening at the same time.

We don't need to get rid of this data immediately: we only want to remove
it to avoid degrading the performance of active channels that read and
write to the `htlc_infos` table. We now mark channels as closed in a
dedicated table, and run a background actor that deletes batches of
obsolete htlc data at regular intervals. This ensures that the table is
eventually cleaned up, without impacting the performance of active
channels.

Fixes #2610 and #2702
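
A hedged sketch of the approach described above, assuming Akka typed actors (this is an illustration, not eclair's actual implementation); `deleteBatch` is a hypothetical callback that removes up to `batchSize` obsolete rows for closed channels and returns how many it deleted:

```scala
import akka.actor.typed.Behavior
import akka.actor.typed.scaladsl.Behaviors
import scala.concurrent.duration.DurationInt

object HtlcInfoCleaner {
  sealed trait Command
  private case object Tick extends Command

  def apply(deleteBatch: Int => Int, batchSize: Int = 50000): Behavior[Command] =
    Behaviors.setup { context =>
      Behaviors.withTimers { timers =>
        // wake up at regular intervals instead of deleting everything
        // synchronously when the closing transaction confirms
        timers.startTimerAtFixedRate(Tick, 5.minutes)
        Behaviors.receiveMessage { case Tick =>
          // delete a bounded batch so no single query holds a DB
          // connection for minutes
          val deleted = deleteBatch(batchSize)
          if (deleted > 0) context.log.info(s"removed $deleted obsolete htlc_infos rows")
          Behaviors.same
        }
      }
    }
}
```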

t-bast added further commits with the same message that referenced this issue on Jul 4, Sep 21, Sep 28, Oct 4, Nov 10, and Nov 23, 2023, and on Jan 18, 2024.
t-bast added a commit that referenced this issue Feb 14, 2024
When a channel is closed, we can forget the data from historical HTLCs
sent and received through that channel (which is otherwise required to
punish cheating attempts by our peer).

We previously synchronously removed that data from the DB when the closing
transaction confirmed. However, this could create performance issues as
the `htlc_infos` table can be very large for busy nodes and many concurrent
writes may be happening at the same time.

We don't need to get rid of this data immediately: we only want to remove
it to avoid degrading the performance of active channels that read and
write to the `htlc_infos` table. We now mark channels as closed in a
dedicated table, and run a background actor that deletes batches of
obsolete htlc data at regular intervals. This ensures that the table is
eventually cleaned up, without impacting the performance of active
channels.

When a splice transaction confirms, all the revoked commitment transactions
that only applied to the previous funding transaction cannot be published
anymore, because the previous funding output has already been spent.

We can thus forget all the historical HTLCs that were included in those
commitments, because we will never need to generate the corresponding
penalty transactions.

This ensures that the growth of our DB is bounded, and will shrink every
time a splice transaction is confirmed.

Fixes #2610, #2702 and #2740
@t-bast (Member) commented Feb 14, 2024

Fixed by #2705

@t-bast closed this as completed on Feb 14, 2024