lightning-rpc file isn't created #8035
---

Strange thing is that when I checked again today: when I deleted it and restarted the container, it doesn't recreate it.
---

I think tracking this issue down might be hard for you. If some of you have a commit I can cherry-pick that adds printf logs left and right to narrow down the issue, I can build and tweak it myself to help find it.
---

This implies gossipd isn't starting up properly: we expect to see "io_break: gossipd_init_done". The next line, in fact, should be "DEBUG gossipd: Store compact time: NNN msec". Is it stuck reading the gossip store somehow? Is it large and/or strange?
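A quick way to answer the "is it large and/or strange?" question is to inspect the store file directly. This is a minimal Python sketch, not an official tool; the default store path and the version-byte packing (major version in the top 3 bits, minor in the low 5, as in recent CLN releases) are assumptions on my part:

```python
#!/usr/bin/env python3
"""Sanity-check a gossip_store: report its size and version byte."""
import os

# Assumed default location; adjust for your lightning-dir and network.
STORE = os.path.expanduser("~/.lightning/bitcoin/gossip_store")

size = os.path.getsize(STORE)
print(f"gossip_store: {size:,} bytes ({size / 1e9:.2f} GB)")

with open(STORE, "rb") as f:
    version = f.read(1)[0]
# Assumed packing: major version in the top 3 bits, minor in the low 5.
print(f"version byte {version}: major {version >> 5}, minor {version & 0x1f}")
```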
---

Thanks for looking at it @rustyrussell. So here are the logs that showed up after 20 min or so; it actually seems CLN restarts continuously. Additional context: we run on pruned nodes. Also, the … logs are coming from our docker container, so you can dismiss those. However, the fact we see them means that the …
---

May be related to #7724
---

Would be awesome to get this unblocked, since the BitcoinSmiles initiative is eagerly waiting to start a new campaign, but their node has been down for quite a while and blocked by this (they haven't been able to receive new donations for weeks now), and it would really be bad to push them into using a custodial solution or an alternative implementation, since CLN has worked great for them since 2021, plus there are still funds and channels there. Thank you guys!
---

OK, I've got it. It's actually related to this: …

We stall there and don't make more progress, so gossipd itself doesn't see the entire gossip_store. Then things get really batshit: …

This took 1429 seconds to process. Why? Because it hasn't been processing the gossip store fully, gossipd kept adding "new" records to the end: …

It has 31GB of gossip in there! No wonder it took so long, and no wonder the topology plugin (which tried to digest that crap) timed out.
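To see where a reader would get stuck in a store like this, a rough record walker can help. This is a sketch, not CLN's actual parser: the 12-byte big-endian record header (u16 flags, u16 len, u32 crc, u32 timestamp) and the 0x8000 deleted flag are my assumptions about recent store versions:

```python
#!/usr/bin/env python3
"""Walk gossip_store records; count message types and report truncation."""
import os
import struct
from collections import Counter

STORE = os.path.expanduser("~/.lightning/bitcoin/gossip_store")  # adjust as needed
DELETED = 0x8000  # assumed flag bit marking deleted records

counts, deleted = Counter(), 0
with open(STORE, "rb") as f:
    f.read(1)  # skip the version byte
    while True:
        hdr = f.read(12)
        if len(hdr) < 12:
            break  # clean EOF, or a truncated header
        flags, msglen, _crc, _ts = struct.unpack(">HHII", hdr)
        msg = f.read(msglen)
        if len(msg) < msglen:
            # A reader that trusts `len` would stall right here.
            print(f"truncated record at offset {f.tell() - len(msg) - 12}")
            break
        if flags & DELETED:
            deleted += 1
            continue
        counts[struct.unpack(">H", msg[:2])[0]] += 1

print(f"deleted records: {deleted}")
for msgtype, n in counts.most_common():
    # 256 = channel_announcement, 257 = node_announcement, 258 = channel_update
    print(f"type {msgtype}: {n}")
```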
---

Hmm, can you send me the first 2MB of the gossip_store file, BTW? And your node id? You should delete the gossip store once this is done, and this problem should resolve itself (for now: I'm still interested in HOW we got here...)
---

Indeed, the gossip store is reaching 30GB. @rustyrussell here is the first 2MB: https://aois.blob.core.windows.net/public/gossip_store_truncated
---

Deleted the 30GB gossip_store. Restarted; now it is showing this, and lightning-rpc is still not available: …

Those logs keep showing, as if stuck in an infinite loop.
---

So I tried to run … I reviewed the code and could not really find anything weird; you use … Something seems wrong in the call of …

EDIT: I believe it is because you always ask the block to …

After: …

My …
---

While this shouldn't happen, it does (pending other fixes), and we stop reading the gossip store until next time. The result is partial gossip, demonstrated beautifully by NicolasDorier's report:

```
lightning_gossipd: gossmap: redundant channel_announce for 864063x1306x1, offsets 1272259 and 1784859!
```

Gossipd stalls there and doesn't make more progress, so gossipd itself doesn't see the entire gossip_store. Then things get really batshit:

```
2025-02-04T05:53:28.582Z DEBUG gossipd: Store compact time: 1429910 msec
```

This took 1429 seconds to process. Why? Because it hasn't been processing the gossip store fully, gossipd kept adding "new" records to the end:

```
2025-02-04T05:53:28.583Z DEBUG gossipd: gossip_store: Read 62716143/1739952/5158256/0 cannounce/cupdate/nannounce/delete from store in 31634458462 bytes, now 31634458440 bytes (populated=true)
```

It has 31GB of gossip in there! No wonder it took so long...

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: ElementsProject#8035
Changelog-Fixed: gossipd: corruption in the gossip_store could cause ever-longer startup times and no gossip updates.
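Extending the walker sketched earlier, the gossmap "redundant channel_announce" complaint can be reproduced offline by keying announcements on their short_channel_id. The store framing is assumed as before; the channel_announcement layout (four 64-byte signatures, a length-prefixed features field, a 32-byte chain_hash, then the 8-byte short_channel_id) is per BOLT 7:

```python
#!/usr/bin/env python3
"""Report duplicate channel_announcements in a gossip_store by scid."""
import os
import struct

STORE = os.path.expanduser("~/.lightning/bitcoin/gossip_store")
CHANNEL_ANNOUNCEMENT = 256
first_offset = {}  # short_channel_id -> offset of first announcement

with open(STORE, "rb") as f:
    f.read(1)  # version byte
    while True:
        off = f.tell()
        hdr = f.read(12)
        if len(hdr) < 12:
            break
        flags, msglen, _crc, _ts = struct.unpack(">HHII", hdr)
        msg = f.read(msglen)
        if len(msg) < msglen:
            break
        if flags & 0x8000:  # assumed deleted flag
            continue
        if struct.unpack(">H", msg[:2])[0] != CHANNEL_ANNOUNCEMENT:
            continue
        feat_len = struct.unpack(">H", msg[258:260])[0]  # after type + 4 sigs
        scid_off = 2 + 4 * 64 + 2 + feat_len + 32        # skip features + chain_hash
        scid, = struct.unpack(">Q", msg[scid_off:scid_off + 8])
        key = f"{scid >> 40}x{(scid >> 16) & 0xFFFFFF}x{scid & 0xFFFF}"
        if key in first_offset:
            print(f"redundant channel_announce for {key}, "
                  f"offsets {first_offset[key]} and {off}")
        else:
            first_offset[key] = off
```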
---

I'm not sure if this is the same problem, but the lightning node on my BTCPay server crashes periodically. Here's the log from the latest crash: …

And here's the last set of lines from the crash log file: …
---

I think I may have been bitten by this bug too. One of my channel peers informed me that my node had been hammering out payment attempts that were all failing locally on their node due to insufficient fees. They kindly requested that I investigate, as my node was effectively spamming their logs. As it turned out, the payment attempts were due to my Sling plugin attempting to rebalance some of my channels through their node. They pointed out, quite correctly I believe, that it appeared as though my node was constructing routes using severely outdated gossip. My …
---

Good to see this marked as fixed. I'm looking forward to the next release. Is it worth deleting gossip_store in the meantime? My node is still unstable.
---

It seems that several BTCPay Server users are experiencing an issue with core-lightning 24.08.2. It is hard to say when it started, but I now have this issue on two servers. Core Lightning seems to be running fine, but the `lightning-rpc` socket isn't created.

Config: …

Logs: …

Note that after `DEBUG hsmd: new_client: 0`, nothing happens. No more logs ever appear, even with `log-level=debug`.
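For what it's worth, the symptom can be probed directly: lightningd speaks plain JSON-RPC over that unix socket, so a getinfo round-trip distinguishes "socket missing" from "socket present but dead". A small sketch; the path below is the stock default and would differ on a BTCPay or docker deployment:

```python
#!/usr/bin/env python3
"""Probe lightning-rpc: fail fast if missing, else try a getinfo call."""
import json
import os
import socket

# Assumed default path; BTCPay and docker setups mount it elsewhere.
RPC_PATH = os.path.expanduser("~/.lightning/bitcoin/lightning-rpc")

if not os.path.exists(RPC_PATH):
    raise SystemExit(f"{RPC_PATH} does not exist -- lightningd never created it")

sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.settimeout(10)
sock.connect(RPC_PATH)
sock.sendall(json.dumps({"jsonrpc": "2.0", "id": 1,
                         "method": "getinfo", "params": {}}).encode())

# Accumulate until the reply parses as a complete JSON object.
buf = b""
while True:
    chunk = sock.recv(4096)
    if not chunk:
        raise SystemExit("socket closed without a reply")
    buf += chunk
    try:
        reply = json.loads(buf)
        break
    except json.JSONDecodeError:
        continue
print(reply.get("result", reply))
```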