CLN crash and gossip store corruption #7971

Open
MegalithicBTC opened this issue Jan 2, 2025 · 4 comments

MegalithicBTC commented Jan 2, 2025

After running this CLN image without (apparent) problems for about two weeks:

from elementsproject/lightningd:v24.11

CLN suddenly crashed with:

lightning_connectd: common/gossmap.c:121: map_copy: Assertion `offset + len <= map->map_size' failed.
lightning_gossipd: common/gossmap.c:121: map_copy: Assertion `offset + len <= map->map_size' failed.
lightning_connectd: FATAL SIGNAL 6 (version v24.11)
lightning_gossipd: FATAL SIGNAL 6 (version v24.11)
0x581daff576d8 send_backtrace
	common/daemon.c:33

A more complete log, showing the crash and the creation of gossip_store.corrupt, is attached:

cln-gossip-store-corrupt.txt
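
For readers unfamiliar with the failing check: the assertion text, together with the `map_be16` and `map_copy` frames in the backtrace posted in the next comment, reads like a bounds check on reads from a mapped copy of the gossip store. Below is a minimal Python sketch of that kind of check. It is not CLN's code: the `MappedStore` class and its sample bytes are hypothetical, and only the names `map_copy`, `map_be16` and `map_size` are borrowed from the log.

```python
import struct


class MappedStore:
    """Hypothetical stand-in for a mapped file: a byte buffer plus the
    size that was recorded when the mapping was set up."""

    def __init__(self, data: bytes, map_size: int):
        self.data = data
        self.map_size = map_size

    def map_copy(self, offset: int, length: int) -> bytes:
        # Same shape as the failed assertion: the read must end inside the map.
        assert offset + length <= self.map_size, "read past end of map"
        return self.data[offset:offset + length]

    def map_be16(self, offset: int) -> int:
        # Read a big-endian 16-bit field through the bounds-checked copy.
        return struct.unpack(">H", self.map_copy(offset, 2))[0]


store = MappedStore(bytes([0x00, 0x2A, 0x01, 0x00]), map_size=4)
print(store.map_be16(0))  # both bytes are inside the map
try:
    store.map_be16(3)     # needs offsets 3 and 4, but only 0..3 are mapped
except AssertionError as err:
    print("assertion failed:", err)
```

A check of this shape fires whenever a two-byte field is requested at an offset that runs past the recorded size. Why that happened on this node is not established in this thread.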


JssDWt commented Jan 2, 2025

**BROKEN** connectd: STATUS_FAIL_INTERNAL_ERROR: FATAL SIGNAL
0x6260c41d8b69 map_copy
  common/gossmap.c:121
0x6260c41d8bab map_be16
  common/gossmap.c:142
0x6260c41daa45 map_catchup
  common/gossmap.c:705
0x6260c41dab95 gossmap_refresh_mayfail
  common/gossmap.c:1192
0x6260c41daca6 gossmap_refresh
  common/gossmap.c:1213
0x6260c41cee32 gossmap_manage_get_gossmap
  gossipd/gossmap_manage.c:1314
0x6260c41d0686 gossmap_manage_new_block
  gossipd/gossmap_manage.c:1221
0x6260c41cbfdd new_blockheight
  gossipd/gossipd.c:473
0x6260c41cc363 recv_req
  gossipd/gossipd.c:584
0x6260c41d6b1d handle_read
  common/daemon_conn.c:35
0x6260c43175b5 next_plan
  ccan/ccan/io/io.c:60
0x6260c4317a40 do_plan
  ccan/ccan/io/io.c:422
0x6260c4317af9 io_ready
  ccan/ccan/io/io.c:439
0x6260c4319446 io_loop
  ccan/ccan/io/poll.c:455
0x6260c41cccf4 main
  gossipd/gossipd.c:665
0x70508e88bd79 ???
  ???:0
0x6260c41c9d99 ???
  ???:0
0xffffffffffffffff ???
  ???:0
lightning_gossipd: FATAL SIGNAL (version v24.11)
0x6260c41d682a send_backtrace
  common/daemon.c:33
0x6260c41e098b status_failed
  common/status.c:221
0x6260c41e0b41 status_backtrace_exit
  common/subdaemon.c:18
0x6260c41d68b8 crashdump
  common/daemon.c:78
0x70508ea6913f ???
  ???:0
0x70508e8a0d51 ???
  ???:0
0x70508e88a536 ???
  ???:0
0x70508e88a40e ???
  ???:0
0x70508e8996d1 ???
  ???:0
0x6260c41d8b69 map_copy
  common/gossmap.c:121
0x6260c41d8bab map_be16
  common/gossmap.c:142
0x6260c41daa45 map_catchup
  common/gossmap.c:705
0x6260c41dab95 gossmap_refresh_mayfail
  common/gossmap.c:1192
0x6260c41daca6 gossmap_refresh
  common/gossmap.c:1213
0x6260c41cee32 gossmap_manage_get_gossmap
  gossipd/gossmap_manage.c:1314
0x6260c41d0686 gossmap_manage_new_block
  gossipd/gossmap_manage.c:1221
0x6260c41cbfdd new_blockheight
  gossipd/gossipd.c:473
0x6260c41cc363 recv_req
  gossipd/gossipd.c:584
0x6260c41d6b1d handle_read
  common/daemon_conn.c:35
0x6260c43175b5 next_plan
  ccan/ccan/io/io.c:60
0x6260c4317a40 do_plan
  ccan/ccan/io/io.c:422
0x6260c4317af9 io_ready
  ccan/ccan/io/io.c:439
0x6260c4319446 io_loop
  ccan/ccan/io/poll.c:455
0x6260c41cccf4 main
  gossipd/gossipd.c:665
0x70508e88bd79 ???
  ???:0
0x6260c41c9d99 ???
  ???:0
0xffffffffffffffff ???
  ???:0
lightningd: connectd failed (exit status 242), exiting.

@MegalithicBTC
Author

I think it happened again:

lightning_gossipd: gossip_store: get delete entry offset 3771339/105791801 (version v24.11)
0x577413b1182a send_backtrace
  common/daemon.c:33
0x577413b1b98b status_failed
  common/status.c:221
0x577413b08b8b gossip_store_get_with_hdr
  gossipd/gossip_store.c:466
0x577413b0911c gossip_store_set_timestamp
  gossipd/gossip_store.c:592
0x577413b0a7b4 process_channel_update
  gossipd/gossmap_manage.c:777
0x577413b0b10b gossmap_manage_channel_update
  gossipd/gossmap_manage.c:905
0x577413b07942 handle_recv_gossip
  gossipd/gossipd.c:210
0x577413b07a48 connectd_req
  gossipd/gossipd.c:302
0x577413b11b1d handle_read
  common/daemon_conn.c:35
0x577413c525b5 next_plan
  ccan/ccan/io/io.c:60
0x577413c52a40 do_plan
  ccan/ccan/io/io.c:422
0x577413c52af9 io_ready
  ccan/ccan/io/io.c:439
0x577413c54446 io_loop
  ccan/ccan/io/poll.c:455
0x577413b07cf4 main
  gossipd/gossipd.c:665
0x731bcb65dd79 ???
  ???:0
0x577413b04d99 ???
  ???:0
0xffffffffffffffff ???
  ???:0
2025-01-25T15:26:19.784Z **BROKEN** gossipd: gossip_store: get delete entry offset 3771339/105791801 (version v24.11)
2025-01-25T15:26:19.784Z **BROKEN** gossipd: backtrace: common/daemon.c:38 (send_backtrace) 0x577413b11872
2025-01-25T15:26:19.784Z **BROKEN** gossipd: backtrace: common/status.c:221 (status_failed) 0x577413b1b98b
2025-01-25T15:26:19.784Z **BROKEN** gossipd: backtrace: gossipd/gossip_store.c:466 (gossip_store_get_with_hdr) 0x577413b08b8b
2025-01-25T15:26:19.784Z **BROKEN** gossipd: backtrace: gossipd/gossip_store.c:592 (gossip_store_set_timestamp) 0x577413b0911c
2025-01-25T15:26:19.785Z **BROKEN** gossipd: backtrace: gossipd/gossmap_manage.c:777 (process_channel_update) 0x577413b0a7b4
2025-01-25T15:26:19.785Z **BROKEN** gossipd: backtrace: gossipd/gossmap_manage.c:905 (gossmap_manage_channel_update) 0x577413b0b10b
2025-01-25T15:26:19.785Z **BROKEN** gossipd: backtrace: gossipd/gossipd.c:210 (handle_recv_gossip) 0x577413b07942
2025-01-25T15:26:19.785Z **BROKEN** gossipd: backtrace: gossipd/gossipd.c:302 (connectd_req) 0x577413b07a48
2025-01-25T15:26:19.785Z **BROKEN** gossipd: backtrace: common/daemon_conn.c:35 (handle_read) 0x577413b11b1d
2025-01-25T15:26:19.785Z **BROKEN** gossipd: backtrace: ccan/ccan/io/io.c:60 (next_plan) 0x577413c525b5
2025-01-25T15:26:19.785Z **BROKEN** gossipd: backtrace: ccan/ccan/io/io.c:422 (do_plan) 0x577413c52a40
2025-01-25T15:26:19.785Z **BROKEN** gossipd: backtrace: ccan/ccan/io/io.c:439 (io_ready) 0x577413c52af9
2025-01-25T15:26:19.785Z **BROKEN** gossipd: backtrace: ccan/ccan/io/poll.c:455 (io_loop) 0x577413c54446
2025-01-25T15:26:19.785Z **BROKEN** gossipd: backtrace: gossipd/gossipd.c:665 (main) 0x577413b07cf4
2025-01-25T15:26:19.785Z **BROKEN** gossipd: backtrace: (null):0 ((null)) 0x731bcb65dd79
2025-01-25T15:26:19.786Z **BROKEN** gossipd: backtrace: (null):0 ((null)) 0x577413b04d99
2025-01-25T15:26:19.786Z **BROKEN** gossipd: backtrace: (null):0 ((null)) 0xffffffffffffffff
2025-01-25T15:26:19.786Z **BROKEN** gossipd: STATUS_FAIL_INTERNAL_ERROR: gossip_store: get delete entry offset 3771339/105791801
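
As a quick sanity check on the numbers in this second failure, the sketch below pulls the two integers out of the logged message and compares them. It assumes, without confirmation from the thread, that the pair is "offset / total store length in bytes"; the helper name `parse_offsets` is hypothetical. Under that assumption the offset lies well inside the file, which would make this a different symptom from the earlier read-past-the-end assertion.

```python
import re

# Message copied verbatim from the log above.
LINE = "gossip_store: get delete entry offset 3771339/105791801"


def parse_offsets(message: str) -> tuple[int, int]:
    """Return the two integers around the '/' in an 'offset A/B' message."""
    match = re.search(r"offset (\d+)/(\d+)", message)
    if match is None:
        raise ValueError("no 'offset A/B' pair found")
    return int(match.group(1)), int(match.group(2))


offset, total = parse_offsets(LINE)
# ASSUMPTION: the second number is the store length in bytes.
print(f"offset {offset} of {total} ({offset / total:.1%} into the store)")
print("inside store" if offset < total else "past end of store")
```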

rustyrussell self-assigned this Jan 28, 2025
rustyrussell added this to the v25.02 milestone Jan 28, 2025

@rustyrussell
Contributor

I cannot access the attachment?

Also, what's the filesystem where you're putting the gossip_store file? I've just done an audit and I cannot see how this would happen, but I'm double-checking now.
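
One way to answer the filesystem question on a Linux host is to look up the mount that contains the gossip_store path. The sketch below does that against text in the `/proc/mounts` layout (device, mount point, type, options). The function name `filesystem_for`, the sample mount table, the dataset name and the gossip_store path are all hypothetical, and mount points containing escaped spaces are not handled.

```python
import posixpath


def filesystem_for(path: str, mounts_text: str) -> tuple[str, str]:
    """Return (mount_point, fs_type) for the longest mount point containing path."""
    path = posixpath.normpath(path)
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Later entries win ties, matching how later mounts shadow earlier ones.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best[0]):
            best = (mount_point, fs_type)
    return best


# Hypothetical sample; on a real host pass open("/proc/mounts").read() instead.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
tank/cln /data/cln zfs rw,xattr,noacl 0 0
"""
print(filesystem_for("/data/cln/bitcoin/gossip_store", SAMPLE))
```

When the node runs in a container, running the lookup both on the host and inside the container may be worthwhile, since the two can see different mount tables.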


MegalithicBTC commented Jan 30, 2025

Hm, seems like GitHub didn't like the attachment. Here it is on S3: https://rizful-public.s3.us-east-1.amazonaws.com/temp/gossip_store-corrupt.zip

CLN is running in Docker; please see the Dockerfile below.

The entire CLN directory is on a ZFS mirror on Ubuntu 22: two drives that ZFS presents as one. As far as I know, this should only reduce the chance of data corruption, since ZFS mirrors are (supposedly) rock-solid.

Dockerfile


#from elementsproject/lightningd:v24.02-amd64
from elementsproject/lightningd:v24.11
RUN echo root/extras/foo.bar
RUN pwd
COPY ./extras root/extras
# WORKDIR usr/local/libexec/c-lightning/plugins/clnrest/
# RUN pip install -r requirements.txt
# RUN chmod 1777 /tmp
RUN apt-get update
WORKDIR /root/extras/
RUN dpkg -i migrate.linux-amd64.deb
RUN rm -rf /usr/local/go && tar -C /usr/local -xzf /root/extras/go1.22.0.linux-amd64.tar.gz
ENV PATH=$PATH:/usr/local/go/bin
WORKDIR /root/extras/lspd/
#RUN go get github.com/breez/lspd
#RUN go build .
# RUN go get github.com/breez/lspd/cln_plugin
#RUN go build -o lspd_plugin ./cln_plugin/cmd
# Update package lists and install necessary packages
RUN apt-get update && apt-get install -y \
    build-essential \
    libssl-dev \
    pkg-config \
    curl \
    git \
    python3-pip
#RUN make release-all
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"
ENV RABBITMQ_URL=REDACTED
RUN pip install pika
RUN pip install pyln-client
RUN pip install prettytable
RUN apt-get install python3-json5 python3-flask python3-gunicorn -y
RUN pip3 install --user flask-cors flask-restx pyln-client flask-socketio gevent gevent-websocket
