Node unable to reach bootstrap peers #1666

Open
masch1na opened this issue May 31, 2023 · 3 comments

masch1na commented May 31, 2023

Hi all,

I keep having an intermittent issue with my node. It works fine for multiple weeks, then suddenly crashes. When I reboot it, it doesn't come back up. I do nothing to fix it except wait: when I reboot it again a day or so later, it comes back up.

I am having the same issue again after downloading the new 2.19.1 version. It worked fine for a couple of hours and then shut down. After a reboot, the docker container shuts itself down after about 10 seconds and I see the following error in the logs. I am aware the logs say the bootstrap nodes aren't reachable, and of course the issue could be on my end, but I do nothing to fix this except wait and reboot later, and then it starts working. I am running the default config.

2023-05-31T12:04:56.855Z [Error] [chainwebVersion=mainnet01|cluster=docker-node|peerId=tWFImq|port=1789|host=x.x.x.x|type=ChainwebApp] Only 0 out of 12 bootstrap peers are reachable.Required number of reachable bootstrap nodes: 6
2023-05-31T12:04:57.976Z [Error] [chainwebVersion=mainnet01|cluster=docker-node|type=ChainwebStatus] {"tag":"ProcessDied","contents":"ReachabilityException (Expected {getExpected = 6}) (Actual {getActual = 0})"}
chainweb-node: ReachabilityException (Expected {getExpected = 6}) (Actual {getActual = 0})
<<ghc: 193707528 bytes, 13 GCs, 3969660/9231544 avg/max bytes residency (13 samples), 413M in use, 0.004 INIT (0.004 elapsed), 7.311 MUT (8.508 elapsed), 2.300 GC (0.200 elapsed) :ghc>>

I kept getting this error for about a day until now, when the node suddenly started syncing after a reboot. I haven't done anything to fix this, just rebooted.
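For anyone trying to reproduce this from the node's host, here is a minimal sketch of a "how many peers can I reach" check in the spirit of the log message above. It only attempts a plain TCP connection, which is an assumption on my part and not necessarily what chainweb-node's own reachability check does. The names `tcp_probe`, `count_reachable`, and `enough_reachable` are made up for illustration.

```python
import socket
from typing import Callable, Iterable, Tuple

Peer = Tuple[str, int]  # (host, port)


def tcp_probe(peer: Peer, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    host, port = peer
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def count_reachable(peers: Iterable[Peer],
                    probe: Callable[[Peer], bool] = tcp_probe) -> int:
    """Count the peers for which the probe succeeds."""
    return sum(1 for peer in peers if probe(peer))


def enough_reachable(peers: Iterable[Peer], required: int,
                     probe: Callable[[Peer], bool] = tcp_probe) -> bool:
    """Mirror the 'N out of M, required R' rule from the log message."""
    return count_reachable(peers, probe) >= required
```

The only bootstrap address that appears in this thread is fr1.chainweb.com:443, so a call such as `count_reachable([("fr1.chainweb.com", 443)])` is a starting point; the remaining peers would have to come from your own node's configuration. The `probe` parameter is injectable so the counting logic can be exercised without network access.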

I thought you guys should be aware of this.

EDIT: I just noticed that around the time my node started working, my ISP issued me a new IP. I use a dynamic IP with a DDNS service. As mentioned, my node was running the default config while it wasn't working. I'm not sure why the old IP would be bad and the new one good; I mention it because the timing is interesting. I eventually update my node config to use my DDNS hostname instead of my IP, after which it runs fine for multiple weeks, even though my IP changes many times in that period.

Contributor

chessai commented Jun 12, 2023

I'm seeing crashes from time to time too. This is one of the messages I see when the VM goes offline (reproduced from memory, not copy-pasted):
CPU#3 stuck for 25s! [chainweb-node:5368]
CPU#2 stuck for 30s! [chainweb-node:5368]

I sometimes had trouble re-syncing afterwards, but it went away faster than it did for the poster above.

This seems like a separate problem. Could you open another issue @trendzetter?

@masch1na
Author

I just had this issue again. My node had been up for 3 weeks, then suddenly stopped working with errors like this:

2023-07-19T21:32:16.445Z [Warn] [chainwebVersion=mainnet01|cluster=docker-node|peerId=ia0FnS|port=1789|host=78.98.249.53|chain=17|component=mempool-sync|type=ChainwebApp] failed to sync peers from fr1.chainweb.com:443#: FailureResponse (Request {requestPath = (BaseUrl {baseUrlScheme = Https, baseUrlHost = "fr1.chainweb.com", baseUrlPort = 443, baseUrlPath = ""},"/chainweb/0.0/mainnet01/chain/17/mempool/peer"), requestQueryString = fromList [], requestBody = Just ((),application/json;charset=utf-8), requestAccept = fromList [], requestHeaders = fromList [("X-Chainweb-Node-Version","2.19.1")], requestHttpVersion = HTTP/1.1, requestMethod = "PUT"}) (Response {responseStatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, responseHeaders = fromList [("Transfer-Encoding","chunked"),("Date","Wed, 19 Jul 2023 21:32:15 GMT"),("Server","Warp/3.3.25"),("X-Server-Timestamp","1689802336"),("X-Peer-Addr","178.40.242.251:62564"),("X-Chainweb-Node-Version","2.19.1"),("Content-Type","text/plain;charset=utf-8")], responseHttpVersion = HTTP/1.1, responseBody = "Invalid hostaddress: IsNotReachable (PeerInfo {_peerId = Just (PeerId \"\\137\\173\\ENQ\\157!\\191Ar\\147f}\\162\\227\\152?H\\145\\239\\\"\\FS\\158Q\\156\\196\\208\\144#~\\208A\\182\\157\"), _peerAddr = HostAddress {_hostAddressHost = 78.98.249.53, _hostAddressPort = 1789}}) \"\\\"HttpExceptionRequest Request {\\\\n host = \\\\\\\"78.98.249.53\\\\\\\"\\\\n port = 1789\\\\n secure = True\\\\n requestHeaders = [(\\\\\\\"X-Chainweb-Node-Version\\\\\\\",\\\\\\\"2.19.1\\\\\\\")]\\\\n path = \\\\\\\"/chainweb/0.0/mainnet01/chain/17/mempool/peer\\\\\\\"\\\\n queryString = \\\\\\\"\\\\\\\"\\\\n method = \\\\\\\"GET\\\\\\\"\\\\n proxy = Nothing\\\\n rawBody = False\\\\n redirectCount = 10\\\\n responseTimeout = ResponseTimeoutMicro 2000000\\\\n requestVersion = HTTP/1.1\\\\n proxySecureMode = ProxySecureWithConnect\\\\n}\\\\n ConnectionTimeout\\\"\""})

To fix this I rebooted my node and waited overnight; the next day it was back up, syncing and responding.

So I'm not sure what the deal is. Again, it was up and working fine for 3 weeks, and all I had to do to fix this was reboot it and wait.
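One detail in the log above may be worth a look: the log context shows host=78.98.249.53, while the X-Peer-Addr response header shows 178.40.242.251. Assuming X-Peer-Addr is the address the remote node saw the request come from (that reading is an assumption, nothing in this thread confirms it), a difference between the two would suggest the node was still advertising an old IP. A rough sketch for pulling both values out of a log line follows; the function names are made up and the regexes only handle IPv4-style `host:port` values.

```python
import re
from typing import Optional, Tuple

# host=... field inside the bracketed, pipe-separated log context
HOST_RE = re.compile(r"\|host=([^|\]]+)")
# ("X-Peer-Addr","<ip>:<port>") entry in the response headers
PEER_ADDR_RE = re.compile(r'"X-Peer-Addr","([^":]+):\d+"')


def advertised_vs_observed(line: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (advertised host, address from X-Peer-Addr), None where absent."""
    host = HOST_RE.search(line)
    seen = PEER_ADDR_RE.search(line)
    return (host.group(1) if host else None,
            seen.group(1) if seen else None)


def address_mismatch(line: str) -> bool:
    """True only when both values are present and they differ."""
    advertised, observed = advertised_vs_observed(line)
    return (advertised is not None
            and observed is not None
            and advertised != observed)
```

Run against the warning above, `advertised_vs_observed` would return the two addresses quoted in the lead-in, and `address_mismatch` would flag the line.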

Again, I have a dynamic IP, but I have my node configured to use my DDNS hostname instead of the IP. I did this by editing
p2p:
  hostaddress:
    hostname
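With a setup like this, one thing worth checking after the ISP hands out a new IP is whether the DDNS record has caught up yet. The following is a small sketch, not something taken from chainweb-node: `hostname_points_to` is a made-up helper, and the hostname and IP passed to it are whatever your own DDNS name and current public IP happen to be.

```python
import socket
from typing import Callable


def hostname_points_to(hostname: str, expected_ip: str,
                       resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True if hostname currently resolves (IPv4) to expected_ip."""
    try:
        return resolve(hostname) == expected_ip
    except OSError:
        # Resolution failed, so the record cannot be confirmed.
        return False
```

If this returns False right after an IP change, the record may still point at the old address. Whether that explains the failures in this issue is only a guess.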

There might be an easy "way out" by just saying you don't support dynamic IP setups, since the requirements do say a static IP is required. I'm not sure whether, or how much, DDNS with a static hostname differs from a static IP.

This is not a deadly critical issue for me, I just thought you guys should be aware that sometimes my node out of nowhere stops working and to fix this I just reboot and wait.

Contributor

chessai commented Jan 9, 2024

@masch1na does this still happen for you?
