Block production node crashed (Berkeley Testnet Release 2.0.0rampup5 (ITN)) - concurrent map writes #14345
Comments
Same here. Adding my crash report: coda_crash_report_2023-10-15_07-50-17.615640.tar.gz. Configuration: AMD EPYC 7313P, 16C / 128 GB memory.
AMD Ryzen 7 3700X, 64 GB RAM, running in Docker.
Same crash trigger here as well; my report is attached. Server info: 8C / 30 GB memory.
It might be related to the fact that the network is configured to start block production on Oct 17th at 4pm UTC, and I'm not sure whether the corresponding fix to keep the daemon from crashing when connected earlier was actually rolled out.
The second screenshot shows "fatal error: concurrent map writes". @garethtdavies noticed it before. @shimkiv
Yeah, I saw that, thanks; I'm just making assumptions. The team will get to it next week, I believe.
The fix in #14328 has not been merged and so is not in the ITN images. |
+1, also crashing with the same error.
These crashes should cease after the genesis time, which is on October 17 at 0900 US Pacific. |
Thanks for the update |
Curious why this behavior can vary between nodes, i.e., I've only seen this on one of my nodes, which resolved itself. To be clear, this is the concurrent map writes error and not the offline crash after ~30 mins. |
The concurrent map writes error is coming from the Go code in libp2p. Different versions of the Go compiler and libraries, maybe? If you're using Docker images, then of course that explanation doesn't fly.
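For context on what the runtime is complaining about: Go's built-in maps are not safe for concurrent mutation, and since Go 1.6 the runtime deliberately aborts the whole process when it detects simultaneous writers, which is why the daemon dies with exactly "fatal error: concurrent map writes". Below is a minimal, hedged sketch of the failure mode and the conventional mutex guard; the names are illustrative and this is not the actual libp2p code.

```go
// racemap.go - illustrative sketch only, not the libp2p code.
package main

import (
	"fmt"
	"sync"
)

// unsafeWrites shows the crash: unsynchronized writes to a plain map from
// multiple goroutines make the runtime abort with
// "fatal error: concurrent map writes".
func unsafeWrites() {
	m := map[string]int{}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			m["key"] = i // concurrent write -> runtime fatal error
		}(i)
	}
	wg.Wait()
}

// safeMap guards the map with a mutex; sync.Map is another common fix.
type safeMap struct {
	mu sync.Mutex
	m  map[string]int
}

func (s *safeMap) set(k string, v int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[k] = v
}

func main() {
	s := &safeMap{m: map[string]int{}}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.set("key", i) // synchronized write: no crash
		}(i)
	}
	wg.Wait()
	fmt.Println("value:", s.m["key"])
	// Calling unsafeWrites() here instead would very likely kill the process.
}
```

Note that this particular check does not require the race detector: the runtime catches concurrent map writes even in ordinary release builds, so the process crashes rather than silently corrupting state.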
I can confirm that the block producer is now working without crashing.
@vanphandinh Are you still seeing these crashes? If not, may I close the issue? |
No crash now, thanks |
I ran the (short) libp2p test with the
Several other data races:
Race conditions were fixed in #14467. Closing. |
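For readers unfamiliar with how such data races are usually found: Go ships a race detector that instruments test binaries when run with `go test -race` and reports unsynchronized accesses even when they never crash the process. Here is a hedged sketch of the kind of test that surfaces a race; the package, test, and variable names are illustrative and are not taken from the libp2p codebase or from the fix referenced above.

```go
// race_example_test.go - run with: go test -race
// Illustrative only; names are hypothetical.
package example

import (
	"sync"
	"testing"
)

func TestCounterRace(t *testing.T) {
	var counter int
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Unsynchronized read-modify-write on a shared variable:
			// reported as "WARNING: DATA RACE" under -race, even though
			// it does not crash the process on its own.
			counter++
		}()
	}
	wg.Wait()
	t.Logf("counter = %d", counter)
}
```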
Preliminary Checks
Description
Screenshot:
Video:
https://drive.google.com/file/d/1-iOrXoYNr0kxTOieoR3oHXUe7H-WCjhu/view?usp=sharing
Log file:
[coda_crash_report_2023-10-15_03-18-50.632638.tar.gz](https://github.com/MinaProtocol/mina/files/12908537/coda_crash_report_2023-10-15_03-18-50.632638.tar.gz)
The CPU was running at 100% before the crash.
Steps to Reproduce
I just ran the node as normal.
Expected Result
The daemon will be restarted every 25 minutes
Actual Result
The daemon restarts very frequently with the following error:
fatal error: concurrent map writes
How frequently do you see this issue?
Frequently
What is the impact of this issue on your ability to run a node?
High
Status
Mina daemon status
-----------------------------------
Max observed block height: 1
Max observed unvalidated block height: 0
Local uptime: 5m42s
Chain id: 332c8cc05ba8de9efc23a011f57015d8c9ec96fac81d5d3f7a06969faf4bce92
Git SHA-1: 55b78189c46e1811b8bdb78864cfa95409aeb96a
Configuration directory: /root/.mina-config
Peers: 31
User_commands sent: 0
SNARK worker: None
SNARK work fee: 100000000
Sync status: Bootstrap
Block producers running: 1 (B62qptU47zPwn1v4UgQxwWXKzo5XS7T6iHyw5pgbu2XMg8dSEHaQYv6)
Coinbase receiver: Block producer
Consensus time now: epoch=0, slot=0
Consensus mechanism: proof_of_stake
Consensus configuration:
  Delta: 0
  k: 290
  Slots per epoch: 7140
  Slot duration: 3m
  Epoch duration: 14d21h
  Chain start timestamp: 2023-10-17 16:01:01.000000Z
  Acceptable network delay: 3m
Addresses and ports:
  External IP: 1.52.13.145
  Bind IP: 0.0.0.0
  Libp2p PeerID: 12D3KooWGTubUMM7h6TDG2GE13xW2abF2JQ4YF8XZGcCWDTKLzaT
  Libp2p port: 8303
  Client port: 8301
Metrics:
  block_production_delay: 7 (0 0 0 0 0 0 0)
  transaction_pool_diff_received: 0
  transaction_pool_diff_broadcasted: 0
  transactions_added_to_pool: 0
  transaction_pool_size: 0
  snark_pool_diff_received: 0
  snark_pool_diff_broadcasted: 0
  pending_snark_work: 0
  snark_pool_size: 0
Additional information
No response