
Lotus daemon --import-chain errors out #11802

Open
5 of 11 tasks
rjan90 opened this issue Apr 2, 2024 · 4 comments · May be fixed by #12830
Labels
kind/bug Kind: Bug


@rjan90
Contributor

rjan90 commented Apr 2, 2024

Checklist

  • This is not a security-related bug/issue. If it is, please follow the security policy.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the latest release, the most recent RC (release candidate) for the upcoming release, or the dev branch (master), or have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

  • lotus daemon - chain sync
  • lotus fvm/fevm - Lotus FVM and FEVM interactions
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt/WinningPoSt)
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Lotus Version

Lotus v1.26.0

Repro Steps

  1. Run lotus daemon --import-chain=/xxxx/xxxxx/xxxx.car.zst --halt-after-import
  2. Wait while it's importing
  3. Get the error:
2024-04-02T11:26:27.456+0200	INFO	main	lotus/daemon.go:599	setting genesis
2024-04-02T11:26:27.457+0200	INFO	chainstore	store/store.go:683	New heaviest tipset! [bafy2bzacecnamqgqmifpluoeldx7zzglxcljo6oja4vrmtj7432rphldpdmm2] (height=0)
2024-04-02T11:26:27.458+0200	WARN	chainstore	store/store.go:711	no previous heaviest tipset found, using [bafy2bzacecnamqgqmifpluoeldx7zzglxcljo6oja4vrmtj7432rphldpdmm2]
2024-04-02T11:26:27.460+0200	INFO	drand	drand/drand.go:114	drand beacon without pubsub
2024-04-02T11:26:27.460+0200	WARN	chainstore	store/store.go:668	reorgWorker quit
2024-04-02T11:26:27.483+0200	INFO	badgerbs	v2@v2.2007.4/db.go:1027	Storing value log head: {Fid:115 Len:33 Offset:107228516}
---------
2024-04-02T11:26:30.040+0200	INFO	badgerbs	v2@v2.2007.4/db.go:550	Force compaction on level 0 done
ERROR: failed to construct beacon schedule: creating drand beacon: creating drand client: no points of contact specified

Describe the Bug

The lotus daemon --import-chain command currently fails with the error shown in the repro steps above:

ERROR: failed to construct beacon schedule: creating drand beacon: creating drand client: no points of contact specified

This error is unrelated to the drand quicknet change and suggests that the command is currently broken. The issue was discovered during nv22 testing, when I mistakenly used the --import-chain command while I actually wanted to use the --import-snapshot command.

Logging Information

N/A
rjan90 added the kind/bug label Apr 2, 2024
rjan90 added this to FilOz Apr 2, 2024
rjan90 moved this to 📌 Triage in FilOz Apr 2, 2024
@rjan90
Contributor Author

rjan90 commented Apr 23, 2024

@rjan90 to follow up by testing the import of a Lotus-exported snapshot.

rjan90 moved this from 📌 Triage to 🐱Todo in FilOz Apr 23, 2024
rjan90 self-assigned this May 15, 2024
@rvagg
Member

rvagg commented Jan 20, 2025

I bumped into this today while tinkering with the lotus-shed gas-estimation tool; same issue. Some related threads about drand stuff:

Here's what's going on:

  1. Each drand beacon needs a way of fetching randomness; on mainnet we have had 3 over the history of the chain.
  2. In reality we only need to be using the latest beacon live. We used the earlier beacons to fetch randomness and baked it into the chain, so we should never need to fetch fresh randomness for arbitrary historical points.
  3. We still set up a beacon for each of these points in history, even though we only use one.
  4. Initialising a beacon requires a client, i.e. a way of talking to the remote beacon API; otherwise it fails here: https://github.com/drand/drand/blob/v1.5.11/client/client.go#L52-L53
  5. Our configs only have HTTP endpoints set for the latest drand beacon (quicknet). The others don't, so when we initialise the other 2 we don't provide any client options and get the above error.
  6. But, the important bit: it works when you're running the Lotus daemon, because there we are able to provide a PubSub instance, which can be set up as a "client":
    // ps is only non-nil inside the daemon; elsewhere (e.g. --import-chain, lotus-shed) only HTTP clients remain
    if ps != nil {
        opts = append(opts, gclient.WithPubsub(ps))
    } else {
        log.Info("drand beacon without pubsub")
    }
    client, err := dclient.Wrap(clients, opts...)

In the other places where we call BeaconScheduleFromDrandSchedule without providing a PubSub (the last argument is nil), we encounter this problem.

These old networks don't exist anymore (at least incentinet certainly doesn't), so even pubsub would fail if it actually tried to use them. The endpoints were removed from the config because they shouldn't even work anymore.

Possible solutions:

  • I don't think we should need these old beacons. Maybe if we don't have a client, we just don't initialise the beacon? If there's something about the beacon we need in order to deal with ancient tipsets, then maybe we could get this addressed on the drand side so it's more lax about requiring a client at initialisation?
    • There might be some deeper surgery needed here to unwind the chain of beacons: why do we have them all? Were they just never retired? There's probably a short period after a switch where we still need the old one, but now we have 2 old ones that shouldn't be used.
  • Provide a fake client just to get it set up? Alternatively, a fake pubsub might do the trick?

@rvagg
Member

rvagg commented Jan 20, 2025

Digging a bit further, I believe we need the ancient beacons in order to validate old block headers. Our syncer calls ValidateBlock, part of which calls ValidateBlockValues; that uses the schedule to find the beacon for the block's epoch and calls VerifyEntry on it, which in turn calls VerifyBeacon (in the drand code).

So we're back to the problem of having no network, and therefore being unable to initialise drand beacons for networks we no longer want to (or can) talk to.

@AnomalRoil can you advise on the right way forward here? See my previous comment for context. Do we need to fake a network client to init? Should I open an issue over on drand to make it possible to init without a client? Would drand@2.x fix this?

@rvagg
Member

rvagg commented Jan 20, 2025

A no-op client is being attempted here: #12830

Projects
Status: 🐱 Todo