Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue regen state #7002

Closed
twoeths opened this issue Aug 5, 2024 · 3 comments · Fixed by #7033
Closed

Performance issue regen state #7002

twoeths opened this issue Aug 5, 2024 · 3 comments · Fixed by #7033
Assignees
Labels
meta-bug Issues that identify a bug and require a fix.

Comments

@twoeths
Copy link
Contributor

twoeths commented Aug 5, 2024

Describe the bug

Reload state + regen takes ~13s on mainnet

Screenshot 2024-08-05 at 09 48 35

Received gossip block should be logged before the validation time, it showed 13.2s as validation time

126079:Jul-30 01:41:31.895[network]         debug: Received gossip block slot=9623304, root=0x0954…7537, curentSlot=9623305, peerId=16Uiu2HAmNCh2yBozfk5dujuWpUWCXzQdXbUK3VoVcHg1S8ZRW4Rt, delaySec=7.585000038146973, pending=null, haveBlobs=6, expectedBlobs=6, recvToValLatency=0.07299995422363281, recvToValidation=13.309999942779541, validationTime=13.236999988555908

it mean the validation started at ~01:41:18.658

2s was for reloading state

125805:Jul-30 01:41:19.084[chain]           debug: Reload: read state successful persistedKey=0xb796040000000000d572d4cf29fadad5ee3bd7fcfe2d9e8ab762d44d3fb58d36ac06682b36b34a42
125822:Jul-30 01:41:19.084[chain]         verbose: Reload: use block state as seed state slot=9623302
125823:Jul-30 01:41:19.084[chain]           debug: Reload: found seed state persistedKey=0xb796040000000000d572d4cf29fadad5ee3bd7fcfe2d9e8ab762d44d3fb58d36ac06682b36b34a42, seedSlot=9623302
125824:Jul-30 01:41:20.945[chain]           debug: Reload: cached state load successful persistedKey=0xb796040000000000d572d4cf29fadad5ee3bd7fcfe2d9e8ab762d44d3fb58d36ac06682b36b34a42, stateSlot=9623264, stateRoot=0xcdbc3aa5ffa8e44273870bf29cb4686bddcb4448d7b12f715a77d2b8f7d41ce7, seedSlot=9623302

8.6s was for regen state by running through 23 blocks

125825:Jul-30 01:41:20.945[chain]           debug: Replaying blocks to get state stateRoot=0x49871dd7b44f68e98c7cb0677a1b25bdac0ab05fe62dada2aa0ccdcb37f7884f, replaySlots=9623287,9623286,9623285,9623284,9623283,9623282,9623281,9623280,9623279,9623278,9623277,9623276,9623275,9623274,9623273,9623272,9623271,9623270,9623269,9623268,9623267,9623266,9623265
126064:Jul-30 01:41:29.532[chain]           debug: Replayed blocks to get state stateRoot=0x49871dd7b44f68e98c7cb0677a1b25bdac0ab05fe62dada2aa0ccdcb37f7884f, replaySlots=9623287,9623286,9623285,9623284,9623283,9623282,9623281,9623280,9623279,9623278,9623277,9623276,9623275,9623274,9623273,9623272,9623271,9623270,9623269,9623268,9623267,9623266,9623265

remaining time (~4.1s) was to run through epoch transition because block slot 9623287 and 9623304 stay on different epoch

126107:Jul-30 01:41:33.645[chain]         verbose: Block processed slot=9623304, root=0x09548c9b217164e5386eb45c9324918f4b5407670cbcd8e7fa5b806dbcc57537, delaySec=22.644999980926514

part of #5968

Expected behavior

  • Separate the log Received gossip block to Received gossip block and Validated gossip block
  • Average block processed time is 8.6s/23 = 374ms which is too much. Normally it should be <= 70ms
  • All of the above information should be tracked by metrics

Steps to reproduce

Run with minimal config of n-historical state: 'chain.nHistoricalStates: true','chain.maxBlockStates: 1','chain.maxCPStateEpochsInMemory: 0'

Additional context

No response

Operating system

Linux

Lodestar version or commit hash

v1.20.0

@twoeths twoeths added the meta-bug Issues that identify a bug and require a fix. label Aug 5, 2024
@twoeths
Copy link
Contributor Author

twoeths commented Aug 5, 2024

The block process time is not bad in the above time range

Screenshot 2024-08-05 at 10 43 13

it's likely most of the time was spent for loading block from db + deserialize it + wait for nextEventLoop()

@twoeths twoeths self-assigned this Aug 5, 2024
@twoeths
Copy link
Contributor Author

twoeths commented Aug 26, 2024

Average block processed time is 8.6s/23 = 374ms which is too much. Normally it should be <= 70ms

if we also compute hashTreeRoot() on beacon-node side, to process a block it takes ~120ms - 130ms on mainnet

this is on the stable mainnet node, v1.21.0

Screenshot 2024-08-26 at 08 35 00

@twoeths
Copy link
Contributor Author

twoeths commented Aug 26, 2024

this is getPreState() time on the feat4 mainnet node with PR #7033

Screenshot 2024-08-26 at 09 58 55

it was reduced from 13.2s (as in theDescription) to 7.29s which is likely the worse case scenario, this includes:

  • <= 1.5s for loading checkpoint state
  • <= 120ms * 31 blocks = 3.72s to reprocess blocks
  • 2s for epoch transition

this is for mainnet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta-bug Issues that identify a bug and require a fix.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant