Fix memory usage of bbolt block header index #213

Merged
merged 5 commits into lightninglabs:master from guggero:block-header-index-mem-fix on Mar 15, 2021

Conversation

guggero
Member

@guggero guggero commented Mar 1, 2021

Fixes #196.

With this PR we store the block index keys (hash->height) in sub-buckets, using the first two bytes of the hash as the bucket name. Storing a large number of keys in the same bucket has a large impact on memory usage in bbolt if small-ish batch sizes are used (the B+ tree needs to be copied with every resize operation). Using sub-buckets is a compromise between memory usage and access time; 2 bytes (at most 65536 sub-buckets) seems to be the sweet spot (-50% memory usage, +30% access time). We take the bytes from the beginning of the byte-serialized hash since all Bitcoin hashes are reverse-serialized when displayed as strings, which means the leading zeroes of a block hash are actually at the end of the byte slice.
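
For illustration only, a minimal sketch of that write path against bbolt directly; the bucket name and helper function are placeholders, not the actual headerfs code:

```go
package index

import (
	"encoding/binary"

	"go.etcd.io/bbolt"
)

// rootBucket is an illustrative name; the real index bucket name lives in
// headerfs/index.go.
var rootBucket = []byte("header-index")

// putHeightEntry stores hash -> height under a sub-bucket named after the
// first two bytes of the byte-serialized block hash. Because block hashes
// are reverse-serialized for display, their zero padding sits at the end of
// the byte slice, so these leading bytes distribute keys roughly evenly
// across up to 65536 sub-buckets.
func putHeightEntry(tx *bbolt.Tx, hash [32]byte, height uint32) error {
	root, err := tx.CreateBucketIfNotExists(rootBucket)
	if err != nil {
		return err
	}

	sub, err := root.CreateBucketIfNotExists(hash[:2])
	if err != nil {
		return err
	}

	var heightBytes [4]byte
	binary.BigEndian.PutUint32(heightBytes[:], height)

	return sub.Put(hash[:], heightBytes[:])
}
```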

As the benchmarks below show, the 2-byte prefix seems to be the sweet spot between memory usage, access speed, and DB file size.

I also looked at other ways of reducing the memory footprint of the bbolt based index. The main culprit seems to be the relatively small batch size of 2k blocks per update. That number is limited by how many block headers a peer serves in a single message and cannot easily be increased.
The only other way to increase the DB write batch size would be to add a second-level cache. But that would require more refactoring and could lead to the index and the actual block file getting out of sync.

Benchmarks

All results are retrieved by running:

go test -v -run=. -bench=. -benchmem -memprofile=mem.out ./headerfs

System:

goos: linux
goarch: amd64
pkg: github.com/lightninglabs/neutrino/headerfs
cpu: AMD Ryzen 9 3900X 12-Core Processor  

Test 1: control, no changes

Benchmark output

DB file size at cleanup: 39206912
BenchmarkWriteHeadersSmallBatch-24             1        15387606278 ns/op       7304628624 B/op  8107418 allocs/op
BenchmarkWriteHeadersMediumBatch
DB file size at cleanup: 46354432
BenchmarkWriteHeadersMediumBatch-24            1        6071342964 ns/op        5157273568 B/op  5878518 allocs/op
BenchmarkWriteHeadersLargeBatch
DB file size at cleanup: 63823872
BenchmarkWriteHeadersLargeBatch-24             1        3368511955 ns/op        2849549856 B/op  5009240 allocs/op
BenchmarkHeightLookupLatency
DB file size at cleanup: 5394432
BenchmarkHeightLookupLatency-24           979939              1676 ns/op             640 B/op         12 allocs/op

Max DB file size: ~63 MB

Memory usage (go tool pprof mem.out -> top):

Showing nodes accounting for 14.80GB, 96.93% of 15.27GB total
Dropped 41 nodes (cum <= 0.08GB)
Showing top 10 nodes out of 47
      flat  flat%   sum%        cum   cum%
    7.44GB 48.72% 48.72%     7.44GB 48.72%  go.etcd.io/bbolt.(*node).put
    4.82GB 31.56% 80.28%     4.82GB 31.56%  go.etcd.io/bbolt.(*node).read
    1.02GB  6.65% 86.93%     1.02GB  6.65%  go.etcd.io/bbolt.Open.func1
    0.40GB  2.64% 89.57%     0.40GB  2.64%  go.etcd.io/bbolt.(*Cursor).search
    0.39GB  2.55% 92.12%    14.70GB 96.26%  github.com/lightninglabs/neutrino/headerfs.writeRandomBatch
    0.25GB  1.63% 93.75%     5.07GB 33.19%  go.etcd.io/bbolt.(*Bucket).node
    0.21GB  1.36% 95.11%     0.27GB  1.79%  go.etcd.io/bbolt.(*DB).beginTx
    0.12GB   0.8% 95.90%     0.12GB   0.8%  go.etcd.io/bbolt.(*node).dereference
    0.10GB  0.64% 96.55%     1.27GB  8.34%  go.etcd.io/bbolt.(*Tx).allocate
    0.06GB  0.39% 96.93%    12.66GB 82.89%  go.etcd.io/bbolt.(*Bucket).Put

Test 2: 1 byte sub bucket

Benchmark output

BenchmarkWriteHeadersSmallBatch
DB file size at cleanup: 38875136 bytes
BenchmarkWriteHeadersSmallBatch-24             1        17966398972 ns/op       7106311624 B/op 15420393 allocs/op
BenchmarkWriteHeadersMediumBatch
DB file size at cleanup: 46874624 bytes
BenchmarkWriteHeadersMediumBatch-24            1        6767009109 ns/op        5594956944 B/op  7798011 allocs/op
BenchmarkWriteHeadersLargeBatch
DB file size at cleanup: 64094208 bytes
BenchmarkWriteHeadersLargeBatch-24             1        3515226806 ns/op        2800528448 B/op  5739020 allocs/op
BenchmarkHeightLookupLatency
DB file size at cleanup: 5799936 bytes
BenchmarkHeightLookupLatency-24           829075              1985 ns/op             702 B/op         14 allocs/op

Max DB file size: ~64 MB

Memory usage (go tool pprof mem.out -> top):

Showing nodes accounting for 14.20GB, 96.02% of 14.79GB total
Dropped 37 nodes (cum <= 0.07GB)
Showing top 10 nodes out of 50
      flat  flat%   sum%        cum   cum%
    7.25GB 49.03% 49.03%     7.25GB 49.03%  go.etcd.io/bbolt.(*node).put
    4.25GB 28.71% 77.74%     4.25GB 28.71%  go.etcd.io/bbolt.(*node).read
    1.08GB  7.30% 85.04%     1.08GB  7.30%  go.etcd.io/bbolt.Open.func1
    0.39GB  2.65% 87.69%    14.34GB 96.95%  github.com/lightninglabs/neutrino/headerfs.writeRandomBatch
    0.38GB  2.59% 90.29%     0.38GB  2.59%  go.etcd.io/bbolt.(*Cursor).search
    0.32GB  2.17% 92.45%     4.57GB 30.88%  go.etcd.io/bbolt.(*Bucket).node
    0.15GB     1% 93.45%     1.38GB  9.33%  go.etcd.io/bbolt.(*Tx).allocate
    0.14GB  0.96% 94.41%     0.19GB  1.29%  go.etcd.io/bbolt.(*Bucket).openBucket
    0.14GB  0.94% 95.35%     0.20GB  1.33%  go.etcd.io/bbolt.(*DB).beginTx
    0.10GB  0.67% 96.02%     0.10GB  0.67%  go.etcd.io/bbolt.(*node).dereference

Test 3: 2 byte sub bucket

Benchmark output

DB file size at cleanup: 43966464 bytes
BenchmarkWriteHeadersSmallBatch-24             1        14826193397 ns/op       4140556080 B/op 16198594 allocs/op
BenchmarkWriteHeadersMediumBatch
DB file size at cleanup: 51122176 bytes
BenchmarkWriteHeadersMediumBatch-24            1        6134732835 ns/op        2248978232 B/op 14419852 allocs/op
BenchmarkWriteHeadersLargeBatch
DB file size at cleanup: 69414912 bytes
BenchmarkWriteHeadersLargeBatch-24             1        4380655964 ns/op        1854111016 B/op 14156142 allocs/op
BenchmarkHeightLookupLatency
DB file size at cleanup: 9207808 bytes
BenchmarkHeightLookupLatency-24           573646              2156 ns/op             835 B/op         14 allocs/op

Max DB file size: ~69 MB

Memory usage (go tool pprof mem.out -> top):

Showing nodes accounting for 8135.27MB, 91.56% of 8885.17MB total
Dropped 31 nodes (cum <= 44.43MB)
Showing top 10 nodes out of 62
      flat  flat%   sum%        cum   cum%
 2923.08MB 32.90% 32.90%  2923.08MB 32.90%  go.etcd.io/bbolt.(*node).read
 1466.33MB 16.50% 49.40%  1466.33MB 16.50%  go.etcd.io/bbolt.(*node).put
  842.56MB  9.48% 58.88%   842.56MB  9.48%  go.etcd.io/bbolt.(*Cursor).search
  607.21MB  6.83% 65.72%  3530.28MB 39.73%  go.etcd.io/bbolt.(*Bucket).node
  488.10MB  5.49% 71.21%   618.11MB  6.96%  go.etcd.io/bbolt.(*Bucket).openBucket
  472.04MB  5.31% 76.52%   472.04MB  5.31%  go.etcd.io/bbolt.(*node).dereference
  456.66MB  5.14% 81.66%   456.66MB  5.14%  go.etcd.io/bbolt.(*Bucket).write
  383.48MB  4.32% 85.98%  8458.17MB 95.19%  github.com/lightninglabs/neutrino/headerfs.writeRandomBatch
  345.33MB  3.89% 89.87%   345.33MB  3.89%  go.etcd.io/bbolt.Open.func1
  150.49MB  1.69% 91.56%  1101.62MB 12.40%  go.etcd.io/bbolt.(*Bucket).Bucket

Test 4: 3 byte sub bucket

Benchmark output

DB file size at cleanup: 75845632 bytes
BenchmarkWriteHeadersSmallBatch-24             1        17389070045 ns/op       6542484944 B/op 17463690 allocs/op
BenchmarkWriteHeadersMediumBatch
DB file size at cleanup: 83427328 bytes
BenchmarkWriteHeadersMediumBatch-24            1        7554943584 ns/op        4277629144 B/op 15690789 allocs/op
BenchmarkWriteHeadersLargeBatch
DB file size at cleanup: 106274816 bytes
BenchmarkWriteHeadersLargeBatch-24             1        5288760033 ns/op        2839629112 B/op 15431008 allocs/op
BenchmarkHeightLookupLatency
DB file size at cleanup: 10883072 bytes
BenchmarkHeightLookupLatency-24           743611              2181 ns/op             822 B/op         14 allocs/op

Max DB file size: ~106 MB

Memory usage (go tool pprof mem.out -> top):

Showing nodes accounting for 13588.32MB, 93.68% of 14504.27MB total
Dropped 29 nodes (cum <= 72.52MB)
Showing top 10 nodes out of 53
      flat  flat%   sum%        cum   cum%
 4970.84MB 34.27% 34.27%  4970.84MB 34.27%  go.etcd.io/bbolt.(*node).put
 4373.62MB 30.15% 64.43%  4373.62MB 30.15%  go.etcd.io/bbolt.(*node).read
 1043.04MB  7.19% 71.62%  1043.04MB  7.19%  go.etcd.io/bbolt.Open.func1
  911.56MB  6.28% 77.90%   911.56MB  6.28%  go.etcd.io/bbolt.(*Cursor).search
  710.76MB  4.90% 82.80%  5084.38MB 35.05%  go.etcd.io/bbolt.(*Bucket).node
  506.04MB  3.49% 86.29%   506.04MB  3.49%  go.etcd.io/bbolt.(*node).dereference
  388.40MB  2.68% 88.97% 13916.60MB 95.95%  github.com/lightninglabs/neutrino/headerfs.writeRandomBatch
  310.02MB  2.14% 91.11%   460.03MB  3.17%  go.etcd.io/bbolt.(*Bucket).openBucket
  209.52MB  1.44% 92.55%   209.52MB  1.44%  go.etcd.io/bbolt.(*Bucket).write
  164.54MB  1.13% 93.68%   227.54MB  1.57%  go.etcd.io/bbolt.(*DB).beginTx

As a preparation for memory behavior optimizations, we first add a set
of benchmark tests to establish a baseline.
The benchmarks measure the speed and memory usage of writing different
batch sizes to the bbolt index as well as the random-access latency
of retrieving entries from the index.
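
For context, a self-contained sketch of what such a batch-write benchmark can look like against a plain bbolt bucket; the bucket name and the 2000-entry batch size are assumptions for illustration, not the actual headerfs test code:

```go
package index_test

import (
	"crypto/rand"
	"encoding/binary"
	"path/filepath"
	"testing"

	"go.etcd.io/bbolt"
)

// BenchmarkWriteHashIndexSmallBatch writes hash->height entries to a
// temporary bbolt database in small per-transaction batches, roughly
// mirroring the small batch sizes seen during header sync.
func BenchmarkWriteHashIndexSmallBatch(b *testing.B) {
	dbPath := filepath.Join(b.TempDir(), "index.db")
	db, err := bbolt.Open(dbPath, 0600, nil)
	if err != nil {
		b.Fatal(err)
	}
	defer db.Close()

	b.ReportAllocs()
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		// One batch of 2000 entries per transaction.
		err := db.Update(func(tx *bbolt.Tx) error {
			bucket, err := tx.CreateBucketIfNotExists([]byte("index"))
			if err != nil {
				return err
			}
			for j := 0; j < 2000; j++ {
				var hash [32]byte
				if _, err := rand.Read(hash[:]); err != nil {
					return err
				}
				var height [4]byte
				binary.BigEndian.PutUint32(height[:], uint32(i*2000+j))
				if err := bucket.Put(hash[:], height[:]); err != nil {
					return err
				}
			}
			return nil
		})
		if err != nil {
			b.Fatal(err)
		}
	}
}
```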
@coveralls

coveralls commented Mar 1, 2021

Coverage increased (+0.3%) to 72.252% when pulling 9ea609b on guggero:block-header-index-mem-fix into e197837 on lightninglabs:master.

Copy link
Contributor

@Crypt-iQ Crypt-iQ left a comment


If a neutrino node has a database file that is partially synced, this PR breaks it, so it would need a migration. Alternatively, users could remove their filter databases.

I think it would pay dividends to implement parallel header download (#71) instead. It's a more lengthy refactor but would be worth it in the end and also allow us to perform bigger batch inserts. WDYT?

Copy link
Contributor

@halseth halseth left a comment


I like the approach! Very clever and simple fix that'll probably help a lot.

I do agree with @Crypt-iQ though, we must make sure existing users' databases keep working. A possible solution would be to have these sub-buckets be separate from the root bucket, and then still fall back to fetching from the root bucket if an entry is not found.

We want to isolate the code that reads from/writes to the index bucket
within the headerIndex type. We prepare to do so by extracting re-usable
code into methods.
@guggero
Member Author

guggero commented Mar 10, 2021

> If a neutrino node has a database file that is partially synced, this PR breaks it, so would need a migration.

Good point. I added a fallback to the root bucket like @halseth suggested and also added a test for it.

> I think it would pay dividends to implement parallel header download (#71) instead.

Do you mean instead of merging this PR? I think we can still implement the parallel header download and get benefits from both improvements.
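
Roughly, the lookup with that root-bucket fallback could look like the following (reusing the illustrative `rootBucket` name from the sketch in the description; the actual code in headerfs/index.go differs):

```go
// heightFromHash first looks in the two-byte-prefix sub-bucket and then
// falls back to the root bucket for entries written before this change,
// so a partially synced database keeps working without a migration.
func heightFromHash(tx *bbolt.Tx, hash [32]byte) (uint32, bool) {
	root := tx.Bucket(rootBucket)
	if root == nil {
		return 0, false
	}

	// New entries live in the sub-bucket keyed by the hash prefix.
	if sub := root.Bucket(hash[:2]); sub != nil {
		if v := sub.Get(hash[:]); len(v) == 4 {
			return binary.BigEndian.Uint32(v), true
		}
	}

	// Old entries may still live directly in the root bucket.
	if v := root.Get(hash[:]); len(v) == 4 {
		return binary.BigEndian.Uint32(v), true
	}

	return 0, false
}
```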

@Crypt-iQ
Contributor

Both improvements make sense, disregard my earlier comment, will re-review.

Copy link
Contributor

@halseth halseth left a comment


Wow, that turned out to be very clean! Great PR 👍

LGTM 💯

Copy link
Contributor

@Crypt-iQ Crypt-iQ left a comment


LGTM after nits addressed

We want to isolate the code that reads from/writes to the index bucket
within the headerIndex type. We prepare to do so by extracting re-usable
code into methods.
Now that we have methods for accessing the index buckets, we use those
in the blockHeaderStore instead of manipulating the DB directly.
With this commit we store the index keys (hash->height) in sub-buckets, using the first two bytes of the hash as the bucket name. Storing a large number of keys in the same bucket has a large impact on memory usage in bbolt if small-ish batch sizes are used (the B+ tree needs to be copied with every resize operation). Using sub-buckets is a compromise between memory usage and access time; 2 bytes (at most 65536 sub-buckets) seems to be the sweet spot (-50% memory usage, +30% access time). We take the bytes from the beginning of the byte-serialized hash since all Bitcoin hashes are reverse-serialized when displayed as strings, which means the leading zeroes of a block hash are actually at the end of the byte slice.
@guggero guggero requested a review from Crypt-iQ March 12, 2021 19:07
Copy link
Contributor

@Crypt-iQ Crypt-iQ left a comment


50% is very nice

@halseth halseth merged commit d39d683 into lightninglabs:master Mar 15, 2021
@guggero guggero deleted the block-header-index-mem-fix branch March 15, 2021 09:38