Fix memory usage of bbolt block header index #213

Merged
merged 5 commits into lightninglabs:master from guggero:block-header-index-mem-fix on Mar 15, 2021

Conversation

guggero
Member

@guggero guggero commented Mar 1, 2021

Fixes #196.

With this PR we store the block index keys (hash->height) in sub-buckets, using the first two bytes of the hash as the bucket name. Storing a large number of keys in the same bucket has a large impact on memory usage in bbolt if small-ish batch sizes are used (the B+ tree needs to be copied with every resize operation). Using sub-buckets is a compromise between memory usage and access time; 2 bytes (at most 65536 sub-buckets) seems to be the sweet spot (-50% memory usage, +30% access time). We take the bytes from the beginning of the byte-serialized hash since all Bitcoin hashes are reverse-serialized when displayed as strings, which means the leading zeroes of a block hash are actually at the end of the byte slice.
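
For illustration only, a minimal sketch of that write path against bbolt directly; the bucket name and helper function are placeholders, not the actual headerfs code:

```go
package index

import (
	"encoding/binary"

	"go.etcd.io/bbolt"
)

// rootBucket is an illustrative name; the real index bucket name lives in
// headerfs/index.go.
var rootBucket = []byte("header-index")

// putHeightEntry stores hash -> height under a sub-bucket named after the
// first two bytes of the byte-serialized block hash. Because block hashes
// are reverse-serialized for display, their zero padding sits at the end of
// the byte slice, so these leading bytes distribute keys roughly evenly
// across up to 65536 sub-buckets.
func putHeightEntry(tx *bbolt.Tx, hash [32]byte, height uint32) error {
	root, err := tx.CreateBucketIfNotExists(rootBucket)
	if err != nil {
		return err
	}

	sub, err := root.CreateBucketIfNotExists(hash[:2])
	if err != nil {
		return err
	}

	var heightBytes [4]byte
	binary.BigEndian.PutUint32(heightBytes[:], height)

	return sub.Put(hash[:], heightBytes[:])
}
```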

As the benchmarks below show, the 2-byte prefix seems to be the sweet spot between memory usage, access speed, and DB file size.

I also looked at other ways of reducing the memory footprint of the bbolt based index. The main culprit seems to be the relatively small batch size of 2k blocks per update. That number is limited by how many block headers a peer serves in a single message and cannot easily be increased.
The only other way to increase the DB write batch size would be to add a second-level cache. But that would require more refactoring and could lead to the index and the actual block file getting out of sync.

Benchmarks

All results are retrieved by running:

go test -v -run=. -bench=. -benchmem -memprofile=mem.out ./headerfs

System:

goos: linux
goarch: amd64
pkg: github.com/lightninglabs/neutrino/headerfs
cpu: AMD Ryzen 9 3900X 12-Core Processor  

Test 1: control, no changes

Benchmark output

DB file size at cleanup: 39206912
BenchmarkWriteHeadersSmallBatch-24             1        15387606278 ns/op       7304628624 B/op  8107418 allocs/op
BenchmarkWriteHeadersMediumBatch
DB file size at cleanup: 46354432
BenchmarkWriteHeadersMediumBatch-24            1        6071342964 ns/op        5157273568 B/op  5878518 allocs/op
BenchmarkWriteHeadersLargeBatch
DB file size at cleanup: 63823872
BenchmarkWriteHeadersLargeBatch-24             1        3368511955 ns/op        2849549856 B/op  5009240 allocs/op
BenchmarkHeightLookupLatency
DB file size at cleanup: 5394432
BenchmarkHeightLookupLatency-24           979939              1676 ns/op             640 B/op         12 allocs/op

Max DB file size: ~63 MB

Memory usage (go tool pprof mem.out -> top):

Showing nodes accounting for 14.80GB, 96.93% of 15.27GB total
Dropped 41 nodes (cum <= 0.08GB)
Showing top 10 nodes out of 47
      flat  flat%   sum%        cum   cum%
    7.44GB 48.72% 48.72%     7.44GB 48.72%  go.etcd.io/bbolt.(*node).put
    4.82GB 31.56% 80.28%     4.82GB 31.56%  go.etcd.io/bbolt.(*node).read
    1.02GB  6.65% 86.93%     1.02GB  6.65%  go.etcd.io/bbolt.Open.func1
    0.40GB  2.64% 89.57%     0.40GB  2.64%  go.etcd.io/bbolt.(*Cursor).search
    0.39GB  2.55% 92.12%    14.70GB 96.26%  github.com/lightninglabs/neutrino/headerfs.writeRandomBatch
    0.25GB  1.63% 93.75%     5.07GB 33.19%  go.etcd.io/bbolt.(*Bucket).node
    0.21GB  1.36% 95.11%     0.27GB  1.79%  go.etcd.io/bbolt.(*DB).beginTx
    0.12GB   0.8% 95.90%     0.12GB   0.8%  go.etcd.io/bbolt.(*node).dereference
    0.10GB  0.64% 96.55%     1.27GB  8.34%  go.etcd.io/bbolt.(*Tx).allocate
    0.06GB  0.39% 96.93%    12.66GB 82.89%  go.etcd.io/bbolt.(*Bucket).Put

Test 2: 1 byte sub bucket

Benchmark output

BenchmarkWriteHeadersSmallBatch
DB file size at cleanup: 38875136 bytes
BenchmarkWriteHeadersSmallBatch-24             1        17966398972 ns/op       7106311624 B/op 15420393 allocs/op
BenchmarkWriteHeadersMediumBatch
DB file size at cleanup: 46874624 bytes
BenchmarkWriteHeadersMediumBatch-24            1        6767009109 ns/op        5594956944 B/op  7798011 allocs/op
BenchmarkWriteHeadersLargeBatch
DB file size at cleanup: 64094208 bytes
BenchmarkWriteHeadersLargeBatch-24             1        3515226806 ns/op        2800528448 B/op  5739020 allocs/op
BenchmarkHeightLookupLatency
DB file size at cleanup: 5799936 bytes
BenchmarkHeightLookupLatency-24           829075              1985 ns/op             702 B/op         14 allocs/op

Max DB file size: ~64 MB

Memory usage (go tool pprof mem.out -> top):

Showing nodes accounting for 14.20GB, 96.02% of 14.79GB total
Dropped 37 nodes (cum <= 0.07GB)
Showing top 10 nodes out of 50
      flat  flat%   sum%        cum   cum%
    7.25GB 49.03% 49.03%     7.25GB 49.03%  go.etcd.io/bbolt.(*node).put
    4.25GB 28.71% 77.74%     4.25GB 28.71%  go.etcd.io/bbolt.(*node).read
    1.08GB  7.30% 85.04%     1.08GB  7.30%  go.etcd.io/bbolt.Open.func1
    0.39GB  2.65% 87.69%    14.34GB 96.95%  github.com/lightninglabs/neutrino/headerfs.writeRandomBatch
    0.38GB  2.59% 90.29%     0.38GB  2.59%  go.etcd.io/bbolt.(*Cursor).search
    0.32GB  2.17% 92.45%     4.57GB 30.88%  go.etcd.io/bbolt.(*Bucket).node
    0.15GB     1% 93.45%     1.38GB  9.33%  go.etcd.io/bbolt.(*Tx).allocate
    0.14GB  0.96% 94.41%     0.19GB  1.29%  go.etcd.io/bbolt.(*Bucket).openBucket
    0.14GB  0.94% 95.35%     0.20GB  1.33%  go.etcd.io/bbolt.(*DB).beginTx
    0.10GB  0.67% 96.02%     0.10GB  0.67%  go.etcd.io/bbolt.(*node).dereference

Test 3: 2 byte sub bucket

Benchmark output

DB file size at cleanup: 43966464 bytes
BenchmarkWriteHeadersSmallBatch-24             1        14826193397 ns/op       4140556080 B/op 16198594 allocs/op
BenchmarkWriteHeadersMediumBatch
DB file size at cleanup: 51122176 bytes
BenchmarkWriteHeadersMediumBatch-24            1        6134732835 ns/op        2248978232 B/op 14419852 allocs/op
BenchmarkWriteHeadersLargeBatch
DB file size at cleanup: 69414912 bytes
BenchmarkWriteHeadersLargeBatch-24             1        4380655964 ns/op        1854111016 B/op 14156142 allocs/op
BenchmarkHeightLookupLatency
DB file size at cleanup: 9207808 bytes
BenchmarkHeightLookupLatency-24           573646              2156 ns/op             835 B/op         14 allocs/op

Max DB file size: ~69 MB

Memory usage (go tool pprof mem.out -> top):

Showing nodes accounting for 8135.27MB, 91.56% of 8885.17MB total
Dropped 31 nodes (cum <= 44.43MB)
Showing top 10 nodes out of 62
      flat  flat%   sum%        cum   cum%
 2923.08MB 32.90% 32.90%  2923.08MB 32.90%  go.etcd.io/bbolt.(*node).read
 1466.33MB 16.50% 49.40%  1466.33MB 16.50%  go.etcd.io/bbolt.(*node).put
  842.56MB  9.48% 58.88%   842.56MB  9.48%  go.etcd.io/bbolt.(*Cursor).search
  607.21MB  6.83% 65.72%  3530.28MB 39.73%  go.etcd.io/bbolt.(*Bucket).node
  488.10MB  5.49% 71.21%   618.11MB  6.96%  go.etcd.io/bbolt.(*Bucket).openBucket
  472.04MB  5.31% 76.52%   472.04MB  5.31%  go.etcd.io/bbolt.(*node).dereference
  456.66MB  5.14% 81.66%   456.66MB  5.14%  go.etcd.io/bbolt.(*Bucket).write
  383.48MB  4.32% 85.98%  8458.17MB 95.19%  github.com/lightninglabs/neutrino/headerfs.writeRandomBatch
  345.33MB  3.89% 89.87%   345.33MB  3.89%  go.etcd.io/bbolt.Open.func1
  150.49MB  1.69% 91.56%  1101.62MB 12.40%  go.etcd.io/bbolt.(*Bucket).Bucket

Test 4: 3 byte sub bucket

Benchmark output

DB file size at cleanup: 75845632 bytes
BenchmarkWriteHeadersSmallBatch-24             1        17389070045 ns/op       6542484944 B/op 17463690 allocs/op
BenchmarkWriteHeadersMediumBatch
DB file size at cleanup: 83427328 bytes
BenchmarkWriteHeadersMediumBatch-24            1        7554943584 ns/op        4277629144 B/op 15690789 allocs/op
BenchmarkWriteHeadersLargeBatch
DB file size at cleanup: 106274816 bytes
BenchmarkWriteHeadersLargeBatch-24             1        5288760033 ns/op        2839629112 B/op 15431008 allocs/op
BenchmarkHeightLookupLatency
DB file size at cleanup: 10883072 bytes
BenchmarkHeightLookupLatency-24           743611              2181 ns/op             822 B/op         14 allocs/op

Max DB file size: ~106 MB

Memory usage (go tool pprof mem.out -> top):

Showing nodes accounting for 13588.32MB, 93.68% of 14504.27MB total
Dropped 29 nodes (cum <= 72.52MB)
Showing top 10 nodes out of 53
      flat  flat%   sum%        cum   cum%
 4970.84MB 34.27% 34.27%  4970.84MB 34.27%  go.etcd.io/bbolt.(*node).put
 4373.62MB 30.15% 64.43%  4373.62MB 30.15%  go.etcd.io/bbolt.(*node).read
 1043.04MB  7.19% 71.62%  1043.04MB  7.19%  go.etcd.io/bbolt.Open.func1
  911.56MB  6.28% 77.90%   911.56MB  6.28%  go.etcd.io/bbolt.(*Cursor).search
  710.76MB  4.90% 82.80%  5084.38MB 35.05%  go.etcd.io/bbolt.(*Bucket).node
  506.04MB  3.49% 86.29%   506.04MB  3.49%  go.etcd.io/bbolt.(*node).dereference
  388.40MB  2.68% 88.97% 13916.60MB 95.95%  github.com/lightninglabs/neutrino/headerfs.writeRandomBatch
  310.02MB  2.14% 91.11%   460.03MB  3.17%  go.etcd.io/bbolt.(*Bucket).openBucket
  209.52MB  1.44% 92.55%   209.52MB  1.44%  go.etcd.io/bbolt.(*Bucket).write
  164.54MB  1.13% 93.68%   227.54MB  1.57%  go.etcd.io/bbolt.(*DB).beginTx

As a preparation for memory behavior optimizations, we first add a set
of benchmark tests to establish a baseline.
The benchmarks measure the speed and memory usage of writing different
batch sizes to the bbolt index as well as the random-access latency
of retrieving entries from the index.
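
For context, a self-contained sketch of what such a batch-write benchmark can look like against a plain bbolt bucket; the bucket name and the 2000-entry batch size are assumptions for illustration, not the actual headerfs test code:

```go
package index_test

import (
	"crypto/rand"
	"encoding/binary"
	"path/filepath"
	"testing"

	"go.etcd.io/bbolt"
)

// BenchmarkWriteHashIndexSmallBatch writes hash->height entries to a
// temporary bbolt database in small per-transaction batches, roughly
// mirroring the small batch sizes seen during header sync.
func BenchmarkWriteHashIndexSmallBatch(b *testing.B) {
	dbPath := filepath.Join(b.TempDir(), "index.db")
	db, err := bbolt.Open(dbPath, 0600, nil)
	if err != nil {
		b.Fatal(err)
	}
	defer db.Close()

	b.ReportAllocs()
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		// One batch of 2000 entries per transaction.
		err := db.Update(func(tx *bbolt.Tx) error {
			bucket, err := tx.CreateBucketIfNotExists([]byte("index"))
			if err != nil {
				return err
			}
			for j := 0; j < 2000; j++ {
				var hash [32]byte
				if _, err := rand.Read(hash[:]); err != nil {
					return err
				}
				var height [4]byte
				binary.BigEndian.PutUint32(height[:], uint32(i*2000+j))
				if err := bucket.Put(hash[:], height[:]); err != nil {
					return err
				}
			}
			return nil
		})
		if err != nil {
			b.Fatal(err)
		}
	}
}
```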
@coveralls

coveralls commented Mar 1, 2021

Coverage increased (+0.3%) to 72.252% when pulling 9ea609b on guggero:block-header-index-mem-fix into e197837 on lightninglabs:master.

Copy link
Contributor

@Crypt-iQ Crypt-iQ left a comment


If a neutrino node has a database file that is partially synced, this PR breaks it, so it would need a migration. Alternatively, users could remove their filter databases.

I think it would pay dividends to implement parallel header download (#71) instead. It's a more lengthy refactor but would be worth it in the end and also allow us to perform bigger batch inserts. WDYT?

Copy link
Contributor

@halseth halseth left a comment


I like the approach! Very clever and simple fix that'll probably help a lot.

I do agree with @Crypt-iQ though, we must make sure existing users' databases keep working. A possible solution would be to have these sub-buckets be separate from the root bucket, and then still fall back to fetching from the root bucket if an entry is not found.

We want to isolate the code that reads from/writes to the index bucket
within the headerIndex type. We prepare to do so by extracting re-usable
code into methods.
@guggero
Member Author

guggero commented Mar 10, 2021

> If a neutrino node has a database file that is partially synced, this PR breaks it, so would need a migration.

Good point. I added a fallback to the root bucket like @halseth suggested and also added a test for it.

> I think it would pay dividends to implement parallel header download (#71) instead.

Do you mean instead of merging this PR? I think we can still implement the parallel header download and get benefits from both improvements.
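
Roughly, the lookup with that root-bucket fallback could look like the following (reusing the illustrative `rootBucket` name from the sketch in the description; the actual code in headerfs/index.go differs):

```go
// heightFromHash first looks in the two-byte-prefix sub-bucket and then
// falls back to the root bucket for entries written before this change,
// so a partially synced database keeps working without a migration.
func heightFromHash(tx *bbolt.Tx, hash [32]byte) (uint32, bool) {
	root := tx.Bucket(rootBucket)
	if root == nil {
		return 0, false
	}

	// New entries live in the sub-bucket keyed by the hash prefix.
	if sub := root.Bucket(hash[:2]); sub != nil {
		if v := sub.Get(hash[:]); len(v) == 4 {
			return binary.BigEndian.Uint32(v), true
		}
	}

	// Old entries may still live directly in the root bucket.
	if v := root.Get(hash[:]); len(v) == 4 {
		return binary.BigEndian.Uint32(v), true
	}

	return 0, false
}
```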

@Crypt-iQ
Contributor

Both improvements make sense, disregard my earlier comment, will re-review.

Copy link
Contributor

@halseth halseth left a comment


Wow, that turned out to be very clean! Great PR 👍

LGTM 💯

Copy link
Contributor

@Crypt-iQ Crypt-iQ left a comment


LGTM after nits addressed

We want to isolate the code that reads from/writes to the index bucket
within the headerIndex type. We prepare to do so by extracting re-usable
code into methods.
Now that we have methods for accessing the index buckets, we use those
in the blockHeaderStore instead of manipulating the DB directly.
With this commit we store the index keys (hash->height) in sub-buckets, using the first two bytes of the hash as the bucket name. Storing a large number of keys in the same bucket has a large impact on memory usage in bbolt if small-ish batch sizes are used (the B+ tree needs to be copied with every resize operation). Using sub-buckets is a compromise between memory usage and access time; 2 bytes (at most 65536 sub-buckets) seems to be the sweet spot (-50% memory usage, +30% access time). We take the bytes from the beginning of the byte-serialized hash since all Bitcoin hashes are reverse-serialized when displayed as strings, which means the leading zeroes of a block hash are actually at the end of the byte slice.
@guggero guggero requested a review from Crypt-iQ March 12, 2021 19:07
Copy link
Contributor

@Crypt-iQ Crypt-iQ left a comment


50% is very nice

@halseth halseth merged commit d39d683 into lightninglabs:master Mar 15, 2021
@guggero guggero deleted the block-header-index-mem-fix branch March 15, 2021 09:38