Data size SKU for billing #2098
base: master
Conversation
Definition of `Page::bytes_used_by_rows` to follow. This change seemed to stand on its own enough to deserve a separate commit.
We intend to bill based on these predictable metrics, rather than the somewhat-unpredictable actual heap memory usage of the system. As such, we need a way to compute them (duh). This commit adds `Table` methods for computing the number of resident rows, and the number of bytes stored by those rows.
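For illustration, here is a minimal sketch of the shape of that computation, with made-up `Table`/`Page` definitions rather than the real ones from the `table` crate: per-page counts are maintained by the page layer, and the `Table` methods derive their totals by summing over pages.

```rust
// Illustrative sketch only; field and method names are assumptions,
// not the actual `Table`/`Page` definitions from the `table` crate.
struct Page {
    /// Bytes occupied by live rows in this page (free space excluded).
    bytes_used_by_rows: usize,
    /// Number of live rows resident in this page.
    num_rows: usize,
}

struct Table {
    pages: Vec<Page>,
}

impl Table {
    /// Number of rows currently resident in the table.
    fn num_rows(&self) -> usize {
        self.pages.iter().map(|p| p.num_rows).sum()
    }

    /// Number of bytes stored by those rows, summed across all pages.
    fn bytes_used_by_rows(&self) -> usize {
        self.pages.iter().map(|p| p.bytes_used_by_rows).sum()
    }
}
```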
Still a draft, but the overall strategy here makes sense. :)
Per out-of-band discussion, I am not sure this computation will actually be useful to us, but it is the thing I can compute at this time. See comment on `BTreeIndex::num_key_bytes` in btree_index.rs for the specific counting implemented here.
This looks good overall. Adding testing is important if we are going to use these for billing.
Is it possible to write a function that computes the size of a row s.t. we can assert that `bytes_used_by_rows()` is equal to the sum of the size of each row? If so, that would give us a good path toward being able to write tests.
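Something like the following could work as a sketch of that test, reusing the illustrative `Table`/`Page` above; `row_size` and `insert_row` here are stand-ins, not real APIs from the codebase:

```rust
#[cfg(test)]
mod data_size_tests {
    use super::*;

    // Hypothetical per-row size oracle; the real test would need a function
    // that independently computes how many bytes a row occupies in a Page.
    fn row_size(row: &[u8]) -> usize {
        row.len()
    }

    // Stand-in for the real insertion path, which updates page accounting.
    fn insert_row(table: &mut Table, row: &[u8]) {
        if table.pages.is_empty() {
            table.pages.push(Page { bytes_used_by_rows: 0, num_rows: 0 });
        }
        let page = table.pages.last_mut().unwrap();
        page.bytes_used_by_rows += row_size(row);
        page.num_rows += 1;
    }

    #[test]
    fn bytes_used_by_rows_matches_sum_of_row_sizes() {
        let rows: Vec<Vec<u8>> = vec![vec![0u8; 16], vec![1u8; 32], vec![2u8; 64]];
        let mut table = Table { pages: Vec::new() };
        for row in &rows {
            insert_row(&mut table, row);
        }
        // The aggregate metric should equal the sum of the size of each row.
        let expected: usize = rows.iter().map(|r| row_size(r)).sum();
        assert_eq!(table.bytes_used_by_rows(), expected);
    }
}
```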
@@ -644,6 +644,39 @@ impl CommittedState {
        let index = table.indexes.get(col_list)?;
        Some(&index.key_type)
    }

    pub(super) fn report_data_size(&self, database_identity: Identity) {
If this causes a performance issue because of time spent in `with_label_values`, we could store these metric handles in the `Table` and/or `ExecutionContext`, so reporting values would just be an atomic number operation.
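A sketch of that caching idea using the `prometheus` crate directly; the metric and label names below are assumptions derived from this PR's `spacetime_data_size` prefix, not necessarily the exact names in the codebase:

```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_gauge_vec, IntGauge, IntGaugeVec};

// Assumed metric/label names, based on the prefix described in this PR.
static TABLE_BYTES_USED_BY_ROWS: Lazy<IntGaugeVec> = Lazy::new(|| {
    register_int_gauge_vec!(
        "spacetime_data_size_table_bytes_used_by_rows",
        "Bytes used by rows resident in this table",
        &["database", "table"]
    )
    .unwrap()
});

/// Handles cached on the `Table` (or `ExecutionContext`), so reporting skips
/// the `with_label_values` label lookup on every commit.
struct TableDataSizeMetrics {
    bytes_used_by_rows: IntGauge,
}

impl TableDataSizeMetrics {
    fn new(database: &str, table: &str) -> Self {
        Self {
            bytes_used_by_rows: TABLE_BYTES_USED_BY_ROWS
                .with_label_values(&[database, table]),
        }
    }

    /// Just an atomic store on the pre-resolved gauge.
    fn report(&self, bytes: u64) {
        self.bytes_used_by_rows.set(bytes as i64);
    }
}
```

The tradeoff is that cached handles would need to be dropped (and the labelled series removed) when a table is dropped, so stale gauges don't keep getting scraped.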
@@ -681,6 +681,10 @@ pub(super) fn record_metrics(
            .inc_by(deletes.len() as u64);
        }
    }

    if let Some(committed_state) = committed_state {
In the code immediately above this, we are also updating a bunch of stats (counters and table size gauges). We should probably unify those at some point.
Slow reconstructions of `num_rows` and `bytes_used_by_rows`. Still to follow: index usage reporting.
@jsdt I have written some unit proptests in
These tests look good, thanks for adding them. For the blob store, do you think recomputing it every time we report is going to be a performance issue? If so, it might be worth keeping track of it as modifications happen.
I would assume that the majority of modules have very small blob stores. If this turns out not to be the case, we could easily memoize it in the same way as this PR is doing for other measures.
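For illustration, a toy sketch of the two options discussed here (recompute on every report vs. maintain a running total); the types and field names are made up, not the real `BlobStore` API:

```rust
use std::collections::HashMap;

// Toy blob store keyed by a content hash; names are illustrative only.
struct BlobStore {
    blobs: HashMap<[u8; 32], Box<[u8]>>,
    /// Running total maintained on every modification (the memoized option).
    bytes_used_by_blobs: u64,
}

impl BlobStore {
    /// Option 1: recompute on every report. O(number of blobs), which stays
    /// cheap as long as blob stores remain small.
    fn bytes_used_by_blobs_recomputed(&self) -> u64 {
        self.blobs.values().map(|b| b.len() as u64).sum()
    }

    /// Option 2: keep the total up to date as modifications happen.
    fn insert(&mut self, hash: [u8; 32], blob: Box<[u8]>) {
        self.bytes_used_by_blobs += blob.len() as u64;
        if let Some(old) = self.blobs.insert(hash, blob) {
            // Replacing an existing blob: don't double-count it.
            self.bytes_used_by_blobs -= old.len() as u64;
        }
    }
}
```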
Description of Changes
This PR adds the ability to compute and report the number of rows in memory, and the number of bytes used by those rows.
Part of https://github.com/clockworklabs/SpacetimeDBPrivate/issues/1229.
Currently, reporting is accomplished by a new group of metrics, all of which are prefixed with `spacetime_data_size`. Specifically, for each database, we report:

- `blob_store_num_blobs`, the number of blobs in the `BlobStore`.
- `blob_store_bytes_used_by_blobs`, the number of bytes used by large blobs in the `BlobStore`.
  - A blob holding a string `str` is counted as `len(str)`.

For each table in each database, we report:
- `table_num_rows`, the number of rows in the table.
- `table_bytes_used_by_rows`, the number of bytes in `Page`s used by rows in the table.
  - The overhead of `Page`s is also not included here.
- `table_num_rows_in_indexes`, the number of rows in indexes in the table.
  - This should be equal to `table_size * num_indices`.
- `table_bytes_used_by_index_keys`, the bytes used to store keys in indexes in the table.
  - See the `KeySize` trait for a precise definition of this metric.

In this PR, the new metrics are reported when committing a mutable TX, as that's when their values change. However, it's not necessary to report them this often; unlike our existing metrics, they are not incremental. (Or rather, the incremental maintenance is confined to the `table` crate, and not visible to the `core` crate where they are read and reported.) It would be reasonable to report them every N transactions for some choice of N, or every t seconds for some choice of t, or in response to an external request, or in any number of other ways.
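As a hedged sketch of the "every N transactions" option (the constant, counter, and function names below are illustrative, not part of this PR):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Illustrative threshold; any N would do.
const REPORT_EVERY_N_COMMITS: u64 = 1024;

static COMMITS_SINCE_LAST_REPORT: AtomicU64 = AtomicU64::new(0);

/// Call on every mutable-TX commit; `report` stands in for the actual
/// data-size reporting and only runs every N commits.
fn maybe_report_data_size(report: impl FnOnce()) {
    let n = COMMITS_SINCE_LAST_REPORT.fetch_add(1, Ordering::Relaxed) + 1;
    if n >= REPORT_EVERY_N_COMMITS {
        COMMITS_SINCE_LAST_REPORT.store(0, Ordering::Relaxed);
        report();
    }
}
```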
API and ABI breaking changes
N/A, unless adding metrics breaks Prometheus in some way I don't understand.
Expected complexity level and risk
3: it would be unfortunate if we misreported these, since we intend to use them for billing, and the computations for some of the new metrics are non-trivial. It's also possible (I haven't checked) that computing and reporting these metrics will have meaningful overhead, causing a performance regression. That said, these changes are very unlikely to break any existing functionality.
Testing