
Data size SKU for billing #2098

Open · gefjon wants to merge 12 commits into master
Conversation

Contributor

@gefjon gefjon commented Jan 8, 2025

Description of Changes

This PR adds the ability to compute and report the number of rows in memory, and the number of bytes used by those rows.

Part of https://github.com/clockworklabs/SpacetimeDBPrivate/issues/1229 .

Currently, reporting is accomplished by a new group of metrics, all of which are prefixed with spacetime_data_size.

Specifically, for each database, we report:

  • blob_store_bytes_used_by_blobs, the number of bytes used by large blobs in the BlobStore.
    • We report the actual size in bytes of the blobs.
      • Strings' sizes are len(str).
      • Other blobs' sizes are the BSATN size.
    • Duplicated blobs are counted multiple times, despite being deduplicated by hash in the storage layer. I believe this to be more predictable for clients, since it preserves the property that inserting a row which contains a large blob causes a strict increase in the metric, and deleting such a row causes a strict decrease.
  • blob_store_num_blobs, the number of blobs in the BlobStore.
    • The flat overhead per blob is likely negligible compared to the data cost of the blobs themselves, given that blobs start at 1 KiB and the overhead is likely on the order of 64 bytes, but we measure and report it for completeness.
    • As with blob bytes, this counts duplicated blobs multiple times.
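The duplicate-counting behavior of the two blob-store gauges can be sketched as follows. The BlobStore internals here (a map from hash to bytes plus a reference count) are purely illustrative, not the actual SpacetimeDB implementation:

```rust
use std::collections::HashMap;

// Illustrative stand-in for the real BlobStore: deduplicated by hash,
// but with a reference count so the metrics can count duplicates.
struct BlobStore {
    blobs: HashMap<u64, (Vec<u8>, usize)>, // hash -> (bytes, ref count)
}

impl BlobStore {
    fn new() -> Self {
        Self { blobs: HashMap::new() }
    }

    fn insert(&mut self, hash: u64, bytes: Vec<u8>) {
        self.blobs
            .entry(hash)
            .and_modify(|(_, refs)| *refs += 1)
            .or_insert((bytes, 1));
    }

    // Each duplicated blob counts once per reference, preserving the
    // "strict increase on insert, strict decrease on delete" property.
    fn bytes_used_by_blobs(&self) -> usize {
        self.blobs.values().map(|(b, refs)| b.len() * refs).sum()
    }

    fn num_blobs(&self) -> usize {
        self.blobs.values().map(|(_, refs)| refs).sum()
    }
}

fn main() {
    let mut store = BlobStore::new();
    store.insert(1, vec![0u8; 2048]); // 2 KiB blob
    store.insert(1, vec![0u8; 2048]); // duplicate: deduplicated in storage,
                                      // still counted twice in the metrics
    store.insert(2, vec![0u8; 1024]);
    assert_eq!(store.num_blobs(), 3);
    assert_eq!(store.bytes_used_by_blobs(), 2 * 2048 + 1024);
}
```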

For each table in each database, we report:

  • table_num_rows, the number of rows in the table.
    • Flat overhead here may be more meaningful, since many tables have quite small row sizes.
  • table_bytes_used_by_rows, the number of bytes in Pages used by rows in the table.
    • This is the sum of the fixed-len size and the granules allocated to store var-len portions for the rows.
    • Granules are always 32 bytes long; fixed-len size is a per-table constant.
    • Overhead due to padding, "pointer-like" contents, &c is included.
    • Size of large blobs is not included here (see above).
    • Free space in Pages is also not included here.
  • table_num_rows_in_indexes, the number of rows in indexes in the table.
    • We could hypothetically partition this by index. That wouldn't be very interesting, though, because it's just table_size * num_indices.
    • Again, flat overhead potentially matters here.
  • table_bytes_used_by_index_keys, the bytes used to store keys in indexes in the table.
    • We could hypothetically partition this by index. Unlike above, that would potentially be interesting.
    • See doc comments on the new KeySize trait for a precise definition of this metric.
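A minimal sketch of the table_bytes_used_by_rows accounting described above, under the stated rules (per-table fixed-len size, plus 32-byte granules for var-len portions, rounded up). The Table and Row types here are hypothetical stand-ins, not the real table crate API:

```rust
// Granules are always 32 bytes; fixed-len size is a per-table constant.
const GRANULE_SIZE: usize = 32;

struct Row {
    var_len_bytes: usize, // total var-len payload for this row
}

struct Table {
    fixed_row_size: usize, // per-table constant
    rows: Vec<Row>,
}

impl Table {
    fn num_rows(&self) -> usize {
        self.rows.len()
    }

    // Fixed-len size plus whole granules allocated for var-len data;
    // granule counts round up, so a 33-byte string costs 2 granules.
    fn bytes_used_by_rows(&self) -> usize {
        self.rows
            .iter()
            .map(|r| {
                let granules = (r.var_len_bytes + GRANULE_SIZE - 1) / GRANULE_SIZE;
                self.fixed_row_size + granules * GRANULE_SIZE
            })
            .sum()
    }
}

fn main() {
    let table = Table {
        fixed_row_size: 16,
        rows: vec![Row { var_len_bytes: 33 }, Row { var_len_bytes: 0 }],
    };
    assert_eq!(table.num_rows(), 2);
    // First row: 16 + 2 * 32 = 80; second row: 16 + 0 = 16.
    assert_eq!(table.bytes_used_by_rows(), 96);
}
```

Note that, per the list above, free space in Pages and large-blob bytes are deliberately excluded from this figure.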

In this PR, the new metrics are reported when committing a mutable TX, as that's when their values change. However, it's not necessary to report them this often; unlike our existing metrics they are not incremental. (Or the incremental maintenance is confined to the table crate, and not visible to the core crate where they are read and reported.) It would be reasonable to report them every N transactions for some choice of N, or every t seconds for some choice of t, or in response to an external request, or in any number of ways.
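One of the alternatives mentioned (report every N transactions) could be sketched like this; `maybe_report_data_size` and the choice of N are hypothetical, with the closure standing in for the actual metric-reporting call:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Arbitrary choice of N; nothing in the PR fixes this value.
const REPORT_EVERY_N_TXS: u64 = 128;

// Report only on every N-th committed transaction.
fn maybe_report_data_size(committed_txs: &AtomicU64, report: impl FnOnce()) {
    let n = committed_txs.fetch_add(1, Ordering::Relaxed) + 1;
    if n % REPORT_EVERY_N_TXS == 0 {
        report();
    }
}

fn main() {
    let committed_txs = AtomicU64::new(0);
    let mut reports = 0;
    for _ in 0..512 {
        maybe_report_data_size(&committed_txs, || reports += 1);
    }
    assert_eq!(reports, 4); // one report per 128 commits
}
```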

API and ABI breaking changes

N/a, unless adding metrics breaks Prometheus in some way I don't understand.

Expected complexity level and risk

3: it would be unfortunate if we misreported these, since we intend to use them for billing, and the computations for some of the new metrics are non-trivial. It's also possible (I haven't checked) that reporting the metrics will have meaningful overhead, causing a performance regression. That said, these changes are very unlikely to break any existing functionality.

Testing

  • Wrote proptests asserting that the low-level measurements of the number of rows and the bytes used by rows and indexes are as expected, compared to an obviously correct (IMO) naive implementation.
  • I am not sure how to test these manually, as I don't know how to run locally while collecting Prometheus metrics.

gefjon added 2 commits January 8, 2025 12:02

  • Definition of `Page::bytes_used_by_rows` to follow. This change seemed to stand on its own enough to deserve a separate commit.
  • We intend to bill based on these predictable metrics, rather than the somewhat-unpredictable actual heap memory usage of the system. As such, we need a way to compute them (duh). This commit adds `Table` methods for computing the number of resident rows, and the number of bytes stored by those rows.
@Centril Centril self-requested a review January 8, 2025 19:00
@Centril
Contributor

Centril commented Jan 8, 2025

Still a draft, but the overall strategy here makes sense. :)

Per out-of-band discussion, I am not sure this computation will actually be useful to us,
but it is the thing I can compute at this time.

See comment on `BTreeIndex::num_key_bytes` in btree_index.rs
for the specific counting implemented here.
@gefjon gefjon marked this pull request as ready for review January 17, 2025 15:56
@gefjon gefjon requested a review from jsdt January 17, 2025 15:56
Contributor

@jsdt jsdt left a comment


This looks good overall. Adding testing is important if we are going to use these for billing.

Is it possible to write a function that computes the size of a row s.t. we can assert that bytes_used_by_rows() is equal to the sum of the size of each row? If so, that would give us a good path toward being able to write tests.
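The test shape suggested here can be sketched as follows: a per-row size function whose sum over all rows must equal the table-level aggregate. The types and names are illustrative; in the real code the aggregate would be the fast, incrementally maintained value being checked:

```rust
// Illustrative types, not the real crate API: each row's size is its
// fixed part plus its 32-byte var-len granules.
struct Row {
    fixed_size: usize,
    var_len_granules: usize,
}

fn bytes_used_by_row(row: &Row) -> usize {
    row.fixed_size + row.var_len_granules * 32
}

// Stand-in for the table-level aggregate; the test asserts it equals
// the sum of the independent per-row computation.
fn bytes_used_by_rows(rows: &[Row]) -> usize {
    rows.iter().map(bytes_used_by_row).sum()
}

fn main() {
    let rows = vec![
        Row { fixed_size: 16, var_len_granules: 2 },
        Row { fixed_size: 16, var_len_granules: 0 },
    ];
    let per_row_sum: usize = rows.iter().map(bytes_used_by_row).sum();
    assert_eq!(bytes_used_by_rows(&rows), per_row_sum);
}
```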

crates/table/src/btree_index.rs
@@ -644,6 +644,39 @@ impl CommittedState {
let index = table.indexes.get(col_list)?;
Some(&index.key_type)
}

pub(super) fn report_data_size(&self, database_identity: Identity) {
Contributor

If this causes a performance issue because of time spent in with_label_values, we could store these metric handles in the Table and/or ExecutionContext, so reporting values would just be an atomic number operation.
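The handle-caching idea could look roughly like this; `GaugeHandle` is a stand-in for a Prometheus gauge resolved once via with_label_values and then stored on the Table, so the hot path is just an atomic store:

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

// Stand-in for a cached Prometheus gauge handle.
struct GaugeHandle(Arc<AtomicI64>);

impl GaugeHandle {
    fn set(&self, v: i64) {
        self.0.store(v, Ordering::Relaxed); // no label lookup on the hot path
    }
}

struct Table {
    bytes_used_gauge: GaugeHandle, // resolved once when the table is created
}

impl Table {
    fn report_data_size(&self, bytes_used: i64) {
        self.bytes_used_gauge.set(bytes_used);
    }
}

fn main() {
    let backing = Arc::new(AtomicI64::new(0));
    let table = Table { bytes_used_gauge: GaugeHandle(backing.clone()) };
    table.report_data_size(4096);
    assert_eq!(backing.load(Ordering::Relaxed), 4096);
}
```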

@@ -681,6 +681,10 @@ pub(super) fn record_metrics(
.inc_by(deletes.len() as u64);
}
}

if let Some(committed_state) = committed_state {
Contributor

In the code immediately above this, we are also updating a bunch of stats (counters and table size gauges). We should probably unify those at some point.

Slow reconstructions of `num_rows` and `bytes_used_by_rows`.

Still to follow: index usage reporting.
@gefjon gefjon self-assigned this Jan 27, 2025
@gefjon
Contributor Author

gefjon commented Jan 27, 2025

@jsdt I have written some proptests in table.rs: row_size_reporting_matches_slow_implementations, index_size_reporting_matches_slow_implementations_single_column, and its _two_column counterpart. I have left comments noting that I am not sure there is a meaningful way to proptest the blob store reporting, since it's already as stupid and obvious an impl as possible. Is there anything else you'd like to see?

Contributor

@jsdt jsdt left a comment

These tests look good, thanks for adding them. For the blob store, do you think recomputing it every time we report is going to be a performance issue? If so, it might be worth keeping track of it as modifications happen.

@gefjon
Contributor Author

gefjon commented Jan 28, 2025

These tests look good, thanks for adding them. For the blob store, do you think recomputing it every time we report is going to be a performance issue? If so, it might be worth keeping track of it as modifications happen.

I would assume that the majority of modules have very small blob stores. If this turns out not to be the case we could easily memoize it in the same way as this PR is doing for other measures.
