Skip to content

Conversation

@RaduBerinde
Copy link
Member

go.mod: update datadriven

objstorageprovider: clean up datadriven tests

Change TestProvider to use key=value format for arguments, making the
test easier to read and simplifying the test code.

objstorageprovider: add dual-write for cold metadata

This commit adds logic to write the metadata portion of cold blob
files on both tiers, and to transparently read the metadata from the
hot tier.

We assume that the metadata is a suffix of the file; the metadata
filenames encode the start offset, for example "000001.blobmeta.1234"
means that the file contains the data from "00001.blob" starting at
offset 1234.

We add a Writable implementation which writes the metadata portion
of a file to both cold and hot storage and a Readable implementation
which reads data from a cold tier Readable but uses a vfs.File for
the metadata.

The provider keeps track internally of which hot metadata files
exist and their start offsets.

@RaduBerinde RaduBerinde requested a review from a team as a code owner November 24, 2025 20:15
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@RaduBerinde RaduBerinde force-pushed the cold-meta-on-hot branch 3 times, most recently from 6ea2777 to 938d5c6 Compare November 25, 2025 01:37
@github-actions
Copy link

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sumeerbhola reviewed 2 of 2 files at r1, 11 of 11 files at r2, 12 of 13 files at r3.
Reviewable status: 24 of 25 files reviewed, 7 unresolved discussions (waiting on @jbowens and @RaduBerinde)


objstorage/objstorageprovider/cold_readable.go line 87 at r3 (raw file):

// NewReadHandle is part of the objstorage.Readable interface.
func (r *coldReadable) NewReadHandle(
	readBeforeSize objstorage.ReadBeforeSize,

This readBeforeSize was for reading metadata for files stored in object storage. So there is some relation to the fact that for these cold files the metadata is already in a local hot tier. A code comment here would be helpful. Perhaps something like:

// The readBeforeSize was decided to optimize reading the metadata suffix of a file, for cases where small reads are expensive. In our case, the metadata is specifically in a hot tier file, for which small reads are cheap.
// We continue to pass on the readBeforeSize to the cold file, since the readBeforeSize is ignored by callees except when the file is in remote storage. Currently cold files are in local storage. We may need to revisit this when cold files are in remote storage.


objstorage/objstorageprovider/cold_writable.go line 82 at r3 (raw file):

					return w.err
				}
			}

It's a bit odd that we don't necessarily call flushMeta in this coldWritable.Write call and accumulate in this buf explicitly. I suppose this is explicitly trying to account for the fact that any normal user of objstorage.Writable has wrapped it in a fileBufferedWritable, so we are doing the same for these metadata writes.
The decision to do this buffering without being asked to is fine, but can we use a bufio.Writer instead of rolling our own logic?


internal/base/filenames.go line 136 at r3 (raw file):

	// <offset> indicates that the file mirrors the contents of the corresponding
	// blob file starting at this offset.
	FileTypeBlobMeta

is the lifetime of these files fully hidden inside the provider?


objstorage/objstorage.go line 112 at r3 (raw file):

	//
	// If Finish fails, it is expected that the caller will delete the created
	// object. If the process crashes during Finish, it is expected that the file

"the caller will delete the created object". It isn't clear to me how, given the Writable interface itself does not have a delete method.
I see two paths that have to handle errors on Finish: in layout.go (during sstable writing) and blob.FileWriter.Close (during blob file writing). The latter calls Abort when Finish returns an error (and fileBufferedWritable.Abort doesn't delete anything), and the former just returns it upstream (I don't see anything that is necessarily cleaning up files).

Would be good to make this clearer. Should the caller call Provider.Remove?


objstorage/objstorage.go line 123 at r3 (raw file):

	//
	// An error means that we won't be able to successfully finish this object.
	// - Any constraints on when this can be called relative to Write()

Is the "Any constraints ..." a forgotten comment?


objstorage/objstorageprovider/vfs.go line 135 at r3 (raw file):

		return nil, err
	}
	r, err := newFileReadable(file, p.st.Local.FS, p.st.Local.ReadaheadConfig, filename)

why is this using p.st.Local.FS as a parameter, given this could be a cold tier file?


objstorage/objstorageprovider/vfs.go line 142 at r3 (raw file):

		if startOffset, ok := p.getColdObjectMetaFile(fileType, fileNum); ok {
			metaPath := p.metaPath(fileType, fileNum, startOffset)
			return newColdReadable(r, p.st.Local.FS, metaPath, startOffset), nil

should this be called newColdReadableWithHotMeta since if there isn't a hot metadata file it will not do this wrapping?

Change TestProvider to use key=value format for arguments, making the
test easier to read and simplifying the test code.
This commit adds logic to write the metadata portion of cold blob
files on both tiers, and to transparently read the metadata from the
hot tier.

We assume that the metadata is a suffix of the file; the metadata
filenames encode the start offset, for example "000001.blobmeta.1234"
means that the file contains the data from "00001.blob" starting at
offset 1234.

We add a `Writable` implementation which writes the metadata portion
of a file to both cold and hot storage and a `Readable` implementation
which reads data from a cold tier Readable but uses a `vfs.File` for
the metadata.

The provider keeps track internally of which hot metadata files
exist and their start offsets.
Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: 2 of 25 files reviewed, 7 unresolved discussions (waiting on @jbowens and @sumeerbhola)


internal/base/filenames.go line 136 at r3 (raw file):

Previously, sumeerbhola wrote…

is the lifetime of these files fully hidden inside the provider?

Yes, added a comment.


objstorage/objstorage.go line 112 at r3 (raw file):

Previously, sumeerbhola wrote…

"the caller will delete the created object". It isn't clear to me how, given the Writable interface itself does not have a delete method.
I see two paths that have to handle errors on Finish: in layout.go (during sstable writing) and blob.FileWriter.Close (during blob file writing). The latter calls Abort when Finish returns an error (and fileBufferedWritable.Abort doesn't delete anything), and the former just returns it upstream (I don't see anything that is necessarily cleaning up files).

Would be good to make this clearer. Should the caller call Provider.Remove?

Yes, clarified.


objstorage/objstorage.go line 123 at r3 (raw file):

Previously, sumeerbhola wrote…

Is the "Any constraints ..." a forgotten comment?

Yes, removed.


objstorage/objstorageprovider/cold_readable.go line 87 at r3 (raw file):

Previously, sumeerbhola wrote…

This readBeforeSize was for reading metadata for files stored in object storage. So there is some relation to the fact that for these cold files the metadata is already in a local hot tier. A code comment here would be helpful. Perhaps something like:

// The readBeforeSize was decided to optimize reading the metadata suffix of a file, for cases where small reads are expensive. In our case, the metadata is specifically in a hot tier file, for which small reads are cheap.
// We continue to pass on the readBeforeSize to the cold file, since the readBeforeSize is ignored by callees except when the file is in remote storage. Currently cold files are in local storage. We may need to revisit this when cold files are in remote storage.

Added a comment and now passing NoReadBefore to the cold handle.


objstorage/objstorageprovider/cold_writable.go line 82 at r3 (raw file):

Previously, sumeerbhola wrote…

It's a bit odd that we don't necessarily call flushMeta in this coldWritable.Write call and accumulate in this buf explicitly. I suppose this is explicitly trying to account for the fact that any normal user of objstorage.Writable has wrapped it in a fileBufferedWritable, so we are doing the same for these metadata writes.
The decision to do this buffering without being asked to is fine, but can we use a bufio.Writer instead of rolling our own logic?

I improved the comments a bit. This was explained in Write() - we cannot use bufio.Writer because it can call Write with the buffer which can mangle it. We have to first copy, then pass the unchanged buffer to coldWritable.Write().


objstorage/objstorageprovider/vfs.go line 135 at r3 (raw file):

Previously, sumeerbhola wrote…

why is this using p.st.Local.FS as a parameter, given this could be a cold tier file?

Fixed.


objstorage/objstorageprovider/vfs.go line 142 at r3 (raw file):

Previously, sumeerbhola wrote…

should this be called newColdReadableWithHotMeta since if there isn't a hot metadata file it will not do this wrapping?

Done.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@sumeerbhola reviewed 10 of 11 files at r5, 13 of 13 files at r6, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @jbowens)

@RaduBerinde RaduBerinde merged commit eec4041 into cockroachdb:master Dec 4, 2025
9 checks passed
@RaduBerinde RaduBerinde deleted the cold-meta-on-hot branch December 4, 2025 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants