Multilevel cache #1064

Merged
merged 2 commits into from
Nov 6, 2024

Contributor

@vladem vladem commented Oct 15, 2024

Description of change

Allow both caches to be used together when the --cache-express <bucket> and --cache <directory> options are both specified. The local cache is queried first.
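
The lookup order described above can be illustrated with a minimal sketch. The `BlockCache` trait, `MemCache`, and `TwoLevelCache` are hypothetical names invented for this example and are not the types used in this PR. Copying a shared-level hit into the local level is also an assumption of the sketch, not something the description states.

```rust
use std::collections::HashMap;

/// Hypothetical minimal cache interface (not the trait used in this PR).
trait BlockCache {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn put(&mut self, key: &str, block: Vec<u8>);
}

/// In-memory stand-in for either cache level.
#[derive(Default)]
struct MemCache {
    blocks: HashMap<String, Vec<u8>>,
}

impl BlockCache for MemCache {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.blocks.get(key).cloned()
    }

    fn put(&mut self, key: &str, block: Vec<u8>) {
        self.blocks.insert(key.to_string(), block);
    }
}

/// Two-level cache: `local` is queried first, `shared` second.
struct TwoLevelCache<L: BlockCache, S: BlockCache> {
    local: L,
    shared: S,
}

impl<L: BlockCache, S: BlockCache> TwoLevelCache<L, S> {
    fn get(&mut self, key: &str) -> Option<Vec<u8>> {
        // First level: the local cache.
        if let Some(block) = self.local.get(key) {
            return Some(block);
        }
        // Second level: the shared cache; a miss here is a miss overall.
        let block = self.shared.get(key)?;
        // Assumption of this sketch: a shared-level hit is copied into the local level.
        self.local.put(key, block.clone());
        Some(block)
    }

    fn put(&mut self, key: &str, block: Vec<u8>) {
        // Write the block to both levels.
        self.local.put(key, block.clone());
        self.shared.put(key, block);
    }
}
```

In this sketch a block found only in the shared level is returned and then becomes available from the local level on the next lookup.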

Relevant issues: No

Does this change impact existing behavior?

No.

Does this change need a changelog entry in any of the crates?

Yes, it will be added in a future PR.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

@vladem vladem temporarily deployed to PR integration tests October 15, 2024 14:22 — with GitHub Actions Inactive
@vladem vladem temporarily deployed to PR integration tests November 1, 2024 18:03 — with GitHub Actions Inactive
@vladem vladem temporarily deployed to PR integration tests November 3, 2024 16:45 — with GitHub Actions Inactive
Contributor

@passaro passaro left a comment

Just a couple of comments.

Signed-off-by: Vlad Volodkin <vlaad@amazon.com>
    MultilevelDataCache<DiskCache, ExpressCache, Runtime>
{
    pub fn new(disk_cache: Arc<DiskCache>, express_cache: ExpressCache, runtime: Runtime) -> Self {
        // Method `MultilevelDataCache::block_size` relies on block sizes of both caches to be equal.
Contributor

A bit convoluted. We use the same blocks at both levels, so they need to have the same size. I'd mention it as a requirement in this method's rustdoc.
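
As an illustration of the suggestion above, here is a minimal sketch of a constructor whose rustdoc states the equal-block-size requirement. `HasBlockSize`, `FixedCache`, and `TwoLevel` are hypothetical names for this example only. Enforcing the requirement with an assertion is an assumption of the sketch; the excerpt above does not show whether the PR does so.

```rust
/// Hypothetical minimal interface exposing a cache's block size.
trait HasBlockSize {
    fn block_size(&self) -> u64;
}

/// Stand-in cache level with a fixed block size.
struct FixedCache {
    block_size: u64,
}

impl HasBlockSize for FixedCache {
    fn block_size(&self) -> u64 {
        self.block_size
    }
}

/// Hypothetical two-level cache holding a disk level and an express level.
struct TwoLevel<D, E> {
    disk: D,
    express: E,
}

impl<D: HasBlockSize, E: HasBlockSize> TwoLevel<D, E> {
    /// Creates a two-level cache.
    ///
    /// The same blocks are stored at both levels, so `disk` and `express`
    /// must be configured with the same block size.
    ///
    /// # Panics
    ///
    /// Panics if the two block sizes differ (an assumption of this sketch).
    fn new(disk: D, express: E) -> Self {
        assert_eq!(
            disk.block_size(),
            express.block_size(),
            "block sizes of both cache levels must be equal"
        );
        Self { disk, express }
    }

    /// Block size shared by both levels.
    fn block_size(&self) -> u64 {
        self.disk.block_size()
    }

    /// Block size reported by the express level; equal to `block_size` by construction.
    fn express_block_size(&self) -> u64 {
        self.express.block_size()
    }
}
```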

@@ -298,10 +309,10 @@ pub struct CliArgs {
     #[cfg(feature = "block_size")]
     #[clap(
         long,
-        help = "Size of a cache block in KiB [Default: 1024 (1 MiB) for disk cache, 512 (512 KiB) for S3 Express cache]",
+        help = "Size of a cache block in KiB [Default: 1024 (1 MiB) for disk cache and for S3 Express cache]",
Contributor

Suggested change
-        help = "Size of a cache block in KiB [Default: 1024 (1 MiB) for disk cache and for S3 Express cache]",
+        help = "Size of a cache block in KiB [Default: 1024 (1 MiB)]",

Signed-off-by: Vlad Volodkin <vlaad@amazon.com>
@vladem vladem temporarily deployed to PR integration tests November 6, 2024 13:53 — with GitHub Actions Inactive
@vladem vladem requested a review from passaro November 6, 2024 14:38
Contributor

@passaro passaro left a comment

LGTM

@@ -89,6 +91,7 @@ where
}
buffer.extend_from_slice(&body);

// Ensure the flow-control window is large enough.
result.as_mut().increment_read_window(self.block_size as usize);
Contributor

This seems unnecessary now, doesn't it?

Contributor

Happy to keep it for now and review when we optimize for the single chunk case (see TODO above).

Contributor Author

We'll need to account for the case where the block object is larger than block_size for some reason. If we simply remove this line, the read may freeze. If we keep it as it is now, MP may attempt to read an unbounded amount of data into RAM.

This requires a bit more thinking, so I agree that it's better to address it in a follow-up PR.
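
One possible way to bound memory use in the oversized-block case discussed above is to stop reading once the buffer would exceed block_size. The sketch below is only an illustration of that idea: `read_block_bounded` and `ReadError` are hypothetical names, the chunk iterator stands in for the streamed response body, and the flow-control window is not modeled. It is not the approach taken in this PR or in any follow-up.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ReadError {
    /// The body held more bytes than the configured block size.
    BlockTooLarge { limit: usize },
}

/// Collects body chunks into one buffer, refusing to hold more than
/// `block_size` bytes in memory.
fn read_block_bounded<I>(chunks: I, block_size: usize) -> Result<Vec<u8>, ReadError>
where
    I: IntoIterator<Item = Vec<u8>>,
{
    let mut buffer = Vec::with_capacity(block_size);
    for chunk in chunks {
        // Fail early instead of growing the buffer past the limit.
        if buffer.len() + chunk.len() > block_size {
            return Err(ReadError::BlockTooLarge { limit: block_size });
        }
        buffer.extend_from_slice(&chunk);
    }
    Ok(buffer)
}
```

With this shape, an oversized block object produces an error rather than either a stalled read or unbounded buffering.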

@vladem vladem added this pull request to the merge queue Nov 6, 2024
Merged via the queue into awslabs:main with commit 53197c9 Nov 6, 2024
23 checks passed
@vladem vladem deleted the multilevel-cache branch November 6, 2024 17:17
3 participants