Skip to content

Commit

Permalink
Update Rustdoc comments in the readdir module (#928)
Browse files Browse the repository at this point in the history
* Update Rustdoc comments in the readdir module

Signed-off-by: Daniel Carl Jones <djonesoa@amazon.com>

* Move rustdoc above macros

Signed-off-by: Daniel Carl Jones <djonesoa@amazon.com>

* Add clarification on Readdir deduplication behavior

Signed-off-by: Daniel Carl Jones <djonesoa@amazon.com>

---------

Signed-off-by: Daniel Carl Jones <djonesoa@amazon.com>
  • Loading branch information
dannycjones authored Jul 2, 2024
1 parent 08aa2b8 commit 0b4c14d
Showing 1 changed file with 31 additions and 9 deletions.
40 changes: 31 additions & 9 deletions mountpoint-s3/src/inode/readdir.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
//! it complicated:
//!
//! 1. It's possible for a directory to contain both a subdirectory and a file of the same name, in
//! which case we need to implement shadowing in the right way.
//! which case we should implement shadowing in the right way where possible.
//! 2. A directory can also contain local files/subdirectories, which we want to have appear in the
//! readdir stream, but be shadowed by remote files/subdirectories.
//! 3. ListObjectsV2 returns common prefixes including the trailing '/', which messes up ordering
Expand All @@ -25,13 +25,19 @@
//! inodes for them, achieving point 4. It also has a [ReaddirHandle::readd] method to handle
//! point 5.
//! * [ReaddirIter] is an iterator over [ReaddirEntry]s, which are entries that may not yet have
//! inodes created for them. [ReaddirIter] merges together two streams, [RemoteIter] and
//! [LocalIter], to handle point 2. While merging, [ReaddirIter] also deduplicates the entries it
//! returns to handle point 1.
//! inodes created for them.
//! Addressing point 2, [ReaddirIter] merges together entries from two sources:
//! remotely from S3 using [RemoteIter] and locally from a snapshot of the parent's local children.
//! While merging, [ReaddirIter] makes a best effort to deduplicate entries returned to address point 1.
//! Notably, the [unordered] implementation does not address duplicate remote entries
//! as reported in [#725](https://github.com/awslabs/mountpoint-s3/issues/725).
//! [ReaddirIter] itself delegates to two different iterator implementations,
//! depending on if the S3 implementation returns ordered or unordered list results.
//! * [RemoteIter] is an iterator over [ReaddirEntry]s returned by paginated calls to ListObjectsV2.
//! Rather than directly streaming the entries out of the list call, it collects them in memory
//! and re-sorts them to handle point 3.
//! * [LocalIter] is an iterator over [ReaddirEntry]s that are local children of the directory.
//! * A collection or iterator of [ReaddirEntry]s is built up and used by [ReaddirIter],
//! representing the local children of the directory.
//! These children are listed only once, at the start of the readdir operation, and so are a
//! snapshot in time of the directory.

Expand Down Expand Up @@ -274,6 +280,10 @@ impl Ord for ReaddirEntry {
}
}

/// Iterator over [ReaddirEntry] items, which are entries that may not yet have inodes created for them.
///
/// This iterator delegates to one of two iterators,
/// depending on if the S3 implementation returns ordered results or not.
#[allow(clippy::large_enum_variant)]
#[derive(Debug)]
enum ReaddirIter {
Expand Down Expand Up @@ -306,17 +316,22 @@ enum RemoteIterState {
Finished,
}

/// An iterator over [ReaddirEntry]s returned by paginated ListObjects calls to S3. This iterator
/// handles combining directories (common prefixes) and files (objects) into a single stream,
/// and re-sorting that stream to account for common prefixes not being in lexicographic order (see
/// the module comment).
/// An iterator over [ReaddirEntry]s returned by paginated ListObjects calls to S3.
/// This iterator combines directories (common prefixes) and files (objects) into a single stream.
///
/// If the S3 implementation returns ordered results, this iterator will re-sort the stream to
/// account for common prefixes not being in lexicographic order (see the module comment).
#[derive(Debug)]
struct RemoteIter {
/// Prepared entries in order to be returned by the iterator.
entries: VecDeque<ReaddirEntry>,
bucket: String,
/// S3 prefix for the [RemoteIter], used when listing objects in S3.
full_path: String,
/// The maximum number of keys to be returned by a single S3 ListObjectsV2 request.
page_size: usize,
state: RemoteIterState,
/// Does the S3 implementation return ordered results?
ordered: bool,
}

Expand Down Expand Up @@ -394,6 +409,8 @@ impl RemoteIter {
}

/// Iterator implementation for S3 implementations that provide lexicographically ordered LIST.
///
/// See [self::ReaddirIter] for exact behavior differences.
mod ordered {
use super::*;

Expand Down Expand Up @@ -496,6 +513,8 @@ mod ordered {

/// Iterator implementation for S3 implementations that do not provide lexicographically ordered
/// LIST (i.e., S3 Express One Zone).
///
/// See [self::ReaddirIter] for exact behavior differences.
mod unordered {
use std::collections::HashMap;

Expand All @@ -507,7 +526,10 @@ mod unordered {
#[derive(Debug)]
pub struct ReaddirIter {
remote: RemoteIter,
/// Local entries to be returned.
/// Entries may be removed from this collection if entries of the same name are returned by [Self::remote].
local: HashMap<String, ReaddirEntry>,
/// Queue of local entries to be returned, prepared based on the contents of [Self::local].
local_iter: VecDeque<ReaddirEntry>,
}

Expand Down

0 comments on commit 0b4c14d

Please sign in to comment.