Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor sharded serialization to remove code duplication #245

Merged
merged 1 commit into from
Jul 23, 2024

Conversation

mihaimaruseac
Copy link
Collaborator

@mihaimaruseac mihaimaruseac commented Jul 22, 2024

Summary

Similar to #241, there is a duplication in the directory traversal between serializing to a digest and serializing to a manifest. This time, both supported parallelism, so there is really no need for the duplication.

We make an abstract ShardedFilesSerializer class to contain the logic for the directory traversal and then create the better named DigestSerializer and ManifestSerializer for the two serializing classes.

This time, instead of trying extremely hard to match the old behavior for digest serialization, we just update the goldens. This means that this depends on #244.

We still had to update some other tests: since the hashes are computed only for files, we no longer differentiate between a model with an empty directory and a model where that empty directory is completely removed. This is a corner case and it is ok to do this.

In fact, ignoring empty directories is part of the optimization hinted at in #197.

Release Note

NONE

Documentation

NONE

@mihaimaruseac mihaimaruseac requested review from a team as code owners July 22, 2024 16:08
@mihaimaruseac mihaimaruseac added this to the V1 release milestone Jul 22, 2024
@mihaimaruseac mihaimaruseac force-pushed the refactor_shard_dfs branch 8 times, most recently from ac455cd to 11e46d5 Compare July 22, 2024 20:10
Similar to sigstore#241, there is a duplication in the directory traversal between serializing to a digest and serializing to a manifest. This time, both supported parallelism, so there is really no need for the duplication.

We make an abstract `ShardedFilesSerializer` class to contain the logic for the directory traversal and then create the better named `DigestSerializer` and `ManifestSerializer` for the two serializing classes.

This time, instead of trying extremely hard to match the old behavior for digest serialization, we just update the goldens.

We still had to update some other tests: since the hashes are computed only for files, we no longer differentiate between a model with an empty directory and a model where that empty directory is completely removed. This is a corner case and it is ok to do this.

In fact, ignoring empty directories is part of the optimization hinted at in sigstore#197.

Signed-off-by: Mihai Maruseac <mihaimaruseac@google.com>
@mihaimaruseac mihaimaruseac merged commit b2e8213 into sigstore:main Jul 23, 2024
20 checks passed
@mihaimaruseac mihaimaruseac deleted the refactor_shard_dfs branch July 23, 2024 17:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants