Refactor sharded serialization to remove code duplication #245

mihaimaruseac · 2024-07-22T16:08:19Z

Summary

As in #241, the directory traversal is duplicated between serializing to a digest and serializing to a manifest. This time both code paths already support parallelism, so there is no reason to keep the duplication.

We introduce an abstract ShardedFilesSerializer class that contains the directory traversal logic, and then create the better-named DigestSerializer and ManifestSerializer as the two concrete serializer classes.
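A minimal sketch of the shape this refactor describes, not the code in this PR: the base class owns the traversal and the parallel hashing, and each subclass only decides how to assemble the result. The three class names come from the description above; everything else (`_hash_file`, `_build`, `max_workers`, hashing whole files with SHA-256 rather than shards) is a hypothetical simplification.

```python
import abc
import hashlib
import pathlib
from concurrent.futures import ThreadPoolExecutor


def _hash_file(path: pathlib.Path) -> bytes:
    """Hypothetical helper: hash a whole file (real sharding is omitted)."""
    return hashlib.sha256(path.read_bytes()).digest()


class ShardedFilesSerializer(abc.ABC):
    """Owns the directory traversal; subclasses only shape the output."""

    def __init__(self, max_workers: int = 4):
        self._max_workers = max_workers

    def serialize(self, model_dir: pathlib.Path):
        # Visit regular files only, in sorted order so results are stable.
        files = sorted(p for p in model_dir.rglob("*") if p.is_file())
        # Hash files in parallel; pool.map preserves the input order.
        with ThreadPoolExecutor(max_workers=self._max_workers) as pool:
            digests = list(pool.map(_hash_file, files))
        names = [p.relative_to(model_dir).as_posix() for p in files]
        return self._build(list(zip(names, digests)))

    @abc.abstractmethod
    def _build(self, items):
        """Turn (relative name, digest) pairs into the final result."""


class DigestSerializer(ShardedFilesSerializer):
    def _build(self, items):
        # Fold every (name, digest) pair into a single hex digest.
        hasher = hashlib.sha256()
        for name, digest in items:
            hasher.update(name.encode("utf-8"))
            hasher.update(digest)
        return hasher.hexdigest()


class ManifestSerializer(ShardedFilesSerializer):
    def _build(self, items):
        # Keep one entry per file instead of folding them together.
        return {name: digest.hex() for name, digest in items}
```

With this layout, both serializers share one traversal, so a change to how files are discovered or scheduled only has to be made in the base class.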

This time, instead of trying extremely hard to match the old behavior for digest serialization, we just update the goldens. As a result, this PR depends on #244.

We still had to update some other tests: since the hashes are computed only for files, we no longer differentiate between a model with an empty directory and a model where that empty directory is completely removed. This is a corner case and it is ok to do this.

In fact, ignoring empty directories is part of the optimization hinted at in #197.
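The empty-directory corner case can be illustrated with a small, self-contained sketch. It assumes a simplified files-only digest (the hypothetical `files_only_digest` below, using SHA-256 over relative names and file contents), not the project's actual serializer: because only regular files feed the hash, adding an empty directory leaves the result unchanged.

```python
import hashlib
import pathlib
import tempfile


def files_only_digest(root: pathlib.Path) -> str:
    """Hypothetical digest that covers regular files only."""
    hasher = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Directories never reach this loop, so empty ones leave no trace.
        hasher.update(path.relative_to(root).as_posix().encode("utf-8"))
        hasher.update(hashlib.sha256(path.read_bytes()).digest())
    return hasher.hexdigest()


def empty_dir_is_ignored() -> bool:
    """Compare digests of the same model before and after adding an empty dir."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        (root / "weights.bin").write_bytes(b"\x00\x01")
        without_dir = files_only_digest(root)
        (root / "empty_dir").mkdir()
        with_dir = files_only_digest(root)
        return without_dir == with_dir
```

Under these assumptions `empty_dir_is_ignored()` returns `True`, which is the behavior change the updated tests account for.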

Release Note

NONE

Documentation

NONE

Signed-off-by: Mihai Maruseac <mihaimaruseac@google.com>

mihaimaruseac requested review from a team as code owners July 22, 2024 16:08

mihaimaruseac added this to the V1 release milestone Jul 22, 2024

mihaimaruseac force-pushed the refactor_shard_dfs branch 8 times, most recently from ac455cd to 11e46d5 Compare July 22, 2024 20:10

mihaimaruseac force-pushed the refactor_shard_dfs branch from 11e46d5 to fc19342 Compare July 23, 2024 15:36

mihaimaruseac mentioned this pull request Jul 23, 2024

fix --update_goldens for serialization module #250

Merged

spencerschrock approved these changes Jul 23, 2024

mihaimaruseac merged commit b2e8213 into sigstore:main Jul 23, 2024
20 checks passed

mihaimaruseac deleted the refactor_shard_dfs branch July 23, 2024 17:45
