Refactor sharded serialization to remove code duplication #245
Summary
Similar to #241, the directory traversal is duplicated between serializing to a digest and serializing to a manifest. This time, both implementations supported parallelism, so there is really no need for the duplication.
We make an abstract `ShardedFilesSerializer` class to contain the logic for the directory traversal and then create the better-named `DigestSerializer` and `ManifestSerializer` for the two serializing classes.
This time, instead of trying extremely hard to match the old behavior for digest serialization, we just update the goldens. This means that this PR depends on #244.
We still had to update some other tests: since the hashes are computed only for files, we no longer differentiate between a model with an empty directory and a model where that empty directory is completely removed. This is a corner case, and treating the two as equivalent is acceptable.
In fact, ignoring empty directories is part of the optimization hinted at in #197.
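To illustrate the shape of the refactor, here is a minimal sketch, not the code in this PR. Only the three class names come from the description above; the method names (`serialize`, `_hash_shards`, `_build`), the `shard_size` parameter, the use of SHA-256, and the record format are assumptions made for illustration. The sketch is also sequential, whereas the real serializers support parallelism.

```python
import abc
import hashlib
import pathlib
import tempfile


class ShardedFilesSerializer(abc.ABC):
    """Owns the directory traversal; subclasses only combine shard hashes."""

    def __init__(self, shard_size: int = 1_000_000):
        # Hypothetical parameter: maximum number of bytes per shard.
        self._shard_size = shard_size

    def serialize(self, model_path: pathlib.Path):
        shards = []
        # Sorted traversal keeps the result deterministic. Only files are
        # hashed, so empty directories never contribute to the output.
        for path in sorted(model_path.rglob("*")):
            if path.is_file():
                shards.extend(self._hash_shards(model_path, path))
        return self._build(shards)

    def _hash_shards(self, root: pathlib.Path, path: pathlib.Path):
        # Yields (relative name, start, end, hex digest) for each shard.
        # Note: in this sketch an empty file yields no shards at all.
        name = path.relative_to(root).as_posix()
        with path.open("rb") as f:
            start = 0
            while chunk := f.read(self._shard_size):
                end = start + len(chunk)
                yield (name, start, end, hashlib.sha256(chunk).hexdigest())
                start = end

    @abc.abstractmethod
    def _build(self, shards):
        """Turn the ordered list of shard records into the final result."""


class DigestSerializer(ShardedFilesSerializer):
    def _build(self, shards):
        # Fold every shard record into a single digest (assumed format).
        hasher = hashlib.sha256()
        for name, start, end, digest in shards:
            hasher.update(f"{name}:{start}:{end}:{digest}\n".encode())
        return hasher.hexdigest()


class ManifestSerializer(ShardedFilesSerializer):
    def _build(self, shards):
        # Keep one entry per shard instead of collapsing them.
        return {(name, start, end): digest for name, start, end, digest in shards}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        (root / "weights.bin").write_bytes(b"abcdef")
        before = DigestSerializer(shard_size=4).serialize(root)
        (root / "empty_dir").mkdir()
        after = DigestSerializer(shard_size=4).serialize(root)
        # Adding an empty directory does not change the digest.
        print(before == after)
```

With the traversal living in one place, the two concrete classes differ only in how they combine the per-shard hashes, and the empty-directory behavior falls out of hashing files only.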
Release Note
NONE
Documentation
NONE