Deep comparison #1

JTimothyKing · 2014-01-07T20:53:47Z

Add a method to Data::Dedup::Engine that deeply compares the objects within each block, to conclusively deduplicate them. This cannot be accomplished through a digest function, because a digest comprising the entire contents of large objects (e.g., files) would likely exceed the available RAM on most systems.
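
As a language-neutral illustration (written in Python, not the Perl API of Data::Dedup::Engine), a deep comparison of two large files does not need either file's full contents in memory: both can be streamed in fixed-size chunks and compared piece by piece, so memory use stays bounded regardless of file size. The function name `files_equal` and the 64 KiB chunk size are hypothetical choices for this sketch. (Python's standard `filecmp.cmp(a, b, shallow=False)` performs a similar content comparison.)

```python
import os


def files_equal(path_a, path_b, chunk_size=64 * 1024):
    """Hypothetical helper: deeply compare two files chunk by chunk.

    At most ``chunk_size`` bytes of each file are held in memory at a time,
    so arbitrarily large files can be compared without exhausting RAM.
    """
    # Cheap early exit: files of different sizes cannot be identical.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files exhausted at the same point
                return True
```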

It may be possible to implement such a feature by having the digest function return a key object that compares itself to other such objects by comparing the underlying file contents. However, Data::Dedup::Engine::BlockKeyStore currently deduplicates keys by storing them as the keys of a Perl hash, so to support this sort of deep comparison, Data::Dedup::Engine::BlockKeyStore would (I believe) have to use an alternative hash implementation.
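
For contrast, here is a sketch of the key-object idea in a language whose built-in hash table does support it. Ordinary Perl hash keys are strings, so an object stored as a key loses any custom comparison behavior; a Python `dict`, by contrast, calls the key's `__eq__` whenever two keys share a hash. The class `FileContentKey` and helper `dedup_paths` are hypothetical names for this illustration. The hash uses only a cheap property (file size), which is consistent with equality because files with identical contents always have the same size; the deep comparison runs only on collisions.

```python
import filecmp
import os


class FileContentKey:
    """Hypothetical key object: equal iff the underlying files have identical contents."""

    def __init__(self, path):
        self.path = path
        self._size = os.path.getsize(path)

    def __hash__(self):
        # Cheap hash: files of different sizes land in different buckets.
        return hash(self._size)

    def __eq__(self, other):
        if not isinstance(other, FileContentKey):
            return NotImplemented
        # shallow=False makes filecmp compare the actual file contents.
        return self._size == other._size and filecmp.cmp(
            self.path, other.path, shallow=False
        )


def dedup_paths(paths):
    """Group paths into blocks of identical content via a dict keyed by FileContentKey."""
    blocks = {}
    for path in paths:
        blocks.setdefault(FileContentKey(path), []).append(path)
    return list(blocks.values())
```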

This feature could conceivably be implemented by means of a more flexible blocking function. If Data::Dedup::Engine supported more powerful blocking functions, ones that could allocate objects to blocks by an arbitrary algorithm rather than only by computing a digest, such a blocking function could use deep comparison to deduplicate those objects.
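
A blocking function of that more general kind might look roughly like the following Python sketch; it is an assumption about the shape such a function could take, not the Data::Dedup::Engine interface. `block_by_deep_comparison`, `cheap_key`, and `deep_equal` are hypothetical names. Items are first partitioned by a cheap key (for files, their size), and within each partition every item is deep-compared against one representative of each block found so far. This requires `deep_equal` to behave as an equivalence relation, which holds for byte-for-byte content equality.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def block_by_deep_comparison(
    items: Iterable[T],
    cheap_key: Callable[[T], Hashable],
    deep_equal: Callable[[T, T], bool],
) -> List[List[T]]:
    """Hypothetical blocking function: allocate items to blocks by deep comparison."""
    # Step 1: items with different cheap keys can never be duplicates.
    partitions = defaultdict(list)
    for item in items:
        partitions[cheap_key(item)].append(item)

    blocks: List[List[T]] = []
    for candidates in partitions.values():
        local_blocks: List[List[T]] = []
        for item in candidates:
            # Step 2: compare against the first member (representative) of each block.
            for block in local_blocks:
                if deep_equal(block[0], item):
                    block.append(item)
                    break
            else:
                # No existing block matched, so the item starts a new one.
                local_blocks.append([item])
        blocks.extend(local_blocks)
    return blocks


print(block_by_deep_comparison(["abc", "abd", "abc", "xy"], len, lambda a, b: a == b))
# prints [['abc', 'abc'], ['abd'], ['xy']]
```

For files, `cheap_key` could be `os.path.getsize` and `deep_equal` could be `lambda a, b: filecmp.cmp(a, b, shallow=False)`. The number of deep comparisons grows with the number of distinct blocks inside each partition, which is why the cheap pre-partitioning step matters.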
