Deep comparison #1

Open
JTimothyKing opened this issue Jan 7, 2014 · 0 comments

@JTimothyKing
Owner

A method in Data::Dedup::Engine that would deeply compare multiple objects in each block, to conclusively dedup them. This cannot be accomplished through a digest function, because a digest comprising the entire contents of large objects (e.g., files) would probably stress most systems' RAM past its limits.
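
To illustrate what a deep comparison could look like without holding whole objects in memory, here is a minimal sketch in Python rather than Perl. The function name `same_contents` and the `chunk_size` parameter are hypothetical and are not part of Data::Dedup; the idea is simply to compare two files one chunk at a time.

```python
import os

def same_contents(path_a, path_b, chunk_size=64 * 1024):
    """Return True if both files hold identical bytes (hypothetical helper)."""
    # Files of different sizes can never be identical, so skip the read.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # first differing chunk settles it
            if not chunk_a:
                return True   # both files ended with no difference found
```

Memory use is bounded by two chunks regardless of file size, and the comparison stops at the first difference.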

It may be possible to implement such a feature by having the digest function return a key object that can compare itself to other such objects by comparing the underlying file contents. However, Data::Dedup::Engine::BlockKeyStore currently deduplicates keys by storing them as the keys of a Perl hash; so in order to support this sort of deep comparison, (I believe) Data::Dedup::Engine::BlockKeyStore would have to use an alternative hash implementation.
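
The key-object idea can be sketched in Python, whose `dict` consults an object's `__hash__` and `__eq__` methods; a Perl hash instead stringifies its keys, which is why a different hash implementation would be needed there. Everything below (`FileKey`, `prefix_bytes`, `group_duplicates`) is hypothetical and only illustrates the concept; it is not the module's API.

```python
import hashlib
import os

def _same_bytes(path_a, path_b, chunk_size=64 * 1024):
    # Chunked comparison so large files are never fully loaded into RAM.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a, b = fa.read(chunk_size), fb.read(chunk_size)
            if a != b:
                return False
            if not a:
                return True

class FileKey:
    """Hypothetical key object: cheap hash, conclusive equality."""

    def __init__(self, path, prefix_bytes=4096):
        self.path = path
        self.size = os.path.getsize(path)
        with open(path, "rb") as f:
            # Digest only a prefix; equality below does the real work.
            self._digest = hashlib.sha1(f.read(prefix_bytes)).digest()

    def __hash__(self):
        return hash((self.size, self._digest))

    def __eq__(self, other):
        if not isinstance(other, FileKey):
            return NotImplemented
        if (self.size, self._digest) != (other.size, other._digest):
            return False
        return _same_bytes(self.path, other.path)

def group_duplicates(paths):
    # The dict calls __eq__ only when two keys hash alike.
    groups = {}
    for path in paths:
        groups.setdefault(FileKey(path), []).append(path)
    return list(groups.values())
```

Two files that share a size and a prefix digest land in the same hash bucket, and only then is the deep comparison run to decide whether they are really duplicates.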

This feature could conceivably be implemented by means of a more flexible blocking function. If Data::Dedup::Engine were to support more powerful blocking functions that can allocate objects to blocks according to whatever algorithm (and not just by computing a digest), conceptually such a blocking function could use deep comparison to deduplicate those objects.
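
As a rough sketch of that last idea, again in Python and with entirely hypothetical names (`dedup`, `assign`, `deep_compare_blocking`), an engine could hand each object and the current blocks to a blocking function and let that function pick the block, or ask for a new one. A deep-comparison blocking function is then just one possible strategy.

```python
import operator

def dedup(objects, assign):
    """Place each object in the block chosen by `assign` (hypothetical engine loop)."""
    blocks = []
    for obj in objects:
        index = assign(obj, blocks)
        if index is None:
            blocks.append([obj])       # no suitable block: start a new one
        else:
            blocks[index].append(obj)
    return blocks

def deep_compare_blocking(equal):
    """Build a blocking function that matches objects by full comparison."""
    def assign(obj, blocks):
        for i, block in enumerate(blocks):
            # Comparing against one representative is enough, because
            # every member of a block is equal to its first element.
            if equal(obj, block[0]):
                return i
        return None
    return assign

# Usage: strings stand in for objects, == stands in for deep comparison.
blocks = dedup(["a", "b", "a", "c", "b"], deep_compare_blocking(operator.eq))
```

Used alone, this strategy compares every object against every existing block, so it would probably make sense only after a cheaper digest-based pass has already narrowed the candidates.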
