Feature-95: HashStoreConverter & FileHashStoreLinks #96
…oreLinks' with basic constructor
…toreLinksInitTest'
…'FileHashStore', and add new method 'generateChecksums' in 'FileHashStoreLinks'
…d new junit test for 'generateChecksums'
…method 'getHashStoreLinksDataObjectPath' and junit test for 'storeHardLink'
… add new junit tests
 * @throws NoSuchAlgorithmException An algorithm defined is not supported
 * @throws InterruptedException Issue with synchronizing storing metadata
 */
public ObjectMetadata convert(Path filePath, String pid, InputStream sysmetaStream)
This method doesn't handle the use case where an object has only system metadata and no actual bytes. As I mentioned, the data objects on cn don't have bytes; they only have system metadata.
Sorry @taojing2002 - I missed this requirement. I have refactored the convert method to account for this - and also swapped the order of operations to store sysmeta first (because it can never be null).
 * @throws InterruptedException Issue with synchronizing storing metadata
 */
public ObjectMetadata convert(Path filePath, String pid, InputStream sysmetaStream)
    throws IOException, NoSuchAlgorithmException, InterruptedException {
The method doesn't check that the calculated checksums match the ones in the system metadata. Do you think we should add two extra parameters, checksum and checksum_algorithm?
Yes, I think it should be included, and I have updated the convert signature to account for it (along with the relevant code to check + junit tests).
When there is a mismatch, the custom exception NonMatchingChecksumException is thrown.
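The check described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the class name `ChecksumCheckSketch`, the helper `hexDigest`, the method `verifyChecksum`, and the unchecked stand-in for `NonMatchingChecksumException` are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumCheckSketch {

    /** Hypothetical stand-in for the NonMatchingChecksumException mentioned above. */
    static class NonMatchingChecksumException extends IllegalArgumentException {
        NonMatchingChecksumException(String message) {
            super(message);
        }
    }

    /** Streams the data through a MessageDigest and returns the lowercase hex digest. */
    static String hexDigest(InputStream data, String algorithm)
        throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = data.read(buffer)) != -1) {
            digest.update(buffer, 0, bytesRead);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Throws if the calculated checksum differs from the expected one (e.g. from sysmeta). */
    static void verifyChecksum(InputStream data, String expected, String algorithm)
        throws IOException, NoSuchAlgorithmException {
        String actual = hexDigest(data, algorithm);
        if (!actual.equalsIgnoreCase(expected)) {
            throw new NonMatchingChecksumException(
                "Expected " + algorithm + " checksum " + expected + " but calculated " + actual);
        }
    }
}
```

Comparing case-insensitively is a design choice in this sketch, since hex digests are sometimes recorded in upper case.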
…ysmeta, which may not always be available
…ver be null) and add junit test
…t, and add new junit tests
…reLinks' into 'FileHashStoreUtility'
…Link method signatures with 'checksum' and 'checksumAlgorithm', and revise junit tests
// Store the sysmeta first - this can never be null and is always required.
try {
    fileHashStoreLinks.storeMetadata(sysmetaStream, pid);
I am thinking about the order of creation. Which should come first, the system metadata or the hard link? I prefer the hard link, since creating it is the main purpose. A failure in storing the system metadata would prevent the creation of a hard link that might otherwise succeed. What do you think? If we create the hard link first, I would like the storeMetadata method to throw a new exception, which tells the caller that the failure happened in storeMetadata and indicates that the hard link was created successfully.
When I initially reversed the order, I had overlooked the hard linking process being the main purpose. After reviewing your comment, I agree with your suggestion.
With the current order, if storeMetadata fails, an exception will be thrown. So we will never reach the code to create a hard link - and won't be able to tell if there is a latent bug or issue affecting storing a hard link.
I have reverted my change, thank you!
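The ordering agreed on in this thread (hard link first, metadata second, with a distinct exception for the second step) can be sketched as follows. The names `LinkThenMetadataSketch`, `MetadataStoreFailedException`, and `MetadataWriter` are hypothetical and stand in for whatever the project actually uses; only `Files.createLink` is a real JDK call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LinkThenMetadataSketch {

    /** Hypothetical exception: the hard link exists, but storing the metadata failed. */
    static class MetadataStoreFailedException extends IOException {
        MetadataStoreFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical stand-in for a storeMetadata call. */
    interface MetadataWriter {
        void write(String pid) throws IOException;
    }

    static Path linkThenStoreMetadata(
        Path existing, Path link, String pid, MetadataWriter writer) throws IOException {
        // Step 1: create the hard link. A failure here surfaces as a plain IOException,
        // so the caller knows that no link was created.
        Files.createLink(link, existing);

        // Step 2: store the metadata. A failure here is wrapped in a dedicated exception,
        // so the caller can tell that the hard link itself succeeded.
        try {
            writer.write(pid);
        } catch (IOException e) {
            throw new MetadataStoreFailedException(
                "Hard link created at " + link + ", but storing metadata failed for pid: " + pid,
                e);
        }
        return link;
    }
}
```

A caller that catches `MetadataStoreFailedException` separately can retry only the metadata step without recreating the link.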
… provide access to 'FileHashStoreLinks'
… new changes from
 * @throws InterruptedException Sync issue when tagging pid and cid
 */
public ObjectMetadata storeHardLink(
    Path filePath, InputStream fileStream, String pid, String checksum,
The filePath and fileStream parameters seem redundant. The fileStream must be read from the filePath, so we can make it a local variable. Redundant parameters may introduce an error: the fileStream passed in may not have been opened from the file path.
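As an illustration of the suggestion, a method can accept only the path and open the stream itself, so the two can never disagree. This is a minimal sketch with hypothetical names (`PathOnlySketch`, `countBytes`); it is not the storeHardLink implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class PathOnlySketch {

    /** Reads the file at filePath through a locally opened stream and returns its length. */
    static long countBytes(Path filePath) throws IOException {
        // The stream is a local variable derived from filePath, and try-with-resources
        // guarantees it is closed even if reading fails.
        try (InputStream fileStream = Files.newInputStream(filePath)) {
            long total = 0;
            byte[] buffer = new byte[8192];
            int bytesRead;
            while ((bytesRead = fileStream.read(buffer)) != -1) {
                total += bytesRead;
            }
            return total;
        }
    }
}
```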
    generateAddAlgo = shouldCalculateAlgorithm(additionalAlgorithm);
}

MessageDigest md5 = MessageDigest.getInstance(DefaultHashAlgorithms.MD5.getName());
This is a low-priority issue. The FileHashStore.writeToTmpFileAndGenerateChecksums method has the same hard-coded MessageDigest creation. So if we need to change the default checksum algorithms, we have two places to take care of. Is it possible to store the default checksum algorithms in a list and have both pieces of code loop over it?
Thank you for bringing this up! I have refactored both FileHashStore and FileHashStoreLinks to calculate checksums based on the DefaultHashAlgorithms enum object. So all we have to do in the future, should we desire a change, is update the enum object.
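The enum-driven approach described above might look something like this. It is a sketch under assumptions: the enum below only mirrors the `getName()` accessor seen in the diff, its list of algorithms is illustrative, and `EnumDigestSketch`, `generateChecksums`, and `toHex` are hypothetical names rather than the project's code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class EnumDigestSketch {

    /** Assumed shape of the enum; the algorithms listed here are illustrative. */
    enum DefaultHashAlgorithms {
        MD5("MD5"), SHA_1("SHA-1"), SHA_256("SHA-256");

        private final String name;

        DefaultHashAlgorithms(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Reads the stream once and returns algorithm name -> hex digest for every enum value. */
    static Map<String, String> generateChecksums(InputStream data)
        throws IOException, NoSuchAlgorithmException {
        // One MessageDigest per enum value, so changing the defaults means editing the enum.
        Map<DefaultHashAlgorithms, MessageDigest> digests =
            new EnumMap<>(DefaultHashAlgorithms.class);
        for (DefaultHashAlgorithms algorithm : DefaultHashAlgorithms.values()) {
            digests.put(algorithm, MessageDigest.getInstance(algorithm.getName()));
        }

        byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = data.read(buffer)) != -1) {
            for (MessageDigest digest : digests.values()) {
                digest.update(buffer, 0, bytesRead);
            }
        }

        Map<String, String> hexDigests = new LinkedHashMap<>();
        for (Map.Entry<DefaultHashAlgorithms, MessageDigest> entry : digests.entrySet()) {
            hexDigests.put(entry.getKey().getName(), toHex(entry.getValue().digest()));
        }
        return hexDigests;
    }
}
```

Because every call site loops over `DefaultHashAlgorithms.values()`, there is a single place to update when the defaults change.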
…Path and update signature and affected junit tests
…ation and digestion of 'MessageDigest' objects based on the DefaultHashAlgorithms enum object
… InputStream object
…odifying it, rather than using a class variable and accessing it directly
…iables to protected
Summary of Changes:
HashStoreConverter and FileHashStoreLinks
HashStoreConverter has 1 public API method: convert, which will create a hard link to an existing data object and return ObjectMetadata for the data object, along with storing the system metadata from the sysmeta stream supplied
[…] tagObject and storeMetadata)