Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature-95: HashStoreConverter & FileHashStoreLinks #96

Merged
merged 45 commits into from
Aug 19, 2024

Conversation

doulikecookiedough
Copy link
Contributor

Summary of Changes:

  • Add two new classes: HashStoreConverter and FileHashStoreLinks
  • HashStoreConverter has 1 public API method: convert which will create a hard link to an existing data object and return ObjectMetadata for the data object, along with storing the system metadata from the sysmeta stream supplied
    • This method is thread safe, along with the subsequent methods called (ex. tagObject and storeMetadata)

…'FileHashStore', and add new method 'generateChecksums' in 'FileHashStoreLinks'
…method 'getHashStoreLinksDataObjectPath' and junit test for 'storeHardLink'
* @throws NoSuchAlgorithmException An algorithm defined is not supported
* @throws InterruptedException Issue with synchronizing storing metadata
*/
public ObjectMetadata convert(Path filePath, String pid, InputStream sysmetaStream)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method doesn't handle the use case - objects only have system metadata without real bytes. I mentioned that the data objects on cn don't have bytes; they only have system metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @taojing2002 - I missed this requirement. I have refactored the convert method to account for this - and also swapped the order of operations to store sysmeta first (because it can never be null).

* @throws InterruptedException Issue with synchronizing storing metadata
*/
public ObjectMetadata convert(Path filePath, String pid, InputStream sysmetaStream)
throws IOException, NoSuchAlgorithmException, InterruptedException {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method doesn't check the checksum match between the calculated checksums and ones in the system metadata. Do you think we should add two extra parameters - checksum and checksum_algorithm?

Copy link
Contributor Author

@doulikecookiedough doulikecookiedough Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think it should be included, and I have updated the convert signature to account for it (along with the relevant code to check + junit tests).

When there is a mismatch, the custom exception NonMatchingChecksumException is thrown.


// Store the sysmeta first - this can never be null and is always required.
try {
fileHashStoreLinks.storeMetadata(sysmetaStream, pid);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking the order of creating. Which should be first, system metadata or hard links? I prefer to hard links since this is the main purpose. Failure of storing system metadata may disrupt the creation of the hard link, which may succeed. What do you think? If we create hard link first, I would like the method storeMetadata will throw a new exception, which can tell the failure happens in the storeMetadata method and indicate the hard link was created successfully.

Copy link
Contributor Author

@doulikecookiedough doulikecookiedough Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I initially reversed the order, I had overlooked the hard linking process being the main purpose. After reviewing your comment, I agree with your suggestion.

With the current order, if storeMetadata fails, an exception will be thrown. So we will never reach the code to create a hard link - and won't be able to tell if there is a latent bug or issue affecting storing a hard link.

I have reverted my change, thank you!

@doulikecookiedough doulikecookiedough changed the base branch from feature-57-deleteobj-refactor to develop August 16, 2024 16:18
* @throws InterruptedException Sync issue when tagging pid and cid
*/
public ObjectMetadata storeHardLink(
Path filePath, InputStream fileStream, String pid, String checksum,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filePath and fileStream parameters seem like redundant. The fileStream must be read from the filePath. We can move it to a local variable. The redundant parameters may introduce an error - the fileStream may not be read from the file path.

generateAddAlgo = shouldCalculateAlgorithm(additionalAlgorithm);
}

MessageDigest md5 = MessageDigest.getInstance(DefaultHashAlgorithms.MD5.getName());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This a low priority issue. In the FileHashStore.writeToTmpFileAndGenerateChecksums method, it has the same hard-coded to create MessageDigest. So if we need to change default checksum algorithms, we have two places to take care of. Is it possible to store the default checksum algorithm in a list and both code just loop the variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for bringing this up! I have refactored both FileHashStore and FileHashStoreLinks to calculate checksums based on the DefaultHashAlgorithms enum object. So all we have to do in the future should we desire a change is update the enum object.

@doulikecookiedough doulikecookiedough merged commit 0b684e1 into develop Aug 19, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants