Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature-11: storeMetadata Public API method #27

Merged
merged 27 commits into from
Jun 27, 2023

Conversation

doulikecookiedough
Copy link
Contributor

Summary of Changes:

  • Implement storeMetadata()Public API method in FileHashStore (plus other supporting protected methods)
  • Add junit tests to ensure all new methods are behaving as expected, including confirming that header and body sections are correctly written and that calling storeMetadata() on the same hash(id+formatId) only writes one file in total without any collisions (pid synchronization)
  • Bug fix and updated junit tests for storeObject where checksumAlgorithm is supplied but never utilized in writeToTmpFileAndGenerateChecksums

… additional algorithm checksum value was omitted
…ithm is supplied but never calculated, and revise junit tests
…oreMetadataNamespace' to prepare for 'store_metadata()' implementation and update all junit test classes
…dditional junit test for 'writeToTempFileAndGenerateChecksums'
… null character before writing body content 'metadata'
… overwrite files (previously always throws exception when target file exists)
…es multiple calls to store the same pid+formatId successfully
…ata() to explicitly call .move() with entity as 'metadata'
…tExistsException' over 'FileAlreadyExistsException'
…in change from 'FileAlreadyExistsException' to 'pidObjectExistsException'
…ns being thrown in 'storeObject_objectLockedIds' test
@doulikecookiedough
Copy link
Contributor Author

doulikecookiedough commented Jun 27, 2023

After reviewing the test logs from my last PR Feature-20: Customized Exceptions, the threads in the storeObject_objectLockedIds test encountered RuntimeException exceptions - which would not have triggered the AssertionError seen here in this PR for the commits above.

This AssertionError was caused due to a PidObjectExistsException being thrown instead of a FileAlreadyExistsException that was defined in the test (which is an error in hindsight, and not identified initially because I did not have customized exceptions to provide better context).

To ensure confidence in the storeObject() pid synchronization, I have refactored the existing test storeObject_objectLockedIds to storeObject_objectLockedIds_FiveThreads, which is my attempt at forcing a race condition to cause the pid synchronization process to behave unexpectedly - and ensuring that the two exceptions that I expect to be thrown are always thrown.

  • If a call is made to 'storeObject' for a pid that is already in progress of being stored (pid is found in objectLockedIds), a RunTimeException should be thrown (caught by the pid synchronization block).
if (objectLockedIds.contains(pid)) {
    String errMsg = "FileHashStore.storeObject - Duplicate object request encountered for pid: " + pid + ". Already in progress.";
    logFileHashStore.warn(errMsg);
    throw new RuntimeException(errMsg);
}
  • If another thread calls 'storeObject' for an object that has already been stored, it will eventually call putObject, which will check for the existence of a given data object before it attempts to generate a temp file (write to it, generate checksums, etc.), and consequently encounter a PidObjectExistsException since the object already exists in the store.
// If file (pid hash) exists, reject request immediately
if (Files.exists(objHashAddressPath)) {
    String errMsg = "FileHashStore.putObject - File already exists for pid: " + pid + ". Object address: " + objHashAddressString + ". Aborting request.";
    logFileHashStore.warn(errMsg);
    throw new PidObjectExistsException(errMsg);
}
// Generate tmpfile, write and get hex digests, etc.
  • If the FileAlreadyExistsException is thrown, it would be at the move level, which would mean that we would have already calculated the default hex digest map (and is inefficient for an object that has already been stored). It should technically be never thrown.
  • The FileAlreadyExistException can potentially be thrown because it is the result of an additional check I added to the very last step of the storeObject process move() to be safe and double check that the target destination for a tmp data file does not already exist before moving.

I will review the tests once more with fresh eyes tomorrow morning, then merge this branch into develop if I don't see any further issues.

@doulikecookiedough
Copy link
Contributor Author

Confirming that the change from FileAlreadyExistsException to PidObjectExistsException was an error on my part that was not caught with the last PR due to only RuntimeExceptions being encountered in the affected test. Additional review into previous logs show that the expected method putObject() was throwing FileAlreadyExistsException, which was refactored to be PidObjectExistsException but not updated in the test method, and caused an AssertionError here. Merging this branch into develop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant