Feature-57: deleteObject Refactor #58
…HashStoreUtility class
…ll files, then delete at the very end)
…for the given pid and update junit test
…sary and add new junit tests
src/main/java/org/dataone/hashstore/filehashstore/FileHashStore.java
Outdated
… potential deadlock
…hStore to utility class
… with 'tagObject'
…ctly formed (was only using formatId, instead of pid + formatId), fix affected test and rename variables for improved clarity
…nd revise all junit tests and affected code
…ctInfo' object and update junit tests
…d of .trim() and then .isEmpty()
README.md
Outdated
identifier (pid)). This process produces reference files, which allow objects to be found and
retrieved with a given identifier.
- Note 1: An identifier can only be used once
- Note 2: Objects are stored once and only once using its content identifier (a checksum generated
Inconsistent plural/singular: Objects are [...] using its
Suggest: Each object is stored once and only once using its content identifier
README.md
Outdated
// Validate object, if the parameters do not match, the data object associated with the objInfo
// supplied will be deleted
- deleteInvalidObject(objInfo, checksum, checksumAlgorithn, objSize)
+ deleteInvalidObject(objInfo, checksum, checksumAlgorithn, objSize);
deleteIfInvalidObject
* disk using a given InputStream. Upon successful storage, the method returns a
* (ObjectMetadata) object containing relevant file information, such as the file's id
* The {@code storeObject} method is responsible for the atomic storage of objects to
* disk using a given InputStream. Upon successful storage, the method returns a
* (ObjectMetadata) object containing relevant file information, such as the file's id
* (which can be used to locate the object on disk), the file's size, and a hex digest
(which can be used to locate the object on disk)
(which can be used by a system administrator -- but not by an API client -- to locate the object on disk)
LOVE seeing a whole file be collapsed down to one line 🤣
Me too, it's so clean now!
objectInfo.getHexDigests(), "objectInfo.getHexDigests()", "deleteInvalidObject");
if (objectInfo.getHexDigests().isEmpty()) {
objectInfo.hexDigests(), "objectInfo.getHexDigests()", "deleteInvalidObject");
if (objectInfo.hexDigests().isEmpty()) {
    throw new MissingHexDigestsException("Missing hexDigests in supplied ObjectMetadata");
}
FileHashStoreUtility.ensureNotNull(checksum, "checksum", "deleteInvalidObject");
FileHashStoreUtility.ensureNotNull(checksumAlgorithm, "checksumAlgorithm", "deleteInvalidObject");
deleteIfInvalidObject
objectInfo.getHexDigests(), "objectInfo.getHexDigests()", "deleteInvalidObject");
if (objectInfo.getHexDigests().isEmpty()) {
objectInfo.hexDigests(), "objectInfo.getHexDigests()", "deleteInvalidObject");
if (objectInfo.hexDigests().isEmpty()) {
    throw new MissingHexDigestsException("Missing hexDigests in supplied ObjectMetadata");
}
FileHashStoreUtility.ensureNotNull(checksum, "checksum", "deleteInvalidObject");
deleteIfInvalidObject
@@ -521,12 +542,10 @@ public ObjectMetadata storeObject(InputStream object) throws NoSuchAlgorithmExce
// call 'deleteInvalidObject' (optional) to check that the object is valid, and then
deleteIfInvalidObject
synchronizeObjectLockedCids(cid);
synchronizeReferenceLockedPids(pid);
I don't see a downside to moving it, but do see plenty of upside - unless I'm missing something. Your calling code would be simplified to this:
try {
    storeHashStoreRefsFiles(pid, cid);
} catch (HashStoreRefsAlreadyExistException hsrfae) {
    // * * * cid and pid already released * * *
    // This exception is thrown when the pid and cid are already tagged appropriately
    String errMsg =
        "HashStore refs files already exist for pid " + pid + " and cid: " + cid;
    throw new HashStoreRefsAlreadyExistException(errMsg);
} catch (PidRefsFileExistsException prfe) {
    // * * * cid and pid already released * * *
    String errMsg = "pid: " + pid + " already references another cid."
        + " A pid can only reference one cid.";
    throw new PidRefsFileExistsException(errMsg);
} catch (Exception e) {
    // * * * cid and pid already released * * *
    // Revert the process for all other exceptions
    unTagObject(pid, cid);
    throw e;
}
...which I believe would work in exactly the same way. The only real difference is that if `storeHashStoreRefsFiles()` throws an exception, it will release the locks before your `catch` blocks, above, instead of after (assuming `storeHashStoreRefsFiles()` has a `finally` block, of course).
Looking at the code, it looks like that re-ordering of calls should not be a problem, but LMK if I'm missing anything important.
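For illustration, here is a minimal sketch of a method that acquires its own locks and releases them in `finally` blocks, so callers see the locks already released by the time their `catch` blocks run. All names (`RefsLockSketch`, `acquire`, `release`, `storeRefs`, `isLocked`) are hypothetical stand-ins and the set-based locking is an assumption; this is not HashStore's actual implementation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: lock bookkeeping lives inside the method that needs it.
final class RefsLockSketch {
    // Assumed stand-ins for the "locked cids" and "locked pids" collections
    private static final Set<String> LOCKED_CIDS = new HashSet<>();
    private static final Set<String> LOCKED_PIDS = new HashSet<>();

    // Block until the id is free, then mark it as locked
    static void acquire(Set<String> locked, String id) {
        synchronized (locked) {
            while (locked.contains(id)) {
                try {
                    locked.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for " + id, e);
                }
            }
            locked.add(id);
        }
    }

    // Unmark the id and wake up any waiting threads
    static void release(Set<String> locked, String id) {
        synchronized (locked) {
            locked.remove(id);
            locked.notifyAll();
        }
    }

    // Acquire cid then pid, run the work, and always release in reverse order
    static void storeRefs(String pid, String cid, Runnable work) {
        acquire(LOCKED_CIDS, cid);
        try {
            acquire(LOCKED_PIDS, pid);
            try {
                work.run();
            } finally {
                release(LOCKED_PIDS, pid);
            }
        } finally {
            release(LOCKED_CIDS, cid);
        }
    }

    // True only while both ids are currently held
    static boolean isLocked(String pid, String cid) {
        boolean cidHeld;
        synchronized (LOCKED_CIDS) {
            cidHeld = LOCKED_CIDS.contains(cid);
        }
        synchronized (LOCKED_PIDS) {
            return cidHeld && LOCKED_PIDS.contains(pid);
        }
    }
}
```

With this shape, an exception thrown by `work` propagates to the caller only after both ids have been released, which is the ordering difference described above.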
public HashStoreRunnable(HashStore hashstore, int publicAPIMethod, InputStream objStream,
String pid) {
FileHashStoreUtility.ensureNotNull(hashstore, "hashstore",
"HashStoreServiceRequestConstructor");
I still see (and have flagged) more inaccuracies. You're setting your future self up for a lifelong game of whack-a-mole :-)
Try this - I think it solves all your problems, without needing to pass the method name:
// example excerpt, from checkForNotEmptyAndValidString()
//
if (string.isBlank()) {
    StackTraceElement[] stackTraceElements = Thread.currentThread().getStackTrace();
    String msg = "Calling Method: " + stackTraceElements[2].getMethodName()
        + "(): argument cannot be empty, etc, etc...";
    throw new IllegalArgumentException(msg);
}
link to original thread, since GH does its best to confuse us
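A self-contained sketch of the same idea, for anyone who wants to try it in isolation. `ArgChecks`, `ensureNotBlank` and the sample `storeObject` caller are hypothetical names for illustration, not the project's utility class. The frame index assumes the usual layout on HotSpot-style JVMs (`[0]` is `getStackTrace`, `[1]` the helper, `[2]` its caller); the JDK documentation allows a VM to omit frames, so the sketch guards the array length.

```java
// Hypothetical sketch: derive the calling method's name from the stack trace
// instead of passing it as a string argument.
final class ArgChecks {
    static void ensureNotBlank(String value, String argName) {
        if (value == null || value.isBlank()) { // String.isBlank() needs Java 11+
            StackTraceElement[] stack = Thread.currentThread().getStackTrace();
            // Assumed layout: [0] getStackTrace, [1] ensureNotBlank, [2] the caller
            String caller = stack.length > 2 ? stack[2].getMethodName() : "unknown";
            String msg = "Calling Method: " + caller + "(): " + argName
                + " cannot be null or blank";
            throw new IllegalArgumentException(msg);
        }
    }

    // Sample caller; its name should appear in the exception message
    static void storeObject(String pid) {
        ensureNotBlank(pid, "pid");
    }
}
```

Because the name is read at the point of failure, renaming the calling method cannot leave a stale string behind, which is the maintenance problem raised in the comment above.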
…ashStoreRefUpdateTypes'
…ode requiring it in 'storeHashStoreRefsFiles'
…ad and update signature to remove 'method' argument
…nature to remove 'method' argument
…nature to remove 'method' argument
Thank you again @artntek for reviewing my PR. I believe I have addressed all your latest feedback. Regarding moving the synchronization code... GitHub does want to confuse me indeed. I have made the change, thank you! Please let me know if you have any other feedback; otherwise I will apply some auto-formatting and get this merged!
Thanks - this looks a lot cleaner. However, I would urge you to think through the process carefully one more time... This execution path matches the one from your original code, i.e.:
} catch (Exception e) {
    // Revert the process for all other exceptions
    // We must first release the cid and pid since 'unTagObject' is synchronized
    // If not, we will run into a deadlock.
    releaseObjectLockedCids(cid);
    releaseReferenceLockedPids(pid);
    unTagObject(pid, cid);
    throw e;
} finally {
...etc
My question: Assuming a different thread could hijack the lock on this object between you releasing it (step 1, above) and re-applying it in
* (which can be used to locate the object on disk), the file's size, and a hex digest
* dict of algorithms and checksums. Storing an object with {@code store_object} also
* tags an object (creating references) which allow the object to be discoverable.
* (@Code ObjectMetadata) object containing relevant file information, such as the file's
still need to change `a` to `an` ObjectMetadata
storeHashStoreRefsFiles(pid, cid);

} catch (HashStoreRefsAlreadyExistException hsrfae) {
    // *** cid and pid already released ***
remove unless you really like it (I just included it in the example to provide a bit of explanation. It seems superfluous in the actual code)
    // This exception is thrown when the pid and cid are already tagged appropriately
    String errMsg =
        "HashStore refs files already exist for pid " + pid + " and cid: " + cid;
    throw new HashStoreRefsAlreadyExistException(errMsg);

} catch (PidRefsFileExistsException prfe) {
    // *** cid and pid already released ***
remove unless you really like it (I just included it in the example to provide a bit of explanation. It seems superfluous in the actual code)
    String errMsg = "pid: " + pid + " already references another cid."
        + " A pid can only reference one cid.";
    throw new PidRefsFileExistsException(errMsg);

} catch (Exception e) {
    // *** cid and pid already released ***
remove unless you really like it (I just included it in the example to provide a bit of explanation. It seems superfluous in the actual code)
Thank you @artntek! My thoughts on your question posed above:
The purpose of All calls to Since we are using a combination here, beginning with the
Should an unexpected exception occur, even if two threads are competing for the lock, synchronization ensures that only one thread can proceed at a time with operations on the same
The worst case scenario is that clients will have to re-upload their data object, but if there is an unexpected exception occurring where we aren't storing as expected - we probably want to stop and take a look at that before proceeding any further. What do you think?
Here's a scenario I was thinking of:
LMK if this is not possible/not an issue
try {
    // If no exceptions are thrown, we proceed to synchronization based on the `cid`
    synchronizeObjectLockedCids(cid);
good catch :-)
@artntek Your example is accurate - an orphaned data object can be produced. I have created a new issue to discuss and address it here. I added a potential solution and additional context - when you have a chance, can you please take a look and let me know what you think? Many thanks again for your feedback and review comments! I am going to merge this feature into
Summary of Changes:
- `deleteObject(String pid)` to delete an object, its reference files and all associated metadata documents
- `deleteMetadata(String pid)` to delete all associated metadata documents for the given pid
- `storeMetadata` to store metadata for a given `pid` and `formatId` in a directory formed by calculating the hash of the given `pid`, with the document name being the hash of the `formatId`.
- … `pid` in its constructor and updated `storeObject` methods with a pid
- `deleteObject(String idType, String id)` to enable deletion of object, references files and metadata documents - or just the object itself
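
As a rough illustration of the metadata layout described in the summary, the sketch below builds a path from the hash of the `pid` (directory) and the hash of the `formatId` (document name). Everything here is an assumption for illustration: `MetadataPathSketch`, `sha256Hex` and `metadataPath` are hypothetical names, SHA-256 is an assumed algorithm, and the flat two-level layout may differ from how HashStore actually arranges directories. One commit message in this PR mentions forming the document name from pid + formatId, so the exact input to the name hash should be checked against the code.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of "directory = hash(pid), document name = hash(formatId)"
final class MetadataPathSketch {
    // Hex-encode the SHA-256 digest of a UTF-8 string (algorithm is an assumption)
    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // metadataRoot/<hash of pid>/<hash of formatId>
    static Path metadataPath(Path metadataRoot, String pid, String formatId) {
        return metadataRoot.resolve(sha256Hex(pid)).resolve(sha256Hex(formatId));
    }
}
```

Under this layout, all metadata documents for one pid share a directory, which would let a `deleteMetadata(String pid)` style call remove them together.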