Feature-57: `deleteObject` Refactor #58

doulikecookiedough · 2024-01-31T20:30:09Z

Summary of Changes:

- Refactored the existing deleteObject(String pid) to delete an object, its reference files and all associated metadata documents
- Refactored the existing deleteMetadata(String pid) to delete all metadata documents associated with the given pid
- Refactored storeMetadata to store metadata for a given pid and formatId in a directory named by the hash of the pid, with the document name being the hash of the pid and formatId combined
- Refactored the 'ObjectMetadata' class to include a pid in its constructor, and updated the storeObject methods to pass the pid
- Added a new overloaded method deleteObject(String idType, String id) that deletes an object together with its reference files and metadata documents, or just the object itself
- Added and revised JUnit tests
- Updated the HashStore interface javadocs and README
- Cleaned up code to improve clarity
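For illustration, the storeMetadata layout described above can be sketched as follows. This is a minimal sketch, not HashStore's implementation: it assumes SHA-256 hex digests, no directory sharding, and a plain pid + formatId concatenation for the document name (the pid + formatId form follows commit 14062e5 further down). `MetadataPathSketch` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical helper showing the metadata layout; not HashStore's actual API.
// Assumptions: SHA-256 hex digests, no directory sharding, plain pid + formatId concatenation.
final class MetadataPathSketch {

    private MetadataPathSketch() {}

    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(md.digest(input.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    // Directory = hash(pid); document name = hash(pid + formatId)
    static Path metadataDocumentPath(Path metadataRoot, String pid, String formatId) {
        return metadataRoot.resolve(sha256Hex(pid)).resolve(sha256Hex(pid + formatId));
    }
}
```

Because every document for a pid lands in the same directory, a deleteMetadata(String pid) along these lines could remove them together, while each formatId still gets its own file.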


artntek · 2024-08-14T21:03:59Z

README.md

+identifier (pid)). This process produces reference files, which allow objects to be found and
+retrieved with a given identifier.
+- Note 1: An identifier can only be used once
+- Note 2: Objects are stored once and only once using its content identifier (a checksum generated


Inconsistent plural/singular: Objects are [...] using its

Suggest: Each object is stored once and only once using its content identifier
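The two notes describe a content-addressed layout. A minimal in-memory sketch, assuming SHA-256 as the content identifier and using maps in place of the pid and cid reference files (`ContentAddressedSketch` and its methods are hypothetical, not HashStore's API):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.HexFormat;
import java.util.Map;
import java.util.Set;

// In-memory model of the two README notes; class and method names are hypothetical.
final class ContentAddressedSketch {
    private final Map<String, byte[]> objectsByCid = new HashMap<>();
    private final Map<String, String> cidByPid = new HashMap<>();       // stands in for pid refs files
    private final Map<String, Set<String>> pidsByCid = new HashMap<>(); // stands in for cid refs files

    static String contentId(byte[] content) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(content));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stores content under its checksum and tags it with the pid; returns the cid.
    String storeAndTag(String pid, byte[] content) {
        if (cidByPid.containsKey(pid)) {
            // Note 1: an identifier can only be used once
            throw new IllegalStateException("pid already in use: " + pid);
        }
        String cid = contentId(content);
        // Note 2: identical content maps to the same cid, so it is stored only once
        objectsByCid.putIfAbsent(cid, content.clone());
        cidByPid.put(pid, cid);
        pidsByCid.computeIfAbsent(cid, k -> new HashSet<>()).add(pid);
        return cid;
    }

    byte[] retrieve(String pid) {
        String cid = cidByPid.get(pid);
        if (cid == null) {
            throw new IllegalArgumentException("unknown pid: " + pid);
        }
        return objectsByCid.get(cid).clone();
    }

    Set<String> pidsFor(String cid) {
        return Set.copyOf(pidsByCid.getOrDefault(cid, Set.of()));
    }

    int objectCount() {
        return objectsByCid.size();
    }
}
```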

artntek · 2024-08-14T21:05:07Z

README.md

 // Validate object, if the parameters do not match, the data object associated with the objInfo
 // supplied will be deleted
- deleteInvalidObject(objInfo, checksum, checksumAlgorithn, objSize)
+deleteInvalidObject(objInfo, checksum, checksumAlgorithn, objSize);


Suggest renaming to deleteIfInvalidObject

artntek · 2024-08-14T21:08:46Z

src/main/java/org/dataone/hashstore/HashStore.java

+         * disk using a given InputStream. Upon successful storage, the method returns a
+         * (ObjectMetadata) object containing relevant file information, such as the file's id


[...] returns a (ObjectMetadata) object [...]

suggest:

returns an {@code ObjectMetadata} object

artntek · 2024-08-14T21:11:19Z

src/main/java/org/dataone/hashstore/HashStore.java

+         * The {@code storeObject} method is responsible for the atomic storage of objects to
+         * disk using a given InputStream. Upon successful storage, the method returns a
+         * (ObjectMetadata) object containing relevant file information, such as the file's id
+         * (which can be used to locate the object on disk), the file's size, and a hex digest


(which can be used to locate the object on disk)

(which can be used by a system administrator -- but not by an API client -- to locate the object on disk)
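A minimal sketch of the store-then-rename pattern this javadoc describes, assuming the stream is written to a temporary file in the store directory while MD5, SHA-1 and SHA-256 digests are computed, and that the SHA-256 hex digest serves as the cid. `ObjectMetadataSketch` and `AtomicStoreSketch` are hypothetical stand-ins; the real ObjectMetadata fields and HashStore's directory sharding are not reproduced here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the ObjectMetadata record; these fields are assumptions.
record ObjectMetadataSketch(String cid, long size, Map<String, String> hexDigests) {}

final class AtomicStoreSketch {
    private static final List<String> ALGORITHMS = List.of("MD5", "SHA-1", "SHA-256");

    static ObjectMetadataSketch storeObject(Path storeRoot, InputStream object) {
        Path tmp = null;
        try {
            Files.createDirectories(storeRoot);
            Map<String, MessageDigest> digests = new LinkedHashMap<>();
            for (String alg : ALGORITHMS) {
                digests.put(alg, MessageDigest.getInstance(alg));
            }
            // 1. Stream into a temp file in the same directory, updating every digest as we go
            tmp = Files.createTempFile(storeRoot, "tmp-", ".part");
            long size = 0;
            try (OutputStream out = Files.newOutputStream(tmp)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = object.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    for (MessageDigest md : digests.values()) {
                        md.update(buf, 0, n);
                    }
                    size += n;
                }
            }
            Map<String, String> hex = new LinkedHashMap<>();
            digests.forEach((alg, md) -> hex.put(alg, HexFormat.of().formatHex(md.digest())));
            // 2. The content identifier (cid) names the permanent file
            String cid = hex.get("SHA-256");
            Path target = storeRoot.resolve(cid);
            if (Files.exists(target)) {
                Files.delete(tmp); // duplicate content: keep the copy already stored
            } else {
                // 3. Same-directory atomic rename: readers see either nothing or the whole file.
                // Exists-then-move is not race-free by itself; a per-cid lock would be needed.
                Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            }
            tmp = null;
            return new ObjectMetadataSketch(cid, size, Map.copyOf(hex));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        } finally {
            if (tmp != null) {
                try {
                    Files.deleteIfExists(tmp); // clean up a partial write after a failure
                } catch (IOException ignored) {
                    // best effort only
                }
            }
        }
    }
}
```

The cid returned here is exactly the kind of on-disk id the reviewer is describing: it tells you where the file lives, but clients look objects up by pid.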

artntek · 2024-08-14T21:13:06Z

src/main/java/org/dataone/hashstore/ObjectMetadata.java

LOVE seeing a whole file be collapsed down to one line 🤣

Me too, it's so clean now!

artntek · 2024-08-14T22:02:11Z

src/main/java/org/dataone/hashstore/filehashstore/FileHashStore.java

-            objectInfo.getHexDigests(), "objectInfo.getHexDigests()", "deleteInvalidObject");
-        if (objectInfo.getHexDigests().isEmpty()) {
+            objectInfo.hexDigests(), "objectInfo.getHexDigests()", "deleteInvalidObject");
+        if (objectInfo.hexDigests().isEmpty()) {
            throw new MissingHexDigestsException("Missing hexDigests in supplied ObjectMetadata");
        }
        FileHashStoreUtility.ensureNotNull(checksum, "checksum", "deleteInvalidObject");
        FileHashStoreUtility.ensureNotNull(checksumAlgorithm, "checksumAlgorithm", "deleteInvalidObject");


Suggest renaming to deleteIfInvalidObject
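A minimal sketch of what a deleteIfInvalidObject check could look like, assuming the supplied checksum is compared against the matching entry in hexDigests() and that a negative objSize means no size was supplied. `StoredObject` and `DeleteIfInvalidSketch` are hypothetical, and the JDK exceptions stand in for the PR's own exception types (such as MissingHexDigestsException).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for ObjectMetadata: where the object sits, its size and its digests.
record StoredObject(Path path, long size, Map<String, String> hexDigests) {}

final class DeleteIfInvalidSketch {

    private DeleteIfInvalidSketch() {}

    // Returns normally if the object matches; otherwise deletes it and throws.
    static void deleteIfInvalidObject(StoredObject objInfo, String checksum,
                                      String checksumAlgorithm, long objSize) {
        Objects.requireNonNull(objInfo, "objInfo");
        Objects.requireNonNull(checksum, "checksum");
        Objects.requireNonNull(checksumAlgorithm, "checksumAlgorithm");
        Map<String, String> digests = objInfo.hexDigests();
        if (digests == null || digests.isEmpty()) {
            // The PR throws MissingHexDigestsException here
            throw new IllegalArgumentException("Missing hexDigests in supplied object info");
        }
        String actual = digests.get(checksumAlgorithm);
        if (actual == null) {
            // Assumption: unknown algorithms are rejected rather than computed on demand
            throw new IllegalArgumentException("No digest for algorithm: " + checksumAlgorithm);
        }
        boolean checksumMatches = actual.equalsIgnoreCase(checksum);
        // Assumption: a negative objSize means the caller did not supply a size
        boolean sizeMatches = objSize < 0 || objInfo.size() == objSize;
        if (checksumMatches && sizeMatches) {
            return;
        }
        try {
            Files.deleteIfExists(objInfo.path());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        throw new IllegalStateException("Object deleted after failed validation (checksumMatches="
            + checksumMatches + ", sizeMatches=" + sizeMatches + ")");
    }
}
```

The "If" in the name makes the side effect explicit at every call site: a passing check leaves the object alone, a failing one removes it before reporting.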

artntek · 2024-08-14T22:02:35Z

src/main/java/org/dataone/hashstore/filehashstore/FileHashStore.java

-            objectInfo.getHexDigests(), "objectInfo.getHexDigests()", "deleteInvalidObject");
-        if (objectInfo.getHexDigests().isEmpty()) {
+            objectInfo.hexDigests(), "objectInfo.getHexDigests()", "deleteInvalidObject");
+        if (objectInfo.hexDigests().isEmpty()) {
            throw new MissingHexDigestsException("Missing hexDigests in supplied ObjectMetadata");
        }
        FileHashStoreUtility.ensureNotNull(checksum, "checksum", "deleteInvalidObject");


Suggest renaming to deleteIfInvalidObject

artntek · 2024-08-14T22:03:07Z

src/main/java/org/dataone/hashstore/filehashstore/FileHashStore.java

@@ -521,12 +542,10 @@ public ObjectMetadata storeObject(InputStream object) throws NoSuchAlgorithmExce
        // call 'deleteInvalidObject' (optional) to check that the object is valid, and then


Suggest renaming to deleteIfInvalidObject

artntek · 2024-08-14T22:23:56Z

src/main/java/org/dataone/hashstore/filehashstore/FileHashStore.java

+            synchronizeObjectLockedCids(cid);
+            synchronizeReferenceLockedPids(pid);


I don't see a downside to moving it, but do see plenty of upside - unless I'm missing something. Your calling code would be simplified to this:

  try {
      storeHashStoreRefsFiles(pid, cid);
  } catch (HashStoreRefsAlreadyExistException hsrfae) {
      // * * * cid and pid already released * * *
      // This exception is thrown when the pid and cid are already tagged appropriately
      String errMsg = "HashStore refs files already exist for pid " + pid + " and cid: " + cid;
      throw new HashStoreRefsAlreadyExistException(errMsg);
  } catch (PidRefsFileExistsException prfe) {
      // * * * cid and pid already released * * *
      String errMsg = "pid: " + pid + " already references another cid."
          + " A pid can only reference one cid.";
      throw new PidRefsFileExistsException(errMsg);
  } catch (Exception e) {
      // * * * cid and pid already released * * *
      // Revert the process for all other exceptions
      unTagObject(pid, cid);
      throw e;
  }

...which I believe would work in exactly the same way. The only real difference is that if storeHashStoreRefsFiles() throws an exception, it will release the locks before your catch blocks, above, instead of after (assuming storeHashStoreRefsFiles() has a finally block, of course).

Looking at the code, it looks like that re-ordering of calls should not be a problem, but LMK if I'm missing anything important
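To make the suggestion concrete, here is a minimal sketch of a refs-writing method that takes the cid lock and then the pid lock itself and releases both in finally blocks, so a caller's catch blocks only run after the locks are gone. The set-plus-wait/notifyAll locking is an assumption modeled loosely on synchronizeObjectLockedCids/synchronizeReferenceLockedPids; `RefsLockSketch` and `writeRefs` are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of per-identifier locking; bodies are assumptions, not the PR's code.
final class RefsLockSketch {
    private final Set<String> lockedCids = new HashSet<>();
    private final Set<String> lockedPids = new HashSet<>();

    // Blocks until no other thread holds id, then records it as held.
    private static void acquire(Set<String> locked, String id) throws InterruptedException {
        synchronized (locked) {
            while (locked.contains(id)) {
                locked.wait();
            }
            locked.add(id);
        }
    }

    private static void release(Set<String> locked, String id) {
        synchronized (locked) {
            locked.remove(id);
            locked.notifyAll(); // wake threads waiting on any id in this set
        }
    }

    // Locks are taken inside the method and always released in finally blocks.
    void storeRefsFiles(String pid, String cid, Runnable writeRefs) throws InterruptedException {
        acquire(lockedCids, cid); // always cid first, then pid
        try {
            acquire(lockedPids, pid);
            try {
                writeRefs.run();
            } finally {
                release(lockedPids, pid);
            }
        } finally {
            release(lockedCids, cid);
        }
    }

    boolean isHeld(String pid, String cid) {
        synchronized (lockedPids) {
            if (lockedPids.contains(pid)) {
                return true;
            }
        }
        synchronized (lockedCids) {
            return lockedCids.contains(cid);
        }
    }
}
```

Acquiring in one fixed order (cid, then pid) everywhere is what keeps two such calls from deadlocking each other.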

artntek · 2024-08-15T00:26:33Z

src/test/java/org/dataone/hashstore/HashStoreRunnable.java

+    public HashStoreRunnable(HashStore hashstore, int publicAPIMethod, InputStream objStream,
+                             String pid) {
+        FileHashStoreUtility.ensureNotNull(hashstore, "hashstore",
+                                           "HashStoreServiceRequestConstructor");


I still see (and have flagged) more inaccuracies. You're setting your future self up for a lifelong game of whack-a-mole :-)

Try this - I think it solves all your problems, without needing to pass the method name:

  // example excerpt, from checkForNotEmptyAndValidString()
  //
  if (string.isBlank()) {
      StackTraceElement[] stackTraceElements = Thread.currentThread().getStackTrace();
      String msg = "Calling Method: " + stackTraceElements[2].getMethodName()
          + "(): argument cannot be empty, etc, etc...";
      throw new IllegalArgumentException(msg);
  }

link to original thread, since GH does its best to confuse us
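A runnable version of the excerpt, for reference. Index 2 of getStackTrace() is the caller only because frame 0 is getStackTrace itself and frame 1 is the check; the Javadoc also allows a JVM to omit frames. StackWalker (Java 9+) gives the same answer without hard-coded offsets and is shown as an alternative. `checkNotBlank`, `registerPid` and the other names are hypothetical.

```java
// Sketch of the reviewer's suggestion; all method names here are hypothetical.
final class CallerNameSketch {

    private CallerNameSketch() {}

    static void checkNotBlank(String value) {
        if (value == null || value.isBlank()) {
            // [0] = Thread.getStackTrace, [1] = checkNotBlank, [2] = the method that called us.
            // A JVM may omit frames, hence the length guard.
            StackTraceElement[] frames = Thread.currentThread().getStackTrace();
            String caller = frames.length > 2 ? frames[2].getMethodName() : "unknown";
            throw new IllegalArgumentException(
                "Calling Method: " + caller + "(): argument cannot be null or blank");
        }
    }

    // Java 9+ alternative: StackWalker's first frame is the method that calls walk(),
    // so skipping one frame yields its caller.
    static void checkNotBlankWithWalker(String value) {
        if (value == null || value.isBlank()) {
            String caller = StackWalker.getInstance()
                .walk(frames -> frames.skip(1).findFirst())
                .map(StackWalker.StackFrame::getMethodName)
                .orElse("unknown");
            throw new IllegalArgumentException(
                "Calling Method: " + caller + "(): argument cannot be null or blank");
        }
    }

    static void registerPid(String pid) {
        checkNotBlank(pid);
    }

    static void registerCid(String cid) {
        checkNotBlankWithWalker(cid);
    }
}
```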

doulikecookiedough · 2024-08-15T17:01:10Z

Thank you again @artntek for reviewing my PR. I believe I have addressed all your latest feedback. Regarding moving of the synchronization code... GitHub does want to confuse me indeed. I have made the change, thank you!

Please let me know if you have any other feedback, otherwise I will apply some auto-formatting and get this merged!

artntek · 2024-08-15T17:33:50Z

> Thank you again @artntek for reviewing my PR. I believe I have addressed all your latest feedback. Regarding moving of the synchronization code... GitHub does want to confuse me indeed. I have made the change, thank you!

Thanks - this looks a lot cleaner.

However, I would urge you to think through the process carefully one more time...

This execution path matches the one from your original code, i.e.:

1. you released the pid lock
2. then you call unTagObject()
3. unTagObject() applies a new pid lock
...like this:

  } catch (Exception e) {
      // Revert the process for all other exceptions
      // We must first release the cid and pid since 'unTagObject' is synchronized
      // If not, we will run into a deadlock.
      releaseObjectLockedCids(cid);
      releaseReferenceLockedPids(pid);
      unTagObject(pid, cid);
      throw e;
  } finally {
      ...etc

My question:

Assuming a different thread could hijack the lock on this object between you releasing it (step 1, above) and re-applying it in unTagObject() (step 3, above) -- are there any scenarios where this could pose a problem? Could you end up untagging a valid tag written by the other thread, for example? or?

artntek · 2024-08-15T17:05:09Z

src/main/java/org/dataone/hashstore/HashStore.java

-         * (which can be used to locate the object on disk), the file's size, and a hex digest
-         * dict of algorithms and checksums. Storing an object with {@code store_object} also
-         * tags an object (creating references) which allow the object to be discoverable.
+         * (@Code ObjectMetadata) object containing relevant file information, such as the file's


still need to change "a" to "an" before ObjectMetadata

artntek · 2024-08-15T17:15:41Z

src/main/java/org/dataone/hashstore/filehashstore/FileHashStore.java

            storeHashStoreRefsFiles(pid, cid);

        } catch (HashStoreRefsAlreadyExistException hsrfae) {
+            // *** cid and pid already released ***


remove unless you really like it (I just included it in the example to provide a bit of explanation. It seems superfluous in the actual code)

artntek · 2024-08-15T17:15:46Z

src/main/java/org/dataone/hashstore/filehashstore/FileHashStore.java

            // This exception is thrown when the pid and cid are already tagged appropriately
            String errMsg =
                "HashStore refs files already exist for pid " + pid + " and cid: " + cid;
            throw new HashStoreRefsAlreadyExistException(errMsg);

        } catch (PidRefsFileExistsException prfe) {
+            // *** cid and pid already released ***


remove unless you really like it (I just included it in the example to provide a bit of explanation. It seems superfluous in the actual code)

artntek · 2024-08-15T17:15:50Z

src/main/java/org/dataone/hashstore/filehashstore/FileHashStore.java

            String errMsg = "pid: " + pid + " already references another cid."
                + " A pid can only reference one cid.";
            throw new PidRefsFileExistsException(errMsg);

        } catch (Exception e) {
+            // *** cid and pid already released ***


remove unless you really like it (I just included it in the example to provide a bit of explanation. It seems superfluous in the actual code)

doulikecookiedough · 2024-08-15T20:57:26Z

Thank you @artntek! My thoughts on your question posed above:

> My question:
> Assuming a different thread could hijack the lock on this object between you releasing it (step 1, above) and re-applying it in unTagObject() (step 3, above) -- are there any scenarios where this could pose a problem? Could you end up untagging a valid tag written by the other thread, for example? or?

The purpose of unTagObject is to undo the tagging process if any unexpected exception occurs - if it is called, it must be executed no matter what. The only time unTagObject is called at this moment is through tagObject itself.

All calls to tagObject are synchronized (thread-safe) based on the cid and pid combination, but this doesn't mean the order in which they execute can be guaranteed if two threads are competing for the same lock.

Since we lock on both identifiers here, first the cid and then the pid, every action on a cid or pid will eventually be executed. Whether we tagObject or unTagObject, we will always perform each action based on the current state of the reference files (which are accounted for).

Note 1: tagObject also does not store duplicate pid references in the cid reference file, so there should never be a situation in which a pid appears twice.

Should an unexpected exception occur, even if two threads are competing for the lock, synchronization ensures that only one thread can proceed at a time with operations on the same cid and pid combination. Therefore, unTagObject will safely undo the process without affecting the state of HashStore.

Note 2: the most important reference file of all is the cid reference file, and access to this tracking document is always synchronized.

The worst case scenario is that clients will have to re-upload their data object, but if there is an unexpected exception occurring where we aren't storing as expected - we probably want to stop and take a look at that before proceeding any further.

What do you think?
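Note 1 can be illustrated with a minimal sketch of a cid refs file update that appends a pid only when it is not already listed, plus the matching removal an untag step would perform. It assumes one pid per line and that the caller already holds the cid lock; `CidRefsSketch` is a hypothetical name, not the PR's code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical cid refs file helper: one pid per line. Callers are assumed to hold
// the lock for this cid, as the synchronized sections in the PR do.
final class CidRefsSketch {

    private CidRefsSketch() {}

    static List<String> pids(Path cidRefsFile) {
        try {
            return Files.exists(cidRefsFile)
                ? Files.readAllLines(cidRefsFile, StandardCharsets.UTF_8)
                : List.of();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Appends pid only if absent, so a pid never appears twice; true if the file changed.
    static boolean addPidIfAbsent(Path cidRefsFile, String pid) {
        if (pids(cidRefsFile).contains(pid)) {
            return false;
        }
        try {
            Files.writeString(cidRefsFile, pid + "\n", StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Removes pid (the undo step); true if the file changed.
    static boolean removePid(Path cidRefsFile, String pid) {
        List<String> remaining = new ArrayList<>(pids(cidRefsFile));
        if (!remaining.remove(pid)) {
            return false;
        }
        try {
            Files.write(cidRefsFile, remaining, StandardCharsets.UTF_8);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```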

artntek · 2024-08-15T22:58:45Z

> The worst case scenario is that clients will have to re-upload their data object, but if there is an unexpected exception occurring where we aren't storing as expected - we probably want to stop and take a look at that before proceeding any further.
>
> What do you think?

Here's a scenario I was thinking of:

Thread A                    Thread B
========                    ========
   :                            :
uploads object X            uploads object X
   :                            :
(gets lock)                 (awaiting lock)
   :                            :
tagging obj X,                  :
FAILS ❌                        :
(returns lock)                  :
   :                        (gets lock)
(awaiting lock)                 :
   :                        TAGS obj X ✅
   :                        (returns lock)
(gets lock)                     :
UNTAGS obj X ✅                 :
(returns lock)                  :
   :                            :
   :                            :
               DONE
* * * Obj X is orphaned/not tagged ?? * * *

LMK if this is not possible/not an issue
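One possible way to close the window in this scenario, sketched here only as an assumption and not what the PR implements: use per-identifier `ReentrantLock`s so the failing thread can call unTagObject while it still holds the cid and pid locks, instead of releasing them first to avoid a deadlock. Thread B could then only tag after the rollback has finished. `ReentrantTagSketch` is hypothetical and keeps tags in memory.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Alternative design, NOT the PR's code: rollback runs while the failing thread still
// holds both locks. All names are hypothetical; the lock map is never pruned here.
final class ReentrantTagSketch {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final Set<String> tags = ConcurrentHashMap.newKeySet();

    private void withLocks(String pid, String cid, Runnable action) {
        ReentrantLock cidLock = locks.computeIfAbsent("cid:" + cid, k -> new ReentrantLock());
        ReentrantLock pidLock = locks.computeIfAbsent("pid:" + pid, k -> new ReentrantLock());
        cidLock.lock(); // same order everywhere (cid, then pid) to avoid lock-order deadlocks
        try {
            pidLock.lock();
            try {
                action.run();
            } finally {
                pidLock.unlock();
            }
        } finally {
            cidLock.unlock();
        }
    }

    void unTagObject(String pid, String cid) {
        withLocks(pid, cid, () -> tags.remove(pid + " -> " + cid));
    }

    void tagObject(String pid, String cid, boolean simulateFailure) {
        withLocks(pid, cid, () -> {
            try {
                tags.add(pid + " -> " + cid);
                if (simulateFailure) {
                    throw new IllegalStateException("simulated failure after partial tagging");
                }
            } catch (RuntimeException e) {
                // Re-acquiring locks this thread already holds only bumps their hold count,
                // so there is no deadlock and no window for another thread to tag in between.
                unTagObject(pid, cid);
                throw e;
            }
        });
    }

    boolean isTagged(String pid, String cid) {
        return tags.contains(pid + " -> " + cid);
    }
}
```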

artntek · 2024-08-15T23:03:25Z

src/main/java/org/dataone/hashstore/filehashstore/FileHashStore.java

                try {
+                    // If no exceptions are thrown, we proceed to synchronization based on the `cid`
+                    synchronizeObjectLockedCids(cid);


good catch :-)

doulikecookiedough · 2024-08-16T04:34:27Z

@artntek Your example is accurate - an orphaned data object can be produced. I have created a new issue to discuss and address it here. I added a potential solution and additional context - when you have a chance, can you please take a look and let me know what you think?

Many thanks again for your feedback and review comments! I am going to merge this feature into develop.

doulikecookiedough added 9 commits February 5, 2024 10:27

Clean-up code

d1aaf32

Add new methods 'renamePathForDeletion' and 'deleteListItems' in FileHashStoreUtility class

d56b9b1

Refactor 'deleteObject' to improve atomicity of the process (rename all files, then delete at the very end)

20a2092

Update 'HashStore' interface javadocs and clean up code

c86bb10

Clean up code, javadocs and fix minor bug in ObjectMetadata class constructor

c66b5a1

Refactor 'deleteMetadata(String pid)' to remove all metadata related for the given pid and update junit test

97cc255

Update README.md

5297773

Fix typo in 'renamePathForDeletion' javadoc

90b4d5a

Refactor 'tagObject' to handle scenarios where exceptions are unnecessary and add new junit tests

f3e4fea
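The rename-then-delete approach in 20a2092 can be sketched as a two-phase operation. Only the names renamePathForDeletion and deleteListItems come from the commits; their signatures and bodies here are assumptions, as is the restore-on-failure step, which is included to show why renaming first helps: nothing is actually deleted until every file has been renamed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Only the method names renamePathForDeletion/deleteListItems come from the commits;
// signatures, bodies and the restore-on-failure step are assumptions for illustration.
final class DeletionSketch {

    private DeletionSketch() {}

    // Renames within the same directory so the file stops resolving under its real name.
    static Path renamePathForDeletion(Path path) throws IOException {
        Path renamed = path.resolveSibling(path.getFileName() + "_delete");
        return Files.move(path, renamed, StandardCopyOption.ATOMIC_MOVE);
    }

    static void deleteListItems(List<Path> paths) throws IOException {
        for (Path p : paths) {
            Files.deleteIfExists(p);
        }
    }

    // Phase 1: rename every file (object, refs files, metadata documents).
    // Phase 2: only after all renames succeed, delete the renamed files.
    static void deleteAll(List<Path> targets) {
        List<Path> renamed = new ArrayList<>();
        try {
            for (Path target : targets) {
                renamed.add(renamePathForDeletion(target));
            }
        } catch (IOException e) {
            // Put back whatever was already renamed so the store is left as it was
            for (int i = 0; i < renamed.size(); i++) {
                try {
                    Files.move(renamed.get(i), targets.get(i), StandardCopyOption.ATOMIC_MOVE);
                } catch (IOException restoreFailure) {
                    e.addSuppressed(restoreFailure);
                }
            }
            throw new UncheckedIOException(e);
        }
        try {
            deleteListItems(renamed);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```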

taojing2002 reviewed Feb 9, 2024

src/main/java/org/dataone/hashstore/ObjectMetadata.java (outdated, resolved)

taojing2002 reviewed Feb 9, 2024

src/main/java/org/dataone/hashstore/filehashstore/FileHashStore.java (resolved)

taojing2002 reviewed Feb 9, 2024

src/main/java/org/dataone/hashstore/filehashstore/FileHashStore.java (resolved)

doulikecookiedough added 8 commits February 15, 2024 09:34

Add TODO item

786f81e

Fix inaccurate javadoc return description in 'ObjectMetadata' class

fd605a8

Swallow file already exists exception when creating directories in 'move' method

7a343cb

Synchronize 'deleteObjectByCid' method with other delete methods for thread safety

519f9b6

Fix typos in 'FileHashStore' class

6fb8abe

Remove TODO item

4c9348a

Improve clarity in comments for 'getExpectedPath' method

0c3fc66

Update comment formatting, revise javadoc and fix typo

5fa19f8

taojing2002 reviewed May 10, 2024

src/main/java/org/dataone/hashstore/filehashstore/FileHashStore.java (outdated, resolved)

doulikecookiedough added 9 commits May 10, 2024 10:48

Move initial synchronization block to within try statement to prevent potential deadlock

bc0be54

Revise comments and extract 'checkObjectEquality' method from FileHashStore to utility class

1a3835d

Refactor 'deleteObject' to delete metadata in respective blocks for scenarios

9aef583

Refactor 'tagObject' to synchronize based on new array list 'referenceLockedPids'

645d3e7

Refactor 'deleteObject' with pids to synchronize based on pids shared with 'tagObject'

644eae5

Fix bug in 'getExpectedPath' where metadata document id was not correctly formed (was only using formatId, instead of pid + formatId), fix affected test and rename variables for improved clarity

14062e5

Add comments to catch blocks in 'deleteObject' to help with debugging

3189118

Revert 'deleteObject' order of operations to debug changes

a76fe7a

Revert changes to synchronization for tagObject and deleteObject for debugging

6261abc

doulikecookiedough added 5 commits August 13, 2024 14:43

Refactor 'ObjectMetadata' to be a record instead of a custom class, and revise all junit tests and affected code

906ea73

Cleanup 'ObjectMetadata' class

ce07581

Add new record 'objectInfo', refactor 'findObject' to return an 'objectInfo' object and update junit tests

5bce0ce

Refactor 'checkForNotEmptyAndValidString' to call '.isBlank()' instead of .trim() and then .isEmpty()

75ba6c0

Refactor 'HashStoreRunnable' run's switch case per formatter suggestion

f5ca23d
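Commit 75ba6c0 swaps .trim().isEmpty() for .isBlank(). The two checks are not identical: trim() only strips code points up to U+0020, while isBlank() relies on Character.isWhitespace. A small sketch (the class and method names are illustrative) of where they differ:

```java
// Compares the old and new blank checks from commit 75ba6c0; names are illustrative only.
final class BlankCheckSketch {

    private BlankCheckSketch() {}

    // trim() strips only code points <= U+0020 (ASCII space and control characters)
    static boolean blankByTrim(String s) {
        return s.trim().isEmpty();
    }

    // isBlank() uses Character.isWhitespace, which also covers Unicode spaces such as U+2003
    static boolean blankByIsBlank(String s) {
        return s.isBlank();
    }
}
```

For ordinary ASCII input the two agree; an em space (U+2003) counts as blank only for isBlank(), and a NUL character counts as blank only for trim().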

artntek requested changes Aug 15, 2024

doulikecookiedough added 10 commits August 15, 2024 08:11

Update README.md

84846f4

Update javadocs in 'HashStore' interface

f369441

Refactor and simplify usage of enum objects 'HashStoreIdTypes' and 'HashStoreRefUpdateTypes'

dc6a6fb

Add missing javadocs for enum objects to add clarity

7ba3b73

Rename 'objectInfo' record to 'ObjectInfo'

1eabe79

Rename references of 'deleteInvalidObject' to 'deleteIfInvalidObject'

0bd9a3f

Refactor 'tagObject' by moving synchronization code to be closer to code requiring it in 'storeHashStoreRefsFiles'

189eaf3

Refactor 'checkForNotEmptyAndValidString' to get method name via thread and update signature to remove 'method' argument

8ceb641

Refactor 'ensureNotNull' to get method name via thread and update signature to remove 'method' argument

b94eb2c

Refactor 'checkPositive' to get method name via thread and update signature to remove 'method' argument

4561f79

artntek requested changes Aug 15, 2024

doulikecookiedough added 2 commits August 15, 2024 13:27

Fix typo in 'HashStore' interface

962ed5b

Revise comments in 'tagObject'

a9fce58

Move synchronized call to within try statement in 'unTagObject' to improve flow

6a0c32b

artntek reviewed Aug 15, 2024

artntek approved these changes Aug 15, 2024

Apply IntelliJ automatic formatting to entire codebase for linting consistenty

d61d9c3

doulikecookiedough merged commit 21bcd62 into develop Aug 16, 2024
1 check passed

doulikecookiedough mentioned this pull request Aug 19, 2024

Refactor deleteObject for new requirements #57

Closed
