diff --git a/README.md b/README.md
index b9cf3cb4..3cac7b18 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,142 @@ DataONE in general, and HashStore in particular, are open source, community proj
Documentation is a work in progress, and can be found on the [Metacat repository](https://github.com/NCEAS/metacat/blob/feature-1436-storage-and-indexing/docs/user/metacat/source/storage-subsystem.rst#physical-file-layout) as part of the storage redesign planning. Future updates will include documentation here as the package matures.
+## HashStore Overview
+
+HashStore is a content-addressable file management system that uses the content identifier of an object to address files. The system stores objects, references (refs) and metadata in their respective directories and provides an API for interacting with the store. HashStore storage classes (like `FileHashStore`) must implement the HashStore interface to ensure the expected usage of HashStore.
+
+###### Public API Methods
+- storeObject
+- verifyObject
+- tagObject
+- findObject
+- storeMetadata
+- retrieveObject
+- retrieveMetadata
+- deleteObject
+- deleteMetadata
+- getHexDigest
+
+For details, please see the HashStore interface (`HashStore.java`).
+
+
+###### How do I create a HashStore?
+
+To create or interact with a HashStore, instantiate a HashStore object with the following set of properties:
+- storePath
+- storeDepth
+- storeWidth
+- storeAlgorithm
+- storeMetadataNamespace
+
+```java
+String classPackage = "org.dataone.hashstore.filehashstore.FileHashStore";
+Path rootDirectory = tempFolder.resolve("metacat");
+
+Properties storeProperties = new Properties();
+storeProperties.setProperty("storePath", rootDirectory.toString());
+storeProperties.setProperty("storeDepth", "3");
+storeProperties.setProperty("storeWidth", "2");
+storeProperties.setProperty("storeAlgorithm", "SHA-256");
+storeProperties.setProperty(
+ "storeMetadataNamespace", "http://ns.dataone.org/service/types/v2.0"
+);
+
+// Instantiate a HashStore
+HashStore hashStore = HashStoreFactory.getHashStore(classPackage, storeProperties);
+
+// Store an object
+hashStore.storeObject(stream, pid);
+// ...
+```
+
+
+###### Working with objects (store, retrieve, delete)
+
+In HashStore, objects are first saved as temporary files while their content identifiers are calculated. Once the hashes for the default list of hash algorithms are generated, each object is moved to its permanent location, which is derived from the hash value for the store's algorithm, the store depth and the store width. Lastly, reference files are created for the object so that it can be found and retrieved given an identifier (ex. persistent identifier (pid)). Note: An object is stored once and only once.
+
+By calling the various interface methods for `storeObject`, the calling app/client can validate, store and tag an object simultaneously if the relevant data is available. In the absence of an identifier (ex. persistent identifier (pid)), `storeObject` can be called to solely store an object. The client is then expected to call `verifyObject` when the relevant metadata is available to confirm that the object has been stored as expected. To finalize the process (to make the object discoverable), the client calls `tagObject`. In summary, there are two expected paths to store an object:
+```java
+// All-in-one process which stores, validates and tags an object
+ObjectMetadata objInfo = storeObject(InputStream, pid, additionalAlgorithm, checksum, checksumAlgorithm, objSize)
+
+// Manual Process
+// Store object
+ObjectMetadata objInfo = storeObject(InputStream)
+// Validate object, throws exceptions if there is a mismatch and deletes the associated file
+verifyObject(objInfo, checksum, checksumAlgorithm, objSize)
+// Tag object, makes the object discoverable (find, retrieve, delete)
+tagObject(pid, cid)
+```
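The manual path above relies on `verifyObject` comparing a caller-supplied checksum and size against what was recorded when the object was stored. The following is a minimal sketch of that comparison only, assuming the recorded digests are available as a map keyed by algorithm name. The class `VerifySketch`, the method `verify` and its parameters are hypothetical and not part of the HashStore API; the case-insensitive checksum comparison is also an assumption, and the real method additionally deletes the file on a mismatch.

```java
import java.util.HashMap;
import java.util.Map;

class VerifySketch {
    // Hypothetical helper: throws if the expected checksum or size does not match
    // what was recorded for the stored object.
    static void verify(
        Map<String, String> hexDigests, long storedSize, String checksum,
        String checksumAlgorithm, long expectedSize
    ) {
        String storedChecksum = hexDigests.get(checksumAlgorithm);
        if (storedChecksum == null) {
            throw new IllegalArgumentException("No digest recorded for " + checksumAlgorithm);
        }
        // Assumption: hex digests are compared without regard to letter case
        if (!storedChecksum.equalsIgnoreCase(checksum)) {
            throw new IllegalArgumentException("Checksum mismatch for " + checksumAlgorithm);
        }
        if (storedSize != expectedSize) {
            throw new IllegalArgumentException(
                "Size mismatch: " + storedSize + " != " + expectedSize
            );
        }
    }

    public static void main(String[] args) {
        Map<String, String> hexDigests = new HashMap<>();
        hexDigests.put("MD5", "5d41402abc4b2a76b9719d911017c592"); // MD5 of "hello"
        verify(hexDigests, 5L, "5d41402abc4b2a76b9719d911017c592", "MD5", 5L);
        System.out.println("Object verified");
    }
}
```

Only after a check like this succeeds would the client go on to call `tagObject`.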
+
+**How do I retrieve an object if I have the pid?**
+- To retrieve an object, call the Public API method `retrieveObject` which opens a stream to the object if it exists.
+
+**How do I find an object or check that it exists if I have the pid?**
+- To find the location of the object, call the Public API method `findObject` which will return the content identifier (cid) of the object.
+- This cid can then be used to locate the object on disk by following HashStore's store configuration.
+
+**How do I delete an object if I have the pid?**
+- To delete an object, call the Public API method `deleteObject` which will delete the object and its associated references and reference files where relevant.
+- Note, `deleteObject` and `tagObject` calls are synchronized on their content identifier values so that the shared reference files are not unintentionally modified concurrently. An object that is in the process of being deleted should not be tagged, and vice versa. These calls have been implemented to occur sequentially to improve clarity in the event of an unexpected conflict or issue.
+
+
+###### Working with metadata (store, retrieve, delete)
+
+HashStore's '/metadata' directory holds all metadata for objects stored in HashStore. To differentiate between metadata documents for a given object, HashStore includes the 'formatId' (format or namespace of the metadata) when generating the address of the metadata document to store (the hash of the 'pid' + 'formatId'). By default, calling `storeMetadata` will use HashStore's default metadata namespace as the 'formatId' when storing metadata. Should the calling app wish to store multiple metadata files about an object, the client app is expected to provide a 'formatId' that represents an object format for the metadata type (ex. `storeMetadata(stream, pid, formatId)`).
+
+**How do I retrieve a metadata file?**
+- To retrieve a metadata object, call the Public API method `retrieveMetadata`, which returns a stream to the metadata file stored with the default metadata namespace, if it exists.
+- If there are multiple metadata objects, a 'formatId' must be specified when calling `retrieveMetadata` (ex. `retrieveMetadata(pid, formatId)`)
+
+**How do I delete a metadata file?**
+- Like `retrieveMetadata`, call the Public API method `deleteMetadata` which will delete the metadata object associated with the given pid.
+- If there are multiple metadata objects, a 'formatId' must be specified when calling `deleteMetadata` to ensure the expected metadata object is deleted.
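To illustrate why a 'formatId' is enough to tell metadata documents for the same pid apart, here is a minimal sketch of deriving a metadata identifier as the hash of the 'pid' + 'formatId'. It assumes plain string concatenation, UTF-8 encoding and SHA-256 as the store algorithm; the class and method names, the example pid and the second namespace are hypothetical, and the actual implementation may differ in these details.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class MetadataAddressSketch {
    // Hypothetical helper: hex digest of pid + formatId (assumed concatenation order)
    static String metadataId(String pid, String formatId, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] digest = md.digest((pid + formatId).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        String pid = "testpid1";
        // Same pid, two different namespaces: two distinct metadata identifiers
        System.out.println(
            metadataId(pid, "http://ns.dataone.org/service/types/v2.0", "SHA-256")
        );
        System.out.println(metadataId(pid, "http://example.org/other-format", "SHA-256"));
    }
}
```

Because the 'formatId' is part of the hashed input, storing a second document for the same pid under a different namespace does not overwrite the first.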
+
+
+###### What are HashStore reference files?
+
+HashStore assumes that every object to store has a respective identifier. This identifier is then used when storing, retrieving and deleting an object. In order to facilitate this process, we create two types of reference files:
+- pid (persistent identifier) reference files
+- cid (content identifier) reference files
+
+These reference files are implemented in HashStore under the hood with no expectation for modification from the calling app/client. The one and only exception to this process is when the calling client/app does not have an identifier, and solely stores an object's raw bytes in HashStore (calling `storeObject(InputStream)`).
+
+**'pid' Reference Files**
+- Pid (persistent identifier) reference files are created when storing an object with an identifier.
+- Pid reference files are located in HashStore's '/refs/pid' directory
+- If an identifier is not available at the time of storing an object, the calling app/client must create this association between a pid and the object it represents by calling `tagObject` separately.
+- Each pid reference file contains a string that represents the content identifier of the object it references
+- Like how objects are stored once and only once, there is also only one pid reference file for each object.
+
+**'cid' Reference Files**
+- Cid (content identifier) reference files are created at the same time as pid reference files when storing an object with an identifier.
+- Cid reference files are located in HashStore's '/refs/cid' directory
+- A cid reference file is a list of all the pids that reference a cid, delimited by a new line ("\n") character
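As a rough illustration of the cid reference file format described above, the sketch below reads newline-delimited content into a list of pids and removes one pid from it. It works on in-memory strings only; the class `CidRefsSketch` and its methods are hypothetical and do not reflect how `FileHashStore` reads or updates these files on disk.

```java
import java.util.ArrayList;
import java.util.List;

class CidRefsSketch {
    // Split the content of a cid reference file into pids, one per line
    static List<String> parsePids(String content) {
        List<String> pids = new ArrayList<>();
        for (String line : content.split("\n")) {
            if (!line.isEmpty()) {
                pids.add(line);
            }
        }
        return pids;
    }

    // Return the file content with the given pid removed (unchanged if it is absent)
    static String removePid(String content, String pid) {
        List<String> pids = parsePids(content);
        pids.remove(pid);
        return pids.isEmpty() ? "" : String.join("\n", pids) + "\n";
    }

    public static void main(String[] args) {
        String content = "testpid1\ntestpid2\n";
        System.out.println(parsePids(content).contains("testpid2"));
        System.out.print(removePid(content, "testpid1"));
    }
}
```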
+
+
+###### What does HashStore look like?
+
+```
+# Example layout in HashStore with a single file stored along with its metadata and reference files.
+# This uses a store depth of 3, with a width of 2 and "SHA-256" as its default store algorithm
+## Notes:
+## - Objects are stored using their content identifier as the file address
+## - The reference file for each pid contains a single cid
+## - The reference file for each cid contains multiple pids each on its own line
+
+.../metacat/hashstore/
+└─ objects
+ └─ /d5/95/3b/d802fa74edea72eb941...00d154a727ed7c2
+└─ metadata
+ └─ /15/8d/7e/55c36a810d7c14479c9...b20d7df66768b04
+└─ refs
+ └─ pid/0d/55/5e/d77052d7e166017f779...7230bcf7abcef65e
+ └─ cid/d5/95/3b/d802fa74edea72eb941...00d154a727ed7c2
+hashstore.yaml
+```
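The relative paths in the layout above follow from splitting a hex digest into directory tokens. Below is a minimal sketch of that sharding step, assuming a depth of 3 and a width of 2 as in the example; the class `ShardSketch` and its methods are hypothetical names, not the HashStore implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

class ShardSketch {
    // Hex-encode the digest of the given bytes using the named algorithm
    static String hexDigest(byte[] data, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
    }

    // Split a digest into `depth` tokens of `width` characters, followed by the remainder
    static String shard(int depth, int width, String digest) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < depth; i++) {
            tokens.add(digest.substring(i * width, (i + 1) * width));
        }
        tokens.add(digest.substring(depth * width));
        return String.join("/", tokens);
    }

    public static void main(String[] args) {
        // The cid is the digest of the object's content under the store algorithm
        String cid = hexDigest("hello".getBytes(StandardCharsets.UTF_8), "SHA-256");
        System.out.println("objects/" + shard(3, 2, cid));
    }
}
```

With these settings, a 64-character SHA-256 digest becomes three two-character directories followed by a 58-character file name.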
+
+
## Development build
HashStore is a Java package, and built using the [Maven](https://maven.apache.org/) build tool.
@@ -44,6 +180,9 @@ $ java -cp ./target/hashstore-1.0-SNAPSHOT.jar org.dataone.hashstore.HashStoreCl
# Get the checksum of a data object
$ java -cp ./target/hashstore-1.0-SNAPSHOT.jar org.dataone.hashstore.HashStoreClient -store /path/to/store -getchecksum -pid testpid1 -algo SHA-256
+# Find an object in HashStore (returns its content identifier if it exists)
+$ java -cp ./target/hashstore-1.0-SNAPSHOT.jar org.dataone.hashstore.HashStoreClient -store /path/to/store -findobject -pid testpid1
+
# Store a data object
$ java -cp ./target/hashstore-1.0-SNAPSHOT.jar org.dataone.hashstore.HashStoreClient -store /path/to/store -storeobject -path /path/to/data.ext -pid testpid1
diff --git a/pom.xml b/pom.xml
index 796c5bae..532e1c7b 100644
--- a/pom.xml
+++ b/pom.xml
@@ -14,8 +14,8 @@
UTF-8
- 1.8
- 1.8
+ 17
+ 17
diff --git a/src/main/java/org/dataone/hashstore/HashStore.java b/src/main/java/org/dataone/hashstore/HashStore.java
index 0693d1b3..98b6dd5c 100644
--- a/src/main/java/org/dataone/hashstore/HashStore.java
+++ b/src/main/java/org/dataone/hashstore/HashStore.java
@@ -5,39 +5,48 @@
import java.io.InputStream;
import java.security.NoSuchAlgorithmException;
-import org.dataone.hashstore.exceptions.PidObjectExistsException;
+import org.dataone.hashstore.exceptions.OrphanPidRefsFileException;
+import org.dataone.hashstore.exceptions.PidNotFoundInCidRefsFileException;
+import org.dataone.hashstore.exceptions.PidRefsFileExistsException;
/**
- * HashStore is a content-addressable file management system that utilizes the hash/hex digest of a
- * given persistent identifier (PID) to address files. The system stores both objects and metadata
- * in its respective directories and provides an API for interacting with the store. HashStore
- * storage classes (like `FileHashStore`) must implement the HashStore interface to ensure proper
+ * HashStore is a content-addressable file management system that utilizes the content identifier of
+ * an object to address files. The system stores both objects, references (refs) and metadata in its
+ * respective directories and provides an API for interacting with the store. HashStore storage
+ * classes (like `FileHashStore`) must implement the HashStore interface to ensure the expected
* usage of the system.
*/
public interface HashStore {
/**
- * Atomically stores objects to HashStore using a given InputStream and a persistent
- * identifier (pid). Upon successful storage, the method returns an 'ObjectInfo' object
- * containing the object's file information, such as the id, file size, and hex digest map
- * of algorithms and hex digests/checksums. An object is stored once and only once - and
- * `storeObject` also enforces this rule by synchronizing multiple calls and rejecting calls
- * to store duplicate objects.
+ * The `storeObject` method is responsible for the atomic storage of objects to disk using a
+ * given InputStream. Upon successful storage, the method returns an `ObjectMetadata` object
+ * containing relevant file information, such as the file's id (which can be used to locate
+ * the object on disk), the file's size, and a hex digest map of algorithms and checksums.
+ * Storing an object with `storeObject` also tags the object (creating references), which
+ * allows the object to be discoverable.
*
- * The file's id is determined by calculating the SHA-256 hex digest of the provided pid,
- * which is also used as the permanent address of the file. The file's identifier is then
- * sharded using a depth of 3 and width of 2, delimited by '/' and concatenated to produce
- * the final permanent address, which is stored in the object store directory (ex.
- * `./[storePath]/objects/`).
+ * `storeObject` also ensures that an object is stored only once by synchronizing multiple
+ * calls and rejecting calls to store duplicate objects. Note, calling `storeObject` without
+ * a pid is a possibility, but should only store the object without tagging the object. It
+ * is then the caller's responsibility to finalize the process by calling `tagObject` after
+ * verifying the correct object is stored.
+ *
+ * The file's id is determined by calculating the object's content identifier based on the
+ * store's default algorithm, which is also used as the permanent address of the file. The
+ * file's identifier is then sharded using the store's configured depth and width, delimited
+ * by '/' and concatenated to produce the final permanent address and is stored in the
+ * `./[storePath]/objects/` directory.
*
* By default, the hex digest map includes the following hash algorithms: MD5, SHA-1,
- * SHA-256, SHA-384 and SHA-512, which are the most commonly used algorithms in dataset
+ * SHA-256, SHA-384, SHA-512 - which are the most commonly used algorithms in dataset
* submissions to DataONE and the Arctic Data Center. If an additional algorithm is
- * provided, the `storeObject` method checks if it is supported and adds it to the map along
- * with its corresponding hex digest. An algorithm is considered "supported" if it is
- * recognized as a valid hash algorithm in the `java.security.MessageDigest` class.
+ * provided, the `storeObject` method checks if it is supported and adds it to the hex
+ * digest map along with its corresponding hex digest. An algorithm is considered
+ * "supported" if it is recognized as a valid hash algorithm in the
+ * `java.security.MessageDigest` class.
*
- * Similarly, if a checksum and a checksumAlgorithm or an object size value is provided,
- * `storeObject` validates the object to ensure it matches what is provided before moving
+ * Similarly, if a file size and/or checksum & checksumAlgorithm value are provided,
+ * `storeObject` validates the object to ensure it matches the given arguments before moving
* the file to its permanent address.
*
* @param object Input stream to file
@@ -46,39 +55,114 @@ public interface HashStore {
* @param checksum Value of checksum to validate against
* @param checksumAlgorithm Algorithm of checksum submitted
* @param objSize Expected size of object to validate after storing
- * @return ObjectInfo object encapsulating file information
- * @throws NoSuchAlgorithmException When additionalAlgorithm or checksumAlgorithm is invalid
- * @throws IOException I/O Error when writing file, generating checksums and/or
- * moving file
- * @throws PidObjectExistsException When duplicate pid object is found
- * @throws RuntimeException Thrown when there is an issue with permissions, illegal
- * arguments (ex. empty pid) or null pointers
- */
- ObjectInfo storeObject(
+ * @return ObjectMetadata object encapsulating file information
+ * @throws NoSuchAlgorithmException When additionalAlgorithm or checksumAlgorithm is
+ * invalid
+ * @throws IOException I/O Error when writing file, generating checksums
+ * and/or moving file
+ * @throws PidRefsFileExistsException If a pid refs file already exists, meaning the pid is
+ * already referencing a file.
+ * @throws RuntimeException Thrown when there is an issue with permissions,
+ * illegal arguments (ex. empty pid) or null pointers
+ * @throws InterruptedException When tagging pid and cid process is interrupted
+ */
+ public ObjectMetadata storeObject(
InputStream object, String pid, String additionalAlgorithm, String checksum,
String checksumAlgorithm, long objSize
- ) throws NoSuchAlgorithmException, IOException, PidObjectExistsException, RuntimeException;
+ ) throws NoSuchAlgorithmException, IOException, PidRefsFileExistsException,
+ RuntimeException, InterruptedException;
+
+ /**
+ * @see #storeObject(InputStream, String, String, String, String, long)
+ */
+ public ObjectMetadata storeObject(InputStream object) throws NoSuchAlgorithmException,
+ IOException, PidRefsFileExistsException, RuntimeException, InterruptedException;
/**
* @see #storeObject(InputStream, String, String, String, String, long)
*/
- ObjectInfo storeObject(
+ public ObjectMetadata storeObject(
+ InputStream object, String pid, String checksum, String checksumAlgorithm,
+ long objSize
+ ) throws NoSuchAlgorithmException, IOException, PidRefsFileExistsException,
+ RuntimeException, InterruptedException;
+
+ /**
+ * @see #storeObject(InputStream, String, String, String, String, long)
+ */
+ public ObjectMetadata storeObject(
InputStream object, String pid, String checksum, String checksumAlgorithm
- ) throws NoSuchAlgorithmException, IOException, PidObjectExistsException, RuntimeException;
+ ) throws NoSuchAlgorithmException, IOException, PidRefsFileExistsException,
+ RuntimeException, InterruptedException;
/**
* @see #storeObject(InputStream, String, String, String, String, long)
*/
- ObjectInfo storeObject(InputStream object, String pid, String additionalAlgorithm)
- throws NoSuchAlgorithmException, IOException, PidObjectExistsException,
- RuntimeException;
+ public ObjectMetadata storeObject(
+ InputStream object, String pid, String additionalAlgorithm
+ ) throws NoSuchAlgorithmException, IOException, PidRefsFileExistsException,
+ RuntimeException, InterruptedException;
/**
* @see #storeObject(InputStream, String, String, String, String, long)
*/
- ObjectInfo storeObject(InputStream object, String pid, long objSize)
- throws NoSuchAlgorithmException, IOException, PidObjectExistsException,
- RuntimeException;
+ public ObjectMetadata storeObject(InputStream object, String pid, long objSize)
+ throws NoSuchAlgorithmException, IOException, PidRefsFileExistsException,
+ RuntimeException, InterruptedException;
+
+ /**
+ * Creates references that allow objects stored in HashStore to be discoverable. Retrieving,
+ * deleting or calculating a hex digest of an object is based on a pid argument; and to
+ * proceed, we must be able to find the object associated with the pid.
+ *
+ * @param pid Authority-based identifier
+ * @param cid Content-identifier (hash identifier)
+ * @throws IOException Failure to create tmp file
+ * @throws PidRefsFileExistsException When pid refs file already exists
+ * @throws NoSuchAlgorithmException When algorithm used to calculate pid refs address
+ * does not exist
+ * @throws FileNotFoundException If refs file is missing during verification
+ * @throws InterruptedException When tagObject is waiting to execute but is
+ * interrupted
+ */
+ public void tagObject(String pid, String cid) throws IOException,
+ PidRefsFileExistsException, NoSuchAlgorithmException, FileNotFoundException,
+ InterruptedException;
+
+ /**
+ * Confirms that an ObjectMetadata's content is equal to the given values. If it is not
+ * equal, it will return false; otherwise, true.
+ *
+ * @param objectInfo ObjectMetadata object with values
+ * @param checksum Value of checksum to validate against
+ * @param checksumAlgorithm Algorithm of checksum submitted
+ * @param objSize Expected size of object to validate after storing
+ * @throws IOException An issue with deleting the object when there is a
+ * mismatch
+ * @throws NoSuchAlgorithmException If checksum algorithm (and its respective checksum) is
+ * not in objectInfo
+ * @throws IllegalArgumentException An expected value does not match
+ */
+ public boolean verifyObject(
+ ObjectMetadata objectInfo, String checksum, String checksumAlgorithm, long objSize
+ ) throws IOException, NoSuchAlgorithmException, IllegalArgumentException;
+
+ /**
+ * Checks whether an object referenced by a pid exists and returns the content identifier.
+ *
+ * @param pid Authority-based identifier
+ * @return Content identifier (cid)
+ * @throws NoSuchAlgorithmException When algorithm used to calculate pid refs
+ * file's absolute address is not valid
+ * @throws IOException Unable to read from a pid refs file or pid refs
+ * file does not exist
+ * @throws OrphanPidRefsFileException When pid refs file exists and the cid found
+ * inside does not exist.
+ * @throws PidNotFoundInCidRefsFileException When pid and cid ref files exist but the
+ * expected pid is not found in the cid refs file.
+ */
+ public String findObject(String pid) throws NoSuchAlgorithmException, IOException,
+ OrphanPidRefsFileException, PidNotFoundInCidRefsFileException;
/**
* Adds/updates metadata (ex. `sysmeta`) to the HashStore by using a given InputStream, a
@@ -101,14 +185,14 @@ ObjectInfo storeObject(InputStream object, String pid, long objSize)
* @throws NoSuchAlgorithmException Algorithm used to calculate permanent address is not
* supported
*/
- String storeMetadata(InputStream metadata, String pid, String formatId) throws IOException,
- IllegalArgumentException, FileNotFoundException, InterruptedException,
- NoSuchAlgorithmException;
+ public String storeMetadata(InputStream metadata, String pid, String formatId)
+ throws IOException, IllegalArgumentException, FileNotFoundException,
+ InterruptedException, NoSuchAlgorithmException;
/**
* @see #storeMetadata(InputStream, String, String)
*/
- String storeMetadata(InputStream metadata, String pid) throws IOException,
+ public String storeMetadata(InputStream metadata, String pid) throws IOException,
IllegalArgumentException, FileNotFoundException, InterruptedException,
NoSuchAlgorithmException;
@@ -123,7 +207,7 @@ String storeMetadata(InputStream metadata, String pid) throws IOException,
* @throws NoSuchAlgorithmException When algorithm used to calculate object address is not
* supported
*/
- InputStream retrieveObject(String pid) throws IllegalArgumentException,
+ public InputStream retrieveObject(String pid) throws IllegalArgumentException,
FileNotFoundException, IOException, NoSuchAlgorithmException;
/**
@@ -139,7 +223,14 @@ InputStream retrieveObject(String pid) throws IllegalArgumentException,
* @throws NoSuchAlgorithmException When algorithm used to calculate metadata address is not
* supported
*/
- InputStream retrieveMetadata(String pid, String formatId) throws IllegalArgumentException,
+ public InputStream retrieveMetadata(String pid, String formatId)
+ throws IllegalArgumentException, FileNotFoundException, IOException,
+ NoSuchAlgorithmException;
+
+ /**
+ * @see #retrieveMetadata(String, String)
+ */
+ public InputStream retrieveMetadata(String pid) throws IllegalArgumentException,
FileNotFoundException, IOException, NoSuchAlgorithmException;
/**
@@ -149,12 +240,27 @@ InputStream retrieveMetadata(String pid, String formatId) throws IllegalArgument
* @param pid Authority-based identifier
* @throws IllegalArgumentException When pid is null or empty
* @throws FileNotFoundException When requested pid has no associated object
- * @throws IOException I/O error when deleting empty directories
+ * @throws IOException I/O error when deleting empty directories,
+ * modifying/deleting reference files
* @throws NoSuchAlgorithmException When algorithm used to calculate object address is not
* supported
+ * @throws InterruptedException When deletion synchronization is interrupted
*/
- void deleteObject(String pid) throws IllegalArgumentException, FileNotFoundException,
- IOException, NoSuchAlgorithmException;
+ public void deleteObject(String pid) throws IllegalArgumentException, FileNotFoundException,
+ IOException, NoSuchAlgorithmException, InterruptedException;
+
+ /**
+ * Delete an object based on its content identifier, with a flag to confirm intention.
+ *
+ * Note: This overload method should only be called when an issue arises during the storage
+ * of an object without a pid, and after verifying (via `verifyObject`) that the object is
+ * not what is expected.
+ *
+ * @param cid Content identifier
+ * @param deleteCid Boolean to confirm
+ */
+ public void deleteObject(String cid, boolean deleteCid) throws IllegalArgumentException,
+ FileNotFoundException, IOException, NoSuchAlgorithmException;
/**
* Deletes a metadata document (ex. `sysmeta`) permanently from HashStore using a given
@@ -163,12 +269,17 @@ void deleteObject(String pid) throws IllegalArgumentException, FileNotFoundExcep
* @param pid Authority-based identifier
* @param formatId Metadata namespace/format
* @throws IllegalArgumentException When pid or formatId is null or empty
- * @throws FileNotFoundException When requested pid has no metadata
* @throws IOException I/O error when deleting empty directories
* @throws NoSuchAlgorithmException When algorithm used to calculate object address is not
* supported
*/
- void deleteMetadata(String pid, String formatId) throws IllegalArgumentException,
+ public void deleteMetadata(String pid, String formatId) throws IllegalArgumentException,
+ FileNotFoundException, IOException, NoSuchAlgorithmException;
+
+ /**
+ * @see #deleteMetadata(String, String)
+ */
+ public void deleteMetadata(String pid) throws IllegalArgumentException,
FileNotFoundException, IOException, NoSuchAlgorithmException;
/**
@@ -184,6 +295,6 @@ void deleteMetadata(String pid, String formatId) throws IllegalArgumentException
* @throws NoSuchAlgorithmException When algorithm used to calculate object address is not
* supported
*/
- String getHexDigest(String pid, String algorithm) throws IllegalArgumentException,
+ public String getHexDigest(String pid, String algorithm) throws IllegalArgumentException,
FileNotFoundException, IOException, NoSuchAlgorithmException;
}
diff --git a/src/main/java/org/dataone/hashstore/HashStoreClient.java b/src/main/java/org/dataone/hashstore/HashStoreClient.java
index c12100a5..ceb31efb 100644
--- a/src/main/java/org/dataone/hashstore/HashStoreClient.java
+++ b/src/main/java/org/dataone/hashstore/HashStoreClient.java
@@ -29,7 +29,7 @@
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;
import org.dataone.hashstore.exceptions.HashStoreFactoryException;
-import org.dataone.hashstore.exceptions.PidObjectExistsException;
+import org.dataone.hashstore.exceptions.PidRefsFileExistsException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory;
@@ -124,13 +124,14 @@ public static void main(String[] args) throws Exception {
String objType = cmd.getOptionValue("stype");
String originDirectory = cmd.getOptionValue("sdir");
String numObjects = cmd.getOptionValue("nobj");
+ String sizeOfFilesToSkip = cmd.getOptionValue("gbskip");
FileHashStoreUtility.ensureNotNull(objType, "-stype", "HashStoreClient");
FileHashStoreUtility.ensureNotNull(originDirectory, "-sdir", "HashStoreClient");
FileHashStoreUtility.ensureNotNull(
action, "-sts, -rav, -dfs", "HashStoreClient"
);
- testWithKnbvm(action, objType, originDirectory, numObjects);
+ testWithKnbvm(action, objType, originDirectory, numObjects, sizeOfFilesToSkip);
} else if (cmd.hasOption("getchecksum")) {
String pid = cmd.getOptionValue("pid");
@@ -141,6 +142,13 @@ public static void main(String[] args) throws Exception {
String hexDigest = hashStore.getHexDigest(pid, algo);
System.out.println(hexDigest);
+ } else if (cmd.hasOption("findobject")) {
+ String pid = cmd.getOptionValue("pid");
+ FileHashStoreUtility.ensureNotNull(pid, "-pid", "HashStoreClient");
+
+ String cid = hashStore.findObject(pid);
+ System.out.println(cid);
+
} else if (cmd.hasOption("storeobject")) {
System.out.println("Storing object");
String pid = cmd.getOptionValue("pid");
@@ -160,7 +168,7 @@ public static void main(String[] args) throws Exception {
if (cmd.hasOption("checksum_algo")) {
checksum_algo = cmd.getOptionValue("checksum_algo");
}
- long size = 0;
+ long size;
if (cmd.hasOption("size")) {
size = Long.parseLong(cmd.getOptionValue("size"));
} else {
@@ -168,7 +176,7 @@ public static void main(String[] args) throws Exception {
}
InputStream pidObjStream = Files.newInputStream(path);
- ObjectInfo objInfo = hashStore.storeObject(
+ ObjectMetadata objInfo = hashStore.storeObject(
pidObjStream, pid, additional_algo, checksum, checksum_algo, size
);
pidObjStream.close();
@@ -274,6 +282,10 @@ private static Options addHashStoreClientOptions() {
"getchecksum", "client_getchecksum", false,
"Flag to get the hex digest of a data object in a HashStore."
);
+ options.addOption(
+ "findobject", "client_findobject", false,
+ "Flag to find the content identifier (cid) of a data object in a HashStore."
+ );
options.addOption(
"storeobject", "client_storeobject", false, "Flag to store objs to a HashStore."
);
@@ -316,9 +328,12 @@ private static Options addHashStoreClientOptions() {
"knbvm", "knbvmtestadc", false, "(knbvm) Flag to specify testing with knbvm."
);
options.addOption(
- "nobj", "numberofobj", false,
+ "nobj", "numberofobj", true,
"(knbvm) Option to specify number of objects to retrieve from a Metacat db."
);
+ options.addOption(
+ "gbskip", "gbsizetoskip", true, "(knbvm) Option to specify the size of objects to skip."
+ );
options.addOption(
"sdir", "storedirectory", true,
"(knbvm) Option to specify the directory of objects to convert."
@@ -435,14 +450,17 @@ private static void initializeHashStore(Path storePath) throws HashStoreFactoryE
/**
* Entry point for working with test data found in knbvm (test.arcticdata.io)
*
- * @param actionFlag String representing a knbvm test-related method to call.
- * @param objType "data" (objects) or "documents" (metadata).
- * @param numObjects Number of rows to retrieve from metacat db,
- * if null, will retrieve all rows.
+ * @param actionFlag String representing a knbvm test-related method to call.
+ * @param objType "object" or "metadata".
+ * @param originDir Directory path of given objType
+ * @param numObjects Number of rows to retrieve from metacat db,
+ * if null, will retrieve all rows.
+ * @param sizeOfFilesToSkip Size of files in GB to skip
* @throws IOException Related to accessing config files or objects
*/
private static void testWithKnbvm(
- String actionFlag, String objType, String originDir, String numObjects
+ String actionFlag, String objType, String originDir, String numObjects,
+ String sizeOfFilesToSkip
) throws IOException {
// Load metacat db yaml
// Note: In order to test with knbvm, you must manually create a `pgdb.yaml` file with the
@@ -464,15 +482,22 @@ private static void testWithKnbvm(
try {
System.out.println("Connecting to metacat db.");
+ if (!objType.equals("object")) {
+ if (!objType.equals("metadata")) {
+ String errMsg = "HashStoreClient - objType must be 'object' or 'metadata'";
+ throw new IllegalArgumentException(errMsg);
+ }
+ }
+
// Setup metacat db access
Class.forName("org.postgresql.Driver"); // Force driver to register itself
Connection connection = DriverManager.getConnection(url, user, password);
Statement statement = connection.createStatement();
String sqlQuery = "SELECT identifier.guid, identifier.docid, identifier.rev,"
+ " systemmetadata.object_format, systemmetadata.checksum,"
- + " systemmetadata.checksum_algorithm FROM identifier INNER JOIN systemmetadata"
- + " ON identifier.guid = systemmetadata.guid ORDER BY identifier.guid"
- + sqlLimitQuery + ";";
+ + " systemmetadata.checksum_algorithm, systemmetadata.size FROM identifier"
+ + " INNER JOIN systemmetadata ON identifier.guid = systemmetadata.guid"
+ + " ORDER BY identifier.guid" + sqlLimitQuery + ";";
ResultSet resultSet = statement.executeQuery(sqlQuery);
// For each row, get guid, docid, rev, checksum and checksum_algorithm
@@ -486,26 +511,32 @@ private static void testWithKnbvm(
String checksumAlgorithm = resultSet.getString("checksum_algorithm");
String formattedChecksumAlgo = formatAlgo(checksumAlgorithm);
String formatId = resultSet.getString("object_format");
-
- if (!objType.equals("object")) {
- if (!objType.equals("metadata")) {
- String errMsg = "HashStoreClient - objType must be 'object' or 'metadata'";
- throw new IllegalArgumentException(errMsg);
+ long setItemSize = resultSet.getLong("size");
+
+ boolean skipFile = false;
+ if (sizeOfFilesToSkip != null) {
+ // Calculate the size of requested gb to skip in bytes
+ long gbFilesToSkip = Integer.parseInt(sizeOfFilesToSkip) * (1024L * 1024
+ * 1024);
+ if (setItemSize > gbFilesToSkip) {
+ skipFile = true;
}
}
- Path setItemFilePath = Paths.get(originDir + "/" + docid + "." + rev);
- if (Files.exists(setItemFilePath)) {
- System.out.println(
- "File exists (" + setItemFilePath + ")! Adding to resultObjList."
- );
- Map resultObj = new HashMap<>();
- resultObj.put("pid", guid);
- resultObj.put("algorithm", formattedChecksumAlgo);
- resultObj.put("checksum", checksum);
- resultObj.put("path", setItemFilePath.toString());
- resultObj.put("namespace", formatId);
- resultObjList.add(resultObj);
+ if (!skipFile) {
+ Path setItemFilePath = Paths.get(originDir + "/" + docid + "." + rev);
+ if (Files.exists(setItemFilePath)) {
+ System.out.println(
+ "File exists (" + setItemFilePath + ")! Adding to resultObjList."
+ );
+ Map resultObj = new HashMap<>();
+ resultObj.put("pid", guid);
+ resultObj.put("algorithm", formattedChecksumAlgo);
+ resultObj.put("checksum", checksum);
+ resultObj.put("path", setItemFilePath.toString());
+ resultObj.put("namespace", formatId);
+ resultObjList.add(resultObj);
+ }
}
}
@@ -558,10 +589,12 @@ private static void storeObjsWithChecksumFromDb(List