diff --git a/README.md b/README.md
index b9cf3cb4..3cac7b18 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,142 @@ DataONE in general, and HashStore in particular, are open source, community proj

 Documentation is a work in progress, and can be found on the [Metacat repository](https://github.com/NCEAS/metacat/blob/feature-1436-storage-and-indexing/docs/user/metacat/source/storage-subsystem.rst#physical-file-layout) as part of the storage redesign planning. Future updates will include documentation here as the package matures.

+## HashStore Overview
+
+HashStore is a content-addressable file management system that utilizes the content identifier of an object to address files. The system stores objects, references (refs) and metadata in their respective directories and provides an API for interacting with the store. HashStore storage classes (like `FileHashStore`) must implement the HashStore interface to ensure the expected usage of HashStore.
+
+###### Public API Methods
+- storeObject
+- verifyObject
+- tagObject
+- findObject
+- storeMetadata
+- retrieveObject
+- retrieveMetadata
+- deleteObject
+- deleteMetadata
+- getHexDigest
+
+For details, please see the HashStore interface (HashStore.java).
+
+
+###### How do I create a HashStore?
+
+To create or interact with a HashStore, instantiate a HashStore object with the following set of properties:
+- storePath
+- storeDepth
+- storeWidth
+- storeAlgorithm
+- storeMetadataNamespace
+
+```java
+String classPackage = "org.dataone.hashstore.filehashstore.FileHashStore";
+Path rootDirectory = Paths.get("/path/to/metacat");
+
+Properties storeProperties = new Properties();
+storeProperties.setProperty("storePath", rootDirectory.toString());
+storeProperties.setProperty("storeDepth", "3");
+storeProperties.setProperty("storeWidth", "2");
+storeProperties.setProperty("storeAlgorithm", "SHA-256");
+storeProperties.setProperty(
+    "storeMetadataNamespace", "http://ns.dataone.org/service/types/v2.0"
+);
+
+// Instantiate a HashStore
+HashStore hashStore = HashStoreFactory.getHashStore(classPackage, storeProperties);
+
+// Store an object
+hashStore.storeObject(stream, pid);
+// ...
+```
+
+
+###### Working with objects (store, retrieve, delete)
+
+In HashStore, objects are first saved as temporary files while their content identifiers are calculated. Once the hashes for the default algorithm list are generated, objects are moved to their permanent location, which is derived from the hash value of the store's algorithm, the store depth and the store width. Lastly, reference files are created for the object so that it can be found and retrieved given an identifier (ex. persistent identifier (pid)). Note: Objects are stored once and only once.
+
+By calling the various `storeObject` overloads, the calling app/client can store, validate and tag an object in a single call if the relevant data is available. In the absence of an identifier (ex. persistent identifier (pid)), `storeObject` can be called to solely store an object. The client is then expected to call `verifyObject` when the relevant metadata is available to confirm that the object has been stored as expected.
+To finalize the process (and make the object discoverable), the client then calls `tagObject`. In summary, there are two expected paths to store an object:
+```java
+// All-in-one process which stores, validates and tags an object
+ObjectMetadata objInfo = hashStore.storeObject(stream, pid, additionalAlgorithm, checksum, checksumAlgorithm, objSize);
+
+// Manual Process
+// Store object
+ObjectMetadata objInfo = hashStore.storeObject(stream);
+// Validate object, throws exceptions if there is a mismatch and deletes the associated file
+hashStore.verifyObject(objInfo, checksum, checksumAlgorithm, objSize);
+// Tag object, makes the object discoverable (find, retrieve, delete); cid is the stored object's content identifier
+hashStore.tagObject(pid, cid);
+```
+
+**How do I retrieve an object if I have the pid?**
+- To retrieve an object, call the Public API method `retrieveObject`, which opens a stream to the object if it exists.
+
+**How do I find an object or check that it exists if I have the pid?**
+- To find the location of the object, call the Public API method `findObject`, which returns the content identifier (cid) of the object.
+- This cid can then be used to locate the object on disk by following HashStore's store configuration.
+
+**How do I delete an object if I have the pid?**
+- To delete an object, call the Public API method `deleteObject`, which deletes the object and its associated references and reference files where relevant.
+- Note: `deleteObject` and `tagObject` calls are synchronized on their content identifier values so that the shared reference files are not unintentionally modified concurrently. An object that is in the process of being deleted should not be tagged, and vice versa. These calls have been implemented to occur sequentially to improve clarity in the event of an unexpected conflict or issue.
+
+
+###### Working with metadata (store, retrieve, delete)
+
+HashStore's '/metadata' directory holds all metadata for objects stored in HashStore.
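HashStore addresses a metadata document by hashing the pid together with its 'formatId' and sharding the resulting hex digest by the store depth and width, in the same way objects are placed (compare the `/15/8d/7e/...` path in the example layout). The class below is a minimal, hypothetical sketch of that idea, not `FileHashStore`'s implementation. It assumes that "pid + formatId" means plain string concatenation, that strings are encoded as UTF-8, and that the store algorithm (SHA-256 here) is used for the digest. `MetadataAddressSketch`, `hexDigest`, `shard` and `metadataAddress` are illustrative names only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MetadataAddressSketch {

    // Hex-encodes the digest of a string; UTF-8 encoding is an assumption.
    static String hexDigest(String input, String algorithm) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Splits the first depth * width characters into `depth` directories of
    // `width` characters each; the remaining characters become the file name.
    static String shard(String hex, int depth, int width) {
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            path.append(hex, i * width, (i + 1) * width).append('/');
        }
        return path.append(hex.substring(depth * width)).toString();
    }

    // Hypothetical helper: relative path of a metadata document under /metadata,
    // assuming the address is the digest of pid + formatId (plain concatenation).
    static String metadataAddress(
        String pid, String formatId, String algorithm, int depth, int width
    ) throws NoSuchAlgorithmException {
        return shard(hexDigest(pid + formatId, algorithm), depth, width);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        String formatId = "http://ns.dataone.org/service/types/v2.0";
        // Depth 3, width 2 and SHA-256, matching the example store properties
        String address = metadataAddress("testpid1", formatId, "SHA-256", 3, 2);
        System.out.println("metadata/" + address);
    }
}
```

Because the formatId is part of the hashed input, two metadata documents for the same pid with different formatIds land at different addresses. That property is what lets several metadata types coexist for one object.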
+To differentiate between metadata documents for a given object, HashStore includes the 'formatId' (format or namespace of the metadata) when generating the address of the metadata document to store (the hash of the 'pid' + 'formatId'). By default, calling `storeMetadata` will use HashStore's default metadata namespace as the 'formatId' when storing metadata. Should the calling app wish to store multiple metadata files about an object, the client app is expected to provide a 'formatId' that represents an object format for the metadata type (ex. `storeMetadata(stream, pid, formatId)`).
+
+**How do I retrieve a metadata file?**
+- To retrieve a metadata object, call the Public API method `retrieveMetadata`, which returns a stream to the metadata file stored with the default metadata namespace, if it exists.
+- If there are multiple metadata objects, a 'formatId' must be specified when calling `retrieveMetadata` (ex. `retrieveMetadata(pid, formatId)`).
+
+**How do I delete a metadata file?**
+- Like `retrieveMetadata`, call the Public API method `deleteMetadata`, which deletes the metadata object associated with the given pid.
+- If there are multiple metadata objects, a 'formatId' must be specified when calling `deleteMetadata` to ensure the expected metadata object is deleted.
+
+
+###### What are HashStore reference files?
+
+HashStore assumes that every object to store has a respective identifier. This identifier is then used when storing, retrieving and deleting an object. In order to facilitate this process, we create two types of reference files:
+- pid (persistent identifier) reference files
+- cid (content identifier) reference files
+
+These reference files are implemented in HashStore under the hood, with no expectation for modification from the calling app/client.
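A pid reference file holds a single cid. A cid reference file lists every pid that references that cid, one per line. The class below is a minimal, hypothetical sketch that models only the cid refs file body as a string. It shows how pids can be added without duplicates and how they can be removed.

The sketch makes several simplifying assumptions:

* It does no file I/O and no locking. The real `tagObject`/`deleteObject` calls are synchronized on the cid, and this sketch does not attempt that.
* It assumes no trailing newline after the last pid.
* `RefsFileSketch`, `parseCidRefs`, `addPid` and `removePid` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

public class RefsFileSketch {

    // Parses a cid refs file body: one pid per line, blank lines ignored.
    static List<String> parseCidRefs(String body) {
        List<String> pids = new ArrayList<>();
        for (String line : body.split("\n")) {
            if (!line.isEmpty()) {
                pids.add(line);
            }
        }
        return pids;
    }

    // Returns a new body with `pid` appended, unless it is already listed.
    static String addPid(String body, String pid) {
        List<String> pids = parseCidRefs(body);
        if (!pids.contains(pid)) {
            pids.add(pid);
        }
        return String.join("\n", pids);
    }

    // Returns a new body with `pid` removed; an empty body means that no pid
    // references the cid any more.
    static String removePid(String body, String pid) {
        List<String> pids = parseCidRefs(body);
        pids.remove(pid);
        return String.join("\n", pids);
    }

    public static void main(String[] args) {
        String cidRefs = addPid("", "testpid1");
        cidRefs = addPid(cidRefs, "testpid2");
        cidRefs = addPid(cidRefs, "testpid1"); // already listed, so unchanged
        System.out.println(parseCidRefs(cidRefs)); // [testpid1, testpid2]
        cidRefs = removePid(cidRefs, "testpid1");
        System.out.println(parseCidRefs(cidRefs)); // [testpid2]
    }
}
```

Several pids can point at one stored object, which is why this file is a list while each pid refs file holds exactly one cid.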
+The one and only exception to this process is when the calling client/app does not have an identifier and solely stores an object's raw bytes in HashStore (calling `storeObject(InputStream)`).
+
+**'pid' Reference Files**
+- Pid (persistent identifier) reference files are created when storing an object with an identifier.
+- Pid reference files are located in HashStore's '/refs/pid' directory.
+- If an identifier is not available at the time of storing an object, the calling app/client must create this association between a pid and the object it represents by calling `tagObject` separately.
+- Each pid reference file contains a string that represents the content identifier of the object it references.
+- Like how objects are stored once and only once, there is also only one pid reference file for each pid.
+
+**'cid' Reference Files**
+- Cid (content identifier) reference files are created at the same time as pid reference files when storing an object with an identifier.
+- Cid reference files are located in HashStore's '/refs/cid' directory.
+- A cid reference file is a list of all the pids that reference a cid, delimited by a new line ("\n") character.
+
+
+###### What does HashStore look like?
+
+```
+# Example layout in HashStore with a single file stored along with its metadata and reference files.
+# This uses a store depth of 3, with a width of 2 and "SHA-256" as its default store algorithm
+## Notes:
+## - Objects are stored using their content identifier as the file address
+## - The reference file for each pid contains a single cid
+## - The reference file for each cid contains multiple pids each on its own line
+
+.../metacat/hashstore/
+└─ objects
+   └─ /d5/95/3b/d802fa74edea72eb941...00d154a727ed7c2
+└─ metadata
+   └─ /15/8d/7e/55c36a810d7c14479c9...b20d7df66768b04
+└─ refs
+   └─ pid/0d/55/5e/d77052d7e166017f779...7230bcf7abcef65e
+   └─ cid/d5/95/3b/d802fa74edea72eb941...00d154a727ed7c2
+hashstore.yaml
+```
+
+
 ## Development build

 HashStore is a Java package, and built using the [Maven](https://maven.apache.org/) build tool.
@@ -44,6 +180,9 @@ $ java -cp ./target/hashstore-1.0-SNAPSHOT.jar org.dataone.hashstore.HashStoreCl
 # Get the checksum of a data object
 $ java -cp ./target/hashstore-1.0-SNAPSHOT.jar org.dataone.hashstore.HashStoreClient -store /path/to/store -getchecksum -pid testpid1 -algo SHA-256

+# Find an object in HashStore (returns its content identifier if it exists)
+$ java -cp ./target/hashstore-1.0-SNAPSHOT.jar org.dataone.hashstore.HashStoreClient -store /path/to/store -findobject -pid testpid1
+
 # Store a data object
 $ java -cp ./target/hashstore-1.0-SNAPSHOT.jar org.dataone.hashstore.HashStoreClient -store /path/to/store -storeobject -path /path/to/data.ext -pid testpid1

diff --git a/pom.xml b/pom.xml
index 796c5bae..532e1c7b 100644
--- a/pom.xml
+++ b/pom.xml
@@ -14,8 +14,8 @@
         UTF-8
-        1.8
-        1.8
+        17
+        17
diff --git a/src/main/java/org/dataone/hashstore/HashStore.java b/src/main/java/org/dataone/hashstore/HashStore.java
index 0693d1b3..98b6dd5c 100644
--- a/src/main/java/org/dataone/hashstore/HashStore.java
+++ b/src/main/java/org/dataone/hashstore/HashStore.java
@@ -5,39 +5,48 @@ import java.io.InputStream;
 import java.security.NoSuchAlgorithmException;

-import org.dataone.hashstore.exceptions.PidObjectExistsException;
+import org.dataone.hashstore.exceptions.OrphanPidRefsFileException;
+import org.dataone.hashstore.exceptions.PidNotFoundInCidRefsFileException;
+import org.dataone.hashstore.exceptions.PidRefsFileExistsException;

 /**
- * HashStore is a content-addressable file management system that utilizes the hash/hex digest of a
- * given persistent identifier (PID) to address files. The system stores both objects and metadata
- * in its respective directories and provides an API for interacting with the store. HashStore
- * storage classes (like `FileHashStore`) must implement the HashStore interface to ensure proper
+ * HashStore is a content-addressable file management system that utilizes the content identifier of
+ * an object to address files. The system stores objects, references (refs) and metadata in their
+ * respective directories and provides an API for interacting with the store. HashStore storage
+ * classes (like `FileHashStore`) must implement the HashStore interface to ensure the expected
  * usage of the system.
  */
 public interface HashStore {
     /**
-     * Atomically stores objects to HashStore using a given InputStream and a persistent
-     * identifier (pid). Upon successful storage, the method returns an 'ObjectInfo' object
-     * containing the object's file information, such as the id, file size, and hex digest map
-     * of algorithms and hex digests/checksums. An object is stored once and only once - and
-     * `storeObject` also enforces this rule by synchronizing multiple calls and rejecting calls
-     * to store duplicate objects.
+     * The `storeObject` method is responsible for the atomic storage of objects to disk using a
+     * given InputStream. Upon successful storage, the method returns an ObjectMetadata object
+     * containing relevant file information, such as the file's id (which can be used to locate
+     * the object on disk), the file's size, and a hex digest map of algorithms and checksums.
+     * When a pid is provided, `storeObject` also tags the object (creating references), which
+     * allows the object to be discoverable.
      *
-     * The file's id is determined by calculating the SHA-256 hex digest of the provided pid,
-     * which is also used as the permanent address of the file. The file's identifier is then
-     * sharded using a depth of 3 and width of 2, delimited by '/' and concatenated to produce
-     * the final permanent address, which is stored in the object store directory (ex.
-     * `./[storePath]/objects/`).
+     * `storeObject` also ensures that an object is stored only once by synchronizing multiple
+     * calls and rejecting calls to store duplicate objects. Note: `storeObject` can also be
+     * called without a pid; in that case, it only stores the object and does not tag it. It
+     * is then the caller's responsibility to finalize the process by calling `tagObject` after
+     * verifying that the correct object is stored.
+     *
+     * The file's id is determined by calculating the object's content identifier based on the
+     * store's default algorithm, which is also used as the permanent address of the file. The
+     * file's identifier is then sharded using the store's configured depth and width, delimited
+     * by '/' and concatenated to produce the final permanent address within the
+     * `./[storePath]/objects/` directory.
      *
      * By default, the hex digest map includes the following hash algorithms: MD5, SHA-1,
-     * SHA-256, SHA-384 and SHA-512, which are the most commonly used algorithms in dataset
+     * SHA-256, SHA-384, SHA-512 - which are the most commonly used algorithms in dataset
      * submissions to DataONE and the Arctic Data Center. If an additional algorithm is
-     * provided, the `storeObject` method checks if it is supported and adds it to the map along
-     * with its corresponding hex digest. An algorithm is considered "supported" if it is
-     * recognized as a valid hash algorithm in the `java.security.MessageDigest` class.
+     * provided, the `storeObject` method checks if it is supported and adds it to the hex
+     * digests map along with its corresponding hex digest. An algorithm is considered
+     * "supported" if it is recognized as a valid hash algorithm in the
+     * `java.security.MessageDigest` class.
      *
-     * Similarly, if a checksum and a checksumAlgorithm or an object size value is provided,
-     * `storeObject` validates the object to ensure it matches what is provided before moving
+     * Similarly, if a file size and/or a checksum and checksumAlgorithm are provided,
+     * `storeObject` validates the object to ensure it matches the given arguments before moving
      * the file to its permanent address.
      *
      * @param object Input stream to file
@@ -46,39 +55,114 @@ public interface HashStore {
      * @param checksum Value of checksum to validate against
      * @param checksumAlgorithm Algorithm of checksum submitted
      * @param objSize Expected size of object to validate after storing
-     * @return ObjectInfo object encapsulating file information
-     * @throws NoSuchAlgorithmException When additionalAlgorithm or checksumAlgorithm is invalid
-     * @throws IOException I/O Error when writing file, generating checksums and/or
-     *                     moving file
-     * @throws PidObjectExistsException When duplicate pid object is found
-     * @throws RuntimeException Thrown when there is an issue with permissions, illegal
-     *                          arguments (ex. empty pid) or null pointers
-     */
-    ObjectInfo storeObject(
+     * @return ObjectMetadata object encapsulating file information
+     * @throws NoSuchAlgorithmException   When additionalAlgorithm or checksumAlgorithm is
+     *                                    invalid
+     * @throws IOException                I/O Error when writing file, generating checksums
+     *                                    and/or moving file
+     * @throws PidRefsFileExistsException If a pid refs file already exists, meaning the pid is
+     *                                    already referencing a file.
+     * @throws RuntimeException           Thrown when there is an issue with permissions,
+     *                                    illegal arguments (ex. empty pid) or null pointers
+     * @throws InterruptedException       When tagging pid and cid process is interrupted
+     */
+    public ObjectMetadata storeObject(
         InputStream object, String pid, String additionalAlgorithm, String checksum,
         String checksumAlgorithm, long objSize
-    ) throws NoSuchAlgorithmException, IOException, PidObjectExistsException, RuntimeException;
+    ) throws NoSuchAlgorithmException, IOException, PidRefsFileExistsException,
+        RuntimeException, InterruptedException;
+
+    /**
+     * @see #storeObject(InputStream, String, String, String, String, long)
+     */
+    public ObjectMetadata storeObject(InputStream object) throws NoSuchAlgorithmException,
+        IOException, PidRefsFileExistsException, RuntimeException, InterruptedException;

     /**
      * @see #storeObject(InputStream, String, String, String, String, long)
      */
-    ObjectInfo storeObject(
+    public ObjectMetadata storeObject(
+        InputStream object, String pid, String checksum, String checksumAlgorithm,
+        long objSize
+    ) throws NoSuchAlgorithmException, IOException, PidRefsFileExistsException,
+        RuntimeException, InterruptedException;
+
+    /**
+     * @see #storeObject(InputStream, String, String, String, String, long)
+     */
+    public ObjectMetadata storeObject(
         InputStream object, String pid, String checksum, String checksumAlgorithm
-    ) throws NoSuchAlgorithmException, IOException, PidObjectExistsException, RuntimeException;
+    ) throws NoSuchAlgorithmException, IOException, PidRefsFileExistsException,
+        RuntimeException, InterruptedException;

     /**
      * @see #storeObject(InputStream, String, String, String, String, long)
      */
-    ObjectInfo storeObject(InputStream object, String pid, String additionalAlgorithm)
-        throws NoSuchAlgorithmException, IOException, PidObjectExistsException,
-        RuntimeException;
+    public ObjectMetadata storeObject(
+        InputStream object, String pid, String additionalAlgorithm
+    ) throws NoSuchAlgorithmException, IOException, PidRefsFileExistsException,
+        RuntimeException, InterruptedException;

     /**
      * @see #storeObject(InputStream, String, String, String, String, long)
      */
-    ObjectInfo storeObject(InputStream object, String pid, long objSize)
-        throws NoSuchAlgorithmException, IOException, PidObjectExistsException,
-        RuntimeException;
+    public ObjectMetadata storeObject(InputStream object, String pid, long objSize)
+        throws NoSuchAlgorithmException, IOException, PidRefsFileExistsException,
+        RuntimeException, InterruptedException;
+
+    /**
+     * Creates references that allow objects stored in HashStore to be discoverable. Retrieving,
+     * deleting or calculating a hex digest of an object is based on a pid argument; to
+     * proceed, we must be able to find the object associated with the pid.
+     *
+     * @param pid Authority-based identifier
+     * @param cid Content-identifier (hash identifier)
+     * @throws IOException                Failure to create tmp file
+     * @throws PidRefsFileExistsException When pid refs file already exists
+     * @throws NoSuchAlgorithmException   When algorithm used to calculate pid refs address
+     *                                    does not exist
+     * @throws FileNotFoundException      If refs file is missing during verification
+     * @throws InterruptedException       When tagObject is waiting to execute but is
+     *                                    interrupted
+     */
+    public void tagObject(String pid, String cid) throws IOException,
+        PidRefsFileExistsException, NoSuchAlgorithmException, FileNotFoundException,
+        InterruptedException;
+
+    /**
+     * Confirms that an ObjectMetadata's content is equal to the given values. If it is not
+     * equal, it returns false; otherwise, it returns true.
+     *
+     * @param objectInfo        ObjectMetadata object with values
+     * @param checksum          Value of checksum to validate against
+     * @param checksumAlgorithm Algorithm of checksum submitted
+     * @param objSize           Expected size of object to validate after storing
+     * @throws IOException              An issue with deleting the object when there is a
+     *                                  mismatch
+     * @throws NoSuchAlgorithmException If checksum algorithm (and its respective checksum) is
+     *                                  not in objectInfo
+     * @throws IllegalArgumentException An expected value does not match
+     */
+    public boolean verifyObject(
+        ObjectMetadata objectInfo, String checksum, String checksumAlgorithm, long objSize
+    ) throws IOException, NoSuchAlgorithmException, IllegalArgumentException;
+
+    /**
+     * Checks whether an object referenced by a pid exists and returns the content identifier.
+     *
+     * @param pid Authority-based identifier
+     * @return Content identifier (cid)
+     * @throws NoSuchAlgorithmException          When algorithm used to calculate pid refs
+     *                                           file's absolute address is not valid
+     * @throws IOException                       Unable to read from a pid refs file or pid refs
+     *                                           file does not exist
+     * @throws OrphanPidRefsFileException        When pid refs file exists and the cid found
+     *                                           inside does not exist.
+     * @throws PidNotFoundInCidRefsFileException When pid and cid ref files exist but the
+     *                                           expected pid is not found in the cid refs file.
+     */
+    public String findObject(String pid) throws NoSuchAlgorithmException, IOException,
+        OrphanPidRefsFileException, PidNotFoundInCidRefsFileException;

     /**
      * Adds/updates metadata (ex. `sysmeta`) to the HashStore by using a given InputStream, a
@@ -101,14 +185,14 @@ ObjectInfo storeObject(InputStream object, String pid, long objSize)
      * @throws NoSuchAlgorithmException Algorithm used to calculate permanent address is not
      *                                  supported
      */
-    String storeMetadata(InputStream metadata, String pid, String formatId) throws IOException,
-        IllegalArgumentException, FileNotFoundException, InterruptedException,
-        NoSuchAlgorithmException;
+    public String storeMetadata(InputStream metadata, String pid, String formatId)
+        throws IOException, IllegalArgumentException, FileNotFoundException,
+        InterruptedException, NoSuchAlgorithmException;

     /**
      * @see #storeMetadata(InputStream, String, String)
      */
-    String storeMetadata(InputStream metadata, String pid) throws IOException,
+    public String storeMetadata(InputStream metadata, String pid) throws IOException,
         IllegalArgumentException, FileNotFoundException, InterruptedException,
         NoSuchAlgorithmException;

@@ -123,7 +207,7 @@ String storeMetadata(InputStream metadata, String pid) throws IOException,
      * @throws NoSuchAlgorithmException When algorithm used to calculate object address is not
      *                                  supported
      */
-    InputStream retrieveObject(String pid) throws IllegalArgumentException,
+    public InputStream retrieveObject(String pid) throws IllegalArgumentException,
         FileNotFoundException, IOException, NoSuchAlgorithmException;

     /**
@@ -139,7 +223,14 @@ InputStream retrieveObject(String pid) throws IllegalArgumentException,
      * @throws NoSuchAlgorithmException When algorithm used to calculate metadata address is not
      *                                  supported
      */
-    InputStream retrieveMetadata(String pid, String formatId) throws IllegalArgumentException,
+    public InputStream retrieveMetadata(String pid, String formatId)
+        throws IllegalArgumentException, FileNotFoundException, IOException,
+        NoSuchAlgorithmException;
+
+    /**
+     * @see #retrieveMetadata(String, String)
+     */
+    public InputStream retrieveMetadata(String pid) throws IllegalArgumentException,
         FileNotFoundException, IOException, NoSuchAlgorithmException;

     /**
@@ -149,12 +240,27 @@ InputStream retrieveMetadata(String pid, String formatId) throws IllegalArgument
      * @param pid Authority-based identifier
      * @throws IllegalArgumentException When pid is null or empty
      * @throws FileNotFoundException    When requested pid has no associated object
-     * @throws IOException              I/O error when deleting empty directories
+     * @throws IOException              I/O error when deleting empty directories,
+     *                                  modifying/deleting reference files
      * @throws NoSuchAlgorithmException When algorithm used to calculate object address is not
      *                                  supported
+     * @throws InterruptedException     When deletion synchronization is interrupted
      */
-    void deleteObject(String pid) throws IllegalArgumentException, FileNotFoundException,
-        IOException, NoSuchAlgorithmException;
+    public void deleteObject(String pid) throws IllegalArgumentException, FileNotFoundException,
+        IOException, NoSuchAlgorithmException, InterruptedException;
+
+    /**
+     * Deletes an object based on its content identifier, with a flag to confirm intention.
+     *
+     * Note: This overload should only be called when an issue arises during the storage
+     * of an object without a pid, and after verifying (via `verifyObject`) that the object is
+     * not what is expected.
+     *
+     * @param cid       Content identifier
+     * @param deleteCid Boolean to confirm
+     */
+    public void deleteObject(String cid, boolean deleteCid) throws IllegalArgumentException,
+        FileNotFoundException, IOException, NoSuchAlgorithmException;

     /**
      * Deletes a metadata document (ex. `sysmeta`) permanently from HashStore using a given
@@ -163,12 +269,17 @@ void deleteObject(String pid) throws IllegalArgumentException, FileNotFoundExcep
      * @param pid      Authority-based identifier
      * @param formatId Metadata namespace/format
      * @throws IllegalArgumentException When pid or formatId is null or empty
-     * @throws FileNotFoundException    When requested pid has no metadata
      * @throws IOException              I/O error when deleting empty directories
      * @throws NoSuchAlgorithmException When algorithm used to calculate object address is not
      *                                  supported
      */
-    void deleteMetadata(String pid, String formatId) throws IllegalArgumentException,
+    public void deleteMetadata(String pid, String formatId) throws IllegalArgumentException,
+        FileNotFoundException, IOException, NoSuchAlgorithmException;
+
+    /**
+     * @see #deleteMetadata(String, String)
+     */
+    public void deleteMetadata(String pid) throws IllegalArgumentException,
         FileNotFoundException, IOException, NoSuchAlgorithmException;

     /**
@@ -184,6 +295,6 @@ void deleteMetadata(String pid, String formatId) throws IllegalArgumentException
      * @throws NoSuchAlgorithmException When algorithm used to calculate object address is not
      *                                  supported
      */
-    String getHexDigest(String pid, String algorithm) throws IllegalArgumentException,
+    public String getHexDigest(String pid, String algorithm) throws IllegalArgumentException,
         FileNotFoundException, IOException, NoSuchAlgorithmException;
 }
diff --git a/src/main/java/org/dataone/hashstore/HashStoreClient.java b/src/main/java/org/dataone/hashstore/HashStoreClient.java
index c12100a5..ceb31efb 100644
--- a/src/main/java/org/dataone/hashstore/HashStoreClient.java
+++ b/src/main/java/org/dataone/hashstore/HashStoreClient.java
@@ -29,7 +29,7 @@ import org.apache.commons.cli.Options;
 import org.apache.commons.cli.ParseException;
 import org.dataone.hashstore.exceptions.HashStoreFactoryException;
-import org.dataone.hashstore.exceptions.PidObjectExistsException;
+import org.dataone.hashstore.exceptions.PidRefsFileExistsException;
 import com.fasterxml.jackson.databind.ObjectMapper;
 import com.fasterxml.jackson.dataformat.yaml.YAMLFactory;

@@ -124,13 +124,14 @@ public static void main(String[] args) throws Exception {
             String objType = cmd.getOptionValue("stype");
             String originDirectory = cmd.getOptionValue("sdir");
             String numObjects = cmd.getOptionValue("nobj");
+            String sizeOfFilesToSkip = cmd.getOptionValue("gbskip");
             FileHashStoreUtility.ensureNotNull(objType, "-stype", "HashStoreClient");
             FileHashStoreUtility.ensureNotNull(originDirectory, "-sdir", "HashStoreClient");
             FileHashStoreUtility.ensureNotNull(
                 action, "-sts, -rav, -dfs", "HashStoreClient"
             );

-            testWithKnbvm(action, objType, originDirectory, numObjects);
+            testWithKnbvm(action, objType, originDirectory, numObjects, sizeOfFilesToSkip);

         } else if (cmd.hasOption("getchecksum")) {
             String pid = cmd.getOptionValue("pid");
@@ -141,6 +142,13 @@ public static void main(String[] args) throws Exception {
             String hexDigest = hashStore.getHexDigest(pid, algo);
             System.out.println(hexDigest);

+        } else if (cmd.hasOption("findobject")) {
+            String pid = cmd.getOptionValue("pid");
+            FileHashStoreUtility.ensureNotNull(pid, "-pid", "HashStoreClient");
+
+            String cid = hashStore.findObject(pid);
+            System.out.println(cid);
+
         } else if (cmd.hasOption("storeobject")) {
             System.out.println("Storing object");
             String pid = cmd.getOptionValue("pid");
@@ -160,7 +168,7 @@ public static void main(String[] args) throws Exception {
             if (cmd.hasOption("checksum_algo")) {
                 checksum_algo = cmd.getOptionValue("checksum_algo");
             }
-            long size = 0;
+            long size;
             if (cmd.hasOption("size")) {
                 size = Long.parseLong(cmd.getOptionValue("size"));
             } else {
@@ -168,7 +176,7 @@ public static void main(String[] args) throws Exception {
             }

             InputStream pidObjStream = Files.newInputStream(path);
-            ObjectInfo objInfo = hashStore.storeObject(
+            ObjectMetadata objInfo = hashStore.storeObject(
checksum, checksum_algo, size ); pidObjStream.close(); @@ -274,6 +282,10 @@ private static Options addHashStoreClientOptions() { "getchecksum", "client_getchecksum", false, "Flag to get the hex digest of a data object in a HashStore." ); + options.addOption( + "findobject", "client_findobject", false, + "Flag to get the hex digest of a data object in a HashStore." + ); options.addOption( "storeobject", "client_storeobject", false, "Flag to store objs to a HashStore." ); @@ -316,9 +328,12 @@ private static Options addHashStoreClientOptions() { "knbvm", "knbvmtestadc", false, "(knbvm) Flag to specify testing with knbvm." ); options.addOption( - "nobj", "numberofobj", false, + "nobj", "numberofobj", true, "(knbvm) Option to specify number of objects to retrieve from a Metacat db." ); + options.addOption( + "gbskip", "gbsizetoskip", true, "(knbvm) Option to specify the size of objects to skip." + ); options.addOption( "sdir", "storedirectory", true, "(knbvm) Option to specify the directory of objects to convert." @@ -435,14 +450,17 @@ private static void initializeHashStore(Path storePath) throws HashStoreFactoryE /** * Entry point for working with test data found in knbvm (test.arcticdata.io) * - * @param actionFlag String representing a knbvm test-related method to call. - * @param objType "data" (objects) or "documents" (metadata). - * @param numObjects Number of rows to retrieve from metacat db, - * if null, will retrieve all rows. + * @param actionFlag String representing a knbvm test-related method to call. + * @param objType "data" (objects) or "documents" (metadata). + * @param originDir Directory path of given objType + * @param numObjects Number of rows to retrieve from metacat db, + * if null, will retrieve all rows. 
+     * @param sizeOfFilesToSkip Objects larger than this size (in GB) are skipped
      * @throws IOException Related to accessing config files or objects
      */
     private static void testWithKnbvm(
-        String actionFlag, String objType, String originDir, String numObjects
+        String actionFlag, String objType, String originDir, String numObjects,
+        String sizeOfFilesToSkip
     ) throws IOException {
         // Load metacat db yaml
         // Note: In order to test with knbvm, you must manually create a `pgdb.yaml` file with the
@@ -464,15 +482,22 @@ private static void testWithKnbvm(
         try {
             System.out.println("Connecting to metacat db.");

+            if (!objType.equals("object")) {
+                if (!objType.equals("metadata")) {
+                    String errMsg = "HashStoreClient - objType must be 'object' or 'metadata'";
+                    throw new IllegalArgumentException(errMsg);
+                }
+            }
+
             // Setup metacat db access
             Class.forName("org.postgresql.Driver"); // Force driver to register itself
             Connection connection = DriverManager.getConnection(url, user, password);
             Statement statement = connection.createStatement();
             String sqlQuery = "SELECT identifier.guid, identifier.docid, identifier.rev,"
                 + " systemmetadata.object_format, systemmetadata.checksum,"
-                + " systemmetadata.checksum_algorithm FROM identifier INNER JOIN systemmetadata"
-                + " ON identifier.guid = systemmetadata.guid ORDER BY identifier.guid"
-                + sqlLimitQuery + ";";
+                + " systemmetadata.checksum_algorithm, systemmetadata.size FROM identifier"
+                + " INNER JOIN systemmetadata ON identifier.guid = systemmetadata.guid"
+                + " ORDER BY identifier.guid" + sqlLimitQuery + ";";
             ResultSet resultSet = statement.executeQuery(sqlQuery);

             // For each row, get guid, docid, rev, checksum and checksum_algorithm
@@ -486,26 +511,32 @@ private static void testWithKnbvm(
                 String checksumAlgorithm = resultSet.getString("checksum_algorithm");
                 String formattedChecksumAlgo = formatAlgo(checksumAlgorithm);
                 String formatId = resultSet.getString("object_format");
-
-                if (!objType.equals("object")) {
-                    if (!objType.equals("metadata")) {
-                        String errMsg = "HashStoreClient - objType must be 'object' or 'metadata'";
-                        throw new IllegalArgumentException(errMsg);
+                long setItemSize = resultSet.getLong("size");
+
+                boolean skipFile = false;
+                if (sizeOfFilesToSkip != null) {
+                    // Convert the requested size to skip from GB to bytes
+                    long gbFilesToSkip = Integer.parseInt(sizeOfFilesToSkip) * (1024L * 1024
+                        * 1024);
+                    if (setItemSize > gbFilesToSkip) {
+                        skipFile = true;
                     }
                 }

-                Path setItemFilePath = Paths.get(originDir + "/" + docid + "." + rev);
-                if (Files.exists(setItemFilePath)) {
-                    System.out.println(
-                        "File exists (" + setItemFilePath + ")! Adding to resultObjList."
-                    );
-                    Map<String, String> resultObj = new HashMap<>();
-                    resultObj.put("pid", guid);
-                    resultObj.put("algorithm", formattedChecksumAlgo);
-                    resultObj.put("checksum", checksum);
-                    resultObj.put("path", setItemFilePath.toString());
-                    resultObj.put("namespace", formatId);
-                    resultObjList.add(resultObj);
+                if (!skipFile) {
+                    Path setItemFilePath = Paths.get(originDir + "/" + docid + "." + rev);
+                    if (Files.exists(setItemFilePath)) {
+                        System.out.println(
+                            "File exists (" + setItemFilePath + ")! Adding to resultObjList."
+ ); + Map resultObj = new HashMap<>(); + resultObj.put("pid", guid); + resultObj.put("algorithm", formattedChecksumAlgo); + resultObj.put("checksum", checksum); + resultObj.put("path", setItemFilePath.toString()); + resultObj.put("namespace", formatId); + resultObjList.add(resultObj); + } } } @@ -558,10 +589,12 @@ private static void storeObjsWithChecksumFromDb(List> result System.out.println("Storing object for guid: " + guid); hashStore.storeObject(objStream, guid, checksum, algorithm); - } catch (PidObjectExistsException poee) { + } catch (PidRefsFileExistsException poee) { String errMsg = "Unexpected Error: " + poee.fillInStackTrace(); try { - logExceptionToFile(guid, errMsg, "java/store_obj_errors/pidobjectexists"); + logExceptionToFile( + guid, errMsg, "java/store_obj_errors/PidRefsFileExistsException" + ); } catch (Exception e) { e.printStackTrace(); } diff --git a/src/main/java/org/dataone/hashstore/ObjectInfo.java b/src/main/java/org/dataone/hashstore/ObjectMetadata.java similarity index 55% rename from src/main/java/org/dataone/hashstore/ObjectInfo.java rename to src/main/java/org/dataone/hashstore/ObjectMetadata.java index db9fef17..9347a7c7 100644 --- a/src/main/java/org/dataone/hashstore/ObjectInfo.java +++ b/src/main/java/org/dataone/hashstore/ObjectMetadata.java @@ -4,49 +4,49 @@ /** * ObjectMetadata is a class that models a unique identifier for an object in the HashStore. It - * encapsulates information about a file's id, size, and associated hash digest values. By using - * ObjectMetadata objects, client code can easily obtain metadata of a store object in HashStore - * without needing to know the underlying file system details. + * encapsulates information about a file's content identifier (cid), size, and associated hash + * digest values. By using ObjectMetadata objects, client code can easily obtain metadata of a store + * object in HashStore without needing to know the underlying file system details. 
*/ -public class ObjectInfo { - private final String id; +public class ObjectMetadata { + private final String cid; private final long size; private final Map<String, String> hexDigests; /** * Creates a new instance of ObjectMetadata with the given properties. * - * @param id Unique identifier for the file + * @param cid Content identifier (cid) of the file * @param size Size of stored file * @param hexDigests A map of hash algorithm names to their hex-encoded digest values for the * file */ - public ObjectInfo(String id, long size, Map<String, String> hexDigests) { - this.id = id; + public ObjectMetadata(String cid, long size, Map<String, String> hexDigests) { + this.cid = cid; this.size = size; this.hexDigests = hexDigests; } /** - * Return the id (address) of the file + * Return the cid (content identifier) of the file * - * @return id + * @return cid */ - public String getId() { - return id; + public String getCid() { + return cid; } /** * Return the size of the file * - * @return id + * @return size */ public long getSize() { return size; } /** - * Return a map of hex digests + * Return a map of hex digests (checksums) * * @return hexDigests */ diff --git a/src/main/java/org/dataone/hashstore/exceptions/OrphanPidRefsFileException.java b/src/main/java/org/dataone/hashstore/exceptions/OrphanPidRefsFileException.java new file mode 100644 index 00000000..dd42e99f --- /dev/null +++ b/src/main/java/org/dataone/hashstore/exceptions/OrphanPidRefsFileException.java @@ -0,0 +1,14 @@ +package org.dataone.hashstore.exceptions; + +import java.io.IOException; + +/** + * Custom exception class for FileHashStore when a pid refs file is found but the + * cid refs file that it references does not exist.
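The lookup that `findObject` performs distinguishes three failure cases: no pid refs file, a pid refs file whose cid refs file is missing (the orphan case), and a cid refs file that does not list the pid. A minimal in-memory sketch of that decision follows; `RefsLookupSketch`, `Outcome`, and the two maps are hypothetical stand-ins for the `refs/pid` and `refs/cid` files, not part of the HashStore API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the refs directories; FileHashStore reads
// pid and cid refs files from disk instead of using maps.
public class RefsLookupSketch {
    enum Outcome { FOUND, NOT_FOUND, ORPHAN_PID_REFS, PID_NOT_IN_CID_REFS }

    // Stand-in for refs/pid: pid -> cid
    final Map<String, String> pidRefs = new HashMap<>();
    // Stand-in for refs/cid: cid -> pids that reference it
    final Map<String, Set<String>> cidRefs = new HashMap<>();

    Outcome lookup(String pid) {
        String cid = pidRefs.get(pid);
        if (cid == null) {
            return Outcome.NOT_FOUND;           // findObject throws FileNotFoundException
        }
        Set<String> pids = cidRefs.get(cid);
        if (pids == null) {
            return Outcome.ORPHAN_PID_REFS;     // findObject throws OrphanPidRefsFileException
        }
        if (!pids.contains(pid)) {
            return Outcome.PID_NOT_IN_CID_REFS; // findObject throws PidNotFoundInCidRefsFileException
        }
        return Outcome.FOUND;                   // findObject returns the cid
    }

    public static void main(String[] args) {
        RefsLookupSketch refs = new RefsLookupSketch();
        refs.pidRefs.put("pid-ok", "cid-1");
        refs.cidRefs.computeIfAbsent("cid-1", k -> new HashSet<>()).add("pid-ok");
        refs.pidRefs.put("pid-orphan", "cid-missing");
        refs.pidRefs.put("pid-stale", "cid-1");
        System.out.println(refs.lookup("pid-ok"));     // FOUND
        System.out.println(refs.lookup("pid-orphan")); // ORPHAN_PID_REFS
        System.out.println(refs.lookup("pid-stale"));  // PID_NOT_IN_CID_REFS
        System.out.println(refs.lookup("unknown"));    // NOT_FOUND
    }
}
```

In `deleteObject(pid)`, the two middle outcomes lead to deleting only the stale pid refs file and returning early.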
+ */ +public class OrphanPidRefsFileException extends IOException { + public OrphanPidRefsFileException(String message) { + super(message); + } + +} diff --git a/src/main/java/org/dataone/hashstore/exceptions/PidNotFoundInCidRefsFileException.java b/src/main/java/org/dataone/hashstore/exceptions/PidNotFoundInCidRefsFileException.java new file mode 100644 index 00000000..2cd9d4b6 --- /dev/null +++ b/src/main/java/org/dataone/hashstore/exceptions/PidNotFoundInCidRefsFileException.java @@ -0,0 +1,13 @@ +package org.dataone.hashstore.exceptions; + +import java.io.IOException; + +/** + * Custom exception class for FileHashStore when a pid is not found in a cid refs file. + */ +public class PidNotFoundInCidRefsFileException extends IOException { + public PidNotFoundInCidRefsFileException(String message) { + super(message); + } + +} diff --git a/src/main/java/org/dataone/hashstore/exceptions/PidObjectExistsException.java b/src/main/java/org/dataone/hashstore/exceptions/PidRefsFileExistsException.java similarity index 58% rename from src/main/java/org/dataone/hashstore/exceptions/PidObjectExistsException.java rename to src/main/java/org/dataone/hashstore/exceptions/PidRefsFileExistsException.java index a0f7b7f7..586d0f1f 100644 --- a/src/main/java/org/dataone/hashstore/exceptions/PidObjectExistsException.java +++ b/src/main/java/org/dataone/hashstore/exceptions/PidRefsFileExistsException.java @@ -5,8 +5,8 @@ /** * Custom exception class for FileHashStore pidObjects */ -public class PidObjectExistsException extends IOException { - public PidObjectExistsException(String message) { +public class PidRefsFileExistsException extends IOException { + public PidRefsFileExistsException(String message) { super(message); } diff --git a/src/main/java/org/dataone/hashstore/filehashstore/FileHashStore.java b/src/main/java/org/dataone/hashstore/filehashstore/FileHashStore.java index 523b95dd..a3afe905 100644 --- a/src/main/java/org/dataone/hashstore/filehashstore/FileHashStore.java +++ 
b/src/main/java/org/dataone/hashstore/filehashstore/FileHashStore.java @@ -7,6 +7,8 @@ import java.io.IOException; import java.io.InputStream; import java.io.OutputStreamWriter; +import java.nio.channels.FileChannel; +import java.nio.channels.FileLock; import java.nio.charset.StandardCharsets; import java.nio.file.AtomicMoveNotSupportedException; import java.nio.file.FileAlreadyExistsException; @@ -14,6 +16,7 @@ import java.nio.file.Path; import java.nio.file.Paths; import java.nio.file.StandardCopyOption; +import java.nio.file.StandardOpenOption; import java.security.MessageDigest; import java.security.NoSuchAlgorithmException; import java.util.ArrayList; @@ -23,7 +26,6 @@ import java.util.Map; import java.util.Objects; import java.util.Properties; -import java.util.Random; import com.fasterxml.jackson.databind.ObjectMapper; import com.fasterxml.jackson.dataformat.yaml.YAMLFactory; @@ -32,9 +34,11 @@ import org.apache.commons.logging.Log; import org.apache.commons.logging.LogFactory; -import org.dataone.hashstore.ObjectInfo; +import org.dataone.hashstore.ObjectMetadata; import org.dataone.hashstore.HashStore; -import org.dataone.hashstore.exceptions.PidObjectExistsException; +import org.dataone.hashstore.exceptions.OrphanPidRefsFileException; +import org.dataone.hashstore.exceptions.PidNotFoundInCidRefsFileException; +import org.dataone.hashstore.exceptions.PidRefsFileExistsException; /** * FileHashStore is a HashStore adapter class that manages the storage of objects and metadata to a @@ -46,6 +50,7 @@ public class FileHashStore implements HashStore { private static final int TIME_OUT_MILLISEC = 1000; private static final ArrayList objectLockedIds = new ArrayList<>(100); private static final ArrayList metadataLockedIds = new ArrayList<>(100); + private static final ArrayList referenceLockedCids = new ArrayList<>(100); private final Path STORE_ROOT; private final int DIRECTORY_DEPTH; private final int DIRECTORY_WIDTH; @@ -55,6 +60,10 @@ public class FileHashStore 
implements HashStore { private final String DEFAULT_METADATA_NAMESPACE; private final Path METADATA_STORE_DIRECTORY; private final Path METADATA_TMP_FILE_DIRECTORY; + private final Path REFS_STORE_DIRECTORY; + private final Path REFS_TMP_FILE_DIRECTORY; + private final Path REFS_PID_FILE_DIRECTORY; + private final Path REFS_CID_FILE_DIRECTORY; public static final String HASHSTORE_YAML = "hashstore.yaml"; @@ -128,19 +137,28 @@ public FileHashStore(Properties hashstoreProperties) throws IllegalArgumentExcep DIRECTORY_WIDTH = storeWidth; OBJECT_STORE_ALGORITHM = storeAlgorithm; DEFAULT_METADATA_NAMESPACE = storeMetadataNamespace; - // Resolve object/metadata directories + // Resolve object/metadata/refs directories OBJECT_STORE_DIRECTORY = storePath.resolve("objects"); METADATA_STORE_DIRECTORY = storePath.resolve("metadata"); + REFS_STORE_DIRECTORY = storePath.resolve("refs"); // Resolve tmp object/metadata directory paths, this is where objects are // created before they are moved to their permanent address OBJECT_TMP_FILE_DIRECTORY = OBJECT_STORE_DIRECTORY.resolve("tmp"); METADATA_TMP_FILE_DIRECTORY = METADATA_STORE_DIRECTORY.resolve("tmp"); + REFS_TMP_FILE_DIRECTORY = REFS_STORE_DIRECTORY.resolve("tmp"); + REFS_PID_FILE_DIRECTORY = REFS_STORE_DIRECTORY.resolve("pid"); + REFS_CID_FILE_DIRECTORY = REFS_STORE_DIRECTORY.resolve("cid"); + try { // Physically create object & metadata store and tmp directories Files.createDirectories(OBJECT_STORE_DIRECTORY); Files.createDirectories(METADATA_STORE_DIRECTORY); + Files.createDirectories(REFS_STORE_DIRECTORY); Files.createDirectories(OBJECT_TMP_FILE_DIRECTORY); Files.createDirectories(METADATA_TMP_FILE_DIRECTORY); + Files.createDirectories(REFS_TMP_FILE_DIRECTORY); + Files.createDirectories(REFS_PID_FILE_DIRECTORY); + Files.createDirectories(REFS_CID_FILE_DIRECTORY); logFileHashStore.debug("FileHashStore - Created store and store tmp directories."); } catch (IOException ioe) { @@ -318,7 +336,6 @@ protected void 
writeHashStoreYaml(String yamlString) throws IOException { new OutputStreamWriter(Files.newOutputStream(hashstoreYaml), StandardCharsets.UTF_8) )) { writer.write(yamlString); - writer.close(); } catch (IOException ioe) { logFileHashStore.fatal( @@ -399,10 +416,11 @@ protected String buildHashStoreYamlString( // HashStore Public API Methods @Override - public ObjectInfo storeObject( + public ObjectMetadata storeObject( InputStream object, String pid, String additionalAlgorithm, String checksum, String checksumAlgorithm, long objSize - ) throws NoSuchAlgorithmException, IOException, PidObjectExistsException, RuntimeException { + ) throws NoSuchAlgorithmException, IOException, RuntimeException, InterruptedException, + PidRefsFileExistsException { logFileHashStore.debug( "FileHashStore.storeObject - Called to store object for pid: " + pid ); @@ -425,7 +443,6 @@ public ObjectInfo storeObject( validateAlgorithm(checksumAlgorithm); } if (objSize != -1) { - System.out.println("Checking not negative..."); FileHashStoreUtility.checkNotNegativeOrZero(objSize, "storeObject"); } @@ -437,10 +454,11 @@ public ObjectInfo storeObject( /** * Method to synchronize storing objects with FileHashStore */ - private ObjectInfo syncPutObject( + private ObjectMetadata syncPutObject( InputStream object, String pid, String additionalAlgorithm, String checksum, String checksumAlgorithm, long objSize - ) throws NoSuchAlgorithmException, PidObjectExistsException, IOException, RuntimeException { + ) throws NoSuchAlgorithmException, PidRefsFileExistsException, IOException, RuntimeException, + InterruptedException { // Lock pid for thread safety, transaction control and atomic writing // A pid can only be stored once and only once, subsequent calls will // be accepted but will be rejected if pid hash object exists @@ -465,9 +483,12 @@ private ObjectInfo syncPutObject( + ". 
checksumAlgorithm: " + checksumAlgorithm ); // Store object - ObjectInfo objInfo = putObject( + ObjectMetadata objInfo = putObject( object, pid, additionalAlgorithm, checksum, checksumAlgorithm, objSize ); + // Tag object + String cid = objInfo.getCid(); + tagObject(pid, cid); logFileHashStore.info( "FileHashStore.syncPutObject - Object stored for pid: " + pid + ". Permanent address: " + getRealPath(pid, "object", null) @@ -480,11 +501,11 @@ private ObjectInfo syncPutObject( logFileHashStore.error(errMsg); throw nsae; - } catch (PidObjectExistsException poee) { + } catch (PidRefsFileExistsException prfee) { String errMsg = "FileHashStore.syncPutObject - Unable to store object for pid: " + pid - + ". PidObjectExistsException: " + poee.getMessage(); + + ". PidRefsFileExistsException: " + prfee.getMessage(); logFileHashStore.error(errMsg); - throw poee; + throw prfee; } catch (IOException ioe) { // Covers AtomicMoveNotSupportedException, FileNotFoundException @@ -513,25 +534,48 @@ private ObjectInfo syncPutObject( } /** - * Overload method for storeObject with an additionalAlgorithm + * Overload method for storeObject with just an InputStream */ @Override - public ObjectInfo storeObject(InputStream object, String pid, String additionalAlgorithm) - throws NoSuchAlgorithmException, IOException, PidObjectExistsException, RuntimeException { - FileHashStoreUtility.ensureNotNull( - additionalAlgorithm, "additionalAlgorithm", "storeObject" - ); + public ObjectMetadata storeObject(InputStream object) throws NoSuchAlgorithmException, + IOException, PidRefsFileExistsException, RuntimeException { + // 'putObject' is called directly to bypass the pid synchronization implemented to + // efficiently handle duplicate object store requests. Since there is no pid, calling + // 'storeObject' would unintentionally create a bottleneck for all requests without a + // pid (they would be executed sequentially). This scenario occurs when metadata about + // the object (ex. 
form data including the pid, checksum, checksum algorithm, etc.) is + // unavailable. + // + // Note: This method does not tag the object to make it discoverable, so the client can + // call 'verifyObject' (optional) to check that the object is valid, and 'tagObject' + // (required) to create the reference files needed to associate the respective pids/cids. + return putObject(object, "HashStoreNoPid", null, null, null, -1); + } - return storeObject(object, pid, additionalAlgorithm, null, null, -1); + + /** + * Overload method for storeObject with size and a checksum & checksumAlgorithm. + */ + @Override + public ObjectMetadata storeObject( + InputStream object, String pid, String checksum, String checksumAlgorithm, long objSize + ) throws NoSuchAlgorithmException, IOException, PidRefsFileExistsException, RuntimeException, + InterruptedException { + FileHashStoreUtility.ensureNotNull(checksum, "checksum", "storeObject"); + FileHashStoreUtility.ensureNotNull(checksumAlgorithm, "checksumAlgorithm", "storeObject"); + FileHashStoreUtility.checkNotNegativeOrZero(objSize, "storeObject"); + + return storeObject(object, pid, null, checksum, checksumAlgorithm, objSize); } /** * Overload method for storeObject with just a checksum and checksumAlgorithm */ @Override - public ObjectInfo storeObject( + public ObjectMetadata storeObject( InputStream object, String pid, String checksum, String checksumAlgorithm - ) throws NoSuchAlgorithmException, IOException, PidObjectExistsException, RuntimeException { + ) throws NoSuchAlgorithmException, IOException, PidRefsFileExistsException, RuntimeException, + InterruptedException { FileHashStoreUtility.ensureNotNull(checksum, "checksum", "storeObject"); FileHashStoreUtility.ensureNotNull(checksumAlgorithm, "checksumAlgorithm", "storeObject"); @@ -539,16 +583,209 @@ public ObjectInfo storeObject( } /** - * Overload method for storeObject with size of object to validate + * Overload method for storeObject with just the size of object to 
validate */ @Override - public ObjectInfo storeObject(InputStream object, String pid, long objSize) - throws NoSuchAlgorithmException, IOException, PidObjectExistsException, RuntimeException { + public ObjectMetadata storeObject(InputStream object, String pid, long objSize) + throws NoSuchAlgorithmException, IOException, PidRefsFileExistsException, RuntimeException, + InterruptedException { FileHashStoreUtility.checkNotNegativeOrZero(objSize, "storeObject"); return storeObject(object, pid, null, null, null, objSize); } + /** + * Overload method for storeObject with an additionalAlgorithm + */ + @Override + public ObjectMetadata storeObject(InputStream object, String pid, String additionalAlgorithm) + throws NoSuchAlgorithmException, IOException, PidRefsFileExistsException, RuntimeException, + InterruptedException { + FileHashStoreUtility.ensureNotNull( + additionalAlgorithm, "additionalAlgorithm", "storeObject" + ); + + return storeObject(object, pid, additionalAlgorithm, null, null, -1); + } + + @Override + public boolean verifyObject( + ObjectMetadata objectInfo, String checksum, String checksumAlgorithm, long objSize + ) throws IOException, NoSuchAlgorithmException, IllegalArgumentException { + FileHashStoreUtility.ensureNotNull(objectInfo, "objectInfo", "verifyObject"); + logFileHashStore.debug( + "FileHashStore.verifyObject - Called to verify object with cid: " + objectInfo.getCid() + ); + FileHashStoreUtility.ensureNotNull(checksum, "checksum", "verifyObject"); + FileHashStoreUtility.ensureNotNull(checksumAlgorithm, "checksumAlgorithm", "verifyObject"); + FileHashStoreUtility.checkNotNegativeOrZero(objSize, "verifyObject"); + + Map<String, String> hexDigests = objectInfo.getHexDigests(); + String digestFromHexDigests = hexDigests.get(checksumAlgorithm); + long objInfoRetrievedSize = objectInfo.getSize(); + String objCid = objectInfo.getCid(); + + if (objInfoRetrievedSize != objSize) { + logFileHashStore.info( + "FileHashStore.verifyObject - Object size invalid for cid: " +
objCid + ". Expected size: " + objSize + ". Actual size: " + objInfoRetrievedSize + ); + return false; + + } else if (!checksum.equals(digestFromHexDigests)) { + logFileHashStore.info( + "FileHashStore.verifyObject - Object content invalid for cid: " + objCid + + ". Expected checksum: " + checksum + ". Actual checksum calculated: " + + digestFromHexDigests + " (algorithm: " + checksumAlgorithm + ")" + ); + return false; + + } else { + logFileHashStore.info( + "FileHashStore.verifyObject - Object has been validated for cid: " + objCid + + ". Expected checksum: " + checksum + ". Actual checksum calculated: " + + digestFromHexDigests + " (algorithm: " + checksumAlgorithm + ")" + ); + return true; + } + } + + @Override + public void tagObject(String pid, String cid) throws IOException, PidRefsFileExistsException, + NoSuchAlgorithmException, FileNotFoundException, InterruptedException { + logFileHashStore.debug( + "FileHashStore.tagObject - Called to tag cid (" + cid + ") with pid: " + pid + ); + // Validate input parameters + FileHashStoreUtility.ensureNotNull(pid, "pid", "tagObject"); + FileHashStoreUtility.ensureNotNull(cid, "cid", "tagObject"); + FileHashStoreUtility.checkForEmptyString(pid, "pid", "tagObject"); + FileHashStoreUtility.checkForEmptyString(cid, "cid", "tagObject"); + + synchronized (referenceLockedCids) { + while (referenceLockedCids.contains(cid)) { + try { + referenceLockedCids.wait(TIME_OUT_MILLISEC); + + } catch (InterruptedException ie) { + String errMsg = + "FileHashStore.tagObject - referenceLockedCids lock was interrupted while" + + " waiting to tag pid: " + pid + " and cid: " + cid + + ". 
InterruptedException: " + ie.getMessage(); + logFileHashStore.error(errMsg); + throw new InterruptedException(errMsg); + } + } + logFileHashStore.debug( + "FileHashStore.tagObject - Synchronizing referenceLockedCids for cid: " + cid + ); + referenceLockedCids.add(cid); + } + + try { + Path absPidRefsPath = getRealPath(pid, "refs", "pid"); + Path absCidRefsPath = getRealPath(cid, "refs", "cid"); + + // Check that pid refs file doesn't exist yet + if (Files.exists(absPidRefsPath)) { + String errMsg = "FileHashStore.tagObject - pid refs file already exists for pid: " + + pid + ". A pid can only reference one cid."; + logFileHashStore.error(errMsg); + throw new PidRefsFileExistsException(errMsg); + + } else if (Files.exists(absCidRefsPath)) { + // Only update cid refs file if pid is not in the file + boolean pidFoundInCidRefFiles = isPidInCidRefsFile(pid, absCidRefsPath); + if (!pidFoundInCidRefFiles) { + updateCidRefsFiles(pid, absCidRefsPath); + } + // Get the pid refs file + File pidRefsTmpFile = writePidRefsFile(cid); + File absPathPidRefsFile = absPidRefsPath.toFile(); + move(pidRefsTmpFile, absPathPidRefsFile, "refs"); + // Verify tagging process, this throws exceptions if there's an issue + verifyHashStoreRefsFiles(pid, cid, absPidRefsPath, absCidRefsPath); + + logFileHashStore.info( + "FileHashStore.tagObject - Object with cid: " + cid + + " has been updated and tagged successfully with pid: " + pid + ); + + } else { + // Get pid and cid refs files + File pidRefsTmpFile = writePidRefsFile(cid); + File cidRefsTmpFile = writeCidRefsFile(pid); + // Move refs files to permanent location + File absPathPidRefsFile = absPidRefsPath.toFile(); + File absPathCidRefsFile = absCidRefsPath.toFile(); + move(pidRefsTmpFile, absPathPidRefsFile, "refs"); + move(cidRefsTmpFile, absPathCidRefsFile, "refs"); + // Verify tagging process, this throws exceptions if there's an issue + verifyHashStoreRefsFiles(pid, cid, absPidRefsPath, absCidRefsPath); + + 
logFileHashStore.info( +
"FileHashStore.tagObject - Object with cid: " + cid + " has been tagged successfully with pid: " + pid + ); + } + + } finally { + // Release lock + synchronized (referenceLockedCids) { + logFileHashStore.debug( + "FileHashStore.tagObject - Releasing referenceLockedCids for cid: " + cid + ); + referenceLockedCids.remove(cid); + referenceLockedCids.notifyAll(); + } + } + } + + @Override + public String findObject(String pid) throws NoSuchAlgorithmException, IOException { + logFileHashStore.debug("FileHashStore.findObject - Called to find object for pid: " + pid); + FileHashStoreUtility.ensureNotNull(pid, "pid", "findObject"); + FileHashStoreUtility.checkForEmptyString(pid, "pid", "findObject"); + + // Get path of the pid references file + Path absPidRefsPath = getRealPath(pid, "refs", "pid"); + + if (Files.exists(absPidRefsPath)) { + String cid = new String(Files.readAllBytes(absPidRefsPath), StandardCharsets.UTF_8); + Path absCidRefsPath = getRealPath(cid, "refs", "cid"); + + // Throw exception if the cid refs file doesn't exist + if (!Files.exists(absCidRefsPath)) { + String errMsg = + "FileHashStore.findObject - Cid refs file does not exist for cid: " + cid + + " with address: " + absCidRefsPath + ", but pid refs file exists."; + logFileHashStore.error(errMsg); + throw new OrphanPidRefsFileException(errMsg); + } + // If the pid is found in the expected cid refs file, return it + if (isPidInCidRefsFile(pid, absCidRefsPath)) { + logFileHashStore.info( + "FileHashStore.findObject - Cid (" + cid + ") found for pid: " + pid + ); + return cid; + + } else { + String errMsg = "FileHashStore.findObject - Pid refs file exists, but pid (" + pid + + ") not found in cid refs file for cid: " + cid + " with address: " + + absCidRefsPath; + logFileHashStore.error(errMsg); + throw new PidNotFoundInCidRefsFileException(errMsg); + } + + } else { + String errMsg = "FileHashStore.findObject - Unable to find cid for pid: " + pid + + ". 
Pid refs file does not exist at: " + absPidRefsPath; + logFileHashStore.error(errMsg); + // Create custom exception class + throw new FileNotFoundException(errMsg); + } + } + @Override public String storeMetadata(InputStream metadata, String pid, String formatId) throws IOException, FileNotFoundException, IllegalArgumentException, InterruptedException, @@ -595,8 +832,8 @@ private String syncPutMetadata(InputStream metadata, String pid, String checkedF "FileHashStore.storeMetadata - Metadata lock was interrupted while" + " storing metadata for: " + pid + " and formatId: " + checkedFormatId + ". InterruptedException: " + ie.getMessage(); - logFileHashStore.warn(errMsg); - throw ie; + logFileHashStore.error(errMsg); + throw new InterruptedException(errMsg); } } logFileHashStore.debug( @@ -740,6 +977,7 @@ public InputStream retrieveMetadata(String pid, String formatId) /** * Overload method for retrieveMetadata with default metadata namespace */ + @Override public InputStream retrieveMetadata(String pid) throws IllegalArgumentException, FileNotFoundException, IOException, NoSuchAlgorithmException { logFileHashStore.debug( @@ -782,9 +1020,42 @@ public InputStream retrieveMetadata(String pid) throws IllegalArgumentException, return metadataCidInputStream; } + @Override + public void deleteObject(String cid, boolean deleteCid) throws IllegalArgumentException, + FileNotFoundException, IOException, NoSuchAlgorithmException { + logFileHashStore.debug( + "FileHashStore.deleteObject - Called to delete object with content identifier: " + cid + ); + if (deleteCid) { + // Validate input parameters + FileHashStoreUtility.ensureNotNull(cid, "cid", "deleteObject"); + FileHashStoreUtility.checkForEmptyString(cid, "cid", "deleteObject"); + + // Confirm that the object called to delete does not have a cid reference file + Path absCidRefsPath = getRealPath(cid, "refs", "cid"); + if (Files.exists(absCidRefsPath)) { + // The cid is referenced by pids, do not delete.
+ return; + + } else { + // Get permanent address of the actual cid + String objShardString = FileHashStoreUtility.getHierarchicalPathString( + DIRECTORY_DEPTH, DIRECTORY_WIDTH, cid + ); + Path expectedRealPath = OBJECT_STORE_DIRECTORY.resolve(objShardString); + + // If file exists, delete it. + if (Files.exists(expectedRealPath)) { + Files.delete(expectedRealPath); + } + } + } + } + @Override public void deleteObject(String pid) throws IllegalArgumentException, FileNotFoundException, - IOException, NoSuchAlgorithmException { + IOException, NoSuchAlgorithmException, InterruptedException, + PidNotFoundInCidRefsFileException { logFileHashStore.debug( "FileHashStore.deleteObject - Called to delete object for pid: " + pid ); @@ -792,28 +1063,106 @@ public void deleteObject(String pid) throws IllegalArgumentException, FileNotFou FileHashStoreUtility.ensureNotNull(pid, "pid", "deleteObject"); FileHashStoreUtility.checkForEmptyString(pid, "pid", "deleteObject"); - // Get permanent address of the pid by calculating its sha-256 hex digest - Path objRealPath = getRealPath(pid, "object", null); + // First, find the object and evaluate its state + String cid; + try { + cid = findObject(pid); + + } catch (OrphanPidRefsFileException oprfe) { + // Delete the pid refs file and return, nothing else to delete. + Path absPidRefsPath = getRealPath(pid, "refs", "pid"); + Files.delete(absPidRefsPath); + + String warnMsg = "FileHashStore.deleteObject - Cid refs file does not exist for pid: " + + pid + ". Deleted orphan pid refs file."; + logFileHashStore.warn(warnMsg); + return; + + } catch (PidNotFoundInCidRefsFileException pnficrfe) { + // Delete pid refs file and return, nothing else to delete + Path absPidRefsPath = getRealPath(pid, "refs", "pid"); + Files.delete(absPidRefsPath); + + String warnMsg = + "FileHashStore.deleteObject - Pid not found in expected cid refs file for pid: " + + pid + ". 
Deleted orphan pid refs file."; + logFileHashStore.warn(warnMsg); + return; + } - // Check to see if object exists - if (!Files.exists(objRealPath)) { - String errMsg = "FileHashStore.deleteObject - File does not exist for pid: " + pid - + " with object address: " + objRealPath; - logFileHashStore.warn(errMsg); - throw new FileNotFoundException(errMsg); + // If cid has been retrieved without any errors, proceed with second stage of deletion. + synchronized (referenceLockedCids) { + while (referenceLockedCids.contains(cid)) { + try { + referenceLockedCids.wait(TIME_OUT_MILLISEC); + + } catch (InterruptedException ie) { + String errMsg = + "FileHashStore.deleteObject - referenceLockedCids lock was interrupted while" + + " waiting to delete object with cid: " + cid + + ". InterruptedException: " + ie.getMessage(); + logFileHashStore.error(errMsg); + throw new InterruptedException(errMsg); + } + } + logFileHashStore.debug( + "FileHashStore.deleteObject - Synchronizing referenceLockedCids for cid: " + cid + ); + referenceLockedCids.add(cid); + } + + try { + // Get permanent address of the pid by calculating its sha-256 hex digest + Path objRealPath = getRealPath(pid, "object", null); + // Get the path to the cid refs file to work with + Path absCidRefsPath = getRealPath(cid, "refs", "cid"); + + if (!Files.exists(objRealPath)) { + // Throw exception if object doesn't exist + String errMsg = "FileHashStore.deleteObject - File does not exist for pid: " + pid + + " with object address: " + objRealPath; + logFileHashStore.error(errMsg); + throw new FileNotFoundException(errMsg); + + } else { + // Proceed to delete the reference files and object + // Delete pid reference file + deletePidRefsFile(pid); + // Remove pid from cid refs file + deleteCidRefsPid(pid, absCidRefsPath); + // Delete obj and cid refs file only if the cid refs file is empty + if (Files.size(absCidRefsPath) == 0) { + // Delete empty cid refs file + Files.delete(absCidRefsPath); + // Delete actual object + 
Files.delete(objRealPath); + } else { + String warnMsg = "FileHashStore.deleteObject - cid refs file for the cid referenced by pid: " + pid + + " is not empty (references exist for the cid). Skipping object deletion."; + logFileHashStore.warn(warnMsg); + } + logFileHashStore.info( + "FileHashStore.deleteObject - File and references deleted for: " + pid + + " with object address: " + objRealPath + ); + // TODO: Discuss whether deleteObject should also remove all default system metadata + } + } finally { + // Release lock + synchronized (referenceLockedCids) { + logFileHashStore.debug( + "FileHashStore.deleteObject - Releasing referenceLockedCids for cid: " + cid + ); + referenceLockedCids.remove(cid); + referenceLockedCids.notifyAll(); + } } - // Proceed to delete - Files.delete(objRealPath); - logFileHashStore.info( - "FileHashStore.deleteObject - File deleted for: " + pid + " with object address: " - + objRealPath - ); } @Override public void deleteMetadata(String pid, String formatId) throws IllegalArgumentException, - FileNotFoundException, IOException, NoSuchAlgorithmException { + IOException, NoSuchAlgorithmException { logFileHashStore.debug( "FileHashStore.deleteMetadata - Called to delete metadata for pid: " + pid ); @@ -826,25 +1175,26 @@ public void deleteMetadata(String pid, String formatId) throws IllegalArgumentEx // Get permanent address of the pid by calculating its sha-256 hex digest Path metadataCidPath = getRealPath(pid, "metadata", formatId); - // Check to see if object exists if (!Files.exists(metadataCidPath)) { String errMsg = "FileHashStore.deleteMetadata - File does not exist for pid: " + pid + " with metadata address: " + metadataCidPath; logFileHashStore.warn(errMsg); - throw new FileNotFoundException(errMsg); - } + return; - // Proceed to delete - Files.delete(metadataCidPath); - logFileHashStore.info( - "FileHashStore.deleteMetadata - File deleted for: " + pid + " with metadata address: " - + metadataCidPath - ); + } else { + // Proceed to delete +
Files.delete(metadataCidPath); + logFileHashStore.info( + "FileHashStore.deleteMetadata - File deleted for: " + pid + + " with metadata address: " + metadataCidPath + ); + } } /** * Overload method for deleteMetadata with default metadata namespace */ + @Override public void deleteMetadata(String pid) throws IllegalArgumentException, FileNotFoundException, IOException, NoSuchAlgorithmException { deleteMetadata(pid, DEFAULT_METADATA_NAMESPACE); @@ -861,24 +1211,33 @@ public String getHexDigest(String pid, String algorithm) throws NoSuchAlgorithmE FileHashStoreUtility.checkForEmptyString(pid, "pid", "getHexDigest"); validateAlgorithm(algorithm); - // Get permanent address of the pid by calculating its sha-256 hex digest - Path objRealPath = getRealPath(pid, "object", null); + // Find the content identifier + if (algorithm.equals(OBJECT_STORE_ALGORITHM)) { + String cid = findObject(pid); + return cid; - // Check to see if object exists - if (!Files.exists(objRealPath)) { - String errMsg = "FileHashStore.getHexDigest - File does not exist for pid: " + pid - + " with object address: " + objRealPath; - logFileHashStore.warn(errMsg); - throw new FileNotFoundException(errMsg); - } + } else { + // Get permanent address of the pid + Path objRealPath = getRealPath(pid, "object", null); - InputStream dataStream = Files.newInputStream(objRealPath); - String mdObjectHexDigest = FileHashStoreUtility.calculateHexDigest(dataStream, algorithm); - logFileHashStore.info( - "FileHashStore.getHexDigest - Hex digest calculated for pid: " + pid - + ", with hex digest value: " + mdObjectHexDigest - ); - return mdObjectHexDigest; + // Check to see if object exists + if (!Files.exists(objRealPath)) { + String errMsg = "FileHashStore.getHexDigest - File does not exist for pid: " + pid + + " with object address: " + objRealPath; + logFileHashStore.warn(errMsg); + throw new FileNotFoundException(errMsg); + } + + InputStream dataStream = Files.newInputStream(objRealPath); + String 
mdObjectHexDigest = FileHashStoreUtility.calculateHexDigest( + dataStream, algorithm + ); + logFileHashStore.info( + "FileHashStore.getHexDigest - Hex digest calculated for pid: " + pid + + ", with hex digest value: " + mdObjectHexDigest + ); + return mdObjectHexDigest; + } } // FileHashStore Core & Supporting Methods @@ -899,7 +1258,7 @@ public String getHexDigest(String pid, String algorithm) throws NoSuchAlgorithmE * @param checksum Value of checksum to validate against * @param checksumAlgorithm Algorithm of checksum submitted * @param objSize Expected size of object to validate after storing - * @return 'ObjectInfo' object that contains the file id, size, and a checksum map based on + * @return 'ObjectMetadata' object that contains the file id, size, and a checksum map based on * the default algorithm list. * @throws IOException I/O Error when writing file, generating checksums, * moving file or deleting tmpFile upon duplicate found @@ -908,24 +1267,20 @@ public String getHexDigest(String pid, String algorithm) throws NoSuchAlgorithmE * @throws SecurityException Insufficient permissions to read/access files or when * generating/writing to a file * @throws FileNotFoundException tmpFile not found during store - * @throws PidObjectExistsException Duplicate object in store exists + * @throws PidRefsFileExistsException If the given pid already references an object * @throws IllegalArgumentException When signature values are empty (checksum, pid, * etc.) 
* @throws NullPointerException Arguments are null for pid or object * @throws AtomicMoveNotSupportedException When attempting to move files across file systems */ - protected ObjectInfo putObject( + protected ObjectMetadata putObject( InputStream object, String pid, String additionalAlgorithm, String checksum, String checksumAlgorithm, long objSize ) throws IOException, NoSuchAlgorithmException, SecurityException, FileNotFoundException, - PidObjectExistsException, IllegalArgumentException, NullPointerException, + PidRefsFileExistsException, IllegalArgumentException, NullPointerException, AtomicMoveNotSupportedException { logFileHashStore.debug("FileHashStore.putObject - Called to put object for pid: " + pid); - // Begin input validation - FileHashStoreUtility.ensureNotNull(object, "object", "putObject"); - FileHashStoreUtility.ensureNotNull(pid, "pid", "putObject"); - FileHashStoreUtility.checkForEmptyString(pid, "pid", "putObject"); // Validate algorithms if not null or empty, throws exception if not supported if (additionalAlgorithm != null) { FileHashStoreUtility.checkForEmptyString( @@ -939,6 +1294,9 @@ protected ObjectInfo putObject( ); validateAlgorithm(checksumAlgorithm); } + if (checksum != null) { + FileHashStoreUtility.checkForEmptyString(checksum, "checksum", "putObject"); + } if (objSize != -1) { FileHashStoreUtility.checkNotNegativeOrZero(objSize, "putObject"); } @@ -946,24 +1304,10 @@ protected ObjectInfo putObject( // If validation is desired, checksumAlgorithm and checksum must both be present boolean requestValidation = verifyChecksumParameters(checksum, checksumAlgorithm); - // Gather ObjectInfo elements and prepare object permanent address - String objectCid = getPidHexDigest(pid, OBJECT_STORE_ALGORITHM); - String objShardString = getHierarchicalPathString( - DIRECTORY_DEPTH, DIRECTORY_WIDTH, objectCid - ); - Path objRealPath = OBJECT_STORE_DIRECTORY.resolve(objShardString); - - // If file (pid hash) exists, reject request immediately - if 
(Files.exists(objRealPath)) { - String errMsg = "FileHashStore.putObject - File already exists for pid: " + pid - + ". Object address: " + objRealPath + ". Aborting request."; - logFileHashStore.warn(errMsg); - throw new PidObjectExistsException(errMsg); - } - // Generate tmp file and write to it logFileHashStore.debug("FileHashStore.putObject - Generating tmpFile"); - File tmpFile = generateTmpFile("tmp", OBJECT_TMP_FILE_DIRECTORY); + File tmpFile = FileHashStoreUtility.generateTmpFile("tmp", OBJECT_TMP_FILE_DIRECTORY); + Path tmpFilePath = tmpFile.toPath(); Map<String, String> hexDigests; try { hexDigests = writeToTmpFileAndGenerateChecksums( @@ -988,19 +1332,34 @@ protected ObjectInfo putObject( // Validate object if checksum and checksum algorithm is passed validateTmpObject( - requestValidation, checksum, checksumAlgorithm, tmpFile, hexDigests, objSize, + requestValidation, checksum, checksumAlgorithm, tmpFilePath, hexDigests, objSize, storedObjFileSize ); - // Move object - File permFile = objRealPath.toFile(); - move(tmpFile, permFile, "object"); - logFileHashStore.debug( - "FileHashStore.putObject - Move object success, permanent address: " + objRealPath + // Gather the elements to form the permanent address + String objectCid = hexDigests.get(OBJECT_STORE_ALGORITHM); + String objShardString = FileHashStoreUtility.getHierarchicalPathString( + DIRECTORY_DEPTH, DIRECTORY_WIDTH, objectCid ); + Path objRealPath = OBJECT_STORE_DIRECTORY.resolve(objShardString); - // Create ObjectInfo to return with pertinent data - return new ObjectInfo(objectCid, storedObjFileSize, hexDigests); + // Confirm that the object does not yet exist; if it already does, delete the tmpFile + if (Files.exists(objRealPath)) { + String errMsg = "FileHashStore.putObject - File already exists for pid: " + pid + + ". Object address: " + objRealPath + ". 
Deleting temporary file."; + logFileHashStore.warn(errMsg); + tmpFile.delete(); + } else { + // Move object + File permFile = objRealPath.toFile(); + move(tmpFile, permFile, "object"); + logFileHashStore.debug( + "FileHashStore.putObject - Move object success, permanent address: " + objRealPath + ); + } + + // Create ObjectMetadata to return with pertinent data + return new ObjectMetadata(objectCid, storedObjFileSize, hexDigests); } /** @@ -1013,30 +1372,38 @@ protected ObjectInfo putObject( * @param checksumAlgorithm Hash algorithm of checksum value * @param tmpFile tmpFile that has been written * @param hexDigests Map of the hex digests available to check with - * @throws NoSuchAlgorithmException When algorithm supplied is not supported - * @throws IOException When tmpFile fails to be deleted + * @param tmpFile Path to the file that is being evaluated + * @param hexDigests Map of the hex digests to parse data from + * @param objSize Expected size of object + * @param storedObjFileSize Actual size of object stored + * @return True if the object passed validation (or no validation was requested) + * @throws NoSuchAlgorithmException When algorithm supplied is not supported + * @throws IOException When tmpFile fails to be deleted */ - private void validateTmpObject( - boolean requestValidation, String checksum, String checksumAlgorithm, File tmpFile, + private boolean validateTmpObject( + boolean requestValidation, String checksum, String checksumAlgorithm, Path tmpFile, Map<String, String> hexDigests, long objSize, long storedObjFileSize ) throws NoSuchAlgorithmException, IOException { if (objSize > 0) { if (objSize != storedObjFileSize) { // Delete tmp File - boolean deleteStatus = tmpFile.delete(); - if (!deleteStatus) { + try { + Files.delete(tmpFile); + + } catch (Exception ge) { String errMsg = "FileHashStore.validateTmpObject - objSize given is not equal to the" - + " stored object size. ObjSize: " + objSize + ". 
storedObjFileSize:" - + storedObjFileSize + ". Failed to delete tmpFile: " + tmpFile - .getName(); + + " stored object size. ObjSize: " + objSize + ". storedObjFileSize: " + + storedObjFileSize + ". 
Failed to delete tmpFile: " + tmpFile + ". " + + ge.getMessage(); logFileHashStore.error(errMsg); throw new IOException(errMsg); } + String errMsg = "FileHashStore.validateTmpObject - objSize given is not equal to the" - + " stored object size. ObjSize: " + objSize + ". storedObjFileSize:" - + storedObjFileSize + ". Deleting tmpFile: " + tmpFile.getName(); + + " stored object size. ObjSize: " + objSize + ". storedObjFileSize: " + + storedObjFileSize + ". Deleting tmpFile: " + tmpFile; logFileHashStore.error(errMsg); throw new IllegalArgumentException(errMsg); } @@ -1059,25 +1426,30 @@ private void validateTmpObject( if (!checksum.equalsIgnoreCase(digestFromHexDigests)) { // Delete tmp File - boolean deleteStatus = tmpFile.delete(); - if (!deleteStatus) { + try { + Files.delete(tmpFile); + + } catch (Exception ge) { String errMsg = "FileHashStore.validateTmpObject - Object cannot be validated. Checksum given" + " is not equal to the calculated hex digest: " + digestFromHexDigests + ". Checksum" + " provided: " + checksum - + ". Failed to delete tmpFile: " + tmpFile.getName(); + + ". Failed to delete tmpFile: " + tmpFile + ". " + ge.getMessage(); + ; logFileHashStore.error(errMsg); throw new IOException(errMsg); } + String errMsg = "FileHashStore.validateTmpObject - Checksum given is not equal to the" + " calculated hex digest: " + digestFromHexDigests + ". Checksum" - + " provided: " + checksum + ". tmpFile has been deleted: " + tmpFile - .getName(); + + " provided: " + checksum + ". tmpFile has been deleted: " + tmpFile; logFileHashStore.error(errMsg); throw new IllegalArgumentException(errMsg); } } + + return true; } /** @@ -1097,8 +1469,10 @@ protected boolean validateAlgorithm(String algorithm) throws NullPointerExceptio boolean algorithmSupported = Arrays.asList(SUPPORTED_HASH_ALGORITHMS).contains(algorithm); if (!algorithmSupported) { - String errMsg = "Algorithm not supported: " + algorithm + ". 
Supported algorithms: " - + Arrays.toString(SUPPORTED_HASH_ALGORITHMS); + String errMsg = "FileHashStore - validateAlgorithm: Algorithm not supported: " + + algorithm + ". Supported algorithms: " + Arrays.toString( + SUPPORTED_HASH_ALGORITHMS + ); logFileHashStore.error(errMsg); throw new NoSuchAlgorithmException(errMsg); } @@ -1106,6 +1480,23 @@ protected boolean validateAlgorithm(String algorithm) throws NullPointerExceptio return true; } + /** + * Checks whether the algorithm supplied is included in the DefaultHashAlgorithms + * + * @param algorithm Algorithm to check + * @return True if it's included + */ + private boolean isDefaultAlgorithm(String algorithm) { + boolean isDefaultAlgorithm = false; + for (DefaultHashAlgorithms defAlgo : DefaultHashAlgorithms.values()) { + if (algorithm.equals(defAlgo.getName())) { + isDefaultAlgorithm = true; + break; + } + } + return isDefaultAlgorithm; + } + /** * Determines whether an object will be verified with a given checksum and checksumAlgorithm * @@ -1142,101 +1533,6 @@ protected boolean verifyChecksumParameters(String checksum, String checksumAlgor return requestValidation; } - /** - * Given a string and supported algorithm returns the hex digest - * - * @param pid authority based identifier or persistent identifier - * @param algorithm string value (ex. 
SHA-256) - * @return Hex digest of the given string in lower-case - * @throws IllegalArgumentException String or algorithm cannot be null or empty - * @throws NoSuchAlgorithmException Algorithm not supported - */ - protected String getPidHexDigest(String pid, String algorithm) throws NoSuchAlgorithmException, - IllegalArgumentException { - FileHashStoreUtility.ensureNotNull(pid, "pid", "getPidHexDigest"); - FileHashStoreUtility.checkForEmptyString(pid, "pid", "getPidHexDigest"); - FileHashStoreUtility.ensureNotNull(algorithm, "algorithm", "getPidHexDigest"); - FileHashStoreUtility.checkForEmptyString(algorithm, "algorithm", "getPidHexDigest"); - validateAlgorithm(algorithm); - - MessageDigest stringMessageDigest = MessageDigest.getInstance(algorithm); - byte[] bytes = pid.getBytes(StandardCharsets.UTF_8); - stringMessageDigest.update(bytes); - // stringDigest - return DatatypeConverter.printHexBinary(stringMessageDigest.digest()).toLowerCase(); - } - - /** - * Generates a hierarchical path by dividing a given digest into tokens of fixed width, and - * concatenating them with '/' as the delimiter. 
- * - * @param dirDepth integer to represent number of directories - * @param dirWidth width of each directory - * @param digest value to shard - * @return String - */ - protected String getHierarchicalPathString(int dirDepth, int dirWidth, String digest) { - List tokens = new ArrayList<>(); - int digestLength = digest.length(); - for (int i = 0; i < dirDepth; i++) { - int start = i * dirWidth; - int end = Math.min((i + 1) * dirWidth, digestLength); - tokens.add(digest.substring(start, end)); - } - - if (dirDepth * dirWidth < digestLength) { - tokens.add(digest.substring(dirDepth * dirWidth)); - } - - List stringArray = new ArrayList<>(); - for (String str : tokens) { - if (!str.trim().isEmpty()) { - stringArray.add(str); - } - } - // stringShard - return String.join("/", stringArray); - } - - /** - * Creates an empty file in a given location - * - * @param prefix string to prepend before tmp file - * @param directory location to create tmp file - * @return Temporary file (File) ready to write into - * @throws IOException Issues with generating tmpFile - * @throws SecurityException Insufficient permissions to create tmpFile - */ - protected File generateTmpFile(String prefix, Path directory) throws IOException, - SecurityException { - Random rand = new Random(); - int randomNumber = rand.nextInt(1000000); - String newPrefix = prefix + "-" + System.currentTimeMillis() + randomNumber; - - try { - Path newPath = Files.createTempFile(directory, newPrefix, null); - File newFile = newPath.toFile(); - logFileHashStore.trace( - "FileHashStore.generateTmpFile - tmpFile generated: " + newFile.getAbsolutePath() - ); - newFile.deleteOnExit(); - return newFile; - - } catch (IOException ioe) { - String errMsg = "FileHashStore.generateTmpFile - Unable to generate tmpFile: " + ioe - .fillInStackTrace(); - logFileHashStore.error(errMsg); - throw new IOException(errMsg); - - } catch (SecurityException se) { - String errMsg = "FileHashStore.generateTmpFile - Unable to generate 
tmpFile: " + se - .fillInStackTrace(); - logFileHashStore.error(errMsg); - throw new SecurityException(errMsg); - - } - } - /** * Write the input stream into a given file (tmpFile) and return a HashMap consisting of * algorithms and their respective hex digests. If an additional algorithm is supplied and @@ -1258,17 +1554,22 @@ protected File generateTmpFile(String prefix, Path directory) throws IOException protected Map writeToTmpFileAndGenerateChecksums( File tmpFile, InputStream dataStream, String additionalAlgorithm, String checksumAlgorithm ) throws NoSuchAlgorithmException, IOException, FileNotFoundException, SecurityException { + // Determine whether to calculate additional or checksum algorithms + boolean generateAddAlgo = false; if (additionalAlgorithm != null) { FileHashStoreUtility.checkForEmptyString( additionalAlgorithm, "additionalAlgorithm", "writeToTmpFileAndGenerateChecksums" ); validateAlgorithm(additionalAlgorithm); + generateAddAlgo = !isDefaultAlgorithm(additionalAlgorithm); } - if (checksumAlgorithm != null) { + boolean generateCsAlgo = false; + if (checksumAlgorithm != null && !checksumAlgorithm.equals(additionalAlgorithm)) { FileHashStoreUtility.checkForEmptyString( checksumAlgorithm, "checksumAlgorithm", "writeToTmpFileAndGenerateChecksums" ); validateAlgorithm(checksumAlgorithm); + generateCsAlgo = !isDefaultAlgorithm(checksumAlgorithm); } FileOutputStream os = new FileOutputStream(tmpFile); @@ -1279,14 +1580,14 @@ protected Map writeToTmpFileAndGenerateChecksums( MessageDigest sha512 = MessageDigest.getInstance(DefaultHashAlgorithms.SHA_512.getName()); MessageDigest additionalAlgo = null; MessageDigest checksumAlgo = null; - if (additionalAlgorithm != null) { + if (generateAddAlgo) { logFileHashStore.debug( "FileHashStore.writeToTmpFileAndGenerateChecksums - Adding additional algorithm" + " to hex digest map, algorithm: " + additionalAlgorithm ); additionalAlgo = MessageDigest.getInstance(additionalAlgorithm); } - if (checksumAlgorithm != 
null && !checksumAlgorithm.equals(additionalAlgorithm)) { + if (generateCsAlgo) { logFileHashStore.debug( "FileHashStore.writeToTmpFileAndGenerateChecksums - Adding checksum algorithm" + " to hex digest map, algorithm: " + checksumAlgorithm @@ -1305,10 +1606,10 @@ protected Map writeToTmpFileAndGenerateChecksums( sha256.update(buffer, 0, bytesRead); sha384.update(buffer, 0, bytesRead); sha512.update(buffer, 0, bytesRead); - if (additionalAlgorithm != null) { + if (generateAddAlgo) { additionalAlgo.update(buffer, 0, bytesRead); } - if (checksumAlgorithm != null && !checksumAlgorithm.equals(additionalAlgorithm)) { + if (generateCsAlgo) { checksumAlgo.update(buffer, 0, bytesRead); } } @@ -1321,6 +1622,7 @@ protected Map writeToTmpFileAndGenerateChecksums( throw ioe; } finally { + dataStream.close(); os.flush(); os.close(); } @@ -1337,12 +1639,12 @@ protected Map writeToTmpFileAndGenerateChecksums( hexDigests.put(DefaultHashAlgorithms.SHA_256.getName(), sha256Digest); hexDigests.put(DefaultHashAlgorithms.SHA_384.getName(), sha384Digest); hexDigests.put(DefaultHashAlgorithms.SHA_512.getName(), sha512Digest); - if (additionalAlgorithm != null) { + if (generateAddAlgo) { String extraAlgoDigest = DatatypeConverter.printHexBinary(additionalAlgo.digest()) .toLowerCase(); hexDigests.put(additionalAlgorithm, extraAlgoDigest); } - if (checksumAlgorithm != null && !checksumAlgorithm.equals(additionalAlgorithm)) { + if (generateCsAlgo) { String extraChecksumDigest = DatatypeConverter.printHexBinary(checksumAlgo.digest()) .toLowerCase(); hexDigests.put(checksumAlgorithm, extraChecksumDigest); @@ -1377,13 +1679,12 @@ protected void move(File source, File target, String entity) throws IOException, ); // Validate input parameters FileHashStoreUtility.ensureNotNull(entity, "entity", "move"); - FileHashStoreUtility.checkForEmptyString(entity, "entity", "move"); // Entity is only used when checking for an existence of an object if (entity.equals("object") && target.exists()) { String 
errMsg = "FileHashStore.move - File already exists for target: " + target; - logFileHashStore.debug(errMsg); - throw new FileAlreadyExistsException(errMsg); + logFileHashStore.warn(errMsg); + return; } File destinationDirectory = new File(target.getParent()); @@ -1421,6 +1722,242 @@ protected void move(File source, File target, String entity) throws IOException, } } + /** + * Verifies that the reference files for the given pid and cid exist and contain + * the expected values. + * + * @param pid Authority-based or persistent identifier + * @param cid Content identifier + * @param absPidRefsPath Path to where the pid refs file exists + * @param absCidRefsPath Path to where the cid refs file exists + * @throws FileNotFoundException Any refs files are missing + * @throws IOException Unable to read any of the refs files or if the refs content + * is not what is expected + */ + protected void verifyHashStoreRefsFiles( + String pid, String cid, Path absPidRefsPath, Path absCidRefsPath + ) throws FileNotFoundException, IOException { + // First confirm that the files were created + if (!Files.exists(absCidRefsPath)) { + String errMsg = "FileHashStore.verifyHashStoreRefsFiles - cid refs file is missing: " + + absCidRefsPath + " for pid: " + pid; + logFileHashStore.error(errMsg); + throw new FileNotFoundException(errMsg); + } + if (!Files.exists(absPidRefsPath)) { + String errMsg = "FileHashStore.verifyHashStoreRefsFiles - pid refs file is missing: " + + absPidRefsPath + " for cid: " + cid; + logFileHashStore.error(errMsg); + throw new FileNotFoundException(errMsg); + } + // Now verify the content + try { + String cidRead = new String(Files.readAllBytes(absPidRefsPath)); + if (!cidRead.equals(cid)) { + String errMsg = "FileHashStore.verifyHashStoreRefsFiles - Unexpected cid: " + + cidRead + " found in pid refs file: " + absPidRefsPath + ". 
Expected cid: " + + cid; + logFileHashStore.error(errMsg); + throw new IOException(errMsg); + } + boolean pidFoundInCidRefFiles = isPidInCidRefsFile(pid, absCidRefsPath); + if (!pidFoundInCidRefFiles) { + String errMsg = "FileHashStore.verifyHashStoreRefsFiles - Missing expected pid: " + + pid + " in cid refs file: " + absCidRefsPath; + logFileHashStore.error(errMsg); + throw new IOException(errMsg); + } + } catch (IOException ioe) { + String errMsg = "FileHashStore.verifyHashStoreRefsFiles - " + ioe.getMessage(); + logFileHashStore.error(errMsg); + throw new IOException(errMsg); + } + } + + /** + * Writes the given 'pid' into a file in the 'cid' refs file format, which consists of + * multiple pids that reference a 'cid', each on its own line (delimited by "\n"). + * + * @param pid Authority-based or persistent identifier to write + * @throws IOException Failure to write cid refs file + */ + protected File writeCidRefsFile(String pid) throws IOException { + File cidRefsTmpFile = FileHashStoreUtility.generateTmpFile("tmp", REFS_TMP_FILE_DIRECTORY); + try (BufferedWriter writer = new BufferedWriter( + new OutputStreamWriter( + Files.newOutputStream(cidRefsTmpFile.toPath()), StandardCharsets.UTF_8 + ) + )) { + writer.write(pid); + writer.close(); + + logFileHashStore.debug( + "FileHashStore.writeCidRefsFile - cid refs file written for: " + pid + ); + return cidRefsTmpFile; + + } catch (IOException ioe) { + logFileHashStore.error( + "FileHashStore.writeCidRefsFile - Unable to write cid refs file for pid: " + pid + + " IOException: " + ioe.getMessage() + ); + throw ioe; + } + } + + /** + * Writes the given 'cid' into a file in the 'pid' refs file format. A pid refs file + * contains a single 'cid'. Note, a 'pid' can only ever reference one 'cid'.
+ * + * @param cid Content identifier to write + * @throws IOException Failure to write pid refs file + */ + protected File writePidRefsFile(String cid) throws IOException { + File pidRefsTmpFile = FileHashStoreUtility.generateTmpFile("tmp", REFS_TMP_FILE_DIRECTORY); + try (BufferedWriter writer = new BufferedWriter( + new OutputStreamWriter( + Files.newOutputStream(pidRefsTmpFile.toPath()), StandardCharsets.UTF_8 + ) + )) { + writer.write(cid); + writer.close(); + + logFileHashStore.debug( + "FileHashStore.writePidRefsFile - pid refs file written for: " + cid + ); + return pidRefsTmpFile; + + } catch (IOException ioe) { + String errMsg = + "FileHashStore.writePidRefsFile - Unable to write pid refs file for cid: " + cid + + " IOException: " + ioe.getMessage(); + logFileHashStore.error(errMsg); + throw new IOException(errMsg); + } + } + + /** + * Checks a given cid refs file for a pid. This is case-sensitive. + * + * @param pid Authority-based or persistent identifier to search + * @param absCidRefsPath Path to the cid refs file to check + * @return True if pid is found, false otherwise + * @throws IOException If unable to read the cid refs file. + */ + protected boolean isPidInCidRefsFile(String pid, Path absCidRefsPath) throws IOException { + List<String> lines = Files.readAllLines(absCidRefsPath); + boolean pidFoundInCidRefFiles = false; + for (String line : lines) { + if (line.equals(pid)) { + pidFoundInCidRefFiles = true; + break; + } + } + return pidFoundInCidRefFiles; + } + + /** + * Updates a cid refs file with a pid that references the cid + * + * @param pid Authority-based or persistent identifier + * @param absCidRefsPath Path to the cid refs file to update + * @throws IOException Issue with updating a cid refs file + */ + protected void updateCidRefsFiles(String pid, Path absCidRefsPath) throws IOException { + // This update process is atomic, so we first write the updated content + // into a temporary file before overwriting it.
+ File tmpFile = FileHashStoreUtility.generateTmpFile("tmp", REFS_TMP_FILE_DIRECTORY); + Path tmpFilePath = tmpFile.toPath(); + try { + // Obtain a lock on the file before updating it + try (FileChannel channel = FileChannel.open( + absCidRefsPath, StandardOpenOption.READ, StandardOpenOption.WRITE + ); FileLock ignored = channel.lock()) { + List<String> lines = new ArrayList<>(Files.readAllLines(absCidRefsPath)); + lines.add(pid); + + Files.write(tmpFilePath, lines, StandardOpenOption.WRITE); + move(tmpFile, absCidRefsPath.toFile(), "refs"); + logFileHashStore.debug( + "FileHashStore.updateCidRefsFiles - Pid: " + pid + + " has been added to cid refs file: " + absCidRefsPath + ); + } + // The lock is automatically released when the try block exits + } catch (IOException ioe) { + String errMsg = "FileHashStore.updateCidRefsFiles - " + ioe.getMessage(); + logFileHashStore.error(errMsg); + throw new IOException(errMsg); + } + } + + /** + * Deletes a pid references file + * + * @param pid Authority-based or persistent identifier + * @throws NoSuchAlgorithmException Incompatible algorithm used to find pid refs file + * @throws IOException Unable to access or delete the pid refs file + */ + protected void deletePidRefsFile(String pid) throws NoSuchAlgorithmException, IOException { + FileHashStoreUtility.ensureNotNull(pid, "pid", "deletePidRefsFile"); + FileHashStoreUtility.checkForEmptyString(pid, "pid", "deletePidRefsFile"); + + Path absPidRefsPath = getRealPath(pid, "refs", "pid"); + // Check to see if pid refs file exists + if (!Files.exists(absPidRefsPath)) { + String errMsg = + "FileHashStore.deletePidRefsFile - Pid refs file does not exist for pid: " + pid + + " with address: " + absPidRefsPath; + logFileHashStore.error(errMsg); + throw new FileNotFoundException(errMsg); + + } else { + // Proceed to delete + Files.delete(absPidRefsPath); + logFileHashStore.debug( + "FileHashStore.deletePidRefsFile - Pid refs file deleted for: " + pid + + " with address: " + 
absPidRefsPath
+ ); + } + } + + + /** + * Removes a pid from a cid refs file. + * + * @param pid Authority-based or persistent identifier. + * @param absCidRefsPath Path to the cid refs file to remove the pid from + * @throws IOException Unable to access cid refs file + */ + protected void deleteCidRefsPid(String pid, Path absCidRefsPath) throws IOException { + FileHashStoreUtility.ensureNotNull(pid, "pid", "deleteCidRefsPid"); + FileHashStoreUtility.ensureNotNull(absCidRefsPath, "absCidRefsPath", "deleteCidRefsPid"); + // This delete process is atomic, so we first write the updated content + // into a temporary file before overwriting it. + File tmpFile = FileHashStoreUtility.generateTmpFile("tmp", REFS_TMP_FILE_DIRECTORY); + Path tmpFilePath = tmpFile.toPath(); + try (FileChannel channel = FileChannel.open( + absCidRefsPath, StandardOpenOption.READ, StandardOpenOption.WRITE + ); FileLock ignored = channel.lock()) { + // Read all lines into a List + List<String> lines = new ArrayList<>(Files.readAllLines(absCidRefsPath)); + lines.remove(pid); + Files.write(tmpFilePath, lines, StandardOpenOption.WRITE); + move(tmpFile, absCidRefsPath.toFile(), "refs"); + logFileHashStore.debug( + "FileHashStore.deleteCidRefsPid - Pid: " + pid + " removed from cid refs file: " + + absCidRefsPath + ); + // The lock is automatically released when the try block exits + } catch (IOException ioe) { + String errMsg = "FileHashStore.deleteCidRefsPid - Unable to remove pid: " + pid + + " from cid refs file: " + absCidRefsPath + ". Additional Info: " + ioe + .getMessage(); + logFileHashStore.error(errMsg); + throw new IOException(errMsg); + } + } + /** * Takes a given input stream and writes it to its permanent address on disk based on the * SHA-256 hex digest of the given pid + formatId.
If no formatId is supplied, it will use the @@ -1457,11 +1994,15 @@ protected String putMetadata(InputStream metadata, String pid, String formatId) } // Get permanent address for the given metadata document - String metadataCid = getPidHexDigest(pid + checkedFormatId, OBJECT_STORE_ALGORITHM); + String metadataCid = FileHashStoreUtility.getPidHexDigest( + pid + checkedFormatId, OBJECT_STORE_ALGORITHM + ); Path metadataCidPath = getRealPath(pid, "metadata", checkedFormatId); // Store metadata to tmpMetadataFile - File tmpMetadataFile = generateTmpFile("tmp", METADATA_TMP_FILE_DIRECTORY); + File tmpMetadataFile = FileHashStoreUtility.generateTmpFile( + "tmp", METADATA_TMP_FILE_DIRECTORY + ); boolean tmpMetadataWritten = writeToTmpMetadataFile(tmpMetadataFile, metadata); if (tmpMetadataWritten) { logFileHashStore.debug( @@ -1516,30 +2057,55 @@ protected boolean writeToTmpMetadataFile(File tmpFile, InputStream metadataStrea /** * Get the absolute path of a HashStore object or metadata file * - * @param pid Authority-based identifier + * @param abId Authority-based, persistent or content identifier * @param entity "object" or "metadata" - * @param formatId Metadata namespace + * @param formatId Metadata namespace or reference type (pid/cid) * @return Actual path to object * @throws IllegalArgumentException If entity is not object or metadata * @throws NoSuchAlgorithmException If store algorithm is not supported + * @throws IOException If unable to retrieve cid */ - protected Path getRealPath(String pid, String entity, String formatId) - throws IllegalArgumentException, NoSuchAlgorithmException { + protected Path getRealPath(String abId, String entity, String formatId) + throws IllegalArgumentException, NoSuchAlgorithmException, IOException { Path realPath; if (entity.equalsIgnoreCase("object")) { - String objectCid = getPidHexDigest(pid, OBJECT_STORE_ALGORITHM); - String objShardString = getHierarchicalPathString( + // 'abId' is expected to be a pid + String objectCid = 
findObject(abId); + String objShardString = FileHashStoreUtility.getHierarchicalPathString( DIRECTORY_DEPTH, DIRECTORY_WIDTH, objectCid ); realPath = OBJECT_STORE_DIRECTORY.resolve(objShardString); } else if (entity.equalsIgnoreCase("metadata")) { - String objectCid = getPidHexDigest(pid + formatId, OBJECT_STORE_ALGORITHM); - String objShardString = getHierarchicalPathString( + String objectCid = FileHashStoreUtility.getPidHexDigest( + abId + formatId, OBJECT_STORE_ALGORITHM + ); + String objShardString = FileHashStoreUtility.getHierarchicalPathString( DIRECTORY_DEPTH, DIRECTORY_WIDTH, objectCid ); realPath = METADATA_STORE_DIRECTORY.resolve(objShardString); + } else if (entity.equalsIgnoreCase("refs")) { + if (formatId.equalsIgnoreCase("pid")) { + String pidRefId = FileHashStoreUtility.getPidHexDigest( + abId, OBJECT_STORE_ALGORITHM + ); + String pidShardString = FileHashStoreUtility.getHierarchicalPathString( + DIRECTORY_DEPTH, DIRECTORY_WIDTH, pidRefId + ); + realPath = REFS_PID_FILE_DIRECTORY.resolve(pidShardString); + } else if (formatId.equalsIgnoreCase("cid")) { + String cidShardString = FileHashStoreUtility.getHierarchicalPathString( + DIRECTORY_DEPTH, DIRECTORY_WIDTH, abId + ); + realPath = REFS_CID_FILE_DIRECTORY.resolve(cidShardString); + } else { + String errMsg = + "FileHashStore.getRealPath - formatId must be 'pid' or 'cid' when entity is 'refs'"; + logFileHashStore.error(errMsg); + throw new IllegalArgumentException(errMsg); + } + } else { throw new IllegalArgumentException( "FileHashStore.getRealPath - entity must be 'object' or 'metadata'" diff --git a/src/main/java/org/dataone/hashstore/filehashstore/FileHashStoreUtility.java b/src/main/java/org/dataone/hashstore/filehashstore/FileHashStoreUtility.java index 4ddee363..6b7ae759 100644 --- a/src/main/java/org/dataone/hashstore/filehashstore/FileHashStoreUtility.java +++ b/src/main/java/org/dataone/hashstore/filehashstore/FileHashStoreUtility.java @@ -1,11 +1,16 @@ package 
org.dataone.hashstore.filehashstore; +import java.io.File; import java.io.IOException; import java.io.InputStream; +import java.nio.charset.StandardCharsets; import java.nio.file.Files; import java.nio.file.Path; import java.security.MessageDigest; import java.security.NoSuchAlgorithmException; +import java.util.ArrayList; +import java.util.List; +import java.util.Random; import java.util.stream.Stream; import javax.xml.bind.DatatypeConverter; @@ -67,6 +72,29 @@ public static String calculateHexDigest(InputStream dataStream, String algorithm return DatatypeConverter.printHexBinary(mdObject.digest()).toLowerCase(); } + /** + * Given a string and supported algorithm returns the hex digest + * + * @param pid authority based identifier or persistent identifier + * @param algorithm string value (ex. SHA-256) + * @return Hex digest of the given string in lower-case + * @throws IllegalArgumentException String or algorithm cannot be null or empty + * @throws NoSuchAlgorithmException Algorithm not supported + */ + public static String getPidHexDigest(String pid, String algorithm) + throws NoSuchAlgorithmException, IllegalArgumentException { + FileHashStoreUtility.ensureNotNull(pid, "pid", "getPidHexDigest"); + FileHashStoreUtility.checkForEmptyString(pid, "pid", "getPidHexDigest"); + FileHashStoreUtility.ensureNotNull(algorithm, "algorithm", "getPidHexDigest"); + FileHashStoreUtility.checkForEmptyString(algorithm, "algorithm", "getPidHexDigest"); + + MessageDigest stringMessageDigest = MessageDigest.getInstance(algorithm); + byte[] bytes = pid.getBytes(StandardCharsets.UTF_8); + stringMessageDigest.update(bytes); + // stringDigest + return DatatypeConverter.printHexBinary(stringMessageDigest.digest()).toLowerCase(); + } + /** * Checks whether a directory is empty or contains files. If a file is found, it returns true. 
* @@ -122,4 +150,57 @@ public static void checkNotNegativeOrZero(long longInt, String method) } } + /** + * Generates a hierarchical path by dividing a given digest into tokens of fixed width, and + * concatenating them with '/' as the delimiter. + * + * @param depth integer to represent number of directories + * @param width width of each directory + * @param digest value to shard + * @return String + */ + public static String getHierarchicalPathString(int depth, int width, String digest) { + List<String> tokens = new ArrayList<>(); + int digestLength = digest.length(); + for (int i = 0; i < depth; i++) { + int start = i * width; + int end = Math.min((i + 1) * width, digestLength); + tokens.add(digest.substring(start, end)); + } + + if (depth * width < digestLength) { + tokens.add(digest.substring(depth * width)); + } + + List<String> stringArray = new ArrayList<>(); + for (String str : tokens) { + if (!str.trim().isEmpty()) { + stringArray.add(str); + } + } + // stringShard + return String.join("/", stringArray); + } + + /** + * Creates an empty/temporary file in a given location. If this file is not moved, it will + * be deleted upon JVM gracefully exiting or shutting down.
+ * + * @param prefix string to prepend before tmp file + * @param directory location to create tmp file + * @return Temporary file ready to write into + * @throws IOException Issues with generating tmpFile + * @throws SecurityException Insufficient permissions to create tmpFile + */ + public static File generateTmpFile(String prefix, Path directory) throws IOException, + SecurityException { + Random rand = new Random(); + int randomNumber = rand.nextInt(1000000); + String newPrefix = prefix + "-" + System.currentTimeMillis() + randomNumber; + + Path newPath = Files.createTempFile(directory, newPrefix, null); + File newFile = newPath.toFile(); + newFile.deleteOnExit(); + return newFile; + } } diff --git a/src/test/java/org/dataone/hashstore/HashStoreClientTest.java b/src/test/java/org/dataone/hashstore/HashStoreClientTest.java index 443fda3e..40bd443a 100644 --- a/src/test/java/org/dataone/hashstore/HashStoreClientTest.java +++ b/src/test/java/org/dataone/hashstore/HashStoreClientTest.java @@ -168,7 +168,7 @@ public void client_storeObjects() throws Exception { HashStoreClient.main(args); // Confirm object was stored - Path absPath = getObjectAbsPath(testData.pidData.get(pid).get("object_cid"), "object"); + Path absPath = getObjectAbsPath(testData.pidData.get(pid).get("sha256"), "object"); assertTrue(Files.exists(absPath)); // Put things back @@ -387,7 +387,7 @@ public void client_deleteMetadata() throws Exception { } /** - * Test hashStore client returns hex digest of object. + * Test hashStore client calculates the hex digest of object. 
*/ @Test public void client_getHexDigest() throws Exception { @@ -427,4 +427,42 @@ public void client_getHexDigest() throws Exception { assertEquals(testDataChecksum, pidStdOut.trim()); } } + + /** + * Test hashStore client returns the content identifier (cid) of an object + */ + @Test + public void client_findObject() throws Exception { + for (String pid : testData.pidList) { + // Redirect stdout to capture output + ByteArrayOutputStream outputStream = new ByteArrayOutputStream(); + PrintStream ps = new PrintStream(outputStream); + PrintStream old = System.out; + System.setOut(ps); + + String pidFormatted = pid.replace("/", "_"); + Path testDataFile = testData.getTestFile(pidFormatted); + InputStream dataStream = Files.newInputStream(testDataFile); + hashStore.storeObject(dataStream, pid, null, null, null, -1); + + // Call client + String optFindObject = "-findobject"; + String optStore = "-store"; + String optStorePath = hsProperties.getProperty("storePath"); + String optPid = "-pid"; + String optPidValue = pid; + String[] args = {optFindObject, optStore, optStorePath, optPid, optPidValue}; + HashStoreClient.main(args); + + String contentIdentifier = testData.pidData.get(pid).get("sha256"); + + // Put things back + System.out.flush(); + System.setOut(old); + + // Confirm correct content identifier has been saved + String pidStdOut = outputStream.toString(); + assertEquals(contentIdentifier, pidStdOut.trim()); + } + } } diff --git a/src/test/java/org/dataone/hashstore/HashStoreTest.java b/src/test/java/org/dataone/hashstore/HashStoreTest.java index 0507987e..1f34c70a 100644 --- a/src/test/java/org/dataone/hashstore/HashStoreTest.java +++ b/src/test/java/org/dataone/hashstore/HashStoreTest.java @@ -127,11 +127,11 @@ public void hashStore_storeObjects() throws Exception { Path testDataFile = testData.getTestFile(pidFormatted); InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = hashStore.storeObject(dataStream, pid, null, null, null, 
-1); + ObjectMetadata objInfo = hashStore.storeObject(dataStream, pid, null, null, null, -1); // Check id (sha-256 hex digest of the ab_id, aka object_cid) - String objAuthorityId = testData.pidData.get(pid).get("object_cid"); - assertEquals(objAuthorityId, objInfo.getId()); + String objContentId = testData.pidData.get(pid).get("sha256"); + assertEquals(objContentId, objInfo.getCid()); } } } diff --git a/src/test/java/org/dataone/hashstore/ObjectInfoTest.java b/src/test/java/org/dataone/hashstore/ObjectMetadataTest.java similarity index 68% rename from src/test/java/org/dataone/hashstore/ObjectInfoTest.java rename to src/test/java/org/dataone/hashstore/ObjectMetadataTest.java index f3f44c88..8d97a0f3 100644 --- a/src/test/java/org/dataone/hashstore/ObjectInfoTest.java +++ b/src/test/java/org/dataone/hashstore/ObjectMetadataTest.java @@ -10,15 +10,15 @@ import org.junit.jupiter.api.BeforeEach; /** - * Test class for ObjectInfo + * Test class for ObjectMetadata */ -public class ObjectInfoTest { +public class ObjectMetadataTest { private static String id = ""; private static long size; private static Map hexDigests; /** - * Initialize ObjectInfo variables for test efficiency purposes + * Initialize ObjectMetadata variables for test efficiency purposes */ @BeforeEach public void initializeInstanceVariables() { @@ -41,40 +41,40 @@ public void initializeInstanceVariables() { } /** - * Check ObjectInfo constructor + * Check ObjectMetadata constructor */ @Test - public void testObjectInfo() { - ObjectInfo objInfo = new ObjectInfo(id, size, hexDigests); + public void testObjectMetadata() { + ObjectMetadata objInfo = new ObjectMetadata(id, size, hexDigests); assertNotNull(objInfo); } /** - * Check ObjectInfo get id + * Check ObjectMetadata get id */ @Test - public void testObjectInfoGetId() { - ObjectInfo objInfo = new ObjectInfo(id, size, hexDigests); - String objId = objInfo.getId(); + public void testObjectMetadataGetId() { + ObjectMetadata objInfo = new 
ObjectMetadata(id, size, hexDigests); + String objId = objInfo.getCid(); assertEquals(objId, id); } /** - * Check ObjectInfo get size + * Check ObjectMetadata get size */ @Test public void testHashAddressGetSize() { - ObjectInfo objInfo = new ObjectInfo(id, size, hexDigests); + ObjectMetadata objInfo = new ObjectMetadata(id, size, hexDigests); long objSize = objInfo.getSize(); assertEquals(objSize, size); } /** - * Check ObjectInfo get hexDigests + * Check ObjectMetadata get hexDigests */ @Test - public void testObjectInfoGetHexDigests() { - ObjectInfo objInfo = new ObjectInfo(id, size, hexDigests); + public void testObjectMetadataGetHexDigests() { + ObjectMetadata objInfo = new ObjectMetadata(id, size, hexDigests); Map objInfoMap = objInfo.getHexDigests(); assertEquals(objInfoMap, hexDigests); } diff --git a/src/test/java/org/dataone/hashstore/filehashstore/FileHashStoreInterfaceTest.java b/src/test/java/org/dataone/hashstore/filehashstore/FileHashStoreInterfaceTest.java index e6f7fbfc..bdd6d2d2 100644 --- a/src/test/java/org/dataone/hashstore/filehashstore/FileHashStoreInterfaceTest.java +++ b/src/test/java/org/dataone/hashstore/filehashstore/FileHashStoreInterfaceTest.java @@ -29,8 +29,9 @@ import javax.xml.bind.DatatypeConverter; -import org.dataone.hashstore.ObjectInfo; -import org.dataone.hashstore.exceptions.PidObjectExistsException; +import org.dataone.hashstore.ObjectMetadata; +import org.dataone.hashstore.exceptions.OrphanPidRefsFileException; +import org.dataone.hashstore.exceptions.PidNotFoundInCidRefsFileException; import org.dataone.hashstore.testdata.TestDataHarness; import org.junit.jupiter.api.BeforeEach; import org.junit.jupiter.api.Test; @@ -38,11 +39,14 @@ /** - * Test class for FileHashStore HashStoreInterface override methods + * Test class for FileHashStore HashStore Interface methods. 
+ * + * Note: `tagObject` & `verifyObject` tests can be found in the `FileHashStoreReferences` class */ public class FileHashStoreInterfaceTest { private FileHashStore fileHashStore; private Properties fhsProperties; + private Path rootDirectory; private static final TestDataHarness testData = new TestDataHarness(); /** @@ -50,7 +54,7 @@ public class FileHashStoreInterfaceTest { */ @BeforeEach public void initializeFileHashStore() { - Path rootDirectory = tempFolder.resolve("metacat"); + rootDirectory = tempFolder.resolve("metacat"); Properties storeProperties = new Properties(); storeProperties.setProperty("storePath", rootDirectory.toString()); @@ -81,23 +85,7 @@ public void initializeFileHashStore() { public Path tempFolder; /** - * Utility method to get absolute path of a given object - */ - public Path getObjectAbsPath(String id) { - int shardDepth = Integer.parseInt(fhsProperties.getProperty("storeDepth")); - int shardWidth = Integer.parseInt(fhsProperties.getProperty("storeWidth")); - // Get relative path - String objCidShardString = fileHashStore.getHierarchicalPathString( - shardDepth, shardWidth, id - ); - // Get absolute path - Path storePath = Paths.get(fhsProperties.getProperty("storePath")); - - return storePath.resolve("objects/" + objCidShardString); - } - - /** - * Check that store object returns the correct ObjectInfo id + * Check that store object returns the correct ObjectMetadata id */ @Test public void storeObject() throws Exception { @@ -106,16 +94,18 @@ public void storeObject() throws Exception { Path testDataFile = testData.getTestFile(pidFormatted); InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.storeObject(dataStream, pid, null, null, null, -1); + ObjectMetadata objInfo = fileHashStore.storeObject( + dataStream, pid, null, null, null, -1 + ); - // Check id (sha-256 hex digest of the ab_id (pid)) - String objectCid = testData.pidData.get(pid).get("object_cid"); - assertEquals(objectCid, 
objInfo.getId()); + // Check id (content identifier based on the store algorithm) + String objectCid = testData.pidData.get(pid).get("sha256"); + assertEquals(objectCid, objInfo.getCid()); } } /** - * Check that store object returns the correct ObjectInfo size + * Check that store object returns the correct ObjectMetadata size */ @Test public void storeObject_objSize() throws Exception { @@ -124,7 +114,9 @@ public void storeObject_objSize() throws Exception { Path testDataFile = testData.getTestFile(pidFormatted); InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.storeObject(dataStream, pid, null, null, null, -1); + ObjectMetadata objInfo = fileHashStore.storeObject( + dataStream, pid, null, null, null, -1 + ); // Check the object size long objectSize = Long.parseLong(testData.pidData.get(pid).get("size")); @@ -133,7 +125,7 @@ public void storeObject_objSize() throws Exception { } /** - * Check that store object returns the correct ObjectInfo hex digests + * Check that store object returns the correct ObjectMetadata hex digests */ @Test public void storeObject_hexDigests() throws Exception { @@ -142,7 +134,9 @@ public void storeObject_hexDigests() throws Exception { Path testDataFile = testData.getTestFile(pidFormatted); InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.storeObject(dataStream, pid, null, null, null, -1); + ObjectMetadata objInfo = fileHashStore.storeObject( + dataStream, pid, null, null, null, -1 + ); Map hexDigests = objInfo.getHexDigests(); @@ -220,37 +214,41 @@ public void storeObject_zeroObjSize() { } /** - * Verify that storeObject generates an additional checksum with overloaded method + * Verify that storeObject stores and validates a given checksum and its expected size + * with overloaded method */ @Test - public void storeObject_additionalAlgorithm_overload() throws Exception { + public void storeObject_overloadChecksumCsAlgoAndSize() throws 
Exception { for (String pid : testData.pidList) { String pidFormatted = pid.replace("/", "_"); Path testDataFile = testData.getTestFile(pidFormatted); + String md2 = testData.pidData.get(pid).get("md2"); + long objectSize = Long.parseLong(testData.pidData.get(pid).get("size")); InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.storeObject(dataStream, pid, "MD2"); + ObjectMetadata objInfo = fileHashStore.storeObject( + dataStream, pid, md2, "MD2", objectSize + ); Map hexDigests = objInfo.getHexDigests(); // Validate checksum values - String md2 = testData.pidData.get(pid).get("md2"); assertEquals(md2, hexDigests.get("MD2")); } } /** - * Verify that storeObject validates checksum with overloaded method + * Verify that storeObject stores and validates a given checksum with overloaded method */ @Test - public void storeObject_validateChecksum_overload() throws Exception { + public void storeObject_overloadChecksumAndChecksumAlgo() throws Exception { for (String pid : testData.pidList) { String pidFormatted = pid.replace("/", "_"); Path testDataFile = testData.getTestFile(pidFormatted); String md2 = testData.pidData.get(pid).get("md2"); InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.storeObject(dataStream, pid, md2, "MD2"); + ObjectMetadata objInfo = fileHashStore.storeObject(dataStream, pid, md2, "MD2"); Map hexDigests = objInfo.getHexDigests(); @@ -260,24 +258,72 @@ public void storeObject_validateChecksum_overload() throws Exception { } /** - * Check that store object returns the correct ObjectInfo size with overloaded method + * Check that store object returns the correct ObjectMetadata size with overloaded method */ @Test - public void storeObject_objSize_overload() throws Exception { + public void storeObject_overloadObjSize() throws Exception { for (String pid : testData.pidList) { String pidFormatted = pid.replace("/", "_"); Path testDataFile = 
testData.getTestFile(pidFormatted); long objectSize = Long.parseLong(testData.pidData.get(pid).get("size")); InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.storeObject(dataStream, pid, objectSize); + ObjectMetadata objInfo = fileHashStore.storeObject(dataStream, pid, objectSize); assertEquals(objectSize, objInfo.getSize()); } } /** - * Verify that storeObject stores an object with a good checksum value + * Check that store object executes as expected with only an InputStream (does not create + * any reference files) + */ + @Test + public void storeObject_overloadInputStreamOnly() throws Exception { + for (String pid : testData.pidList) { + String pidFormatted = pid.replace("/", "_"); + Path testDataFile = testData.getTestFile(pidFormatted); + + InputStream dataStream = Files.newInputStream(testDataFile); + ObjectMetadata objInfo = fileHashStore.storeObject(dataStream); + + Map hexDigests = objInfo.getHexDigests(); + String defaultStoreAlgorithm = fhsProperties.getProperty("storeAlgorithm"); + String cid = objInfo.getCid(); + + assertEquals(hexDigests.get(defaultStoreAlgorithm), cid); + + assertThrows(FileNotFoundException.class, () -> { + fileHashStore.findObject(pid); + }); + + Path cidRefsFilePath = fileHashStore.getRealPath(cid, "refs", "cid"); + assertFalse(Files.exists(cidRefsFilePath)); + } + } + + /** + * Verify that storeObject generates an additional checksum with overloaded method + */ + @Test + public void storeObject_overloadAdditionalAlgo() throws Exception { + for (String pid : testData.pidList) { + String pidFormatted = pid.replace("/", "_"); + Path testDataFile = testData.getTestFile(pidFormatted); + + InputStream dataStream = Files.newInputStream(testDataFile); + ObjectMetadata objInfo = fileHashStore.storeObject(dataStream, pid, "MD2"); + + Map hexDigests = objInfo.getHexDigests(); + + // Validate checksum values + String md2 = testData.pidData.get(pid).get("md2"); + assertEquals(md2, 
hexDigests.get("MD2")); + } + } + + /** + * Verify that storeObject returns the expected checksum value */ @Test public void storeObject_validateChecksumValue() throws Exception { @@ -288,12 +334,9 @@ public void storeObject_validateChecksumValue() throws Exception { String checksumCorrect = "94f9b6c88f1f458e410c30c351c6384ea42ac1b5ee1f8430d3e365e43b78a38a"; InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo address = fileHashStore.storeObject( - dataStream, pid, null, checksumCorrect, "SHA-256", -1 - ); + fileHashStore.storeObject(dataStream, pid, null, checksumCorrect, "SHA-256", -1); - String objCid = address.getId(); - Path objCidAbsPath = getObjectAbsPath(objCid); + Path objCidAbsPath = fileHashStore.getRealPath(pid, "object", null); assertTrue(Files.exists(objCidAbsPath)); } @@ -376,7 +419,7 @@ public void storeObject_objSizeCorrect() throws Exception { long objectSize = Long.parseLong(testData.pidData.get(pid).get("size")); InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.storeObject( + ObjectMetadata objInfo = fileHashStore.storeObject( dataStream, pid, null, null, null, objectSize ); @@ -396,7 +439,7 @@ public void storeObject_objSizeIncorrect() { Path testDataFile = testData.getTestFile(pidFormatted); InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.storeObject( + ObjectMetadata objInfo = fileHashStore.storeObject( dataStream, pid, null, null, null, 1000 ); @@ -423,28 +466,36 @@ public void storeObject_invalidAlgorithm() { } /** - * Check that store object throws FileAlreadyExists error when storing duplicate object + * Check that store object tags cid refs file as expected when called + * to store a duplicate object (two pids that reference the same cid) */ @Test - public void storeObject_duplicate() { + public void storeObject_duplicate() throws Exception { for (String pid : testData.pidList) { - 
assertThrows(PidObjectExistsException.class, () -> { - String pidFormatted = pid.replace("/", "_"); - Path testDataFile = testData.getTestFile(pidFormatted); + String pidFormatted = pid.replace("/", "_"); + Path testDataFile = testData.getTestFile(pidFormatted); - InputStream dataStream = Files.newInputStream(testDataFile); - fileHashStore.storeObject(dataStream, pid, null, null, null, -1); + InputStream dataStream = Files.newInputStream(testDataFile); + fileHashStore.storeObject(dataStream, pid, null, null, null, -1); - InputStream dataStreamDup = Files.newInputStream(testDataFile); - fileHashStore.storeObject(dataStreamDup, pid, null, null, null, -1); - }); + String pidTwo = pid + ".test"; + InputStream dataStreamDup = Files.newInputStream(testDataFile); + ObjectMetadata objInfo = fileHashStore.storeObject( + dataStreamDup, pidTwo, null, null, null, -1 + ); + + String cid = objInfo.getCid(); + Path absCidRefsPath = fileHashStore.getRealPath(cid, "refs", "cid"); + assertTrue(fileHashStore.isPidInCidRefsFile(pid, absCidRefsPath)); + assertTrue(fileHashStore.isPidInCidRefsFile(pidTwo, absCidRefsPath)); } } /** * Test that storeObject successfully stores a 1GB file * - * Note, a 4GB successfully stored in approximately 1m30s + * Note 1: a 4GB successfully stored in approximately 1m30s + * Note 2: Successfully stores 250GB file confirmed from knbvm */ @Test public void storeObject_largeSparseFile() throws Exception { @@ -467,12 +518,9 @@ public void storeObject_largeSparseFile() throws Exception { InputStream dataStream = Files.newInputStream(testFilePath); String pid = "dou.sparsefile.1"; - ObjectInfo sparseFileObjInfo = fileHashStore.storeObject( - dataStream, pid, null, null, null, -1 - ); + fileHashStore.storeObject(dataStream, pid, null, null, null, -1); - String objCid = sparseFileObjInfo.getId(); - Path objCidAbsPath = getObjectAbsPath(objCid); + Path objCidAbsPath = fileHashStore.getRealPath(pid, "object", null); assertTrue(Files.exists(objCidAbsPath)); } @@ 
-505,7 +553,7 @@ public void storeObject_interruptProcess() throws Exception { InputStream dataStream = Files.newInputStream(testFilePath); String pid = "dou.sparsefile.1"; fileHashStore.storeObject(dataStream, pid, null, null, null, -1); - } catch (IOException | NoSuchAlgorithmException ioe) { + } catch (IOException | NoSuchAlgorithmException | InterruptedException ioe) { ioe.printStackTrace(); } }); @@ -527,14 +575,9 @@ public void storeObject_interruptProcess() throws Exception { * will encounter an `ExecutionException`. The thread that does not encounter an exception will * store the given object, and verifies that the object is stored successfully. * - * The threads that run into exceptions will encounter a `RunTimeException` or a - * `PidObjectExistsException`. If a call is made to 'storeObject' for a pid that is already in - * progress of being stored, a `RunTimeException` will be thrown. - * - * If a call is made to 'storeObject' for a pid that has been stored, the thread will encounter - * a `PidObjectExistsException` - since `putObject` checks for the existence of a given data - * object before it attempts to generate a temp file (write to it, generate checksums, etc.). - * + * The threads that run into exceptions will encounter a `RunTimeException` since the expected + * object to store is already in progress (thrown by `syncPutObject` which coordinates + * `store_object` requests with a pid). 
*/ @Test public void storeObject_objectLockedIds_FiveThreads() throws Exception { @@ -543,92 +586,106 @@ public void storeObject_objectLockedIds_FiveThreads() throws Exception { Path testDataFile = testData.getTestFile(pid); // Create a thread pool with 3 threads - ExecutorService executorService = Executors.newFixedThreadPool(3); + ExecutorService executorService = Executors.newFixedThreadPool(5); - // Submit 3 threads, each calling storeObject + // Submit 5 futures to the thread pool, each calling storeObject Future future1 = executorService.submit(() -> { try { InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.storeObject( - dataStream, pid, null, null, null, 0 + ObjectMetadata objInfo = fileHashStore.storeObject( + dataStream, pid, null, null, null, -1 ); if (objInfo != null) { - String objId = objInfo.getId(); - Path objCidAbsPath = getObjectAbsPath(objId); + String cid = objInfo.getCid(); + Path objCidAbsPath = fileHashStore.getRealPath(pid, "object", null); + Path pidRefsPath = fileHashStore.getRealPath(pid, "refs", "pid"); + Path cidRefsPath = fileHashStore.getRealPath(cid, "refs", "cid"); assertTrue(Files.exists(objCidAbsPath)); + assertTrue(Files.exists(pidRefsPath)); + assertTrue(Files.exists(cidRefsPath)); } } catch (Exception e) { + System.out.println("Start Thread 1 Exception:"); System.out.println(e.getClass()); e.printStackTrace(); - assertTrue(e instanceof RuntimeException || e instanceof PidObjectExistsException); + System.out.println("End Thread 1 Exception\n"); + assertTrue(e instanceof RuntimeException); } }); Future future2 = executorService.submit(() -> { try { InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.storeObject( - dataStream, pid, null, null, null, 0 + ObjectMetadata objInfo = fileHashStore.storeObject( + dataStream, pid, null, null, null, -1 ); if (objInfo != null) { - String objId = objInfo.getId(); - Path objCidAbsPath = 
getObjectAbsPath(objId); + String cid = objInfo.getCid(); + Path objCidAbsPath = fileHashStore.getRealPath(pid, "object", null); + Path pidRefsPath = fileHashStore.getRealPath(pid, "refs", "pid"); + Path cidRefsPath = fileHashStore.getRealPath(cid, "refs", "cid"); assertTrue(Files.exists(objCidAbsPath)); + assertTrue(Files.exists(pidRefsPath)); + assertTrue(Files.exists(cidRefsPath)); } } catch (Exception e) { - System.out.println(e.getClass()); - e.printStackTrace(); - assertTrue(e instanceof RuntimeException || e instanceof PidObjectExistsException); + assertTrue(e instanceof RuntimeException); } }); Future future3 = executorService.submit(() -> { try { InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.storeObject( - dataStream, pid, null, null, null, 0 + ObjectMetadata objInfo = fileHashStore.storeObject( + dataStream, pid, null, null, null, -1 ); if (objInfo != null) { - String objId = objInfo.getId(); - Path objCidAbsPath = getObjectAbsPath(objId); + String cid = objInfo.getCid(); + Path objCidAbsPath = fileHashStore.getRealPath(pid, "object", null); + Path pidRefsPath = fileHashStore.getRealPath(pid, "refs", "pid"); + Path cidRefsPath = fileHashStore.getRealPath(cid, "refs", "cid"); assertTrue(Files.exists(objCidAbsPath)); + assertTrue(Files.exists(pidRefsPath)); + assertTrue(Files.exists(cidRefsPath)); } } catch (Exception e) { - System.out.println(e.getClass()); - e.printStackTrace(); - assertTrue(e instanceof RuntimeException || e instanceof PidObjectExistsException); + assertTrue(e instanceof RuntimeException); } }); Future future4 = executorService.submit(() -> { try { InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.storeObject( - dataStream, pid, null, null, null, 0 + ObjectMetadata objInfo = fileHashStore.storeObject( + dataStream, pid, null, null, null, -1 ); if (objInfo != null) { - String objId = objInfo.getId(); - Path objCidAbsPath = 
getObjectAbsPath(objId); + String cid = objInfo.getCid(); + Path objCidAbsPath = fileHashStore.getRealPath(pid, "object", null); + Path pidRefsPath = fileHashStore.getRealPath(pid, "refs", "pid"); + Path cidRefsPath = fileHashStore.getRealPath(cid, "refs", "cid"); assertTrue(Files.exists(objCidAbsPath)); + assertTrue(Files.exists(pidRefsPath)); + assertTrue(Files.exists(cidRefsPath)); } } catch (Exception e) { - System.out.println(e.getClass()); - e.printStackTrace(); - assertTrue(e instanceof RuntimeException || e instanceof PidObjectExistsException); + assertTrue(e instanceof RuntimeException); } }); Future future5 = executorService.submit(() -> { try { InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.storeObject( - dataStream, pid, null, null, null, 0 + ObjectMetadata objInfo = fileHashStore.storeObject( + dataStream, pid, null, null, null, -1 ); if (objInfo != null) { - String objId = objInfo.getId(); - Path objCidAbsPath = getObjectAbsPath(objId); + String cid = objInfo.getCid(); + Path objCidAbsPath = fileHashStore.getRealPath(pid, "object", null); + Path pidRefsPath = fileHashStore.getRealPath(pid, "refs", "pid"); + Path cidRefsPath = fileHashStore.getRealPath(cid, "refs", "cid"); assertTrue(Files.exists(objCidAbsPath)); + assertTrue(Files.exists(pidRefsPath)); + assertTrue(Files.exists(cidRefsPath)); } } catch (Exception e) { - System.out.println(e.getClass()); - e.printStackTrace(); - assertTrue(e instanceof RuntimeException || e instanceof PidObjectExistsException); + assertTrue(e instanceof RuntimeException); } }); @@ -643,65 +700,6 @@ public void storeObject_objectLockedIds_FiveThreads() throws Exception { executorService.awaitTermination(1, TimeUnit.MINUTES); } - /** - * Tests that the `storeObject` method can store an object successfully with two threads. This - * test uses two futures (threads) that run concurrently, one of which will encounter an - * `ExecutionException`. 
The thread that does not encounter an exception will store the given - * object, and verifies that the object is stored successfully. - */ - @Test - public void storeObject_objectLockedIds_TwoThreads() throws Exception { - // Get single test file to "upload" - String pid = "jtao.1700.1"; - Path testDataFile = testData.getTestFile(pid); - - // Create a thread pool with 3 threads - ExecutorService executorService = Executors.newFixedThreadPool(3); - - // Submit 3 threads, each calling storeObject - Future future1 = executorService.submit(() -> { - try { - InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.storeObject( - dataStream, pid, null, null, null, 0 - ); - if (objInfo != null) { - String objId = objInfo.getId(); - Path objCidAbsPath = getObjectAbsPath(objId); - assertTrue(Files.exists(objCidAbsPath)); - } - } catch (Exception e) { - System.out.println(e.getClass()); - e.printStackTrace(); - assertTrue(e instanceof RuntimeException || e instanceof PidObjectExistsException); - } - }); - Future future2 = executorService.submit(() -> { - try { - InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.storeObject( - dataStream, pid, null, null, null, 0 - ); - if (objInfo != null) { - String objId = objInfo.getId(); - Path objCidAbsPath = getObjectAbsPath(objId); - assertTrue(Files.exists(objCidAbsPath)); - } - } catch (Exception e) { - System.out.println(e.getClass()); - e.printStackTrace(); - assertTrue(e instanceof RuntimeException || e instanceof PidObjectExistsException); - } - }); - - // Wait for all tasks to complete and check results - // .get() on the future ensures that all tasks complete before the test ends - future1.get(); - future2.get(); - executorService.shutdown(); - executorService.awaitTermination(1, TimeUnit.MINUTES); - } - /** * Test storeMetadata stores metadata as expected */ @@ -717,7 +715,7 @@ public void storeMetadata() throws Exception { String 
metadataCid = fileHashStore.storeMetadata(metadataStream, pid, null); // Get relative path - String metadataCidShardString = fileHashStore.getHierarchicalPathString( + String metadataCidShardString = FileHashStoreUtility.getHierarchicalPathString( 3, 2, metadataCid ); // Get absolute path @@ -747,7 +745,7 @@ public void storeMetadata_defaultFormatId_overload() throws Exception { String metadataCid = fileHashStore.storeMetadata(metadataStream, pid); // Get relative path - String metadataCidShardString = fileHashStore.getHierarchicalPathString( + String metadataCidShardString = FileHashStoreUtility.getHierarchicalPathString( 3, 2, metadataCid ); // Get absolute path @@ -777,7 +775,7 @@ public void storeMetadata_fileSize() throws Exception { String metadataCid = fileHashStore.storeMetadata(metadataStream, pid, null); // Get relative path - String metadataCidShardString = fileHashStore.getHierarchicalPathString( + String metadataCidShardString = FileHashStoreUtility.getHierarchicalPathString( 3, 2, metadataCid ); // Get absolute path @@ -861,9 +859,10 @@ public void storeMetadata_pidEmptySpaces() { /** * Tests that the `storeMetadata()` method can store metadata successfully with multiple threads - * (3). This test uses three futures (threads) that run concurrently, each of which will have to - * wait for the given `pid` to be released from metadataLockedIds before proceeding to store the - * given metadata content from its `storeMetadata()` request. + * (3) and does not throw any exceptions. This test uses three futures (threads) that run + * concurrently, each of which will have to wait for the given `pid` to be released from + * metadataLockedIds before proceeding to store the given metadata content from its + * `storeMetadata()` request. * * All requests to store the same metadata will be executed, and the existing metadata file will * be overwritten by each thread. No exceptions should be encountered during these tests. 
@@ -950,17 +949,28 @@ public void retrieveObject() throws Exception { // Retrieve object InputStream objectCidInputStream = fileHashStore.retrieveObject(pid); assertNotNull(objectCidInputStream); + objectCidInputStream.close(); } } + /** + * Check that retrieveObject throws exception when there is no object + * associated with a given pid + */ + @Test + public void retrieveObject_pidDoesNotExist() { + assertThrows(FileNotFoundException.class, () -> { + fileHashStore.retrieveObject("pid.whose.object.does.not.exist"); + }); + } + /** * Check that retrieveObject throws exception when pid is null */ @Test public void retrieveObject_pidNull() { assertThrows(IllegalArgumentException.class, () -> { - InputStream pidInputStream = fileHashStore.retrieveObject(null); - pidInputStream.close(); + fileHashStore.retrieveObject(null); }); } @@ -970,8 +980,7 @@ public void retrieveObject_pidNull() { @Test public void retrieveObject_pidEmpty() { assertThrows(IllegalArgumentException.class, () -> { - InputStream pidInputStream = fileHashStore.retrieveObject(""); - pidInputStream.close(); + fileHashStore.retrieveObject(""); }); } @@ -981,8 +990,7 @@ public void retrieveObject_pidEmpty() { @Test public void retrieveObject_pidEmptySpaces() { assertThrows(IllegalArgumentException.class, () -> { - InputStream pidInputStream = fileHashStore.retrieveObject(" "); - pidInputStream.close(); + fileHashStore.retrieveObject(" "); }); } @@ -992,8 +1000,7 @@ public void retrieveObject_pidEmptySpaces() { @Test public void retrieveObject_pidNotFound() { assertThrows(FileNotFoundException.class, () -> { - InputStream pidInputStream = fileHashStore.retrieveObject("dou.2023.hs.1"); - pidInputStream.close(); + fileHashStore.retrieveObject("dou.2023.hs.1"); }); } @@ -1033,15 +1040,15 @@ public void retrieveObject_verifyContent() throws Exception { ioe.printStackTrace(); throw ioe; + } finally { + // Close stream + objectCidInputStream.close(); } // Get hex digest String sha256Digest = 
DatatypeConverter.printHexBinary(sha256.digest()).toLowerCase(); String sha256DigestFromTestData = testData.pidData.get(pid).get("sha256"); assertEquals(sha256Digest, sha256DigestFromTestData); - - // Close stream - objectCidInputStream.close(); } } @@ -1208,6 +1215,9 @@ public void retrieveMetadata_verifyContent() throws Exception { ioe.printStackTrace(); throw ioe; + } finally { + // Close stream + metadataCidInputStream.close(); } // Get hex digest @@ -1217,29 +1227,25 @@ public void retrieveMetadata_verifyContent() throws Exception { "metadata_sha256" ); assertEquals(sha256MetadataDigest, sha256MetadataDigestFromTestData); - - // Close stream - metadataCidInputStream.close(); } } /** - * Confirm that deleteObject deletes object and empty subdirectories + * Confirm that deleteObject deletes object */ @Test - public void deleteObject() throws Exception { + public void deleteObject_objectDeleted() throws Exception { for (String pid : testData.pidList) { String pidFormatted = pid.replace("/", "_"); Path testDataFile = testData.getTestFile(pidFormatted); InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.storeObject(dataStream, pid, null, null, null, -1); + fileHashStore.storeObject(dataStream, pid, null, null, null, -1); + Path objCidAbsPath = fileHashStore.getRealPath(pid, "object", null); fileHashStore.deleteObject(pid); // Check that file doesn't exist - String objId = objInfo.getId(); - Path objCidAbsPath = getObjectAbsPath(objId); assertFalse(Files.exists(objCidAbsPath)); // Check that parent directories are not deleted @@ -1252,6 +1258,100 @@ public void deleteObject() throws Exception { } } + /** + * Confirm that deleteObject deletes reference files + */ + @Test + public void deleteObject_referencesDeleted() throws Exception { + for (String pid : testData.pidList) { + String pidFormatted = pid.replace("/", "_"); + Path testDataFile = testData.getTestFile(pidFormatted); + + InputStream dataStream = 
Files.newInputStream(testDataFile); + ObjectMetadata objInfo = fileHashStore.storeObject( + dataStream, pid, null, null, null, -1 + ); + String cid = objInfo.getCid(); + + // Path objAbsPath = fileHashStore.getRealPath(pid, "object", null); + Path absPathPidRefsPath = fileHashStore.getRealPath(pid, "refs", "pid"); + Path absPathCidRefsPath = fileHashStore.getRealPath(cid, "refs", "cid"); + fileHashStore.deleteObject(pid); + assertFalse(Files.exists(absPathPidRefsPath)); + assertFalse(Files.exists(absPathCidRefsPath)); + } + } + + /** + * Confirm that cid refs file and object do not get deleted when an object has more than one + * reference (when the client calls 'deleteObject' on a pid that references an object that still + * has references). + */ + @Test + public void deleteObject_objectExistsIfCidRefencesFileNotEmpty() throws Exception { + for (String pid : testData.pidList) { + String pidFormatted = pid.replace("/", "_"); + Path testDataFile = testData.getTestFile(pidFormatted); + + InputStream dataStream = Files.newInputStream(testDataFile); + ObjectMetadata objInfo = fileHashStore.storeObject( + dataStream, pid, null, null, null, -1 + ); + String pidExtra = "dou.test" + pid; + String cid = objInfo.getCid(); + fileHashStore.tagObject(pidExtra, cid); + + Path objCidAbsPath = fileHashStore.getRealPath(pid, "object", null); + Path absPathPidRefsPath = fileHashStore.getRealPath(pid, "refs", "pid"); + Path absPathCidRefsPath = fileHashStore.getRealPath(cid, "refs", "cid"); + fileHashStore.deleteObject(pid); + + assertFalse(Files.exists(absPathPidRefsPath)); + assertTrue(Files.exists(objCidAbsPath)); + assertTrue(Files.exists(absPathCidRefsPath)); + } + } + + /** + * Confirm that deleteObject removes an orphan pid reference file when the associated cid refs + * file does not contain the expected pid. 
+ * + * @throws Exception + */ + @Test + public void deleteObject_pidOrphan() throws Exception { + for (String pid : testData.pidList) { + String pidFormatted = pid.replace("/", "_"); + Path testDataFile = testData.getTestFile(pidFormatted); + + InputStream dataStream = Files.newInputStream(testDataFile); + ObjectMetadata objInfo = fileHashStore.storeObject( + dataStream, pid, null, null, null, -1 + ); + String cid = objInfo.getCid(); + String pidExtra = "dou.test" + pid; + Path objRealPath = fileHashStore.getRealPath(pid, "object", null); + + // Manually change the pid found in the cid refs file + Path absPathCidRefsPath = fileHashStore.getRealPath(cid, "refs", "cid"); + fileHashStore.updateCidRefsFiles(pidExtra, absPathCidRefsPath); + // Create an orphaned pid refs file + fileHashStore.deleteCidRefsPid(pid, absPathCidRefsPath); + + fileHashStore.deleteObject(pid); + + // Confirm cid refs file still exists + assertTrue(Files.exists(absPathCidRefsPath)); + // Confirm the original (and now orphaned) pid refs file is deleted + Path absPathPidRefsPath = fileHashStore.getRealPath(pid, "refs", "pid"); + assertFalse(Files.exists(absPathPidRefsPath)); + // Confirm the object has not been deleted + assertTrue(Files.exists(objRealPath)); + // Confirm the cid refs file still exists + assertTrue(Files.exists(absPathCidRefsPath)); + } + } + /** * Confirm that deleteObject throws exception when associated pid obj not found */ @@ -1286,6 +1386,87 @@ public void deleteObject_pidEmptySpaces() { assertThrows(IllegalArgumentException.class, () -> fileHashStore.deleteObject(" ")); } + /** + * Confirm that the deleteObject overload deletes the object for a given cid when the flag is true + */ + @Test + public void deleteObject_overloadCidDeleteTrue() throws Exception { + for (String pid : testData.pidList) { + String pidFormatted = pid.replace("/", "_"); + Path testDataFile = testData.getTestFile(pidFormatted); + + InputStream dataStream = 
Files.newInputStream(testDataFile); + ObjectMetadata
objInfo = fileHashStore.storeObject(dataStream); + String cid = objInfo.getCid(); + + // Set flag to true + fileHashStore.deleteObject(cid, true); + + // Get permanent address of the actual cid + int storeDepth = Integer.parseInt(fhsProperties.getProperty("storeDepth")); + int storeWidth = Integer.parseInt(fhsProperties.getProperty("storeWidth")); + String actualCid = objInfo.getCid(); + String cidShardString = FileHashStoreUtility.getHierarchicalPathString( + storeDepth, storeWidth, actualCid + ); + Path objectStoreDirectory = rootDirectory.resolve("objects").resolve(cidShardString); + assertFalse(Files.exists(objectStoreDirectory)); + } + } + + /** + * Confirm that the deleteObject overload does not delete an object when the flag is true + * but a cid refs file still exists + */ + @Test + public void deleteObject_overloadCidDeleteTrueButCidRefsExists() throws Exception { + for (String pid : testData.pidList) { + String pidFormatted = pid.replace("/", "_"); + Path testDataFile = testData.getTestFile(pidFormatted); + + InputStream dataStream = Files.newInputStream(testDataFile); + ObjectMetadata objInfo = fileHashStore.storeObject( + dataStream, pid, null, null, null, -1 + ); + String cid = objInfo.getCid(); + + // Set flag to true + fileHashStore.deleteObject(cid, true); + + // Get permanent address of the actual cid + Path objRealPath = fileHashStore.getRealPath(pid, "object", null); + assertTrue(Files.exists(objRealPath)); + } + } + + /** + * Confirm that the deleteObject overload does not delete an object when the flag is false + */ + @Test + public void deleteObject_overloadCidDeleteFalse() throws Exception { + for (String pid : testData.pidList) { + String pidFormatted = pid.replace("/", "_"); + Path testDataFile = testData.getTestFile(pidFormatted); + + InputStream dataStream = Files.newInputStream(testDataFile); + ObjectMetadata objInfo = fileHashStore.storeObject(dataStream); + String cid = objInfo.getCid(); + + // Set flag 
to false + fileHashStore.deleteObject(cid,
false); + + // Get permanent address of the actual cid + int storeDepth = Integer.parseInt(fhsProperties.getProperty("storeDepth")); + int storeWidth = Integer.parseInt(fhsProperties.getProperty("storeWidth")); + String actualCid = objInfo.getCid(); + String cidShardString = FileHashStoreUtility.getHierarchicalPathString( + storeDepth, storeWidth, actualCid + ); + Path objectStoreDirectory = rootDirectory.resolve("objects").resolve(cidShardString); + assertTrue(Files.exists(objectStoreDirectory)); + } + } + /** * Confirm that deleteMetadata deletes metadata and empty sub directories */ @@ -1349,14 +1530,13 @@ public void deleteMetadata_overload() throws Exception { } /** - * Confirm that deleteMetadata throws exception when associated pid obj not found + * Confirm that no exceptions are thrown when called to delete metadata + * that does not exist. */ @Test - public void deleteMetadata_pidNotFound() { - assertThrows(FileNotFoundException.class, () -> { - String formatId = "http://hashstore.tests/types/v1.0"; - fileHashStore.deleteMetadata("dou.2023.hashstore.1", formatId); - }); + public void deleteMetadata_pidNotFound() throws Exception { + String formatId = "http://hashstore.tests/types/v1.0"; + fileHashStore.deleteMetadata("dou.2023.hashstore.1", formatId); } /** @@ -1374,7 +1554,7 @@ public void deleteMetadata_pidNull() { * Confirm that deleteMetadata throws exception when pid is empty */ @Test - public void deleteMetadata_pidEmpty() { + public void deleteMetadata_pidEmpty() throws Exception { assertThrows(IllegalArgumentException.class, () -> { String formatId = "http://hashstore.tests/types/v1.0"; fileHashStore.deleteMetadata("", formatId); @@ -1436,7 +1616,9 @@ public void getHexDigest() throws Exception { Path testDataFile = testData.getTestFile(pidFormatted); InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.storeObject(dataStream, pid, null, null, null, -1); + ObjectMetadata objInfo = 
fileHashStore.storeObject( + dataStream, pid, null, null, null, -1 + ); // Then get the checksum String pidHexDigest = fileHashStore.getHexDigest(pid, "SHA-256"); @@ -1508,4 +1690,67 @@ public void getHexDigest_badAlgo() { }); } } + + /** + * Confirm expected cid is returned + */ + @Test + public void findObject_cid() throws Exception { + String pid = "dou.test.1"; + String cid = "abcdef123456789"; + fileHashStore.tagObject(pid, cid); + + String cidRetrieved = fileHashStore.findObject(pid); + + assertEquals(cid, cidRetrieved); + } + + /** + * Confirm that findObject throws OrphanPidRefsFileException when the + * pid refs file is found but the cid refs file is missing. + */ + @Test + public void findObject_cidRefsFileNotFound() throws Exception { + String pid = "dou.test.1"; + String cid = "abcdef123456789"; + fileHashStore.tagObject(pid, cid); + + Path cidRefsPath = fileHashStore.getRealPath(cid, "refs", "cid"); + Files.delete(cidRefsPath); + + assertThrows(OrphanPidRefsFileException.class, () -> { + fileHashStore.findObject(pid); + }); + } + + + /** + * Confirm that findObject throws PidNotFoundInCidRefsFileException when the + * pid refs file is found but the cid refs file does not contain the pid.
+ */ + @Test + public void findObject_cidRefsFileMissingPid() throws Exception { + String pid = "dou.test.1"; + String cid = "abcdef123456789"; + fileHashStore.tagObject(pid, cid); + + Path cidRefsPath = fileHashStore.getRealPath(cid, "refs", "cid"); + fileHashStore.deleteCidRefsPid(pid, cidRefsPath); + + assertThrows(PidNotFoundInCidRefsFileException.class, () -> { + fileHashStore.findObject(pid); + }); + } + + /** + * Check that exception is thrown when pid refs file doesn't exist + */ + @Test + public void findObject_pidNotFound() { + String pid = "dou.test.1"; + assertThrows(FileNotFoundException.class, () -> { + fileHashStore.findObject(pid); + }); + } + } diff --git a/src/test/java/org/dataone/hashstore/filehashstore/FileHashStoreProtectedTest.java b/src/test/java/org/dataone/hashstore/filehashstore/FileHashStoreProtectedTest.java index 819f5294..09a26f15 100644 --- a/src/test/java/org/dataone/hashstore/filehashstore/FileHashStoreProtectedTest.java +++ b/src/test/java/org/dataone/hashstore/filehashstore/FileHashStoreProtectedTest.java @@ -9,7 +9,6 @@ import java.io.File; import java.io.IOException; import java.io.InputStream; -import java.nio.file.FileAlreadyExistsException; import java.nio.file.Files; import java.nio.file.Path; import java.nio.file.Paths; @@ -20,8 +19,7 @@ import javax.xml.bind.DatatypeConverter; -import org.dataone.hashstore.ObjectInfo; -import org.dataone.hashstore.exceptions.PidObjectExistsException; +import org.dataone.hashstore.ObjectMetadata; import org.dataone.hashstore.testdata.TestDataHarness; import org.junit.jupiter.api.BeforeEach; import org.junit.jupiter.api.Test; @@ -69,9 +67,8 @@ public void initializeFileHashStore() { */ public File generateTemporaryFile() throws Exception { Path directory = tempFolder.resolve("metacat"); - System.out.println(directory); // newFile - return fileHashStore.generateTmpFile("testfile", directory); + return FileHashStoreUtility.generateTmpFile("testfile", directory); } /** @@ -167,7 +164,7 @@ 
public void generateTempFile() throws Exception { */ @Test public void getHierarchicalPathString() { - String shardedPath = fileHashStore.getHierarchicalPathString( + String shardedPath = FileHashStoreUtility.getHierarchicalPathString( 3, 2, "94f9b6c88f1f458e410c30c351c6384ea42ac1b5ee1f8430d3e365e43b78a38a" ); String shardedPathExpected = @@ -175,31 +172,6 @@ public void getHierarchicalPathString() { assertEquals(shardedPath, shardedPathExpected); } - /** - * Check getPidHexDigest calculates correct hex digest value - */ - @Test - public void getPidHexDigest() throws Exception { - for (String pid : testData.pidList) { - String abIdDigest = fileHashStore.getPidHexDigest(pid, "SHA-256"); - String abIdTestData = testData.pidData.get(pid).get("object_cid"); - assertEquals(abIdDigest, abIdTestData); - } - } - - /** - * Check that getPidHexDigest throws NoSuchAlgorithmException - */ - @Test - public void getPidHexDigest_badAlgorithm() { - for (String pid : testData.pidList) { - assertThrows( - NoSuchAlgorithmException.class, () -> fileHashStore.getPidHexDigest(pid, "SM2") - ); - - } - } - /** * Verify that putObject returns correct id */ @@ -210,16 +182,16 @@ public void putObject_testHarness_id() throws Exception { Path testDataFile = testData.getTestFile(pidFormatted); InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo address = fileHashStore.putObject(dataStream, pid, null, null, null, -1); + ObjectMetadata address = fileHashStore.putObject(dataStream, pid, null, null, null, -1); // Check id (sha-256 hex digest of the ab_id, aka object_cid) - String objAuthorityId = testData.pidData.get(pid).get("object_cid"); - assertEquals(objAuthorityId, address.getId()); + String objContentId = testData.pidData.get(pid).get("sha256"); + assertEquals(objContentId, address.getCid()); } } /** - * Check that store object returns the correct ObjectInfo size + * Check that store object returns the correct ObjectMetadata size */ @Test public void 
putObject_objSize() throws Exception { @@ -228,7 +200,7 @@ public void putObject_objSize() throws Exception { Path testDataFile = testData.getTestFile(pidFormatted); InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.putObject(dataStream, pid, null, null, null, -1); + ObjectMetadata objInfo = fileHashStore.putObject(dataStream, pid, null, null, null, -1); // Check id (sha-256 hex digest of the ab_id (pid)) long objectSize = Long.parseLong(testData.pidData.get(pid).get("size")); @@ -246,7 +218,7 @@ public void putObject_testHarness_hexDigests() throws Exception { Path testDataFile = testData.getTestFile(pidFormatted); InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo address = fileHashStore.putObject(dataStream, pid, null, null, null, -1); + ObjectMetadata address = fileHashStore.putObject(dataStream, pid, null, null, null, -1); Map<String, String> hexDigests = address.getHexDigests(); @@ -276,13 +248,13 @@ public void putObject_validateChecksumValue() throws Exception { String checksumCorrect = "9c25df1c8ba1d2e57bb3fd4785878b85"; InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo address = fileHashStore.putObject( + ObjectMetadata address = fileHashStore.putObject( dataStream, pid, null, checksumCorrect, "MD2", -1 ); - String objCid = address.getId(); + String objCid = address.getCid(); // Get relative path - String objCidShardString = fileHashStore.getHierarchicalPathString(3, 2, objCid); + String objCidShardString = FileHashStoreUtility.getHierarchicalPathString(3, 2, objCid); // Get absolute path Path storePath = Paths.get(fhsProperties.getProperty("storePath")); Path objCidAbsPath = storePath.resolve("objects/" + objCidShardString); @@ -396,7 +368,7 @@ public void putObject_objSizeCorrect() throws Exception { long objectSize = Long.parseLong(testData.pidData.get(pid).get("size")); InputStream dataStream = 
Files.newInputStream(testDataFile); - ObjectInfo objInfo =
fileHashStore.putObject( + ObjectMetadata objInfo = fileHashStore.putObject( dataStream, pid, null, null, null, objectSize ); @@ -416,7 +388,7 @@ public void putObject_objSizeIncorrect() { Path testDataFile = testData.getTestFile(pidFormatted); InputStream dataStream = Files.newInputStream(testDataFile); - ObjectInfo objInfo = fileHashStore.putObject( + ObjectMetadata objInfo = fileHashStore.putObject( dataStream, pid, null, null, null, 1000 ); @@ -428,22 +400,27 @@ public void putObject_objSizeIncorrect() { } /** - * Verify putObject throws exception when storing a duplicate object + * Verify putObject deletes temporary file written if called to store an object + * that already exists (duplicate) */ @Test - public void putObject_duplicateObject() { - assertThrows(PidObjectExistsException.class, () -> { - // Get test file to "upload" - String pid = "jtao.1700.1"; - Path testDataFile = testData.getTestFile(pid); + public void putObject_duplicateObject() throws Exception { + // Get test file to "upload" + String pid = "jtao.1700.1"; + Path testDataFile = testData.getTestFile(pid); - InputStream dataStream = Files.newInputStream(testDataFile); - fileHashStore.putObject(dataStream, pid, null, null, null, -1); + InputStream dataStream = Files.newInputStream(testDataFile); + fileHashStore.putObject(dataStream, pid, null, null, null, -1); - // Try duplicate upload - InputStream dataStreamTwo = Files.newInputStream(testDataFile); - fileHashStore.putObject(dataStreamTwo, pid, null, null, null, -1); - }); + // Try duplicate upload + String pidTwo = pid + ".test"; + InputStream dataStreamTwo = Files.newInputStream(testDataFile); + fileHashStore.putObject(dataStreamTwo, pidTwo, null, null, null, -1); + + // Confirm there are no files in 'objects/tmp' directory + Path storePath = Paths.get(fhsProperties.getProperty("storePath")); + File[] files = storePath.resolve("objects/tmp").toFile().listFiles(); + assertEquals(0, files.length); } /** @@ -476,49 +453,6 @@ public void 
putObject_emptyAlgorithm() { }); } - /** - * Verify putObject throws exception when pid is empty - */ - @Test - public void putObject_emptyPid() { - assertThrows(IllegalArgumentException.class, () -> { - // Get test file to "upload" - String pidEmpty = ""; - String pid = "jtao.1700.1"; - Path testDataFile = testData.getTestFile(pid); - - InputStream dataStream = Files.newInputStream(testDataFile); - fileHashStore.putObject(dataStream, pidEmpty, null, null, null, -1); - }); - } - - /** - * Verify putObject throws exception when pid is null - */ - @Test - public void putObject_nullPid() { - assertThrows(IllegalArgumentException.class, () -> { - // Get test file to "upload" - String pid = "jtao.1700.1"; - Path testDataFile = testData.getTestFile(pid); - - InputStream dataStream = Files.newInputStream(testDataFile); - fileHashStore.putObject(dataStream, null, "MD2", null, null, -1); - }); - } - - /** - * Verify putObject throws exception object is null - */ - @Test - public void putObject_nullObject() { - assertThrows(IllegalArgumentException.class, () -> { - // Get test file to "upload" - String pid = "jtao.1700.1"; - fileHashStore.putObject(null, pid, "MD2", null, null, -1); - }); - } - /** * Check default checksums are generated */ @@ -693,19 +627,17 @@ public void testMove() throws Exception { } /** - * Confirm that FileAlreadyExistsException is thrown when target already exists + * Confirm that exceptions are not thrown when move is called on an object that already exists */ @Test - public void testMove_targetExists() { - assertThrows(FileAlreadyExistsException.class, () -> { - File newTmpFile = generateTemporaryFile(); - String targetString = tempFolder.toString() + "/testmove/test_tmp_object.tmp"; - File targetFile = new File(targetString); - fileHashStore.move(newTmpFile, targetFile, "object"); + public void testMove_targetExists() throws Exception { + File newTmpFile = generateTemporaryFile(); + String targetString = tempFolder.toString() + 
"/testmove/test_tmp_object.tmp"; + File targetFile = new File(targetString); + fileHashStore.move(newTmpFile, targetFile, "object"); - File newTmpFileTwo = generateTemporaryFile(); - fileHashStore.move(newTmpFileTwo, targetFile, "object"); - }); + File newTmpFileTwo = generateTemporaryFile(); + fileHashStore.move(newTmpFileTwo, targetFile, "object"); } /** @@ -762,7 +694,7 @@ public void putMetadata() throws Exception { String metadataCid = fileHashStore.putMetadata(metadataStream, pid, null); // Get relative path - String metadataCidShardString = fileHashStore.getHierarchicalPathString( + String metadataCidShardString = FileHashStoreUtility.getHierarchicalPathString( 3, 2, metadataCid ); // Get absolute path @@ -939,4 +871,66 @@ public void writeToTmpMetadataFile_metadataContent() throws Exception { metadataStoredStream.close(); } } + + /** + * Confirm that isPidInCidRefsFile returns true when pid is found + */ + @Test + public void isPidInCidRefsFile_pidFound() throws Exception { + for (String pid : testData.pidList) { + String pidFormatted = pid.replace("/", "_"); + Path testDataFile = testData.getTestFile(pidFormatted); + + InputStream dataStream = Files.newInputStream(testDataFile); + fileHashStore.storeObject(dataStream, pid, null, null, null, -1); + + String pidTwo = pid + ".test"; + InputStream dataStreamDup = Files.newInputStream(testDataFile); + ObjectMetadata objInfo = fileHashStore.storeObject( + dataStreamDup, pidTwo, null, null, null, -1 + ); + + String cid = objInfo.getCid(); + Path absCidRefsPath = fileHashStore.getRealPath(cid, "refs", "cid"); + assertTrue(fileHashStore.isPidInCidRefsFile(pidTwo, absCidRefsPath)); + } + } + + /** + * Confirm that isPidInCidRefsFile returns false when pid is not found + */ + @Test + public void isPidInCidRefsFile_pidNotFound() throws Exception { + for (String pid : testData.pidList) { + String pidFormatted = pid.replace("/", "_"); + Path testDataFile = testData.getTestFile(pidFormatted); + + InputStream 
dataStream =
Files.newInputStream(testDataFile); + ObjectMetadata objInfo = fileHashStore.storeObject( + dataStream, pid, null, null, null, -1 + ); + + String cid = objInfo.getCid(); + Path absCidRefsPath = fileHashStore.getRealPath(cid, "refs", "cid"); + assertFalse(fileHashStore.isPidInCidRefsFile("pid.not.found", absCidRefsPath)); + } + } + + @Test + public void getRealPath() throws Exception { + // Get single test file to "upload" + String pid = "jtao.1700.1"; + Path testDataFile = testData.getTestFile(pid); + + InputStream dataStream = Files.newInputStream(testDataFile); + ObjectMetadata objInfo = fileHashStore.storeObject(dataStream, pid, null, null, null, -1); + String cid = objInfo.getCid(); + + Path objCidAbsPath = fileHashStore.getRealPath(pid, "object", null); + Path pidRefsPath = fileHashStore.getRealPath(pid, "refs", "pid"); + Path cidRefsPath = fileHashStore.getRealPath(cid, "refs", "cid"); + assertTrue(Files.exists(objCidAbsPath)); + assertTrue(Files.exists(pidRefsPath)); + assertTrue(Files.exists(cidRefsPath)); + } } diff --git a/src/test/java/org/dataone/hashstore/filehashstore/FileHashStorePublicTest.java b/src/test/java/org/dataone/hashstore/filehashstore/FileHashStorePublicTest.java index d4c5458a..05b0fa42 100644 --- a/src/test/java/org/dataone/hashstore/filehashstore/FileHashStorePublicTest.java +++ b/src/test/java/org/dataone/hashstore/filehashstore/FileHashStorePublicTest.java @@ -247,39 +247,40 @@ public void initDefaultStore_directoryNull() { } /** - * Check object store directory is created after initialization + * Check object store and tmp directories are created after initialization */ @Test - public void initObjDirectory() { + public void initObjDirectories() { Path checkObjectStorePath = objStringFull; assertTrue(Files.isDirectory(checkObjectStorePath)); - } - - /** - * Check object store tmp directory is created after initialization - */ - @Test - public void initObjTmpDirectory() { Path checkTmpPath = objTmpStringFull; 
assertTrue(Files.isDirectory(checkTmpPath)); } /** - * Check metadata store directory is created after initialization + * Check metadata store and tmp directories are created after initialization */ @Test - public void initMetadataDirectory() { + public void initMetadataDirectories() { Path checkMetadataStorePath = metadataStringFull; assertTrue(Files.isDirectory(checkMetadataStorePath)); + Path checkMetadataTmpPath = metadataTmpStringFull; + assertTrue(Files.isDirectory(checkMetadataTmpPath)); } /** - * Check metadata store tmp directory is created after initialization + * Check refs tmp, pid and cid directories are created after initialization */ @Test - public void initMetadataTmpDirectory() { - Path checkMetadataTmpPath = metadataTmpStringFull; - assertTrue(Files.isDirectory(checkMetadataTmpPath)); + public void initRefsDirectories() { + Path refsPath = rootDirectory.resolve("refs"); + assertTrue(Files.isDirectory(refsPath)); + Path refsTmpPath = rootDirectory.resolve("refs/tmp"); + assertTrue(Files.isDirectory(refsTmpPath)); + Path refsPidPath = rootDirectory.resolve("refs/pid"); + assertTrue(Files.isDirectory(refsPidPath)); + Path refsCidPath = rootDirectory.resolve("refs/cid"); + assertTrue(Files.isDirectory(refsCidPath)); } /** diff --git a/src/test/java/org/dataone/hashstore/filehashstore/FileHashStoreReferencesTest.java b/src/test/java/org/dataone/hashstore/filehashstore/FileHashStoreReferencesTest.java new file mode 100644 index 00000000..3b48adc7 --- /dev/null +++ b/src/test/java/org/dataone/hashstore/filehashstore/FileHashStoreReferencesTest.java @@ -0,0 +1,406 @@ +package org.dataone.hashstore.filehashstore; + +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertFalse; +import static org.junit.jupiter.api.Assertions.assertThrows; +import static org.junit.jupiter.api.Assertions.assertTrue; +import static org.junit.jupiter.api.Assertions.fail; + +import java.io.File; +import 
java.io.FileNotFoundException; +import java.io.IOException; +import java.io.InputStream; +import java.nio.file.Files; +import java.nio.file.Path; +import java.security.NoSuchAlgorithmException; +import java.util.List; +import java.util.Properties; + +import org.dataone.hashstore.ObjectMetadata; +import org.dataone.hashstore.exceptions.PidRefsFileExistsException; +import org.dataone.hashstore.testdata.TestDataHarness; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.io.TempDir; + +/** + * Test class for FileHashStore references-related methods + */ +public class FileHashStoreReferencesTest { + private FileHashStore fileHashStore; + private Path rootDirectory; + private Properties fhsProperties; + private static final TestDataHarness testData = new TestDataHarness(); + + /** + * Initialize FileHashStore before each test to create tmp directories + */ + @BeforeEach + public void initializeFileHashStore() { + rootDirectory = tempFolder.resolve("metacat"); + + Properties storeProperties = new Properties(); + storeProperties.setProperty("storePath", rootDirectory.toString()); + storeProperties.setProperty("storeDepth", "3"); + storeProperties.setProperty("storeWidth", "2"); + storeProperties.setProperty("storeAlgorithm", "SHA-256"); + storeProperties.setProperty( + "storeMetadataNamespace", "http://ns.dataone.org/service/types/v2.0" + ); + + try { + fhsProperties = storeProperties; + fileHashStore = new FileHashStore(storeProperties); + + } catch (IOException ioe) { + fail("IOException encountered: " + ioe.getMessage()); + + } catch (NoSuchAlgorithmException nsae) { + fail("NoSuchAlgorithmException encountered: " + nsae.getMessage()); + + } + } + + /** + * Temporary folder for tests to run in + */ + @TempDir + public Path tempFolder; + + /** + * Check that tagObject writes expected pid refs files + */ + @Test + public void tagObject_pidRefsFile() throws Exception { + String pid = 
"dou.test.1"; + String cid =
"abcdef123456789"; + fileHashStore.tagObject(pid, cid); + + Path pidRefsFilePath = fileHashStore.getRealPath(pid, "refs", "pid"); + assertTrue(Files.exists(pidRefsFilePath)); + } + + /** + * Check that tagObject writes expected cid refs files + */ + @Test + public void tagObject_cidRefsFile() throws Exception { + String pid = "dou.test.1"; + String cid = "abcdef123456789"; + fileHashStore.tagObject(pid, cid); + + Path cidRefsFilePath = fileHashStore.getRealPath(cid, "refs", "cid"); + assertTrue(Files.exists(cidRefsFilePath)); + } + + /** + * Check that tagObject throws exception when pid refs file already exists + */ + @Test + public void tagObject_pidRefsFileExists() throws Exception { + String pid = "dou.test.1"; + String cid = "abcdef123456789"; + fileHashStore.tagObject(pid, cid); + + assertThrows(PidRefsFileExistsException.class, () -> { + fileHashStore.tagObject(pid, cid); + }); + + } + + /** + * Check that tagObject creates a pid refs file and updates an existing cid refs file + */ + @Test + public void tagObject_cidRefsFileExists() throws Exception { + String pid = "dou.test.1"; + String cid = "abcdef123456789"; + fileHashStore.tagObject(pid, cid); + + String pidAdditional = "another.pid.2"; + fileHashStore.tagObject(pidAdditional, cid); + + Path pidRefsFilePath = fileHashStore.getRealPath(pid, "refs", "pid"); + assertTrue(Files.exists(pidRefsFilePath)); + + + // Check cid refs file + Path cidRefsFilePath = fileHashStore.getRealPath(cid, "refs", "cid"); + boolean pidFoundInCidRefFiles = fileHashStore.isPidInCidRefsFile( + pidAdditional, cidRefsFilePath + ); + assertTrue(pidFoundInCidRefFiles); + } + + /** + * Check that tagObject creates pid refs file when pid already exists in cid refs file + */ + @Test + public void tagObject_pidExistsInCidRefsFile() throws Exception { + String pid = "dou.test.1"; + String cid = "abcdef123456789"; + + File cidRefsTmpFile = fileHashStore.writeCidRefsFile(pid); + Path cidRefsFilePath = fileHashStore.getRealPath(cid, "refs", 
"cid");
+        fileHashStore.move(cidRefsTmpFile, cidRefsFilePath.toFile(), "refs");
+
+        fileHashStore.tagObject(pid, cid);
+
+        Path pidRefsFilePath = fileHashStore.getRealPath(pid, "refs", "pid");
+        assertTrue(Files.exists(pidRefsFilePath));
+
+        // Confirm that the cid refs file only has 1 line (the pid is not written twice)
+        List<String> lines = Files.readAllLines(cidRefsFilePath);
+        int numberOfLines = lines.size();
+        assertEquals(1, numberOfLines);
+    }
+
+    /**
+     * Check that the cid supplied is written into the pid refs file
+     */
+    @Test
+    public void writePidRefsFile_content() throws Exception {
+        String cidToWrite = "test_cid_123";
+        File pidRefsTmpFile = fileHashStore.writePidRefsFile(cidToWrite);
+
+        String cidRead = new String(Files.readAllBytes(pidRefsTmpFile.toPath()));
+        assertEquals(cidToWrite, cidRead);
+    }
+
+    /**
+     * Check that the pid supplied is written into the cid refs file
+     */
+    @Test
+    public void writeCidRefsFile_content() throws Exception {
+        String pidToWrite = "dou.test.123";
+        File cidRefsTmpFile = fileHashStore.writeCidRefsFile(pidToWrite);
+
+        String pidRead = new String(Files.readAllBytes(cidRefsTmpFile.toPath()));
+        assertEquals(pidToWrite, pidRead);
+    }
+
+    /**
+     * Check that an exception is thrown when a pid refs file contains an unexpected cid
+     */
+    @Test
+    public void verifyHashStoreRefFiles_unexpectedCid() throws Exception {
+        String pid = "dou.test.1";
+        String cid = "abcdef123456789";
+        fileHashStore.tagObject(pid, cid);
+
+        // Create a pid refs file with the incorrect cid
+        String cidToWrite = "123456789abcdef";
+        File pidRefsTmpFile = fileHashStore.writePidRefsFile(cidToWrite);
+        Path pidRefsTmpFilePath = pidRefsTmpFile.toPath();
+
+        // Get path of the cid refs file
+        Path cidRefsFilePath = fileHashStore.getRealPath(cid, "refs", "cid");
+
+        assertThrows(IOException.class, () -> {
+            fileHashStore.verifyHashStoreRefsFiles(pid, cid, pidRefsTmpFilePath, cidRefsFilePath);
+        });
+    }
+
+    /**
+     * Check that an exception is thrown when an expected pid is not found in a cid refs file
+     */
+    @Test
+    public void verifyHashStoreRefFiles_pidNotFoundInCidRefsFile() throws Exception {
+        String pid = "dou.test.1";
+        String cid = "abcdef123456789";
+        fileHashStore.tagObject(pid, cid);
+
+        // Create a cid refs file with a different pid from the one that is expected
+        String pidToWrite = "dou.test.2";
+        File cidRefsTmpFile = fileHashStore.writeCidRefsFile(pidToWrite);
+        Path cidRefsTmpFilePath = cidRefsTmpFile.toPath();
+
+        // Get path of the pid refs file
+        Path pidRefsFilePath = fileHashStore.getRealPath(pid, "refs", "pid");
+
+        assertThrows(IOException.class, () -> {
+            fileHashStore.verifyHashStoreRefsFiles(pid, cid, pidRefsFilePath, cidRefsTmpFilePath);
+        });
+    }
+
+    /**
+     * Confirm that the cid refs file has been updated successfully
+     */
+    @Test
+    public void updateCidRefsFiles_content() throws Exception {
+        String pid = "dou.test.1";
+        String cid = "abcdef123456789";
+        fileHashStore.tagObject(pid, cid);
+
+        // Get path of the cid refs file
+        Path cidRefsFilePath = fileHashStore.getRealPath(cid, "refs", "cid");
+
+        String pidAdditional = "dou.test.2";
+        fileHashStore.updateCidRefsFiles(pidAdditional, cidRefsFilePath);
+
+        List<String> lines = Files.readAllLines(cidRefsFilePath);
+        boolean pidOriginal_foundInCidRefFiles =
false;
+        boolean pidAdditional_foundInCidRefFiles = false;
+        for (String line : lines) {
+            if (line.equals(pidAdditional)) {
+                pidAdditional_foundInCidRefFiles = true;
+            }
+            if (line.equals(pid)) {
+                pidOriginal_foundInCidRefFiles = true;
+            }
+        }
+        assertTrue(pidOriginal_foundInCidRefFiles);
+        assertTrue(pidAdditional_foundInCidRefFiles);
+    }
+
+    /**
+     * Check that deletePidRefsFile deletes the pid refs file
+     */
+    @Test
+    public void deletePidRefsFile_fileDeleted() throws Exception {
+        String pid = "dou.test.1";
+        String cid = "abcdef123456789";
+        fileHashStore.tagObject(pid, cid);
+
+        fileHashStore.deletePidRefsFile(pid);
+
+        Path pidRefsFilePath = fileHashStore.getRealPath(pid, "refs", "pid");
+        assertFalse(Files.exists(pidRefsFilePath));
+    }
+
+    /**
+     * Check that deletePidRefsFile throws FileNotFoundException when there is no file
+     * to delete
+     */
+    @Test
+    public void deletePidRefsFile_missingPidRefsFile() {
+        String pid = "dou.test.1";
+
+        assertThrows(FileNotFoundException.class, () -> {
+            fileHashStore.deletePidRefsFile(pid);
+        });
+    }
+
+    /**
+     * Check that deleteCidRefsPid removes the pid from its cid refs file
+     */
+    @Test
+    public void deleteCidRefsPid_pidRemoved() throws Exception {
+        String pid = "dou.test.1";
+        String cid = "abcdef123456789";
+        fileHashStore.tagObject(pid, cid);
+        String pidAdditional = "dou.test.2";
+        fileHashStore.tagObject(pidAdditional, cid);
+
+        Path cidRefsFilePath = fileHashStore.getRealPath(cid, "refs", "cid");
+        fileHashStore.deleteCidRefsPid(pid, cidRefsFilePath);
+
+        assertFalse(fileHashStore.isPidInCidRefsFile(pid, cidRefsFilePath));
+    }
+
+    /**
+     * Check that deleteCidRefsPid removes all pids as expected and leaves an
+     * empty cid refs file
+     */
+    @Test
+    public void deleteCidRefsPid_allPidsRemoved() throws Exception {
+        String pid = "dou.test.1";
+        String cid = "abcdef123456789";
+        fileHashStore.tagObject(pid, cid);
+        String pidAdditional = "dou.test.2";
+        fileHashStore.tagObject(pidAdditional, cid);
+        Path cidRefsFilePath = fileHashStore.getRealPath(cid, "refs", "cid");
+
+        fileHashStore.deleteCidRefsPid(pid, cidRefsFilePath);
+        fileHashStore.deleteCidRefsPid(pidAdditional, cidRefsFilePath);
+
+        assertTrue(Files.exists(cidRefsFilePath));
+        assertEquals(0, Files.size(cidRefsFilePath));
+    }
+
+    /**
+     * Check that verifyObject returns true when the expected checksum and size match
+     */
+    @Test
+    public void verifyObject_correctValues() throws Exception {
+        for (String pid : testData.pidList) {
+            String pidFormatted = pid.replace("/", "_");
+            Path testDataFile = testData.getTestFile(pidFormatted);
+
+            InputStream dataStream = Files.newInputStream(testDataFile);
+            ObjectMetadata objInfo = fileHashStore.storeObject(dataStream);
+
+            String defaultStoreAlgorithm = fhsProperties.getProperty("storeAlgorithm");
+
+            // Get verifyObject args
+            String expectedChecksum = testData.pidData.get(pid).get("sha256");
+            long expectedSize = Long.parseLong(testData.pidData.get(pid).get("size"));
+
+            boolean isObjectValid = fileHashStore.verifyObject(
+                objInfo, expectedChecksum, defaultStoreAlgorithm, expectedSize
+            );
+            assertTrue(isObjectValid);
+        }
+    }
+
+    /**
+     * Check that verifyObject returns false when the expected size does not match
+     */
+    @Test
+    public void verifyObject_mismatchedValuesBadSize() throws Exception {
+        for (String pid : testData.pidList) {
+            String pidFormatted = pid.replace("/", "_");
+            Path testDataFile = testData.getTestFile(pidFormatted);
+
+            InputStream dataStream = Files.newInputStream(testDataFile);
+            ObjectMetadata objInfo = fileHashStore.storeObject(dataStream);
+
+            String defaultStoreAlgorithm = fhsProperties.getProperty("storeAlgorithm");
+
+            // Get verifyObject args
+            String expectedChecksum =
testData.pidData.get(pid).get("sha256");
+            long expectedSize = 123456789;
+
+            boolean isObjectValid = fileHashStore.verifyObject(
+                objInfo, expectedChecksum, defaultStoreAlgorithm, expectedSize
+            );
+            assertFalse(isObjectValid);
+        }
+    }
+
+    /**
+     * Check that verifyObject returns false and does not delete the stored object when
+     * the expected checksum does not match
+     */
+    @Test
+    public void verifyObject_mismatchedValuesObjectNotDeleted() throws Exception {
+        for (String pid : testData.pidList) {
+            String pidFormatted = pid.replace("/", "_");
+            Path testDataFile = testData.getTestFile(pidFormatted);
+
+            InputStream dataStream = Files.newInputStream(testDataFile);
+            ObjectMetadata objInfo = fileHashStore.storeObject(dataStream);
+
+            String defaultStoreAlgorithm = fhsProperties.getProperty("storeAlgorithm");
+
+            // Get verifyObject args
+            String expectedChecksum = "intentionallyWrongValue";
+            long expectedSize = Long.parseLong(testData.pidData.get(pid).get("size"));
+
+            boolean isObjectValid = fileHashStore.verifyObject(
+                objInfo, expectedChecksum, defaultStoreAlgorithm, expectedSize
+            );
+            assertFalse(isObjectValid);
+
+            // Confirm that the object still exists at its permanent location
+            int storeDepth = Integer.parseInt(fhsProperties.getProperty("storeDepth"));
+            int storeWidth = Integer.parseInt(fhsProperties.getProperty("storeWidth"));
+            String actualCid = objInfo.getCid();
+            String cidShardString = FileHashStoreUtility.getHierarchicalPathString(
+                storeDepth, storeWidth, actualCid
+            );
+            Path objectStoreDirectory = rootDirectory.resolve("objects").resolve(cidShardString);
+            assertTrue(Files.exists(objectStoreDirectory));
+        }
+    }
+}