
The Application Interface

Andrew Berger edited this page Oct 23, 2023 · 20 revisions

Using the Preassembly web form

Once your content is staged and ready for deposit, it's time to fill out the Preassembly web form. The web form is how you start the actual deposit process. The information on the form tells the Preassembly application:

  • what kind of content is being deposited
  • how to process the files
  • where to go to find the files (i.e. the staging location)

Accessioning is a two-step process:

  1. Fill out the form and start what's called a "discovery report" job
    • This runs a set of checks with the goal of identifying potential problems with your accession
    • The discovery report will flag errors that would cause problems in processing
    • You should address any errors before moving on to run Preassembly
  2. After receiving an error-free discovery report, run a Preassembly job
    • The Preassembly job is the job that actually sends files into the SDR

Small Preassembly jobs, consisting of a few items and less than 1 GB of content, can finish in a matter of minutes. Large Preassembly jobs, consisting of hundreds or thousands of items and multiple terabytes of content, will run for days. You do not need to keep the application window open while discovery report and Preassembly jobs are running. Preassembly will email you when your jobs are complete.


The Job Form

[Screenshot of the job form]

  • To prepare a job, supply the following information:

    • Project name [required]
      • This must be unique for a specific user, but does not need to be universally unique. You will get an error if you have already submitted a job with an identical name.
      • Project names cannot include spaces. The allowable characters are: A-Z, a-z, 0-9, hyphen and underscore.
      • Hint: It's often easiest to use the accessioning ticket number as the Project name, which also makes it easier for the accessioning manager to track your job.
    • Job type [required]
      • You must always start with the Discovery Report in order to make sure that your job is valid and able to be accessioned
      • Do not choose Pre Assembly Run without first running a discovery report.
    • Content structure [required]
      • Image is for any non-book-like image materials
      • Book covers book-like materials (choose ltr or rtl for left-to-right or right-to-left orientation books)
      • Document is for documents
      • File is for objects that should be file-download only
      • Media is for material that should be presented in the embedded streaming viewer (audio or visual). Note that if you use the media content type, you must supply a file manifest. (See instructions)
      • 3d is for 3D objects
      • Map is for map objects
      • Webarchive Seed is for web archives
      • If your content doesn't easily fit into one of these categories, please consult with the Repository Manager before accessioning the content.
    • Staging location [required]
      • The staging directory is where your content is stored. It always takes the form /{storageMount}/{projectDirectory}/{contentDirectory}
        • Note the leading slash in the above model
        • Your {storageMount} value must be in the pre-approved list (see Consul) or you will get an almost immediate error
    • Processing configuration [required]
      • Default puts each file in a separate resource in the digital object
      • Group by filename bundles files with the same filename but different extensions into a single resource (for instance foo.tif and foo.jp2), so this is the preferred method for image and book projects
      • Group by filename (with pre-existing OCR) uses a special manifest to organize the content into custom resource dispositions
      • I have a file manifest - check this box if you are supplying an optional file manifest for a complex accessioning project. Consult this page for more information: https://github.com/sul-dlss/pre-assembly/wiki/Accessioning-images-with-captions-(labels)
    • Preserve, Shelve, Publish Settings [required]
      • Default uses the publish, shelve, and preserve settings that are appropriate to most pre-assembly projects. These settings generally distinguish "access" files from "preservation master" files based on file type and only make the access files available on the Purl. Note that if using the media content type, you should use this setting: it will not affect the settings in your media_manifest.csv file.
      • Preserve=Yes, Shelve=Yes, Publish=Yes will make all files in your pre-assembly job available from the Purl as well as send them to preservation storage. Only use this setting if you know that the default settings will not be appropriate for your pre-assembly job, and that all files in the job should be made available online. If in doubt about whether to use this setting, please contact the Repository Manager.
  • Once you have filled out the form correctly and clicked the Submit button, a job will be created and will appear in the right-hand Recent jobs column.

  • You can refresh the page, impatiently, to watch your job status change from Waiting to Running to Completed, or you can just walk away: the system will email you when your report (or Preassembly run) is complete. Note that the page does not auto-refresh when a job status changes; you need to refresh the webpage yourself. If the entire job fails to start, or the job completes but there was an error during the run, the status will indicate this.
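
The field rules described in the form above can be checked locally before you submit. The following is a minimal sketch in Python, not part of Preassembly itself: the function names and the EXAMPLE_MOUNTS set are hypothetical, the real pre-approved storage mount list is maintained elsewhere, and the "at least three path segments" rule is an assumption drawn from the /{storageMount}/{projectDirectory}/{contentDirectory} model. The last function only illustrates the idea behind "Group by filename" (same filename, different extensions); it is not how the application implements it.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Allowed project name characters per the form: A-Z, a-z, 0-9, hyphen, underscore.
PROJECT_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")

# Hypothetical mount name for illustration only; substitute the
# pre-approved storage mounts that apply to your deployment.
EXAMPLE_MOUNTS = {"example_mount"}


def is_valid_project_name(name: str) -> bool:
    """Return True if the name is non-empty and uses only allowed characters."""
    return PROJECT_NAME_RE.fullmatch(name) is not None


def is_valid_staging_location(path: str, mounts=EXAMPLE_MOUNTS) -> bool:
    """Check the /{storageMount}/{projectDirectory}/{contentDirectory} shape.

    Assumption: the path has a leading slash, at least three segments,
    and the first segment is one of the approved mounts.
    """
    if not path.startswith("/"):
        return False
    parts = [p for p in path.split("/") if p]
    return len(parts) >= 3 and parts[0] in mounts


def group_by_filename(filenames):
    """Bundle files that share a filename but differ in extension."""
    groups = defaultdict(list)
    for name in filenames:
        groups[PurePosixPath(name).stem].append(name)
    return dict(groups)
```

For example, is_valid_project_name("my project") returns False because of the space, and group_by_filename(["foo.tif", "foo.jp2"]) places both files under the single key "foo", mirroring how foo.tif and foo.jp2 end up in one resource.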