diff --git a/README.md b/README.md
index aa13352..5614efe 100644
--- a/README.md
+++ b/README.md
@@ -3,9 +3,9 @@ This repository consists of two modules:
 
 1. unlv_image: A local implementation of the Islandora Image module to include additional metadata fields.
 
-2. migrate_cdm: Uses the Migrate API to load Tiff masters, metadata from a CSV, and authority records from the Library of Congress.
+2. migrate_cdm: Uses the Migrate API to load TIFF masters, metadata from a CSV (patterned after the [Move to Islandora Kit sample metadata CSV](https://github.com/MarcusBarnes/mik/blob/master/tests/assets/csv/sample_metadata.csv)), and MADS authority records from the Library of Congress.
 
-*Note:* This proof-of-concept assumes that new content types will be created for each metadata profile/object type pair; ergo, this proof-of-concept includes a new content type called UNLV_image which parallel's islandora_image but adds some node entity references. However, the CLAW team is exploring alternative strategies for managing descriptive metadata which will make portions of this example out of date (hopefully soon).
+*Note:* This proof-of-concept assumes that new content types will be created for each metadata profile/object type pair; ergo, this proof-of-concept includes a new content type called UNLV_image which parallels islandora_image but adds some node entity references. However, CLAW is under active development, so this example may break from time to time. I intend to keep updating it as the development version of CLAW evolves.
 
 # Source Data
 
@@ -15,8 +15,23 @@ The source data used for this proof of concept came from the [Project Apollo Arc
 
 Note: using drush with migrate_tools is optional, but the instructions assume it is installed.
 
-1. Copy the data directory to your drupal web root (e.g. in my tests the drupal web root is `/var/www/drupalvm/drupal/web` and the data directory is `/var/www/drupalvm/drupal/web/data`).
-2. Copy the migrate_cdm and unlv_image directories to your modules directory.
-3. Enable the modules. E.g. `drush en -y migrate_tools migrate_apollo`.
-4. Run the migration. E.g. `drush mim --all`.
-5. See a wonderful list of the newly migrated images on your Drupal site's front page!
+0. Install the prerequisite modules (islandora_image, migrate_plus, and migrate_source_csv) and their dependencies. E.g. `composer require islandora/islandora_image drupal/migrate_tools:^4.0 drupal/migrate_source_csv`.
+1. [Patch migrate_plus to allow looking up entities across multiple content types](https://www.drupal.org/project/migrate_plus/issues/2960251).
+2. Copy the data directory to your Drupal web root (e.g. in my tests the Drupal web root is `/var/www/drupalvm/drupal/web` and the data directory is `/var/www/drupalvm/drupal/web/data`).
+3. Copy the migrate_cdm and unlv_image directories to your modules directory.
+4. Enable the modules. E.g. `drush en -y migrate_tools migrate_apollo`.
+5. Run the migration. E.g. `drush mim --all`.
+6. See a wonderful list of the newly migrated images on your Drupal site's front page!
+
+# The migrate_plus Patch
+
+Previously, this example kept subjects that were people in a separate column
+from subjects that were topics. That let us perform an entity lookup on each
+column against its matching content type.
+
+The [Move to Islandora Kit sample metadata](https://github.com/MarcusBarnes/mik/blob/master/tests/assets/csv/sample_metadata.csv),
+however, combines them into a single column. This requires us to perform a
+single lookup across multiple content types, something the existing migrate_plus
+module doesn't support. I've opened an issue with a patch to address this.
+Until it is merged or another solution is found, we will have to either
+patch migrate_plus or extend the process plugin for this small modification.
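To make the single-column lookup concrete, the subjects pipeline (skip_on_empty, explode on `;`, lookup across bundles, generate if missing) can be modelled outside Drupal. The following is a minimal Python sketch, not migrate_plus code. A hypothetical in-memory dictionary `EXISTING` stands in for existing person and subject nodes, and `lookup_across_bundles` and `process_subjects` are illustrative names. It assumes the patched lookup matches a title in any of the allowed bundles, and that unmatched values are generated as `subject` nodes, as the claw_image.yml comments describe.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for existing Drupal nodes: title -> bundle (content type).
EXISTING: Dict[str, str] = {
    "Aldrin, Buzz": "person",
    "Armstrong, Neil": "person",
    "Moonwalk": "subject",
}


def lookup_across_bundles(
    title: str, bundles: Tuple[str, ...], nodes: Dict[str, str]
) -> Optional[Tuple[str, str]]:
    """Return (title, bundle) if a node with this title exists in any allowed bundle."""
    bundle = nodes.get(title)
    if bundle is not None and bundle in bundles:
        return (title, bundle)
    return None


def process_subjects(
    cell: str,
    nodes: Dict[str, str],
    bundles: Tuple[str, ...] = ("person", "subject"),
    default_bundle: str = "subject",
) -> List[Tuple[str, str]]:
    """Model skip_on_empty -> explode(';') -> lookup, generating missing entities."""
    # skip_on_empty with method: process yields no value for an empty cell.
    if not cell:
        return []
    refs: List[Tuple[str, str]] = []
    # Split on ';' without trimming whitespace, as the original config comment warns.
    for value in cell.split(";"):
        found = lookup_across_bundles(value, bundles, nodes)
        if found is None:
            # Like entity_generate with default_values type: subject.
            nodes[value] = default_bundle
            found = (value, default_bundle)
        refs.append(found)
    return refs


if __name__ == "__main__":
    nodes = dict(EXISTING)
    print(process_subjects("Aldrin, Buzz;Lunar excursion module;Moonwalk", nodes))
```

With one combined column, "Aldrin, Buzz" has to resolve to a person node while "Moonwalk" resolves to a subject node. That is why a single lookup must span both content types instead of one lookup per column.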
diff --git a/data/apollo.csv b/data/apollo.csv
index 38b67b1..53e6fe5 100644
--- a/data/apollo.csv
+++ b/data/apollo.csv
@@ -1,7 +1,8 @@
-AS11-36-5390,Apollo 11 Hasselblad image from film magazine 36/N - Trans-Lunar,"Neil took this picture of Buzz during their initial inspection of the LM at about 057:03. Journal Contributor David Sander notes that ""Buzz is wearing his intravehicular suit, a specially made set of garments designed to be as flame retardant as the rest of the ship, and made from the same fabric as the outer layer of the spacesuits"". Paolo Attivissimo notes that Buzz's watch reads 5:35 (Houston time), which is 57:03 GET (Ground Elapsed Time)","Aldrin, Buzz",
-AS11-37-5528,"Apollo 11 Hasselblad image from film magazine 37/R - Orbit, Post-Landing, Post-EVA",,"Armstrong, Neil",
-AS11-37-5545,"Apollo 11 Hasselblad image from film magazine 37/R - Orbit, Post-Landing, Post-EVA",,,Flags--United States
-AS11-40-5850,Apollo 11 Hasselblad image from film magazine 40/S - EVA,"First EVA picture. Neil's first frame in a pan taken west of the ladder. Jettison bag under the Descent Stage, south footpad, bent probe, strut supports. The view is more or less up-Sun, so we are seeing the shadowed faces of boulders. 20 July 1969.",,Lunar excursion module;Moonwalk
-AS11-40-5875,Apollo 11 Hasselblad image from film magazine 40/S - EVA,,"Aldrin, Buzz",Lunar excursion module;Flags--United States;Moonwalk
-AS11-40-5903,Apollo 11 Hasselblad image from film magazine 40/S - EVA,,"Aldrin, Buzz",Moonwalk
-AS11-44-6665,"Apollo 11 Hasselblad image from film magazine 44/V - LM inspection, rendezvous",,,Moon
\ No newline at end of file
+ID,Title,Date,Location,Subjects,Description,File
+AS11-36-5390,Apollo 11 Hasselblad image from film magazine 36/N - Trans-Lunar,,,"Aldrin, Buzz","Neil took this picture of Buzz during their initial inspection of the LM at about 057:03. Journal Contributor David Sander notes that ""Buzz is wearing his intravehicular suit, a specially made set of garments designed to be as flame retardant as the rest of the ship, and made from the same fabric as the outer layer of the spacesuits"". Paolo Attivissimo notes that Buzz's watch reads 5:35 (Houston time), which is 57:03 GET (Ground Elapsed Time)",AS11-36-5390.tiff
+AS11-37-5528,"Apollo 11 Hasselblad image from film magazine 37/R - Orbit, Post-Landing, Post-EVA",,,"Armstrong, Neil",,AS11-37-5528.tiff
+AS11-37-5545,"Apollo 11 Hasselblad image from film magazine 37/R - Orbit, Post-Landing, Post-EVA",,,Flags--United States,,AS11-37-5545.tiff
+AS11-40-5850,Apollo 11 Hasselblad image from film magazine 40/S - EVA,,,Lunar excursion module;Moonwalk,"First EVA picture. Neil's first frame in a pan taken west of the ladder. Jettison bag under the Descent Stage, south footpad, bent probe, strut supports. The view is more or less up-Sun, so we are seeing the shadowed faces of boulders. 20 July 1969.",AS11-40-5850.tiff
+AS11-40-5875,Apollo 11 Hasselblad image from film magazine 40/S - EVA,,,"Aldrin, Buzz;Lunar excursion module;Flags--United States;Moonwalk",,AS11-40-5875.tiff
+AS11-40-5903,Apollo 11 Hasselblad image from film magazine 40/S - EVA,,,"Aldrin, Buzz;Moonwalk",,AS11-40-5903.tiff
+AS11-44-6665,"Apollo 11 Hasselblad image from film magazine 44/V - LM inspection, rendezvous",,,Moon,,AS11-44-6665.tiff
diff --git a/migrate_apollo/config/install/migrate_plus.migration.claw_file.yml b/migrate_apollo/config/install/migrate_plus.migration.claw_file.yml
index 24ca295..0f85c48 100644
--- a/migrate_apollo/config/install/migrate_plus.migration.claw_file.yml
+++ b/migrate_apollo/config/install/migrate_plus.migration.claw_file.yml
@@ -6,19 +6,20 @@ source:
   plugin: csv
   path: 'data/apollo.csv' # Path relative to Drupal site root
   delimiter: ','
-  header_row_count: 0 # No headers, 1 if there are headers
+  header_row_count: 1 # One header row; use 0 if there are no headers
   keys:
     - digital_id
   constants:
     source_base_dir: 'data/images'
     collection_alias: 'apollo'
     dest_base_dir: 'public://masters'
-    extension: 'tiff'
   column_names:
     0:
-      digital_id: 'Digital ID' # basename of the file
+      digital_id: 'Digital ID' # identifier key
     1:
       title: 'Title' # Used for title and alt-text
+    6:
+      file: 'File'
 
 process:
 
@@ -30,19 +31,12 @@ process:
     plugin: default_value
     default_value: image
 
-  filename:
-    plugin: concat
-    delimiter: '.'
-    source:
-      - digital_id
-      - constants/extension
-
-  source_full_path:
+  source_file_path:
     plugin: concat
     delimiter: /
     source:
       - constants/source_base_dir
-      - '@filename'
+      - file
 
   destination_file_path:
     plugin: concat
@@ -50,12 +44,12 @@ process:
     source:
       - constants/dest_base_dir
       - constants/collection_alias
-      - '@filename'
+      - file
 
   uri:
     plugin: file_copy
     source:
-      - '@source_full_path' #where it is
+      - '@source_file_path' #where it is
       - '@destination_file_path' #where we want it
 
 destination:
diff --git a/migrate_apollo/config/install/migrate_plus.migration.claw_image.yml b/migrate_apollo/config/install/migrate_plus.migration.claw_image.yml
index 7d3978f..a13a291 100644
--- a/migrate_apollo/config/install/migrate_plus.migration.claw_image.yml
+++ b/migrate_apollo/config/install/migrate_plus.migration.claw_image.yml
@@ -15,22 +15,26 @@ source:
   plugin: csv
   path: 'data/apollo.csv' # Path relative to Drupal site root
   delimiter: ','
-  header_row_count: 0 # No headers, 1 if there are headers
+  header_row_count: 1 # One header row; use 0 if there are no headers
   keys:
     - digital_id
   constants:
     collection_alias: 'apollo'
-  column_names: # Based on Welcome Home, Howard
+  column_names:
     0:
       digital_id: 'Digital ID'
     1:
       title: 'Title'
     2:
-      description: 'Description'
+      date: 'Date' # Ignoring for now.
     3:
-      subject_person: 'Identified Individual'
+      location: 'Location' # Ignoring for now.
     4:
       subjects: 'Subjects'
+    5:
+      description: 'Description'
+    6:
+      file: 'File'
 
 destination: # We're creating nodes, ya'll.
   plugin: entity:node
@@ -53,29 +57,13 @@ process:
       - constants/collection_alias
       - digital_id
 
-  # SUBJECTS (similar to CREATORS above)
-  # Since subjects can be of multiple content types we need to perform
-  # lookups for each type, assign them to a temp array, and recombine
-  # them all before assigning them to the appropriate entity reference field.
-
-  temp_subjects_person: # Temporary array of person entity refs
-    -
-      plugin: skip_on_empty # Don't bother if there aren't any values
-      source: subject_person
-      method: process # Only this field, not the whole CSV row
-    -
-      # Account for multiple entries in a cell delimited by ;
-      plugin: explode # Note: no whitespace trimming or quoting support is provided! Be careful with leading or trailing spaces between values in your source data!
-      delimiter: ';'
-    -
-      plugin: entity_generate
-      value_key: title
-      bundle_key: type
-      bundle: person
-      entity_type: node
-      default_values:
-        type: person
+  # SUBJECTS
+  # For subjects we can't find in the system, we can't determine from their value
+  # whether they should be persons, corporate bodies, families, or topics. This defaults to
+  # creating them as subject nodes (topics). We may opt for NOT creating them
+  # and reporting the issue at a later time, if that makes sense.
+
-  temp_subjects: # Temporary field of subjects
+  field_subjects: # Destination entity reference field for subjects
     -
       plugin: skip_on_empty
       source: subjects
@@ -84,23 +72,10 @@ process:
       plugin: explode
       delimiter: ';'
     -
-      plugin: entity_generate
-      value_key: title
-      bundle: subject
-      bundle_key: type
-      entity_type: node
+      plugin: entity_generate # Create a subject entity if it doesn't already exist
       default_values:
         type: subject
 
-  field_subjects: # Gather temp arrays into the destination field
-    -
-      plugin: get
-      source:
-        - '@temp_subjects_person'
-        - '@temp_subjects'
-    -
-      plugin: flatten # an array of arrays to a flat array of entity refs
-
   # Now the TIFF entity references
   field_tiff/target_id:
     plugin: migration_lookup
diff --git a/migrate_apollo/config/install/migrate_plus.migration.claw_media.yml b/migrate_apollo/config/install/migrate_plus.migration.claw_media.yml
index 2dc72e1..4a2a9ce 100644
--- a/migrate_apollo/config/install/migrate_plus.migration.claw_media.yml
+++ b/migrate_apollo/config/install/migrate_plus.migration.claw_media.yml
@@ -10,12 +10,12 @@ source:
   plugin: csv
   path: 'data/apollo.csv' # Path relative to Drupal site root
   delimiter: ','
-  header_row_count: 0 # No headers, 1 if there are headers
+  header_row_count: 1 # One header row; use 0 if there are no headers
   keys:
     - digital_id
   column_names:
     0:
-      digital_id: 'Digital ID' # basename of the file
+      digital_id: 'Digital ID' # identifier key
 
 process:
   mid:
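In claw_file.yml, the `header_row_count` change and the new File column together decide which row is read first and where each master is copied from and to. The Python sketch below models that mapping with the standard `csv` module. It is an illustration only, not migrate_source_csv's actual implementation. The constants come from claw_file.yml and the sample row is copied from the new data/apollo.csv. The diff does not show the delimiter of the `destination_file_path` concat, so `/` is an assumption here.

```python
import csv
import io
from typing import Dict, Iterator, Tuple

# Constants from claw_file.yml.
SOURCE_BASE_DIR = "data/images"
DEST_BASE_DIR = "public://masters"
COLLECTION_ALIAS = "apollo"

# Column index -> source property, mirroring column_names in claw_image.yml.
COLUMN_NAMES = {
    0: "digital_id",
    1: "title",
    2: "date",
    3: "location",
    4: "subjects",
    5: "description",
    6: "file",
}

# Header plus one data row copied from the new data/apollo.csv.
SAMPLE = (
    "ID,Title,Date,Location,Subjects,Description,File\n"
    'AS11-37-5528,"Apollo 11 Hasselblad image from film magazine 37/R - Orbit, '
    'Post-Landing, Post-EVA",,,"Armstrong, Neil",,AS11-37-5528.tiff\n'
)


def read_rows(text: str, header_row_count: int = 1) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row, skipping header_row_count leading rows."""
    for index, row in enumerate(csv.reader(io.StringIO(text))):
        if index < header_row_count:
            continue  # header row(s) are not migrated
        yield {name: row[col] for col, name in COLUMN_NAMES.items()}


def file_paths(row: Dict[str, str]) -> Tuple[str, str]:
    """Mirror source_file_path and destination_file_path from claw_file.yml."""
    # source_file_path: concat with delimiter '/' (shown in the diff).
    source_file_path = "/".join([SOURCE_BASE_DIR, row["file"]])
    # Assumption: destination_file_path also concatenates with '/'.
    destination_file_path = "/".join([DEST_BASE_DIR, COLLECTION_ALIAS, row["file"]])
    return source_file_path, destination_file_path
```

With `header_row_count: 0`, as in the old configs, the header line itself would be read as a data row with a `digital_id` of `ID`. That is why the value becomes 1 now that the CSV has headers.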