
Commit

update to use mik-style CSV
Now using Move to Islandora Kit sample CSV structure for the descriptive metadata.
seth-shaw-unlv committed Apr 11, 2018
1 parent 95d52f9 commit 7413194
Showing 5 changed files with 55 additions and 70 deletions.
29 changes: 22 additions & 7 deletions README.md
@@ -3,9 +3,9 @@
 This repository consists of two modules:

 1. unlv_image: A local implementation of the Islandora Image module to include additional metadata fields.
-2. migrate_cdm: Uses the Migrate API to load Tiff masters, metadata from a CSV, and authority records from the Library of Congress.
+2. migrate_cdm: Uses the Migrate API to load Tiff masters, metadata from a CSV (patterned after the [Move to Islandora Kit sample metadata CSV](https://github.com/MarcusBarnes/mik/blob/master/tests/assets/csv/sample_metadata.csv)), and MADS authority records from the Library of Congress.

-*Note:* This proof-of-concept assumes that new content types will be created for each metadata profile/object type pair; ergo, this proof-of-concept includes a new content type called UNLV_image which parallels islandora_image but adds some node entity references. However, the CLAW team is exploring alternative strategies for managing descriptive metadata which will make portions of this example out of date (hopefully soon).
+*Note:* This proof-of-concept assumes that new content types will be created for each metadata profile/object type pair; ergo, this proof-of-concept includes a new content type called UNLV_image which parallels islandora_image but adds some node entity references. However, CLAW is under active development which will cause this example to break from time to time. I intend to continue updating this example as the current dev-version of CLAW develops.

 # Source Data

@@ -15,8 +15,23 @@ The source data used for this proof of concept came from the [Project Apollo Arc

 Note: using drush with migrate_tools is optional, but the instructions assume it is installed.

-1. Copy the data directory to your drupal web root (e.g. in my tests the drupal web root is `/var/www/drupalvm/drupal/web` and the data directory is `/var/www/drupalvm/drupal/web/data`).
-2. Copy the migrate_cdm and unlv_image directories to your modules directory.
-3. Enable the modules. E.g. `drush en -y migrate_tools migrate_apollo`.
-4. Run the migration. E.g. `drush mim --all`.
-5. See a wonderful list of the newly migrated images on your Drupal site's front page!
+0. Install the prerequisite modules (islandora_image, migrate_plus, and migrate_source_csv) and their dependencies. E.g. `composer require islandora/islandora_image drupal/migrate_tools:^4.0 drupal/migrate_source_csv`.
+1. [Patch migrate_plus to allow looking up entities across multiple content types](https://www.drupal.org/project/migrate_plus/issues/2960251).
+2. Copy the data directory to your drupal web root (e.g. in my tests the drupal web root is `/var/www/drupalvm/drupal/web` and the data directory is `/var/www/drupalvm/drupal/web/data`).
+3. Copy the migrate_cdm and unlv_image directories to your modules directory.
+4. Enable the modules. E.g. `drush en -y migrate_tools migrate_apollo`.
+5. Run the migration. E.g. `drush mim --all`.
+6. See a wonderful list of the newly migrated images on your Drupal site's front page!
+
+# The migrate_plus Patch
+
+Previously this example split out people that were subjects from topics that
+were subjects. In that case we could perform entity lookups on each column for
+the matching content type.
+
+The [Move to Islandora Kit sample metadata](https://github.com/MarcusBarnes/mik/blob/master/tests/assets/csv/sample_metadata.csv),
+however, combines them into a single column. This requires us to perform a
+single lookup across multiple content types, something the existing migrate_plus
+module doesn't support. I've opened an issue with a patch to address this.
+Until it is merged or some other solution is found, we will either have to
+patch migrate_plus, or extend the process plugin for this small modification.
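
The idea of a single lookup across multiple content types can be sketched in plain Python. This is only an illustration of the logic the README describes, not the migrate_plus patch itself: the `NODES` table and the `lookup_any_type` helper are hypothetical stand-ins for Drupal's node storage, and the bundle names `person` and `subject` are taken from the migration config in this commit.

```python
# Hypothetical in-memory stand-in for existing Drupal nodes:
# (bundle, title) -> node id.
NODES = {
    ("person", "Aldrin, Buzz"): 1,
    ("subject", "Moonwalk"): 2,
}


def lookup_any_type(title, bundles=("person", "subject")):
    """Return the id of the first node with this title in any bundle, else None."""
    for bundle in bundles:
        nid = NODES.get((bundle, title))
        if nid is not None:
            return nid
    return None


# One combined Subjects cell, as in the MIK-style CSV.
cell = "Aldrin, Buzz;Moonwalk;Moon"
matches = {value: lookup_any_type(value) for value in cell.split(";")}
print(matches)
```

With per-column lookups the bundle is known from the column; with one combined column the lookup has to try every candidate bundle, which is the behavior the patch is meant to provide.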
15 changes: 8 additions & 7 deletions data/apollo.csv
@@ -1,7 +1,8 @@
-AS11-36-5390,Apollo 11 Hasselblad image from film magazine 36/N - Trans-Lunar,"Neil took this picture of Buzz during their initial inspection of the LM at about 057:03. Journal Contributor David Sander notes that ""Buzz is wearing his intravehicular suit, a specially made set of garments designed to be as flame retardant as the rest of the ship, and made from the same fabric as the outer layer of the spacesuits"". Paolo Attivissimo notes that Buzz's watch reads 5:35 (Houston time), which is 57:03 GET (Ground Elapsed Time)","Aldrin, Buzz",
-AS11-37-5528,"Apollo 11 Hasselblad image from film magazine 37/R - Orbit, Post-Landing, Post-EVA",,"Armstrong, Neil",
-AS11-37-5545,"Apollo 11 Hasselblad image from film magazine 37/R - Orbit, Post-Landing, Post-EVA",,,Flags--United States
-AS11-40-5850,Apollo 11 Hasselblad image from film magazine 40/S - EVA,"First EVA picture. Neil's first frame in a pan taken west of the ladder. Jettison bag under the Descent Stage, south footpad, bent probe, strut supports. The view is more or less up-Sun, so we are seeing the shadowed faces of boulders. 20 July 1969.",,Lunar excursion module;Moonwalk
-AS11-40-5875,Apollo 11 Hasselblad image from film magazine 40/S - EVA,,"Aldrin, Buzz",Lunar excursion module;Flags--United States;Moonwalk
-AS11-40-5903,Apollo 11 Hasselblad image from film magazine 40/S - EVA,,"Aldrin, Buzz",Moonwalk
-AS11-44-6665,"Apollo 11 Hasselblad image from film magazine 44/V - LM inspection, rendezvous",,,Moon
+ID,Title,Date,Location,Subjects,Description,File
+AS11-36-5390,Apollo 11 Hasselblad image from film magazine 36/N - Trans-Lunar,,,"Aldrin, Buzz","Neil took this picture of Buzz during their initial inspection of the LM at about 057:03. Journal Contributor David Sander notes that ""Buzz is wearing his intravehicular suit, a specially made set of garments designed to be as flame retardant as the rest of the ship, and made from the same fabric as the outer layer of the spacesuits"". Paolo Attivissimo notes that Buzz's watch reads 5:35 (Houston time), which is 57:03 GET (Ground Elapsed Time)",AS11-36-5390.tiff
+AS11-37-5528,"Apollo 11 Hasselblad image from film magazine 37/R - Orbit, Post-Landing, Post-EVA",,,"Armstrong, Neil",,AS11-37-5528.tiff
+AS11-37-5545,"Apollo 11 Hasselblad image from film magazine 37/R - Orbit, Post-Landing, Post-EVA",,,Flags--United States,,AS11-37-5545.tiff
+AS11-40-5850,Apollo 11 Hasselblad image from film magazine 40/S - EVA,,,Lunar excursion module;Moonwalk,"First EVA picture. Neil's first frame in a pan taken west of the ladder. Jettison bag under the Descent Stage, south footpad, bent probe, strut supports. The view is more or less up-Sun, so we are seeing the shadowed faces of boulders. 20 July 1969.",AS11-40-5850.tiff
+AS11-40-5875,Apollo 11 Hasselblad image from film magazine 40/S - EVA,,,"Aldrin, Buzz;Lunar excursion module;Flags--United States;Moonwalk",,AS11-40-5875.tiff
+AS11-40-5903,Apollo 11 Hasselblad image from film magazine 40/S - EVA,,,"Aldrin, Buzz;Moonwalk",,AS11-40-5903.tiff
+AS11-44-6665,"Apollo 11 Hasselblad image from film magazine 44/V - LM inspection, rendezvous",,,Moon,,AS11-44-6665.tiff
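
To see how the new header row and the combined Subjects column read in practice, here is a small sketch using Python's standard `csv` module rather than the Drupal CSV source plugin. The two data rows are copied from the new `data/apollo.csv`; the `SAMPLE` and `rows` names are just for this example.

```python
import csv
import io

# Header plus two rows copied from the new data/apollo.csv.
SAMPLE = '''ID,Title,Date,Location,Subjects,Description,File
AS11-40-5875,Apollo 11 Hasselblad image from film magazine 40/S - EVA,,,"Aldrin, Buzz;Lunar excursion module;Flags--United States;Moonwalk",,AS11-40-5875.tiff
AS11-44-6665,"Apollo 11 Hasselblad image from film magazine 44/V - LM inspection, rendezvous",,,Moon,,AS11-44-6665.tiff
'''

# DictReader consumes the first line as the header row.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    # People and topics now share one cell, separated by ';'.
    subjects = row["Subjects"].split(";") if row["Subjects"] else []
    print(row["ID"], row["File"], subjects)
```

Note that the comma inside `"Aldrin, Buzz"` survives because the cell is quoted, while the `;` separator is left for a later step to split.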
22 changes: 8 additions & 14 deletions migrate_apollo/config/install/migrate_plus.migration.claw_file.yml
@@ -6,19 +6,20 @@ source:
   plugin: csv
   path: 'data/apollo.csv' # Path relative to Drupal site root
   delimiter: ','
-  header_row_count: 0 # No headers, 1 if there are headers
+  header_row_count: 1 # headers, 0 if there are no headers
   keys:
     - digital_id
   constants:
     source_base_dir: 'data/images'
     collection_alias: 'apollo'
     dest_base_dir: 'public://masters'
-    extension: 'tiff'
   column_names:
     0:
-      digital_id: 'Digital ID' # basename of the file
+      digital_id: 'Digital ID' # identifier key
     1:
       title: 'Title' # Used for title and alt-text
+    6:
+      file: 'File'


 process:
@@ -30,32 +31,25 @@ process:
     plugin: default_value
     default_value: image

-  filename:
-    plugin: concat
-    delimiter: '.'
-    source:
-      - digital_id
-      - constants/extension
-
-  source_full_path:
+  source_file_path:
     plugin: concat
     delimiter: /
     source:
       - constants/source_base_dir
-      - '@filename'
+      - file

   destination_file_path:
     plugin: concat
     delimiter: /
     source:
       - constants/dest_base_dir
       - constants/collection_alias
-      - '@filename'
+      - file

   uri:
     plugin: file_copy
     source:
-      - '@source_full_path' #where it is
+      - '@source_file_path' #where it is
       - '@destination_file_path' #where we want it

 destination:
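
The effect of the two `concat` steps after this change can be mimicked in a few lines of Python. This is a sketch of the string joining only, assuming the constants shown in the source section; `build_paths` is a hypothetical helper and no file is actually copied.

```python
# Constants copied from the migration's source section.
SOURCE_BASE_DIR = "data/images"
DEST_BASE_DIR = "public://masters"
COLLECTION_ALIAS = "apollo"


def build_paths(file_name):
    """Join the parts with '/', mirroring concat with delimiter '/'."""
    source_file_path = "/".join([SOURCE_BASE_DIR, file_name])
    destination_file_path = "/".join([DEST_BASE_DIR, COLLECTION_ALIAS, file_name])
    return source_file_path, destination_file_path


# The file name now comes straight from the CSV's File column.
paths = build_paths("AS11-36-5390.tiff")
print(paths)
```

Because the File column already carries the extension, the intermediate `filename` step and the `extension` constant are no longer needed.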
55 changes: 15 additions & 40 deletions migrate_apollo/config/install/migrate_plus.migration.claw_image.yml
@@ -15,22 +15,26 @@ source:
   plugin: csv
   path: 'data/apollo.csv' # Path relative to Drupal site root
   delimiter: ','
-  header_row_count: 0 # No headers, 1 if there are headers
+  header_row_count: 1 # headers, 0 if there are no headers
   keys:
     - digital_id
   constants:
     collection_alias: 'apollo'
-  column_names: # Based on Welcome Home, Howard
+  column_names:
     0:
       digital_id: 'Digital ID'
     1:
       title: 'Title'
     2:
-      description: 'Description'
+      date: 'Date' # Ignoring for now.
     3:
-      subject_person: 'Identified Individual'
+      location: 'Location' # Ignoring for now.
     4:
       subjects: 'Subjects'
+    5:
+      description: 'Description'
+    6:
+      file: 'File'

 destination: # We're creating nodes, y'all.
   plugin: entity:node
@@ -53,29 +57,13 @@ process:
       - constants/collection_alias
       - digital_id

-  # SUBJECTS (similar to CREATORS above)
-  # Since subjects can be of multiple content types we need to perform
-  # lookups for each type, assign them to a temp array, and recombine
-  # them all before assigning them to the appropriate entity reference field.
-
-  temp_subjects_person: # Temporary array of person entity refs
-    -
-      plugin: skip_on_empty # Don't bother if there aren't any values
-      source: subject_person
-      method: process # Only this field, not the whole CSV row
-    - # Account for multiple entries in a cell delimited by ;
-      plugin: explode # Note: no whitespace trimming or quoting support is provided! Be careful with leading or trailing spaces between values in your source data!
-      delimiter: ';'
-    -
-      plugin: entity_generate
-      value_key: title
-      bundle_key: type
-      bundle: person
-      entity_type: node
-      default_values:
-        type: person
+  # SUBJECTS
+  # For subjects we can't find in the system, we can't determine by their value
+  # if they should be persons, corporate, families, or topics. This defaults to
+  # creating them as subject nodes (topics). We may opt for NOT creating them
+  # and reporting out the issue at a later time, if that makes sense.

-  temp_subjects: # Temporary field of subjects
+  field_subjects: # Temporary field of subjects
     -
       plugin: skip_on_empty
       source: subjects
@@ -84,23 +72,10 @@ process:
       plugin: explode
       delimiter: ';'
     -
-      plugin: entity_generate
-      value_key: title
-      bundle: subject
-      bundle_key: type
-      entity_type: node
+      plugin: entity_generate # Create a subject entity if it doesn't already exist
       default_values:
         type: subject

-  field_subjects: # Gather temp arrays into the destination field
-    -
-      plugin: get
-      source:
-        - '@temp_subjects_person'
-        - '@temp_subjects'
-    -
-      plugin: flatten # an array of arrays to a flat array of entity refs
-
   # Now the TIFF entity references
   field_tiff/target_id:
     plugin: migration_lookup
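
The simplified `field_subjects` pipeline (skip empty cells, split on `;`, then look up or create) can be walked through in Python. This is a rough model under stated assumptions, not how the Drupal plugins are implemented: `process_subjects` and the `existing` dictionary are hypothetical, and matching is done on title alone, regardless of type, as the patched lookup is described in the README.

```python
def process_subjects(cell, existing, default_type="subject"):
    """Model skip_on_empty -> explode -> entity_generate for one Subjects cell."""
    if not cell:                    # skip_on_empty: nothing to do for an empty cell
        return []
    refs = []
    for title in cell.split(";"):   # explode on ';' (values are not trimmed)
        if title not in existing:   # nothing matched: create it as the default type
            existing[title] = {"id": len(existing) + 1, "type": default_type}
        refs.append(existing[title]["id"])
    return refs


# A person node already exists; "Moonwalk" does not.
existing = {"Aldrin, Buzz": {"id": 1, "type": "person"}}
refs = process_subjects("Aldrin, Buzz;Moonwalk", existing)
print(refs, existing["Moonwalk"]["type"])
```

The unmatched value ends up as a `subject` node, which is the fallback the new comment block in the config describes; an existing `person` node is reused rather than duplicated.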
@@ -10,12 +10,12 @@ source:
   plugin: csv
   path: 'data/apollo.csv' # Path relative to Drupal site root
   delimiter: ','
-  header_row_count: 0 # No headers, 1 if there are headers
+  header_row_count: 1 # headers, 0 if there are no headers
   keys:
     - digital_id
   column_names:
     0:
-      digital_id: 'Digital ID' # basename of the file
+      digital_id: 'Digital ID' # identifier key

 process:
   mid:
