Address oai.samvera validation issues #514

Merged 4 commits on Dec 20, 2023

lthurston
Contributor

No description provided.

@lthurston lthurston self-assigned this Sep 18, 2023
@lthurston lthurston linked an issue Sep 18, 2023 that may be closed by this pull request
@christinklez christinklez added this to the #4 CIC work milestone Sep 20, 2023
@lthurston lthurston marked this pull request as ready for review October 5, 2023 21:38
@amywieliczka
Collaborator

I'm confused about why so much is happening in this mapper when so little happened in the legacy mapper: https://github.com/calisphere-legacy-harvester/dpla-ingestion/blob/cfe3dcb06008c0c6cb9d8207fe28bfaa1a855e4f/lib/mappers/ucsc_oai_mapper.py#L6 Are there things that should be happening in the base OAI mapper class that aren't?

@lthurston lthurston force-pushed the oai.samvera_validation branch 2 times, most recently from 40e87f3 to 3ef092d Compare November 13, 2023 22:14
@lthurston
Contributor Author

@christinklez @aturner @amywieliczka @barbarahui

I'm attaching a screenshot and a full CSV of the validation report from collection 154 after fixing a few things. There are some questions:

  • Content mismatch on identifier: the uclamss_1387_* values don't show up in the vernacular metadata, so this feels like data drift. I do see similar strings in filenames, but generating the identifiers from them would require a transformation.
  • Content mismatch on is_shown_at: the URL domain was ursus.library.ucla.edu and is now digital.library.ucla.edu. I can write a validation rule to ignore this if that's helpful.
  • Content mismatch on date: two date formats are provided. Mapping both seems correct to me, but I can modify the logic for this mapper if desired.
  • Content mismatch on subject: the list items are the same but in a different order.

Screenshot 2023-11-13 at 2 17 38 PM
11-13-2023T21-54-23.csv
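
For illustration, here is a minimal sketch of what rules that tolerate the is_shown_at and subject mismatches above might look like. The function names and the example URLs are hypothetical and are not taken from the Rikolti validator. The sketch also assumes that only the host changed between the old and new landing pages, which would need to be checked against the real records.

```python
from collections import Counter
from urllib.parse import urlsplit


def same_url_ignoring_domain(old_url: str, new_url: str) -> bool:
    """Hypothetical rule: treat two URLs as matching when only the host differs."""
    old, new = urlsplit(old_url), urlsplit(new_url)
    # Compare path and query string; scheme and host are deliberately ignored.
    return (old.path, old.query) == (new.path, new.query)


def same_items_ignoring_order(old_values, new_values) -> bool:
    """Hypothetical rule: compare a list field (e.g. subject) as a multiset."""
    # Counter keeps duplicate counts, so a dropped or repeated value still fails.
    return Counter(old_values) == Counter(new_values)


# Made-up record values, only to show how the rules behave.
legacy_url = "https://ursus.library.ucla.edu/catalog/example-id"
rikolti_url = "https://digital.library.ucla.edu/catalog/example-id"
print(same_url_ignoring_domain(legacy_url, rikolti_url))
print(same_items_ignoring_order(["Maps", "Los Angeles"], ["Los Angeles", "Maps"]))
```

Comparing subjects as a multiset rather than a set means a reordered list passes while a list that gained or lost a repeated value is still reported.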

@aturner
Collaborator

aturner commented Nov 14, 2023

@lthurston In looking at what we've previously harvested from the old fetcher/mapper vs. Rikolti results, it looks like the first 3 items can be attributed to data drift.

UCLA has been migrating content across new systems, so the change to the item landing page/edm:isShownAt URL makes sense.

We are AOK with the multiple dates, and also the reordered subjects.

@lthurston lthurston marked this pull request as draft November 20, 2023 15:34
@lthurston lthurston force-pushed the oai.samvera_validation branch 2 times, most recently from 7aaff59 to c9afb77 Compare December 2, 2023 20:11
@amywieliczka amywieliczka marked this pull request as ready for review December 20, 2023 22:11
@amywieliczka
Collaborator

Marking this as ready for review and merging as these updates make sense for a re-inventory of OAI mappers.

@amywieliczka amywieliczka merged commit f38bcd8 into main Dec 20, 2023
2 checks passed
@amywieliczka amywieliczka deleted the oai.samvera_validation branch February 28, 2024 02:26
Development

Successfully merging this pull request may close these issues.

oai.samvera mapper validation results
5 participants