Fix cleanup_value enrichment #532

Merged
amywieliczka merged 2 commits into main from cleanup-value-enrichment
Oct 11, 2023

Conversation

lthurston
Contributor

@amywieliczka @barbarahui

I'm deleting the other PR related to this enrichment, as it's not particularly useful.

@amywieliczka
Collaborator

amywieliczka commented Oct 10, 2023

The cleanup_values enrichment updates implemented here resolved many "errors" in the validation reports for Flickr collections. As it turns out, though, many of those errors were actually desirable, so the updates have some mildly adverse effects on the already approved Flickr metadata. Since these adverse effects aren't any worse than what is already published in Calisphere, we're moving a cleanup_values analysis to post-MVP. Specifically, cleanup_values currently removes many trailing periods from the title field for Flickr collections.
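To make the trailing-period behavior concrete, here is a minimal sketch of that kind of cleanup. It is not the Rikolti implementation: `strip_trailing_period` and the sample titles are hypothetical and only illustrate the effect described above.

```python
def strip_trailing_period(value: str) -> str:
    """Hypothetical helper: trim whitespace and drop a single trailing period."""
    value = value.strip()
    if value.endswith("."):
        # Remove the final period and any whitespace that preceded it.
        return value[:-1].rstrip()
    return value


# Made-up sample values, not taken from any Flickr collection.
titles = ["Main Street, looking north.", "Dec.", "Portrait of J. Smith"]
cleaned = [strip_trailing_period(title) for title in titles]
```

A rule this simple can't tell a sentence-ending period from one that belongs to an abbreviation, which is part of what a later analysis would need to weigh.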

This spreadsheet https://docs.google.com/spreadsheets/d/19vvscRbGSqztr7mDBoL4d3KTBZrC8LeK68dETspl0tE/edit?usp=sharing lists all validation errors that previously appeared in the Flickr validation reports and no longer appear after merging this PR. No new errors were reported. One error changed: the legacy date value for this record is None; the Rikolti value was ['Dec.'] before this update and is ['Dec'] after it:

"27324--21445057170","date","None","['Dec.']","https://calisphere.org/search/?q=27324--21445057170","https://harvest-prd.cdlib.org/solr/dc-collection/select?q=harvest_id_s%3A%2227324--21445057170%22&wt=json&indent=true","https://harvest-prd.cdlib.org/couchdb/_utils/document.html?ucldc/27324--21445057170","https://calisphere-test.cdlib.org/search/?q=27324--21445057170","https://harvest-stg.cdlib.org/solr/dc-collection/select?q=harvest_id_s%3A%2227324--21445057170%22&wt=json&indent=true","https://harvest-stg.cdlib.org/couchdb/_utils/document.html?ucldc/27324--21445057170"
"27324--21445057170","date","None","['Dec']","https://calisphere.org/search/?q=27324--21445057170","https://harvest-prd.cdlib.org/solr/dc-collection/select?q=harvest_id_s%3A%2227324--21445057170%22&wt=json&indent=true","https://harvest-prd.cdlib.org/couchdb/_utils/document.html?ucldc/27324--21445057170","https://calisphere-test.cdlib.org/search/?q=27324--21445057170","https://harvest-stg.cdlib.org/solr/dc-collection/select?q=harvest_id_s%3A%2227324--21445057170%22&wt=json&indent=true","https://harvest-stg.cdlib.org/couchdb/_utils/document.html?ucldc/27324--21445057170"
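The before/after comparison described above can be sketched roughly as follows. This is an illustration, not the tooling actually used: it assumes the report is a headerless CSV whose first four columns are harvest ID, field, legacy value, and Rikolti value (inferred from the two rows above). `load_report` and `diff_reports` are hypothetical names, and the `example-id` row is made up to show a resolved error.

```python
import csv
import io


def load_report(text):
    """Map (harvest_id, field) -> (legacy_value, rikolti_value).

    Assumes a headerless CSV whose first four columns hold those values.
    """
    rows = csv.reader(io.StringIO(text))
    return {(row[0], row[1]): (row[2], row[3]) for row in rows if row}


def diff_reports(before, after):
    """Classify report entries as resolved, new, or transformed."""
    resolved = sorted(before.keys() - after.keys())
    new = sorted(after.keys() - before.keys())
    transformed = sorted(
        key for key in before.keys() & after.keys() if before[key] != after[key]
    )
    return resolved, new, transformed


# The "example-id" row is invented for illustration; URL columns are omitted.
before = load_report('''"27324--21445057170","date","None","['Dec.']"
"example-id","title","Sample title","Sample title."
''')
after = load_report('''"27324--21445057170","date","None","['Dec']"
''')

resolved, new, transformed = diff_reports(before, after)
```

Keying on the (harvest ID, field) pair means an entry whose reported value changes counts as transformed rather than as one resolved error plus one new error.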

Creating a GitHub issue for the future cleanup_values analysis and linking this PR and comment.

@amywieliczka amywieliczka merged commit 0493e59 into main Oct 11, 2023
2 checks passed
@amywieliczka amywieliczka deleted the cleanup-value-enrichment branch October 11, 2023 23:06
2 participants