Chapman Mapper Refinements #483
See: #26387 (08-11); #26386 (08-11); #26286 (08-11); #26339 (08-11); #26290 (08-11)
1e06e88 to f969b05
The issue is that the type is
Maybe instead of looking at |
Done.
The extra spaces are in the source record:
and
No action taken on this one.
Terminal apostrophe does appear in the source record, encoded as
The slashes reported in the legacy and Rikolti description are escaped tab characters, which occur in the source data as
Looks like an encoding issue. I'll save this one for last.
Done. Spatial is getting
Done. |
062744a to 1535da4
1535da4 to 9062417
This required a change to the Fetcher class. I can move this change to another branch / PR if desired.
UPDATE: The fetcher is fixed, but the mapper is still writing files with escaped UTF-8 characters. I'm looking into this.
UPDATE again: The mapper is now fixed too, in the base mapper class. This is worth a good sanity check by @amywieliczka / @barbarahui. Again, since this isn't specific to Chapman, I can shift these changes to another PR if you'd like. |
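For anyone doing the sanity check: the escaped UTF-8 symptom described above matches what Python's `json` module does by default. The following is a minimal sketch, assuming the mapper writes records as JSON; the record contents and the `serialize_record` helper are hypothetical and are not Rikolti's actual code.

```python
import json

# Hypothetical record for illustration; not taken from the Chapman source data.
record = {"title": "Café del Río"}


def serialize_record(rec, ensure_ascii=False):
    """Serialize a mapped record to a JSON string.

    With ensure_ascii=True (the json module's default), non-ASCII
    characters are written as \\uXXXX escapes. With ensure_ascii=False
    they are written as-is, so the output must be saved as UTF-8.
    """
    return json.dumps(rec, ensure_ascii=ensure_ascii)


escaped = serialize_record(record, ensure_ascii=True)   # contains \u00e9
readable = serialize_record(record)                     # contains é
```

Both strings decode to the same record, so the difference only matters to whatever reads or diffs the files as raw text.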
Hmm, same question here as in #514 re: additions to this child mapper that don't happen in the legacy mapper and maybe should happen in the parent OAI mapper? I recall seeing source and language in the Samvera mapper PR as well. |
I guess we could just play whack-a-mole now and then do some analysis of what we can pull into the base mapper class later, re-run the validation reports after factoring out commonly implemented functions, and diff them against the approved validation reports? |
Just did some spot checking, and thinking out loud here.
(I was going to look up another example, but am unable to generate reports right now. Curious if the examples above, which are all using the base OAI mapper, suggest enough variance that we should take the whack-a-mole approach for now?) |
Hmm, at least for language, it does look like this is getting mapped in the base class in the legacy harvester (https://github.com/calisphere-legacy-harvester/dpla-ingestion/blob/cfe3dcb06008c0c6cb9d8207fe28bfaa1a855e4f/lib/mappers/dublin_core_mapper.py#L97). I would expect to see language come up in the validation reports for all OAI feeds that have language data. So I think there's actually a good argument for not playing whack-a-mole and instead addressing language at the root level by adding the mapping to the OAI mapper. I don't actually see a mapping for source happening anywhere in the legacy mapper, so I suspect that's coming from an enrichment chain? I think we should probably figure that one out sooner rather than later as well. |
Ahh, found the "source" mapping here: https://github.com/calisphere-legacy-harvester/dpla-ingestion/blob/cfe3dcb06008c0c6cb9d8207fe28bfaa1a855e4f/lib/mappers/mapper.py#L171 I don't think we have a "dataProvider" equivalent in Rikolti just yet - perhaps this is a case where whack-a-mole makes the most sense. |
Hi all, apologies for being a little late to the party. I was playing whack-a-mole, but the thought did occur to me that if several mappers ended up with the same or similar mapping configuration, it should go in the parent class. What I'll do, if you'd like, is go through the mappers that Christine references above (chapman, samvera, up, chico) and run validations on a few collections of each. Using validations to compare has been a much better way to verify the results of these mappers than how I was doing it previously, and doing them all at the same time, with an eye toward abstracting the language and source mappings to the parent, will make that abstraction more likely. Amy said:
Would you mind explaining why this is a case where we do want to whack-a-mole? Won't someone think of the poor moles?! |
So I just went down a very long rabbit hole trying to figure out where the heck It seems that by default, See: https://github.com/ucldc/harvester/blob/master/harvester/solr_updater.py#L123 and https://github.com/ucldc/harvester/blob/master/harvester/solr_updater.py#L718-L740 If the original record (aka vernacular metadata) has the field Anyway, I think we should move this mapping into the Rikolti OAI mapper (and other base mappers as necessary)? Spare the moles! |
@lthurston Confirming that map_source() should be implemented in the OAI mapper class. Also, for future mappers where we encounter this issue, please implement map_source() in the base mapper class for that type. |
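To illustrate the shape of that change, here is a minimal sketch of moving shared mappings into the parent class. All class names, the `to_mapped()` helper, and the field lookups are assumptions made for illustration; only `map_source()` is named in this thread, and Rikolti's real base classes may look quite different.

```python
class Record:
    """Hypothetical stand-in for a base mapper record class."""

    def __init__(self, source_metadata):
        self.source_metadata = source_metadata

    def to_mapped(self):
        # Call every map_* method and collect the results by field name.
        mapped = {}
        for name in dir(self):
            if name.startswith("map_"):
                mapped[name[len("map_"):]] = getattr(self, name)()
        return mapped


class OaiRecord(Record):
    """Shared mappings live here so every OAI child mapper inherits them."""

    def map_source(self):
        return self.source_metadata.get("source")

    def map_language(self):
        return self.source_metadata.get("language")


class ChapmanRecord(OaiRecord):
    """Child mapper keeps only what is specific to this collection."""

    def map_type(self):
        return self.source_metadata.get("type")


rec = ChapmanRecord({"source": ["A"], "language": ["eng"], "type": ["text"]})
mapped = rec.to_mapped()
```

With this layout, a child mapper such as the Chapman one no longer needs its own source or language mapping, and other OAI mappers pick up the same behavior without further changes.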
fdde511 to d5793bd
d5793bd to caa212a
caa212a to 2b27b0c
d708ff6 to ec4f8a6
Reformat code as well
ec4f8a6 to 54bdcfa
Commented out some of the type handling here and will address type more holistically. Otherwise, this all looks good, so I'm merging it in! |
@lthurston I ran the Chapman mapper through the validator and encountered several validation errors. This branch (and PR) is a start at resolving some of the obvious ones:
@christinklez as you continue to evaluate the Chapman mapper, could you reference this PR in the issues you create? @lthurston could you work on this branch to refine the Chapman mapper?
I'm proposing that this PR become the place to centralize ongoing Chapman mapper conversation until we've approved the Chapman mapper, while individual issues would have a shorter life span and spec out the details of specific problems. What do you both think?