Implement XML fetcher / PastPerfect mapper #468

Draft
wants to merge 2 commits into
base: main

Conversation

lthurston
Contributor

No description provided.

@lthurston
Contributor Author

@amywieliczka, if you want to take a sneak peek at this XML fetcher, I invite you to do so. It works: I fetched collection 26935, which reportedly has 77k records, in about 20 seconds locally. The fetcher reported more than 110k records, though, so either there's an issue there or the collection really does have more records.

I haven't written any mapping code yet, so I consider this to be a little naive, a little optimistic, but nevertheless it does what it's supposed to do. Let me know your thoughts!

@aturner
Collaborator

aturner commented Jul 19, 2023

@lthurston I think our legacy harvester code has some logic built in to leave out "metadata only" records: the source collection has some items that don't have a digital image, only a metadata record. That may account for the count difference you're seeing.

@lthurston
Contributor Author

@aturner That makes sense, thanks for the explanation. My instinct is to leave those records in our imported files in order to stay as true to the original source data as possible (despite the fact that we have to rewrite it to paginate), but I'm only too happy to be overruled.
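
For illustration only, here is a minimal sketch of the approach discussed above: paginate the source XML while keeping every record, and only flag the "metadata only" ones rather than dropping them. The element names (`record`, `image`, `id`), the sample document, and both helper functions are hypothetical; the real PastPerfect export schema and the fetcher in this PR are not shown in this thread and may differ.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, List

# Hypothetical element names; the actual export schema is an assumption here.
RECORD_TAG = "record"
IMAGE_TAG = "image"


def paginate_records(xml_text: str, page_size: int) -> Iterator[List[ET.Element]]:
    """Yield lists of at most page_size record elements, in source order."""
    root = ET.fromstring(xml_text)
    page: List[ET.Element] = []
    for record in root.iter(RECORD_TAG):
        page.append(record)
        if len(page) == page_size:
            yield page
            page = []
    if page:
        # Emit the final, possibly short, page.
        yield page


def is_metadata_only(record: ET.Element) -> bool:
    """True when the record has no non-empty image element (assumed convention)."""
    image = record.find(IMAGE_TAG)
    return image is None or not (image.text or "").strip()


# Made-up sample data, not taken from collection 26935.
sample = """<export>
  <record><id>1</id><image>a.jpg</image></record>
  <record><id>2</id></record>
  <record><id>3</id><image></image></record>
</export>"""

pages = list(paginate_records(sample, page_size=2))
page_sizes = [len(p) for p in pages]
metadata_only_ids = [
    r.findtext("id") for p in pages for r in p if is_metadata_only(r)
]
```

Keeping the flag separate from pagination means the fetched pages stay faithful to the source, and a later mapping step could still choose to skip the flagged records if that turns out to be the preferred behavior.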

@lthurston changed the title from "[WIP] Implement xml_file fetcher" to "Implement XML fetcher / PastPerfect mapper" on Aug 1, 2023
Development

Successfully merging this pull request may close these issues.

PastPerfectXMLMapper(Mapper) -- paused
Fetcher: XML -- paused
3 participants