Europeana images may change their direct URL, which causes broken images to be displayed in Openverse #3772
Comments
Since Europeana is an aggregator, I suspect that all of the images from this particular source might have been affected (given they're all producing the same
I've run the following to see how pervasive this issue is:
deploy@localhost:openledger> select count(*) from image where provider='europeana' and STARTS_WITH(url, 'https://www.muis.ee');
+--------+
| count  |
|--------|
| 143600 |
+--------+
SELECT 1
Time: 313.911s (5 minutes 13 seconds), executed in: 313.905s (5 minutes 13 seconds)
This seems like something that could be addressed in a batched update, if we could figure out how to correct the URLs!
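For a quicker sanity check on an exported sample of rows rather than the full table, the same prefix count can be approximated in Python. This is only a sketch: the `count_by_host` helper and the sample rows are made up for illustration and are not part of the Openverse codebase.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_by_host(rows):
    """Count rows per (provider, URL host) pair.

    `rows` is an iterable of (provider, url) tuples, e.g. a small
    sample exported from the image table (hypothetical input shape).
    """
    counts = Counter()
    for provider, url in rows:
        # Normalise the host so "WWW.MUIS.EE" and "www.muis.ee" match.
        host = urlsplit(url).netloc.lower()
        counts[(provider, host)] += 1
    return counts


# Made-up sample rows, for illustration only.
sample = [
    ("europeana", "https://www.muis.ee/a"),
    ("europeana", "https://www.muis.ee/b"),
    ("europeana", "https://example.org/c"),
    ("flickr", "https://example.com/d"),
]
counts = count_by_host(sample)
print(counts[("europeana", "www.muis.ee")])
```

Grouping by host instead of a single hard-coded prefix would also show whether other domains under the same provider are affected.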
Diving into the result above, it looks like all of the related URLs differ now:
Because these are all unique UUIDs, it doesn't look like we can derive those values in a way that could be updated using the batched update 😞 Maybe the best option would be to use the
Following up in this thread from an in-person conversation: I think this sounds good. However, because Europeana does not have a traditional reingestion DAG, we'd want to look into whether there's a reasonable range of dates we could re-run the DAG for that would cover all images from this domain.
I believe I've found a suitable
Confirmed that that should work! I ran this locally and ingested 250 records, all of which were from the Estonian National Museum. We should be able to run this triggered DAG next week!
The ingestion completed (DAG run link), but it only ingested 21,020 records 😕 We did get a data refresh after that, but even with the updated record the primary URL is still showing the "not found" thumbnail 😖
https://api.openverse.engineering/v1/images/f8c86a20-eb9c-4ffc-9a06-3664151dbce6/
I'm going to try to isolate this exact result in a query, and see if Europeana is giving us incorrect URLs.
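One way to spot-check whether a record's direct URL still resolves is a plain HEAD request. The sketch below uses only the Python standard library; `direct_url_status` and `is_broken` are hypothetical helper names, and the `opener` parameter exists only so the check can be exercised without network access. Note that a 200 response does not prove the image is the intended one, since a host could serve a placeholder at the old address.

```python
import urllib.error
import urllib.request


def direct_url_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status of a HEAD request, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; keep the status code.
        return err.code
    except urllib.error.URLError:
        # DNS failure, refused connection, timeout, and similar.
        return None


def is_broken(status):
    """Treat an unreachable host or any 4xx/5xx status as a broken link."""
    return status is None or status >= 400
```

Calling `is_broken(direct_url_status(url))` on the `url` field of an affected record would then flag links that no longer resolve.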
I've narrowed down a set of query parameters that reflects the images affected above:
This returns 7 results, one of which is the result shared above. Here are the full contents of the response body:
We use the
I've emailed the folks at Europeana directly to ask them about this issue.
@AetherUnbound did you ever hear back from Europeana? If not, should we recheck the response to see if the problem still exists and, if so, reach back out to Europeana and potentially the museum itself? Maybe the data issue originates upstream of Europeana, in which case the institution might be able to help (and might be more responsive).
Description
Apparently, some Europeana images can change their direct link while remaining available through their landing page. This is a problem for us because it seems the Data Refresh process is not updating this value (I haven't confirmed it).
Observe this image for example: https://api.openverse.engineering/v1/images/f8c86a20-eb9c-4ffc-9a06-3664151dbce6/
Reproduction
Screenshots
Additional context
Sentry issue.