The current processing pipeline:

1. A SPARQL query, fetched via an HTTP request, returns a JSON listing of all Wikidata entities that are within the scope of WikiProject Invasion Biology and have been tagged with an open license. The result is saved to a file (a fetch sketch follows this list).
2. A post-processing script (Deno), sketched after this list:
   - reads the JSON file of entries
   - loops through each entry
   - pulls the Wikidata entity through CitationJS
   - checks whether the entry has a DOI and, if so, retrieves the corresponding Crossref item
   - processes the Wikidata entity (and the Crossref item, if present) into XML and writes it to the file system
3. A webhook on the Toolforge server (which hosts the OAI-PMH endpoint) is called, and it does a `git pull` of the updates onto the server.
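A minimal sketch of step 1, assuming the public Wikidata Query Service endpoint and that the project's SPARQL query lives in a `query.rq` file (the file name and output path are placeholders):

```ts
// Sketch of the SPARQL fetch (step 1). The query file name and output path
// are assumptions; the real query and paths live in the repository.
const endpoint = "https://query.wikidata.org/sparql";
const query = await Deno.readTextFile("query.rq");

const response = await fetch(`${endpoint}?query=${encodeURIComponent(query)}`, {
  headers: { Accept: "application/sparql-results+json" },
});
if (!response.ok) throw new Error(`SPARQL request failed: ${response.status}`);

// Save the JSON listing for the post-processing script to pick up.
await Deno.writeTextFile("entries.json", JSON.stringify(await response.json(), null, 2));
```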
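And a sketch of the post-processing loop (step 2). The module specifier, file layout and the `toXml` serialiser are assumptions standing in for the project's actual code:

```ts
// Sketch of the post-processing loop (step 2). Imports, file names and the
// toXml placeholder are illustrative, not the project's actual code.
import Cite from "npm:citation-js";

// Placeholder serialiser; the real pipeline builds OAI-PMH-compatible XML.
function toXml(csl: Record<string, unknown>, crossref: unknown): string {
  return `<record id="${csl.id}">${JSON.stringify({ csl, crossref })}</record>`;
}

const listing = JSON.parse(await Deno.readTextFile("entries.json"));
await Deno.mkdir("records", { recursive: true });

// Assumes the SPARQL query exposes an ?item variable holding the entity URI.
for (const binding of listing.results.bindings) {
  const qid = binding.item.value.split("/").pop()!; // e.g. "Q12345"

  // Pull the Wikidata entity through CitationJS, one entry at a time to
  // avoid overwhelming the Wikidata API.
  const cite = await Cite.async(qid);
  const csl = cite.data[0];

  // If there is a DOI, retrieve the matching Crossref item.
  let crossref: unknown = null;
  if (csl?.DOI) {
    const res = await fetch(`https://api.crossref.org/works/${encodeURIComponent(csl.DOI)}`);
    if (res.ok) crossref = (await res.json()).message;
  }

  // Serialise and write to the file system.
  await Deno.writeTextFile(`records/${qid}.xml`, toXml(csl, crossref));
}
```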
There's a GitHub Action set up to run this regularly, though an issue is stopping it from running successfully (#14), and it is probably not an effective way of running the pipeline anyway; it would be better run on demand, for instance when entries are updated or when a versioned dump is created.
It's a pretty rudimentary approach that has allowed for quick(ish) prototyping, but it is not a very satisfactory solution on a number of counts:
- A simple sequential loop is used to avoid overwhelming the Wikidata API endpoint (called through CitationJS). Batch processing should be possible but would require a fairly in-depth refactor (a bounded-concurrency sketch follows this list).
- All entries are processed on every run, whether or not they have changed, which is very inefficient in both time and processing. Some simple checks could alleviate this (see the revision-check sketch below), though they might miss updates in linked entries where the main entry itself hasn't changed.
- It is rather ugly; it would be far nicer and more maintainable to implement much of the system as a CitationJS plugin, using the current Wikidata plugin as a base (see the plugin sketch below).
- It works on live data rather than a defined changeset. One idea would be to process from specific data dumps, or to pull RDF (Turtle) files for the SPARQL results and work directly from those, which would also alleviate the Wikidata API bottleneck (see the Turtle snapshot sketch below).
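For the first point, a minimal sketch of bounded-concurrency batching (the batch size is arbitrary, and `processEntry` stands in for the per-entry work shown above):

```ts
// Sketch: process entries in small concurrent batches rather than strictly
// one at a time. Batch size is arbitrary; processEntry is a placeholder for
// the per-entry work (CitationJS pull, Crossref lookup, XML write).
async function processEntry(qid: string): Promise<void> {
  // ...per-entry work as in the loop above...
}

async function processInBatches(qids: string[], batchSize = 5): Promise<void> {
  for (let i = 0; i < qids.length; i += batchSize) {
    // Run one batch concurrently, then move on, keeping the load on the
    // Wikidata API bounded.
    await Promise.all(qids.slice(i, i + batchSize).map(processEntry));
  }
}
```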
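For the second point, one simple check would be to compare each entity's `lastrevid` (exposed by `Special:EntityData`) against a stored value and skip entries whose revision hasn't changed; as noted, this still misses updates in linked entities. A sketch, with the cache file name as an assumption:

```ts
// Sketch: skip entries whose Wikidata revision has not changed since the
// last run. The cache file name is an assumption; updates in linked
// entities would still be missed.
const seen: Record<string, number> = JSON.parse(
  await Deno.readTextFile("lastrevids.json").catch(() => "{}"),
);

async function hasChanged(qid: string): Promise<boolean> {
  const res = await fetch(`https://www.wikidata.org/wiki/Special:EntityData/${qid}.json`);
  const lastrevid: number = (await res.json()).entities[qid].lastrevid;
  if (seen[qid] === lastrevid) return false;
  seen[qid] = lastrevid;
  return true;
}

// After a run: await Deno.writeTextFile("lastrevids.json", JSON.stringify(seen));
```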
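For the third point, a rough sketch of what registering a custom output format through the Citation.js plugin mechanism could look like (the `@oai-xml` ref and the formatter body are invented for illustration; the real plugin would be modelled on the existing Wikidata plugin):

```ts
// Sketch: registering a custom output format via the Citation.js plugin
// mechanism. The plugin ref and formatter body are illustrative only.
import { Cite, plugins } from "npm:@citation-js/core";
import "npm:@citation-js/plugin-wikidata"; // lets Cite resolve bare QIDs

plugins.add("@oai-xml", {
  output: {
    // Receives the CSL-JSON array held by a Cite instance.
    "oai-xml"(csl: Array<Record<string, unknown>>): string {
      return csl
        .map((item) => `<record id="${item.id}"><title>${item.title ?? ""}</title></record>`)
        .join("\n");
    },
  },
});

// Usage: const xml = (await Cite.async("Q12345")).format("oai-xml");
```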
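And for the last point, entity RDF can be pulled directly as Turtle from `Special:EntityData`, giving a defined set of files to process from (paths here are assumptions):

```ts
// Sketch: snapshot each entity's RDF (Turtle) serialisation so the pipeline
// can work from a defined set of files rather than the live API.
await Deno.mkdir("snapshots", { recursive: true });

async function snapshotEntity(qid: string): Promise<void> {
  const res = await fetch(`https://www.wikidata.org/wiki/Special:EntityData/${qid}.ttl`);
  if (!res.ok) throw new Error(`Failed to fetch ${qid}: ${res.status}`);
  await Deno.writeTextFile(`snapshots/${qid}.ttl`, await res.text());
}
```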