Terminology sources that share a prefix cannot be identified #29

Open
coret opened this issue Nov 26, 2023 · 5 comments

Comments

@coret
Contributor

coret commented Nov 26, 2023

I've made a network visualization of the terminology sources used by datasets:
https://data.netwerkdigitaalerfgoed.nl/coret/-/queries/kg-termen-netwerk/

In this network I see several datasets using KB's Brinkman Thesaurus, among them the Gouda Timemachine ("13000" GTM). However, as querying the source at GTM shows, the GTM doesn't use the Brinkman; it uses the KB thesauri NBT, DBNL and NTA.

I haven't checked, but I suspect other datasets are likewise not using the Brinkman but rather one or more of the other KB thesauri.

@ddeboer
Copy link
Member

ddeboer commented Nov 30, 2023

This is possibly due to multiple terminology sources sharing a URI prefix, in this case http://data.bibliotheken.nl/id/thes/, which is shared by Brinkman, NTA and STCN. Unfortunately, for datasets that don’t have their own unique prefixes, it’s very hard to produce reliable results.
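To illustrate why a shared prefix is ambiguous, here is a minimal sketch of prefix-based source detection. The registry below is hypothetical: only the prefix http://data.bibliotheken.nl/id/thes/ and the names Brinkman, NTA and STCN come from this thread, while the "Example" source, its prefix and the term URI are made up for contrast. A link matches every source whose registered prefix it starts with, so a KB thesaurus URI matches three sources at once.

```python
# Hypothetical registry: terminology source name -> registered URI prefix.
# The shared KB prefix comes from this issue; "Example" is made up for contrast.
KB_THES = "http://data.bibliotheken.nl/id/thes/"
SOURCES = {
    "Brinkman": KB_THES,
    "NTA": KB_THES,
    "STCN": KB_THES,
    "Example": "https://example.org/terms/",
}


def matching_sources(uri: str, sources: dict[str, str]) -> list[str]:
    """Return every source whose registered prefix the URI starts with."""
    return sorted(name for name, prefix in sources.items() if uri.startswith(prefix))


# A made-up link from a dataset to a KB thesaurus term.
link = KB_THES + "p123456789"
print(matching_sources(link, SOURCES))  # ['Brinkman', 'NTA', 'STCN']
```

With only the prefix to go on, nothing in the link itself says which of the three KB thesauri it belongs to.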

@ddeboer changed the title from "Incorrect Outgoing links (Brinkman)" to "Terminology sources that share a prefix cannot be identified" on Dec 1, 2023
@ddeboer
Member

ddeboer commented Dec 6, 2023

@EnnoMeijers will try to find out whether we can distinguish the sources under the http://data.bibliotheken.nl/id/thes/ prefix based on their p… identifier.

Other examples of shared prefixes include:

  • GTAA
  • all datasets that we have subdivided on our side, including AAT and CHT.

@EnnoMeijers
Contributor

I did some checking, and there is no way to distinguish between the KB thesauri based on the 'p' identifier. The only option would be an additional query on the KB endpoint to find the specific source, something like this:
PREFIX schema: <http://schema.org/>
SELECT ?datasetName WHERE { ?thes_id schema:mainEntityOfPage/schema:isPartOf ?datasetName }
But doing this for each KB link would probably be too expensive.
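If per-link lookups are too expensive, one possible mitigation is to deduplicate the KB URIs and resolve them in batches with a SPARQL VALUES clause, so that each request covers many terms. The sketch below only builds the query strings and sends nothing. The helper name, the batch size and the assumption that KB data uses the http://schema.org/ namespace for schema: are all hypothetical, and it makes no claim about what the KB endpoint will accept.

```python
from typing import Iterable, Iterator

# Assumption: the KB data uses http://schema.org/ as the schema: namespace.
# Doubled braces are literal braces for str.format.
QUERY_TEMPLATE = """PREFIX schema: <http://schema.org/>
SELECT ?thes_id ?datasetName WHERE {{
  VALUES ?thes_id {{ {values} }}
  ?thes_id schema:mainEntityOfPage/schema:isPartOf ?datasetName .
}}"""


def batched_queries(uris: Iterable[str], batch_size: int = 100) -> Iterator[str]:
    """Yield one SPARQL query per batch of distinct URIs (hypothetical helper)."""
    unique = sorted(set(uris))  # look up each distinct URI only once
    for start in range(0, len(unique), batch_size):
        batch = unique[start:start + batch_size]
        values = " ".join(f"<{uri}>" for uri in batch)
        yield QUERY_TEMPLATE.format(values=values)
```

For example, `list(batched_queries(links, batch_size=100))` turns a few thousand links, many of them repeated, into a few dozen requests. Whether that is cheap enough depends on the endpoint.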

@ddeboer
Member

ddeboer commented Dec 12, 2023

That’s unfortunate. In that case, I propose we eliminate all prefix matches that match more than one terminology source and keep only the prefixes that uniquely match a source.

That way we will lose data, but that may be preferable to providing confusing data, as described in this issue’s first post.
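The proposal to keep only uniquely matching prefixes could be sketched roughly as follows. The registry layout and function names are hypothetical. A link is attributed only when the longest registered prefix it matches belongs to exactly one source. Otherwise it is left unattributed rather than guessed, which is the data loss accepted above.

```python
from collections import defaultdict
from typing import Optional


def prefix_owners(sources: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert a hypothetical registry (source name -> prefixes) to prefix -> owners."""
    owners: dict[str, set[str]] = defaultdict(set)
    for name, prefixes in sources.items():
        for prefix in prefixes:
            owners[prefix].add(name)
    return owners


def attribute_unique(uri: str, sources: dict[str, list[str]]) -> Optional[str]:
    """Return the source for a URI, or None if its best prefix is shared or absent."""
    owners = prefix_owners(sources)
    matches = [prefix for prefix in owners if uri.startswith(prefix)]
    if not matches:
        return None
    best = max(matches, key=len)  # the most specific registered prefix wins
    names = owners[best]
    return next(iter(names)) if len(names) == 1 else None


KB_THES = "http://data.bibliotheken.nl/id/thes/"
SOURCES = {
    "Brinkman": [KB_THES],
    "NTA": [KB_THES],
    "STCN": [KB_THES],
    "Example": ["https://example.org/terms/"],  # made-up unique prefix
}
```

Checking the longest match before discarding shared prefixes avoids wrongly attributing a KB link to some source that happens to own a shorter, unique prefix.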

In addition, I would like to add a requirement: netwerk-digitaal-erfgoed/requirements-terminologiebronnen#5.

@EnnoMeijers
Contributor

Couldn't we at least register the prefix as belonging to a particular organization, without resolving it further to a specific source? That way we could type Brinkman, NTA and STCN all as subclasses of a KB thesaurus.
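This alternative keeps the links but attributes them at a coarser level. The following minimal sketch assumes a hypothetical registry in which the shared KB prefix is registered once, for a parent "KB thesaurus" entry that lists Brinkman, NTA and STCN as members. It is only a plain-Python stand-in for the subclass typing suggested above and does not model RDF classes. A KB link resolves to the parent, so it can still be shown as a KB thesaurus link even though the specific thesaurus stays unknown.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Source:
    name: str
    prefixes: list[str]
    members: list[str] = field(default_factory=list)  # more specific sources it groups


KB_THES = "http://data.bibliotheken.nl/id/thes/"

# Hypothetical registry: the shared KB prefix is registered once, for a parent
# entry; its members are listed but not resolved individually.
REGISTRY = [
    Source("KB thesaurus", [KB_THES], members=["Brinkman", "NTA", "STCN"]),
    Source("Example", ["https://example.org/terms/"]),  # made-up unique source
]


def attribute_with_parent(uri: str, registry: list[Source]) -> Optional[Source]:
    """Return the source with the longest registered prefix matching the URI."""
    best: Optional[Source] = None
    best_len = -1
    for source in registry:
        for prefix in source.prefixes:
            if uri.startswith(prefix) and len(prefix) > best_len:
                best, best_len = source, len(prefix)
    return best
```

Compared with dropping shared prefixes, this keeps the KB links visible in the network, at the cost of coarser attribution.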
