Use of config file for semantic predicates #1

Open
rmfranken opened this issue Aug 29, 2024 · 4 comments
Comments

@rmfranken
Member

Given that we want to be able to use multiple kinds of ontologies, we should allow the user to somehow specify which predicates to use, as not all ontologies will use rdfs:label.

One possible solution is to make this configurable, similar to how respecter does it.

Alternatively, we could support some basic ones and handle this in the CLI (pre-filled options for skos, rdfs, dcat, schema etc. which a user can choose with a flag like -pred)

fuzon -predscheme skos - input ont1.ttl ont2.ttl

This would mean skos:prefLabel, skos:altLabel and maybe even skos:definition or skos:example get used as the predicates to query with SPARQL.
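A minimal sketch of what such presets could look like, assuming a hypothetical `preset_predicates` helper that maps a scheme name to a list of label-like predicate IRIs. The function name, the accepted scheme names, and the choice of predicates per scheme are all assumptions for illustration, not part of fuzon:

```rust
/// Hypothetical preset lookup: maps a scheme name (as might be passed to a
/// flag such as `-predscheme`) to label-like predicates of that vocabulary.
/// Scheme names and predicate choices here are assumptions, not fuzon's API.
pub fn preset_predicates(scheme: &str) -> Option<Vec<&'static str>> {
    // Match case-insensitively so "SKOS" and "skos" behave the same.
    match scheme.to_ascii_lowercase().as_str() {
        "rdfs" => Some(vec!["http://www.w3.org/2000/01/rdf-schema#label"]),
        "skos" => Some(vec![
            "http://www.w3.org/2004/02/skos/core#prefLabel",
            "http://www.w3.org/2004/02/skos/core#altLabel",
        ]),
        "schema" => Some(vec!["http://schema.org/name"]),
        // Unknown scheme: let the caller decide how to report the error.
        _ => None,
    }
}
```

Returning `None` for an unknown scheme leaves it to the CLI layer to print a helpful message listing the supported presets.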

@cmdoret
Member

cmdoret commented Sep 2, 2024

Indeed, the current approach is a bit crude: it relies on a hard-coded collection of common predicates, which is not great.

fuzon/src/lib.rs

Lines 12 to 24 in 7b3e1c4

lazy_static! {
    static ref ANNOTATIONS: HashSet<String> = {
        HashSet::from_iter(vec![
            rdfs::LABEL.to_string(),
            "http://schema.org/name".to_string(),
            "http://www.w3.org/2004/02/skos/core#prefLabel".to_string(),
            "http://www.w3.org/2004/02/skos/core#altLabel".to_string(),
            "http://xmlns.com/foaf/0.1/name".to_string(),
            "http://purl.org/dc/elements/1.1/title".to_string(),
            "http://xmlns.com/foaf/0.1/name".to_string(),
        ].iter().cloned())
    };
}

I guess we could have this as default, with the option to override it

@rmfranken
Member Author

I see, perhaps an option is to store that list of predicates in a separate file and point the user towards it if they want to customize predicates, though that does reduce its "easy to re-use" factor significantly. However, forcing a user to input specific IRIs as arguments in a command line tool is also not optimal. It does not have to be as complicated as we did it in respecter though, that's for sure.

What do you think is less user-unfriendly? I definitely foresee use-cases where the current list of predicates will not suffice for indexing.

@cmdoret
Member

cmdoret commented Sep 3, 2024

How about adding a CLI option like:

[--config-file / -c FILE] YAML file containing a list of predicates to use. Overrides the default list.
  • fuzon -s input.ttl uses the default list above (hardcoded in binary)
  • fuzon -s input.ttl -c annotations.yaml uses only the predicates defined in annotations.yaml

This would require minor API changes, likely making this list an attribute of TermMatcher instead of a constant (better practice anyways).
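As a rough sketch of that API change, the predicate list could move from a constant into a field that falls back to the defaults when no override is given. The struct below is a simplified stand-in and not fuzon's actual `TermMatcher`; the `predicates` field, the `new` signature, and the `is_annotation` helper are assumptions for illustration:

```rust
use std::collections::HashSet;

/// Default annotation predicates, mirroring the hard-coded list quoted
/// earlier in this thread (with the duplicate foaf:name entry collapsed).
fn default_predicates() -> HashSet<String> {
    [
        "http://www.w3.org/2000/01/rdf-schema#label",
        "http://schema.org/name",
        "http://www.w3.org/2004/02/skos/core#prefLabel",
        "http://www.w3.org/2004/02/skos/core#altLabel",
        "http://xmlns.com/foaf/0.1/name",
        "http://purl.org/dc/elements/1.1/title",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect()
}

/// Simplified stand-in for TermMatcher: only the predicate set is modelled.
/// The field and method names are hypothetical.
pub struct TermMatcher {
    pub predicates: HashSet<String>,
}

impl TermMatcher {
    /// Uses the caller-supplied predicates if present, otherwise the defaults.
    /// An override replaces the default list entirely; it is not merged.
    pub fn new(overrides: Option<Vec<String>>) -> Self {
        let predicates = match overrides {
            Some(list) => list.into_iter().collect(),
            None => default_predicates(),
        };
        TermMatcher { predicates }
    }

    /// Returns true if `predicate` should be indexed as an annotation.
    pub fn is_annotation(&self, predicate: &str) -> bool {
        self.predicates.contains(predicate)
    }
}
```

With this shape, `TermMatcher::new(None)` keeps today's behaviour, while passing `Some(list)` corresponds to the `-c` case where only the configured predicates are used.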

We could provide an examples folder containing example yaml file(s). Could look something like:

# annotations.yaml
 - https://example.org/label
 - ...
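Reading such a file could be sketched as follows. This assumes the file is a flat list with one `- <IRI>` entry per line and uses a hand-rolled, standard-library-only reader so the example stays self-contained; it is not a general YAML parser, and a real implementation would more likely rely on a YAML crate. The function name `parse_predicate_list` is hypothetical:

```rust
/// Parses a flat list of predicates: one `- <IRI>` entry per line.
/// Blank lines and lines starting with `#` are skipped. This handles only
/// that narrow format and is NOT a general YAML parser.
pub fn parse_predicate_list(text: &str) -> Vec<String> {
    text.lines()
        .map(str::trim)
        // Only whole-line comments are dropped, so IRIs containing `#`
        // fragments (e.g. ...rdf-schema#label) are preserved.
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        // Keep list items only; anything else is ignored.
        .filter_map(|line| line.strip_prefix("- "))
        // Drop optional surrounding quotes around the IRI.
        .map(|iri| {
            iri.trim()
                .trim_matches(|c: char| c == '"' || c == '\'')
                .to_string()
        })
        .filter(|iri| !iri.is_empty())
        .collect()
}
```

The resulting `Vec<String>` could then be handed to whatever holds the predicate list, replacing the defaults.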

@rmfranken
Member Author

Yes, I like it!
