Language processing in Linked Data authorities
This document describes language processing for the linked data module in QA.
Reference: W3Schools HTML Language Code Reference
Language processing requires either that the authority supports language filtering or that the authority's results have language-tagged literals. If neither condition is met, filtering will not be applied to results.
Some linked data authorities tag literals with a language (e.g. 'milk@en', 'Milch@de', 'Lait@fr'). When an authority has literals in multiple languages, it is desirable to be able to request literals for a specific language for two reasons:
- provide users terms in their desired language
- avoid long results that include the term in multiple languages
Language can be specified in multiple places. The places are listed here in priority order, highest priority first. If the language is not specified at a higher priority location, the next location that does specify a language is used.
- Passed as part of the request URL using the lang parameter
- Specified in the request header using HTTP_ACCEPT_LANGUAGE
- Authority specific default defined in the authority configuration
- Site wide default defined in qa initializer
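The priority order above can be sketched as a small lookup. This is only an illustration, not QA's actual implementation: the method name `resolve_language` and its keyword arguments are hypothetical, and returning an array of symbols is a choice made for the sketch.

```ruby
# Hypothetical helper (not part of QA): returns the first language
# specification that is present, checking the sources in priority order.
def resolve_language(url_param: nil, accept_language: nil,
                     authority_default: nil, site_default: nil)
  [url_param, accept_language, authority_default, site_default].each do |candidate|
    next if candidate.nil?

    # Accept a single value or a list; drop blank entries.
    langs = Array(candidate).map(&:to_s).reject(&:empty?)
    return langs.map(&:to_sym) unless langs.empty?
  end
  nil # no language specified anywhere, so no filtering would be applied
end

p resolve_language(url_param: "fr", site_default: :en)
p resolve_language(authority_default: %w[en fr], site_default: :de)
```

The first call picks the URL parameter over the site default; the second falls through to the authority default because nothing with higher priority was given.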
See Configuring and using language processing below for more information on how to set up and use language filtering.
Filtering can happen in two ways: either the authority performs the filtering or QA performs the filtering.
If the authority's API supports passing in a language parameter, then QA will pass the language to the authority for it to perform the filtering. Passing language as a parameter to an authority is limited to a single language (e.g. en). If multiple languages are specified, then only the first language will be passed to the authority (e.g. for [en, fr], only en will be passed).
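As a minimal sketch of the single-language limit described above, the following hypothetical helper (the name `language_for_authority` is an assumption, not a QA method) reduces whatever was requested to the one value that would be sent to the authority.

```ruby
# Hypothetical helper (not part of QA): an authority accepts only one
# language value, so only the first requested language is passed along.
def language_for_authority(languages)
  Array(languages).first&.to_s
end

p language_for_authority([:en, :fr])
p language_for_authority("fr")
```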
When QA performs the filtering, it requests the full set of results that the authority will return. Then QA filters the full set of results based on the selected language. QA filtering supports filtering for multiple languages (e.g. [:en, :fr]).
NOTE: Some authorities will filter to a default language regardless of what QA requests. In that case, QA filtering will have no effect on the results.
Rules for filtering:
- if a language is not specified, keep all triples
- keep triples where the object literal is tagged with the selected language
- keep triples where the object literal does not have a language tag
- if there are 0 matches for a predicate, keep triples for all languages
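The rules above can be sketched over a simplified in-memory representation. This is an illustration under stated assumptions, not QA's code: triples are modeled as plain hashes with hypothetical `:predicate`, `:value`, and `:lang` keys, the sample data is made up, and "0 matches" is interpreted as "no triple for that predicate survives the tagged or untagged checks".

```ruby
# Hypothetical sketch of the filtering rules (not QA's implementation).
# Each triple is a hash: { predicate:, value:, lang: } where lang may be nil.
def filter_by_language(triples, languages)
  langs = Array(languages).map(&:to_s)
  return triples if langs.empty? # no language specified: keep all triples

  triples.group_by { |t| t[:predicate] }.flat_map do |_predicate, group|
    # Keep literals tagged with a selected language and untagged literals.
    kept = group.select { |t| t[:lang].nil? || langs.include?(t[:lang].to_s) }
    # If nothing matched for this predicate, keep triples for all languages.
    kept.empty? ? group : kept
  end
end

triples = [
  { predicate: "label", value: "milk",    lang: "en" },
  { predicate: "label", value: "Milch",   lang: "de" },
  { predicate: "label", value: "Lait",    lang: "fr" },
  { predicate: "note",  value: "Hinweis", lang: "de" },
  { predicate: "id",    value: "123",     lang: nil }
]

p filter_by_language(triples, [:en, :fr]).map { |t| t[:value] }
```

In this sample, the German label is dropped, the German note is kept because its predicate has no English or French literal, and the untagged id is always kept.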
Some authorities may have language tagged literals that are known to be incorrect or you may actually want to retrieve literals for all languages. To prevent language filtering, set the language to * which acts as a wild card indicating all languages should be matched.
The most common usage of this is to set the authority default configuration to * to prevent filtering for that authority.
Setting the site default to * means that the default behavior is to not filter for any authority unless it is set individually in the authority or as part of the QA request.
A user can override the authority default and site default by passing in * to prevent filtering for a specific request.
Caveat: If the language is passed as a parameter to the authority and a default value is set for the language parameter, the default for the parameter will be used if the user passes in * for the language.
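The wildcard behavior and its caveat can be sketched as two small checks. Both method names are hypothetical and the logic is a simplified reading of the text above, not QA's implementation; in particular, returning nil when no parameter default exists is an assumption.

```ruby
# Hypothetical helpers (not part of QA).

# True when the resolved language is the wildcard, meaning QA-side
# filtering should be skipped.
def filtering_disabled?(languages)
  Array(languages).map(&:to_s).include?("*")
end

# Value to send as the authority's language parameter. Per the caveat,
# a wildcard falls back to the parameter's configured default, if any.
def authority_language_param(language, param_default: nil)
  return param_default if language.to_s == "*"

  language.to_s
end

p filtering_disabled?("*")
p authority_language_param("*", param_default: "en")
p authority_language_param("fr", param_default: "en")
```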
The QA API supports passing a lang= parameter on search and fetch requests. If passed in, it will be used as the language for filtering, ignoring all other language configurations.
No configuration required.
The following is an example QA request passing language as part of the URL.
curl 'http://localhost:3000/qa/search/linked_data/agrovoc_ld4l_cache?q=lait&lang=fr'
The QA API supports passing the language code in the Accept-Language request header (exposed to the application as HTTP_ACCEPT_LANGUAGE) for a QA search or fetch request. If set in the request header, it will be used as the language for filtering unless the user included a lang= parameter on the request URL.
No configuration required.
The following is an example QA request passing language as part of the HTTP header.
curl -H 'Accept-Language: fr' 'http://localhost:3000/qa/search/linked_data/agrovoc_ld4l_cache?q=lait'
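For clients written in Ruby, the two curl requests above might be built with the standard library's Net::HTTP. This is a sketch only: `build_search_request` is a hypothetical helper, and the host, authority name, and query are the ones from the curl examples. The requests are built but not sent.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds (but does not send) a QA search request.
# The language may go in the URL (lang_param), the header, or neither.
def build_search_request(base_url, authority, query, lang_param: nil, accept_language: nil)
  params = { "q" => query }
  params["lang"] = lang_param if lang_param
  uri = URI("#{base_url}/qa/search/linked_data/#{authority}")
  uri.query = URI.encode_www_form(params)

  request = Net::HTTP::Get.new(uri)
  request["Accept-Language"] = accept_language if accept_language
  request
end

by_url = build_search_request("http://localhost:3000", "agrovoc_ld4l_cache", "lait", lang_param: "fr")
by_header = build_search_request("http://localhost:3000", "agrovoc_ld4l_cache", "lait", accept_language: "fr")

puts by_url.path
puts by_header["Accept-Language"]
```

To actually send one of these, something like `Net::HTTP.start("localhost", 3000) { |http| http.request(by_url) }` could be used against a running QA instance.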
If the language is not passed in through a parameter or the request header, QA will look to see if the authority has a default value to use for the language.
{
"term": {
...
"language": "en",
...
},
"search": {
...
"language": ["en", "fr"],
...
}
}
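To illustrate how such a configuration could be read, the sketch below parses a trimmed-down version of the JSON above (with the elided keys left out) and looks up the default for a given action. The method name `authority_default_language` is hypothetical, and normalizing the value to an array of symbols is a choice made for this sketch, not necessarily what QA does internally.

```ruby
require "json"

# Hypothetical helper (not part of QA): reads the default language for
# an action ("term" or "search") from an authority configuration.
def authority_default_language(config_json, action)
  config = JSON.parse(config_json)
  language = config.dig(action.to_s, "language")
  return nil if language.nil? # no authority default: fall back to the site default

  Array(language).map(&:to_sym)
end

config_json = <<~JSON
  {
    "term":   { "language": "en" },
    "search": { "language": ["en", "fr"] }
  }
JSON

p authority_default_language(config_json, :term)
p authority_default_language(config_json, :search)
```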
The following is an example QA request which does not pass in language. The language will be set to the default language configured for oclc_fast authority if it is defined in the oclc_fast search configuration; otherwise, it will use the site wide default language.
curl 'http://localhost:3000/qa/search/linked_data/oclc_fast?q=twain'
If the language is not passed in through any other means, QA will look to see if there is a site wide default value to use for language.
NOTE: This provides examples for configuring a parameter to pass to the authority for the authority to perform the filtering. If this is not an option for the authority, do not provide this configuration and the filtering will happen on the QA side provided the results from the authority include language tagged literals.
The site wide language default is configured in the qa initializer. When the qa:install generator is run, the qa initializer is installed into /config/initializers/qa.rb. The generator will also modify routes and perform other actions. If this is a new installation of qa, you can run the installer using...
$ rails generate qa:install
OR you can manually copy the qa initializer from /lib/generators/qa/install/templates/config/initializers/qa.rb to /config/initializers/qa.rb.
Edit /config/initializers/qa.rb and modify the value for default_language (uncommenting if needed)...
config.default_language = :en
The following is an example QA request which does not pass in language. The language will be set to the default language configured for oclc_fast authority if it is defined in the oclc_fast search configuration; otherwise, it will use the site wide default language.
curl 'http://localhost:3000/qa/search/linked_data/oclc_fast?q=twain'
If the configuration defines a language parameter for the authority's search url or the authority's term url, the language value will be passed to the authority which will perform the language filtering. If not defined, the filtering occurs on the QA side with QA filtering results returned from the authority.
The language used as the value of the language parameter is determined by the prioritization process described in Where can language be specified?. See the other sections in Configuring and using language processing for more details on configurations and determining which language will be used.
Requires the authority to support language filtering.
Configure a parameter to pass to the authority when fetching a single term. You see lang defined in the "template", and there is a mapping for the lang parameter.
{
"term": {
"url": {
"@context": "http://www.w3.org/ns/hydra/context.jsonld",
"@type": "IriTemplate",
"template": "http://api.library.cornell.edu/skosmos/rest/v1/nalt/data?{?lang}&uri={term_uri}",
"variableRepresentation": "BasicRepresentation",
"mapping": [
{
"@type": "IriTemplateMapping",
"variable": "term_uri",
"property": "hydra:freetextQuery",
"required": true,
"encode": false
},
{
"@type": "IriTemplateMapping",
"variable": "lang",
"property": "hydra:freetextQuery",
"required": false
}
]
},
...
},
...
}
Identify the parameter used by the authority for language. Many authorities support the commonly used lang parameter, but QA does not assume this. It allows you to specify a different parameter to use in the authority's URL.
NOTE: The key in this hash is always "lang". The value for "lang" identifies the name of the parameter in the authority URL.
{
"term": {
...
"qa_replacement_patterns": {
"term_id": "term_uri",
"lang": "lang"
},
...
},
...
}
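The interplay between the template and the replacement patterns can be sketched as follows. This is a deliberately simplified illustration, not QA's expansion code and not a full RFC 6570 implementation: `expand_template` is a hypothetical helper, the term URI is a made-up example, `{?name}` is treated as "name=value" (or nothing when no value is given), and no URL encoding is performed.

```ruby
# Hypothetical helper (not part of QA): fills in a term URL template.
# replacement_patterns maps QA's names (e.g. "lang") to the variable
# names used in the authority's template.
def expand_template(template, replacement_patterns, qa_values)
  values = qa_values.each_with_object({}) do |(qa_name, value), result|
    variable = replacement_patterns[qa_name.to_s]
    result[variable] = value.to_s if variable && !value.nil?
  end

  template
    .gsub(/\{\?(\w+)\}/) { values.key?($1) ? "#{$1}=#{values[$1]}" : "" } # {?lang} -> lang=fr
    .gsub(/\{(\w+)\}/) { values.fetch($1, "") }                           # {term_uri} -> value
end

template = "http://api.library.cornell.edu/skosmos/rest/v1/nalt/data?{?lang}&uri={term_uri}"
patterns = { "term_id" => "term_uri", "lang" => "lang" }

puts expand_template(template, patterns, term_id: "http://example.org/term/1", lang: "fr")
```

If the authority used a different parameter name, only the value of "lang" in the replacement patterns and the variable in the template would change; the caller would still supply `lang`.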
Similarly, you can define a parameter to use for the search template URL. Again, you see lang defined in the template and a mapping for the lang parameter.
{
...
"search": {
"url": {
"@context": "http://www.w3.org/ns/hydra/context.jsonld",
"@type": "IriTemplate",
"template": "http://services.ld4l.org/ld4l_services/agrovoc_batch.jsp?{?query}&{?maxRecords}&{?lang}",
"variableRepresentation": "BasicRepresentation",
"mapping": [
{
"@type": "IriTemplateMapping",
"variable": "query",
"property": "hydra:freetextQuery",
"required": true
},
{
"@type": "IriTemplateMapping",
"variable": "maxRecords",
"property": "hydra:freetextQuery",
"required": false,
"default": "20"
},
{
"@type": "IriTemplateMapping",
"variable": "lang",
"property": "hydra:freetextQuery",
"required": false
}
]
},
...
}
In the same way as for term fetch, you can also identify the parameter used by the authority for language.
{
"qa_replacement_patterns": {
"query": "query",
"lang": "lang"
},
...
}
}
All previous examples of QA requests work for passing language to the authority for filtering. The key requirement to use authority filtering is that the authority supports a language parameter which can be used to pass language as part of the request to the authority.