knaw-huc/sd-skosmos

SD-SKOSMOS

A Skosmos vocabulary loader and container setup used by the HuC.

Run with docker-compose

This repository contains two docker-compose files: one for local testing (docker-compose.yml) and one example of a Portainer-based setup (docker-compose-portainer-template.yml).

The docker-compose.yml file contains configuration options for both GraphDB and Fuseki. Pick one and comment out the parts you do not need. The configuration details later in this README explain which settings apply to each.

For the Portainer setup, we recommend creating your own Docker Compose file based on docker-compose-portainer-template.yml. The example uses Traefik, but other setups are possible. Use the example file together with the section below on containers and their required environment variables to create a production Compose file that matches your needs.

Containers

This Skosmos setup requires two containers, plus a triple store if that is not hosted externally.

sd-skosmos

This container serves the actual Skosmos application. To make GraphDB support work, it is based on our fork of Skosmos instead of the plain version. Apart from GraphDB support, the fork changes nothing.

| Env var | Description |
| --- | --- |
| SPARQL_ENDPOINT | The SPARQL endpoint that Skosmos uses to connect to the triple store. Only read access is required. |
| DATA | The location of the 'data' folder containing the vocabulary configuration files. |

sd-skosmos-loader

This container is responsible for configuring and importing the used vocabularies based on the configuration files in the data directory. It runs side by side with the Skosmos container and uses cron to run the import script for updating vocabularies from an external source.

| Env var | Description |
| --- | --- |
| SPARQL_ENDPOINT | The SPARQL endpoint that Skosmos uses to connect to the triple store. Should be the same as for the sd-skosmos container. |
| DATABASE_TYPE | graphdb or fuseki |
| STORE_BASE | The base endpoint of the triple store. The value depends on the database type: for GraphDB it is the same as the SPARQL endpoint; for Fuseki, include the repository name (e.g. http://fuseki-domain:3030/skosmos). |
| ADMIN_USERNAME | The admin username for the triple store (the loader needs write access). |
| ADMIN_PASSWORD | The admin password for the triple store. |
| DATA | The location of the 'data' folder containing the vocabulary configuration files. |
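
As an illustration of how the variables above fit together, the following sketch checks them before anything is imported. It is not the loader's actual code: the function name read_loader_settings is hypothetical, and treating every listed variable as required (with STORE_BASE defaulting to the SPARQL endpoint for anything other than Fuseki) is an assumption drawn from the descriptions above.

```python
import os

# Assumption: all of these must be set for the loader container.
REQUIRED_VARS = ("SPARQL_ENDPOINT", "DATABASE_TYPE", "ADMIN_USERNAME", "ADMIN_PASSWORD", "DATA")


def read_loader_settings(env=os.environ):
    """Collect the loader variables and fail early when one is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))

    settings = {name: env[name] for name in REQUIRED_VARS}
    store_base = env.get("STORE_BASE")
    if settings["DATABASE_TYPE"] == "fuseki" and not store_base:
        # For Fuseki the base endpoint differs from the SPARQL endpoint.
        raise ValueError("STORE_BASE is required when DATABASE_TYPE is fuseki")
    # Assumption: for other types (e.g. GraphDB) the SPARQL endpoint doubles as the base.
    settings["STORE_BASE"] = store_base or settings["SPARQL_ENDPOINT"]
    return settings
```

Passing a plain dictionary instead of os.environ makes the check easy to try out without touching the real environment.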

To start:

docker-compose build
docker-compose up -d

To stop:

docker-compose down

Adapt config.ttl

To override config-docker-compose.ttl, put a config.ttl in data/. To add classifications, put them in a config-ext.ttl in data/.

Add new vocabularies

In order to add a vocabulary you will need three files. The first one is {vocabulary-name}.yaml which needs to be put in the data/ directory. The {vocabulary-name}.config and {vocabulary-name}.ttl files can be in there as well, but they can also be loaded from an external source. See the example yaml files below to see how to configure them.

Configuring the vocabulary with {vocabulary-name}.yaml

The YAML file configures how the vocabulary is loaded and whether it needs to be refreshed at a certain interval. Refreshing is mainly useful when the vocabulary is loaded from an external source.

About supported types

When importing vocabularies, keep in mind that you can import any file type supported by the database you use. Some file types are treated differently by different databases. TriG is an example: GraphDB puts all triples and quads from the file into the graph we tell it to use (the skosmos:sparqlGraph from the {vocabulary-name}.config file), while Fuseki only imports the triples in the default graph of the TriG file and ignores any quads.

Please check the behavior of your chosen database to determine which format you wish to use.

Local Files Example

config:
  type: file
  location: my-vocabulary.config

source:
  type: file
  location: my-vocabulary.ttl

Get Request Example

config:
  refresh: Yes
  refreshInterval: 1 # Hours
  type: fetch
  location: https://example.com/path/to/vocabulary.config

source:
  type: fetch
  location: https://example.com/path/to/vocabulary.ttl
  headers:
    X-Custom-Header: Set header values here

Note that config and source support the same configuration options, except that refresh and refreshInterval can only be set for config, where they apply to the entire vocabulary.
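
To make the effect of refresh and refreshInterval concrete, here is a minimal sketch of the decision a loader could take on each cron run. The function is_refresh_due and the rule that a vocabulary which was never loaded is always due are assumptions for illustration, not the loader's actual implementation.

```python
from datetime import datetime, timedelta, timezone


def is_refresh_due(config, last_loaded, now=None):
    """Decide whether a vocabulary should be (re)loaded.

    config      -- the parsed 'config' section of the YAML file
    last_loaded -- timezone-aware datetime of the previous load, or None
    """
    if last_loaded is None:
        return True  # assumption: a vocabulary that was never loaded is always due
    if not config.get("refresh", False):
        return False  # refreshing is switched off for this vocabulary
    interval = timedelta(hours=config["refreshInterval"])  # interval is given in hours
    now = now or datetime.now(timezone.utc)
    return now - last_loaded >= interval
```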

Get Request Example (private GitHub/GitLab repository)

Fetch requests support additional authentication settings for two special cases: GitHub and GitLab, when you need to access a file in a private repository.

# Auth Settings Example
config:
  type: fetch
  location: https://your-gitlab-domain.com/organisation/repository/-/blob/main/vocabulary.config
  auth:
    type: gitlab
    token: your-access-token

source:
  type: fetch
  location: https://raw.githubusercontent.com/organisation/repository/main/vocabulary.trig
  auth:
    type: github
    token: your-access-token

The GitLab example uses the file location in the repository to construct a call to the GitLab API instead, as that is the only way to access private repositories. For GitHub, the auth setting is just a shortcut for setting the correct headers (make sure to use the path to the 'raw' file in this case).
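
The GitLab translation described above can be sketched as follows. The helper gitlab_api_request is hypothetical and the loader may build its request differently; the sketch targets the GitLab REST API's raw-file endpoint with a PRIVATE-TOKEN header, and it assumes the branch name contains no slashes.

```python
from urllib.parse import quote, urlsplit


def gitlab_api_request(blob_url, token):
    """Translate a GitLab '/-/blob/' page URL into a raw-file API URL plus headers."""
    parts = urlsplit(blob_url)
    # Path layout: <project path>/-/blob/<ref>/<file path>
    project_path, _, rest = parts.path.strip("/").partition("/-/blob/")
    ref, _, file_path = rest.partition("/")  # assumption: no '/' in the branch name
    if not (project_path and ref and file_path):
        raise ValueError("not a GitLab blob URL: " + blob_url)
    api_url = "{}://{}/api/v4/projects/{}/repository/files/{}/raw?ref={}".format(
        parts.scheme,
        parts.netloc,
        quote(project_path, safe=""),  # the project path must be URL-encoded
        quote(file_path, safe=""),     # so must the file path
        quote(ref, safe=""),
    )
    return api_url, {"PRIVATE-TOKEN": token}
```

Called with the location from the example above, this yields an API URL on the same GitLab domain that addresses vocabulary.config on the main branch.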

SPARQL Endpoint Example

You can also load vocabularies from a SPARQL endpoint. For this, you need an additional file containing the SPARQL query itself, which must be in the data/ directory. For the example, assume the query is saved in data/vocabulary.sparql.

config:
  refresh: Yes
  refreshInterval: 1 # Hours
  type: file
  location: vocabulary.config

source:
  type: sparql
  location: https://example.com/sparql-endpoint
  query_location: vocabulary.sparql
  format: ttl # Used to specify the file extension when it's not clear from the URL
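
For orientation, a request of this kind could be prepared as in the sketch below, which follows the "query via POST directly" form of the SPARQL 1.1 Protocol. The function build_sparql_request is hypothetical, and asking for Turtle via the Accept header is an assumption that matches format: ttl; the loader itself may send the query differently.

```python
from urllib.request import Request


def build_sparql_request(endpoint, query, accept="text/turtle"):
    """Prepare (but do not send) a SPARQL query request for the given endpoint."""
    return Request(
        endpoint,
        data=query.encode("utf-8"),  # the query text is the request body
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": accept,  # assumption: the endpoint can return Turtle
        },
        method="POST",
    )
```

The prepared request can be sent with urllib.request.urlopen. A CONSTRUCT query is the natural fit here, since its result is a set of triples that can be stored as a vocabulary file.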

POST Request Example

You can also retrieve the vocabulary by posting to an endpoint with a JSON body.

config:
  refresh: Yes
  refreshInterval: 1 # Hours
  type: file
  location: vocabulary.config

source:
  type: post
  location: https://example.com/some-export-endpoint
  format: trig # Used to specify the file extension when it's not clear from the URL
  body: {
    "config-a": "some value",
    "config-b": "some other value"
  }
  headers:
    Authorization: Bearer your-token-here
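
The role of the format option can be illustrated with a small sketch that picks the file extension for a downloaded vocabulary. The helper pick_extension is hypothetical, and letting an explicit format take precedence over the URL is an assumption; the loader's real behavior may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def pick_extension(location, fmt=None):
    """Return the file extension for a source, without the leading dot."""
    if fmt:
        # Assumption: an explicit 'format' wins over whatever the URL suggests.
        return fmt.lstrip(".").lower()
    suffix = PurePosixPath(urlsplit(location).path).suffix
    if not suffix:
        raise ValueError("cannot derive a format from " + location + "; set 'format'")
    return suffix[1:].lower()
```

With the export endpoint from the example above, the URL carries no extension, so the format value (trig) is what identifies the file type.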

Tweaking datasets

When you wish to append data to an externally loaded dataset without modifying the original data, for instance to add a label to a property so it displays in Skosmos, you can add a "tweaks" entry to the yaml file. This supports the same types of files as the source, and any triples in this location will be appended to the source dataset after it's loaded.

Tweaks Example

config:
  refresh: Yes
  refreshInterval: 1 # Hours
  type: fetch
  location: https://example.com/path/to/vocabulary.config

source:
  type: fetch
  location: https://example.com/path/to/vocabulary.ttl
  headers:
    X-Custom-Header: Set header values here

tweaks:
  type: file
  location: vocabulary-tweaks.ttl

Fallback/mirror setup

It is possible to set up a fallback data file or a mirror URL for when the primary source of the data is not available. When an external source (such as a fetch or SPARQL query) fails to load, the fallback dataset is loaded instead. The example configuration below adds a local file that is used when the external fetch fails:

config:
  refresh: Yes
  refreshInterval: 1 # Hours
  type: fetch
  location: https://example.com/path/to/vocabulary.config

source:
  type: fetch
  location: https://example.com/path/to/vocabulary.ttl
  headers:
    X-Custom-Header: Set header values here

fallback:
  type: file
  location: vocabulary.ttl

The fallback configuration supports all types you can use for source.
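
The fallback behavior amounts to a try-then-retry pattern, sketched below. The function load_with_fallback and the idea of passing the two loaders as callables are illustrative assumptions, not the loader's actual code.

```python
def load_with_fallback(load_source, load_fallback=None):
    """Try the primary source first; use the fallback only when that fails.

    Both arguments are zero-argument callables that return the vocabulary data.
    """
    try:
        return load_source()
    except Exception as primary_error:
        if load_fallback is None:
            raise  # no fallback configured, so the failure is final
        print("primary source failed ({}); using fallback".format(primary_error))
        return load_fallback()
```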

Database/Triple Store Types

By default, this Skosmos setup supports two triple store types: Fuseki and GraphDB. It is possible to extend this by adding a custom database connector to the src/database_connectors folder (see below). To select the database type, set the DATABASE_TYPE environment variable of the sd-skosmos-loader container and adjust the SPARQL_ENDPOINT of both the sd-skosmos and sd-skosmos-loader containers accordingly (see the container configuration options above). Some database types may require additional configuration. The two options supported out of the box are explained below.

Configure GraphDB

To use GraphDB, set the following environment variable for the sd-skosmos container:

SPARQL_ENDPOINT=http://graphdb-location:7200/repositories/skosmos

And these for the sd-skosmos-loader:

DATABASE_TYPE=graphdb
SPARQL_ENDPOINT=http://graphdb-location:7200/repositories/skosmos
ADMIN_USERNAME=admin-username
ADMIN_PASSWORD=super-secret-password

In both cases, skosmos can be replaced with whatever you wish to call your repository.

Configure Fuseki

To use Fuseki, set the following environment variable for the sd-skosmos container:

SPARQL_ENDPOINT=http://fuseki-location:3030/skosmos/sparql

And these for the sd-skosmos-loader:

DATABASE_TYPE=fuseki
SPARQL_ENDPOINT=http://fuseki-location:3030/skosmos/sparql
STORE_BASE=http://fuseki-location:3030/skosmos
ADMIN_USERNAME=admin-username
ADMIN_PASSWORD=super-secret-password

In both cases, skosmos can be replaced with whatever you wish to call your repository.

Note that the SPARQL_ENDPOINT should be the same as the one used for the sd-skosmos container, as that is used for generating the Skosmos config. It is added for consistency with the other.

Adding another database

Database connectors can be found in src/database_connectors. The value of DATABASE_TYPE corresponds to one of the Python files in this directory. To add support for another kind of database, create a new file in this directory. It should have a function called create_connector which returns an instance of a class that extends the src.database.DatabaseConnector abstract base class. The existing connectors (graphdb.py and fuseki.py) can be used for reference.

The base class deals with the SPARQL operations used for keeping track of which vocabularies are loaded and when they were last updated. These methods should not need to be changed if the database type correctly implements SPARQL. You will need to create a setup method for creating the Skosmos repository if it doesn't exist yet, and an add_vocabulary method which deals with importing vocabularies from a file. There is a helper method sparql_http_update which can be used if the database supports the SPARQL HTTP API.
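
The shape of such a connector module might look like the sketch below. To keep it self-contained, it defines a stand-in for src.database.DatabaseConnector; the method parameters (graph, file_path, extension) and the in-memory bookkeeping are assumptions for illustration only, so check graphdb.py and fuseki.py for the real signatures.

```python
from abc import ABC, abstractmethod


class DatabaseConnector(ABC):
    """Stand-in for src.database.DatabaseConnector; the real signatures may differ."""

    @abstractmethod
    def setup(self):
        """Create the Skosmos repository if it does not exist yet."""

    @abstractmethod
    def add_vocabulary(self, graph, file_path, extension):
        """Import the vocabulary file into the given graph."""


class InMemoryConnector(DatabaseConnector):
    """Toy connector that only records what it was asked to do."""

    def __init__(self):
        self.repository_created = False
        self.graphs = {}

    def setup(self):
        # A real connector would call the database's admin API here.
        self.repository_created = True

    def add_vocabulary(self, graph, file_path, extension):
        # A real connector would upload the file to the triple store here.
        self.graphs.setdefault(graph, []).append((file_path, extension))


def create_connector():
    """Entry point that the loader looks for in a connector module."""
    return InMemoryConnector()
```

In a real connector the class would extend the base class imported from src.database rather than a local stand-in, and it would be saved under a file name matching the intended DATABASE_TYPE value.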

License

MIT License
