
meaningfy-ws/eurovoc-pipelines


Content improvement implementation

The following describes the transformation processes for SRC-AP assets that are edited in VocBench and shall be published as SKOS-AP-EU in Cellar. The current implementation of the transformation pipelines uses Linked Pipes ETL (https://etl.linkedpipes.com/). The transformation processes aim to clean up the Authority Tables (AT) content, using a well-established framework to resolve semantic dissonances, redundancies, overlaps, and other types of issues.
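To illustrate what resolving a redundancy can look like, the sketch below handles one case. The SKOS Reference allows at most one `skos:prefLabel` per language tag on a resource, so any surplus preferred labels are demoted to `skos:altLabel`. This is a hypothetical example, not one of the transformations in this repository. The in-memory triple layout and the rule of keeping the alphabetically first label are assumptions made for brevity.

```python
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"
PREF_LABEL = SKOS + "prefLabel"
ALT_LABEL = SKOS + "altLabel"

# Hypothetical layout: (concept IRI, predicate IRI, (lexical form, language tag)).
Label = tuple[str, str, tuple[str, str]]


def resolve_duplicate_pref_labels(triples: list[Label]) -> list[Label]:
    """Keep one skos:prefLabel per concept and language; demote the rest to skos:altLabel."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    result: list[Label] = []
    for subject, predicate, (text, lang) in triples:
        if predicate == PREF_LABEL:
            groups[(subject, lang)].append(text)
        else:
            # Non-prefLabel triples pass through unchanged.
            result.append((subject, predicate, (text, lang)))
    for (subject, lang), texts in groups.items():
        # Assumption: the alphabetically first label wins, which keeps the output deterministic.
        keep, *extra = sorted(set(texts))
        result.append((subject, PREF_LABEL, (keep, lang)))
        result.extend((subject, ALT_LABEL, (text, lang)) for text in extra)
    return result
```

In a real pipeline, a rule like this would more likely be written as a SPARQL update or CONSTRUCT query over the triplestore. The Python form only makes the logic easy to follow.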

Folders:

  • SAI_W.P2.1_pipelines contains the pipelines with all the transformations executed to simplify the Authority Tables.
  • Old_Pipeline contains the processes developed for the previous EuroVoc publication process.

Input:

  • SRC-AP files for Corporate Body, Corporate Body Classification, Country, Membership Classification, Site, Language and Place Authority Tables.

Output goals:

  • To generate simplified and clean SRC-AP files.

ETL pipelines description

We developed a structured approach to implement the recommended actions based on the findings, and executed transformations on the ATs in their SRC-AP representation, as follows:

  • Extract the data from a GraphDB RDF database (triplestore), where the SRC-AP files are uploaded.
  • Apply transformations within the pipeline to convert the source data into the desired target representation.
  • Load the processed data back into the triplestore in a new target environment, ensuring that the refined data is readily available for use.
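The three steps above can be sketched in Python using only the standard library. This is a minimal sketch and not the repository's implementation, which runs in Linked Pipes ETL. It assumes GraphDB's RDF4J-compatible REST API on the default port 7200. The repository IDs `src-ap-source` and `src-ap-target` and the filtering CONSTRUCT query are hypothetical placeholders. Extraction and transformation are combined into a single SPARQL CONSTRUCT request. Loading POSTs the resulting Turtle to the target repository's `statements` endpoint, which adds data without replacing what is already there.

```python
import urllib.parse
import urllib.request

# Assumption: GraphDB runs locally on its default port; both repository IDs
# are hypothetical placeholders, not names used by this repository.
GRAPHDB_URL = "http://localhost:7200"
SOURCE_REPO = "src-ap-source"
TARGET_REPO = "src-ap-target"

# Hypothetical transformation: copy every triple except skos:hiddenLabel.
# The real pipelines define their own transformations in Linked Pipes ETL.
CONSTRUCT_QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
CONSTRUCT { ?s ?p ?o }
WHERE { ?s ?p ?o FILTER (?p != skos:hiddenLabel) }
"""


def extract_request(query: str) -> urllib.request.Request:
    """Extract + transform: run a SPARQL CONSTRUCT against the source repository."""
    url = (f"{GRAPHDB_URL}/repositories/{SOURCE_REPO}?"
           + urllib.parse.urlencode({"query": query}))
    return urllib.request.Request(url, headers={"Accept": "text/turtle"}, method="GET")


def load_request(turtle: bytes) -> urllib.request.Request:
    """Load: add the transformed Turtle to the target repository."""
    url = f"{GRAPHDB_URL}/repositories/{TARGET_REPO}/statements"
    return urllib.request.Request(url, data=turtle,
                                  headers={"Content-Type": "text/turtle"},
                                  method="POST")


def run_pipeline() -> None:
    """Send both requests in sequence; requires a reachable GraphDB instance."""
    with urllib.request.urlopen(extract_request(CONSTRUCT_QUERY)) as response:
        transformed = response.read()
    with urllib.request.urlopen(load_request(transformed)) as response:
        print("Load finished with HTTP status", response.status)
```

Calling `run_pipeline()` sends both requests and therefore needs a running GraphDB instance. Because `extract_request` and `load_request` only build the requests, each step can be inspected without a network connection.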

About

Pipelines for Publishing EuroVoc
