DTS Transformations

This project provides XSLT stylesheets for those endpoints of Distributed Text Services (DTS) that can be implemented generically by evaluating <citeStructure>:

  • navigation endpoint
  • document endpoint

The other endpoints are not targeted by this project, but recommendations are given in the Other Endpoints section.

Status of Implementation

Implemented version: 1.0rc1

Query parameters for the endpoints are supported through stylesheet parameters:

parameter navigation document
resource ³ ³
ref
start
end
down not used
tree
page not used
mediaType not used ¹

Evaluated TEI elements:

element navigation document
<refsDecl>
<citeStructure>
<citeData> ✅² not used

Notes

  1. see section about mediaType
  2. supported, but dcterms do not yet come out as a map as shown in the specification's examples
  3. for mapping of the values of resource to document URIs see the resource section

XSLT for Endpoints

Navigation

The xsl/navigation.xsl XSLT package generates the dts:Navigation JSON-LD object as required by the navigation endpoint. Members are generated by evaluating tei:citeStructure elements in the processed TEI document.

xsl/navigation.xsl can either be applied to a TEI source document, e.g. test/matt.xml:

$SAXON_CMD -config:saxon-local.xml -xsl:xsl/navigation.xsl -s:test/matt.xml

... or it can be called with the default initial template (xsl:initial-template), in which case the source URL is passed as the resource stylesheet parameter:

$SAXON_CMD -config:saxon-local.xml -xsl:xsl/navigation.xsl -it resource=test/matt.xml

When a source document is processed, the resource stylesheet parameter can be used to set the source's URI in multiple properties of the JSON-LD output.

Document

The xsl/document.xsl XSLT package implements the delivery of a TEI document, either in full or in part.

Like xsl/navigation.xsl, xsl/document.xsl can be applied to a source document (where the resource parameter can be used to reset the resource identifier):

$SAXON_CMD -config:saxon-local.xml -xsl:xsl/document.xsl -s:test/matt.xml

... or it can be called with the default initial template:

$SAXON_CMD -config:saxon-local.xml -xsl:xsl/document.xsl -it resource=test/matt.xml

Example output:

$SAXON_CMD -config:saxon.he.xml -xsl:xsl/document.xsl -s:test/john.xml tree=page-hateoas start=p.1 end=p.1.end

This selects the content of the first page of test/john.xml, i.e. the nodes from <pb n="1"/> to the last node before <pb n="2"/>:

<?xml version="1.0" encoding="UTF-8"?><TEI xmlns="http://www.tei-c.org/ns/1.0"><dts:wrapper xmlns:dts="https://w3id.org/api/dts#"><pb n="1"/>
         
            <head>The book of John</head>
            
               <milestone unit="theme" xml:id="creation-start"/>
               <l n="1">In the beginning was the Word, and the Word was with God, and the Word was
                  God.</l>
               <l n="2">He was with God in the beginning.</l>
               <l n="3">Through him all things were made; without him nothing was made that has been
                  made.</l>
               In him was life, and that life was the light</dts:wrapper></TEI>

The output is well-formed and contains the nodes (trees) from the node identified by the start parameter through the node identified by the end parameter. More about cutting out text based on milestone-like markup can be found in the project's Wiki.
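
A small helper can keep page requests like the one above consistent. The following is a minimal sketch and not part of the project: the function name page_args is made up, and it assumes that the page-hateoas tree offers p.N and p.N.end references for every page, which the example above only shows for page 1.

```shell
# Hypothetical helper (not part of dts-transformations): print the stylesheet
# parameters that select page N, following the p.N / p.N.end pattern above.
page_args() {
  printf 'tree=page-hateoas start=p.%s end=p.%s.end\n' "$1" "$1"
}

# Usage, with SAXON_CMD set up as described under "Getting started":
#   $SAXON_CMD -config:saxon.he.xml -xsl:xsl/document.xsl -s:test/john.xml $(page_args 1)
```

The command substitution is left unquoted on purpose, so that the three parameters reach Saxon as separate arguments.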

Getting started

Command Line

If you have Saxon HE at hand, simply use it as follows.

  1. Download the released zip package of the project (available as a release asset) and unpack it:
    unzip dts-transformations-VERSION-package.zip
  2. Set up the class path for Saxon:
    export SAXON_CMD="java -cp ... net.sf.saxon.Transform"
  3. Transform:
    $SAXON_CMD -config:dts-transformations/saxon.he.xml -xsl:dts-transformations/xsl/navigation.xsl -s:YOUR_TEI.xml

Oxygen Framework

You can install the transformations bundled in an Oxygen framework. The framework works on top of the TEI P5 framework, and its transformation scenarios help you write cite structure declarations with <refsDecl> and <citeStructure> elements. To install it, put the following URL into the dialog box in Help > Install new add-ons ....

https://scdh.github.io/dts-transformations/descriptor.xml

There is a detailed installation guide in the Wiki.

Errors may occur on older versions of Oxygen, see Issue 10. Consider installing a plugin with a newer version of Saxon.

Cloning

You can also clone this repo and set up and use its convenient tooling like so:

Setup:

# git clone ...
cd dts-transformations
./mvnw package            # sets up tooling

Besides a wrapper script for Saxon-HE under target/bin/xslt.sh, this also provides you with Apache Jena RIOT under target/bin/riot.sh and the command line interface of Titanium JSON-LD under target/bin/ld-cli.

Transforming:

target/bin/xslt.sh -config:saxon.he.xml -xsl:xsl/navigation.xsl -s:test/matt.xml

Other RDF serializations (e.g. expanded JSON-LD):

target/bin/xslt.sh -config:saxon.he.xml -xsl:xsl/navigation.xsl -s:test/matt.xml | target/bin/ld-cli expand -op

Deployment

To provide DTS endpoints, the XSL transformations from this package need to be deployed in a web service. There are several options, and we will publish a PoC for a deployment very soon.

You can use the initial templates of xsl/navigation.xsl and xsl/document.xsl to fetch the document identified by the resource parameter. You can either use URIs as resource identifiers, or you can override dts:resource-uri#0 from xsl/resource.xsl to map arbitrary resource identifiers to document locations.
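
If you go along with URIs, the web service itself can translate short identifiers into document locations before it calls the initial template. The following is a minimal sketch under stated assumptions: the function name, the DOC_ROOT variable and the flat directory of TEI files are made up for illustration and are not part of the project.

```shell
# Hypothetical mapping done by the web service, not by the stylesheets:
# turn an identifier such as "matt" into a file URI below DOC_ROOT.
DOC_ROOT=${DOC_ROOT:-/srv/tei}

resource_to_uri() {
  # Drop any directory part so that identifiers cannot escape DOC_ROOT.
  id=${1##*/}
  printf 'file://%s/%s.xml\n' "$DOC_ROOT" "$id"
}

# Usage:
#   $SAXON_CMD -config:saxon.he.xml -xsl:xsl/navigation.xsl -it resource="$(resource_to_uri matt)"
```

With this approach the stylesheets only ever see the mapped URI as the value of resource.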

Customization

resource

The value coming in via the resource parameter must be mapped to a document URI (at least when calling the initial template). The mapping function, dts:resource-uri#1, is defined in xsl/resource.xsl and can easily be replaced: use the Saxon configuration file to swap this package for one that suits your needs.

Citation Trees

To add custom citation tree constructions that are not based on <citeStructure>, you can add templates to the citationTrees mode defined in xsl/tree.xsl. The mode is applied for every refsDecl, starting at self::refsDecl, and skips unmatched nodes (shallow-skip).

HTTP Status Codes

To get the HTTP status codes that the DTS specification prescribes for certain errors, use the static parameters in xsl/errors.xsl. They define error codes that a web service can catch and map to the corresponding HTTP status codes.

Please note that the XSLT uses <xsl:assert> in some places. Assertions do not throw errors by default; the XSLT processor must be configured to do so. In Saxon HE, assertions are enabled with the -ea:on command line switch or the /configuration/xslt/@enableAssertions configuration file option.
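
A web service wrapping the transformation could translate a reported error into an HTTP status roughly as follows. This is only a sketch: the strings my:not-found and my:bad-request are placeholders for whatever error codes you configure through the static parameters in xsl/errors.xsl, the function name is made up, and the sketch assumes that the processor's error output contains the code as plain text.

```shell
# Hypothetical mapping from a failed transformation to an HTTP status code.
# $1: exit status of the transformation, $2: captured error output.
# The error code strings are placeholders, not values defined by the project.
http_status_for() {
  if [ "$1" -eq 0 ]; then
    echo 200
    return
  fi
  case "$2" in
    *my:not-found*)   echo 404 ;;
    *my:bad-request*) echo 400 ;;
    *)                echo 500 ;;
  esac
}

# Usage: run with assertions enabled, write the result to a file and
# capture the error output for inspection.
#   err=$($SAXON_CMD -ea:on -config:saxon.he.xml -xsl:xsl/document.xsl \
#         -it resource=test/matt.xml -o:result.xml 2>&1)
#   http_status_for "$?" "$err"
```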

mediaType

Processing of the mediaType parameter is a matter of post-processing the result of applying xsl/document.xsl and is thus left to customization. There are several approaches:

  1. chaining the output of the xsl/document.xsl to another transformation which evaluates the mediaType parameter
  2. importing parts of xsl/document.xsl in a third stylesheet that processes mediaType
  3. compile-time customization of xsl/document.xsl through its static parameters, which determine a media-type-package, its version, and how it is called for processing mediaType

The first option is the most straightforward, but it may have a downside: the source-document context of the nodes will probably be lost during the post-processing phase. The other approaches benefit from the fact that the nodes returned by the two dts:cut-...#1 functions in xsl/document.xsl are still in the context of the source document (node identity). So you can probably reuse your existing stylesheets for producing HTML, plain text, LaTeX, etc., even for parts of your documents.

For the third option, see the example post-proc-(apply|call|fun).xsl packages in the test folder.
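
The first of the approaches listed above, chaining, can be sketched as a shell pipeline. The stylesheet to-html.xsl and the wrapper function are hypothetical; the sketch assumes that the second stylesheet declares a mediaType parameter of its own, and it relies on Saxon reading the source from standard input when given -s:-, which you should check against your Saxon version.

```shell
# Hypothetical wrapper: cut the document with xsl/document.xsl, then hand the
# result to a second, project-specific stylesheet that evaluates mediaType.
# to-html.xsl is a placeholder and not shipped with dts-transformations.
document_as() {
  # $1: TEI source, $2: requested media type.
  # SAXON_CMD is expanded unquoted so that "java -cp ..." splits into words.
  $SAXON_CMD -config:saxon.he.xml -xsl:xsl/document.xsl -s:"$1" \
    | $SAXON_CMD -xsl:to-html.xsl -s:- mediaType="$2"
}

# Usage:
#   document_as test/john.xml text/html
```

As noted above, the second transformation only sees the serialized fragment, not the nodes in their source-document context.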

URI Templates

URI templates, which are required for the output of the dts:Resource LOD object, must of course be adaptable to specific project needs.

The adaptation is done by providing a custom XSLT package to xsl/navigation.xsl through its static parameters uri-template-package and uri-template-package-version. An implementation must expose two functions:

dts:uri-template-map-entries($resource as document-node(), $parameters as map(xs:string, item()*)) as item()*
dts:navigation-uri($resource as document-node(), $parameters as map(xs:string, item()*)) as xs:anyURI?

They receive the resource document and the query parameters for maximum flexibility. The first function must return a sequence of map entries, as produced by <xsl:map-entry>.

The xsl/uri-templates/ folder offers different implementations.

Additional Metadata

xsl/navigation.xsl offers customization points for adding metadata and other LOD properties to the member objects.

  1. The mode member-metadata can be used to add elements to the intermediate <dts:member> elements. The mode is called for each of the source's nodes (forests) selected by a citeStructure/@match. It contains no templates other than the default shallow-skip-like ones.
  2. The function dts:member-metadata-json#1 can be used to access these additional elements in order to output additional LOD properties to the member objects.

@context

The value of the JSON-LD @context property can be configured through the context parameters in xsl/dts.xsl.

JSON-LD Serialization

The order of the JSON-LD output is only guaranteed where order matters, that is, in arrays. The members array is in document order.

The order of object properties does not carry any information and there are no guarantees about it. So the @context property of the root object may occur as the first or the last property or somewhere in the middle.

By default, Saxon's JSON serializer escapes slashes with backslashes. If this matters, first consider configuring the serializer: there is an escape-solidus option.
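
If configuring the serializer is not an option, the escaping can also be undone after the fact, since \/ and / denote the same character in a JSON string. The following is a minimal sketch, not something the project ships: the function name is made up, and it assumes GNU sed, which accepts \n as a newline in the replacement text.

```shell
# Hypothetical post-processing filter: turn the optional JSON escape "\/"
# back into "/" while leaving escaped backslashes ("\\") untouched.
unescape_solidus() {
  # 1. park every "\\" pair as a newline (a line never contains one itself)
  # 2. replace the remaining "\/" sequences by "/"
  # 3. restore the parked "\\" pairs
  sed -e 's/\\\\/\n/g' -e 's#\\/#/#g' -e 's/\n/\\\\/g'
}

# Usage:
#   target/bin/xslt.sh -config:saxon.he.xml -xsl:xsl/navigation.xsl -s:test/matt.xml | unescape_solidus
```

Parking the backslash pairs first keeps a string such as "a\\/b" (a backslash followed by a slash) from being altered.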

Other Endpoints

The collection endpoint is not targeted by this project. We recommend first extracting an RDF-based knowledge graph from your set of documents using xtriples-micro and then using SPARQL and JSON-LD Framing to generate the collection objects from it. We have documented this approach in the xtriples-micro Wiki.

The entry endpoint and its use of URI templates is really the killer feature of DTS. Do not underestimate it! It even allows you to have different base URLs for the different endpoints, and it can serve as an extensible service registry for your edition. Imagine serving collection from a static web server like GitHub Pages and having a single generic service for navigation and document under a different base URL, serving these endpoints for multiple editions or even a whole community.

There is no generic solution for the entry endpoint.

Contributing

Contributions of all kinds are welcome. Please see the contributing guide.

There's also a Wiki, which thrives on community content.

License

MIT