ShEx‐based representation

Ángel Iglesias Préstamo edited this page Oct 19, 2023 · 3 revisions

The model in a nutshell

ShEx-based representation introduces the notion of a Schema, which improves both the scalability and the processing speed of the system, because the size of the resulting arrays is known before the dump is processed. Without a Schema, the system has to process the whole dump and, for an SPO orientation, create as many arrays as there are distinct Subjects, each with as many rows as unique Predicates and as many columns as unique Objects. With a Schema, there is one array per Shape defined in the Schema; each array has as many rows as there are Subjects of that Shape and as many columns as Predicates (the fields of the Shape), and each cell stores the value of the Object.
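
To make the size argument concrete, the following Python sketch counts the array dimensions in both cases. The function names and the input format (triples as plain tuples, a dict mapping each Shape to its predicates, and a dict with the number of Subjects per Shape) are assumptions made for illustration; they are not part of ShEx-RemoteHDT.

```python
def dims_without_schema(triples):
    """SPO orientation: (arrays, rows, columns) = (subjects, predicates, objects)."""
    subjects = {s for s, _, _ in triples}
    predicates = {p for _, p, _ in triples}
    objects = {o for _, _, o in triples}
    return len(subjects), len(predicates), len(objects)


def dims_with_schema(schema, subjects_per_shape):
    """One array per Shape: rows = Subjects of that Shape, columns = its predicates."""
    return {
        shape: (subjects_per_shape.get(shape, 0), len(predicates))
        for shape, predicates in schema.items()
    }


# Hypothetical toy data, only meant to exercise the two functions.
triples = [
    ("s1", "p1", "o1"),
    ("s1", "p2", "o2"),
    ("s2", "p1", "o3"),
]
schema = {"ShapeA": ["p1", "p2"]}
subjects_per_shape = {"ShapeA": 2}

no_schema = dims_without_schema(triples)
with_schema = dims_with_schema(schema, subjects_per_shape)
```

In this toy case the schema-less layout needs 2 x 2 x 3 = 12 cells, whereas the Schema-based layout needs a single 2 x 2 array. The number of columns comes from the Schema alone, without reading any triple.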

Example

Given the following RDF dataset, serialized in Turtle, let us see how it is stored in ShEx-RemoteHDT.

PREFIX :        <http://example.org/>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>

:alan       :instanceOf       :human                  ;
            :placeOfBirth     :warrington             ;
            :placeOfDeath     :wilmslow               ;
            :dateOfBirth      "1912-06-23"^^xsd:date  ;
            :employer         :GCHQ                   .

:warrington :country          :uk                     .

:wilmslow   :country          :uk                     ;
            :instanceOf       :town                   .

:bombe      :discoverer       :alan                   ;
            :instanceOf       :computer               ;
            :manufacturer     :GCHQ                   .

We also have the following ShEx Schema defined:

prefix :    <http://example.org/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

:Person {
  :placeOfBirth     @:Place        ;
  :dateOfBirth      xsd:date       ;
  :employer         @:Organization ;
}

:Place {
  :country @:Country
}

:Country {}
:Organization {}

Lastly, a three-dimensional matrix is created in which, for each Triple in the RDF dump above, the value of the Object is placed at its corresponding (x, y, z) position, where x is the integer representing the Shape, y stands for the Subject, and z for the Predicate. Triples that do not conform to any Shape are stored using the usual representation. Schema validation mechanisms could be used to improve this further.

[[[{array: 1, row: 0}, "1912-06-23"^^xsd:date, {array: 3, row: 0}]], // --> Person Shape

 [[{array: 2, row: 0}]], // --> Place Shape

 [["http://example.org/uk"]], // --> Country Shape

 [["http://example.org/GCHQ"]]] // --> Organization Shape
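
The matrix above can be reproduced with a short Python sketch. The dictionary-based schema, the `shape_map` that assigns nodes to Shapes, and the `build_matrix` helper are assumptions made for illustration and do not reflect the actual ShEx-RemoteHDT implementation. In particular, storing the node IRI in Shapes without fields simply mirrors the Country and Organization entries of the example.

```python
EX = "http://example.org/"

# Shape -> ordered fields: (predicate, referenced Shape, or None for a literal).
schema = {
    "Person": [("placeOfBirth", "Place"), ("dateOfBirth", None),
               ("employer", "Organization")],
    "Place": [("country", "Country")],
    "Country": [],
    "Organization": [],
}

# Assumed assignment of nodes to Shapes (a hypothetical shape map).
shape_map = {"alan": "Person", "warrington": "Place",
             "uk": "Country", "GCHQ": "Organization"}

triples = [
    ("alan", "instanceOf", "human"),
    ("alan", "placeOfBirth", "warrington"),
    ("alan", "placeOfDeath", "wilmslow"),
    ("alan", "dateOfBirth", '"1912-06-23"^^xsd:date'),
    ("alan", "employer", "GCHQ"),
    ("warrington", "country", "uk"),
    ("wilmslow", "country", "uk"),
    ("wilmslow", "instanceOf", "town"),
    ("bombe", "discoverer", "alan"),
    ("bombe", "instanceOf", "computer"),
    ("bombe", "manufacturer", "GCHQ"),
]


def build_matrix(schema, shape_map, triples):
    shape_index = {shape: i for i, shape in enumerate(schema)}

    # Row of every node inside the array of its Shape.
    row_index, rows_per_shape = {}, {shape: 0 for shape in schema}
    for node, shape in shape_map.items():
        row_index[node] = rows_per_shape[shape]
        rows_per_shape[shape] += 1

    # Sizes are known up front, so every array can be pre-allocated.
    matrix = [
        [[None] * max(1, len(fields)) for _ in range(rows_per_shape[shape])]
        for shape, fields in schema.items()
    ]

    # Shapes without fields keep the IRI of the node itself.
    for node, shape in shape_map.items():
        if not schema[shape]:
            matrix[shape_index[shape]][row_index[node]][0] = EX + node

    leftovers = []  # Triples that fall back to the usual representation.
    for s, p, o in triples:
        shape = shape_map.get(s)
        columns = {pred: (col, ref)
                   for col, (pred, ref) in enumerate(schema.get(shape, []))}
        if p not in columns:
            leftovers.append((s, p, o))
            continue
        col, ref = columns[p]
        if ref is None:
            value = o  # Literal value stored directly in the cell.
        elif shape_map.get(o) == ref:
            value = {"array": shape_index[ref], "row": row_index[o]}
        else:
            leftovers.append((s, p, o))
            continue
        matrix[shape_index[shape]][row_index[s]][col] = value
    return matrix, leftovers


matrix, leftovers = build_matrix(schema, shape_map, triples)
```

Here `matrix[0]` holds the Person array and `leftovers` collects the Triples that match no field of any Shape, such as those about `:bombe`.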

Constraints

  • We will work with properties that are not repeated.
  • For the time being, the system accepts only 1-1 cardinalities.
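
Whether a dataset respects these constraints can be checked before loading it. The Python sketch below is one possible way to do so, assuming triples are available as plain tuples; `repeated_properties` is a hypothetical helper, not a function of ShEx-RemoteHDT.

```python
from collections import Counter


def repeated_properties(triples):
    """Return the (subject, predicate) pairs that occur more than once."""
    counts = Counter((s, p) for s, p, _ in triples)
    return sorted(pair for pair, n in counts.items() if n > 1)


triples = [
    ("alan", "employer", "GCHQ"),
    ("alan", "employer", "otherOrg"),  # hypothetical second value, breaks 1-1
    ("alan", "placeOfBirth", "warrington"),
]
violations = repeated_properties(triples)
```

An empty result means every property appears at most once per Subject, so each value fits in a single cell.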