Linked Data Support (LD) in pygeoapi #2176

@doublebyte1

This issue discusses the current LD support in pygeoapi, and whether and how it could be improved.

Current State

Currently, pygeoapi supports both a JSON and a JSON-LD encoding. The JSON-LD encoding injects @context, @type and @id into several of its resources.

This is an example of an Item Page: https://demo.pygeoapi.io/master/collections/obs/items/371?f=jsonld
The item page also adds a GeoSPARQL (gsp) namespace prefix.

Limitations / Issues

The default vocabulary, bound to the schema prefix, is https://schema.org/ and the default gsp prefix is http://www.opengis.net/ont/geosparql#. Properties of the features can be linked to terms from these vocabularies.

It is possible to add other vocabularies, but not to override the default vocabulary. The way multiple vocabularies are handled can cause conflicts with unexpected results. As an example, in this pygeoapi config we define a datetime property based on DCAT, reusing the schema prefix:

    linked-data:
        context:
            - schema: https://www.w3.org/ns/dcat#
              stn_id: schema:identifier
              datetime:
                "@id": schema:startDate
                "@type": schema:DateTime

This is a snippet of the resulting JSON-LD definition of a feature:

{
  "@context": [
    {
      "schema": "https://schema.org/",
      "gsp": "http://www.opengis.net/ont/geosparql#",
      "type": "@type"
    },
    {
      "schema": "https://www.w3.org/ns/dcat#",
      "stn_id": "schema:identifier",
      "datetime": {
        "@id": "schema:startDate",
        "@type": "schema:DateTime"
      }
    }
  ],
  "type": "schema:Place",
  "id": 371,
  "linked_data": {
    "context": [
      {
        "schema": "https://www.w3.org/ns/dcat#",
        "stn_id": "schema:identifier",
        "datetime": {
          "@id": "schema:startDate",
          "@type": "schema:DateTime"
        }
      }
    ]
  },
  "stn_id": 35,
  "datetime": "2001-10-30T14:24:55Z",
  "value": 89.9,

Both objects within @context define the schema prefix, and the last definition takes precedence. As a result, schema:Place resolves to https://www.w3.org/ns/dcat#Place rather than https://schema.org/Place. Since dcat#Place does not exist, that link is broken. Because the schema prefix no longer points to schema.org, the page will also not be parsed correctly by the Google search engine crawler.

This could be fixed by using a different prefix for each vocabulary within the @context. These behaviours should also be explained more clearly in the documentation.
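The prefix clash described above can be reproduced with a small sketch. It is a deliberately simplified model of JSON-LD context processing (later objects in the @context array override earlier term definitions), not pygeoapi code or a full JSON-LD processor. The helper names merge_contexts and expand are hypothetical, and the dcat prefix in the second context is one possible way of applying the proposed fix.

```python
def merge_contexts(contexts):
    """Merge JSON-LD context objects in order; later definitions win."""
    merged = {}
    for ctx in contexts:
        merged.update(ctx)
    return merged


def expand(term, context):
    """Expand a compact IRI such as 'schema:Place' using a merged context."""
    prefix, sep, suffix = term.partition(":")
    base = context.get(prefix)
    if sep and isinstance(base, str):
        return base + suffix
    return term  # unknown prefix: leave the term untouched


# Default context injected by pygeoapi (taken from the snippet above).
default_ctx = {
    "schema": "https://schema.org/",
    "gsp": "http://www.opengis.net/ont/geosparql#",
    "type": "@type",
}

# User context that reuses the 'schema' prefix for DCAT (the problematic case).
clashing_ctx = {"schema": "https://www.w3.org/ns/dcat#"}

# Assumed fix: give DCAT its own prefix so 'schema' keeps pointing at schema.org.
distinct_ctx = {"dcat": "https://www.w3.org/ns/dcat#"}

clashing = merge_contexts([default_ctx, clashing_ctx])
fixed = merge_contexts([default_ctx, distinct_ctx])

print(expand("schema:Place", clashing))
print(expand("schema:Place", fixed))
print(expand("dcat:startDate", fixed))
```

With the clashing context, schema:Place expands into the DCAT namespace; with distinct prefixes, both vocabularies remain usable side by side.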

Another objective, widely discussed in terms of the visibility of SDIs, is to provide structured data to Google that could produce rich search results. This objective is articulated in the pygeoapi documentation. However, after testing the landing page and other resources with Google tools, the tools report that no items are detected. This usually means that the JSON-LD is syntactically correct, but does not match any of the specific rich result types that Google Search currently supports.

Verbatim JSON-LD injection feature #2171

This PR enables injecting JSON-LD context into both JSON and JSON-LD Features and STAC items:

  • It only acts upon those specific resources.
  • It does not update the current LD functionality; when it is switched on the current LD functionality is switched off, which means we can only have one approach or the other.
  • It affects both the JSON and JSON-LD encodings, which become equal.

This approach works differently than the current approach:

  • It supports only one context, which replaces the default one. The vocabularies, as well as the types are all defined in this context.
  • It allows defining a SPARQL fallback endpoint within the linked_data structure.
  • It supports replacing the id field with the URL of the item.

As an example, a context is injected into the feature defined in this snippet:

  {
    "@context": [
      "https://ogcincubator.github.io/bblocks-examples/build/annotated/bbr/examples/observation/vectorObservationFeature/context.jsonld"
    ],
    "id": "https://defs-dev.opengis.net/bblocks-pygeoapi/collections/ogc.bbr.examples.observation.vectorObservationFeature/items/vector-obs-1",
    "type": "Feature",
    "geometry": {
      "type": "LineString",
      "coordinates": [
        [
          -111.67183507997295,
          40.056709946862874
        ],
        [
          -111.71,
          40.156709946862875
        ]
      ]
    },

The linked_data structure is defined like this, reflecting all the options:

"linked_data": {
  "inject_verbatim_context": true,
  "replace_id_field": "id",
  "context": [
    "https://ogcincubator.github.io/bblocks-examples/build/annotated/bbr/examples/observation/vectorObservationFeature/context.jsonld"
  ],
  "fallback_sparql_endpoint": "https://defs-hosted.opengis.net/fuseki-hosted/query"
}

And this is the pygeoapi config that produced this result:

linked-data:
  inject_verbatim_context: true
  replace_id_field: id
  context:
  - https://ogcincubator.github.io/bblocks-examples/build/annotated/bbr/examples/observation/vectorObservationFeature/context.jsonld
  fallback_sparql_endpoint: https://defs-hosted.opengis.net/fuseki-hosted/query
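To make the effect of these options concrete, here is a minimal sketch of how a verbatim injection step could behave. It is an illustration under assumptions, not the implementation in the PR: the function name inject_verbatim_context, its signature and the example item URL are hypothetical; only the option names come from the config above.

```python
import copy


def inject_verbatim_context(feature, linked_data, item_url):
    """Return a copy of `feature` with the configured context injected.

    Hypothetical helper: mirrors the options shown in the config, not the
    actual pygeoapi code.
    """
    out = copy.deepcopy(feature)
    if not linked_data.get("inject_verbatim_context"):
        return out  # feature is returned unchanged when the option is off

    # The single configured context replaces any existing one.
    out.pop("@context", None)
    result = {"@context": list(linked_data.get("context", []))}

    # Optionally replace the id field with the URL of the item.
    id_field = linked_data.get("replace_id_field")
    if id_field and id_field in out:
        out[id_field] = item_url

    result.update(out)
    return result


linked_data = {
    "inject_verbatim_context": True,
    "replace_id_field": "id",
    "context": ["https://example.org/context.jsonld"],  # placeholder URL
}
feature = {"id": "vector-obs-1", "type": "Feature", "properties": {}}

result = inject_verbatim_context(
    feature, linked_data, "https://example.org/collections/obs/items/vector-obs-1"
)
print(result)
```

The original feature is left untouched, so the same function could serve both the JSON and JSON-LD encodings, which, as noted above, become equal under this approach.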

This approach seems more powerful than the current one, as it moves the context into a separate file where everything can be defined, including the use of multiple vocabularies. In this way, the JSON-LD injected into the resource is restricted to a single @context entry.

On the downside, it requires the user to create a context file, even in the simple case where they just want to provide schema.org annotations. However, this task could be made easier by reusing the existing OGC building blocks, provided this is explained thoroughly in the documentation.
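For the simple schema.org case, the context file can be very small. The sketch below generates one for the stn_id and datetime properties used earlier in this issue. The helper name build_schema_org_context is hypothetical, and the property-to-term mapping is only an example; where the resulting file would be hosted is left open.

```python
import json


def build_schema_org_context(property_map):
    """Build a minimal JSON-LD context document that maps feature
    properties to schema.org terms (hypothetical helper)."""
    context = {"schema": "https://schema.org/"}
    for name, term in property_map.items():
        context[name] = f"schema:{term}"
    return {"@context": context}


# Example mapping, reusing the properties from the config shown earlier.
doc = build_schema_org_context({"stn_id": "identifier", "datetime": "startDate"})
text = json.dumps(doc, indent=2)
print(text)
```

The printed document could be saved as a .jsonld file and referenced from the context list in the linked-data config.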
