INGV/wf-provenance

WF Provenance Schema

WF Provenance is a JSON Schema designed to describe workflow-level provenance information for waveform digital objects.

It is a core component of the PID-LAND ecosystem and complements the WF Handle schema by providing a structured, machine-actionable description of data lineage, versioning, and processing history.

The schema is intended for public use, automatic validation, and long-term traceability of waveform digital objects.


Purpose

  • Define a standardized provenance model for waveform digital objects
  • Enable automatic validation using JSON Schema
  • Support FAIR principles, with emphasis on:
    • Reusability
    • Transparency
    • Reproducibility
  • Align provenance metadata with:
    • W3C PROV-O
    • Dublin Core / DCTERMS
    • Schema.org
  • Integrate seamlessly with:
    • WF Handle
    • PID-based landing services
    • WF Manifest RO-Crate

Conceptual Model

A WF Provenance document represents:

  • a PID-identified digital object
  • its workflow history
  • the agents, software, and sources involved in its creation

The model is revision-centric:

  • each processing step produces a new version
  • versions are explicitly linked via prov:wasRevisionOf

This enables:

  • provenance chaining
  • backward navigation of processing steps
  • machine reasoning over workflows
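Backward navigation can be illustrated with a minimal Python sketch. It assumes the document has already been loaded into a dict and that a higher dc:hasVersion means a later processing step; the function name, the identifier, and the file names are hypothetical and not part of the schema.

```python
def revisions_newest_first(doc):
    """Return the revision objects ordered from latest to earliest version."""
    revisions = doc.get("prov:wasRevisionOf", [])
    return sorted(revisions, key=lambda rev: rev["dc:hasVersion"], reverse=True)


# Hypothetical two-step history, reduced to the fields needed here.
doc = {
    "dc:identifier": "11099/example-pid",
    "prov:wasRevisionOf": [
        {"dc:hasVersion": 1, "schema:file": {"name": "raw.mseed"}},
        {"dc:hasVersion": 2, "schema:file": {"name": "filtered.mseed"}},
    ],
}

# Walk the history backwards, from the latest step to the first one.
for rev in revisions_newest_first(doc):
    print(rev["dc:hasVersion"], rev["schema:file"]["name"])
```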

Format and Structure

  • Schema type: JSON Schema (Draft 2020-12)
  • Main type: object
  • Extension policy: strict (additionalProperties: false)
  • Required top-level fields:
    • @context
    • @type
    • dc:identifier
    • prov:wasRevisionOf
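The effect of the required fields and the strict extension policy can be approximated by hand. The sketch below is not the JSON Schema itself and does not replace a real validator; it only mirrors the two top-level rules, using the field names listed under Top-level Properties. The function name and the sample identifier are hypothetical.

```python
REQUIRED = {"@context", "@type", "dc:identifier", "prov:wasRevisionOf"}
# Optional fields, as listed in the Top-level Properties table.
ALLOWED = REQUIRED | {
    "dcterms:isPartOf",
    "prov:generatedAtTime",
    "prov:wasAttributedTo",
    "prov:usage",
}


def check_top_level(doc):
    """Return a list of problems; an empty list means both rules hold."""
    problems = []
    for name in sorted(REQUIRED - set(doc)):
        problems.append(f"missing required field: {name}")
    # additionalProperties: false means unknown keys are rejected.
    for name in sorted(set(doc) - ALLOWED):
        problems.append(f"unexpected field: {name}")
    return problems


candidate = {
    "@context": {},
    "@type": "WF Provenance",
    "dc:identifier": "11099/example-pid",  # hypothetical identifier
    "comment": "not defined by the schema",
}
print(check_top_level(candidate))
```

For this candidate the check reports the missing prov:wasRevisionOf field and the unexpected comment field.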

Top-level Properties

| Field | Type | Description |
|---|---|---|
| @context | object | Prefix mapping (dc, dcterms, prov, schema) |
| @type | string | Fixed value: "WF Provenance" |
| dc:identifier | string | Persistent identifier of the digital object |
| dcterms:isPartOf | string | Optional aggregation or collection |
| prov:generatedAtTime | string (date-time) | Provenance record creation time |
| prov:wasAttributedTo | string | Responsible agent or organization |
| prov:usage | object | Software used during processing |
| prov:wasRevisionOf | array | Workflow revisions |

Revision Object (prov:wasRevisionOf)

Each revision represents one workflow step that generated or modified the data.

Required fields

| Field | Type | Description |
|---|---|---|
| dc:hasVersion | integer | Version number |
| schema:file | object | Output file information |
| prov:wasGeneratedBy | object | Generation activity |

Optional contextual fields

| Field | Type | Description |
|---|---|---|
| schema:startDate | string (date-time) | Workflow start time |
| schema:Organization | string | Responsible organization |
| prov:SoftwareAgent | array (URI) | Software agents involved |
| dcterms:spatial | object | Spatial reference (x, y, z) |

File Description (schema:file)

| Field | Type | Description |
|---|---|---|
| name | string | File name |
| position | string (URI) | Persistent or resolvable file location |

Generation Activity (prov:wasGeneratedBy)

| Field | Type | Description |
|---|---|---|
| prov:hadPrimarySource | URI | Source dataset |
| schema:SoftwareApplication | array (URI) | Software used |
| schema:Organization | string | Executing organization |
| dcterms:accrualPeriodicity | string | Update frequency |
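As an illustration of how a revision object is assembled from these parts, the following sketch appends a new revision with the three required fields. Numbering versions by incrementing the highest existing one is an assumption made here, not a rule stated by the schema; the function name, file names, and example.org URLs are hypothetical.

```python
def append_revision(doc, file_name, file_position, generated_by):
    """Append a revision holding the three required fields and return it."""
    revisions = doc.setdefault("prov:wasRevisionOf", [])
    # Assumption: the next version is the highest existing version plus one.
    next_version = max((r["dc:hasVersion"] for r in revisions), default=0) + 1
    revision = {
        "dc:hasVersion": next_version,
        "schema:file": {"name": file_name, "position": file_position},
        "prov:wasGeneratedBy": dict(generated_by),
    }
    revisions.append(revision)
    return revision


doc = {"prov:wasRevisionOf": []}
activity = {"prov:hadPrimarySource": "https://example.org/source"}  # hypothetical
append_revision(doc, "raw.mseed", "https://example.org/raw.mseed", activity)
second = append_revision(doc, "clean.mseed", "https://example.org/clean.mseed", activity)
print(second["dc:hasVersion"])
```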

Validation

  • JSON Schema
    • Enforces structure, required fields, and data types
    • Prevents uncontrolled extensions
  • SHACL
    • Enables semantic and logical validation
    • Useful for temporal consistency and workflow integrity checks

The schema is designed to be safely used in automated validation pipelines.
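To give a sense of what a temporal consistency check might look like, here is a small Python sketch of one possible rule: a revision's schema:startDate should not be later than the record's prov:generatedAtTime. The rule itself is an assumption chosen for illustration and is not taken from the project's SHACL shapes; the function names are hypothetical.

```python
from datetime import datetime


def parse_time(value):
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def late_revisions(doc):
    """Return the version numbers whose start time is after the record time."""
    if "prov:generatedAtTime" not in doc:
        return []
    recorded = parse_time(doc["prov:generatedAtTime"])
    late = []
    for rev in doc.get("prov:wasRevisionOf", []):
        start = rev.get("schema:startDate")
        if start is not None and parse_time(start) > recorded:
            late.append(rev["dc:hasVersion"])
    return late


doc = {
    "prov:generatedAtTime": "2024-04-10T12:00:00Z",
    "prov:wasRevisionOf": [
        {"dc:hasVersion": 1, "schema:startDate": "2024-04-09T00:00:00Z"},
        {"dc:hasVersion": 2, "schema:startDate": "2024-04-11T00:00:00Z"},
    ],
}
print(late_revisions(doc))
```

Only version 2 is reported, because its start time falls after the time at which the record was generated.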


Example JSON

{
  "@context": {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "prov": "http://www.w3.org/ns/prov#",
    "schema": "http://schema.org/"
  },
  "@type": "WF Provenance",
  "dc:identifier": "11099/6b8414a2-fb66-11f0-b5e4-0242ac120007",
  "prov:generatedAtTime": "2024-04-10T12:00:00Z",
  "prov:wasAttributedTo": "INGV",
  "prov:wasRevisionOf": [
    {
      "dc:hasVersion": 1,
      "schema:startDate": "2024-04-09T00:00:00Z",
      "schema:Organization": "INGV",
      "prov:SoftwareAgent": [
        "https://example.org/software/mseed-processor"
      ],
      "dcterms:spatial": {
        "x": 40.7867,
        "y": 15.9427,
        "z": 690
      },
      "schema:file": {
        "name": "IV.ACER..HNE.D.2024.100",
        "position": "https://hdl.handle.net/11099/data/ACER_HNE_20240409.mseed"
      },
      "prov:wasGeneratedBy": {
        "prov:hadPrimarySource": "https://hdl.handle.net/11099/source/ACER",
        "schema:SoftwareApplication": [
          "https://example.org/software/fdsnws"
        ],
        "schema:Organization": "INGV",
        "dcterms:accrualPeriodicity": "irregular"
      }
    }
  ]
}
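The @context block maps each prefix to a namespace, so every prefixed key in the example corresponds to a full IRI. The sketch below shows that expansion in plain Python. It is a simplification that only handles the prefix:name form and is not a JSON-LD processor; the function name is hypothetical.

```python
def expand_term(term, context):
    """Expand a prefixed name such as 'dc:identifier' using the prefix mapping."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    # Keywords such as "@type" and unknown prefixes are returned unchanged.
    return term


context = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "prov": "http://www.w3.org/ns/prov#",
    "schema": "http://schema.org/",
}

print(expand_term("prov:wasRevisionOf", context))
# prints http://www.w3.org/ns/prov#wasRevisionOf
```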

Relationship with WF Handle

  • WF Handle describes what the digital object is
  • WF Provenance describes how it was produced

Together they implement a PID-centric, information-centric metadata model supporting FAIR data management, reproducible science, and long-term preservation.
