WF Provenance is a JSON Schema designed to describe workflow-level provenance information for waveform digital objects.
It is a core component of the PID-LAND ecosystem and complements the WF Handle schema by providing a structured, machine-actionable description of data lineage, versioning, and processing history.
The schema is intended for public use, automatic validation, and long-term traceability of waveform digital objects.
- Define a standardized provenance model for waveform digital objects
- Enable automatic validation using JSON Schema
- Support FAIR principles, with emphasis on:
  - Reusability
  - Transparency
  - Reproducibility
- Align provenance metadata with:
  - W3C PROV-O
  - Dublin Core / DCTERMS
  - Schema.org
- Integrate seamlessly with:
  - WF Handle
  - PID-based landing services
  - WF Manifest RO-Crate

A WF Provenance document represents:
- a PID-identified digital object
- its workflow history
- the agents, software, and sources involved in its creation
The model is revision-centric:
- each processing step produces a new version
- versions are explicitly linked via `prov:wasRevisionOf`

This enables:
- provenance chaining
- backward navigation of processing steps
- machine reasoning over workflows
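
As a minimal sketch of backward navigation, assume the revisions are stored as a list under `prov:wasRevisionOf` and that a higher `dc:hasVersion` means a later processing step. The helper name `walk_back`, the identifier, and the file names are hypothetical and only serve the illustration:

```python
from typing import Iterator


def walk_back(record: dict) -> Iterator[dict]:
    """Yield revisions from the newest version to the oldest."""
    revisions = record.get("prov:wasRevisionOf", [])
    # Assumption: a higher dc:hasVersion means a later processing step.
    yield from sorted(revisions, key=lambda r: r["dc:hasVersion"], reverse=True)


# Hypothetical record with two processing steps.
record = {
    "dc:identifier": "11099/example",
    "prov:wasRevisionOf": [
        {"dc:hasVersion": 1, "schema:file": {"name": "raw.mseed"}},
        {"dc:hasVersion": 2, "schema:file": {"name": "filtered.mseed"}},
    ],
}

# File names visited while stepping back through the workflow.
history = [rev["schema:file"]["name"] for rev in walk_back(record)]
```

Sorting on the version number rather than relying on list order keeps the traversal stable even if a producer appends revisions out of sequence.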

- Schema type: JSON Schema (Draft 2020-12)
- Main type: `object`
- Extension policy: strict (`additionalProperties: false`)
- Required top-level fields: `@context`, `@type`, `dc:identifier`, `prov:wasRevisionOf`

| Field | Type | Description |
|---|---|---|
| `@context` | object | Prefix mapping (dc, dcterms, prov, schema) |
| `@type` | string | Fixed value: "WF Provenance" |
| `dc:identifier` | string | Persistent identifier of the digital object |
| `dcterms:isPartOf` | string | Optional aggregation or collection |
| `prov:generatedAtTime` | string (date-time) | Provenance record creation time |
| `prov:wasAttributedTo` | string | Responsible agent or organization |
| `prov:usage` | object | Software used during processing |
| `prov:wasRevisionOf` | array | Workflow revisions |

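
The required fields and the strict extension policy can be approximated without a JSON Schema library. The following is a rough sketch in plain Python, not a replacement for validating against the published schema; the function name `check_top_level` and the sample document are hypothetical, and the allowed field set is taken from the top-level field table:

```python
REQUIRED = {"@context", "@type", "dc:identifier", "prov:wasRevisionOf"}
ALLOWED = REQUIRED | {
    "dcterms:isPartOf",
    "prov:generatedAtTime",
    "prov:wasAttributedTo",
    "prov:usage",
}


def check_top_level(doc: dict) -> list:
    """Return human-readable problems; an empty list means the checks passed."""
    keys = set(doc)
    problems = [f"missing required field: {k}" for k in sorted(REQUIRED - keys)]
    # Strict extension policy: any key outside the documented set is rejected.
    problems += [f"unexpected field: {k}" for k in sorted(keys - ALLOWED)]
    if "@type" in doc and doc["@type"] != "WF Provenance":
        problems.append('@type must be "WF Provenance"')
    return problems


# Hypothetical document carrying one undeclared extension field.
doc = {
    "@context": {},
    "@type": "WF Provenance",
    "dc:identifier": "11099/example",
    "prov:wasRevisionOf": [],
    "note": "not part of the schema",
}
problems = check_top_level(doc)
```

Here `problems` contains a single entry for the undeclared `note` key, which mirrors what `additionalProperties: false` is meant to catch.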
Each revision represents one workflow step that generated or modified the data.

| Field | Type | Description |
|---|---|---|
| `dc:hasVersion` | integer | Version number |
| `schema:file` | object | Output file information |
| `prov:wasGeneratedBy` | object | Generation activity |

Workflow context recorded for each revision:

| Field | Type | Description |
|---|---|---|
| `schema:startDate` | string (date-time) | Workflow start time |
| `schema:Organization` | string | Responsible organization |
| `prov:SoftwareAgent` | array (URI) | Software agents involved |
| `dcterms:spatial` | object | Spatial reference (x, y, z) |

Fields of the `schema:file` object:

| Field | Type | Description |
|---|---|---|
| `name` | string | File name |
| `position` | string (URI) | Persistent or resolvable file location |

Fields of the `prov:wasGeneratedBy` object:

| Field | Type | Description |
|---|---|---|
| `prov:hadPrimarySource` | URI | Source dataset |
| `schema:SoftwareApplication` | array (URI) | Software used |
| `schema:Organization` | string | Executing organization |
| `dcterms:accrualPeriodicity` | string | Update frequency |

- JSON Schema
  - Enforces structure, required fields, and data types
  - Prevents uncontrolled extensions
- SHACL
  - Enables semantic and logical validation
  - Useful for temporal consistency and workflow integrity checks

The schema is designed to be safely used in automated validation pipelines.
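
To illustrate the kind of temporal consistency check mentioned for SHACL, here is a plain-Python stand-in (it is not SHACL and does not use an RDF library). The rule itself is an assumption made for this sketch and is not stated by the schema: a revision's `schema:startDate` should not be later than the record's `prov:generatedAtTime`. The function names and the sample values are hypothetical:

```python
from datetime import datetime


def parse_time(value: str) -> datetime:
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def late_revisions(doc: dict) -> list:
    """Return the versions whose start date is after the record creation time."""
    created = parse_time(doc["prov:generatedAtTime"])
    return [
        rev["dc:hasVersion"]
        for rev in doc["prov:wasRevisionOf"]
        # Assumed rule: a workflow step cannot start after the record was created.
        if parse_time(rev["schema:startDate"]) > created
    ]


# Hypothetical record: version 2 starts one day after the record was created.
doc = {
    "prov:generatedAtTime": "2024-04-10T12:00:00Z",
    "prov:wasRevisionOf": [
        {"dc:hasVersion": 1, "schema:startDate": "2024-04-09T00:00:00Z"},
        {"dc:hasVersion": 2, "schema:startDate": "2024-04-11T00:00:00Z"},
    ],
}
bad = late_revisions(doc)
```

A check of this kind would typically run after structural validation has passed, so that the fields it reads are known to exist and to be well-formed date-time strings.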

```json
{
  "@context": {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "prov": "http://www.w3.org/ns/prov#",
    "schema": "http://schema.org/"
  },
  "@type": "WF Provenance",
  "dc:identifier": "11099/6b8414a2-fb66-11f0-b5e4-0242ac120007",
  "prov:generatedAtTime": "2024-04-10T12:00:00Z",
  "prov:wasAttributedTo": "INGV",
  "prov:wasRevisionOf": [
    {
      "dc:hasVersion": 1,
      "schema:startDate": "2024-04-09T00:00:00Z",
      "schema:Organization": "INGV",
      "prov:SoftwareAgent": [
        "https://example.org/software/mseed-processor"
      ],
      "dcterms:spatial": {
        "x": 40.7867,
        "y": 15.9427,
        "z": 690
      },
      "schema:file": {
        "name": "IV.ACER..HNE.D.2024.100",
        "position": "https://hdl.handle.net/11099/data/ACER_HNE_20240409.mseed"
      },
      "prov:wasGeneratedBy": {
        "prov:hadPrimarySource": "https://hdl.handle.net/11099/source/ACER",
        "schema:SoftwareApplication": [
          "https://example.org/software/fdsnws"
        ],
        "schema:Organization": "INGV",
        "dcterms:accrualPeriodicity": "irregular"
      }
    }
  ]
}
```

Relationship with WF Handle

- WF Handle describes what the digital object is
- WF Provenance describes how it was produced

Together they implement a PID-centric, information-centric metadata model supporting FAIR data management, reproducible science, and long-term preservation.