File renamed without changes.
Binary file added XAS-XDI-CDIFImplementation.xlsx
Binary file not shown.
Binary file added XDI-CDIF-Mapping.xlsx
Binary file not shown.
121 changes: 121 additions & 0 deletions XDIVariablesInCDIF.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
Mapping dataset variables for XAS data from XDI to CDIF Schema.org. Updated for schema revisions 2025-11-20.

In XDI, variables are named in the #Column.n rows in the header and appear as column headings in a comment row just before the data section. In the CDIF metadata record, these variables are documented in three places:
1. as concepts. In the long run, the intention is that there would be an external, web-accessible vocabulary service such that these would be accessible by reference. Lacking that, for the time being, these become SKOS concepts as separate graph nodes in the schema.org serialization. I've been creating a skos:ConceptScheme and including the variable concepts in that scheme. The scheme only needs to be generated once in the document, and will always be the same.
Example:
concept scheme definition
{
"@id": "#xasDict",
"@type": "skos:ConceptScheme",
"dcterms:title": "X-Ray Absorption Spectroscopy Dictionary",
"dcterms:description": "A SKOS vocabulary of X-ray Absorption Spectroscopy metadata."
}
Typical concept definition. This is one that's in the XAS vocabulary we defined:
{
"@id": "xas:monochromatorEnergyConcept",
"@type": "skos:Concept",
"skos:inScheme": {"@id": "#xasDict"},
"skos:prefLabel": "Monochromator Energy",
"skos:definition": "photon energy selected by the X-ray monochromator and delivered to the sample during an absorption scan; the incident X-ray photon energy impinging on the sample at any given point in the scan. This is the independent variable in an XAS scan. (ChatGPT)"
}

For variables not defined in the vocabulary, use this nil URI in place of a concept reference: http://www.opengis.net/def/nil/OGC/0/missing
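
A minimal sketch in Python of how these concept nodes could be generated. The function names (concept_scheme, concept_node, concept_ref) and the KNOWN_CONCEPTS lookup are hypothetical and not part of any existing tool; only the JSON keys, the scheme text, and the nil URI are taken from the examples above.

```python
NIL_MISSING = "http://www.opengis.net/def/nil/OGC/0/missing"
SCHEME_ID = "#xasDict"


def concept_scheme():
    """Return the skos:ConceptScheme node; it is the same in every document."""
    return {
        "@id": SCHEME_ID,
        "@type": "skos:ConceptScheme",
        "dcterms:title": "X-Ray Absorption Spectroscopy Dictionary",
        "dcterms:description": "A SKOS vocabulary of X-ray Absorption Spectroscopy metadata.",
    }


def concept_node(concept_id, pref_label, definition):
    """Return one skos:Concept node that belongs to the scheme above."""
    return {
        "@id": concept_id,
        "@type": "skos:Concept",
        "skos:inScheme": {"@id": SCHEME_ID},
        "skos:prefLabel": pref_label,
        "skos:definition": definition,
    }


# Hypothetical lookup from XDI column name to concept @id (assumption:
# in practice this would be filled from the XAS vocabulary).
KNOWN_CONCEPTS = {"energy": "xas:monochromatorEnergyConcept"}


def concept_ref(column_name):
    """Return the concept @id for a column, or the nil URI if it is not defined."""
    return KNOWN_CONCEPTS.get(column_name, NIL_MISSING)
```

With this lookup, concept_ref("energy") gives the XAS concept identifier, and any column that is not in the vocabulary falls back to the nil URI.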


2. the variable concept is referenced from the InstanceVariable definition in the schema:variableMeasured element in the serialization. Examples:

for a variable with a SKOS concept in the XAS glossary:
"schema:variableMeasured": [
{
"@id": "xas:monochromatorEnergyVariable",
"@type": [
"InstanceVariable",
"schema:PropertyValue"
],
"schema:name": "energy",
"schema:alternateName": "Monochromator energy",
"schema:propertyID": "xas:monochromatorEnergyConcept",
"schema:unitText": "eV",
"cdi:physicalDataType": "https://www.w3.org/TR/xmlschema-2/#decimal",
"cdi:simpleUnitOfMeasure": "eV",
"cdi:uses": "xas:monochromatorEnergyConcept",
"cdi:name": "energy",
"cdi:displayLabel": "monochromator energy"
},


for a variable not defined in the glossary:
"schema:variableMeasured": [
{
"@id": "xas:normfluorVariable",
"@type": [
"InstanceVariable",
"schema:PropertyValue"
],
"schema:name": "normfluor",
"schema:propertyID": "http://www.opengis.net/def/nil/OGC/0/missing",
"cdi:uses": "http://www.opengis.net/def/nil/OGC/0/missing",
"cdi:name": "normfluor"
},
[maybe in the future we can guess the physicalDataType and/or the simpleUnitOfMeasure...]
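
Both cases can be produced by one helper. The following is a sketch only: the function name instance_variable and its parameters are hypothetical, and the keys it emits are limited to those shown in the two examples above. When no concept is supplied, the nil URI is used for both schema:propertyID and cdi:uses, and the unit keys are left out.

```python
NIL_MISSING = "http://www.opengis.net/def/nil/OGC/0/missing"


def instance_variable(var_id, name, concept_id=None, unit=None):
    """Build one schema:variableMeasured entry.

    var_id     -- the @id of the variable node, e.g. "xas:normfluorVariable"
    name       -- the column name as it appears in the XDI file
    concept_id -- @id of the SKOS concept, or None if the glossary has none
    unit       -- unit string such as "eV", or None if unknown
    """
    concept = concept_id if concept_id else NIL_MISSING
    node = {
        "@id": var_id,
        "@type": ["InstanceVariable", "schema:PropertyValue"],
        "schema:name": name,
        "schema:propertyID": concept,
        "cdi:uses": concept,
        "cdi:name": name,
    }
    if unit:
        # Units are only written when they are actually known.
        node["schema:unitText"] = unit
        node["cdi:simpleUnitOfMeasure"] = unit
    return node
```

For example, instance_variable("xas:normfluorVariable", "normfluor") yields the second form above, while passing concept_id="xas:monochromatorEnergyConcept" and unit="eV" yields the core of the first.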

3. In the distribution section, the cdi:TextMapping elements reference the variable in the cdi:formats property. We can get the cdi:index from the #Column.n info in the header but, as we've observed, the 'n' values don't always correspond with the number of columns; a safer way would be to parse the # row with the column headings just before the data. I don't see any simple way to get the cdi:length values, which are the number of characters in each fixed-width column.

"schema:distribution": {
"@type": [
"schema:DataDownload",
"cdi:TabularTextDataset"
],
"cdi:has_TextMapping":
[
{
"@type": "cdi:TextMapping",
"cdi:formats": {"@id": "xas:monochromatorEnergyVariable"},
"cdi:label":"energy",
"cdi:hasRole":"Dimension" ,
"cdi:index": 1,
"cdi:length": 12
},
{
"@type": "cdi:TextMapping",
"cdi:formats": {"@id": "xas:incidentIntensityVariable"},
"cdi:label":"i0" ,
"cdi:hasRole": "Measure",
"cdi:index": 3,
"cdi:length": 13
},
{
"@type": "cdi:TextMapping",
"cdi:formats": {"@id": "xas:transmittedIntensityVariable"},
"cdi:label": "it",
"cdi:hasRole":"Measure",
"cdi:index": 2,
"cdi:length": 12
}
],
"allowsDuplicates": false,
"arrayBase": 1,
"commentPrefix": "#",
"hasHeader": true,
"headerRowCount": 27,
"skipInitialSpace": true,
"isDelimited": false,
"isFixedWidth": true
}
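
One possible way to derive the column order, and a rough guess at the widths, is sketched below in Python. It rests on assumptions that are not confirmed by the notes above: that the last '#' row before the first data row holds the column labels, and that the columns are right-justified so each width can be measured from the end of the previous value to the end of the current one on the first data row. The function names column_layout and text_mappings are hypothetical, and cdi:hasRole is deliberately left out because it cannot be read from the file layout.

```python
import re


def column_layout(xdi_text):
    """Return (labels, widths, header_row_count) for an XDI-style text file."""
    lines = xdi_text.splitlines()
    # Index of the first non-empty line that is not a '#' comment: the first data row.
    first_data = next(
        i for i, line in enumerate(lines)
        if line.strip() and not line.lstrip().startswith("#")
    )
    if first_data == 0:
        raise ValueError("no header row found before the data")
    # Assumption: the comment row immediately before the data holds the labels.
    labels = lines[first_data - 1].lstrip("# ").split()
    # Heuristic: treat columns as right-justified, so each width runs from
    # the end of the previous value to the end of this one.
    widths, prev_end = [], 0
    for match in re.finditer(r"\S+", lines[first_data]):
        widths.append(match.end() - prev_end)
        prev_end = match.end()
    return labels, widths, first_data


def text_mappings(labels, widths, variable_ids):
    """Build cdi:TextMapping nodes; variable_ids maps a label to a variable @id."""
    return [
        {
            "@type": "cdi:TextMapping",
            "cdi:formats": {"@id": variable_ids[label]},
            "cdi:label": label,
            "cdi:index": index,  # 1-based, matching arrayBase: 1
            "cdi:length": width,
        }
        for index, (label, width) in enumerate(zip(labels, widths), start=1)
    ]
```

The third value returned by column_layout counts the lines before the first data row, so it could also supply headerRowCount. Widths inferred this way should be checked against a real file before being trusted, since a single data row may not be representative of the whole file.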


A number of constant values are added at the end of the distribution object, which opens with:
"schema:distribution": {
"@type": [
"schema:DataDownload",
"cdi:TabularTextDataset"
],

These describe the file format:
"allowsDuplicates": false,
"arrayBase": 1,
"commentPrefix": "#",
"hasHeader": true,
"headerRowCount": 27,
"skipInitialSpace": true,
"isDelimited": false,
"isFixedWidth": true
105 changes: 54 additions & 51 deletions se_na2so4-testschemaorg-cdiv3.jsonLD
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
"@context": [
"https://docs.ddialliance.org/DDI-CDI/1.0/model/encoding/json-ld/ddi-cdi.jsonld",
{
"schema": "https://schema.org/",
"schema": "http://schema.org/",
"dcterms": "http://purl.org/dc/terms/",
"geosparql": "http://www.opengis.net/ont/geosparql#",
"spdx": "http://spdx.org/rdf/terms#",
Expand All @@ -22,6 +22,7 @@
"schema:name": "X-ray absorption spectra for K edge, selenium in sodium selenate",
"schema:description": "comment from xdi file: room temperature; measured at beamline 13-BM-D. need a better description of what's in this datasset-- maybe the DCAT metadata has what we need?",
"schema:identifier": "should have a DOI",
"schema:url": "https://github.com/XraySpectroscopy/XASDataLibrary/blob/master/data/Se/Se_Na2SeO4_rt_01.xdi",
"schema:contributor": [
{
"@type": "schema:Role",
Expand All @@ -38,56 +39,48 @@
{
"@type": [
"schema:DataDownload",
"PhysicalDataSet"
"cdi:TabularTextDataset"
],
"schema:contentUrl": "https://github.com/XraySpectroscopy/XASDataLibrary/blob/master/data/Se/Se_Na2SeO4_rt_01.xdi",
"schema:description": "Distribution = PhysicalDataSet text file conformant with XDI specification",
"schema:contentSize": "30 kb",
"schema:encodingFormat": ["text/plain"],
"dcterms:conformsTo": ["https://github.com/XraySpectroscopy/XAS-Data-Interchange/blob/master/specification/spec.md"],
"cdi:allowsDuplicates": false,
"cdi:has_TextMapping": [
{
"@type": "cdi:TextMapping",
"cdi:formats": {"@id": "xas:monochromatorEnergyVariable"},
"cdi:label": "energy",
"cdi:hasRole": "Dimension",
"cdi:index": 1,
"cdi:length": 12
},
{
"@type": "cdi:TextMapping",
"cdi:formats": {"@id": "xas:incidentIntensityVariable"},
"cdi:label": "i0",
"cdi:hasRole": "Measure",
"index": 3,
"length": 13
},
{
"@type": "cdi:TextMapping",
"cdi:formats": {"@id": "xas:transmittedIntensityVariable"},
"cdi:label": "it",
"cdi:hasRole": "Measure",
"cdi:index": 2,
"cdi:length": 12
}
],
"allowsDuplicates": false,
"isStructuredBy": {
"@type": "WideDataStructure",
"has_DataStructureComponent": [
{
"@type": "IdentifierComponent",
"isDefinedBy_InstanceVariable": {"@id": "xas:monochromatorEnergyVariable"},
"has": {
"@type": "ValueMapping",
"hasIndex": 1,
"length": 12
}
},
{
"@type": "MeasureComponent",
"isDefinedBy_InstanceVariable": {"@id": "xas:incidentIntensityVariable"},
"has": {
"@type": "ValueMapping",
"hasIndex": 3,
"length": 13
}
},
{
"@type": "MeasureComponent",
"isDefinedBy_InstanceVariable": {"@id": "xas:transmittedIntensityVariable"},
"has": {
"@type": "ValueMapping",
"hasIndex": 2,
"length": 12
}
}
],
"allowsDuplicates": false,
"arrayBase": 1,
"commentPrefix": "#",
"hasHeader": true,
"headerRowCount": 27,
"skipInitialSpace": true,
"isDelimited": false,
"isFixedWidth": true,
"cdifq:nColumns":3,
"cdifq:nRows":469
}
"arrayBase": 1,
"commentPrefix": "#",
"hasHeader": true,
"headerRowCount": 27,
"skipInitialSpace": true,
"isDelimited": false,
"isFixedWidth": true
}
],
"schema:measurementTechnique": {
Expand All @@ -113,6 +106,7 @@
}
],
"prov:wasGeneratedBy": {
"@id": "ex:provevent_246376",
"@type": [
"Event",
"xas:Analysis_Event"
Expand Down Expand Up @@ -149,7 +143,10 @@
]
},
{
"@type": ["schema:Thing", "xas:Monochromator"],
"@type": [
"schema:Thing",
"xas:Monochromator"
],
"schema:name": "Si 111",
"schema:additionalProperty": [
{
Expand All @@ -161,9 +158,12 @@
]
},
{
"@id":"xas:Detector10cmN2",
"@type": "schema:Thing",
"schema:additionalType": "xas:Detector",
"@id": "xas:Detector10cmN2",
"@type": [
"schema:Thing",
"xas:Detector"
],
"schema:name": "10cm N2 detector",
"schema:additionalProperty": [
{
"@type": "schema:PropertyValue",
Expand Down Expand Up @@ -191,7 +191,10 @@
}
],
"schema:location": {
"@type": ["schema:Place", "xas:Facility"],
"@type": [
"schema:Place",
"xas:Facility"
],
"schema:identifier": "https://ror.org/aps",
"schema:name": "APS",
"schema:additionalProperty": [
Expand Down Expand Up @@ -266,7 +269,7 @@
"schema:description": "The measured X-ray intensity before it interacts with the sample. ",
"schema:propertyID": {"@id": "xas:Incident_Intensity"},
"schema:unitText": "counts",
"schema:measurementTechnique":{"@id":"xas:Detector10cmN2"},
"schema:measurementTechnique": {"@id": "xas:Detector10cmN2"},
"identifier": "should be URI from nexusFormat organization",
"physicalDataType": "https://www.w3.org/TR/xmlschema-2/#decimal",
"uses": "xas:Incident_Intensity",
Expand All @@ -284,7 +287,7 @@
"schema:unitText": "counts",
"schema:name": "itrans",
"schema:alternateName": "transmission intensity",
"schema:measurementTechnique":{"@id":"xas:Detector10cmN2"},
"schema:measurementTechnique": {"@id": "xas:Detector10cmN2"},
"physicalDataType": "https://www.w3.org/TR/xmlschema-2/#decimal",
"identifier": "should be URI from nexusFormat organization",
"uses": "xas:Transmitted_Intensity",
Expand Down