DataCite is the data format used by InvenioRDM when uploading a record through the API. This document describes how different parts of the RO-Crate metadata are converted into the DataCite format.
Note that RO-Crate and DataCite each contain features that the other does not have, so it is difficult to create a fully accurate mapping and some information may be lost along the way. You should always check the outputs to ensure their accuracy before publishing your record.
`resource_type` is a mandatory field in DataCite.
- RO-Crate does not have a field that describes the type of the entire directory
- therefore, we assume the type to be `dataset`
- an `author` in RO-Crate is mapped to a `creator` in DataCite, along with their affiliations
- if the `@id` field of an author is an ORCiD, the ORCiD is parsed and added to the DataCite record
- each creator consists of a person or organization and an affiliation
- if no creator exists, the creator is set to the value `:unkn`
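A minimal Python sketch of the creator mapping above follows. It assumes the author entities have already been looked up in the crate's `@graph`. The helper names (`map_author`, `map_creators`), the ORCiD URL pattern, and the simplified output shape (`person_or_org`, `identifiers`, `affiliations`) are illustrative assumptions loosely modelled on InvenioRDM records. This is not the tool's actual code.

```python
import re

# Assumed ORCiD URL form, e.g. https://orcid.org/0000-0002-1825-0097
ORCID_URL = re.compile(r"^https?://orcid\.org/(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def map_author(author):
    """Map one RO-Crate author entity to a simplified DataCite-style creator."""
    person = {"name": author.get("name", ":unkn")}
    match = ORCID_URL.match(author.get("@id", ""))
    if match:
        # Store the bare ORCiD, without the https://orcid.org/ prefix
        person["identifiers"] = [{"scheme": "orcid", "identifier": match.group(1)}]
    creator = {"person_or_org": person}

    # Assumption: affiliations are already resolved to objects with a "name"
    raw = author.get("affiliation", [])
    if isinstance(raw, dict):
        raw = [raw]
    affiliations = [{"name": a["name"]} for a in raw if isinstance(a, dict) and "name" in a]
    if affiliations:
        creator["affiliations"] = affiliations
    return creator


def map_creators(authors):
    """Map all authors; fall back to a single ":unkn" creator if there are none."""
    if not authors:
        return [{"person_or_org": {"name": ":unkn"}}]
    if isinstance(authors, dict):
        authors = [authors]
    return [map_author(a) for a in authors]
```

An author whose `@id` is a local identifier such as `#jane` simply gets no `identifiers` entry.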
- similar to the creator mapping
- the `name` field is mapped to the `title` field
- if `name` does not exist, the value of `@alternativeName` is used instead
- if neither exists, `title` is assigned `:unkn`
`@alternativeName` is mapped to `additional_titles`.
- a new array entry is added to `additional_titles`
- the `lang` field is omitted, since the RO-Crate provides no information on the language of the additional title
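The title fallback chain and the additional-title entry described above can be sketched in Python as follows. The function names and the `"alternative-title"` type id are assumptions made for illustration. The sketch only mirrors the rules stated in this document.

```python
def map_title(root):
    """Choose the DataCite title from the RO-Crate root data entity."""
    # Order described above: name, then @alternativeName, then ":unkn"
    for key in ("name", "@alternativeName"):
        value = root.get(key)
        if value:
            return value
    return ":unkn"


def map_additional_titles(root):
    """Turn @alternativeName into one additional_titles entry, without a lang field."""
    alternative = root.get("@alternativeName")
    if not alternative:
        return []
    # Assumption: "alternative-title" as the title type id; no "lang" key is set
    return [{"title": alternative, "type": {"id": "alternative-title"}}]
```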
- the `datePublished` field is mapped to `metadata.publication_date`
- the DataCite field may contain only the date, not the time
- we try to guess the format and parse the date
- if no `datePublished` value is present, `publication_date` is assigned the value `:unav`
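One simple way to "guess the format" is to try a fixed list of candidate formats in order and keep only the date part of the first match. The sketch below does this. The format list, the function name, and the choice to return `:unav` for unparseable input are assumptions. The tool's real heuristics may differ.

```python
from datetime import datetime

# Hypothetical candidate formats, tried in order
CANDIDATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d",
    "%d.%m.%Y",
    "%Y/%m/%d",
)


def map_publication_date(date_published):
    """Return an ISO date string (no time part) or ":unav"."""
    if not date_published:
        return ":unav"
    value = date_published.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # Drop the time: the DataCite field may only contain the date
        return parsed.date().isoformat()
    # Assumption: treat an unrecognised format like a missing value
    return ":unav"
```

Ambiguous orders such as `05/06/2023` are deliberately left out, because they cannot be guessed reliably.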
The `description` field is mapped as-is to `metadata.description`.
- RO-Crate does not have any additional descriptions, so the `additional_descriptions` field in DataCite is never assigned a value
- the license `identifier` field in DataCite is not mapped: it defaults to SPDX, which would require knowing how each license URL maps to its SPDX identifier (https://spdx.org/licenses/)
- if the RO-Crate does not reference another object but contains a direct value, the following is applied:
  - if the value is a URL, only the `link` value is set in the DataCite file
  - if the value is free text, only the `description` value is set in the DataCite file
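The URL-versus-free-text rule for a directly given license value can be sketched as follows. The helper names and the flat output dictionaries are simplifications assumed for illustration. In line with the rule above, no SPDX `identifier` is ever set.

```python
from urllib.parse import urlparse


def is_url(value):
    """Treat only absolute http(s) URLs with a host as URLs."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def map_direct_license(value):
    """Map a license given as a plain string (not a reference to another entity)."""
    # No SPDX "identifier": that would need a URL -> SPDX id lookup
    if is_url(value):
        return {"link": value}
    return {"description": value}
```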
The `keywords` field is mapped to the `subjects` field.
`inLanguage` is mapped to `metadata.languages`.
- we try to identify the language from free text and map it to its ISO 639-3 code (as expected by InvenioRDM)
- if we cannot identify the language, the field is omitted
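As an illustration of the free-text-to-ISO-639-3 step, here is a sketch built around a tiny, hypothetical lookup table. A real implementation would need the full ISO 639-3 code list and language names. The table, the function name, and the `{"id": ...}` output shape are assumptions.

```python
# Hypothetical, deliberately tiny lookup table (free text -> ISO 639-3)
LANGUAGES = {
    "en": "eng", "eng": "eng", "english": "eng",
    "de": "deu", "deu": "deu", "ger": "deu", "german": "deu", "deutsch": "deu",
    "fr": "fra", "fra": "fra", "fre": "fra", "french": "fra",
}


def map_language(in_language):
    """Return a languages entry such as {"id": "eng"}, or None to omit the field."""
    if not isinstance(in_language, str):
        return None
    key = in_language.strip().lower()
    if key not in LANGUAGES:
        # Fall back to the primary subtag of tags like "en-GB"
        key = key.split("-")[0]
    code = LANGUAGES.get(key)
    return {"id": code} if code else None
```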
`temporalCoverage` is mapped as-is to `metadata.dates`. The type of the date is "other" and the description is "Temporal Coverage".
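A sketch of the resulting `metadata.dates` entry follows. The nested `{"id": "other"}` form of the date type is an assumption about the target schema. The value itself is copied unchanged.

```python
def map_temporal_coverage(temporal_coverage):
    """Build the metadata.dates entry for temporalCoverage (empty if absent)."""
    if not temporal_coverage:
        return []
    # The value is copied as-is; it is not parsed or normalised
    return [{
        "date": temporal_coverage,
        "type": {"id": "other"},
        "description": "Temporal Coverage",
    }]
```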
- the `version` field is mapped to `metadata.version`
- if `publisher` is a string, `publisher` is mapped to `metadata.publisher`
- if `publisher` is an object, `publisher.name` is mapped to `metadata.publisher`
- if no publisher exists, the value is `:unkn`
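The three publisher rules translate into a short function. In RO-Crate, a publisher object is often only an `{"@id": ...}` reference, so this sketch also resolves the reference against the crate's `@graph`. That resolution step, the `graph` parameter, and the fallback for an object without a name are assumptions not stated in this document.

```python
def map_publisher(publisher, graph=()):
    """Return the value for metadata.publisher following the rules above."""
    if isinstance(publisher, str) and publisher:
        return publisher
    if isinstance(publisher, dict):
        if "name" not in publisher and "@id" in publisher:
            # Assumption: resolve a bare {"@id": ...} reference via the @graph
            publisher = next((e for e in graph if e.get("@id") == publisher["@id"]), publisher)
        if publisher.get("name"):
            return publisher["name"]
    # No usable publisher: fall back to the DataCite "unknown" value
    return ":unkn"
```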
- the `identifier` field of RO-Crate is mapped to the `identifier` array in DataCite
- the mapping currently only processes DOIs
- support for new schemes can easily be added in `mapping/mapping.json`
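The DOI-only processing can be illustrated with a regular expression that accepts bare DOIs, `doi:` prefixes, and `doi.org` URLs, and skips everything else. This sketch hardcodes the scheme in Python rather than reading it from `mapping/mapping.json`, so it does not reflect that file's format. The pattern and output shape are assumptions.

```python
import re

# Bare DOI, "doi:" prefix, or (dx.)doi.org URL; other schemes are skipped
DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE
)


def map_identifiers(identifier):
    """Map one or more RO-Crate identifiers, keeping only DOIs."""
    values = identifier if isinstance(identifier, list) else [identifier]
    result = []
    for value in values:
        if not isinstance(value, str):
            continue
        match = DOI_PATTERN.match(value.strip())
        if match:
            result.append({"scheme": "doi", "identifier": match.group(1)})
    return result
```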
The `contentSize` field is mapped to `metadata.sizes`.
The `encodingFormat` field is mapped to `metadata.formats`.
- the `contentLocation` field is mapped to the `locations` field in DataCite
- we currently only support free-text locations and locations in the GeoNames scheme
- support for other schemes can easily be added in `mapping/mapping.json`
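A sketch of the two supported location kinds follows. A GeoNames URL becomes an identifier, and anything else is kept as a free-text place. The URL pattern, the function names, and the simplified `features` output shape are assumptions for illustration and are not taken from `mapping/mapping.json`.

```python
import re

# Assumed GeoNames URL form, e.g. https://www.geonames.org/2950159/berlin.html
GEONAMES_PATTERN = re.compile(r"^https?://(?:www\.|sws\.)?geonames\.org/(\d+)")


def _geonames_id(value):
    match = GEONAMES_PATTERN.match(value)
    return match.group(1) if match else None


def map_location(content_location):
    """Map contentLocation to a simplified locations block, or None."""
    if isinstance(content_location, dict):
        ref, text = content_location.get("@id", ""), content_location.get("name")
    elif isinstance(content_location, str):
        ref, text = content_location, content_location
    else:
        return None
    geonames_id = _geonames_id(ref)
    if geonames_id:
        return {"features": [{"identifiers": [{"scheme": "geonames", "identifier": geonames_id}]}]}
    # Not a GeoNames reference: keep it as free text, if there is any
    return {"features": [{"place": text}]} if text else None
```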
- only the funder's name is mapped, since the funder ID in DataCite must come from a controlled vocabulary, and we do not know which vocabulary that is
- if the `datePublished` field in the RO-Crate metadata file is in the future, an embargo is applied to the resource
- setting the embargo period is a best-effort process; the code is located in `mapping/processing_functions.py#embargoDateProcessing`
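The embargo rule can be sketched as a comparison between the publication date and today. This is not `embargoDateProcessing` itself. The `map_access` name, the `today` parameter (added for testability), and the access block layout (public record, restricted files, `embargo.until`) are assumptions loosely modelled on InvenioRDM's access settings.

```python
from datetime import date


def map_access(date_published, today=None):
    """Build a simplified access block; embargo files if datePublished is in the future."""
    if today is None:
        today = date.today()
    public = {"record": "public", "files": "public"}
    try:
        # Best effort: only look at a leading YYYY-MM-DD part
        published = date.fromisoformat(date_published[:10])
    except (TypeError, ValueError):
        return public
    if published <= today:
        return public
    return {
        "record": "public",
        "files": "restricted",
        "embargo": {"active": True, "until": published.isoformat()},
    }
```

Passing `today` explicitly keeps the result reproducible, which is useful when checking the output before publishing.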