Add techniques in DOI metadata #1518

paulmillar · 2024-11-20T17:02:36Z

Summary

The DataCite metadata standard can record the experimental technique used to establish a dataset. However, SciCat does not populate this field, so the DataCite metadata it generates lacks this information.

Note that, although SciCat can store the experimental technique information as dataset metadata, this information is not propagated to publishedDataset.

Steps to Reproduce

1. Create a dataset, including the experimental technique.
2. Trigger publishing of the dataset.
3. Observe the DOI metadata, e.g., via the DataCite API.
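
For the last step, a minimal sketch of how the check could be scripted. It assumes the record was retrieved as JSON from the public DataCite REST API (`GET https://api.datacite.org/dois/<doi>`) and that subjects appear under `data.attributes.subjects`; the interface and function names are hypothetical and only cover the fields used here.

```typescript
// Partial, assumed shape of a DataCite REST API DOI record (only the fields used below).
interface DataCiteSubject {
  subject: string;
  subjectScheme?: string;
  schemeUri?: string;
  valueUri?: string;
}

interface DataCiteDoiResponse {
  data?: { attributes?: { subjects?: DataCiteSubject[] } };
}

// Return the subjects recorded for a DOI, or an empty list when none are present.
function extractSubjects(response: DataCiteDoiResponse): DataCiteSubject[] {
  return response.data?.attributes?.subjects ?? [];
}

// True when the record carries no subject elements at all,
// which is the behaviour reported in this issue.
function hasNoSubjects(response: DataCiteDoiResponse): boolean {
  return extractSubjects(response).length === 0;
}
```

With the behaviour reported here, `hasNoSubjects` would be expected to return `true` for a DOI minted by SciCat.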

Current Behaviour

The DataCite metadata contains no subject elements.

Expected Behaviour

The DataCite metadata should contain subject element(s) that describe the techniques.
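
As a rough illustration of what this could look like, the sketch below maps one SciCat technique to one DataCite subject. Several things here are assumptions rather than confirmed details: the technique shape (`pid` holding a PaNET IRI, `name` holding its label), the camelCase property names (as I understand the DataCite JSON representation), and the `subjectScheme` and `schemeUri` values. ETN-1 is the authority on the exact values to use. The PaNET identifier in the usage example is a placeholder, not a real term.

```typescript
// Assumed SciCat technique shape: `pid` is a PaNET IRI, `name` is its label.
interface Technique {
  pid: string;
  name: string;
}

// Subject properties as assumed for the DataCite JSON representation.
interface DataCiteSubject {
  subject: string;
  subjectScheme: string;
  schemeUri: string;
  valueUri: string;
}

// Assumed scheme values; check against ETN-1 before relying on them.
const PANET_SCHEME = "PaNET";
const PANET_SCHEME_URI = "http://purl.org/pan-science/PaNET/PaNET.owl";

// Map a single technique to a single DataCite subject element.
function techniqueToSubject(technique: Technique): DataCiteSubject {
  return {
    subject: technique.name,
    subjectScheme: PANET_SCHEME,
    schemeUri: PANET_SCHEME_URI,
    valueUri: technique.pid,
  };
}

// Usage with a placeholder (hypothetical) PaNET term.
const exampleSubject = techniqueToSubject({
  pid: "http://purl.org/pan-science/PaNET/PaNET00000",
  name: "example technique",
});
```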

Details

The document ETN-1: Embedding PaNET in DataCite metadata describes how to include PaNET terms within the metadata associated with a DOI.

The document ETN-2: Working with PaNET terms in SciCat describes how to format PaNET terms within SciCat.

Note that (as described in #1192) the DataCite metadata is calculated in two places: scicat-backend-next's published-data.controller.ts and oai-provider-service's openaire-mapper.ts.

Arguably, there should be a single place (within SciCat code) that provides DataCite metadata (as described in #1192). While removing this duplicate code (i.e., closing #1192) would benefit this issue, I don't consider #1192 to block it.

nitrosx · 2024-11-21T13:13:43Z

@paulmillar thanks for opening the issue.
Given that PublishedData can contain one or more datasets, what would you do if multiple datasets with different techniques are present?
Would you add a list of techniques to publishedData and then propagate all of them to DataCite?

paulmillar · 2024-11-21T17:03:01Z

Hi @nitrosx,

Yes, this is certainly a valid question. I've spent a little time thinking about this, but haven't come to a strong opinion.

One could argue that each technique listed for the publishedData indicates that at least some of the published data was taken with that technique. Under that interpretation, the publishedData techniques would be the union of all techniques in its member datasets.

Alternatively, one could argue that the publishedData techniques should describe all the datasets being published, since the publishedData describes all of those datasets. With this interpretation, the publishedData techniques are the intersection of all techniques in the member datasets.

A third option is that the selection is context-driven. Why is a DOI being generated? The answer might suggest that some techniques (from the union) be included and others be ignored. This would be a more nuanced approach, and would likely require human input.

In practical terms, I would suggest taking the first option (use the union of techniques from member datasets) as an initial version.

A subsequent update could be to present the list of techniques in the web UI, to allow the user to choose/veto techniques, as appropriate.
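
To make the first two options concrete, here is a minimal sketch of both. It assumes each member dataset carries an array of techniques identified by `pid`; the `Technique` shape and the function names are hypothetical and not taken from the SciCat code base.

```typescript
// Assumed technique shape: `pid` identifies the term, `name` is its label.
interface Technique {
  pid: string;
  name: string;
}

// Option 1 (union): every technique used by at least one member dataset,
// deduplicated by pid, in order of first appearance.
function unionOfTechniques(datasets: Technique[][]): Technique[] {
  const result: Technique[] = [];
  for (const techniques of datasets) {
    for (const t of techniques) {
      if (!result.some((seen) => seen.pid === t.pid)) {
        result.push(t);
      }
    }
  }
  return result;
}

// Option 2 (intersection): only techniques present in every member dataset.
function intersectionOfTechniques(datasets: Technique[][]): Technique[] {
  // Candidates are the (deduplicated) techniques of the first dataset;
  // with no datasets this is empty, so the result is empty too.
  const candidates = unionOfTechniques(datasets.slice(0, 1));
  return candidates.filter((t) =>
    datasets.every((techniques) => techniques.some((o) => o.pid === t.pid)),
  );
}
```

Note that the intersection can easily be empty when the member datasets are heterogeneous, which is one practical argument for starting with the union.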

nitrosx added the "question" label (Further information is requested) · Nov 21, 2024
