Add techniques in DOI metadata #1518

Open
paulmillar opened this issue Nov 20, 2024 · 2 comments
Labels
question Further information is requested

Comments

@paulmillar
Contributor

paulmillar commented Nov 20, 2024

Summary

The DataCite metadata schema can record the experimental technique used to obtain a dataset. However, SciCat does not populate this field, so the DataCite metadata it generates lacks this information.

Note that, although SciCat can store the experimental technique information as dataset metadata, this information is not propagated to publishedDataset.

Steps to Reproduce

  1. Create a dataset, including the experimental technique.
  2. Trigger publishing of the dataset.
  3. Observe the DOI metadata, e.g., via the DataCite API.
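Step 3 could be checked with something like the sketch below. It assumes the JSON body returned by the public DataCite REST API (`GET https://api.datacite.org/dois/<doi>`), where subjects are listed under `data.attributes.subjects`. The helper name `extractSubjects` and the DOI are placeholders made up for illustration, not part of SciCat.

```typescript
// Minimal shape of the parts of a DataCite REST API response used here.
interface DataCiteSubject {
  subject: string;
  subjectScheme?: string;
  schemeUri?: string;
  valueUri?: string;
}

interface DataCiteDoiResponse {
  data: {
    id: string;
    attributes: {
      subjects?: DataCiteSubject[];
    };
  };
}

// Hypothetical helper: returns the subjects recorded for a DOI, or [] if none.
function extractSubjects(response: DataCiteDoiResponse): DataCiteSubject[] {
  return response.data.attributes.subjects ?? [];
}

// Illustrative response body with no subjects; the DOI is a placeholder.
const current: DataCiteDoiResponse = {
  data: { id: "10.5072/example", attributes: { subjects: [] } },
};

console.log(
  extractSubjects(current).length === 0
    ? "no subject elements"
    : "subject elements present"
);
```

With the current behaviour described below, the subject list for a SciCat-published DOI would be empty.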

Current Behaviour

The DataCite metadata contains no subject elements.

Expected Behaviour

The DataCite metadata should contain subject element(s) that describe the techniques.
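As a rough illustration of what this could look like, the sketch below maps SciCat-style technique entries onto DataCite subject objects. It assumes each technique carries a `pid` (the PaNET term IRI) and a `name` (its label), and it uses the DataCite JSON field names `subject`, `subjectScheme`, `schemeUri` and `valueUri`. The function name, the scheme name and scheme URI values, and the example term are assumptions for illustration; ETN-1 (see Details) is the authoritative description of the encoding.

```typescript
// Assumed shape of a technique entry on a SciCat dataset.
interface Technique {
  pid: string;  // assumed to hold the PaNET term IRI
  name: string; // human-readable label of the term
}

// Subset of the DataCite JSON "subjects" entry used here.
interface Subject {
  subject: string;
  subjectScheme: string;
  schemeUri: string;
  valueUri: string;
}

// Assumed values; check ETN-1 for the ones that should actually be used.
const PANET_SCHEME = "PaNET";
const PANET_SCHEME_URI = "http://purl.org/pan-science/PaNET/PaNET.owl";

// Hypothetical mapper: one DataCite subject per technique.
function techniquesToSubjects(techniques: Technique[]): Subject[] {
  return techniques.map((t) => ({
    subject: t.name,
    subjectScheme: PANET_SCHEME,
    schemeUri: PANET_SCHEME_URI,
    valueUri: t.pid,
  }));
}

// Placeholder term, not a real PaNET entry.
const example: Technique[] = [
  { pid: "http://purl.org/pan-science/PaNET/PaNET00000", name: "example technique" },
];

console.log(JSON.stringify(techniquesToSubjects(example), null, 2));
```

A single mapper like this could in principle be shared by both places that currently build DataCite metadata.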

Details

The document ETN-1: Embedding PaNET in DataCite metadata describes how to include PaNET terms within the metadata associated with a DOI.

The document ETN-2: Working with PaNET terms in SciCat describes how to format PaNET terms within SciCat.

Note that (as described in #1192) the DataCite metadata is calculated in two places: scicat-backend-next's published-data.controller.ts and oai-provider-service's openaire-mapper.ts.

Arguably, there should be a single place (within SciCat code) that provides DataCite metadata (as described in #1192). While removing this duplicate code (i.e., closing #1192) would benefit this issue, I don't consider #1192 to block it.

@nitrosx
Contributor

nitrosx commented Nov 21, 2024

@paulmillar thanks for opening the issue.
Given that PublishedData can contain one or more datasets, what would you do if multiple datasets with different techniques are present?
Would you add a list of techniques to publishedData and then propagate all of them to DataCite?

@nitrosx nitrosx added the question Further information is requested label Nov 21, 2024
@paulmillar
Contributor Author

Hi @nitrosx,

Yes, this is certainly a valid question. I've spent a little time thinking about this, but haven't come to a strong opinion.

One could argue that each technique listed for the publishedData indicates that at least some of the published data was taken with that technique. Under that interpretation, the publishedData techniques would be the union of all techniques in its member datasets.

Alternatively, one could argue that the publishedData techniques should describe every dataset being published, since the publishedData describes all of those datasets. Under this interpretation, the publishedData techniques are the intersection of the techniques in the member datasets.

A third option is that the selection is context-driven: why is a DOI being generated? The answer might suggest that some techniques (from the union) be included and others ignored. This would be a more nuanced approach, and one that would likely require human input.

In practical terms, I would suggest taking the first option (use the union of techniques from member datasets) as an initial version.
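To make the first two options concrete, here is a minimal sketch of both. It assumes each member dataset exposes an optional `techniques` array whose entries are identified by `pid`; the interface and function names are hypothetical and not taken from the SciCat code base.

```typescript
interface Technique {
  pid: string;
  name: string;
}

// Hypothetical minimal view of a dataset: only the fields needed here.
interface DatasetLike {
  pid: string;
  techniques?: Technique[];
}

// Option 1: every technique used by at least one member dataset.
function unionOfTechniques(datasets: DatasetLike[]): Technique[] {
  const byPid = new Map<string, Technique>();
  for (const ds of datasets) {
    for (const t of ds.techniques ?? []) {
      if (!byPid.has(t.pid)) {
        byPid.set(t.pid, t);
      }
    }
  }
  return Array.from(byPid.values());
}

// Option 2: only techniques shared by all member datasets.
function intersectionOfTechniques(datasets: DatasetLike[]): Technique[] {
  if (datasets.length === 0) {
    return [];
  }
  const [first, ...rest] = datasets;
  return (first.techniques ?? []).filter((t) =>
    rest.every((ds) => (ds.techniques ?? []).some((o) => o.pid === t.pid))
  );
}

// Placeholder data: two datasets sharing technique "A".
const a: Technique = { pid: "A", name: "technique A" };
const b: Technique = { pid: "B", name: "technique B" };
const members: DatasetLike[] = [
  { pid: "ds1", techniques: [a, b] },
  { pid: "ds2", techniques: [a] },
];

console.log(unionOfTechniques(members).map((t) => t.pid));
console.log(intersectionOfTechniques(members).map((t) => t.pid));
```

Note that with the intersection, a single member dataset without techniques empties the result, which is one practical reason to start with the union.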

A subsequent update could be to present the list of techniques in the web UI, to allow the user to choose/veto techniques, as appropriate.
