Datasets with multiple and no (yet) IDs #34

Open
MarekSuchanek opened this issue May 28, 2020 · 5 comments
Labels
decision Decision to be taken that aligns the approach

Comments

@MarekSuchanek
Collaborator

While adjusting our model with @rwwh, we found that requiring a dataset to have exactly one "dataset_id" is too limiting.

  1. A dataset can have multiple identifiers, for example DOI + ARK + URL.
  2. In data management planning, we plan which datasets we are going to have, and those may not be published yet, so they may not have an identifier yet.
@TomMiksa
Contributor

If we changed the cardinality of dataset_id, would it also solve issue #33? That is, a list of identifiers would include both "historical" and current identifiers.

@MarekSuchanek
Collaborator Author

I think it would, if the cardinality were changed to 0..n. Of course, I am not sure whether we would then also need to distinguish current, historical, or even reserved identifiers somehow.
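
To make the 0..n idea concrete, here is a minimal Python sketch of a dataset that holds zero or more identifiers. The class names, the `status` field, and its values (`current`, `historical`, `reserved`) are hypothetical and only illustrate the distinction raised above; none of them are part of the DMP Common Standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetId:
    identifier: str
    type: str  # e.g. "doi", "ark", "url"
    # Hypothetical field: "current", "historical", or "reserved"
    status: str = "current"


@dataclass
class Dataset:
    title: str
    # 0..n identifiers: an empty list models a planned, not yet published dataset
    dataset_ids: List[DatasetId] = field(default_factory=list)

    def current_id(self) -> Optional[DatasetId]:
        """Return the first identifier marked as current, or None if there is none."""
        for dataset_id in self.dataset_ids:
            if dataset_id.status == "current":
                return dataset_id
        return None


# A planned dataset with no identifier yet
planned = Dataset(title="Survey results (planned)")

# A published dataset with several identifiers (all values are made up)
published = Dataset(
    title="Survey results",
    dataset_ids=[
        DatasetId("10.123/old", "doi", status="historical"),
        DatasetId("10.123/1234abc", "doi"),
        DatasetId("https://example.org/data/1", "url"),
    ],
)
```

With this shape, a consumer that only needs one identifier can call `current_id()`, while a planned dataset simply carries an empty list.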

@briri

briri commented Jun 2, 2020

I can see the usefulness of allowing for alternate/additional identifiers. I think we need to understand the use cases.

If the primary use case is to allow for historical identifiers (versions) of an object, we could perhaps introduce something like a related_identifiers array. This is a common pattern, and I think it could solve most cases. For example:

{
  "related_identifiers": [
    { "type": "doi", "identifier": "10.123/1234abc", "relation_type": "is_version_of" }
  ]
}

I'm not sure what ontology would be most appropriate for the relation_type.
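
As a rough illustration of how a consumer might read such an array, the Python sketch below filters entries by `relation_type`. The helper name `identifiers_by_relation` is hypothetical, and the field names simply follow the example JSON above rather than any agreed schema.

```python
import json

# The example record from above, parsed from JSON
record = json.loads("""
{
  "related_identifiers": [
    { "type": "doi", "identifier": "10.123/1234abc", "relation_type": "is_version_of" }
  ]
}
""")


def identifiers_by_relation(record, relation_type):
    """Hypothetical helper: return identifiers whose relation_type matches."""
    return [
        entry["identifier"]
        for entry in record.get("related_identifiers", [])
        if entry.get("relation_type") == relation_type
    ]


versions = identifiers_by_relation(record, "is_version_of")
```

A record without a `related_identifiers` key yields an empty list, so the array can stay optional.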

@JacquemotMC

> I'm not sure what ontology would be most appropriate for the relation_type.

dct:isVersionOf

@paulwalk
Contributor

While I don't disagree with the reasoning here, I would like to push back a little against the idea of handling multiple IDs, and especially of modelling their semantics, in the DMP Common Standard. While I agree that all of these things exist, the question for us to consider is:

"Do we need to model these things to enable the exchange of semantically useful DMPs?"

I would like to argue that using a single ID (consistently) is enough to achieve this. If someone wants to relate multiple IDs together, that can be done outside of the DMP standard.
