-
See https://github.com/ArctosDB/internal/issues/168. With the emergence of new GBIF models, I don't think that cataloging "occurrences" makes the most sense when we are managing physical objects (unless we are going to start using stable part identifiers and can associate parts in a single record with identifications, events, and identifiers/agents). The "magic" for cataloged items collected from the same individual on the same day/time should perhaps just ensure that the individual records share a single collecting event. Over time, those individual samples will have different things done to them and may even provide conflicting data. Being able to aggregate them with an entity will help point out inconsistencies, while keeping simple the record of how each bit of information came about (matching identifications/identifiers/attributes to a single object).
-
What to catalog has always been an administrative call, and we can close this if there's no interest in using it, but cataloging "parts" is not something I can imagine ever recommending.
-
Increasingly, the "item of interest" will become a pyramid of objects/events. While I may be interested in a species of rat, the work I do might be on a small amount of blood taken from a rat before it died. Whatever I do to that blood, and the results I get, may conflict with the work that someone else does on the rat's dried skin (which may also be held in a different institution). Does this indicate that the blood sample was mislabeled, or that there are things we haven't yet figured out about the rat? Individual samples are going to have (or should have) their own data: who collected the blood from the rat, and when and how? That information is buried or not recorded at all in legacy data, but as we progress, I think it will come to be expected. We will either need very complex catalog records or we will need to catalog individual samples. Of course, I could be completely wrong, but that's how I am starting to see things...
-
Those would be "Occurrences", and cataloging them rather than the organism does seem to be a simplification, assuming Organisms (relationships, whatever) actually keep them from doing horrible things to independence assumptions. Different institutions will probably always require replication. "One rat, one identifier" obviously has limits, and Entities provide a mostly unavoidable means to get around the nastiest of them; I'm mostly comfortable with where we're going (and have gone) with that. Splitting that blood into two tubes to guard against freezer loss, or sampling it for a loan, would not result in additional primary identifiers in any well-designed system, or at least that's where my imagination ends. Both of those cases could result in parts with some complexity, and Arctos can deal with that. Ideally those actions would result in new citable identifiers (perhaps even with Curators demanding they be cited), and Arctos could very easily be made to deal with that (#3630). The idea that "MaterialSamples" should be something more than an impermanent secondary attribute of whatever someone decided to throw a catalog number at has merit. Following that idea to the extreme of denormalizing everything is another thing altogether. (That's not far from where parts of a few Arctos collections came from; this isn't entirely theoretical!)
-
We do have denormalized parts already when the voucher is at UAM and the tissues are at MSB, or at AMNH/MSB, or whatever. This can now be dealt with in Arctos via relationships, via the entity model, and via #4101 once we get that working, and it should also put us on the cutting edge of a Material Sample model should GBIF et al. decide to switch to that.
-
I do agree that we need some way to share events for the same occurrences, or at least to get a report that would show which records should be sharing events when their occurrence info is the same. We do have many parasite/host records that currently do not share event IDs, for whatever reason, and likely none of those are intentional.
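A report like the one described above could be as simple as grouping records on whatever fields define "the same occurrence" and flagging groups that point at more than one collecting event. The sketch below is a minimal Python illustration, not Arctos code: the `CatalogRecord` fields, the matching rule in `occurrence_key`, and the `EX:` GUIDs are all hypothetical assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRecord:
    # Hypothetical, simplified view of a catalog record; real records
    # carry far more event data than this.
    guid: str
    collecting_event_id: int
    locality_id: int
    began_date: str
    ended_date: str
    collector: str


def occurrence_key(rec: CatalogRecord) -> tuple:
    # Assumption: records that match on locality, dates, and collector
    # describe the same occurrence and should share one collecting event.
    return (rec.locality_id, rec.began_date, rec.ended_date, rec.collector)


def unshared_event_report(records):
    """Map each occurrence key to its sorted GUIDs, but only for groups
    whose records point at more than one collecting event."""
    groups = defaultdict(list)
    for rec in records:
        groups[occurrence_key(rec)].append(rec)
    report = {}
    for key, recs in groups.items():
        if len({r.collecting_event_id for r in recs}) > 1:
            report[key] = sorted(r.guid for r in recs)
    return report


# Usage with made-up data: a host and its parasite entered with separate events.
records = [
    CatalogRecord("EX:Mamm:1", 10, 500, "2013-07-01", "2013-07-01", "A. Collector"),
    CatalogRecord("EX:Para:2", 11, 500, "2013-07-01", "2013-07-01", "A. Collector"),
    CatalogRecord("EX:Mamm:3", 12, 501, "2013-07-02", "2013-07-02", "A. Collector"),
]
print(unshared_event_report(records))
# {(500, '2013-07-01', '2013-07-01', 'A. Collector'): ['EX:Mamm:1', 'EX:Para:2']}
```

Most of the real work would be in choosing the key: too loose and unrelated records get flagged, too strict and near-identical events (e.g. differing only in date precision) slip through.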
-
That's not denormalized.
We've been there since "the MVZ model" was an idea on a bunch of taped-together paper. Not sure this is realistic; tabling.
-
Reopening as this is related to broader entity development and the broader issue of linking records. I don't agree with closing these types of discussions without the consent of all parties. |
-
Interesting to see this rudderless issue with respect to "cestode" parts (#7752)! I am moving this to discussion until there is agreement on an actual implementation proposal.
-
Yes, this was never resolved at the time of closure, and it is absolutely relevant to the bigger issue of how we deal with complex data and relationships across collections and institutions. I'm glad to see growing awareness of, and interest in, addressing these kinds of issues.
-
Is your feature request related to a problem? Please describe.
Data are provided and entered in an object-centric way. Entities make that mostly decipherable, but it's still unnecessarily denormalized: it's harder to manage, find, and understand than it has to be.
https://arctos.database.museum/SpecimenResults.cfm?oidtype=GAN&oidnum=NGB13-02500&oidoper=IS represents one individual (with 154+ samples; some records have multiple parts).
https://arctos.database.museum/guid/MSB:Mamm:335555 and https://arctos.database.museum/guid/MSB:Mamm:335558 (and maybe many more) represent one "thing that we should probably be cataloging": an Occurrence, or material from a single individual taken at the same time (on the same day, in this case).
Describe what you're trying to accomplish
More usable data, maybe for less work.
Describe the solution you'd like
#3765 (comment) provides a means to semi-magically group many records into Entities.
Another, similar tool to merge records within the entity representing an "Occurrence" before (or after?) creating entities would leave these data completely normalized and in an expected format.
Describe alternatives you've considered
Better entry protocols, but people aren't great at sorting these things out. Arctos can provide better tools, and I think it makes sense to do so.
Additional context
I think this also happens with things like NEON mark-recapture projects. A merge would need to be interactive and explicit ("these things are not the same; are you sure about this?") and would need to leave redirects (so old GUIDs can continue to function).
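The merge behavior described above (an explicit confirmation when records don't look like the same Occurrence, plus redirects so old GUIDs keep resolving) might be sketched as follows. This is a hypothetical in-memory model in Python, not Arctos's schema or API; `Record`, `Catalog`, the `occurrence` tuple, the `confirm` flag, and the `EX:`/`GAN-1` identifiers are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical minimal record: a GUID, the values that define its
    # "Occurrence" (e.g. individual identifier and date), and its parts.
    guid: str
    occurrence: tuple
    parts: list = field(default_factory=list)


class Catalog:
    def __init__(self):
        self.records = {}    # live GUID -> Record
        self.redirects = {}  # retired GUID -> GUID it was merged into

    def add(self, rec):
        self.records[rec.guid] = rec

    def resolve(self, guid):
        # Follow redirects so old GUIDs keep working after merges.
        while guid in self.redirects:
            guid = self.redirects[guid]
        return self.records[guid]

    def merge(self, survivor_guid, other_guids, confirm=False):
        other_guids = list(dict.fromkeys(other_guids))  # drop duplicate GUIDs
        if survivor_guid in other_guids:
            raise ValueError("cannot merge a record into itself")
        survivor = self.records[survivor_guid]
        others = [self.records[g] for g in other_guids]
        mismatched = [r.guid for r in others if r.occurrence != survivor.occurrence]
        # Explicit check: refuse to merge records that don't look like the
        # same Occurrence unless the operator has confirmed it.
        if mismatched and not confirm:
            raise ValueError(
                f"{mismatched} do not match {survivor_guid}; are you sure about this?"
            )
        for rec in others:
            survivor.parts.extend(rec.parts)  # move parts onto the survivor
            del self.records[rec.guid]
            self.redirects[rec.guid] = survivor_guid
        return survivor


# Usage with made-up GUIDs and parts.
cat = Catalog()
cat.add(Record("EX:Mamm:1", ("GAN-1", "2013-07-01"), ["skin"]))
cat.add(Record("EX:Mamm:2", ("GAN-1", "2013-07-01"), ["liver"]))
cat.merge("EX:Mamm:1", ["EX:Mamm:2"])
print(cat.resolve("EX:Mamm:2").parts)  # ['skin', 'liver']
```

Because redirects can chain (A merged into B, later B merged into C), `resolve` follows them until it reaches a live record, so the oldest GUID still lands on the current one.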
Priority
@campmlc can set this, and will probably need to correct whatever I've misunderstood.