Basis of Record vs. Cataloged_Item_Type #2432
Comments
A point of information, according to the Darwin Core standard,
"Observation", "Literature" and "unknown" are not valid values for
basisOfRecord.
On Tue, Jan 7, 2020 at 9:09 PM Teresa Mayfield-Meyer wrote:
I was recently made aware of the fact that fossil specimens in Arctos are
not being properly translated to aggregators. If I search GBIF for UTEP
Fossils (Arctos) with BasisOfRecord = "fossil specimen", I get nothing, yet
this entire collection is fossils. This is going to be an issue as ALMNH:ES
and NMMNH:Paleo go into GBIF. While we could take the easy way out and just
send all ES collection types as "fossil specimen", I think we should be
more precise as there are fossils in other collections as well. Also see
#2094
I propose that we make better use of CATALOGED_ITEM_TYPE and use the
categories suggested in GBIF for Basis of Record:
Observation
Machine observation
Human observation
Material sample
Literature
Preserved specimen
Fossil specimen
Living specimen
Unknown
This would also provide better choices for cultural collections.
|
I think that's an overly-coarse split, at best - they contain lots of casts and such, along with the occasional gooey-bits (http://arctos.database.museum/guid/UAM:ES:4588) and who knows what else.
I'm definitely a fan of using existing vocabulary, but first glance suggests those are overly-arbitrary terms. Do they happen to come with definitions? At some level, this seems like something we should be pulling from existing data, rather than expecting someone to update yet another field when this changes. Denormalization is bad.... In any case, https://arctos.database.museum/info/ctDocumentation.cfm?table=CTCATALOGED_ITEM_TYPE exists and is available in the UI. |
@tucotuco what ARE valid values - I couldn't find anything to save my life. There are "examples" in the DwC wiki, but no list of defined values.
No definitions that I could find - I didn't have the time yesterday to write any and yes, there are probably some terms that should be added.
We are already denormalized. ES collections contain stuff that isn't fossil and Inv contain fossils. Sometimes this can be figured out by the "(fossil)" added to a part name, but other times not. If you can show me how this can be pulled from existing data and have it be correct 95% of the time, I'd love that, but I'm pretty sure it won't work that way. No matter what, we need to get something to make sure that fossil specimens are designated as such. The mammal curator at NMMNH just pulled a bunch of stuff from GBIF (he needs more than just stuff in Arctos) and ended up with a bunch of fossil mice from the UTEP collection. He knew this was a problem because he is familiar, but others probably wouldn't. The date of collection for that recent fossil stuff can be misleading and this will probably lead to bad science at some point. |
That's not denormalization, that's just missing the pigeonholes we've created. Denormalization is saying the same thing multiple places - being 'required' (which won't happen) to update A when you update Z.
I think that depends on how we define 'fossil.' For the purposes of GBIF, 'cataloged in an ES collection' may be sufficient. Some users will find some casts and fail to find fossils cataloged in bird collections, but that's pretty normal and may be close enough to what they want (at least for the casts). Ideally we'd make better use of something like part preservation - that should be sufficient for fossils, but won't necessarily distinguish eg human vs. machine observations. |
"Recommended best practice is to use the standard label of one of the Darwin Core classes." The examples contain all the currently valid values, namely: |
@tucotuco are there definitions for these terms? |
Yes, all of them. For example, https://dwc.tdwg.org/terms/#occurrence.
There are links to all of them in the menu on the right side of that
page.
On Wed, Jan 8, 2020 at 3:26 PM Teresa Mayfield-Meyer wrote:
@tucotuco are there definitions for these
terms?
|
I meant these terms: PreservedSpecimen, FossilSpecimen, LivingSpecimen, MaterialSample, Event, HumanObservation, MachineObservation, Taxon, Occurrence |
That's what I am talking about.
https://dwc.tdwg.org/terms/#occurrence
https://dwc.tdwg.org/terms/#materialsample
https://dwc.tdwg.org/terms/#event
https://dwc.tdwg.org/terms/#taxon
https://dwc.tdwg.org/terms/#livingspecimen
https://dwc.tdwg.org/terms/#preservedspecimen
https://dwc.tdwg.org/terms/#fossilspecimen
https://dwc.tdwg.org/terms/#humanobservation
https://dwc.tdwg.org/terms/#machineobservation
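The links above follow a regular pattern: each anchor is the class label in lower case. Here is a minimal sketch that relies only on that pattern and on the nine labels listed earlier in this thread. The function names are hypothetical and are not part of Arctos or any Darwin Core tooling.

```python
# Class labels listed earlier in this thread as the valid basisOfRecord values.
DWC_CLASS_LABELS = (
    "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen",
    "MaterialSample", "Event", "HumanObservation",
    "MachineObservation", "Taxon", "Occurrence",
)

DWC_TERMS_BASE = "https://dwc.tdwg.org/terms/"


def is_valid_basis_of_record(value: str) -> bool:
    """Return True only if value is exactly one of the listed class labels."""
    return value in DWC_CLASS_LABELS


def definition_url(label: str) -> str:
    """Build the definition link for a label (anchor is the lowercased label)."""
    if not is_valid_basis_of_record(label):
        raise ValueError(f"not a listed Darwin Core class label: {label!r}")
    return f"{DWC_TERMS_BASE}#{label.lower()}"


if __name__ == "__main__":
    for label in DWC_CLASS_LABELS:
        print(label, definition_url(label))
```

Note that the check is case- and space-sensitive, so "fossil specimen" and "Observation" would both be rejected.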
On Thu, Jan 9, 2020 at 7:54 AM Teresa Mayfield-Meyer wrote:
I meant these terms:
PreservedSpecimen, FossilSpecimen, LivingSpecimen, MaterialSample, Event,
HumanObservation, MachineObservation, Taxon, Occurrence
|
DOH! Thanks! |
As an art collection, we would defer to the recommendations of the Getty Categories for the Description of Works of Art -- http://www.getty.edu/research/publications/electronic_publications/cdwa/1object.html#RTFToC2a According to the CDWA, catalog level is an indication of the level of cataloging represented by the record, based on the physical form or intellectual content of the material. Examples include: item, volume, album, group, subgroup, collection, series, set, multiples, component, box, fond, portfolio, suite, complex, object grouping, performance and items. We would primarily use item but in some cases another catalog level may be appropriate, such as series or group. Would item be an appropriate term to add to your list of cataloged item types, or is it too generic? If it’s too generic, we would probably still need to add a different term as I’m not sure any of the proposed ones here would work for an art collection. Also, I don’t think I understand the implications of adding new cataloged item types. How would this change things for cataloging and searching? |
I just noticed that our specimens on GBIF are coming up as Preserved specimen instead of Fossil specimen. We need to find a solution for this. |
Same here - DMNS Marine Inverts. Where do you change BasisOfRecord? |
Based on all of the discussion above, I think we still need the granularity of assigning basis of record by cataloged item and the way to do that should be through cataloged item type.
Using "fossil" in preservation puts our basis of record for fossil material one step away from the place we should already have it - Cataloged_Item_Type where it would easily translate to DarwinCore and also provide better documentation for us. I really think we are under-utilizing this field and I suggest that we add the following terms and definitions:
We could link these to the definitions provided by DwC or Getty. This might also impact #3164 but it might also help provide a basis for differing displays of catalog item types. |
This looks reasonable . . .
On Fri, Jan 8, 2021 at 10:50 AM Teresa Mayfield-Meyer wrote:
Based on all of the discussion above, I think we still need the
granularity of assigning basis of record by cataloged item and the way to
do that should be through cataloged item type.
Ideally we'd make better use of something like part preservation - that
should be sufficient for fossils, but won't necessarily distinguish eg
human vs. machine observations.
Using "fossil" in preservation puts our basis of record for fossil
material one step away from the place we should already have it -
Cataloged_Item_Type where it would easily translate to DarwinCore and also
provide better documentation for us. I really think we are under-utilizing
this field and I suggest that we add the following terms and definitions:
| Term | Definition |
| --- | --- |
| living specimen | A biological specimen that is alive. |
| preserved specimen | A biological specimen that has been preserved. |
| fossil specimen | A preserved biological specimen that is a fossil. |
| human observation | An output of a human observation process. |
| machine observation | An output of a machine observation process. |
| item | An individual cultural object or work. |
We could link these to the definitions provided by DwC or Getty.
This might also impact #3164 but it might also help
provide a basis for differing displays of catalog item types.
|
I am in favor of this |
I want to advocate for using the DWC terms, but in this case they're a little wonky for humans. If we just use "PreservedSpecimen" then the mapping to DWC will be straightforward, new values won't require rebuilding code, users won't have to guess how we've translated, etc. - but it'll say "PreservedSpecimen" on records in Arctos. If we go with eg "preserved specimen" then we do have to translate - keep our local definitions synced up with DWC, run code like below for export, etc. I have no strong feelings, but I think it's worth discussion before we change anything.
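The export code mentioned in this comment is not preserved in this copy of the thread. As a stand-in, here is a minimal sketch of the kind of translation the second option implies. It assumes hypothetical local labels in the "preserved specimen" style and is not the actual Arctos export code.

```python
from typing import Optional

# Hypothetical local (human-readable) labels mapped to Darwin Core class labels.
LOCAL_TO_DWC = {
    "preserved specimen": "PreservedSpecimen",
    "fossil specimen": "FossilSpecimen",
    "living specimen": "LivingSpecimen",
    "human observation": "HumanObservation",
    "machine observation": "MachineObservation",
}


def to_basis_of_record(cataloged_item_type: str) -> Optional[str]:
    """Translate a local label to its DwC label; None means no mapping exists."""
    # Normalize case and collapse runs of whitespace before the lookup.
    key = " ".join(cataloged_item_type.lower().split())
    return LOCAL_TO_DWC.get(key)
```

Every new local value needs a new row in the mapping, which is the maintenance cost described above. Storing the DWC labels directly would reduce this step to an identity.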
|
By wonky, do you just mean the formatting of the terms, i.e. "PreservedSpecimen" vs "preserved specimen"? |
Yes, just that, no functional implications. |
Add a default in Manage Collection; it can be overridden by adding the field to the bulkloader or by changing it during data entry. The type search field on the main search page should search these terms. |
@campmlc @dustymc @ccicero @ebraker @DerekSikes @mkoo @Nicole-Ridgwell-NMMNHS Please feel free to visit and comment on my submission at tdwg/dwc#314 |
Most excellent point from @dbloom :
So in the name of sustainability I think we have to go with (2) (let the collections figure it out) or (3) (limit ourselves to DWC terms); I'm not going to be in a position to try what we're failing at now. As a stopgap measure, my DWC build scripts are now just dropping everything with non-approved BasisOfRecord (which probably looks like random things not getting published from the collections). |
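To make the stopgap concrete, here is a rough sketch of record-level filtering, assuming each record is a dict with hypothetical "collection" and "basisOfRecord" keys. The real build scripts are not shown in this thread and may work quite differently.

```python
from collections import Counter

# Labels listed earlier in this thread as the valid basisOfRecord values.
APPROVED = {
    "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen",
    "MaterialSample", "Event", "HumanObservation",
    "MachineObservation", "Taxon", "Occurrence",
}


def split_for_export(records):
    """Partition records into (kept, dropped) by their basisOfRecord value."""
    kept, dropped = [], []
    for rec in records:
        # A missing or non-approved value sends the record to the dropped list.
        (kept if rec.get("basisOfRecord") in APPROVED else dropped).append(rec)
    return kept, dropped


def dropped_by_collection(dropped):
    """Count dropped records per collection, e.g. for a curator-facing summary."""
    return Counter(rec.get("collection", "unknown") for rec in dropped)
```

A per-collection count of what was dropped would at least make the "random things not getting published" visible to the collections concerned.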
About the Unified Model. GBIF has committed to continuing to publish whatever is publishable now (that means DwC and extensions for our purposes). The underlying Unified Model will not have basisOfRecord. It makes no sense. Instead every type of entity (Event, Entity, Organism, MaterialEntity, DigitalEntity, GeneticSequence, etc.) will have its own type term. One of the types of MaterialEntity might still be a dwc:PreservedSpecimen, but that is for the community to hash out, as is happening somewhat in anticipation in the TDWG Material Sample Working Group. As we showed in both Diversifying the GBIF Data Model webinars so far, Occurrence is a post-facto construct joining evidence of a taxon at a place and time. Thus, Occurrences will be possible to construct from the Unified Model for those who need them, but they will no longer be confused with Organisms or Specimens. There won't be a "table" or "spreadsheet" for them except for those who continue to publish with the current paradigm and suffer all of its limitations. |
@dustymc how about a report for collections of records without a GBIF-approved catalog item type? I'm guessing someone made an error when entering that record. |
yay! (And agreed, makes no sense.)
I think this one is "etc.," which might be obvious if it used an appropriate part preservation and/or event type instead of that being stuffed into identification remarks for some reason.
Maybe if it comes to that, but can we just fix this instead of making a report that won't lead anywhere?
I think it's just the usual - remarks is overused, the structure and terms designed to accommodate are not used, or used in inappropriate ways. |
From today's Observation Interest Group Meeting
|
Review of machine observation definition
Change to An output of a machine observation process. Machine observations include media evidence that can be independently reviewed but with no associated specimen. These observations are expected to have one or more associated media (e.g., image, audio or video recording). See also MachineObservation GitHub Issue |
Review of human observation definition
Change to An output of a human observation process. Human observations are unvouchered (no associated specimen or media) and thus do not include evidence that can be independently reviewed. Human observations are expected to have NO parts and any associated media should be text-based. See also HumanObservation GitHub Issue |
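The two revised definitions imply simple consistency checks. Here is a minimal sketch, assuming a record exposes a list of parts and a list of media type names. Those inputs and the "text" media type name are hypothetical, not Arctos field names.

```python
def observation_problems(basis_of_record, parts, media_types):
    """Return a list of problems; an empty list means the record is consistent."""
    problems = []
    if basis_of_record == "MachineObservation":
        # Machine observations are expected to have one or more associated media.
        if not media_types:
            problems.append("machine observation has no associated media")
    elif basis_of_record == "HumanObservation":
        # Human observations are expected to have no parts ...
        if parts:
            problems.append("human observation has parts")
        # ... and any associated media should be text-based.
        non_text = [m for m in media_types if m != "text"]
        if non_text:
            problems.append(
                "human observation has non-text media: " + ", ".join(non_text)
            )
    return problems
```

Other cataloged item types pass through unchecked, since the definitions reviewed here only cover the two observation types.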
We still have non-DWC terms. Are collections OK with taking responsibility for being entirely excluded from DWC portals if they use them? If so, close. If not, we need something more. |
We will always have them because we have collections that don't care about Darwin Core - not sure we can fix that.... |
We do need to wrap up #5459 though. |
CT Committee meeting: I will be blamed for collections being excluded from DWC, therefore we need - uhhh, - something? Given the impact of this value and the existence of the GUM (eg we can now talk to GBIF without going through my horrid little translator) I think we should also use the values that the Standard demands, rather than arbitrary things which we have to define and then translate.
So why are we forcing them to choose something - should this be NULLable, or does it do something beyond DWC? (Probably not - this is some sort of not-great summary of parts and event type or something.) Does anyone know how eg GBIF would react to a NULL here - does that also trigger tossing the entire collection? |
YES - https://ipt.gbif.org/manual/en/ipt/latest/occurrence-data#required-dwc-fields
Because this field is REQUIRED - I don't know why that is so, but it was made that way for some reason. I am guessing it is because of the above (terms are REQUIRED for GBIF). Unless we can have this term apply ONLY to certain collection codes, people will choose NULL when they shouldn't and we will STILL be blamed for collections being excluded from publishing.
Nope - WE will be blamed. I still say that removing observation makes it easier for collections to select terms that GBIF will accept and removes the decision about whether they are HumanObservation or MachineObservation from you.
I really don't care - they are the same thing, just written differently. Again, if we change these terms will anyone even notice? If we do that, shouldn't we also just change this "field" from cataloged_item_type to basisOfRecord? These are the acceptable values in basisOfRecord LivingSpecimen NONE of these are appropriate for cultural or geological collections (which don't publish to GBIF, so...), however, hopefully soon MaterialEntity will be added to the list and that COULD be used by anyone. WE could start using it now and make the following changes: catalog_item_type -> basisOfRecord (The specific nature of the data record. This is required for publishing to GBIF) basisOfRecord Code Table (currently https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcataloged_item_type)
This means that cultural collections will need to be OK with using MaterialEntity in place of the Getty term "item" and geological collections will need to be OK with using MaterialEntity in place of their traditional term "specimen", but if the field is basisOfRecord instead of catalog_item_type, perhaps that is OK since they really don't care about basisOfRecord? Also note that our definitions for the terms ALREADY include the DWC definition with FUNCTIONAL descriptions added for Arctos users. Finally, a default is chosen by every collection, so a NULL in data entry gets filled in with the default. This means that cultural and geological collections just have to set their default and forget it. I would recommend removing this from the data summary at the top of a record to the curatorial box to make it less prominent (do we even need to see it at all on the record page?). This may go away with GBIF's new GUM, but I don't know when that will happen and we need to have a functional publishing system for what is required NOW. We can make wholesale changes as I have described OR we can just do #5459 and wait for the GUM. Also - we have to consider the fact that while GBIF develops GUM - all the other aggregators will probably still be using the old DWC-A, at least for a while, and we may have to do things two ways if we want data at SCAN or SeiNet... |
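The "set the default and forget it" behavior could look something like the sketch below. The collection codes and their defaults are made-up examples, and MaterialEntity is the value proposed in this comment rather than one confirmed as accepted.

```python
from typing import Optional

# Hypothetical per-collection defaults, as would be set once in Manage Collection.
COLLECTION_DEFAULTS = {
    "UTEP:ES": "FossilSpecimen",
    "MSB:Mamm": "PreservedSpecimen",
    "UAM:Art": "MaterialEntity",  # proposed value, see the comment above
}


def resolve_basis_of_record(collection: str, entered: Optional[str]) -> str:
    """Use the entered value if present; otherwise fall back to the collection default."""
    if entered is not None and entered.strip():
        return entered.strip()
    try:
        return COLLECTION_DEFAULTS[collection]
    except KeyError:
        raise ValueError(f"collection {collection!r} has no default set") from None
```

With this shape, a NULL at data entry never reaches the export, and only a collection with no default at all raises an error.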
By GBIF, which is optionally on the other end of an exchange standard.... Our choices are
We're currently doing the latter, I was hoping the former had special sauce but it sounds like the functionality is identical - which still leaves me thinking we should allow NULL, unless someone wants the Getty-or-whatever values.
... yet. GBIF clearly knows about them and seems interested in broadening horizons, thanks to GUM.
I think they are, even if the terminology is inappropriate. Cultural collections catalog STUFF, and STUFF as remembered by people, and STUFF as documented by non-stuff evidence, and that's all the concept is trying to encapsulate. Don't think I'm interested in changing field names, that will always need mapping to go about anywhere, it's just the contents that provide an all-too-convenient path to failure.
That should not be any obstacle at all, the DWC would just be (transparently!) generated from GUM (which, again, will easily rename things but not - sanely, anyway - update data). |
Is there any reason not to go with these changes @Jegelewicz describes? I vote to move forward with the wholesale changes proposed. |
Based on @Jegelewicz comments above (plus mine involving "STUFF"), minus #5459 which is in process, here's a proposal which I believe is functionally identical to current data but without any capacity to cause problems or confusion with GBIF (and presumably other DWC-users).
|
I just found records entered by collections at my institution as "specimen" rather than "preserved specimen". I'm certain that the students entering these were not aware that by doing so, they would make these records invisible to GBIF. Can we just change "specimen" in the data entry dropdown to "MaterialEntity" to avoid this confusion? |
This is incorrect as discussed above. The proposal is still #2432 (comment). I can't change anything until it (or an alternative, or whatever) is somehow addressed. |
So does it not make a difference if our mammal collection is using "specimen"? Can we summarize or get a recommendation? This is a very long issue. |
It will make the COLLECTION "invisible," not individual records.
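To illustrate the collection-level effect, here is a rough sketch in which a single non-approved value flags the whole collection. The record keys are hypothetical, and this is an illustration of the behavior described in this thread, not the actual publishing code.

```python
def excluded_collections(records, approved):
    """Map each collection holding any non-approved value to its offending values."""
    offenders = {}
    for rec in records:
        value = rec.get("basisOfRecord")
        if value not in approved:
            offenders.setdefault(rec["collection"], set()).add(value)
    # Sort by string form so a missing value (None) does not break the ordering.
    return {coll: sorted(vals, key=str) for coll, vals in offenders.items()}
```

Listing the offending values per collection shows a curator which entries to correct before the collection can be published again.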
Fair enough, current proposal moved to a new issue, we're done here. |
New issue number? I need to raise this problem with MSB collections. Ideally, we should be able to select a preference in manage collection, so that random student mistakes don't jeopardize the publishing of our collections to aggregators? |
um, what? So if one of my techs chooses the wrong thing then what happens?? This sounds massively bad. |
Yes, that's why I've been freaking out since August 2022! (But this issue is dead, please comment at the link above or #6730.) |