Code Table Request - New attribute: individual count #4032

Closed
Jegelewicz opened this issue Oct 22, 2021 · 68 comments
@Jegelewicz
Member

Jegelewicz commented Oct 22, 2021

Goal
Accurately describe the number of individuals that participated in an occurrence per dwc:individualCount in order to pass appropriate information to aggregators.

Context
#3908 (comment)

Table
https://arctos.database.museum/info/ctDocumentation.cfm?table=ctattribute_type

Value
individual count

Definition
The number of individuals represented by this catalog record.

Attribute data type
number+units

Attribute value
integers

Attribute units
individuals

Priority
[ Please choose a priority-label to the right. ]

@Jegelewicz Jegelewicz added this to the Needs Discussion milestone Oct 22, 2021
@Jegelewicz
Member Author

@acdoll @sharpphyl you may want to weigh in.

@dustymc
Contributor

dustymc commented Oct 22, 2021

No real objections, but the documentation would need to be clear on what this is (some random thing that's never going to get updated?!) and what it can do (within Arctos: nothing that I can see).

@acdoll

acdoll commented Oct 22, 2021

We are definitely in favor of this.

> what it can do

Currently, the number of individual organisms in a lot is captured in 'lot count' - this is not passed on to the aggregators (nor should it be; per documentation lot count can describe the number of vertebrae in a box). E.g., https://arctos.database.museum/guid/DMNS:Inv:10020 has two shells in the lot (i.e. 2 individuals). But the GBIF record only reports 1 individual:
image

@dustymc
Contributor

dustymc commented Oct 22, 2021

I agree it's a useful concept, I just don't think this is a suitable place for it.

  • record has 17 events for some reason (eg really great georeferencing history)
  • you find another individual hiding in the back of the drawer, so you need to go update all 17 events

and/or

  • you have 20 lots from the same space/time
  • you have to manage 20 events because the lots all contain a different number of individuals

I don't think either one of those scenarios is approachable by itself, much less in the combinations that would come to exist in an active collection.

I think this would be much better as a catalog record attribute, even if that's not fully capable of dealing with the data in some fringe cases. (And it's pretty easy to avoid those situations if this kind of information is important.)
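
To make the maintenance cost in the scenarios above concrete, here is a minimal sketch comparing the two placements. The class and field names (`Event`, `CatalogRecord`, `individual_count`) are hypothetical and are not Arctos structures; the sketch only shows how many stored values have to change when one more individual turns up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    """Hypothetical stand-in for one event attached to a record."""
    label: str
    individual_count: Optional[int] = None  # used only in the event-level design


@dataclass
class CatalogRecord:
    """Hypothetical stand-in for one catalog record."""
    guid: str
    events: List[Event] = field(default_factory=list)
    individual_count: Optional[int] = None  # used only in the record-level design


def set_count_on_events(record: CatalogRecord, count: int) -> int:
    """Event-level design: every event carries its own copy of the count."""
    for event in record.events:
        event.individual_count = count
    return len(record.events)  # number of values that had to be edited


def set_count_on_record(record: CatalogRecord, count: int) -> int:
    """Record-level design: the count is stored once."""
    record.individual_count = count
    return 1  # a single value was edited


# Invented record with 17 events, as in the first scenario.
record = CatalogRecord("EXAMPLE:1", [Event(f"event {i}") for i in range(1, 18)])
print(set_count_on_events(record, 3))  # 17
print(set_count_on_record(record, 3))  # 1
```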

@Jegelewicz
Member Author

> record has 17 events for some reason (eg really great georeferencing history)
> you find another individual hiding in the back of the drawer, so you need to go update all 17 events

this might get used here - #4033

Given that events end up as occurrences, I think this makes sense here. It is either that or as part of "specimen event". It is NOT in any way related to which parts are in the collection now or have been in it in the past.

> I think this would be much better as a catalog record attribute, even if that's not fully capable of dealing with the data in some fringe cases. (And it's pretty easy to avoid those situations if this kind of information is important.)

No because some records include actual distinct events that may or may not be about the same number of individuals.

> you have 20 lots from the same space/time
> you have to manage 20 events because the lots all contain a different number of individuals

If you have 20 lots with a different number of individuals from the same event then they all participated in the event and adding one count of individuals that is the sum of all the lot counts to that event should suffice?

@Jegelewicz
Member Author

The thing is - no one HAS to use this and if it isn't there, we can just pass "1" as a default to dwc:individualCount. That seems potentially less worse than what we are doing now?

@dustymc
Contributor

dustymc commented Oct 22, 2021

> some records include actual distinct events that may or may not be about the same number of individuals.

I'm not sure that anyone who's dumping stuff into a lot over time is going to much care about this....

> pass "1" as a default...less worse

"We don't have that information" is kinda always a defensible position. "... and so we've made something up!", not so much.

@Jegelewicz
Member Author

But we are making stuff up now!

@sharpphyl

Andy has already described the problem for our collection. We do not have multiple collecting events in one record, so I can't speak to that. The difference between one individual and two individuals that Andy pointed out could be meaningful to a researcher. A stronger case can be made for micromollusks, which can occur in large numbers, and those numbers could be important for assessing the health of the population, etc. DMNS:Inv:29549 of Caecum bipartitum has 276 shells (in a tiny gel cap).

Screen Shot 2021-10-24 at 2 06 47 PM

GBIF shows one individual.

Screen Shot 2021-10-24 at 2 03 02 PM

As long as the data flows to GBIF as "Individual Count" it doesn't matter to me where I put the number of specimens in the Arctos catalog record.

We support Teresa's recommendation to include DWC Individual Count so that the aggregator records reflect the number of individuals found at that collecting event.

@sharpphyl

@Jegelewicz Just checking when the Code Table Management team will meet to discuss this. I don't want this issue to drift into oblivion.

@dustymc
Contributor

dustymc commented Nov 8, 2021

> If you have 20 lots with a different number of individuals from the same event then they all participated in the event and adding one count of individuals that is the sum of all the lot counts to that event should suffice?

That is not reflective of how the data are structured.

@campmlc

campmlc commented Nov 8, 2021

@dustymc is there a solution you can suggest? We do need this resolved.

@dustymc
Contributor

dustymc commented Nov 8, 2021

#4032 (comment)

> catalog record attribute

@mkoo
Member

mkoo commented Nov 8, 2021

Conceptually, count doesn't belong in the collecting event. I agree with Dusty that it is an attribute of the cataloged record.

If it's not getting passed on in the DwCA, then that's a mapping issue, not a CT or new thing for collecting event (which is location+date:time)

@Jegelewicz
Member Author

> If it's not getting passed on in the DwCA, then that's a mapping issue

We don't actually record this in a meaningful way anywhere, "part lot count" is not a usable value since we may have 3 parts from a single individual in a given catalog record.

> conceptually count doesn't belong in the collecting event

Probably not - since multiple taxa can share a collecting event, but it does not belong as part of the catalog record either. The individual count expected at the aggregators is "The number of individuals present at the time of the Occurrence." What we are passing as "occurrences" are actually "specimen" events (please see #4036 because our terminology is all over the place and is also problematic). As discussed recently, using "specimen" events as occurrences is problematic because we end up reporting two occurrences when there is only one. Here is an example:

https://arctos.database.museum/guid/DMNS:Mamm:12344
is from the same individual/collection event as
https://arctos.database.museum/guid/MSB:Mamm:233616

BUT they are passed to the aggregators as separate events/individuals

https://www.gbif.org/occurrence/1145096812
and
https://www.gbif.org/occurrence/1145267756

Careful consideration of associated occurrences and organism ID will suss this out, but it is a shame that we pass different organism IDs for each of these records. Even if we cleaned up our act and got them into the same collecting event, we would still be sending conflicting information.

Anyhoo. It is probably true that we have no good way to say how many individuals of a particular taxon took part in any given OCCURRENCE (collecting or observation event). Ideas are welcome because sending 1 when there are 276 is a bit misleading.

@dustymc
Contributor

dustymc commented Nov 8, 2021

> We don't actually record this in a meaningful way anywhere,

Correct - I magic it (poorly, probably) for some special circumstances, and there's some legacy not-quite-data from previous attempts of that hanging around. If we want to pass something meaningful on then we need to record it. (And I can magic - probably still poorly! - the initial values if needed.)

> What we are passing as "occurrences" are actually "specimen" events

No, we are splitting catalog records at collecting events in an attempt to magick Occurrences out of the aether. What we are passing as Occurrences does not exist in Arctos; that's just not what gets cataloged.

@Jegelewicz
Member Author

Jegelewicz commented Nov 8, 2021

> What we are passing as Occurrences does not exist in Arctos; that's just not what gets cataloged.

Mostly - but I think some records with observation type events are pretty close.

> we are splitting catalog records at collecting events

I think we are splitting them at "specimen" events - thus the seid?

Honestly the quoted statement is true for all physical collections in the data aggregators, but after looking at this, I do think there are some things we could be doing better.

So I guess I can go along with making this a collection object attribute even though it isn't really going to solve the whole problem. See updated request.

@Jegelewicz Jegelewicz changed the title Code Table Request - New collecting event attribute: individual count Code Table Request - New attribute: individual count Nov 8, 2021
@dustymc
Contributor

dustymc commented Nov 8, 2021

> some records ... are pretty close.

Most are.

> them at "specimen" events

Same thing from the perspective of a single catalog record.

> think there are some things we could be doing better

Always.

> isn't really going to solve the whole problem

Nope, there are some ragged edges, but I think it does what the collections who seem to care about this want done without adding too much complexity or being too hard to understand in a decade or so.

> The number of individuals represented by this catalog record.

That doesn't seem quite right, or complete, or something, but I'm struggling to come up with anything better. @sharpphyl help??

@Nicole-Ridgwell-NMMNHS

I fully support adding this as a collection object attribute. Is there some way we can represent count = unknown in a way that GBIF would ingest correctly?

@Jegelewicz
Member Author

@dustymc how does "INDIVIDUALCOUNT" get calculated?

INDIVIDUALCOUNT   individualCount,

@dustymc
Contributor

dustymc commented Nov 9, 2021

#3901

@Jegelewicz Jegelewicz removed this from the Active Development milestone Feb 17, 2022
@dustymc dustymc reopened this Feb 17, 2022
@dustymc dustymc added this to the Active Development milestone Feb 17, 2022
@dustymc
Contributor

dustymc commented Feb 18, 2022

This should be caching properly, and filtering out to things that use the cache, now.

Weird data - eg, multiple determinations - will break the cache, so there's a status on /guid/ pages in next release.

Screen Shot 2022-02-17 at 3 47 06 PM

The cache is just pulling the attribute values, so using anything other than 'individuals' for units, or any non-integer value with any units, will also do something interestingly fatal.

There is no default; not providing this attribute will result in individualcount=NULL being sent out with DWC.
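
The constraints described in this comment (units must be 'individuals', the value must be an integer, and a missing attribute goes out as NULL) can be sketched as a small check. This is illustrative only and is not the Arctos cache code; the function name `dwc_individual_count` and the dict layout are assumptions, as is the choice to accept only non-negative whole numbers.

```python
from typing import Optional


def dwc_individual_count(attribute: Optional[dict]) -> Optional[int]:
    """Return the value to send as dwc:individualCount, or None (NULL).

    `attribute` is a hypothetical dict with 'value' and 'units' keys, or
    None when the record has no 'individual count' attribute.
    """
    if attribute is None:
        return None  # no default: an absent attribute is sent as NULL
    units = attribute.get("units")
    if units != "individuals":
        raise ValueError(f"unexpected units: {units!r}")
    value = str(attribute.get("value", "")).strip()
    # Assumption: only plain non-negative whole numbers are acceptable.
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"value is not a whole number: {value!r}")
    return int(value)


print(dwc_individual_count({"value": "276", "units": "individuals"}))  # 276
print(dwc_individual_count(None))  # None
```

Rejecting bad input up front, rather than letting it reach the cache, is one way to avoid the "interestingly fatal" outcome; whether Arctos does this at entry time is not stated here.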

I can help bulkload initial data if necessary, just let me know how to calculate this.

@sharpphyl your collection had nothing, if you'll let me know how to get the initial values I can magic them in.

For collection_cde=Ento collections, the old code was using max(lot_count)

@mlbowser
@leet1984
@campmlc
@dssikes
@mvzhuang
@wellerjes
@terrymcglynn
@lin-fred
@Jegelewicz
@droberts49
@jrpletch
@lmtabak

For collection_cde=Fish collections, the old code was using

sum(lot_count)
where
    part_name like '%whole%' and
    coll_obj_disposition not in ('discarded','used up','deaccessioned','missing','transfer of custody')
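
For readers who do not work in SQL, the two legacy calculations mentioned in this comment (max(lot_count) for Ento, the filtered sum for Fish) can be sketched in Python. The dict keys and the sample parts are invented for illustration and are not the Arctos schema; the sketch also assumes every part has a disposition recorded.

```python
from typing import Iterable, Optional

# Dispositions excluded by the old Fish rule quoted above.
EXCLUDED_DISPOSITIONS = {
    "discarded", "used up", "deaccessioned", "missing", "transfer of custody",
}


def legacy_ento_count(parts: Iterable[dict]) -> Optional[int]:
    """Old Ento rule: the largest lot_count among the record's parts."""
    counts = [p["lot_count"] for p in parts]
    return max(counts) if counts else None


def legacy_fish_count(parts: Iterable[dict]) -> Optional[int]:
    """Old Fish rule: sum lot_count over 'whole' parts not excluded by disposition."""
    counts = [
        p["lot_count"]
        for p in parts
        if "whole" in p["part_name"]
        and p["disposition"] not in EXCLUDED_DISPOSITIONS
    ]
    # Like SQL sum() over zero rows, return None when nothing qualifies.
    return sum(counts) if counts else None


# Invented sample parts for one record.
parts = [
    {"part_name": "whole organism", "lot_count": 12, "disposition": "in collection"},
    {"part_name": "whole organism", "lot_count": 3, "disposition": "used up"},
    {"part_name": "tissue", "lot_count": 5, "disposition": "in collection"},
]
print(legacy_ento_count(parts))  # 12
print(legacy_fish_count(parts))  # 12
```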

@ebraker
@byuherpetology
@leet1984
@mkoo
@ewommack
@campmlc
@ccicero
@mvzhuang
@wellerjes
@lin-fred
@Jegelewicz
@gradyjt
@jandreslopez
@droberts49
@zmsch
@jrpletch
@lmtabak

@sharpphyl

sharpphyl commented Feb 18, 2022

> @sharpphyl your collection had nothing, if you'll let me know how to get the initial values I can magic them in.

The number of individuals in a catalog record for DMNS:Inv is in the field Qty under Parts.

Catalog Record Count

If a record has both a shell and an operculum, each part will appear separately but the number of individuals doesn't increase.

Screen Shot 2022-02-18 at 8 11 54 AM

Thanks for magicing them in.

Is this where we add the individual count attribute during data entry?

Screen Shot 2022-02-18 at 8 16 59 AM

I think we still add the count in the Qty field too so it shows as a "shell" or other part. Does that mess up anything?

@dustymc
Contributor

dustymc commented Feb 18, 2022

@sharpphyl I think that means you want sum(lot_count) - 1 in your first screenshot, 2 in the second?

You should definitely continue to provide lot count - it's a completely different thing (and much more important, in my view).

Yep that's one place to edit Attributes.

@Jegelewicz the frontmatter on the parts doc page seems to be mangled and it's claiming you edited - fix?

@lin-fred
Contributor

Thanks all for updating this. We don't have a good inventory yet of our specimens, it's all legacy numbers at the moment. I will keep this in mind for when we do an inventory!

@dustymc
Contributor

dustymc commented Feb 18, 2022

@sharpphyl some data for your review:


-- Proposed initial values for DMNS:Inv, one row per catalog record (guid).
-- attribute_value is the sum of lot_count over all of the record's parts.
create table temp_dmnsinvic as select
    guid,
    'individual count' as attribute,
    sum(lot_count)::text as attribute_value,
    'individuals' as attribute_units,
    'Phyllis Sharp' as determiner
from
    flat
    inner join specimen_part on flat.collection_object_id=specimen_part.derived_from_cat_item
    inner join coll_object on specimen_part.collection_object_id=coll_object.collection_object_id
where
    flat.guid_prefix='DMNS:Inv'
group by
    guid
;

temp_dmnsinvic.csv.zip

Let me know if it is as expected.
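
As a rough illustration of what the query above produces, here is a Python sketch that sums lot counts per guid and writes rows with the same five columns. The input tuples are invented, the helper names are hypothetical, and the actual layout of the attached CSV is not confirmed by this thread.

```python
import csv
import io
from collections import defaultdict
from typing import Iterable, List, Tuple

COLUMNS = ["guid", "attribute", "attribute_value", "attribute_units", "determiner"]


def individual_count_rows(parts: Iterable[Tuple[str, int]], determiner: str) -> List[dict]:
    """Sum lot_count per guid, mirroring the GROUP BY in the query above."""
    totals = defaultdict(int)
    for guid, lot_count in parts:
        totals[guid] += lot_count
    return [
        {
            "guid": guid,
            "attribute": "individual count",
            "attribute_value": str(total),
            "attribute_units": "individuals",
            "determiner": determiner,
        }
        for guid, total in sorted(totals.items())
    ]


def to_csv(rows: List[dict]) -> str:
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented (guid, lot_count) pairs: EXAMPLE:Inv:1 has two parts.
sample = [("EXAMPLE:Inv:1", 1), ("EXAMPLE:Inv:1", 1), ("EXAMPLE:Inv:2", 276)]
print(to_csv(individual_count_rows(sample, "Phyllis Sharp")))
```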

@sharpphyl

sharpphyl commented Feb 18, 2022

> @sharpphyl I think that means you want sum(lot_count) - 1 in your first screenshot, 2 in the second?

No, both of these records have only one organism. The first has only the shell and the second has both the shell and its operculum. There may be a few records where these aren't the same Qty, but I can adjust them manually. So if they are the same, that's the number of individuals in the record.

I looked at your csv and found specimens (e.g. https://arctos.database.museum/guid/DMNS:Inv:25570) that show 2 organisms where there is only one. Perhaps we should only count the number of shells if there is both a shell and an operculum and they have the same Qty. If they are different, we would use the larger quantity as the number of organisms.

I checked a few records where the part name is exoskeleton, test or whole organism and I didn't find any issues.

@dustymc
Contributor

dustymc commented Feb 18, 2022

@sharpphyl I can't quite figure out how to interpret that.

Maybe just max (rather than sum) lotcount works for a first pass? That seems to work for the few examples so far.
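
A tiny sketch of why max behaves differently from sum for the shell-plus-operculum case. The quantities are invented (one snail recorded as two parts), and the function names are hypothetical.

```python
from typing import List


def count_by_sum(lot_counts: List[int]) -> int:
    """First attempt: add up the quantity of every part."""
    return sum(lot_counts)


def count_by_max(lot_counts: List[int]) -> int:
    """Proposed first pass: take the largest part quantity."""
    return max(lot_counts)


# Invented example: one individual cataloged as a shell (Qty 1) and an operculum (Qty 1).
shell_and_operculum = [1, 1]
print(count_by_sum(shell_and_operculum))  # 2
print(count_by_max(shell_and_operculum))  # 1
```

Note that max is only a first pass: when two parts hold different individuals, it would report fewer than are actually present.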

Or below are your unique part name combos - maybe we can set this up as

when partaggregate='shell|whole organism' then do_some_thing
when partaggregate='operculum|shell|whole organism' then do_something_else
when.....

??

Note that these are determinations, you can adjust them as necessary, and both bulk loaders and unloaders are available.

               p                
--------------------------------
 shell|whole organism
 shell
 test
 operculum|shell|shell
 shell|shell
 test|whole organism
 whole organism
 shell|test
 operculum|shell|whole organism
 operculum|shell
 egg
 exoskeleton|shell
 egg case|operculum|shell
 operculum
 egg case|shell
 exoskeleton
 egg case
(17 rows)
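
The combination strings in the listing above look like each record's part names sorted alphabetically and joined with '|', duplicates kept. That reading is an assumption based on the output, not on the query that produced it. A minimal sketch with hypothetical function names:

```python
from collections import Counter
from typing import Iterable, List


def part_aggregate(part_names: Iterable[str]) -> str:
    """Join a record's part names, sorted, with '|' (duplicates are kept)."""
    return "|".join(sorted(part_names))


def unique_combinations(records: Iterable[List[str]]) -> Counter:
    """Count how many records share each part-name combination."""
    return Counter(part_aggregate(names) for names in records)


# Invented records, each given as a list of part names.
records = [
    ["shell"],
    ["shell", "operculum"],
    ["shell", "operculum", "shell"],
    ["operculum", "shell"],
]
for combo, n in sorted(unique_combinations(records).items()):
    print(combo, n)
```

A key built this way can then drive a per-combination rule, which is the idea behind the `when partaggregate=...` outline above.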

@Jegelewicz
Member Author

> the frontmatter on the parts doc page seems to be mangled and it's claiming you edited - fix?

Give me a pointer so I can figure out where to go look?

@dustymc
Contributor

dustymc commented Feb 18, 2022

There's no sidenav thingee

@Jegelewicz
Member Author

No sidenav thingee where?

@dustymc
Contributor

dustymc commented Feb 18, 2022

@sharpphyl

sharpphyl commented Feb 19, 2022

Let's see if this helps.

Rule 1 - if there is only one part, use the value in Part Qty
Rule 2 - if there is a shell and an operculum, use the value in Part Qty for shell only. Do not add the operculum Qty.
Rule 3 - sum certain part Qty values as listed below

shell - use Qty
test - use Qty
exoskeleton - use Qty
egg case - use Qty
egg - I changed this to egg case
whole organism - use Qty
operculum - use Qty

operculum|shell - use the shell Qty only
operculum|shell|shell - sum the shell Qtys only
operculum|shell|whole organism - sum the shell and whole organism Qtys only

egg case|operculum|shell - use the shell Qty only - I only found one record for this - https://arctos.database.museum/guid/DMNS:Inv:22493 Are there others?

shell|whole organism - sum the Qtys
shell|shell - sum the Qtys
test|whole organism - sum the Qtys
shell|test - sum the Qtys
exoskeleton|shell - sum the Qtys
egg case|shell - sum the Qtys
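
The rules above can be sketched as a lookup on the sorted part-name combination. This is one possible reading of the rules, not the code that was actually run; the names `COUNTED_PARTS`, `SUM_ALL` and `individual_count` are hypothetical, and combinations not listed in the rules raise an error here instead of being guessed at.

```python
from typing import List, Tuple

# Combinations where only some parts are counted (Rule 2 and its variants).
COUNTED_PARTS = {
    "operculum|shell": {"shell"},
    "operculum|shell|shell": {"shell"},
    "operculum|shell|whole organism": {"shell", "whole organism"},
    "egg case|operculum|shell": {"shell"},
}

# Combinations where every part quantity is summed (Rule 3).
SUM_ALL = {
    "shell|whole organism",
    "shell|shell",
    "test|whole organism",
    "shell|test",
    "exoskeleton|shell",
    "egg case|shell",
}


def individual_count(parts: List[Tuple[str, int]]) -> int:
    """Apply the rules to one record given as (part name, Qty) pairs."""
    if not parts:
        raise ValueError("record has no parts")
    if len(parts) == 1:
        return parts[0][1]  # Rule 1: a single part, use its Qty
    key = "|".join(sorted(name for name, _ in parts))
    if key in COUNTED_PARTS:
        keep = COUNTED_PARTS[key]
    elif key in SUM_ALL:
        keep = {name for name, _ in parts}
    else:
        raise ValueError(f"no rule for part combination {key!r}")
    return sum(qty for name, qty in parts if name in keep)


# Invented quantities for illustration.
print(individual_count([("shell", 1), ("operculum", 1)]))                # 1
print(individual_count([("shell", 2), ("shell", 3), ("operculum", 1)]))  # 5
print(individual_count([("shell", 1), ("whole organism", 2)]))           # 3
```

Raising on unknown combinations keeps odd records visible so they can be corrected by hand.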

@campmlc

campmlc commented Feb 19, 2022 via email

@dustymc
Contributor

dustymc commented Feb 19, 2022

@sharpphyl how's this?

temp_dmnsinvic.csv.zip

@sharpphyl

I corrected the organism count on five odd records that didn't fit my "rules." Highlighted in yellow. I also added 10 records at the very bottom uploaded since you ran this report. If it looks ok to you, let's do it.

temp_dmnsinvic-2 - PMS edits.csv

Thanks for making magic.

@Jegelewicz
Member Author

> I thought that individual count could be a field to use to record all the individuals in a lot, regardless of number of parts?
> And ideally could be used for other taxa, eg fish, tadpoles, parasites?

That's exactly what it is.

@dustymc dustymc closed this as completed Feb 23, 2022
@leet1984

leet1984 commented Feb 23, 2022 via email

9 participants