Skip to content
This repository has been archived by the owner on Oct 19, 2023. It is now read-only.

2.4 Core elements of the standard

jbstatgen edited this page Jun 16, 2022 · 61 revisions

As described in the Summary of LtC classes section, the Latimer Core standard (version 1) is made up of 23 classes, each with two or more properties. The central concept of the standard is the ObjectGroup class, which represents 'an intentionally grouped set of objects with one or more common characteristics'. Arranged around the ObjectGroup are a set of classes, such as GeographicOrigin, Taxon and GeologicalContext, which are commonly used to describe and classify the objects within the ObjectGroup. There are a set of classes (OrganisationalUnit, CollectionStatusHistory and StorageLocation) intended to reflect aspects of the custodianship, management and tracking of the collections, and a generic class (MeasurementOrFact) for storing metrics, narratives and other qualitative or quantitative measures within the standard. A further set of generic, reusable classes are included that enable common concepts (such as people, identifiers and references) to be attached to multiple classes within the standard. Finally, there’s a set of classes that are used to describe the structure of and metadata about the Latimer Core records and contents themselves.

These (loose and informal) categorisations are illustrated in Figure 1 below, and the concepts are described in more detail later in this section.

Fig1

Figure 1: Approximate categories for Latimer Core classes for describing the object group’s characteristics (green), collections custody (purple), generic reusable information-types (dark blue), metrics (red), and data structure and links (light blue).

The ObjectGroup concept

The ObjectGroup is the core class and concept of the Latimer Core standard. It represents any set of physical or digital objects that we want to describe together as a group, as opposed to representing and describing each individual separately. This will generally be for one or both of the following reasons:

  1. There aren't yet, or may never be, individual digital records for all of the separate objects, but we still want to capture, use and share data about them at a higher level, or
  2. There is information that applies to the group as a whole (e.g. a narrative or history, a registration number or a contact person), so we want to store this information at the group level rather than duplicating it across multiple object-level records.

In an ideal world where every object in global Natural Science collections has a fully populated and clean digital record, the first reason would disappear. Collection descriptions could then be assembled dynamically by aggregating data from the underlying object-level records. However, even in that far-off scenario where the majority of the data can be assembled automatically from object level records, there are still likely to be use cases for managing some information at the group level.

What can an ObjectGroup represent?

The Latimer Core definition of an ObjectGroup is

“An intentionally grouped set of objects with one or more common characteristics.”

This is a purposefully broad definition that allows the standard to encompass a wide range of use cases for describing collections or their parts for different reasons. Although the most common and familiar use of collection descriptions has been to describe the collection of a particular institution, conceptually an ObjectGroup can represent any set and number of objects that need to be grouped for any purpose. This can scale from a few objects in a drawer to the sum total of all the collections of global natural science institutions: both of these can be represented by an ObjectGroup, as can any scale in between.

Some more practical examples of what may be represented by an ObjectGroup include:

  • The collection of a single institution
  • The herbarium collection of a single institution
  • The invertebrates wet collection of a single institution
  • A named collection provided by a specific donor, or collected by a significant collector
  • The Mesozoic mammal collection of an institution
  • The unincorporated objects of a particular department within an institution
  • The objects in a single drawer, cabinet or room for inventory purposes
  • A registration lot of objects brought into an institution together
  • A database of images taken from collection specimens
  • A dataset of butterfly observations taken as a monitoring time series
  • An ex situ tree plantation rescuing remaining genotypes of European ash
  • A virtual collection representing all penguin individuals and populations in zoos and aquaria worldwide
  • Collections of sightings of animals as part of environmental impact/mitigation projects (e.g., Toads on Roads)

Each of these represent a number of objects that are grouped for a certain purpose, with one or more common characteristics (for example, belonging to the same institution, being from the same stratigraphic time period, or being collected by the same person). Those characteristics are described by the ObjectGroup’s properties and the other associated classes in Latimer Core, as summarised in Figure 2.

Fig2

Figure 2: A summary of the ObjectGroup's properties and other associated classes.

Institutions and organisational units

Collections are held by caregivers. Some collections start out as privately held assemblages, associated generally with a single person. Over the longer term, most surviving collections become preserved and managed by institutions, including for example public or private institutions, organisations, and corporations. Institutions are the administrative entities, which are stewards or owners of collections, provide administrative services, maintain and preserve collections, and often employ or dedicate responsible staff to actively manage collections.

For the purpose of integration into a globally applicable collections registry schema, private collections can be considered to be held by a person of a corresponding role (e.g. “steward”) and be part of a theoretical organisation (e.g. “independent operation”, “private residence”). For a more general discussion of organisations, persons and groups as agents see the FOAF Vocabulary Specification (http://xmlns.com/foaf/spec/).

In Latimer Core, the institution concept is inspired by the more generic framework provided by "The Organization Ontology" W3C standard (https://www.w3.org/TR/vocab-org/). Institutions can be represented by the ltc:OrganisationalUnit class, which is more broadly defined than the corresponding term in the org-standard. In Latimer Core, its definition combines characteristics of both the org:Organization and org:OrganizationalUnit classes, providing more flexibility. Plans for the future include further development of the intersection between org and ltc, to improve alignment and thus enable seamless extension and integration of both standards by each other.

The Latimer Core class OrganisationalUnit provides a simplified approach to representing institutional information and institutional structure, for example, for a local collection management catalogue maintained by a collection-holding institution or a local research group managing their research collections and groups of specimens on loan from a variety of scientific collections. In these cases, it can be sufficient to provide the ltc:OrganisationalUnits directly responsible for the collections or groups of objects. It might not be necessary to represent complete, sometimes byzantine institutional hierarchies.

For larger-scale, e.g. national to regional and global, registry projects, as for example GRSciColl (https://www.gbif.org/grscicoll) and GGBN (https://www.ggbn.org/ggbn_portal/), as well as longer-term LtC-based implementations that are expected to grow and develop, we recommend to take advantage of the full functionality and power of the Organization Ontology standard for representing institutions, their legal status, structure, employees, addresses, etc. The seamless linking and extension of Latimer Core by the Organization Ontology W3C standard, and the other way around, is made possible by the Darwin Core dwc:ResourceRelationship class (see below). Thereby, the integration of collection and institution registries, e.g. the GBIF Registry of Scientific Collections (https://www.gbif.org/grscicoll) and the Research Organization Registry (https://ror.org/) is enabled (see e.g. Politze 2021, https://easychair.org/publications/open/r8Dv).

The ltc:OrganisationalUnit class can represent, in addition to institutions as (legal) entities, subdivisions at different levels within an institution (e.g. departments, divisions, sections, faculties etc), as well as organisations at a higher level of the collection-holding institution (for example, a museum may be a subunit within the organisational structure of a university). Terms to be used in predefined vocabularies identifying the hierarchical position of an organisational unit within an organisational structure can be found in the “Academic Institution Internal Structure Ontology” (AIISO; https://vocab.org/aiiso/).

OrganisationalUnits can be linked together into a hierarchy or structure using the dwc:ResourceRelationship class borrowed from the Darwin Core standard (DWC). Similarly, the links between the institutional level (implemented within the LtC or following the ORG standard) and the collection level (LtC standard) are build using the dwc:ResourceRelationship class. The dwc:relationshipOfResource property of the class should be used to describe the nature of the relationship. It is recommended best practice (see https://dwc.tdwg.org/list/#dwc_ResourceRelationship) to use a controlled vocabulary to describe those relationships. Our suggestion is to use the relationship types listed as examples by the Darwin Core webpage: “same as”, “part of”, “contains”. Alternatively, AIISO terms can be used, e.g. part_of, responsibilityOf, responsibleFor.

An important element to stress is the distinction between the organisational unit that holds a collection, and the collection itself. An institution is represented by the class OrganisationalUnit and its properties. The collection that the institution holds is represented by an ObjectGroup. The two can be linked together to represent the custodianship of the OrganisationalUnit for the ObjectGroup. This means that the properties describing the institution are restricted to the OrganisationalUnit and associated classes, while any data describing the collection and the objects within it are represented either by ObjectGroup properties, or instances of other Latimer Core classes linked to the ObjectGroup.

Within the ObjectGroup-centric Latimer Core model, associated OrganisationalUnits are essentially treated as properties of the ObjectGroup, rather than as a higher level in a rigid hierarchy. The notation for the property organisationalUnitName of the class OrganisationalUnit is correctly ltc:ObjectGroup.OrganisationalUnit.organisationalUnitName or in a short form ltc:OrganisationalUnit.organisationalUnitName, since in LtC the ObjectGroup class is per default the core element.

In a general graph context (Figure 3), both classes, ObjectGroup and OrganisationalUnit, can be considered to be at the same level and can be directly associated, that is linked (they are within each other’s range, see section 2.5). This link can be looked at and approached from both directions, the point of view taken thereby varies between concrete use cases and serialisations. The collection-centric LtC perspective is to approach the association from the side of the ObjectGroup, therefore, in LtC OrganisationalUnit is a class-level property of ObjectGroup, accordingly in Table 1 of section 2.5, only OrganisationalUnit can be attached to ObjectGroup and not the other way around. However, in a context outside of LtC, e.g. with a focus on describing organisations (cp. the W3C org ontology), ObjectGroup will be a class-level property of Organization or OrganizationalUnit.

image

Figure 3: Diagram showing the bidirectional link between the OrganisationalUnit and ObjectGroup classes.

Generic Classes

Some classes can be used in multiple contexts within the LtC standard. These classes, listed in Table 1, are referred to here as ‘generic’ classes.

Generic classes represent a type of entity or concept that can be a property of, or associated with, several LtC classes. An OrganisationalUnit, Taxon, and Person all might require an Identifier of some sort, for instance. Rather than explicitly repeating the same set of fields within each class, one or more instances of the generic Identifier class can be used instead.

Generic Class Range
Reference Identifier, TemporalCoverage, RecordLevel, CollectionStatusHistory, ResourceRelationship, GeographicOrigin, GeologicalContext, ChronometricAge, Taxon, Event, Person, PersonRole, ObjectGroup
Identifier Reference, RecordLevel, GeographicOrigin, GeologicalContext, ChronometricAge, Taxon, StorageLocation, Event, ObjectClassification, OrganisationalUnit, Person, ObjectGroup, CollectionDescriptionScheme
TemporalCoverage CollectionStatusHistory, Event, PersonRole, measurementOrFact
MeasurementOrFact ObjectGroup, OrganisationalUnit, GeographicOrigin, GeologicalContext, ChronometricAge, ObjectClassification, Taxon, TemporalCoverage, Person, StorageLocation, Event
PersonRole ObjectGroup, RecordLevel, OrganisationalUnit, MeasurementOrFact, Event
Person CollectionStatusHistory, Event, PersonRole, measurementOrFact, Taxon, TemporalCoverage, StorageLocation, ContactDetail, Address, OrganisationalUnit, SpecimenIdentifierSystem, Reference
Address Person, OrganisationalUnit, StorageLocation
ContactDetail Person, OrganisationalUnit, ObjectGroup, PersonRole
ResourceRelationship ObjectGroup, RecordLevel

Table 1: List of Latimer Core generic classes and ranges

Figures 4, 5 and 6 illustrate how instances of the generic Identifier class can be used to represent Person, institutional (OrganisationalUnit) and Taxon identifiers.

{
    "@context":
    {
        "ltc": "http://rs.tdwg.org/ltc/"
    },
    "@type": "ltc:OrganisationalUnit",
    "ltc: Identifier":
    [
        {
            "@type": "ltc:Identifier",
            "ltc:identifierType": "Acronym",
            "ltc:identifier": "EMNH"
        },
        {
            "@type": "ltc:Identifier",
            "ltc:identifierType": "Acronym",
            "ltc:identifier": "EM(NH)"
        },
        {
            "@type": "ltc:Identifier",
            "ltc:identifierType": "URI",
            "ltc:identifierSource": "https://ror.org/",
            "ltc:identifier": "099zvsn29"
        }
    ],
    "ltc:organisationalUnitName": "Erewhon Museum of Natural History",
    "ltc:organisationalUnitType": "Institution",
    "ltc:organisationalUnitAlternativeName": "Museum of Natural History, Erewhon"
}

Figure 4: Using the Identifier class to represent an institutional identifier. (JSON file in LtC repo)

{
    "@context":
    {
        "schema": "https://schema.org/",
        "abcd": "http://rs.tdwg.org/abcd/",
        "ltc": "http://rs.tdwg.org/ltc/"
    },
    "@type": "ltc:Person",
    "ltc:identifier":
    [
        {
            "@type": "ltc:Identifier",
            "ltc:identifierType": "URI",
            "ltc:identifierSource": "https://www.worldcat.org/identities/",
            "ltc:identifier": "lccn-n85816396"
        },
        {
            "@type": "ltc:Identifier",
            "ltc:identifierType": "URI",
            "ltc:identifierSource": "https://www.wikidata.org/wiki/",
            "ltc:identifier": "Q371568"
        },
        {
            "@type": "ltc:Identifier",
            "ltc:identifierType": "URI",
            "ltc:identifierSource": "https://viaf.org/viaf/",
            "ltc:identifier": "38383989"
        }
    ],
    "schema:additionalName": "Eileen Doris",
    "schema:familyName": "Courtenay-Latimer",
    "schema:givenName": "Marjorie",
    "abcd:fullName": "Marjorie Eileen Doris Courtenay-Latimer"
}

Figure 5: Using the Identifier class to represent a person identifier. (JSON file in LtC repo)

{
    "@context":
    {
        "dwc": "http://rs.tdwg.org/dwc/",
        "ltc": "http://rs.tdwg.org/ltc/"
    },
    "@type": "ltc:Taxon",
    "ltc:Identifier":
    [
        {
            "@type": "ltc:Identifier",
            "ltc:identifierType": "URI",
            "ltc:identifierSource": "https://www.checklistbank.org/dataset/9812/taxon/",
            "ltc:identifier": "84SNJ"
        },
        {
            "@type": "ltc:Identifier",
            "ltc:identifierType": "URI",
            "ltc:identifierSource": "https://marinespecies.org/authority/metadata.php?lsid=",
            "ltc:identifier": "urn:lsid:marinespecies.org:taxname:116116"
        },
        {
            "@type": "ltc:Identifier",
            "ltc:identifierType": "URI",
            "ltc:identifierSource": "https://www.wikidata.org/wiki/",
            "ltc:identifier": "Q2391885"
        }
    ],
    "dwc:order": "Priapulimorphida",
    "dwc:family": "Priapulidae",
    "dwc:genus": "Priapulus"
}

Figure 6: Using the Identifier class to represent a taxon identifier. (JSON file in LtC repo)

The overall purpose of any given instance of a generic class can be inferred from its context in the record: a ContactDetail that is associated with or nested within an OrganisationalUnit is an organisational contact point, but a ContactDetail associated or nested within a Person should be interpreted as that Person’s preferred method of contact (illustrated in Figure 7).

ContactDetail Class representing email contacts formatted in JSON

Figure 7: Using the ContactDetail class to represent an institutional email address and a named individual’s email address. (JSON file in LtC repo)

There are several benefits to taking this approach, using the Identifier class as an example:

  1. Flexibility: a generic Identifier class can be used to represent any identifier relevant to classes within its range, so the standard needn’t attempt to anticipate all potential current or future use cases.
  2. Extensibility: Easier addition of supporting properties to the concept: e.g. identifier is further described by identifierSource, identifierType etc.
  3. Reduced number of properties in the standard.
  4. Support for the association of multiple identifiers (as shown in Figures 4, 5 and 6) with the same entity: see the Repeatable properties and classes section.

People and roles

To handle people and roles, Latimer Core uses a simplified version of the RDA/TDWG Attribution metadata recommendations (https://datascience.codata.org/articles/10.5334/dsj-2019-054/), which is substantially based on the PROV ontology (https://www.w3.org/TR/prov-o/). Using this approach avoids both the hardcoding of all potential roles people might play in relation to the different classes in the standard, and the proliferation of similar properties that would be required to support them.

To this end, there are two classes within the standard that are used to represent people and roles: Person and PersonRole. The Person class is used to describe the relevant person, and the PersonRole to attach that person to another class and define the role that they played in the context of that class (and, optionally, the time frame during which they fulfilled that role). PersonRole is a generic class (see previous section) which can be attached to a range of other classes within the standard, as demonstrated in Figures 8 and 9 below.

Fig8

Figure 8: Linking people to other classes using the PersonRole class.

Fig9

Figure 9: An example of using the Person and PersonRole classes to link people to collections, organisations and digital records in the relevant contexts.

The Latimer Core implementation differs from the RDA/TDWG recommendations in a couple of key respects:

  • Rather than use the PROV Agent entity, which has subclasses of Person, Organization and SoftwareAgent, Latimer Core currently only uses the ‘Person’ subtype within this model, and OrganisationalUnit is represented by a separate class.
  • The PROV model connects the Agent to an Entity using an Activity class, with the role represented by a qualified association between the Agent and the Activity. In the current version of Latimer Core, the activity concept has been removed from the model to simplify the association. These modifications are largely to reduce the level of abstraction in the initial version of Latimer Core, and make the classes more familiar and model simpler for users of the standard. There may be potential to use the full Agent scope and introduce the Activity concept in future iterations of Latimer Core to more fully align with PROV and possible developments within other TDWG standards.

Metrics and narratives

Two components frequently included in collection descriptions are quantitative metrics (e.g. the precise or estimated number of objects or taxa) and richer narratives about the collection and its history. To represent these, Latimer Core has adopted the MeasurementOrFact class used in Darwin Core and ABCD, and added a measurementFactText property. This property is for holding the richer text narratives in collection descriptions and if required categorising them using the measurementType property. This might be used, for example, to define and store a historical narrative and brain-dump of notes from a retiring curator about a collection separately. It also separates the text descriptions from quantitative measures stored in the measurementValue property, which is intended to make validation and programmatic aggregation and computation on the measurementValue property more straightforward.

"ObjectGroup": {

    "collectionName": "Collection of the Natural History Museum, London",

    "MeasurementOrFact": [
	{
	    "measurementType": "Object count",
	    "measurementValue": "72190070"
	},
	{
	    "measurementType": "Collection history",
	    "measurementMethod": "Text narrative",
	    "measurementRemarks": "Text derived from Wikipedia",
	    "measurementFactText": "The foundation of the collection was that of the Ulster doctor Sir Hans Sloane (1660–1753), who allowed his significant collections to be purchased by the British Government at a price well below their market value at the time. This purchase was funded by a lottery. Sloane's collection, which included dried plants, and animal and human skeletons, was initially housed in Montagu House, Bloomsbury, in 1756, which was the home of the British Museum."
	}
    ]
}

Figure 10: JSON example of a quantitative metric and textual narrative using the MeasurementOrFact class.

Similarly to the other generic classes described earlier, the MeasurementOrFact class can be associated with many of the other classes in the standard to add defined, dynamic properties, and is also repeatable.

It’s also possible to use the MeasurementOrFact class to qualify or quantify relationships between other classes in the standard, in conjunction with the ResourceRelationship class. This is covered in more detail in the later section on modelling approaches.

Metric examples

There are no prescribed metrics included in the standard. It is hoped that over time data requestors/aggregators will define the metrics that they require using the standard and its MeasurementOrFact class. Data providers can then use these as recipes to follow to generate the information needed to facilitate easier and more automated sharing and comparison.

Metric Description measurementType
object count The total number of specimens and/or items. If the metric is attached to an object group then the count is of the collection being described. If the metric is attached to the Institution then it is the overall count for that institution.
digitisation level percentage The percentage of the whole collection being described that is digitised [define digitised]. Could be used in combination with the object count metric.
digitisation level count An actual number of digitised records in the collection being described [define digitised]
imaged level percentage The percentage of the collection described in the record that has an image. Could be used in combination with the object count metric.
imaged level count An actual number of digitised records with images
georeferenced level percentage The percentage of the collection described in the record that has verified Lat Lon coordinates. Could be used in combination with the "object count" metric.
georeferenced level count The actual number of records that have verified Lat Lon coordinates
taxonomic rank Count of the taxa at the rank indicated in measurementFactText. Do not abbreviate the rank.
storage volume The cubic volume of the record (Institutional or Collection). Could be used in combination with the object count metric.
storage footprint The area of the record (Institutional or Collection). Could be used in combination with the object count metric.

Table 2: Example LtC simple metrics in tabular format (CSV file in LtC repo)

Metric Description measurementType
https://github.com/tdwg/mids/blob/working-draft/current-draft/MIDS-definition-v0.15-29Jul2021.md#44-information-elements-expected-at-mids-level-0 MIDS-0 object count
https://github.com/tdwg/mids/blob/working-draft/current-draft/MIDS-definition-v0.15-29Jul2021.md#43-information-elements-expected-at-mids-level-1 MIDS-1 object count
https://github.com/tdwg/mids/blob/working-draft/current-draft/MIDS-definition-v0.15-29Jul2021.md#43-information-elements-expected-at-mids-level-2 MIDS-2 object count
https://github.com/tdwg/mids/blob/working-draft/current-draft/MIDS-definition-v0.15-29Jul2021.md#43-information-elements-expected-at-mids-level-3 MIDS-3 object count

Table 3: Example MIDS metrics in tabular format (CSV file in LtC repo)