Can LinkML be used to model a vocabulary (structured conceptual model) that originates in a UML model? #2187

nlharris · 2024-07-03T17:00:52Z

Maintainer

(Saving as a discussion as per @cmungall's request)

Bart wrote (in Slack):

I have a UML model from which RDFS/OWL is generated. In my view, generating OWL or RDFS from a UML model is never pretty. Would people here agree? UML is CWA (closed-world assumption), structural, and intended to correctly model a software system, whereas OWL/RDFS are OWA (open-world assumption), lack structure, and are intended to infer knowledge and check consistency.
SHACL seems a better fit, since it is CWA and can express structure through cardinality constraints. However, it lacks semantic expressivity, such as designating what types of relations we're dealing with, and so forth. It also doesn't feel like quite the intended use for the language.
LinkML seems to be a perfect fit here. It has structure, allows for CWA (as well as OWA?), and is object-oriented and class-based, much like the UML model is. It is also more semantically expressive, allowing the modeling of relations as classes and support for reflexivity and transitivity (among other things).
TL;DR: Is LinkML indeed the suitable language I deem it to be for modeling a vocabulary (structured conceptual model) that originates in a UML model?

Chris:

I think this is a perfect summary, and this could go directly in a FAQ! We try and touch on this in the docs for the owl generator https://linkml.io/linkml/generators/owl.html but I think your summary captures the broader picture nicely.

I touched on this in my US2TS talk in 2023, see slide 45 onwards:
Scaling up semantics; lessons learned across the life sciences

But to directly answer your question, obviously we’d need to see the specific UML model to tell if there are likely issues, but even in the absence of that knowledge, yes, LinkML should be a good fit for a structured conceptual model in UML!

bartkl · 2024-07-03T18:16:34Z

Following up just to make sure: you grant it's a suited language, but do I interpret you correctly that you deem it a better choice [for representing a UML class model] than RDFS/OWL for the reasons I laid out?

Furthermore: I'd be happy to contribute to the docs. I could try to make a pull request?


bartkl Jul 4, 2024

@cmungall I actually do have another question/worry.

Suppose I have a class Person with an attribute name, and a class Company with an attribute name.

In UML, these name attributes cannot be assumed to be the same, and following the Closed World Assumption are therefore distinct. In LinkML, however, attributes are just syntactic sugar for inlined slots, so these name attributes do correspond to the same slot, right?
(As an aside: I find it pretty hard to wrap my head around what happens in this scenario. I'd have two separate definitions of the same slot, possibly with different descriptions and cardinalities. Would this amount to a single slot definition plus what are effectively slot_usage entries for each class?)

So the LinkML metamodel turns out to be harder to map to than I thought, and I'm not sure what the best way to go about this is.

The only option I see is to prefix attribute (or slot) names with the class they belong to, e.g. Person.name. This is really verbose and rather ugly, but it is correct and explicit. However, interestingly this does clash with the potential use of the . as a namespace separator in the namespacing discussion. Of course I could choose a different separator, but it's going to be even uglier then. Anyways, this option in its entirety is really not great.

I'm hoping I'm missing something. Any ideas?

cmungall Jul 5, 2024
Maintainer

Pull requests on docs most welcome!

You bring up an important point re "ownership" of attributes. There is some discussion of this going through the issue history (I don't expect anyone to read this). Previously, "mangled" attribute URIs were created for slot_usage and attributes. I think we could bring something like this back, allowing for a schema author to have the attribute be "owned" by the declaring class. I think using . notation in the URI fragment is a good choice here.
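To make the idea of class-"owned" attribute identifiers concrete, here is a minimal sketch of how a mangled CURIE using the `.` separator could be built. `owned_attribute_curie` and the `ex` prefix are hypothetical, invented for illustration; this is not an existing LinkML generator function, and the space-to-underscore replacement is an assumption.

```python
def owned_attribute_curie(prefix: str, class_name: str, attr_name: str, sep: str = ".") -> str:
    """Return a class-scoped CURIE such as 'ex:Person.name' (hypothetical helper)."""
    # LinkML names may contain spaces (e.g. "age in years"); replacing them with
    # underscores is an assumption made here to keep the local part CURIE-friendly.
    local_part = f"{class_name}{sep}{attr_name}".replace(" ", "_")
    return f"{prefix}:{local_part}"


# Two same-named attributes end up with distinct identifiers.
person_name = owned_attribute_curie("ex", "Person", "name")
company_name = owned_attribute_curie("ex", "Company", "name")
print(person_name, company_name)  # ex:Person.name ex:Company.name
```

Keeping `sep` as a parameter reflects the open question raised above about whether `.` would clash with namespacing.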

bartkl Jul 12, 2024

I guess what's unclear to me is this: suppose I were to remove all slot_uris (or not use any). How would the two attributes called name (one in Person, the other in Company) relate? They are slot definitions with the exact same native identifier, right? But they may have different constraints, descriptions, and so forth. So how is this parsed? Is this induced as a single slot with two slot usages?

bartkl · 2024-07-24T07:49:39Z

If someone could follow up on this I would still love more information.

To clarify: I want to understand what happens when I define two attributes with the same name in different classes. Since both implicitly define a global slot with the same name, how does this work? I understand this slot will have both classes as applicable classes (I'm not sure I understand exactly what that means either), but how will this single global slot be applied in those two classes with different implementations (different descriptions for the slot, different cardinalities, etc.)?

Thanks!


sierra-moxon · 2024-07-24T17:17:41Z

Maintainer

Hi @bartkl -

I think @cmungall will always have better answers to your questions, but I will also try to add some info here as I just spent some time in SchemaView's induced_slot and class_induced_slots methods. :) TL;DR, if you're looking for a test that exercises these questions, here's one in a branch (might be easiest to just run it): slot-usage-attribtues, test.

Some helpful doc links to aid in our discussion here:

LinkML schemas should be monotonic (i.e., new constraints can be specified, but existing ones cannot be overridden). For example, if I have a slot name defined as multivalued: true, a class should not specify multivalued: false in name's slot_usage. Or if a slot is defined as required: true, it should not also be defined as required: false in slot_usage. More info here: https://linkml.io/linkml/schemas/slots.html#slot-usage. Caveat: I think we have some work to do to make sure our validation framework encourages this kind of schema development.
LinkML makes a distinction between the asserted vs derived model. (see: https://linkml.io/linkml/schemas/derived-models.html#derived-models and https://linkml.io/linkml/faq/modeling.html#what-are-induced-slots)
LinkML has three ways to define a slot:
- in the class itself via attributes (attributes are not technically "reusable" between classes, but are inherited in child classes in the derived model).
- in the slots metamodel element as a stand-alone definition that can be applied to many classes
- (technically refined rather than defined) in the slot_usage on a particular class. slot_usage cannot be applied to a non-existent slot, but can be specified for slots defined in attributes as well as for slots referenced in the slots metamodel element of a class.
- caveat: I think we need to discuss whether elements described in a class using the attributes construct should be returned in the slots collection on a ClassDefinition returned by the SchemaView.get_class() method. Right now they are not.
LinkML is moving towards generators that employ SchemaView to walk schemas. https://linkml.io/linkml/developers/schemaview.html#schemaview and many generators rely on its get_class, get_slot, class_induced_slots, induced_slot methods to translate models into different serializations (e.g. Pydantic, JSONSchema, OWL generators). This has slightly different features than SchemaLoader (used by pythongen, shaclgen, others).
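The monotonicity rule in the first point above can be illustrated with a small checker. This is a minimal sketch, not part of LinkML's validation framework: `monotonicity_violations` is a hypothetical helper, slot definitions are plain dicts, and it only covers the two examples given (a `required` or `multivalued` flag set to true being overridden to false in slot_usage).

```python
# Boolean constraints for which the examples above forbid a true -> false override.
CHECKED_FLAGS = ("required", "multivalued")


def monotonicity_violations(slot_def: dict, slot_usage: dict) -> list:
    """List the slot_usage entries that relax a constraint the slot already sets."""
    violations = []
    for flag in CHECKED_FLAGS:
        if slot_def.get(flag) is True and slot_usage.get(flag) is False:
            violations.append(f"{flag}: true cannot be overridden to false")
    return violations


name_slot = {"range": "string", "multivalued": True, "required": True}
ok_usage = {"description": "A person's name"}  # adds information, relaxes nothing
bad_usage = {"multivalued": False}             # tries to undo multivalued: true

print(monotonicity_violations(name_slot, ok_usage))   # []
print(monotonicity_violations(name_slot, bad_usage))
```

A real check would need rules for every metaslot (ranges, patterns, min/max values), which is much harder than this two-flag sketch suggests.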

All that being said, here's a test that illustrates some of this. It builds on our runtime test schema, kitchen_sink.yaml, and there are a lot of existing tests that explore this functionality already (but we need a bit more documentation).

The two slots (aka attributes) in this branch that exercise your question are age in years and neighborhood, used in the Person, Martian, and Venetian classes. I won't try to recreate the entire schema here, but I will try to summarize; there are several cases to unpack:

if a slot is defined both in the metamodel element slots (schema level) and via attributes (class level), aka: the "Martian" example in the schema:
- SchemaView doesn't return the slot in the martian.slots collection.
- SchemaView does recognize the slot in the class.attributes collection, and uses the slot_usage of the attribute to populate the metadata on the class.attribute. (e.g. the slot is returned as required because this is defined in the slot_usage in the Martian class). I think this in particular, gets to the heart of your question about the difference between attributes and slots (but also, this feels a bit buggy to me - curious of others' thoughts here - I usually think of attributes and slots as different ways to say the same thing). The reverse is also true: attributes collection doesn't hold slots on a class, just the declared attributes themselves.

Schema snippet:

  Martian:
    is_a: Person
    attributes:
      age in years:
        required: true

Test output:

martian attributes:
slot.name: age in years
slot.range: None
slot.required: True
slot.multivalued: None

martian has no slots

If you use the SchemaView.class_induced_slots method, regardless of where the slot is defined, it will return the definition in the slot_usage for Martian. Note, the class_induced_slots method will also walk the Martian class hierarchy and return slots defined in parent classes.

martian induced slots
slot.name: age in years
slot.range: string
slot.required: True
slot.multivalued: None
slot.name: has employment history
slot.range: EmploymentEvent
slot.required: None
slot.multivalued: True
slot.name: has familial relationships
slot.range: FamilialRelationship
slot.required: None
slot.multivalued: True
...
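The hierarchy walk that class_induced_slots performs can be sketched with a toy class table. This is an illustration only: the `CLASSES` table is hypothetical and assumes, for simplicity, that Person declares the three slots visible in the (truncated) output above. The real SchemaView method also considers mixins and merges full slot definitions, which this sketch ignores.

```python
# Hypothetical, heavily simplified class table: each class names its parent (is_a)
# and the slots it declares directly, whether via slots or attributes.
CLASSES = {
    "Person": {
        "is_a": None,
        "slots": ["age in years", "has employment history", "has familial relationships"],
    },
    "Martian": {"is_a": "Person", "slots": ["age in years"]},
}


def ancestors(class_name: str) -> list:
    """Return the class itself followed by its is_a ancestors, nearest first."""
    chain = []
    current = class_name
    while current is not None:
        chain.append(current)
        current = CLASSES[current]["is_a"]
    return chain


def induced_slot_names(class_name: str) -> list:
    """Collect slot names declared on the class or any ancestor, without duplicates."""
    names = []
    for cls in ancestors(class_name):
        for slot in CLASSES[cls]["slots"]:
            if slot not in names:
                names.append(slot)
    return names


print(induced_slot_names("Martian"))
```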

Now consider a class like Venetian, which defines an attribute (neighborhood) that is not declared as a shared slot in the schema's slots element, and which also tries to refine age in years to be required, even though the slot definition in the slots element declares it as not required:

classes:
  Venetian:
    is_a: Person
    slots:
      - age in years
    slot_usage:
      age in years:
        required: true
slots:
  age in years:
    range: integer
    required: false

For Venetian, SchemaView.get_class().slots returns the schema-level definition of that slot (i.e. it still shows the slot as NOT required, despite the attempt to make it required in this class's slot_usage):

venetian slots

slot.name: age in years
slot.range: integer
slot.required: False
slot.multivalued: None

In the return of the SchemaView.class_induced_slots method for Venetian, note that it does return that age in years is required:

venetian class induced slots
slot.name: age in years
slot.range: integer
slot.required: True

Also noting that next month's community call will include a presentation on semantic web + LinkML and the complexity of slot_uris and class_uris; that might be a good place to bring up this question, along with this test, and explore it further.


dalito Jul 25, 2024
Collaborator

Great summary / explanation! Thanks @sierra-moxon

bartkl · 2024-07-26T10:00:22Z

Wow, thanks for the elaborate explanation @sierra-moxon!

First let me ask for some minor clarifications, and I will then present some thoughts of my own.

LinkML schemas should be monotonic (i.e., new constraints can be specified, but existing ones cannot be overridden)

Got it. And if your schema is not monotonic, currently this can cause unexpected results in induction and schema generation? Or is the behavior well-defined, but possibly not desired or intuitive to people?

Furthermore, your final example (with overriding the requiredness value in age in years) showcases non-monotonicity, right?

SchemaView does recognize the slot in the class.attributes collection, and uses the slot_usage of the attribute to populate the metadata on the class.attribute

So, if a slot exists already, and I create a class attribute with that slot's name, this is equivalent to using slot_usage of that slot in that class?

caveat: I think we need to discuss if elements described in a class using the attributes construct should be returned in the slots collection on a ClassDefintion returned by SchemaView.get_class() method. Right now they are not.

I noticed this. I'm not sure what's best, but this touches on the complexity I wish to comment on down below.

LinkML is moving towards generators that employ SchemaView to walk schemas.

That sounds great, as it will improve uniformity of approach and therefore the consistency/predictability of the generators, correct? It also encourages working against the derived model, and makes that easier. I think the derived model is the desired basis for generating schemas (with perhaps obscure exceptions). Let me know if I got that right :).

So far my questions. Now I'd like to share a few thoughts.

Honestly I worry a bit about the complexity involved here. The selling point of LinkML is its accessibility, but some of its semantics are honestly quite complex and hard to predict. Let me elaborate.

First of all, many who are drawn to LinkML are developers and IT people who don't have a background in formal knowledge representation. The concept of a slot as a first-class citizen is alien to them, and they expect attributes to be owned by classes. Not only is this not the case, but the name "attributes" actually suggests that it is. I myself am very familiar with RDF and OWL, and at first I also assumed attributes are bound to classes, whereas slots are not, i.e. I thought of attributes as locally scoped slots. If this were supported, that is expressivity I would appreciate. It would also make certain generators easier/safer to implement (many schema languages have locally scoped attributes/slots). On top of all this, having to stay vigilant about potential name clashes of attributes across different classes is quite tedious and uncomfortable.

So it's unfamiliar and not explicit, but it's also quite complex. An attribute definition implies a new global slot definition, unless one is already present, in which case it acts as a slot_usage refinement. You can specify slots and attributes both, and then also have slot_usages. It can get really wild.

How I would move forward.

I understand metamodel changes are impactful and hard, so I want to try and suggest as cautiously and constructively as I can.

Some ideas and actual suggestions.

Simplify attributes/slots/slot usage

Simplify! Deprecate the attributes block and rework the slots block in classes to be able to cover defining slots inline as well as refining them (also deprecating slot_usage).

For example:

Before

classes:
  Person:
    slots:
      - person_id
    slot_usage:
      person_id:
        multivalued: true
    attributes:
      name:
        range: string

slots:
  person_id:
    range: string

After

This is way easier to write (less redundant and verbose), avoids the confusion with regards to what attributes are, and somewhat less complex also. It's clear everything is a slot explicitly.

classes:
  Person:
    slots:
      name:
        range: string
      person_id:
        multivalued: true

slots:
  person_id:
    range: string

Slot definitions only at the top-level

This would not only remove attributes, but also disallow the definition of slots outside of the global slots block. This makes it perfectly explicit that slots are top-level (they can only be defined there) and that using them is done through slot_usage. No complexity, no confusion. Note that here too a rework of the semantics of slots and slot_usage like above can be considered.
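Under this proposal, a schema could be checked with a simple rule: no class-level attributes, and every slot a class uses or refines must be declared at the top level. The sketch below is a hypothetical linter over a schema loaded as plain dicts; `top_level_only_errors` is an invented name, not a LinkML validator.

```python
def top_level_only_errors(schema: dict) -> list:
    """Report violations of the proposed 'slots only at the top level' rule."""
    errors = []
    global_slots = schema.get("slots", {})
    for cls_name, cls in schema.get("classes", {}).items():
        for attr in cls.get("attributes", {}):
            errors.append(f"{cls_name}: attribute {attr!r} must be declared in the top-level slots block")
        # Both plain slot references and slot_usage keys must point at global slots.
        for used in list(cls.get("slots", [])) + list(cls.get("slot_usage", {})):
            if used not in global_slots:
                errors.append(f"{cls_name}: slot {used!r} is not declared at the top level")
    return errors


# The "Before" schema from the first proposal.
before_schema = {
    "classes": {
        "Person": {
            "slots": ["person_id"],
            "slot_usage": {"person_id": {"multivalued": True}},
            "attributes": {"name": {"range": "string"}},
        }
    },
    "slots": {"person_id": {"range": "string"}},
}
print(top_level_only_errors(before_schema))
```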

Attributes as locally scoped slots or different things from slots

Connecting with the intuition of many users: re-introduce attributes as locally scoped slots. As discussed, this would improve familiarity and accessibility, as well as enhance the expressivity of the language. This would make Person.name and Company.name in my question distinct properties, as expected in a class-based OOP metamodel.

This does probably have lots of impact, and this is by far the most complicated option of these suggested here.

Again, these are quite radical and I'm sure there are many implications worth considering, but I'm just thinking out loud. Definitely not at the stage where I would recommend anything. I really do want to help see LinkML getting adopted more :) and I hope this gets a constructive dialogue started.


sneakers-the-rat · 2024-08-10T10:09:36Z

Collaborator

These are great thoughts and i share some of the confusion sometimes, appreciate the thread all y'all.

An attribute with the same name as a slot defined in the same schema (or schema-cluster, as imports are unfortunately always flat at the moment) is different, and thus effectively locally defined, at least programmatically. I think they should also be different in identity, though i am not sure about that, since the RDF/JSON-LD generators are mostly the holders of the notion of 'identity' in the linked data sense, right? and i don't have close knowledge of those yet.

there is sort of subtle behavior in induced slot that answers some of these questions i think.

an induced slot is created in one of two ways (that are relevant here)

induced_slot(slot_name: str, class_name: None): ...
induced_slot(slot_name: str, class_name: str): ...

those names are not absolute URIs, but the local name within the schema.

induced_slot works by propagating values down through an inheritance hierarchy of attributes and slots, from the oldest ancestor to the most recent. it does not formally enforce most of the monotonicity expectations (that happens elsewhere, if it happens at all), except for maximum_value and minimum_value.

first we check if slot_name is in the attributes of any of the ancestor classes. recall slots and attributes are stored separately on the object, the ancestors won't pollute that distinction (if they do, it's a bug), so we can be sure we're only getting the attributes in this chunk. the logic is a little muddied by L1322, but it works out

if we are getting an attribute or slot in the context of a class (cls is not None), and the attribute is defined, we get that attribute object.
if we are getting an attribute or slot in the context of a class and the attribute is not defined, but the slot is defined, we get the slot object.
if we are getting an attribute or slot in the context of a class and neither the attribute nor the slot is defined, we get a ValueError.

https://github.com/linkml/linkml-runtime/blob/d501d424f33d86dd48690544e33fea2044028df8/linkml_runtime/utils/schemaview.py#L1318-L1337

then, down below, if we did not find an attribute, and have a slot, then we inherit all the slot values that are inheritable:

https://github.com/linkml/linkml-runtime/blob/d501d424f33d86dd48690544e33fea2044028df8/linkml_runtime/utils/schemaview.py#L1338-L1347

the rest of the method propagates slot_usage.

there is a missing piece of logic here: preserving the source of a slot when it is modified by slot_usage, without the modified slot taking on the source slot's identity. URI is not one of the inheritable slots, but maybe a slot that is used by a class should have an is_a: {{slot_name}} set for it to indicate that it comes from that slot but is not exactly that slot.

to summarize - programmatically an attribute on a class is effectively private to that class, even if it shares a name with a slot.
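The resolution order described above (attribute on the class or an ancestor first, then the shared slot, then a ValueError, then slot_usage propagated from the oldest ancestor down) can be modeled in a few lines. This is a toy sketch over plain dicts, not the linkml-runtime implementation: `SCHEMA`, `lineage`, and this `induced_slot` function are hypothetical, and the sketch ignores mixins, slot-level is_a inheritance, imports, and default ranges, among other things.

```python
from typing import Optional

SCHEMA = {
    "slots": {"name": {"range": "string", "description": "A generic name"}},
    "classes": {
        # Person declares its own attribute called "name".
        "Person": {"attributes": {"name": {"range": "string", "description": "A person's name"}}},
        # Company reuses the shared slot and tightens it via slot_usage.
        "Company": {"slots": ["name"], "slot_usage": {"name": {"required": True}}},
        "Employer": {"is_a": "Company"},
    },
}


def lineage(schema: dict, class_name: str) -> list:
    """Return the class and its is_a ancestors, oldest ancestor first."""
    chain = []
    current: Optional[str] = class_name
    while current is not None:
        chain.append(current)
        current = schema["classes"][current].get("is_a")
    return list(reversed(chain))


def induced_slot(schema: dict, slot_name: str, class_name: Optional[str] = None) -> dict:
    if class_name is None:
        # Schema-level call: only the shared slot definition is consulted.
        if slot_name not in schema["slots"]:
            raise ValueError(f"No slot named {slot_name!r}")
        return dict(schema["slots"][slot_name])

    chain = lineage(schema, class_name)
    # Step 1: an attribute on the class or any ancestor (nearest wins) takes priority.
    attribute = None
    for cls in reversed(chain):
        attribute = schema["classes"][cls].get("attributes", {}).get(slot_name)
        if attribute is not None:
            break
    if attribute is not None:
        induced = dict(attribute)
    elif slot_name in schema["slots"]:
        # Step 2: otherwise fall back to the shared, schema-level slot.
        induced = dict(schema["slots"][slot_name])
    else:
        raise ValueError(f"No attribute or slot named {slot_name!r} for class {class_name!r}")

    # Step 3: propagate slot_usage from the oldest ancestor down to the class itself.
    for cls in chain:
        usage = schema["classes"][cls].get("slot_usage", {}).get(slot_name, {})
        induced.update({k: v for k, v in usage.items() if v is not None})
    return induced
```

In this toy, Person's name never sees the shared slot's description, which mirrors the point that an attribute is effectively private to its class.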

as far as identity in a linked data context goes, i think there should always be both identifiers, and they aren't in conflict to me:

An attribute only defined on a class should have an identifier that indicates it as such (whether by . or /, i am somewhat ambivalent. i have stronger opinions on the concrete syntax than the materialized identity there): prefix:A.b. there should be no identity for that slot outside the schema: ie. prefix:b should not exist.
An attribute defined on an ancestor class should have a different identifier for that attribute. if A : is_a : C, then prefix:A.b and prefix:C.b should both exist independently. ideally they would also indicate that relationship - prefix:A.b : is_a : prefix:C.b but that doesn't happen rn afaik
An attribute defined on a class that also exists as a slot with the same name should have two separate, unrelated identities. prefix:A.b and prefix:b both exist.
A class with a slot, with or without slot_usage, should also have both identifiers. prefix:A.b and prefix:b both exist, and prefix:A.b should have some indicator of its relationship to prefix:b, though it may have different values/behavior than prefix:b.
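The four identity rules above can be expressed as a function from a schema to a map of proposed CURIEs and their is_a links. This is a hypothetical sketch of the proposal, not current generator behavior: `proposed_identifiers` and `IDENTITY_SCHEMA` are invented for illustration, and the schema is reduced to plain dicts.

```python
def proposed_identifiers(prefix: str, schema: dict) -> dict:
    """Map each proposed CURIE to the CURIE it would be is_a-related to, or None."""
    classes = schema["classes"]
    # Shared, schema-level slots keep their plain identifier (prefix:b).
    ids = {f"{prefix}:{slot}": None for slot in schema.get("slots", {})}
    for cls_name in classes:
        # Walk from the class itself up its is_a chain; the nearest declaration wins.
        visible = {}
        current = cls_name
        while current is not None:
            cls = classes[current]
            for b in cls.get("attributes", {}):
                visible.setdefault(b, ("attribute", current))
            for b in cls.get("slots", []):
                visible.setdefault(b, ("slot", current))
            current = cls.get("is_a")
        for b, (kind, owner) in visible.items():
            curie = f"{prefix}:{cls_name}.{b}"
            if kind == "slot":
                ids[curie] = f"{prefix}:{b}"          # class use of a shared slot
            elif owner != cls_name:
                ids[curie] = f"{prefix}:{owner}.{b}"  # attribute inherited from an ancestor
            else:
                ids[curie] = None                     # attribute owned by this class
    return ids


IDENTITY_SCHEMA = {
    "slots": {"b": {}},
    "classes": {
        "C": {"attributes": {"x": {}}},               # attribute only on a class
        "A": {"is_a": "C", "attributes": {"b": {}}},  # attribute sharing a slot's name
        "D": {"slots": ["b"]},                        # class using the shared slot
    },
}
ids = proposed_identifiers("prefix", IDENTITY_SCHEMA)
for curie, parent in ids.items():
    print(curie, "is_a", parent)
```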

@sierra-moxon did a great job of explaining the current behavior w/ inheritance and declaration already, so i'll skip that part. I wanted to add that i personally consider the possibility of getting an un-induced class through schemaview a bug - the SchemaDefinition and its ClassDefinitions et al. are the literal representation of the written yaml, and schemaview should always guarantee the derived rather than asserted model form. I think that would ease a decent amount of the complexity in thinking about things, esp. for newbies :) - two things only, clearly delineated: concrete model-as-yaml, materialized model-as-schemaview.

to answer questions, then:

So, if a slot exists already, and I create a class attribute with that slot's name, this is equivalent to using slot_usage of that slot in that class?

no, they are different :)

I myself am very familiar with RDF and OWL, and at first I also assumed attributes are bound to classes, whereas slots are not, i.e. I thought of attributes as locally scoped slots. If this were supported, that is expressivity I would appreciate.

attributes are bound to classes, at least programmatically. though i am not sure about identity either in the RDF/etc. generators or in the metamodel. attributes are locally scoped slots programmatically.

So it's unfamiliar and not explicit, but it's also quite complex. An attribute definition implies a new global slot definition, unless one is already present, in which case it acts as a slot_usage refinement. You can specify slots and attributes both, and then also have slot_usages. It can get really wild.

i think these can coexist peacefully as described above - all identifiers exist, and their relationships should be annotated in the materialized model.

SchemaView does recognize the slot in the class.attributes collection, and uses the slot_usage of the attribute to populate the metadata on the class.attribute. (e.g. the slot is returned as required because this is defined in the slot_usage in the Martian class). I think this in particular, gets to the heart of your question about the difference between attributes and slots (but also, this feels a bit buggy to me - curious of others' thoughts here

this does seem like a bug to me too - the same differentiation between slots and attributes from above should carry through.

e.g., if we have a lineage like A : is_a : B, B : is_a : C and C : is_a : D, if D has a slot z, and C has slot_usage for z, then either it should be invalid for B to define an attribute z, or defining an attribute should break the propagation of slot_usage.
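The second option (an attribute breaking slot_usage propagation) can be contrasted with simple layering in a toy model of that exact lineage. Everything here is hypothetical: the `LINEAGE_CLASSES` table, the concrete values for z, and `induce_z` are invented for illustration, and neither branch is claimed to be LinkML's current behavior.

```python
GLOBAL_Z = {"range": "string"}          # D's shared slot z (illustrative values)
LINEAGE_CLASSES = {
    "D": {"slots": ["z"]},
    "C": {"slot_usage": {"z": {"required": True}}},
    "B": {"attributes": {"z": {"range": "integer"}}},
    "A": {},
}
LINEAGE = ["D", "C", "B", "A"]           # oldest ancestor first: A is_a B is_a C is_a D


def induce_z(break_on_attribute: bool) -> dict:
    """Walk the lineage oldest-first, applying attributes and slot_usage for z."""
    induced = dict(GLOBAL_Z)
    for cls in LINEAGE:
        attr = LINEAGE_CLASSES[cls].get("attributes", {}).get("z")
        if attr is not None and break_on_attribute:
            induced = dict(attr)     # proposed: a new attribute starts a fresh definition
        elif attr is not None:
            induced.update(attr)     # alternative: layer the attribute on top
        induced.update(LINEAGE_CLASSES[cls].get("slot_usage", {}).get("z", {}))
    return induced


print(induce_z(break_on_attribute=True))   # C's required: true does not reach A
print(induce_z(break_on_attribute=False))  # C's required: true leaks into B's attribute
```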
