MLLM explodes on vocabularies with a large number of skos:Collection members #929

@mjsuhonos

I noticed this on Annif 1.4.1 and confirmed it on 1.3.1 as well.

I have a vocabulary with ~100K labels that are all part of a skos:Collection, i.e. each entity has a skos:member statement. When training MLLM on even a very small document set, e.g. 5 documents, RAM usage rapidly spikes to the point that my laptop becomes unresponsive. On a VPS this results in a regular OOM condition (hundreds of GB).

When I remove only the skos:Collection and skos:member statements, MLLM training runs as normal.
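
For anyone who wants to try the same workaround, here is a minimal sketch of the stripping step. It assumes the vocabulary has first been serialized as N-Triples (one statement per line); the function name `strip_collections` and the file names in the usage note are hypothetical, and none of this is part of Annif itself. It drops only the `skos:member` statements and the `rdf:type skos:Collection` statements and passes everything else through unchanged.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
SKOS_MEMBER = f"<{SKOS}member>"
SKOS_COLLECTION = f"<{SKOS}Collection>"


def strip_collections(lines):
    """Yield N-Triples lines, skipping skos:member statements and
    'rdf:type skos:Collection' statements (hypothetical helper)."""
    for line in lines:
        # An N-Triples statement is "<subject> <predicate> <object> ."
        parts = line.split(None, 2)
        if len(parts) == 3:
            _subject, predicate, rest = parts
            # Drop the trailing " ." terminator to isolate the object term.
            obj = rest.rstrip().rstrip(".").rstrip()
            if predicate == SKOS_MEMBER:
                continue
            if predicate == RDF_TYPE and obj == SKOS_COLLECTION:
                continue
        # Everything else (including comments and blank lines) is kept.
        yield line
```

Possible usage, with placeholder file names: open `vocab.nt` for reading and `vocab-nocollections.nt` for writing, then call `out.writelines(strip_collections(src))`. Because the filter streams line by line, it should not need to hold the whole vocabulary in memory.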
