[Bug]: HUNFLAIR2_TUTORIAL_4_CUSTOMIZE_LINKING #3686

@pierrelarmande

Description

Describe the bug

Hello,

When I try to reproduce the code from Tutorial 4, I get the following message, and linker.predict() returns no results:

EntityMentionLinker predicts: Dictionary None (entity type: disease)

To Reproduce

import json
from collections import defaultdict

import flair
from flair.data import Sentence
from flair.models import EntityMentionLinker
from flair.datasets.entity_linking import (
    InMemoryEntityLinkingDictionary,
    EntityCandidate,
)

# Load the HPO ontology (OBO Graphs JSON) and map each name to its concept IDs
with open("hp.json") as fp:
    data = json.load(fp)

nodes = [n for n in data['graphs'][0]['nodes'] if n.get('type') == 'CLASS']
hpo = defaultdict(list)
for node in nodes:
    concept_id = node['id'].replace('http://purl.obolibrary.org/obo/', '')
    names = [node['lbl']] + [s['val'] for s in node.get('synonym', [])]
    for name in names:
        hpo[name].append(concept_id)

database_name = "HPO"

# One candidate per name: the first concept ID is the primary one,
# any further IDs sharing that name are stored as additional_ids
candidates = [
    EntityCandidate(
        concept_id=ids[0],
        concept_name=name,
        additional_ids=ids[1:],
        database_name=database_name,
    )
    for name, ids in hpo.items()
]

dictionary = InMemoryEntityLinkingDictionary(
    candidates=candidates, dataset_name=database_name
)

pretrained_model = "cambridgeltl/SapBERT-from-PubMedBERT-fulltext"
linker = EntityMentionLinker.build(
    pretrained_model,
    dictionary=dictionary,
    hybrid_search=False,
    entity_type="disease",
)


sentence = Sentence(
    "The mutation in the ABCD1 gene causes X-linked adrenoleukodystrophy, "
    "a neurodegenerative disease, which is exacerbated by exposure to high "
    "levels of mercury in mouse populations."
)
linker.predict(sentence)
print(sentence)
for entity in sentence.get_spans('disease'):
    print(entity)
    for link in entity.get_labels("el"):
        print(link)
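
The HPO parsing and candidate-building steps above can be exercised in isolation, without hp.json or a model download. The sketch below is a plain-Python approximation, not flair code: CandidateSketch is a hypothetical stand-in for EntityCandidate with the same field names, and the miniature input is made up. It also assumes the OBO Graphs JSON layout, in which synonyms sit under each node's meta.synonyms list rather than under a top-level synonym key; if that holds for hp.json, node.get('synonym', []) in the script returns an empty list and only primary labels reach the dictionary.

```python
from collections import defaultdict
from typing import NamedTuple

OBO_PREFIX = "http://purl.obolibrary.org/obo/"


class CandidateSketch(NamedTuple):
    """Hypothetical stand-in for flair's EntityCandidate (same field names)."""
    concept_id: str
    concept_name: str
    additional_ids: list
    database_name: str


def build_name_index(data: dict) -> dict:
    """Map each class label and synonym to every concept ID that uses it."""
    index = defaultdict(list)
    for node in data["graphs"][0]["nodes"]:
        if node.get("type") != "CLASS" or "lbl" not in node:
            continue  # skip non-class nodes and classes without a label
        concept_id = node["id"].replace(OBO_PREFIX, "")
        # Assumption: OBO Graphs JSON keeps synonyms under meta.synonyms
        synonyms = node.get("meta", {}).get("synonyms", [])
        for name in [node["lbl"]] + [s["val"] for s in synonyms]:
            if concept_id not in index[name]:  # avoid duplicate IDs per name
                index[name].append(concept_id)
    return dict(index)


def to_candidates(index: dict, database_name: str = "HPO") -> list:
    """First ID becomes concept_id; any further IDs become additional_ids."""
    return [
        CandidateSketch(ids[0], name, ids[1:], database_name)
        for name, ids in index.items()
    ]


# Hypothetical miniature input in the OBO Graphs shape (not real HPO content)
SAMPLE = {
    "graphs": [{
        "nodes": [
            {"id": OBO_PREFIX + "HP_A", "lbl": "Term A", "type": "CLASS",
             "meta": {"synonyms": [{"val": "Shared name"}]}},
            {"id": OBO_PREFIX + "HP_B", "lbl": "Term B", "type": "CLASS",
             "meta": {"synonyms": [{"val": "Shared name"}]}},
            {"id": OBO_PREFIX + "X_1", "lbl": "some property", "type": "PROPERTY"},
        ]
    }]
}

if __name__ == "__main__":
    for candidate in to_candidates(build_name_index(SAMPLE)):
        print(candidate)
```

A name shared by several concepts keeps the first ID as concept_id and the rest as additional_ids, matching the list comprehension in the reproduction script; unlike the original loop, the sketch also skips repeated IDs for the same name.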

Expected behavior

I expected the following two mentions to be tagged as disease entities and linked to HPO concepts:

X-linked adrenoleukodystrophy
neurodegenerative disease

Logs and Stack traces

Embedding `HPO`: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 155/155 [00:59<00:00,  2.60it/s]
2025-10-03 08:30:28,000 EntityMentionLinker predicts: Dictionary `None` (entity type: disease)
Sentence[28]: "The mutation in the ABCD1 gene causes X-linked adrenoleukodystrophy, a neurodegenerative disease, which is exacerbated by exposure to high levels of mercury in mouse populations."

Screenshots

No response

Additional Context

No response

Environment

accelerate==1.10.1
attrs==25.3.0
beautifulsoup4==4.14.2
bioc==2.1
blis==0.7.11
boto3==1.40.41
botocore==1.40.41
catalogue==2.0.10
certifi==2025.8.3
charset-normalizer==3.4.3
click==8.3.0
confection==0.1.5
conllu==4.5.3
contourpy==1.3.2
cycler==0.12.1
cymem==2.0.11
deprecated==1.2.18
docopt==0.6.2
en-core-sci-sm @ https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_sm-0.5.1.tar.gz
filelock==3.19.1
fixed-install-nmslib==2.1.2
flair==0.15.1
fonttools==4.60.1
fsspec==2025.9.0
ftfy==6.3.1
gdown==5.2.0
hf-xet==1.1.10
huggingface-hub==0.35.3
idna==3.10
intervaltree==3.1.0
jinja2==3.1.6
jmespath==1.0.1
joblib==1.5.2
jsonlines==4.0.0
kiwisolver==1.4.9
langcodes==3.5.0
langdetect==1.0.9
language-data==1.3.0
lxml==6.0.2
marisa-trie==1.3.1
markupsafe==3.0.3
matplotlib==3.10.6
more-itertools==10.8.0
mpld3==0.5.11
mpmath==1.3.0
murmurhash==1.0.13
networkx==3.4.2
numpy==2.2.6
packaging==25.0
pathlib-abc==0.1.1
pathy==0.11.0
pillow==11.3.0
pptree==3.1
preshed==3.0.10
protobuf==6.32.1
psutil==7.1.0
pyab3p==0.1.1
pybind11==3.0.1
pydantic==1.10.24
pyparsing==3.2.5
pysocks==1.7.1
python-dateutil==2.9.0.post0
pytorch-revgrad==0.2.0
pyyaml==6.0.3
regex==2025.9.18
requests==2.32.5
s3transfer==0.14.0
safetensors==0.6.2
scikit-learn==1.7.2
scipy==1.15.3
scispacy==0.5.1
segtok==1.5.11
sentencepiece==0.2.1
setuptools==80.9.0
six==1.17.0
smart-open==6.4.0
sortedcontainers==2.4.0
soupsieve==2.8
spacy==3.4.4
spacy-legacy==3.0.12
spacy-loggers==1.0.5
sqlitedict==2.1.0
srsly==2.5.1
sympy==1.14.0
tabulate==0.9.0
thinc==8.1.12
threadpoolctl==3.6.0
tokenizers==0.22.1
torch==2.8.0
tqdm==4.67.1
transformer-smaller-training-vocab==0.4.2
transformers==4.56.2
typer==0.7.0
typing-extensions==4.15.0
urllib3==2.5.0
wasabi==0.10.1
wcwidth==0.2.14
wikipedia-api==0.8.1
wrapt==1.17.3

Metadata

Assignees

No one assigned

Labels

bug: Something isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests