Add phenopacket extraction template #484

Merged on Dec 13, 2024 (67 commits)

Conversation

caufieldjh
Member

No description provided.

@caufieldjh caufieldjh linked an issue Nov 21, 2024 that may be closed by this pull request
@caufieldjh
Member Author

caufieldjh commented Dec 3, 2024

Very initial results on a case report (from the text of PMID 10874631), without expanding the subcategories:

extracted_object:
  id: 0bc06850-bb1b-4ee4-914a-74df417ef5c0
  label: Mutations in the retinal specific ATP binding transporter gene (ABCR) related
    to retinal diseases
  subject: First cousin II.1; First cousin II.2
  phenotypic_features:
    - Bilateral loss of central vision
    - Gradual loss of visual acuity
    - Choriocapillaris atrophy
    - Night blindness
    - Retina pigmentary deposits
    - Macular dystrophy
    - Visual field showed a central scotoma
    - Photopically abnormal ERG
    - Progressive night blindness
    - Concentric reduction of visual field
    - Total blindness with visual acuity reduced to light perception
  measurements:
    - Visual acuity
    - Fundus appearance
    - Fluorescein angiography
    - Goldman perimetry
    - Electroretinogram (ERG)
  biosample:
    - Blood samples from the patients and family
  interpretations:
    - RP19 and STGD are allelic disorders at the ABCR locus despite clinical heterogeneity
  diseases:
    - Stargardt disease
    - Retinitis pigmentosa (RP19)
  medical_actions:
    - NA
  files:
    - NA
  meta_data: NA

@caufieldjh
Member Author

This is now readily usable.
Example with the following vignette from phenopacket2prompt:

[source]
pmid = PMID:16962354
title = Functional analysis of mutations in TGIF associated with holoprosencephaly
[diagnosis]
disease_id = OMIM:142946
disease_label = Holoprosencephaly 4
[text]
A male proband presenting with lobar holoprosencephaly, atypical ventricles with small frontal horns,
hypothalamic and caudate fusion, diabetes insipidus, seizures,
premaxillary agenesis, microcephaly, absent nasal root and septum with a
depressed nasal tip (Fig. 1A and B).

$ ontogpt -vvv extract -t phenopackets -i ~/phenopacket2prompt/docs/cases/PMID_16962354.txt -m gpt-4o
extracted_object:
  phenopackets:
    - id: Proband_1
      subject:
        sex: MALE
        taxonomy: NCBITaxon:9606
      phenotypic_features:
        - description: Lobar holoprosencephaly
          type: HP:0006870
        - description: atypical ventricles with small frontal horns
          type: AUTO:ventricles
        - description: Fusion observed between the hypothalamus and the caudate region.
          type: AUTO:Hypothalamic%20and%20caudate%20fusion.
        - description: Excessive thirst and excretion of large amounts of severely
            diluted urine, with a reduction of fluid intake having no effect on the
            concentration of the urine.
          type: HP:0000873
        - description: seizures
          type: HP:0001250
        - type: AUTO:premaxillary%20agenesis
        - type: HP:0000252
        - description: absent nasal root and septum with a depressed nasal tip
          type: HP:0003196
      diseases:
        - term: MONDO:0007734
      meta_data: Associated study includes analysis of mutations in TGIF linked to
        Holoprosencephaly 4 (PMID:16962354)
named_entities:
  - id: HP:0006870
    label: Lobar holoprosencephaly
    original_spans:
      - 224:246
  - id: AUTO:ventricles
    label: ventricles
    original_spans:
      - 258:267
  - id: AUTO:Hypothalamic%20and%20caudate%20fusion.
    label: Hypothalamic and caudate fusion.
  - id: HP:0000873
    label: Diabetes insipidus
    original_spans:
      - 328:345
  - id: HP:0001250
    label: seizures
    original_spans:
      - 348:355
  - id: AUTO:premaxillary%20agenesis
    label: premaxillary agenesis
    original_spans:
      - 358:378
  - id: HP:0000252
    label: microcephaly
    original_spans:
      - 381:392
  - id: HP:0003196
    label: nasal hypoplasia
  - id: MONDO:0007734
    label: Holoprosencephaly 4
    original_spans:
      - 166:184
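
A note on reading the `original_spans` values in this output: they appear to be zero-based character offsets into the whole input file (header lines included), with an inclusive end. That is inferred only from this one example, not from the OntoGPT documentation, so the sketch below is an assumption rather than a description of the tool. `span_text` is a hypothetical helper, and the vignette is retyped here with the line breaks shown above.

```python
# Vignette retyped from the example above; line breaks are assumed to match the input file.
VIGNETTE = "\n".join([
    "[source]",
    "pmid = PMID:16962354",
    "title = Functional analysis of mutations in TGIF associated with holoprosencephaly",
    "[diagnosis]",
    "disease_id = OMIM:142946",
    "disease_label = Holoprosencephaly 4",
    "[text]",
    "A male proband presenting with lobar holoprosencephaly, atypical ventricles with small frontal horns,",
    "hypothalamic and caudate fusion, diabetes insipidus, seizures,",
    "premaxillary agenesis, microcephaly, absent nasal root and septum with a",
    "depressed nasal tip (Fig. 1A and B).",
])


def span_text(text: str, span: str) -> str:
    """Return the substring for a 'start:end' span (hypothetical helper).

    Assumes zero-based offsets with an inclusive end, as this example suggests.
    """
    start, end = (int(part) for part in span.split(":"))
    return text[start:end + 1]


if __name__ == "__main__":
    # Spans copied from the named_entities list above.
    for span in ["224:246", "258:267", "328:345", "348:355", "358:378", "381:392", "166:184"]:
        print(span, "->", span_text(VIGNETTE, span))
```

If the spans printed here stop lining up with the labels on another input, the offset convention assumed above is the first thing to recheck.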

@caufieldjh caufieldjh marked this pull request as ready for review December 13, 2024 22:29
@caufieldjh
Member Author

Will still need some tuning and fixes to improve extraction and grounding, of course.
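
One rough way to track grounding while tuning is to count how many identifiers in the output fall back to the `AUTO:` prefix. The sketch below assumes `AUTO:` marks text that was not matched to an ontology term and that the local part is percent-encoded, as it looks in the example output. `split_by_grounding` is a hypothetical helper, not part of OntoGPT, and the identifiers are copied by hand from the `named_entities` list.

```python
from urllib.parse import unquote

# Identifiers copied from the named_entities list in the example output.
ENTITY_IDS = [
    "HP:0006870",
    "AUTO:ventricles",
    "AUTO:Hypothalamic%20and%20caudate%20fusion.",
    "HP:0000873",
    "HP:0001250",
    "AUTO:premaxillary%20agenesis",
    "HP:0000252",
    "HP:0003196",
    "MONDO:0007734",
]


def split_by_grounding(ids):
    """Partition CURIEs into (grounded, ungrounded) lists (hypothetical helper).

    Assumption: an AUTO: prefix marks text that was not matched to an ontology term.
    Ungrounded entries are returned as decoded text rather than as CURIEs.
    """
    grounded, ungrounded = [], []
    for curie in ids:
        prefix, _, local = curie.partition(":")
        if prefix == "AUTO":
            ungrounded.append(unquote(local))
        else:
            grounded.append(curie)
    return grounded, ungrounded


if __name__ == "__main__":
    grounded, ungrounded = split_by_grounding(ENTITY_IDS)
    print(f"grounded: {len(grounded)} of {len(ENTITY_IDS)}")
    for text in ungrounded:
        print("ungrounded:", text)
```

This only counts identifiers. It says nothing about whether a grounded term is the right one for the text it was assigned to.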

@caufieldjh caufieldjh merged commit 7199885 into main Dec 13, 2024
4 checks passed
@caufieldjh caufieldjh deleted the 482-add-phenopacket-extraction-template branch December 13, 2024 22:30
Development

Successfully merging this pull request may close these issues.

Add Phenopacket extraction template