Merge pull request #9 from opentargets/return-of-the-uris
Convert mappings back into URIs, normalise and update the script
ireneisdoomed authored Sep 2, 2022
2 parents b7f9d73 + 3a464dd commit f6d3be9
Showing 3 changed files with 25,379 additions and 25,570 deletions.
18 changes: 18 additions & 0 deletions mappings/disease/README.md
@@ -17,8 +17,26 @@ When amending the file manually, make sure to follow the format:

For introducing the changes, the file could be imported into Google Sheets and exported back as TSV.

### Normalisation script

The maintenance script, `normalise.py`, reads the current manual mappings file (`manual_string.tsv`), normalises it (for example, by sorting records and removing duplicates), and writes the updated mappings to `efo/manual_string_NORM.tsv`. This file can then be inspected and moved to replace the original input file. Before running the script, install its dependencies: `pip install --upgrade pandas ontoma`.

Note that if several records are present for the same (PROPERTY_TYPE, SEMANTIC_TAG) pair, only one is kept during deduplication: the most recent one by ANNOTATION_DATE. Case normalisation is also applied during this process. For example, out of these three lines:

| STUDY | BIOENTITY | PROPERTY_TYPE | PROPERTY_VALUE | SEMANTIC_TAG | ANNOTATOR | ANNOTATION_DATE |
|----------|-----------|---------------|---------------------|--------------------------------------|-------------|-----------------|
| Genebass | | disease | atrial fibrillation | http://www.ebi.ac.uk/efo/EFO_0000275 | Annotator 1 | 2020-02-30 |
| Genebass | | disease | Atrial fibrillation | http://www.ebi.ac.uk/efo/EFO_0000275 | Annotator 2 | 2022-08-16 |
| ClinVar | | disease | atrial fibrillation | http://www.ebi.ac.uk/efo/EFO_0000275 | Annotator 3 | 2021-06-02 |

Only this one will be kept:

| STUDY | BIOENTITY | PROPERTY_TYPE | PROPERTY_VALUE | SEMANTIC_TAG | ANNOTATOR | ANNOTATION_DATE |
|----------|-----------|---------------|---------------------|--------------------------------------|-------------|-----------------|
| Genebass | | disease | Atrial fibrillation | http://www.ebi.ac.uk/efo/EFO_0000275 | Annotator 2 | 2022-08-16 |
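The deduplication rule described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code of `normalise.py` itself (which relies on pandas): the function name `deduplicate` and the inline example data are assumptions made for the sketch, and sorting and case normalisation are left out. Dates are compared as ISO 8601 strings, which order chronologically without parsing.

```python
import csv
import io

# Columns that identify a duplicate, as described in the README text.
KEY_COLUMNS = ("PROPERTY_TYPE", "SEMANTIC_TAG")

HEADER = ["STUDY", "BIOENTITY", "PROPERTY_TYPE", "PROPERTY_VALUE",
          "SEMANTIC_TAG", "ANNOTATOR", "ANNOTATION_DATE"]
EFO_URI = "http://www.ebi.ac.uk/efo/EFO_0000275"

# The three example rows from the table above, assembled as TSV text.
EXAMPLE_ROWS = [
    ["Genebass", "", "disease", "atrial fibrillation", EFO_URI, "Annotator 1", "2020-02-30"],
    ["Genebass", "", "disease", "Atrial fibrillation", EFO_URI, "Annotator 2", "2022-08-16"],
    ["ClinVar", "", "disease", "atrial fibrillation", EFO_URI, "Annotator 3", "2021-06-02"],
]
EXAMPLE_TSV = "\n".join("\t".join(row) for row in [HEADER] + EXAMPLE_ROWS)


def deduplicate(records):
    """Hypothetical helper: keep the most recent record for each key."""
    latest = {}
    for record in records:
        key = tuple(record[column] for column in KEY_COLUMNS)
        kept = latest.get(key)
        # ISO 8601 date strings compare chronologically as plain strings.
        if kept is None or record["ANNOTATION_DATE"] > kept["ANNOTATION_DATE"]:
            latest[key] = record
    return list(latest.values())


records = list(csv.DictReader(io.StringIO(EXAMPLE_TSV), delimiter="\t"))
result = deduplicate(records)
for record in result:
    print(record["STUDY"], record["PROPERTY_VALUE"], record["ANNOTATOR"],
          record["ANNOTATION_DATE"])
```

Run on the three example rows, the sketch keeps the single record annotated by Annotator 2 on 2022-08-16, matching the table above.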

It is assumed that any code that uses the `manual_string.tsv` file also normalises case before comparing values. ZOOMA and OnToma already do this.
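For code that consumes the file, a case-insensitive lookup could look like the following sketch. It is not how ZOOMA or OnToma implement the comparison; the helper names `build_lookup` and `find_mapping` are hypothetical, and the choice of `str.casefold()` as the normalisation is an assumption.

```python
def build_lookup(records):
    """Hypothetical helper: index mappings by case-folded PROPERTY_VALUE."""
    return {
        record["PROPERTY_VALUE"].casefold(): record["SEMANTIC_TAG"]
        for record in records
    }


def find_mapping(lookup, label):
    """Return the mapped URI for a label, ignoring case, or None if absent."""
    return lookup.get(label.casefold())


# A single record, shaped like a row of manual_string.tsv after normalisation.
mappings = [
    {
        "PROPERTY_VALUE": "Atrial fibrillation",
        "SEMANTIC_TAG": "http://www.ebi.ac.uk/efo/EFO_0000275",
    },
]
lookup = build_lookup(mappings)
print(find_mapping(lookup, "atrial FIBRILLATION"))
print(find_mapping(lookup, "asthma"))
```

Folding both the stored value and the query means the capitalisation kept in the file does not affect matching.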

## Ontology to ontology

The second file, `manual_xref.tsv`, is currently not used and only exists as a placeholder.