epn-vespa/FacilityList

FacilityList: Astronomy Observation Facilities Matcher

Observation facility lists from various origins and in various formats.

Supported lists:

| List | Format |
|------|--------|
| AAS | HTML |
| IAU-MPC | HTML |
| IMCCE/Quaero | JSON |
| NAIF | HTML |
| NASA/PDS | XML |
| NSSDC | HTML |
| SPASE | JSON |
| WikiData | RDF |

Types of facilities: Spacecraft, Observatories, Telescopes, Investigations, Airborne platforms.

update.py

Download data for facility lists and save them in a unified output ontology. The script performs entity typing with an LLM and tries to retrieve geographical information for every entity. A type_confidence and a location_confidence value are added to every entity, depending on how that information was retrieved. The first run might take some time, but all data are cached for subsequent runs.

Remark

All data are publicly available, but the availability or structure of the URLs might change over the years. We will publish the resulting ontology on OntoPortal-Astro or another ontology-sharing tool. This ontology is the output of this script and serves as the basis for map_ontologies.py.

Usage

python update.py [options]

| Option | Description |
|--------|-------------|
| -l, --lists | Name(s) of the lists to extract data from. Default is all. Available options: all or specific list names from ExtractorLists.EXTRACTORS_BY_NAMES. Multiple lists can be provided. |
| -i, --input-ontology | Optional input ontology file (.ttl). Data from this ontology will be merged with newly extracted data. Useful for running the script in multiple steps. |
| -o, --output-ontology | Output ontology file name. Default is output.ttl. |
| -c, --no-cache | If set, disables caching and forces re-download and version comparison. |

Example

python update.py -l aas pds -i wikidata.ttl -o all_entities.ttl

This updates the AAS and PDS data, merges them with the ontology in wikidata.ttl, and saves the result to all_entities.ttl.
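To make the option semantics concrete, here is a minimal sketch of how such a command line could be parsed with argparse. It is not the actual parser of update.py: the function name build_parser and the help strings are assumptions, and only the flags and defaults listed in the table are taken from this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented update.py options."""
    parser = argparse.ArgumentParser(prog="update.py")
    # One or more list names; "all" is the documented default.
    parser.add_argument("-l", "--lists", nargs="+", default=["all"],
                        help="lists to extract data from")
    # Optional ontology to merge with the newly extracted data.
    parser.add_argument("-i", "--input-ontology", default=None,
                        help="input ontology file (.ttl)")
    parser.add_argument("-o", "--output-ontology", default="output.ttl",
                        help="output ontology file name")
    # Flag without a value: present means caching is disabled.
    parser.add_argument("-c", "--no-cache", action="store_true",
                        help="disable caching and force re-download")
    return parser


# Parse the example command line shown above.
args = build_parser().parse_args(
    ["-l", "aas", "pds", "-i", "wikidata.ttl", "-o", "all_entities.ttl"]
)
print(args.lists, args.input_ontology, args.output_ontology, args.no_cache)
```

With nargs="+", the two list names after -l are collected into a single Python list, which matches the "multiple lists can be provided" behaviour described in the table.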

map_ontologies.py

Entity matching tool. It first performs external ID linking, then follows a mapping strategy configuration file (default: conf/mapping_strategy.conf) to generate a full mapping, compute discriminant criteria, compute other scores on the remaining candidate pairs, and combine the scores of each pair in a weighted sum that yields one global score per candidate pair.

LLM validation then uses an LLM to accept or reject the candidate pairs with the highest global score. The mapped data (with skos:exactMatch for matched objects) are saved together with the synonym set objects, and the SSSOM ontology is saved next to them. The execution time depends on the scores used in the mapping strategy (sentence-cosine-similarity and llm-embedding take longer because entities must be encoded) and on the size of the validation LLM. The quality of the mapping mostly depends on the LLM used for validation, the instructions given in the prompt, and the representation of entities.
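The weighted-sum step described above can be sketched as follows. This is an illustration only, not the code of map_ontologies.py: the function global_score, the weights, the entity identifiers and the score values are all hypothetical, and the real weights come from the mapping strategy file.

```python
def global_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of the per-criterion scores of one candidate pair."""
    return sum(weights[name] * value for name, value in scores.items())


# Hypothetical weights for two of the scores named in this README.
weights = {"sentence-cosine-similarity": 0.5, "llm-embedding": 0.5}

# Hypothetical candidate pairs with made-up score values.
candidates = {
    ("list_a:entity_1", "list_b:entity_1"): {
        "sentence-cosine-similarity": 0.75,
        "llm-embedding": 0.25,
    },
    ("list_a:entity_1", "list_b:entity_2"): {
        "sentence-cosine-similarity": 0.5,
        "llm-embedding": 0.25,
    },
}

# Rank candidate pairs by decreasing global score; the best-ranked pairs
# are the ones that would be handed to the validation step.
ranked = sorted(
    ((pair, global_score(scores, weights)) for pair, scores in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
for pair, score in ranked:
    print(pair, score)
```

The sketch does not normalise the weights; whether the actual tool does so is not stated here.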

Usage

python map_ontologies.py -i input_ontology.ttl [options]

| Option | Description |
|--------|-------------|
| -i, --input-ontologies | (Required) One or more input ontologies (.ttl) to process. |
| -o, --output-dir | Output directory to save the final merged ontology and the SSSOM mapping ontology. Default is a timestamped folder. |
| -l, --limit | (Optional) Limit the number of entities per source to speed up testing. Only the top N entities from each list will be compared (NxN). |
| -s, --mapping-strategy | Path to the mapping strategy config file. Default is conf/mapping_strategy.conf. |
| -d, --direct-validation | Skip manual review. Candidate matches will be validated automatically based on scores. |
| --human-validation | Enable human-in-the-loop disambiguation after scoring. This disables LLM-based validation. |

Input ontologies may already contain validated pairs from a previous run. In that case, the script maps only the unmapped entities and ignores entities that are already paired with an entity from the target list.
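The idea of skipping already paired entities can be illustrated with plain Python data structures. This is a sketch under assumptions: the function unmapped_entities, the dictionary layout and the identifiers are hypothetical, and the real script works on the ontology rather than on dictionaries.

```python
def unmapped_entities(entities, validated_pairs, target_list):
    """Return the entities that still need a match in target_list.

    entities: dict mapping an entity id to the name of its source list.
    validated_pairs: iterable of (id_a, id_b) pairs already accepted.
    target_list: name of the list we are currently mapping against.
    """
    already_paired = set()
    for a, b in validated_pairs:
        # An entity is done if its partner belongs to the target list.
        if entities.get(b) == target_list:
            already_paired.add(a)
        if entities.get(a) == target_list:
            already_paired.add(b)
    return [
        entity
        for entity, source in entities.items()
        if source != target_list and entity not in already_paired
    ]


# Hypothetical identifiers: two entities per list, one validated pair.
entities = {"aas:1": "aas", "aas:2": "aas", "pds:1": "pds", "pds:2": "pds"}
validated = [("aas:1", "pds:1")]
remaining = unmapped_entities(entities, validated, "pds")
print(remaining)
```

Here only aas:2 remains to be compared against the pds list, because aas:1 already has a validated partner in it.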

evaluate_sssom.py

Evaluation tool. Evaluates a mapping (SSSOM ontology) against a gold TSV file of candidate pairs annotated as 'o' (same) or 'x' (distinct).

Annotation files can be found in the data/evaluation folder, while the SSSOM ontology is the output of map_ontologies.py.

Usage

ipython evaluate_sssom.py -t annotations.tsv -s SSSOM_ontology.ttl
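As an illustration of what such an evaluation can compute, the sketch below scores a set of predicted matches against 'o'/'x' annotations. Several things are assumptions: the three-column TSV layout (entity A, entity B, annotation), the choice of precision and recall as metrics, the helper names load_gold and precision_recall, and the fact that predicted pairs are given as a Python set instead of being read from the SSSOM ontology. The actual file layout and metrics of evaluate_sssom.py may differ.

```python
import csv
import io


def load_gold(tsv_text):
    """Parse gold annotations; assumed columns: entity A, entity B, 'o' or 'x'."""
    gold = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        a, b, label = row[0], row[1], row[2].strip()
        # frozenset makes the pair order-independent.
        gold[frozenset((a, b))] = (label == "o")
    return gold


def precision_recall(gold, predicted_pairs):
    """Precision and recall over the annotated pairs only."""
    predicted = {frozenset(pair) for pair in predicted_pairs}
    tp = sum(1 for pair, same in gold.items() if same and pair in predicted)
    fp = sum(1 for pair, same in gold.items() if not same and pair in predicted)
    fn = sum(1 for pair, same in gold.items() if same and pair not in predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical annotations and predictions.
gold_tsv = "a1\tb1\to\na2\tb2\to\na3\tb3\tx\n"
predicted = {("a1", "b1"), ("a3", "b3")}
precision, recall = precision_recall(load_gold(gold_tsv), predicted)
print(precision, recall)
```

Predicted pairs that have no annotation are ignored in this sketch, since the gold file cannot say whether they are correct.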

Acknowledgments

This activity is a joint effort of the EPN-VESPA, IVOA and IPDA projects.

This work has also been supported by: the Europlanet 2020 Research Infrastructure project, which received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 654208; the Europlanet 2024 Research Infrastructure project, which received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 871149; the FAIR-IMPACT project, which received funding from the European Commission's Horizon Europe Research and Innovation programme under grant agreement No 101057344; and the OPAL cascading grant from the OSCARS project, which received funding from the European Commission's Horizon Europe Research and Innovation programme under grant agreement No 101129751.
