This repository contains all the Python scripts and data necessary to replicate the experiments of our paper "On the impact of sameAs on schema matching", authored by Joe Raad, Erman Acar, and Stefan Schlobach.
Q1. Does the inclusion of instance-level interlinks enhance instance-based schema alignments? (With and without considering the transitive closure of the class subsumption relation.)
Q2. Is there a correlation between the quality of the instance-level interlinks and the quality of the resulting schema alignments?
- Download the LOD-a-lot dataset.
This data set contains 28.3 billion triples collected from the 2015 LOD Laundromat crawl of over 650K data documents from the Web. It is exposed in an HDT file that is 524GB in size (including its additional index), and is publicly accessible via an LDF interface.
- Download the Equivalence Classes.
This data set of equivalence classes results from the closure of all 558 million owl:sameAs links in the sameAs.cc data set. It also contains two additional sets of equivalence classes, obtained (a) after discarding all owl:sameAs links with an error degree >0.99, and (b) after discarding all owl:sameAs links with an error degree >0.4.
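
To make the relation between the owl:sameAs links and the equivalence classes concrete, here is a minimal sketch of how a closure with an optional error-degree cut-off could be computed using a union-find structure. It is only an illustration, not the procedure used to build the downloadable data set: the function name `equivalence_classes`, the `(term, term, error_degree)` tuple format, and the `ex:` terms are all assumptions made for this example.

```python
from collections import defaultdict


def equivalence_classes(links, max_error=None):
    """Group terms connected by sameAs links into equivalence classes.

    links: iterable of (term_a, term_b, error_degree) tuples (assumed format).
    max_error: if given, links with error_degree > max_error are discarded.
    Returns a list of sets, one per class with at least two members.
    """
    parent = {}

    def find(term):
        # Register unseen terms as their own representative.
        parent.setdefault(term, term)
        root = term
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every visited term directly at the root.
        while parent[term] != root:
            parent[term], term = root, parent[term]
        return root

    for a, b, error_degree in links:
        if max_error is not None and error_degree > max_error:
            continue  # link considered too unreliable, skip it
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two classes

    classes = defaultdict(set)
    for term in list(parent):
        classes[find(term)].add(term)
    return [members for members in classes.values() if len(members) > 1]


# Toy links with made-up error degrees (illustrative only).
links = [
    ("ex:a", "ex:b", 0.1),
    ("ex:b", "ex:c", 0.5),
    ("ex:d", "ex:e", 0.995),
]

all_classes = equivalence_classes(links)           # closure of all links
strict_099 = equivalence_classes(links, 0.99)      # variant (a)
strict_04 = equivalence_classes(links, 0.4)        # variant (b)
```

Lowering the threshold can only split classes, never merge them, which is why the two filtered variants are at least as fine-grained as the full closure.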
- Install the HDT Python library.
This library makes it easy to read and query HDT documents in Python.
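
As a rough illustration of what querying an HDT document looks like, the sketch below counts instances per class by scanning `rdf:type` triples. It assumes the library in question is pyHDT (imported as `hdt`), whose `HDTDocument.search_triples` returns an iterator of triples together with a cardinality estimate. The helper names `open_document` and `class_sizes` and the file path in the usage comment are hypothetical and do not come from this repository's scripts.

```python
from collections import Counter
from itertools import islice

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def open_document(path):
    """Open an HDT file (assumes the pyHDT package, imported as `hdt`)."""
    # Imported lazily so class_sizes can be tried without the library installed.
    from hdt import HDTDocument
    return HDTDocument(path)


def class_sizes(document, max_triples=None):
    """Count how many rdf:type triples point at each class.

    document: any object exposing search_triples(subject, predicate, object)
              that returns (iterator_of_triples, cardinality).
    max_triples: optional cap on the number of triples scanned.
    """
    # Empty strings act as wildcards in the triple pattern.
    triples, _cardinality = document.search_triples("", RDF_TYPE, "")
    sizes = Counter()
    for _subject, _predicate, rdf_class in islice(triples, max_triples):
        sizes[rdf_class] += 1
    return sizes


# Usage (the path is a placeholder for the downloaded LOD-a-lot file):
# document = open_document("path/to/LOD-a-lot.hdt")
# print(class_sizes(document, max_triples=1_000_000).most_common(10))
```

On a file of this size, capping the scan with `max_triples` is a practical way to try the query before running it over the full data set.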