Code for the KCAP 2019 paper "On the Impact of sameAs on Schema Matching"

raadjoe/impact-sameAs-schema-matching

On the impact of sameAs on schema matching

This repository contains all the Python scripts and data necessary to replicate the experiments of our paper "On the impact of sameAs on schema matching", authored by Joe Raad, Erman Acar, and Stefan Schlobach.

With these experiments, we aim to answer the following two research questions:

Q1. Does the inclusion of instance-level interlinks enhance instance-based schema alignments (with and without considering the transitive closure of the class subsumption relation)?

Q2. Is there a correlation between the quality of the instance-level interlinks and the quality of the resulting schema alignments?
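To make Q1 concrete, here is a minimal sketch of how owl:sameAs links can change an instance-based comparison of two classes. It is an illustration under stated assumptions, not the alignment method used in the paper: the Jaccard overlap measure, the function names `same_as_closure` and `jaccard`, and the `ex:` identifiers are all hypothetical choices made for this example.

```python
def same_as_closure(links):
    """Map every IRI to a canonical representative of its sameAs equivalence class.

    `links` is an iterable of (iri_a, iri_b) pairs. The representative is the
    lexicographically smallest IRI of each class (an arbitrary but stable choice).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root under the smaller one
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return {iri: find(iri) for iri in list(parent)}


def jaccard(instances_a, instances_b, canonical=None):
    """Jaccard overlap of two instance sets, optionally after sameAs canonicalization."""
    canonical = canonical or {}
    a = {canonical.get(i, i) for i in instances_a}
    b = {canonical.get(i, i) for i in instances_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical instance sets of two classes from different data sets
cities_a = {"ex:a/Paris", "ex:a/Berlin"}
cities_b = {"ex:b/Paris", "ex:b/Rome"}
canonical = same_as_closure([("ex:a/Paris", "ex:b/Paris")])

print(jaccard(cities_a, cities_b))             # overlap without interlinks
print(jaccard(cities_a, cities_b, canonical))  # overlap with interlinks
```

Without the interlink the two classes share no identifiers, so the overlap is zero. Once `ex:a/Paris` and `ex:b/Paris` are mapped to the same representative, the classes share one of three distinct instances.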

A number of external resources are necessary for replicating these experiments:

  1. Download the LOD-a-lot dataset.

This data set contains 28.3 billion triples collected from the 2015 LOD Laundromat crawl of over 650K data documents from the Web. It is exposed as an HDT file that is 524GB in size (including its additional index), and it is publicly accessible via an LDF interface.

  2. Download the Equivalence Classes.

This data set of equivalence classes results from the closure of all 558 million owl:sameAs links in the sameAs.cc data set. It also contains two additional sets of equivalence classes, obtained (a) after discarding all owl:sameAs links with an error degree >0.99, and (b) after discarding all owl:sameAs links with an error degree >0.4.

  3. Install the HDT Python library.

This library makes it easy to read and query HDT documents in Python.
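As a rough illustration of querying an HDT document, the sketch below collects the instances of a class. It assumes the library in question is pyHDT (the `hdt` package), whose `HDTDocument.search_triples` takes subject, predicate, and object patterns, treats empty strings as wildcards, and returns an iterator together with a cardinality estimate. The helper name `instances_of` and the file path in the comments are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def instances_of(document, class_iri):
    """Return the set of subjects that have rdf:type `class_iri`.

    `document` is expected to behave like pyHDT's HDTDocument, i.e. to offer
    search_triples(subject, predicate, object) -> (iterator, cardinality).
    """
    # Empty string = wildcard, so this matches every subject typed with class_iri
    triples, _cardinality = document.search_triples("", RDF_TYPE, class_iri)
    return {subject for subject, _predicate, _obj in triples}


# Usage against a real file (assumes the hdt package is installed;
# the path below is a placeholder for wherever the HDT file was saved):
#
#   from hdt import HDTDocument
#   document = HDTDocument("path/to/LOD_a_lot.hdt")
#   persons = instances_of(document, "http://xmlns.com/foaf/0.1/Person")
#   print(len(persons))
```

Passing the document in as an argument keeps the helper independent of where the HDT file lives, which matters for a file of this size.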
