WebIsALOD: Providing Hypernymy Relations extracted from the Web as Linked Open Data

This repository contains all the code used for the WebIsALOD paper.

Abstract

Hypernymy relations are an important asset in many applications, and a central ingredient of Semantic Web ontologies. The IsA database is a large collection of such hypernymy relations extracted from the Common Crawl. In this paper, we introduce WebIsALOD, a Linked Open Data version of the IsA database, containing 11.7M hypernymy relations, each provided with rich provenance information. As the original dataset contained more than 80% wrong, noisy extractions, we run a machine learning algorithm to assign confidence scores to the individual statements.

Structure of the files

All files whose names start with a number are used to generate the CSV files, the mappings, and the N-Quads output. The files starting with mTurk are HTML surveys (for Amazon Mechanical Turk) used to collect the ground truth. Files named "webisa_{threshold}_sample_results" contain the samples drawn at the corresponding thresholds, together with the majority vote and the answer of each worker. webisa_1_sentence_results.csv contains the results of the mapping to Wikipedia pages and categories.
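The sample files pair each statement with the individual worker answers and their majority vote. The following is a minimal sketch of both conventions, assuming the answers are the strings "yes", "uncertain" and "no" (matching the count columns of the CSV files) and that a tie between the top answers counts as "uncertain". The tie rule, the regular expression and the helper names are hypothetical illustrations, not code from this repository.

```python
import re
from collections import Counter
from typing import List, Optional

# Hypothetical pattern for the "webisa_{threshold}_sample_results" naming convention.
SAMPLE_FILE = re.compile(r"^webisa_(?P<threshold>[^_]+)_sample_results")


def threshold_from_filename(name: str) -> Optional[str]:
    """Return the threshold part of a sample-results file name, or None."""
    match = SAMPLE_FILE.match(name)
    return match.group("threshold") if match else None


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent worker answer.

    Assumption: a tie between the top answers is reported as "uncertain".
    """
    if not answers:
        raise ValueError("no worker answers given")
    ranked = Counter(answers).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "uncertain"
    return ranked[0][0]
```

For example, three workers answering "yes", "yes" and "no" yield "yes", while a 1:1 split between "yes" and "no" yields "uncertain" under the assumed tie rule.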

Most of the CSV files are structured as follows:

id
instance
class
frequency
pidspread
pldspread
ipremod
ilemma
ipostmod
cpremod
clemma
cpostmod
pids
plds
provids
majority voting
yes (counts)
uncertain (counts)
no (counts)
mapping instance to dbpedia page (json array)
mapping instance to dbpedia category (json array)
mapping class to dbpedia page (json array)
mapping class to dbpedia category (json array)
mapping instance to yago (string)
mapping class to yago (string)
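To illustrate the layout, the following sketch reads rows in the column order listed above and decodes the JSON-array mapping columns. It assumes a comma-delimited file without a header row and with standard CSV quoting; the snake_case column names, `parse_row` and `read_rows` are hypothetical, and the actual files may differ.

```python
import csv
import io
import json

# Column order as listed above; the snake_case names are hypothetical labels.
COLUMNS = [
    "id", "instance", "class", "frequency", "pidspread", "pldspread",
    "ipremod", "ilemma", "ipostmod", "cpremod", "clemma", "cpostmod",
    "pids", "plds", "provids", "majority_voting",
    "yes_count", "uncertain_count", "no_count",
    "instance_dbpedia_pages", "instance_dbpedia_categories",
    "class_dbpedia_pages", "class_dbpedia_categories",
    "instance_yago", "class_yago",
]
COUNT_COLUMNS = {"yes_count", "uncertain_count", "no_count"}
JSON_COLUMNS = {
    "instance_dbpedia_pages", "instance_dbpedia_categories",
    "class_dbpedia_pages", "class_dbpedia_categories",
}


def parse_row(fields):
    """Map one CSV record to a dict, decoding the count and JSON columns."""
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    for name in COUNT_COLUMNS:
        row[name] = int(row[name]) if row[name] else 0
    for name in JSON_COLUMNS:
        row[name] = json.loads(row[name]) if row[name] else []
    return row


def read_rows(text):
    """Parse CSV text (assumed: comma-delimited, no header) into row dicts."""
    reader = csv.reader(io.StringIO(text))
    return [parse_row(fields) for fields in reader if fields]
```

Keeping the column list in one place makes a record with the wrong number of fields fail loudly instead of silently shifting values into the wrong columns.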
