Active Learning for Entity Filtering in Microblog Streams

Code and resources to reproduce the experiments described in the SIGIR'15 short paper by Damiano Spina, Maria-Hendrike Peetz and Maarten de Rijke.

Abstract

Monitoring the reputation of entities such as companies or brands in microblog streams (e.g., Twitter) starts by selecting mentions that are related to the entity of interest. Entities are often ambiguous (e.g., "Jaguar" or "Ford") and effective methods for selectively removing non-relevant mentions often use background knowledge obtained from domain experts. Manual annotations by experts, however, are costly. We therefore approach the problem of entity filtering with active learning, thereby reducing the annotation load for experts. To this end, we use a strong passive baseline and analyze different sampling methods for selecting samples for annotation. We find that margin sampling--an informative type of sampling that considers the distance to the hyperplane used for class separation--can effectively be used for entity filtering and can significantly reduce the cost of annotating initial training data.
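The sketch below illustrates the margin sampling idea from the abstract. It ranks unlabeled tweets by their distance to a linear classifier's separating hyperplane and picks the closest, least certain ones for expert annotation. It is not the code from this repository. It assumes each tweet is already a numeric feature vector and that the classifier is linear with weight vector `w` and bias `b`. The function names `distance_to_hyperplane` and `margin_sample` are hypothetical.

```python
import math
from typing import List, Sequence


def distance_to_hyperplane(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Unsigned geometric distance of feature vector x to the hyperplane w.x + b = 0.

    Assumes a linear classifier (e.g. a linear SVM) given by weights w and bias b.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(score) / norm


def margin_sample(pool: List[Sequence[float]], w: Sequence[float], b: float, k: int) -> List[int]:
    """Return the indices of the k unlabeled items closest to the decision boundary.

    These are the items the classifier is least certain about, so they are
    the ones handed to the expert for annotation (hypothetical helper).
    """
    ranked = sorted(range(len(pool)), key=lambda i: distance_to_hyperplane(pool[i], w, b))
    return ranked[:k]


if __name__ == "__main__":
    # Toy pool of four unlabeled items with two features each (illustrative data only).
    pool = [[2.0, 0.0], [0.6, 0.5], [0.0, 3.0], [1.0, 1.3]]
    print(margin_sample(pool, w=[1.0, -1.0], b=0.0, k=2))  # prints [1, 3]
```

Dividing by the norm of `w` does not change the ranking within one pool; it only makes the value a true geometric distance. In a full active learning loop the selected items would be labeled, added to the training set, and the classifier retrained before the next query.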

Citation

D. Spina, M.H. Peetz, M. de Rijke
Active Learning for Entity Filtering in Microblog Streams
Proceedings of the 38th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2015.

BibTeX

@InProceedings{spina2015active,
author={Spina, Damiano and Peetz, Maria-Hendrike and de Rijke, Maarten},
title={Active Learning for Entity Filtering in Microblog Streams},
booktitle={SIGIR '15: 38th International ACM SIGIR Conference on Research and Development in Information Retrieval},
year={2015},
organization={ACM}
}
