Datasets used in the paper

Structure

The data is structured as follows:

.
├── NICE
│   ├── NICE
│   ├── NICE_binary
│   └── source
├── R8
├── STOPS
│   ├── STOPS
│   ├── STOPS-2
│   └── source
│       ├── mave
│       └── yelp
├── TREC
├── corpus
├── data-web-snippets
├── mr
├── nltk_data
│   └── corpora
│       └── twitter_samples
└── sst2

Due to uncertainties regarding licensing, the data for Twitter, SearchSnippets, NICE and STOPS is not included in this repository.
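
Since some folders are absent from a fresh clone, a quick check can show which parts of the layout still need to be obtained. The following is a minimal sketch, not part of the repository: the path list is copied from the tree above, while the names `EXPECTED_PATHS` and `missing_paths` and the default data root `.` are assumptions made for illustration.

```python
from pathlib import Path

# Paths copied from the tree above (leaf entries only).
# Whether each leaf is a folder or a file is not stated, so we only test existence.
EXPECTED_PATHS = [
    "NICE/NICE",
    "NICE/NICE_binary",
    "NICE/source",
    "R8",
    "STOPS/STOPS",
    "STOPS/STOPS-2",
    "STOPS/source/mave",
    "STOPS/source/yelp",
    "TREC",
    "corpus",
    "data-web-snippets",
    "mr",
    "nltk_data/corpora/twitter_samples",
    "sst2",
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    # Assumes the script is run from the data root shown as "." in the tree.
    for p in missing_paths():
        print(f"missing: {p}")
```

Running it from the data root prints one line per absent path; no output means the full layout is in place.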

For instructions on how to obtain the data, see the README files in the respective folders.

Acknowledgements

The data for R8, MR and TREC was retrieved from here.