Downloads

| | Train10k | Train100k | Train1M | MIRFlickr | Flickr51 | NUS-WIDE |
|---|---|---|---|---|---|---|
| # images | 10,000 | 100,000 | 1,198,818 | 25,000 | 81,541 | 259,233 |
| # tags | - | - | - | 14 | 51 | 81 |
| vggnet16-fc7relu | link | link | link | link | link | link |
| social tags | link | link | link | link | link | link |
| tag frequency | link | link | link | - | - | - |
| ground truth | - | - | - | link | link | link |
| Flickr image urls | - | - | link | - | - | link |

Vocabulary

Since the tags a person may use are inherently open-ended, specifying a tag vocabulary is merely an engineering convenience. For a tag to be meaningfully modeled, a reasonable number of training images must be available for it. For methods that process each tag independently of the others, the size of the vocabulary has no impact on performance. In other cases, in particular for transductive methods that rely on the image-tag association matrix, the tag dimension has to be constrained to keep the methods computationally tractable. To construct a fixed-size vocabulary per training dataset, a three-step automatic cleaning procedure is performed. First, all tags are lemmatized to their base forms with the NLTK software. Second, tags not defined in WordNet are removed. Finally, to avoid insufficient sampling, tags whose occurrence falls below a threshold are removed. The thresholds were empirically set to 50, 250, and 750 for Train10k, Train100k, and Train1M, respectively, so that the vocabulary size grows linearly while the number of labeled images grows logarithmically. This results in the following three vocabularies:
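
A minimal sketch of this three-step cleaning procedure, assuming tags are provided as one list per training image and that a tag's occurrence is counted as the number of images carrying it (the function name, input format, and lowercasing step are assumptions, not part of the released code):

```python
from collections import Counter

from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

# Requires the WordNet data: nltk.download('wordnet')


def build_vocabulary(image_tags, min_occurrence=50):
    """Return a cleaned tag vocabulary.

    image_tags     : iterable of tag lists, one list per training image (assumed format).
    min_occurrence : occurrence threshold, e.g. 50 / 250 / 750 for Train10k / Train100k / Train1M.
    """
    lemmatizer = WordNetLemmatizer()

    # Step 1: lemmatize every tag to its base form.
    lemmatized = [
        [lemmatizer.lemmatize(tag.lower()) for tag in tags] for tags in image_tags
    ]

    # Step 2: keep only tags defined in WordNet, counting each tag once per image.
    counts = Counter(
        tag
        for tags in lemmatized
        for tag in set(tags)
        if wordnet.synsets(tag)
    )

    # Step 3: drop insufficiently sampled tags (occurrence below the threshold).
    return sorted(tag for tag, count in counts.items() if count >= min_occurrence)
```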