This project performs binary classification on the MediaEval 2015 Verifying Multimedia Use task dataset. It was carried out as coursework for Machine Learning Technologies at the University of Southampton.
Given a tweet and its accompanying multimedia item (an image or video) from an event likely to be of interest to international news, the task is to return a binary decision indicating whether the multimedia item reflects the reality of the event in the way the tweet purports. Several techniques were implemented. Multinomial Naive Bayes classification with Term Frequency–Inverse Document Frequency (TF-IDF) feature vectorisation achieved an F1 score of 0.9, and a linear classifier trained with Stochastic Gradient Descent (SGD) on the same type of TF-IDF features achieved a slightly higher F1 score of 0.903.
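The TF-IDF plus Multinomial Naive Bayes approach can be illustrated without external libraries. The following is a minimal sketch, not the project's code. `SimpleTfidf`, `SimpleMultinomialNB`, `binary_f1`, the tokeniser and the toy tweets are all hypothetical and invented for illustration. The IDF smoothing and L2 row normalisation follow the default form used by scikit-learn's `TfidfVectorizer`. The project's actual preprocessing and F1 averaging are not stated, so treating "fake" as the positive class is an assumption.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lower-cased word tokens; a deliberately simple, hypothetical tokeniser.
    return re.findall(r"[a-z0-9#@']+", text.lower())


class SimpleTfidf:
    """TF-IDF with smoothed IDF and L2-normalised rows (scikit-learn's default form)."""

    def fit(self, docs):
        n = len(docs)
        df = Counter()
        for doc in docs:
            df.update(set(tokenize(doc)))  # document frequency: each term counted once per doc
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        return self

    def transform(self, docs):
        rows = []
        for doc in docs:
            # Raw term counts, ignoring terms never seen during fit.
            counts = Counter(t for t in tokenize(doc) if t in self.idf)
            vec = {t: c * self.idf[t] for t, c in counts.items()}
            norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
            rows.append({t: v / norm for t, v in vec.items()})
        return rows


class SimpleMultinomialNB:
    """Multinomial Naive Bayes over fractional TF-IDF weights with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        vocab = {t for row in X for t in row}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [row for row, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(y))
            totals = defaultdict(float)  # summed TF-IDF weight per term within class c
            for row in rows:
                for t, w in row.items():
                    totals[t] += w
            denom = sum(totals.values()) + self.alpha * len(vocab)
            self.log_lik[c] = {t: math.log((totals[t] + self.alpha) / denom) for t in vocab}
        return self

    def predict(self, X):
        preds = []
        for row in X:
            # Log posterior up to a constant: log P(c) + sum_t w_t * log P(t | c)
            scores = {c: self.log_prior[c]
                      + sum(w * self.log_lik[c].get(t, 0.0) for t, w in row.items())
                      for c in self.classes}
            preds.append(max(scores, key=scores.get))
        return preds


def binary_f1(y_true, y_pred, positive):
    # F1 of a single positive class: harmonic mean of precision and recall.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy tweets, NOT taken from the MediaEval dataset.
train_texts = [
    "shark swims down flooded highway photoshopped",
    "edited fake image of shark in street",
    "fake storm clouds over statue edited",
    "photo of flooded subway station",
    "queues for fuel at station after storm",
    "volunteers clear debris in street after storm",
]
train_labels = ["fake", "fake", "fake", "real", "real", "real"]
test_texts = ["fake edited shark photo", "volunteers queue for fuel after storm"]
test_labels = ["fake", "real"]

vectoriser = SimpleTfidf().fit(train_texts)
model = SimpleMultinomialNB(alpha=1.0).fit(vectoriser.transform(train_texts), train_labels)
preds = model.predict(vectoriser.transform(test_texts))
print(preds, binary_f1(test_labels, preds, positive="fake"))
```

Multinomial Naive Bayes is usually defined over integer counts, but it accepts fractional TF-IDF weights in the same way. Swapping in the SGD-trained linear classifier would replace only the model, leaving the vectoriser unchanged.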