Data Sources

Roland Schläfli edited this page May 21, 2020 · 1 revision

Datasets and APIs provide the backbone for the functionality of our application. Kwiz depends on two main sources of information:

  • The Movies Dataset
  • The Open Movie Database (OMDb)

Each of the data sources used in Kwiz is managed by a separate microservice. More specifically, the metadata-service returns metadata for a given movie whereas the poster-service returns a movie poster for a given IMDb ID.

More information on the structure of our microservices can be found in the Architecture section.

The Movies Dataset

as published on Kaggle

Movies Dataset

This dataset consists of several .csv files containing movie metadata as well as 26 million ratings from 270,000 users for 45,000 movies. Ratings are on a scale of 1 to 5 and were obtained from the official GroupLens website.
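As a rough illustration of how such a ratings file could be consumed, the sketch below averages the ratings per movie. The column names (userId, movieId, rating, timestamp) are an assumption about the layout of the ratings file, and the sample rows are made up for the example; neither is taken from the metadata-service itself.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the layout assumed for the ratings file
# (userId, movieId, rating, timestamp). Not real dataset values.
SAMPLE_RATINGS = """userId,movieId,rating,timestamp
1,31,2.5,1260759144
2,31,4.0,835355493
2,10,3.0,835355681
"""


def mean_rating_per_movie(csv_file):
    """Return a dict mapping movieId to the mean of its ratings."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        movie_id = int(row["movieId"])
        totals[movie_id] += float(row["rating"])
        counts[movie_id] += 1
    return {movie_id: totals[movie_id] / counts[movie_id] for movie_id in totals}


means = mean_rating_per_movie(io.StringIO(SAMPLE_RATINGS))
print(means)
```

For the full file, the same function could be passed an open file handle instead of the in-memory sample. With 26 million rows, streaming row by row as done here keeps memory use proportional to the number of movies rather than the number of ratings.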

The Open Movie Database (OMDb)

as exposed through an API

OMDb API

This API currently provides over 280,000 movie posters. It is a RESTful web service that can be accessed once an API key has been generated.
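A minimal sketch of how a poster request for a given IMDb ID might be assembled is shown below. The endpoint `http://img.omdbapi.com/` and the query parameters `apikey` and `i` reflect our understanding of the public OMDb poster API and should be checked against its current documentation. The function name `build_poster_url` and the key `demo-key` are hypothetical, and this is not necessarily how the poster-service builds its requests.

```python
from urllib.parse import urlencode

# Assumed OMDb poster endpoint; verify against the current OMDb documentation.
OMDB_POSTER_ENDPOINT = "http://img.omdbapi.com/"


def build_poster_url(imdb_id, api_key):
    """Build the URL for a poster lookup by IMDb ID (hypothetical helper)."""
    if not imdb_id.startswith("tt"):
        raise ValueError(f"IMDb IDs are expected to start with 'tt': {imdb_id!r}")
    # 'apikey' carries the generated API key, 'i' the IMDb ID of the movie.
    query = urlencode({"apikey": api_key, "i": imdb_id})
    return f"{OMDB_POSTER_ENDPOINT}?{query}"


# 'demo-key' is a placeholder, not a working API key.
poster_url = build_poster_url("tt0111161", "demo-key")
print(poster_url)
```

The resulting URL can then be fetched with any HTTP client. Keeping URL construction separate from the network call makes the helper easy to test without a valid key.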

Discarded Dataset: movielens-20m-posters-for-machine-learning

as published on Kaggle

Movies Dataset

Initially, the plan was to use this dataset to self-distribute movie posters. However, its poster links turned out to be of very low fidelity, and it does not provide a mapping from IMDb IDs to posters. We therefore dropped this dataset and opted for the OMDb API instead.