Provisional Setup

These are the containers I'm running:

  • React (Frontend) Container
  • Spring Boot (Backend) Container
  • MySQL (preloaded Database) Container
  • Elasticsearch (search engine) Container
  • MinIO (preloaded file storage) Container
  • Traefik (reverse proxy) Container

For CI/CD I use GitHub Actions workflows.

Set Up MySQL Database

I created an entity-relationship diagram to simplify schema creation. I installed mysql-server on my machine, processed the dataset and imported it into the database.

Process Movies / Rating Datasets

For this I used pandas, a Python data-analysis library that handles large tabular datasets well. All steps can be reproduced in a Jupyter notebook.

  • download title.basics.tsv.gz and title.ratings.tsv.gz from IMDb
  • process the datasets using Python, pandas and NumPy:
    • replace empty values with '\N' (the marker MySQL reads as NULL on import)
    • remove rows with incorrect values, so that each column has a consistent datatype
    • merge the rating, movie and image/description dataframes
    • set tconst as the index
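
The steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook itself: the sample rows, the choice of numeric columns and the helper names `load_tsv`, `clean_basics` and `build_movies` are assumptions, and the image/description dataframe is left out because its columns are not described here (it would be merged on tconst in the same way).

```python
import io

import pandas as pd

# Tiny made-up samples in the IMDb TSV layout; the real files are far larger.
BASICS_TSV = (
    "tconst\ttitleType\tprimaryTitle\tstartYear\truntimeMinutes\n"
    "tt0000001\tmovie\tFirst Film\t1994\t142\n"
    "tt0000002\tmovie\tSecond Film\t\\N\t95\n"
    "tt0000003\tmovie\tBroken Row\tDrama\t\\N\n"
)
RATINGS_TSV = (
    "tconst\taverageRating\tnumVotes\n"
    "tt0000001\t9.3\t2500000\n"
    "tt0000002\t7.1\t1200\n"
)


def load_tsv(source):
    # Read every column as text; only '\N' and empty cells count as missing,
    # so a title such as "NA" is not silently turned into a missing value.
    return pd.read_csv(
        source, sep="\t", dtype=str, na_values=["\\N", ""], keep_default_na=False
    )


def clean_basics(basics, numeric_cols=("startYear", "runtimeMinutes")):
    # Assumption: these two columns are expected to hold integers.
    converted = {c: pd.to_numeric(basics[c], errors="coerce") for c in numeric_cols}
    # A cell that was present but could not be parsed marks an incorrect row.
    invalid = pd.Series(False, index=basics.index)
    for c in numeric_cols:
        invalid |= basics[c].notna() & converted[c].isna()
    cleaned = basics.loc[~invalid].copy()
    for c in numeric_cols:
        cleaned[c] = converted[c].loc[~invalid].astype("Int64")
    return cleaned


def build_movies(basics, ratings):
    ratings = ratings.copy()
    ratings["averageRating"] = pd.to_numeric(ratings["averageRating"], errors="coerce")
    ratings["numVotes"] = pd.to_numeric(ratings["numVotes"], errors="coerce").astype("Int64")
    # Left join keeps movies that have no rating yet.
    merged = clean_basics(basics).merge(ratings, on="tconst", how="left")
    return merged.set_index("tconst")


movies = build_movies(load_tsv(io.StringIO(BASICS_TSV)), load_tsv(io.StringIO(RATINGS_TSV)))
# Missing values are written back as '\N' in the exported file.
tsv_out = movies.to_csv(sep="\t", na_rep="\\N")
print(tsv_out)
```

With the real downloads, `load_tsv` can be pointed at the `.tsv.gz` files directly, since `pd.read_csv` infers gzip compression from the file extension.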

Instead of rerunning the Jupyter notebook, you can also download the processed dataset.

Create Database Tables and Import Data

  • execute the CREATE TABLE statements and the LOAD DATA INFILE import using the init.sql file
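
To illustrate what the import step might look like, here is a small Python helper that renders a LOAD DATA INFILE statement for a tab-separated file with a header row. The actual contents of init.sql are not shown here, so the helper name, the table name `movies` and the file path are hypothetical.

```python
def load_data_statement(tsv_path, table):
    # Tab-separated fields and one header line to skip. With MySQL's default
    # FIELDS ESCAPED BY '\\', a \N field in the file is read as NULL.
    return (
        f"LOAD DATA INFILE '{tsv_path}'\n"
        f"INTO TABLE {table}\n"
        "FIELDS TERMINATED BY '\\t'\n"
        "LINES TERMINATED BY '\\n'\n"
        "IGNORE 1 LINES;"
    )


# Hypothetical path and table name, for illustration only.
statement = load_data_statement("/var/lib/mysql-files/movies.tsv", "movies")
print(statement)
```

The file has to be in a location the MySQL server is allowed to read (see the `secure_file_priv` setting), and its column order has to match the table definition unless a column list is added to the statement.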