
Eli-Jensen/movie-poster-model


Backend for Movie Poster Similarity App

https://movie-poster-app.vercel.app/

Supplies backend data for a movie poster similarity app.

Most of the functionality is in mp_embeddings.ipynb.

Uses three different models (CLIP, ResNet-50, VGG16) from Hugging Face. Embeddings of movie posters, taken from TMDB's popular-movies list, are stored in the vector database Pinecone. A vector database makes similarity searches over the embeddings straightforward.
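To illustrate what the vector database does, here is a minimal in-memory stand-in for a similarity index: it maps poster IDs to embedding vectors and returns the top-k IDs closest to a query by cosine similarity. The IDs and vectors below are made-up toy data, not real model embeddings.

```python
import numpy as np

# Toy stand-in for a vector index: poster IDs mapped to embedding vectors.
# Real embeddings would come from CLIP/ResNet-50/VGG16; these are invented.
index = {
    "poster_a": np.array([1.0, 0.0, 0.0]),
    "poster_b": np.array([0.9, 0.1, 0.0]),
    "poster_c": np.array([0.0, 1.0, 0.0]),
}

def cosine_top_k(query, index, k=2):
    """Return the k poster IDs whose vectors are most cosine-similar to query."""
    scores = {
        pid: float(np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec)))
        for pid, vec in index.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(cosine_top_k(np.array([1.0, 0.05, 0.0]), index))  # → ['poster_a', 'poster_b']
```

A real index like Pinecone performs the same ranking, but with approximate nearest-neighbor search so it scales to millions of vectors.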

Embeddings in the CLIP index are ranked by cosine similarity; VGG16 and ResNet-50 embeddings are ranked by Euclidean distance. These metrics were chosen to match the ones the models' authors used during training.
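The choice of metric matters because the two can disagree: cosine similarity compares only direction, while Euclidean distance is sensitive to magnitude. A small sketch with made-up 2-D vectors shows each metric picking a different nearest neighbor:

```python
import numpy as np

query = np.array([1.0, 0.0])
cands = {
    "x": np.array([10.0, 0.0]),  # same direction as query, but far away
    "y": np.array([1.0, 1.0]),   # close in space, but different direction
}

def cosine(a, b):
    """Cosine similarity: 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean(a, b):
    """Euclidean distance: 0.0 means identical vectors."""
    return float(np.linalg.norm(a - b))

best_cos = max(cands, key=lambda k: cosine(query, cands[k]))   # → "x"
best_euc = min(cands, key=lambda k: euclidean(query, cands[k]))  # → "y"
```

Cosine ranks "x" first (same direction regardless of scale), while Euclidean ranks "y" first (closer in absolute terms), which is why each index uses the metric its model was trained with.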

Demo

Demo of website

Description of each model, generated by ChatGPT (GPT-4o)

CLIP

CLIP is a model developed by OpenAI that can understand images and text together. It learns by associating images with their corresponding text descriptions, making it excellent for tasks that involve both visual and textual information. CLIP is strong in recognizing high-level concepts and objects in images based on how they relate to language, which makes it versatile for various types of image classification and retrieval tasks.

ResNet-50

ResNet-50 is a deep neural network designed to efficiently classify images by recognizing patterns such as edges, shapes, and textures. It uses a technique called "residual learning," which allows it to train deeper networks without losing accuracy. ResNet-50 is effective in identifying detailed features within images, making it well-suited for distinguishing between different objects and scenes.

VGG16

VGG16 is a deep learning model that excels at recognizing high-level image features like shapes and colors. It uses a straightforward approach with a series of convolutional layers to progressively capture more complex patterns in images. VGG16 is known for its simplicity and effectiveness in image classification, particularly when you need to focus on the overall structure and appearance of objects rather than fine details or text-based understanding.