Engineered by Abduselam Shaltu. Built in fulfillment of the final project for CSE 455, taught by Joseph Redmon at the University of Washington.
Source code is available for the backend and for the frontend. A Jupyter Notebook is provided to quickly demo CRIER, and it is also available on Google Colab.
In this project, I develop a reverse image search engine that is easily customizable. I open-source a backend built in Python that supports indexing of custom image corpora, separation between corpora, easy search functionality, endpoints with a one-command server spin-up, token management so users retrieve images only from their own custom corpus, evaluation using mAP@k and mAR@k metrics, comparison against other retrieval implementations (a histogram-based image retrieval implementation is provided), and Jupyter notebooks to demo and experiment with. I also open-source a frontend interface built in React that easily connects to the aforementioned backend server and offers the following functionality: searching through a provided example image database, indexing a new image corpus, searching through indexed image corpora, and deleting indexed image corpora. I compare CRIER with a histogram-based image retrieval model across three different datasets. I also discuss the challenges and takeaways of the project. A video summary of CRIER can be found here.
For the average consumer, searching consists of typing keywords or, in some advanced systems, phrases to retrieve relevant documents, images, tables, and other kinds of data. When it comes to searching for images, the average consumer can search with keywords/phrases on their own phones and computers, and with keywords/phrases and images on large-scale systems like Google Search and Google Images. For this project, I set out to build an easily customizable system that retrieves relevant images from a custom database by searching with an image. For consumers who need to search a custom image collection, systems like Google Images are unhelpful since they only reverse image search through a web-indexed image corpus. Additionally, the barrier to entry can be quite costly and tedious for consumers and industry workers trying to test out the usefulness of custom reverse image search engines on cloud services like Google Cloud, Microsoft Azure, and Amazon Web Services. This project proposes CRIER, a Custom Reverse Image Extractions Ranked system. By building this system, the barrier for consumers and industry workers across different sectors to create a custom reverse image search engine drastically decreases.
For this project, I relied on feature vectors extracted from EfficientNetV2. EfficientNetV2, released by Google, performs exceptionally well on tasks that call for a CNN. It outperforms other SOTA models and trains 5-10x faster on image datasets like CIFAR and ImageNet. Below is a plot comparing its top-1 accuracy against other SOTA models on ImageNet.
I use ScaNN, also released by Google, as my Approximate Nearest Neighbor (ANN) search library. ScaNN achieves SOTA ANN performance through a new technique called "Anisotropic Vector Quantization". Below is a plot demonstrating its high QPS and accuracy in comparison to other popular ANN libraries.
The backend and frontend servers run on a Microsoft Azure VM of size Standard B2s.
The core of my approach is to use EfficientNetV2's feature extractor as an image encoder to create a fixed-size, 1280-dimensional embedding for each image. Images are resized to 384x384 with bilinear interpolation to fit the model, and pixel values are normalized to the range [0, 1]. Then, I "index" the images by passing all the embeddings into a ScaNN search model. To actually reverse image search, we follow the same steps: encode the query image into a 1280-dimensional embedding, then search with the ScaNN index.
There are many variants of EfficientNetV2; I use EfficientNetV2-S because it has a relatively small number of parameters (~20M) and high accuracy. I set up the ScaNN index with dot product as the similarity function.
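A minimal sketch of this indexing/search pipeline is shown below, assuming the TensorFlow Hub EfficientNetV2-S feature-vector module and the open-source ScaNN builder API; the Hub handle, folder paths, and ScaNN tuning values are assumptions for illustration, not CRIER's exact configuration.

```python
import glob
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import scann

# Pre-trained EfficientNetV2-S feature extractor (1280-d output per image).
encoder = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet1k_s/feature_vector/2",
    trainable=False)

def embed(image_paths):
    """Resize to 384x384 with bilinear interpolation, scale pixels to [0, 1], encode."""
    images = []
    for path in image_paths:
        img = tf.io.decode_image(tf.io.read_file(path), channels=3, expand_animations=False)
        img = tf.image.resize(img, (384, 384), method="bilinear") / 255.0
        images.append(img)
    return encoder(tf.stack(images)).numpy()  # shape: (N, 1280)

corpus_paths = sorted(glob.glob("corpus/*.jpg"))  # hypothetical image corpus to index
corpus_embeddings = embed(corpus_paths)

# "Index" the corpus: a ScaNN searcher with dot product as the similarity function.
# Tree/AH settings are illustrative; num_leaves is usually tuned to ~sqrt(corpus size).
searcher = (scann.scann_ops_pybind.builder(corpus_embeddings, 10, "dot_product")
            .tree(num_leaves=100, num_leaves_to_search=20,
                  training_sample_size=len(corpus_embeddings))
            .score_ah(2, anisotropic_quantization_threshold=0.2)
            .reorder(50)
            .build())

# Reverse image search: encode the query the same way, then query the index.
neighbors, distances = searcher.search(embed(["query.jpg"])[0])
print([corpus_paths[i] for i in neighbors])
```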
I created REST APIs to allow easy interaction with CRIER: CreateToken, RemoveImages, AddImages, SearchDatabase, and RetrieveImages. I used Flask to create and manage my backend server. To allow multiple users to create their own databases, I built a token manager that uses token authentication to ensure users retrieve images only from their own image database. Since this is a demo, a scheduler is also set up to erase uploaded image corpora and their ScaNN index models.
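Below is a minimal sketch of what token-scoped endpoints could look like in Flask. The endpoint names come from the list above, but the route paths, payload shapes, and the in-memory token store are illustrative assumptions, not CRIER's actual implementation.

```python
import secrets
from flask import Flask, request, jsonify, abort

app = Flask(__name__)
corpora = {}  # token -> {"images": [...], "index": ScaNN searcher}; hypothetical store

@app.route("/CreateToken", methods=["POST"])
def create_token():
    # Issue a fresh token that scopes all later requests to one user's corpus.
    token = secrets.token_hex(16)
    corpora[token] = {"images": [], "index": None}
    return jsonify({"token": token})

def corpus_for(req):
    # Token authentication: every other endpoint resolves the caller's corpus this way.
    token = req.headers.get("Authorization", "")
    if token not in corpora:
        abort(401)
    return corpora[token]

@app.route("/SearchDatabase", methods=["POST"])
def search_database():
    corpus = corpus_for(request)
    # Encode the uploaded query image and search only this user's ScaNN index
    # (embedding/search code omitted; see the indexing sketch above).
    return jsonify({"results": []})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```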
The frontend is built in React JS. There isn't anything too special about the frontend besides creating an actual interface for users to interact with CRIER. I use react-markdown to render this project info page from a markdown file.
I built a histogram-based image retrieval model with OpenCV to compare against CRIER. A histogram-based embedding consists of two parts: the image's histograms across all RGB channels, flattened, and the mean of each RGB channel. I combine both to increase the number of features in the histogram model's image embedding.
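Here is a minimal sketch of such a histogram-based embedding, assuming OpenCV's cv2.calcHist per channel plus per-channel means; the bin count and normalization are illustrative choices, not necessarily the exact ones used in the comparison model.

```python
import cv2
import numpy as np

def histogram_embedding(image_path, bins=32):
    img = cv2.imread(image_path)  # note: OpenCV loads channels in BGR order
    channel_hists = []
    for c in range(3):
        hist = cv2.calcHist([img], [c], None, [bins], [0, 256]).flatten()
        channel_hists.append(hist / (hist.sum() + 1e-9))  # normalized per-channel histogram
    channel_means = img.reshape(-1, 3).mean(axis=0) / 255.0  # mean of each channel
    return np.concatenate(channel_hists + [channel_means])   # 3 * bins + 3 features
```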
To calculate mAP@k and mAR@k metric values, I use the recmetrics and ml_metrics modules.
I didn't do any fine-tuning or transfer learning with the EfficientNetV2 model since I use the pre-trained feature extractor as an image encoder. In theory, developers wanting to fine-tune the EfficientNetV2 model on a certain domain of images (like a medical doctor fine-tuning on chest x-rays to diagnose lung disease) are definitely able to, although no fine-tuning methods are provided in this project.
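As a rough illustration (not part of CRIER), a developer could fine-tune the EfficientNetV2-S encoder on a domain-specific labeled dataset before using it as the image encoder; the Hub handle, dataset objects, and hyperparameters below are assumptions.

```python
import tensorflow as tf
import tensorflow_hub as hub

num_classes = 5  # e.g., number of categories in the custom domain (assumed)

encoder = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet1k_s/feature_vector/2",
    trainable=True)  # unfreeze the backbone so its weights update during training

inputs = tf.keras.Input(shape=(384, 384, 3))
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(encoder(inputs))
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),  # small LR for fine-tuning
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # train_ds/val_ds: tf.data pipelines
# Afterwards, discard the Dense head and reuse `encoder` outputs as embeddings for indexing.
```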
For evaluation, I use three different datasets:
- CIFAR-100-128: regular CIFAR-100, but resized to 128x128 with the CAI Neural API for increased pixel information.
- Imagenette, sized at 320x320, also for more pixel information.
- A custom dataset I put together myself, containing pictures of cats, sunflowers, trees, and houses.
Here is the provided example image corpus that users can search through.
Here is briefly what it looks like to upload an image database. Notice that I am using the example image corpus shown above as my database.
My first query will be with this white cat (notice how it is not an image from my database).
And results after querying for the white cat image.
Another query this time with a blue house (also not in my database).
And results from searching for the blue house.
In both search results, it is clear that the model is performing well in returning relevant search results.
I measured mAP@k and mAR@k across my custom dataset, CIFAR-100-128, and Imagenette for both CRIER and the histogram-based image retrieval model. mAP@k and mAR@k are the mean Average Precision and mean Average Recall for the top-k retrievals. These metrics are typically used to evaluate recommendation systems: mAP@k evaluates the relevancy of retrieved items (images in our case), whereas mAR@k evaluates how well the recommender (the CRIER or histogram model) recalls all the items the user has rated positively in the test set.
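Concretely, a standard formulation of these metrics (the usual recommender-system convention, not a CRIER-specific definition) is:

$$\mathrm{AP@}k = \frac{1}{\min(m, k)} \sum_{i=1}^{k} P(i)\,\mathrm{rel}(i), \qquad \mathrm{mAP@}k = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{AP@}k(q)$$

where P(i) is the precision at cutoff i, rel(i) is 1 if the i-th retrieved image is relevant and 0 otherwise, m is the number of relevant images for the query, and Q is the set of query images. mAR@k is defined analogously with recall in place of precision.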
To evaluate on each dataset, I first split it into an index portion and a test portion. The splits are percentage-based, with ~95% of the dataset going into the index corpus and the rest into the test portion. The number of results returned by the ScaNN model is set to 25 for the CIFAR-100-128 and Imagenette datasets, and to 10 for the custom dataset since it is so small.
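The evaluation loop then looks roughly like the sketch below, assuming a labeled dataset where a corpus image counts as relevant when it shares the query's class label. The ml_metrics.mapk and recmetrics.mark calls are the real library functions; the split bookkeeping and variable names are illustrative, not CRIER's exact code.

```python
import ml_metrics
import recmetrics

def evaluate(searcher, index_labels, test_embeddings, test_labels, k=25):
    actual, predicted = [], []
    for emb, label in zip(test_embeddings, test_labels):
        neighbors, _ = searcher.search(emb)  # top results from the ScaNN (or histogram) model
        predicted.append([int(i) for i in neighbors[:k]])
        # All index images sharing the query's class label are considered relevant.
        actual.append([i for i, l in enumerate(index_labels) if l == label])
    map_at_k = ml_metrics.mapk(actual, predicted, k=k)  # mean Average Precision @ k
    mar_at_k = recmetrics.mark(actual, predicted, k=k)  # mean Average Recall @ k
    return map_at_k, mar_at_k
```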
The plots below were created with the recmetrics module. In all three instances, it is clear that CRIER outperforms the histogram-based image retrieval model (higher is better).
Honestly, too many; I've lost track.
Find and fix bugs, keep it open-source, and encourage people to find benefits in custom reverse image search. I would also make it more customizable by providing metadata about retrieved images so the user can do more with the results. Oh, and also make this website way more user-friendly and accessible.
This is essentially a playground for a customizable reverse image search engine, something I haven't seen done before in this form. CRIER seems like a very beneficial tool in many fields. As a consumer, it would be nice to search through my own phone's images with an image, or to do the same within a single album on my phone. As a student, this is a fun tool to play with and a great entry point for future students to study Computer Science as a major and specialize in Machine Learning and Computer Vision. After discussing with some of my peers studying medicine, they see it as useful for diagnosis: querying with an image of an injured body part could quickly retrieve similar images from past patients along with their diagnoses, helping determine the best one for the current patient. There are also many medical image datasets that a medical student could drag into the CRIER frontend interface and then search with a new image to find the most similar cases and better understand how to make a correct diagnosis.