Search Dezeen.com interior library for specific objects and colours
by Adam Siemaszkiewicz
The proof-of-concept version of the search engine is now available on Heroku. Feel free to play with it!
The idea behind this repo is to speed up an interior architect's workflow by making the interior image dataset of the online architecture magazine Dezeen.com searchable. You can search either by type of object or by colour. Dezeen.com does not currently provide such a feature, so I implemented object detection and dominant colour recognition algorithms to tag the image dataset synthetically and power a proof-of-concept search engine.
First, the notebook crawls all Dezeen.com articles in the Interior category using BeautifulSoup and fetches each article's id, title, url and list of images. Second, it downloads all images and saves a DataFrame of all the information gathered.
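As a rough illustration of the crawling step, the sketch below turns one article page into a record with the fields listed above. It is not the notebook's code: it uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without dependencies, and the HTML layout, the example.com URLs and the names `ArticleParser` and `parse_article` are assumptions made for the example.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collect the <h1> title and all <img> sources from one article page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1":
            self._in_title = True
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside the <h1> element contributes to the title.
        if self._in_title:
            self.title += data.strip()


def parse_article(article_id, url, html):
    """Return one row of the article table (hypothetical field names)."""
    parser = ArticleParser()
    parser.feed(html)
    return {"id": article_id, "title": parser.title,
            "url": url, "images": parser.images}


# Invented sample page; the real site's markup will differ.
SAMPLE_HTML = """
<article>
  <h1>Example loft renovation</h1>
  <img src="https://example.com/img/loft-1.jpg">
  <img src="https://example.com/img/loft-2.jpg">
</article>
"""

record = parse_article(1, "https://example.com/articles/1", SAMPLE_HTML)
print(record)
```

A list of such records can be passed straight to `pandas.DataFrame(records)` to get the kind of table the notebook saves.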
This notebook builds a colour recognition system based on OpenCV and KMeans clustering that finds the 10 dominant colours in each picture in the dataset, together with their distribution.
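To show the idea behind dominant-colour extraction, here is a minimal, dependency-free sketch of k-means over RGB pixels. It is an assumption-laden stand-in for the notebook's OpenCV and KMeans pipeline, not a copy of it: the function name `dominant_colours`, the deterministic initialisation and the toy pixel list are all made up for the example. In practice the pixels would come from a loaded image and the clustering would more likely be done with a library such as scikit-learn's `KMeans`.

```python
from collections import Counter


def _nearest(pixel, centres):
    """Index of the centre closest to `pixel` (squared Euclidean distance)."""
    return min(range(len(centres)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(pixel, centres[i])))


def dominant_colours(pixels, k=10, iterations=20):
    """Cluster RGB tuples with plain k-means.

    Returns [(centre_rgb, share_of_pixels), ...] sorted by share, largest first.
    """
    # Deterministic initialisation: the first k distinct colours seen.
    centres = list(dict.fromkeys(pixels))[:k]
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest centre.
        labels = [_nearest(p, centres) for p in pixels]
        # Update step: move each centre to the mean of its members.
        new_centres = []
        for i, centre in enumerate(centres):
            members = [p for p, lab in zip(pixels, labels) if lab == i]
            if members:
                centre = tuple(sum(ch) / len(members) for ch in zip(*members))
            new_centres.append(centre)
        if new_centres == centres:
            break  # converged
        centres = new_centres
    # Final assignment against the final centres gives the distribution.
    counts = Counter(_nearest(p, centres) for p in pixels)
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(tuple(round(c) for c in centres[i]), n / len(pixels))
            for i, n in ranked]


# Toy "image": three red pixels and one blue pixel.
toy_pixels = [(255, 0, 0)] * 3 + [(0, 0, 255)]
palette = dominant_colours(toy_pixels, k=2)
print(palette)  # [((255, 0, 0), 0.75), ((0, 0, 255), 0.25)]
```

Each entry pairs a cluster centre with the fraction of pixels assigned to it, which is the "colour plus distribution" shape described above; with `k=10` a real photograph would yield up to ten such entries.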
This notebook runs an object recognition system over the image dataset and tags each picture with the names of the detected objects, along with the confidence of each detection.
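One plausible way to turn raw detections into per-picture tags is sketched below. The input format (a list of `(label, confidence)` pairs), the `min_confidence` threshold and the function name `tag_image` are assumptions for illustration; the notebook's detector may return a different structure.

```python
def tag_image(detections, min_confidence=0.5):
    """Collapse raw (label, confidence) detections into {label: best confidence}.

    Detections below `min_confidence` are dropped; when an object class is
    detected several times, only its highest confidence is kept.
    """
    tags = {}
    for label, confidence in detections:
        if confidence >= min_confidence and confidence > tags.get(label, 0.0):
            tags[label] = confidence
    return tags


# Hypothetical detector output for a single picture.
raw_detections = [("chair", 0.91), ("chair", 0.64), ("lamp", 0.32), ("table", 0.78)]
tags = tag_image(raw_detections)
print(tags)  # {'chair': 0.91, 'table': 0.78}
```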
This notebook builds a dataset of labeled images from Open Images Dataset v6. Then, using the YOLOv4 object detection system, it trains a custom object detection model to recognize 30 additional interior-architecture-related object categories, such as cupboard, drawer and shelf.
This final notebook builds a two-module search engine to check the functionality of the system. The app is then deployed using the Streamlit library on the Heroku cloud platform.
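A two-module search of this kind could look roughly like the sketch below: one function filters by detected object, the other by dominant colour. The index layout, the thresholds and the names `search_by_object` and `search_by_colour` are hypothetical, and the records are invented; the real app reads its tags from the notebooks' output and wraps the search in a Streamlit interface, which is omitted here.

```python
import math

# Invented index: one record per image, with object tags and a colour palette.
INDEX = [
    {"image": "a.jpg", "objects": {"chair": 0.92, "table": 0.81},
     "colours": [((250, 250, 245), 0.6), ((120, 80, 40), 0.4)]},
    {"image": "b.jpg", "objects": {"sofa": 0.88, "chair": 0.55},
     "colours": [((20, 30, 120), 0.7), ((240, 240, 240), 0.3)]},
    {"image": "c.jpg", "objects": {"shelf": 0.76},
     "colours": [((125, 85, 45), 0.9), ((10, 10, 10), 0.1)]},
]


def search_by_object(index, name, min_confidence=0.5):
    """Images containing `name`, ordered by detection confidence (highest first)."""
    hits = [(rec["objects"][name], rec["image"]) for rec in index
            if rec["objects"].get(name, 0.0) >= min_confidence]
    return [image for _, image in sorted(hits, reverse=True)]


def search_by_colour(index, rgb, max_distance=30.0, min_share=0.2):
    """Images where colours within `max_distance` of `rgb` cover at least `min_share`."""
    hits = []
    for rec in index:
        # Add up the share of every palette colour close enough to the query.
        share = sum(s for colour, s in rec["colours"]
                    if math.dist(colour, rgb) <= max_distance)
        if share >= min_share:
            hits.append((share, rec["image"]))
    return [image for _, image in sorted(hits, reverse=True)]


print(search_by_object(INDEX, "chair"))          # ['a.jpg', 'b.jpg']
print(search_by_colour(INDEX, (120, 80, 40)))    # ['c.jpg', 'a.jpg']
```

Plain Euclidean distance in RGB is the simplest possible colour match; a perceptual colour space would likely rank results more naturally, at the cost of an extra conversion step.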