- PhD. Ngo Duc Thanh
No. | Full name | Student ID | Email | GitHub |
---|---|---|---|---|
1 | Trần Như Cẩm Nguyên | 22520004 | 22520004@gm.uit.edu.vn | cnmeow |
2 | Trần Thị Cẩm Giang | 22520361 | 22520361@gm.uit.edu.vn | Yangchann |
3 | Nguyễn Hữu Hoàng Long | 22520817 | 22520817@gm.uit.edu.vn | EbisuRyu |
4 | Đặng Hữu Phát | 22521065 | 22521065@gm.uit.edu.vn | HuuPhat125 |
5 | Phan Hoàng Phước | 22521156 | 22521156@gm.uit.edu.vn | HPhuoc0906 |
This project builds a Flask web application that retrieves relevant images from textual descriptions, using the CLIP, BLIP, and BEiT models. It supports two main query types:
- Text: Search for relevant images based on text input
- Image: Upload an image to find related textual descriptions.
By combining these deep learning models, the system aims for high accuracy and versatility in image-based search tasks.
- Clone the repository:
git clone https://github.com/cnmeow/ImageRetrievalSystem
cd ImageRetrievalSystem
- Install the required dependencies:
pip install -r requirements.txt
- Download data: this project uses the Flickr30k dataset.
  - Download the `images` folder from https://drive.google.com/file/d/1NytawwPo2ewdPP2oQWPzvsgIjbRSe4u4
    - It contains the Flickr30k images, which we renamed by ID to facilitate querying.
  - Download the `data` folder from https://drive.google.com/file/d/18vr_-iD8G7wdxt_xSsB2npgtuslXH_RT
    - It contains the dict files, features, and weights of the CLIP, BLIP, and BEiT models.
  - Download the `index` folder from https://drive.google.com/file/d/1KYi4nz5uZUQf5q5zpuBbmzMR5p1bQuLb
    - It contains the .bin index files of the models for this dataset.
- Directory structure:
ImageRetrievalSystem/
├── app.py              # Main application file (Flask app)
├── static/
│   ├── flickr30k/
│   ├── images/         # The images folder downloaded above
├── templates/
├── data/               # The data folder downloaded above
├── index/              # The index folder downloaded above
├── notebooks/          # Notebooks for running the models without the web app
├── repos/
├── src/                # Model source code
└── requirements.txt
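
Before starting the server, it can help to confirm that the downloaded folders landed in the right place. The following is a minimal sketch, assuming the top-level layout shown above; the helper name `missing_paths` and the `EXPECTED` list are hypothetical and not part of the repository.

```python
from pathlib import Path

# Top-level entries taken from the directory structure above.
# This list is an assumption for illustration; adjust it to your checkout.
EXPECTED = ["app.py", "static", "templates", "data", "index", "src"]


def missing_paths(root, expected=EXPECTED):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    # Run from inside the ImageRetrievalSystem folder.
    missing = missing_paths(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files and folders are present.")
```

If the script reports `data` or `index` as missing, repeat the corresponding download step before running the application.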
- Web application:
- Run the web server:
flask run
- Open your web browser and go to http://127.0.0.1:5000.
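
If the page does not load, a quick way to tell whether the server is reachable at all is to request the address from Python. This is a small sketch using only the standard library; the function name `server_is_up` is hypothetical, and the default URL is the address given above.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://127.0.0.1:5000", timeout=2.0):
    """Return True if an HTTP server answers at url, False if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, only with an error status such as 404 or 500.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or similar: nothing is listening.
        return False


if __name__ == "__main__":
    print("Server reachable:", server_is_up())
```

A result of `False` usually means `flask run` is not running, or is listening on a different host or port.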
- Search by Text: Enter a text query to retrieve relevant images from the database. Example: "A baby girl is wearing a red hat".
- Search by Image: Upload an image to retrieve related text descriptions, captions, or tags.
- Powered by CLIP, BLIP, and BEiT:
  - CLIP: Matches images and text by extracting semantic features.
  - BLIP: Automatically generates captions for images.
  - BEiT: Provides robust image feature extraction.
- User-friendly Web Interface:
  - Upload an image or enter text to perform a search.
  - Displays query results in real time.
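
The feature-matching idea behind text and image search can be illustrated with a toy example: the query and every stored item are represented as vectors, and results are ranked by cosine similarity. This is a sketch under stated assumptions, not the repository's implementation. The vectors below are made-up 3-dimensional stand-ins for real model embeddings, the names `cosine_similarity` and `top_k` are hypothetical, and the application itself loads prebuilt index files, so its actual search may work differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def top_k(query, features, k=3):
    """Return the ids of the k stored vectors most similar to query."""
    ranked = sorted(
        features,
        key=lambda item_id: cosine_similarity(query, features[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors standing in for image embeddings (illustrative values only).
image_features = {
    "img_001": [0.9, 0.1, 0.0],
    "img_002": [0.0, 1.0, 0.0],
    "img_003": [0.7, 0.7, 0.0],
}
text_query = [1.0, 0.0, 0.0]  # Hypothetical embedding of a text query.

print(top_k(text_query, image_features, k=2))  # ['img_001', 'img_003']
```

The same ranking works in the other direction: with caption vectors stored in `features` and an image vector as the query, `top_k` returns the closest captions.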