CLIP Latent Space Explorer

An interactive 3D visualization tool that explores how the Contrastive Language-Image Pretraining (CLIP) model understands relationships between images and text.

Table of Contents

  • Overview
  • Features
  • Setup Instructions
  • Running the Application
  • Create a new set of points for visualization

Overview

Global View of CLIP embeddings

Figure 1: Global View. The CLIP embeddings are projected to 3D using UMAP and visualized on the right component. A description of the visualization is available on the left component.

Local View of a point's nearest neighbors

Figure 2: Local View. Clicking on a point in the Global View computes that point's nearest neighbors and projects them to 2D using PCA. The right component shows the PCA visualization. A description of the visualization is available on the left component.

The tool visualizes CLIP's latent space representation by:

  • Preprocessing a subset of MS-COCO to compute CLIP embeddings for each image-text pair.
  • Projecting the 512D CLIP embeddings to 3D using UMAP (see the sketch below).
  • Plotting the 3D image and text embeddings as interactive points.
  • Computing a clicked point's nearest neighbors and projecting them to 2D using PCA.
  • Showing a preview window with a point's image-text pair on hover.
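
A minimal Python sketch of the global projection step, assuming embeddings is an (N, 512) NumPy array of CLIP image and text embeddings; the input file name and the metric choice are illustrative assumptions, and the actual preprocessing lives in the notebook/ directory.

# Sketch only: project CLIP embeddings to 3D for the Global View.
import numpy as np
import umap  # provided by the umap-learn package

# Hypothetical file holding the (N, 512) CLIP embeddings.
embeddings = np.load("clip_embeddings.npy")

# n_neighbors=30 matches the shipped pairs_5K_UMAPn30.json dataset;
# the cosine metric is an assumption, not necessarily what the notebook uses.
reducer = umap.UMAP(n_components=3, n_neighbors=30, metric="cosine")
coords_3d = reducer.fit_transform(embeddings)  # (N, 3) points for the Global View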

Features

  • Global View: Freely explore the global structure of the UMAP embeddings for a 5K subset of MS-COCO.
  • Local View: Clicking a point computes its nearest neighbors and visualizes them with a 2D PCA projection (a sketch of this computation follows the list).
  • Interactive Navigation: Move the camera in the right component to see the global structure from different angles and positions. Refer to the helper tooltip for instructions.
  • Preview Window: On hover, a preview window displays that point's associated image-text pair. This is useful for investigating how CLIP clusters semantic concepts in the global structure.
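
A minimal sketch of the Local View computation, assuming the clicked point is identified by its row index in the (N, 512) embedding matrix; the function name, neighborhood size, and distance metric below are illustrative and not the repo's actual API.

# Sketch only: nearest neighbors + 2D PCA for the Local View.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def local_view(embeddings: np.ndarray, clicked_idx: int, k: int = 20) -> np.ndarray:
    """Return 2D PCA coordinates for the clicked point and its k nearest neighbors."""
    nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(embeddings)
    _, idx = nn.kneighbors(embeddings[clicked_idx : clicked_idx + 1])
    neighborhood = embeddings[idx[0]]  # the clicked point plus its k neighbors
    return PCA(n_components=2).fit_transform(neighborhood)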

Setup Instructions

Prerequisites

  • Python 3.9.13+
  • Node.js 22.12.0+ and yarn 1.22.22+
  • At least 16GB of RAM is strongly recommended due to the size of the CLIP embeddings.
  • Close all other RAM-intensive programs, e.g., the 47 Chrome tabs for your other work 😅.

Git LFS

  1. The bundled JSON file is large, so Git LFS is used to store it on GitHub. If you don't have Git LFS on your computer, install it with:
git lfs install
  2. After installing Git LFS, clone the repo as usual:
git clone https://github.com/vulong2505/CLIP-space-explorer.git
  3. If you already cloned the repo before installing Git LFS, install it and then run:
git lfs pull
  4. If you can't download the JSON file via git clone, you can download it externally from my Drive folder and move it to backend/data/pairs_5K_UMAPn30.json.

Backend Setup

  1. Navigate to the backend directory:
# Starting from the root directory, go into backend/
cd backend
  2. Create and activate a virtual environment:
# Create the virtual environment (use python3 -m venv venv on Unix/macOS)
py -m venv venv
# Activate it on Windows:
venv\Scripts\activate
# Activate it on Unix or macOS:
source venv/bin/activate
  3. Install Python dependencies:
pip install -r requirements.txt

Frontend Setup

  1. You can skip this step if you already have Node.js and yarn installed. Otherwise, download Node.js and then install yarn globally:
npm install --global yarn
  2. Navigate to the frontend directory to install dependencies:
cd frontend
  3. Install Node.js dependencies:
yarn

Running the Application

  1. Start the backend server:
# (If you're already in the virtual env, ignore this step)
# venv\Scripts\activate # for Windows
# source venv/bin/activate  # for Unix or MacOS

# Starting from root directory, go into the backend directory
cd backend
# Start the backend server (use python app.py if the py launcher isn't available)
py app.py

Wait for the "Finished loading CLIP pairs. Server is up and running." message.

  2. In a new terminal, start the frontend website:
# Starting from the root directory, go into the frontend directory
cd frontend
# Start the frontend site in dev mode
yarn dev
  3. Open the URL shown in the frontend terminal (likely http://localhost:5173 if the default Vite port is free). Refer to the image below to find the link after running yarn dev:

Example successful yarn run.

Create a new set of points for visualization

  1. In the notebook/ directory, you can use the Jupyter Notebook to preprocess MS-COCO and create a new dataset of CLIP embedding points for the visualization. I highly recommend using Google Colab's high-RAM GPUs (if you don't have your own) to run this notebook. A rough sketch of the CLIP embedding step follows this list.

  2. Change hyperparameters in notebook to create a new dataset:

# For example, this preprocesses a dataset of 5K samples to get its UMAP embeddings (n_neighbors=30). Running the entire notebook will save this dataset as a .json.
NUM_SAMPLES = 5000              
UMAP_N_NEIGHBORS = 30           
FILENAME = "pairs_5K_UMAPn30"   
  3. Change the file path in the backend server backend/app.py to the new JSON. For example:
filename = 'data\\pairs_5K_UMAPn30.json'
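
For reference, here is a minimal sketch of how CLIP embeddings can be computed for a single image-text pair, assuming the OpenAI clip package and a ViT-B/32 checkpoint (which produces the 512D vectors mentioned above); the file name and caption are placeholders, and the actual notebook may batch the data or use a different loader.

# Sketch only: embed one image-text pair with CLIP (ViT-B/32, 512D outputs).
import torch
import clip  # from the OpenAI CLIP repository
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # placeholder image
tokens = clip.tokenize(["a caption describing the example image"]).to(device)

with torch.no_grad():
    image_embedding = model.encode_image(image)  # shape (1, 512)
    text_embedding = model.encode_text(tokens)   # shape (1, 512)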
