CLIP Latent Space Explorer

An interactive 3D visualization tool that explores how the Contrastive Language-Image Pretraining (CLIP) model understands relationships between images and text.

Table of Contents

  • Overview
  • Features
  • Setup Instructions
  • Running the Application
  • Create a new set of points for visualization

Overview

Global View of CLIP embeddings

Figure 1: Global View. The CLIP embeddings are projected to 3D using UMAP and visualized on the right component. A description of the visualization is available on the left component.

Local View of a point's nearest neighbors

Figure 2: Local View. Clicking on a point in the Global View computes that point's nearest neighbors and projects them to 2D using PCA. The right component shows the PCA visualization. A description of the visualization is available on the left component.

The tool visualizes CLIP's latent space representation by:

  • Preprocessing a subset of MS-COCO to compute CLIP embeddings for each image-text pair.
  • Projecting the 512D CLIP embeddings to 3D using UMAP (see the sketch below).
  • Plotting the 3D image and text embeddings as interactive points.
  • Computing a clicked point's nearest neighbors and projecting them to 2D using PCA.
  • Showing a preview window with a point's image-text pair on hover.
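
A minimal Python sketch of the global projection step, assuming embeddings is an (N, 512) NumPy array of CLIP image and text embeddings; the input file name and the metric choice are illustrative assumptions, and the actual preprocessing lives in the notebook/ directory.

# Sketch only: project CLIP embeddings to 3D for the Global View.
import numpy as np
import umap  # provided by the umap-learn package

# Hypothetical file holding the (N, 512) CLIP embeddings.
embeddings = np.load("clip_embeddings.npy")

# n_neighbors=30 matches the shipped pairs_5K_UMAPn30.json dataset;
# the cosine metric is an assumption, not necessarily what the notebook uses.
reducer = umap.UMAP(n_components=3, n_neighbors=30, metric="cosine")
coords_3d = reducer.fit_transform(embeddings)  # (N, 3) points for the Global View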

Features

  • Global View: Freely explore the global structure of the UMAP embeddings for a 5K subset of MS-COCO.
  • Local View: Clicking a point computes its nearest neighbors and visualizes them with a 2D PCA projection (a sketch of this computation follows the list).
  • Interactive Navigation: Move the camera in the right component to see the global structure from different angles and positions. Refer to the helper tooltip for instructions.
  • Preview Window: On hover, a preview window displays that point's associated image-text pair. This is useful for investigating how CLIP clusters semantic concepts in the global structure.
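
A minimal sketch of the Local View computation, assuming the clicked point is identified by its row index in the (N, 512) embedding matrix; the function name, neighborhood size, and distance metric below are illustrative and not the repo's actual API.

# Sketch only: nearest neighbors + 2D PCA for the Local View.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def local_view(embeddings: np.ndarray, clicked_idx: int, k: int = 20) -> np.ndarray:
    """Return 2D PCA coordinates for the clicked point and its k nearest neighbors."""
    nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(embeddings)
    _, idx = nn.kneighbors(embeddings[clicked_idx : clicked_idx + 1])
    neighborhood = embeddings[idx[0]]  # the clicked point plus its k neighbors
    return PCA(n_components=2).fit_transform(neighborhood)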

Setup Instructions

Prerequisites

  • Python 3.9.13+
  • Node.js 22.12.0+ and yarn 1.22.22+
  • At least 16GB of RAM is strongly recommended due to the size of the CLIP embeddings.
  • Close all other RAM-intensive programs, e.g., the 47 Chrome tabs for your other work 😅.

Git LFS

  1. The bundled JSON file is large, so Git LFS is used to store it on GitHub. If you don't have Git LFS on your computer, install it with:
git lfs install
  2. After installing Git LFS, clone the repo as usual:
git clone https://github.com/vulong2505/CLIP-space-explorer.git
  3. If you already cloned the repo before installing Git LFS, install it and then run:
git lfs pull
  4. If you can't download the JSON file via git clone, you can download it externally from my Drive folder and move it to backend/data/pairs_5K_UMAPn30.json.

Backend Setup

  1. Navigate to the backend directory:
# Starting from the root directory, go into backend/
cd backend
  2. Create and activate a virtual environment:
# Create the virtual environment (use python3 -m venv venv on Unix/macOS)
py -m venv venv
# Activate it on Windows:
venv\Scripts\activate
# Activate it on Unix or macOS:
source venv/bin/activate
  3. Install Python dependencies:
pip install -r requirements.txt

Frontend Setup

  1. You can skip this step if you already have Node.js and yarn installed. Otherwise, download Node.js and then install yarn globally:
npm install --global yarn
  2. Navigate to the frontend directory to install dependencies:
cd frontend
  3. Install Node.js dependencies:
yarn

Running the Application

  1. Start the backend server:
# (If you're already in the virtual env, ignore this step)
# venv\Scripts\activate # for Windows
# source venv/bin/activate  # for Unix or MacOS

# Starting from root directory, go into the backend directory
cd backend
# Start the backend server (use python app.py if the py launcher isn't available)
py app.py

Wait for the "Finished loading CLIP pairs. Server is up and running." message.

  2. In a new terminal, start the frontend website:
# Starting from the root directory, go into the frontend directory
cd frontend
# Start the frontend site in dev mode
yarn dev
  3. Open the URL shown in the frontend terminal (likely http://localhost:5173 if the default Vite port is free). Refer to the image below to find the link after running yarn dev:

Example successful yarn run.

Create a new set of points for visualization

  1. In the notebook/ directory, you can use the Jupyter Notebook to preprocess MS-COCO and create a new dataset of CLIP embedding points for the visualization. I highly recommend using Google Colab's high-RAM GPUs (if you don't have your own) to run this notebook. A rough sketch of the CLIP embedding step follows this list.

  2. Change hyperparameters in notebook to create a new dataset:

# For example, this preprocesses a dataset of 5K samples to get its UMAP embeddings (n_neighbors=30). Running the entire notebook will save this dataset as a .json.
NUM_SAMPLES = 5000              
UMAP_N_NEIGHBORS = 30           
FILENAME = "pairs_5K_UMAPn30"   
  3. Change the file path in the backend server backend/app.py to the new JSON. For example:
filename = 'data\\pairs_5K_UMAPn30.json'
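
For reference, here is a minimal sketch of how CLIP embeddings can be computed for a single image-text pair, assuming the OpenAI clip package and a ViT-B/32 checkpoint (which produces the 512D vectors mentioned above); the file name and caption are placeholders, and the actual notebook may batch the data or use a different loader.

# Sketch only: embed one image-text pair with CLIP (ViT-B/32, 512D outputs).
import torch
import clip  # from the OpenAI CLIP repository
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # placeholder image
tokens = clip.tokenize(["a caption describing the example image"]).to(device)

with torch.no_grad():
    image_embedding = model.encode_image(image)  # shape (1, 512)
    text_embedding = model.encode_text(tokens)   # shape (1, 512)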
