An interactive 3D visualization tool for exploring how the Contrastive Language-Image Pretraining (CLIP) model understands relationships between images and text.
- Overview
- Features
- Setup Instructions
- Running the Application
- Create a New Set of Points for Visualization
Figure 1: Global View. The CLIP embeddings are projected to 3D using UMAP and visualized in the right component. A description of the visualization is available in the left component.
Figure 2: Local View. Clicking on a point in the Global View computes that point's nearest neighbors and projects them to 2D using PCA. The right component shows the PCA visualization. A description of the visualization is available in the left component.
The tool visualizes CLIP's latent space representation by:
- Preprocessing a subset of MS-COCO to obtain CLIP embeddings.
- Projecting the 512-D CLIP embeddings to 3D using UMAP (a rough sketch of this pipeline follows the list).
- Plotting the image and text embeddings as interactive 3D points.
- Computing a clicked point's nearest neighbors and projecting them to 2D using PCA.
- Showing a preview window with a point's image-text pair on hover.
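As a rough illustration of this pipeline, the sketch below encodes an image-text pair with OpenAI's `clip` package and projects a stack of embeddings to 3D with `umap-learn`. The model name, example inputs, and UMAP settings here are illustrative assumptions, not the exact preprocessing used in this repo:

```python
import clip
import numpy as np
import torch
import umap
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # 512-D embedding space

# Encode one image-text pair into CLIP's shared 512-D space
# ("example.jpg" and the caption are placeholders).
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
tokens = clip.tokenize(["a photo of a dog"]).to(device)
with torch.no_grad():
    img_emb = model.encode_image(image)  # shape (1, 512)
    txt_emb = model.encode_text(tokens)  # shape (1, 512)

# Stand-in for the full stack of image and text embeddings from the MS-COCO subset.
all_embeddings = np.random.rand(5000, 512).astype(np.float32)

# Project 512-D -> 3-D for the interactive Global View.
points_3d = umap.UMAP(n_components=3, n_neighbors=30).fit_transform(all_embeddings)
```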
- Global View: Freely explore the global structure of UMAP embeddings for a 5K subset of MS-COCO.
- Local View: Clicking on a point computes its nearest neighbors and visualizes their PCA projection (a sketch of this computation follows the list).
- Interactive Navigation: Move the camera in the right component to see the global structure from different angles and positions. Refer to the helper tooltip for instructions.
- Preview Window: On hover, a preview window displays that point's associated image-text pair. Useful for investigating how CLIP clusters semantic concepts in the global structure.
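The Local View computation can be sketched roughly as follows, assuming scikit-learn for the nearest-neighbor search and PCA. The helper name `local_view`, the choice of `k`, and the Euclidean metric are assumptions for illustration, not the app's exact implementation:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def local_view(clip_embeddings: np.ndarray, clicked_idx: int, k: int = 10):
    """Return the indices of the clicked point and its k nearest neighbors in the
    original 512-D CLIP space, plus their 2-D PCA projection for the Local View."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(clip_embeddings)
    _, idx = nn.kneighbors(clip_embeddings[clicked_idx : clicked_idx + 1])
    neighborhood = clip_embeddings[idx[0]]  # clicked point first, then its neighbors
    return idx[0], PCA(n_components=2).fit_transform(neighborhood)

# Example usage with random stand-in embeddings:
indices, points_2d = local_view(np.random.rand(1000, 512).astype(np.float32), clicked_idx=42)
```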
- Python 3.9.13+
- Node.js 22.12.0+ and yarn 1.22.22+
- At least 16 GB of RAM is strongly recommended due to the size of the CLIP embeddings.
- Close all other RAM-intensive programs, e.g., the 47 Chrome tabs for your other work 😅.
- The available json file is large, so `git lfs` is used to store it in GitHub. If you don't have `git lfs` on your computer, install it with:

```bash
git lfs install
```

- After turning on `git lfs`, clone the repo as usual:

```bash
git clone https://github.com/vulong2505/CLIP-space-explorer.git
```

- If you already cloned the repo before turning on `git lfs`, install it and then run:

```bash
git lfs pull
```

- If you can't download the json file via `git clone`, you can download it externally. Download the json from my external Drive folder and move it to `backend/data/pairs_5K_UMAPn30.json`. A quick sanity check for the download is sketched below.
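If the download seems off, checking the file size is a quick way to tell: when `git lfs` wasn't set up before cloning, the json is a tiny LFS pointer file (a few hundred bytes) instead of the real data. A minimal check, with the 1 MB threshold as an arbitrary assumption:

```python
import os

path = "backend/data/pairs_5K_UMAPn30.json"
size_mb = os.path.getsize(path) / 1e6
print(f"{path}: {size_mb:.1f} MB")
if size_mb < 1:
    print("Looks like an LFS pointer file; run `git lfs pull` or download the json externally.")
```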
- Navigate to the backend directory to install dependencies:

```bash
# Starting from the root directory, go into backend/
cd backend
```

- Create and activate a virtual environment:

```bash
# Create and activate virtual environment
py -m venv venv

# On Windows:
venv\Scripts\activate

# On Unix or MacOS:
source venv/bin/activate
```

- Install Python dependencies:

```bash
pip install -r requirements.txt
```

- You can skip this step if you already have Node.js and yarn installed. Otherwise, download Node.js and then install yarn globally:

```bash
npm install --global yarn
```

- Navigate to the frontend directory to install dependencies:

```bash
cd frontend
```

- Install Node.js dependencies:

```bash
yarn
```

- Start the backend server:
```bash
# (If you're already in the virtual env, ignore this step)
# venv\Scripts\activate       # for Windows
# source venv/bin/activate    # for Unix or MacOS

# Starting from the root directory, go into the backend directory
cd backend

# Start backend server
py app.py
```

Wait for the "Finished loading CLIP pairs. Server is up and running." message.
- In a new terminal, start the frontend website:

```bash
# Starting from the root directory, go into the frontend directory
cd frontend

# Start the frontend site in dev mode
yarn dev
```

- Open the URL shown in the frontend terminal (it might be http://localhost:5173 if the default Vite port is open). Refer to the image below to find the link after running `yarn dev`:
- In the `notebook/` directory, you can use the Jupyter Notebook to preprocess MS-COCO and create a new dataset of CLIP embedding points for the visualization. I highly recommend using Google Colab's high-RAM GPUs (if you don't have your own) to run this notebook.
- Change the hyperparameters in the notebook to create a new dataset (a sketch of how they fit together follows this list):

```python
# For example, this preprocesses a dataset of 5K samples to get its UMAP embeddings (n_neighbors=30).
# Running the entire notebook will save this dataset as a .json.
NUM_SAMPLES = 5000
UMAP_N_NEIGHBORS = 30
FILENAME = "pairs_5K_UMAPn30"
```

- Change the file path in the backend server `backend/app.py` to the new json. For example:

```python
filename = 'data\\pairs_5K_UMAPn30.json'
```
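For orientation, here is a minimal sketch of how these hyperparameters might fit together in the notebook. The CLIP-encoding step is replaced by a random stand-in array, and the output record fields are assumptions rather than the notebook's exact schema:

```python
import json
import numpy as np
import umap

NUM_SAMPLES = 5000
UMAP_N_NEIGHBORS = 30
FILENAME = "pairs_5K_UMAPn30"

# Stand-in for the CLIP embeddings of the sampled MS-COCO image-text pairs
# (one row per image or caption).
clip_embeddings = np.random.rand(2 * NUM_SAMPLES, 512).astype(np.float32)

# Project to 3-D with the chosen UMAP neighborhood size.
points_3d = umap.UMAP(n_components=3, n_neighbors=UMAP_N_NEIGHBORS).fit_transform(clip_embeddings)

# Save one record per point; the backend then loads the file named by FILENAME.
records = [{"id": i, "position": xyz.tolist()} for i, xyz in enumerate(points_3d)]
with open(f"{FILENAME}.json", "w") as f:
    json.dump(records, f)
```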