Source Code for Eye-Tracking Using Gaze Data via WebGazer.JS and Grounded Segment Anything 2.1 - ATTENTION: only works on Linux (e.g., via WSL)
If a WSL instance is not yet installed, it can be set up using the following command:

```bash
wsl --install -d Ubuntu-22.04
```

After installing Ubuntu, update the system and install some essential packages:
```bash
sudo apt update
sudo apt upgrade -y
sudo apt install -y build-essential zlib1g-dev libncurses5-dev libgdbm-dev \
  libnss3-dev libssl-dev liblzma-dev libreadline-dev libffi-dev wget \
  libsqlite3-dev libbz2-dev
```

The NVIDIA CUDA Toolkit must be installed from the official website. The following steps are copied from NVIDIA's documentation: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#wsl-installation
`curl` and `gnupg` are needed for key management:

```bash
sudo apt-get install -y ca-certificates curl gnupg
```

If the old signing key is still present, it can be removed:

```bash
sudo apt-key del 7fa2af80
```

Now, add the new key and the CUDA repository:

```bash
curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/3bf863cc.pub | sudo gpg --dearmor -o /usr/share/keyrings/cuda-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg] https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/ /" | sudo tee /etc/apt/sources.list.d/cuda-wsl.list
```

Then, update the package list again and install the CUDA Toolkit:
```bash
sudo apt-get update
sudo apt-get install -y cuda-toolkit-12-1
```

Add the following lines to `~/.bashrc`:
```bash
export PATH=/usr/local/cuda-12.1/bin${PATH:+:${PATH}}
export CUDA_HOME=/usr/local/cuda-12.1
```

Then run:

```bash
source ~/.bashrc
```

Download and install the desired Python version:
```bash
wget https://www.python.org/ftp/python/3.10.16/Python-3.10.16.tgz
tar -xvzf Python-3.10.16.tgz
cd Python-3.10.16
```

Configure and install:

```bash
./configure --enable-optimizations
make -j $(nproc)
sudo make altinstall
```

Verify the installation:

```bash
cd ..
python3.10 --version
```

First, clone the Grounded-SAM-2 repository:
```bash
git clone https://github.com/IDEA-Research/Grounded-SAM-2.git
cd Grounded-SAM-2
```

Then, create and activate a virtual environment:

```bash
python3.10 -m venv GSAM
source GSAM/bin/activate
```

Install the required packages with the appropriate CUDA version:
```bash
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
```

Download the SAM 2 and Grounding DINO checkpoints:

```bash
cd checkpoints
bash download_ckpts.sh
cd ..
cd gdino_checkpoints
bash download_ckpts.sh
cd ..
```

Install Grounded-SAM-2 and the Grounding DINO module in editable mode:

```bash
pip install -e .
pip install --no-build-isolation -e grounding_dino
```
Install additional dependencies for grounding_dino:

```bash
cd grounding_dino
pip install -r requirements.txt
cd ..
```
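Optionally, you can sanity-check the environment at this point. This is a minimal sketch, assuming the editable installs above expose the `sam2` and `groundingdino` packages:

```python
# Quick environment check; run inside the activated GSAM venv.
import torch
import sam2            # installed by `pip install -e .` in Grounded-SAM-2
import groundingdino   # installed by the editable grounding_dino install

print(torch.__version__)          # expected: 2.5.1+cu121
print(torch.cuda.is_available())  # should print True if CUDA 12.1 is set up correctly
```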
Open the project in VS Code from WSL:

```bash
code .
```

To use the setup via VS Code in Windows, follow these steps:
- Install the following extensions in Visual Studio Code on Windows:
  - WSL Extension
  - Python
  - Python Environment Manager
- Close VS Code.
- In WSL, run the following command in the project folder to open VS Code in Windows: `code .`
- Install the extensions for WSL/Ubuntu:
  - Install the Python extension
- Select the GSAM environment in VS Code.
- You now have a working installation of Grounded SAM! If you have any unresolved questions, contact @ElectricUnit on GitHub.
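As a quick smoke test of the installation, the following minimal sketch runs Grounding DINO on a single image via its inference helpers. The config/checkpoint paths, the test image, and the text prompt are assumptions based on the default Grounded-SAM-2 layout; adjust them to your checkout and use case:

```python
# Minimal Grounding DINO smoke test; run inside the activated GSAM venv.
import cv2
from groundingdino.util.inference import load_model, load_image, predict, annotate

# Assumed paths from the default Grounded-SAM-2 layout; adjust if necessary.
model = load_model(
    "grounding_dino/groundingdino/config/GroundingDINO_SwinT_OGC.py",
    "gdino_checkpoints/groundingdino_swint_ogc.pth",
)

# Grounding DINO expects lower-case, dot-separated class phrases.
TEXT_PROMPT = "car. pedestrian. traffic light."  # illustrative prompt, adapt per use case

image_source, image = load_image("demo.jpg")  # hypothetical test image

boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption=TEXT_PROMPT,
    box_threshold=0.35,   # higher value => fewer, more confident box detections
    text_threshold=0.25,  # higher value => stricter phrase matching
)

annotated = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)
cv2.imwrite("annotated_demo.jpg", annotated)
print(phrases)
```

The `box_threshold` and `text_threshold` parameters shown here are the ones referred to in the notes below.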
To set up the eye-tracking pipeline itself, follow these steps:

- Install CUDA 12.1.
- (Windows) Make sure the Visual Studio Build Tools (e.g., version 2022) are installed: Link
- We recommend using Anaconda with Python 3.11.0 or higher.
- Install `torch==2.5.1+cu121`.
- Clone this repository: `git clone https://github.com/M-Colley/eye-tracking-pipeline.git`
- Run `pip install -r requirements.txt`.
- Follow the installation guide of Grounded Segment Anything 2 (use SAM 2.1) without Docker (environment variables, etc.).
- We use `sam2.1_hiera_large.pt`; download the weights from here and put them into the root of our directory (`functions_grounding_dino.py` looks for it there).
- It could be helpful to use the Developer Command Prompt (unclear).
- Personalization: you will have to adapt your custom prompt for better results, depending on your use case (see the smoke test above for where the text prompt goes).
- We also provide the necessary functions to work with 360-degree videos via yaw and pitch (`calculate_view(frame, yaw, pitch)`); see the usage sketch after this list. Attention: the encoding of the frames is highly important!
- The required detection quality can be altered by changing the values `box_threshold` and `text_threshold`: the higher the value, the fewer detections (true positives) but also the fewer false positives you will get.
- Attention: `get_color_for_class` has to be adapted per use case; a sketch follows after this list.
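A minimal sketch of how `get_color_for_class` might be adapted; the class names and BGR values are purely illustrative and not part of the repository:

```python
# Hypothetical per-use-case color mapping (BGR order, as used by OpenCV).
DEFAULT_COLOR = (128, 128, 128)  # gray for classes without an explicit entry

CLASS_COLORS = {
    "car": (0, 0, 255),
    "pedestrian": (0, 255, 0),
    "traffic light": (255, 0, 0),
}

def get_color_for_class(class_name: str) -> tuple[int, int, int]:
    """Return a fixed BGR color per detected class; fall back to gray."""
    return CLASS_COLORS.get(class_name.lower().strip(), DEFAULT_COLOR)
```

And a hedged usage sketch for the 360-degree helper. The signature `calculate_view(frame, yaw, pitch)` is taken from the note above; the import location, the input video, and the per-frame yaw/pitch values are assumptions for illustration:

```python
# Hypothetical driver loop for 360-degree footage; assumes calculate_view
# returns a perspective view of the equirectangular frame for the given angles.
import cv2
from functions_grounding_dino import calculate_view  # assumed import location

cap = cv2.VideoCapture("recording_360.mp4")  # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Placeholder angles; in the pipeline these would come from the gaze data.
    yaw, pitch = 30.0, -10.0
    view = calculate_view(frame, yaw, pitch)
    cv2.imshow("view", view)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```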