25 changes: 24 additions & 1 deletion .gitignore
@@ -2,4 +2,27 @@
build/
text_to_tokenized_video.egg-info/
**/__pycache__/
.env
# 1. LARGE DATA & DERIVED OUTPUTS (too large to commit)
/data/mpanag/thesis_storage/
/tokens/
/*.csv
/*.pt
/*.mp4
/reconstructions/

# 2. CHECKPOINTS (NVIDIA Models)
# The full models must be excluded. Only instructions to download them are allowed.
/checkpoints/
/checkpoints_posttrained/

# 3. ENVIRONMENT & LOGS
/cosmos_output/
*.log
*.err
*.out
__pycache__/
.DS_Store
*.ipynb_checkpoints
/conda_envs/
/venv/
70 changes: 60 additions & 10 deletions README.md
@@ -1,18 +1,68 @@
# Text to Tokenized Video: Video-First Sign Language Generation

This repository contains the code and methodology for the Master's Thesis: **"Generating Sign Language Videos from Text using Fine-Tuned Video Token Representations."**

The goal is to develop an autoregressive language model that generates discrete video tokens directly from German text, utilizing a **domain-adapted NVIDIA Cosmos Tokenizer**.

---

## ⚙️ Project Setup and Development

### 1. Developer Tools (Local Workflow)

The repository uses standard Python tools for local development and quality checks:

| Command | Description |
| :--- | :--- |
| `pip install ".[dev]"` | Install development dependencies, including testing tools. |
| `ruff check . --fix` | Run linting and formatting checks. |
| `pytest .` | Execute unit tests to verify pipeline integrity. |

### 2. Required External Resources (Data Hygiene)

| Asset | Acquisition Method | Expected Local Path |
| :--- | :--- | :--- |
| **RWTH-PHOENIX-2014-T Dataset** | Follow instructions on the official project page. | `/scratch/mpanag/PHOENIX-2014-T-release-v3/PHOENIX-2014-T/` |
| **Cosmos DV8x16x16 Checkpoints** | Download the pre-trained weights from Hugging Face. | `checkpoints/Cosmos-1.0-Tokenizer-DV8x16x16/` |

**Checkpoint Download Instruction:**
```bash
huggingface-cli download nvidia/Cosmos-1.0-Tokenizer-DV8x16x16 --local-dir checkpoints/Cosmos-1.0-Tokenizer-DV8x16x16
```

### 3. Environment and Dependencies

1. **Clone the NVIDIA Cosmos-Predict1 framework** (used as a library/CLI) into the project root:
```bash
git clone https://github.com/nvidia-cosmos/cosmos-predict1.git
```
2. **Activate your Conda Environment** (the one containing your PyTorch 2.6.0 stack):
```bash
conda activate cosmos
```
3. **Install project dependencies**:
```bash
pip install -r requirements.txt
# CRITICAL: Set PYTHONPATH to resolve local module imports during torchrun.
export PYTHONPATH=$PYTHONPATH:$(pwd)
```

---

## Usage

### A. Tokenize the Dataset (Data Preparation)

The custom script (`encode_dataset.py`) handles PHOENIX-specific preprocessing (resizing, chunking, padding) and saves the resulting discrete **FSQ** (Finite Scalar Quantization) video tokens.
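The chunking and padding step can be sketched as follows. This is a minimal illustration rather than the thesis code: it assumes a causal tokenizer with 8x temporal compression (as in DV8x16x16), where a valid clip length `T` satisfies `(T - 1) % 8 == 0` and yields `1 + (T - 1) / 8` latent frames; the helper names are hypothetical.

```python
def pad_to_valid_length(num_frames: int, temporal_factor: int = 8) -> int:
    """Pad a clip so a causal tokenizer accepts it: (T - 1) % factor == 0."""
    remainder = (num_frames - 1) % temporal_factor
    return num_frames if remainder == 0 else num_frames + (temporal_factor - remainder)


def latent_frames(num_frames: int, temporal_factor: int = 8) -> int:
    """Latent frames emitted: the first frame alone, then one per factor frames."""
    return 1 + (num_frames - 1) // temporal_factor


# A 100-frame PHOENIX clip is padded to 105 frames -> 14 latent frames,
# each a 16x16 grid of token indices after 16x16 spatial downsampling.
print(pad_to_valid_length(100), latent_frames(pad_to_valid_length(100)))
```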

### B. Decode Reconstructed Video (Evaluation)

Use the project script (`decode_tokens.py`) to decode the tokens back into video for visual and quantitative quality checks.

```bash
# Example: Decode a token file using the decoder checkpoint.
python decode_tokens.py \
--token_path /path/to/token_output/sequence.pt \
--checkpoint_dec checkpoints/Cosmos-1.0-Tokenizer-DV8x16x16/decoder.jit \
--output_path /path/to/reconstructed_video.mp4
```
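Before decoding, a quick sanity check on a token file helps catch shape and dtype mismatches early. A minimal sketch, assuming the tokens are saved as a single integer tensor of shape `(latent_frames, H/16, W/16)` with a 64K FSQ vocabulary; both assumptions should be verified against the actual output of `encode_dataset.py`:

```python
import torch

# Stand-in for a real token file: what a DV8x16x16 tokenizer would emit
# for a 105-frame, 256x256 clip (14 latent frames, 16x16 spatial grid).
tokens = torch.randint(0, 64_000, (14, 16, 16), dtype=torch.long)
torch.save(tokens, "sequence.pt")

loaded = torch.load("sequence.pt")
assert loaded.dtype == torch.long, "decoder expects integer token indices"
assert loaded.min() >= 0 and loaded.max() < 64_000, "index outside FSQ vocabulary"
print(tuple(loaded.shape))  # (14, 16, 16)
```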
38 changes: 38 additions & 0 deletions cli/checkpoints/Cosmos-1.0-Tokenizer-DV8x16x16/.gitattributes
@@ -0,0 +1,38 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
autoencoder.jit filter=lfs diff=lfs merge=lfs -text
decoder.jit filter=lfs diff=lfs merge=lfs -text
encoder.jit filter=lfs diff=lfs merge=lfs -text