25 changes: 24 additions & 1 deletion .gitignore
@@ -2,4 +2,27 @@
build/
text_to_tokenized_video.egg-info/
**/__pycache__/
.env
# 1. LARGE DATA & DERIVED OUTPUTS (too large to commit)
/data/mpanag/thesis_storage/
/tokens/
/*.csv
/*.pt
/*.mp4
/reconstructions/

# 2. CHECKPOINTS (NVIDIA Models)
# The full models must be excluded. Only instructions to download them are allowed.
/checkpoints/
/checkpoints_posttrained/

# 3. ENVIRONMENT & LOGS
/cosmos_output/
*.log
*.err
*.out
__pycache__/
.DS_Store
*.ipynb_checkpoints
/conda_envs/
/venv/
70 changes: 60 additions & 10 deletions README.md
@@ -1,18 +1,68 @@
# Text to Tokenized Video: Video-First Sign Language Generation

This repository contains the code and methodology for the Master's Thesis: **"Generating Sign Language Videos from Text using Fine-Tuned Video Token Representations."**

The goal is to develop an autoregressive language model that generates discrete video tokens directly from German text, utilizing a **domain-adapted NVIDIA Cosmos Tokenizer**.

---

## ⚙️ Project Setup and Development

### 1. Developer Tools (Local Workflow)

The repository uses standard Python tools for local development and quality checks:

| Command | Description |
| :--- | :--- |
| `pip install ".[dev]"` | Install development dependencies, including testing tools. |
| `ruff check . --fix` | Run linting and formatting checks. |
| `pytest .` | Execute unit tests to verify pipeline integrity. |

### 2. Required External Resources (Data Hygiene)

| Asset | Acquisition Method | Expected Local Path |
| :--- | :--- | :--- |
| **RWTH-PHOENIX-2014-T Dataset** | Follow instructions on the official project page. | `/scratch/mpanag/PHOENIX-2014-T-release-v3/PHOENIX-2014-T/` |
| **Cosmos DV8x16x16 Checkpoints** | Download the pre-trained weights from Hugging Face. | `checkpoints/Cosmos-1.0-Tokenizer-DV8x16x16/` |

**Checkpoint Download Instruction:**
```bash
huggingface-cli download nvidia/Cosmos-1.0-Tokenizer-DV8x16x16 --local-dir checkpoints/Cosmos-1.0-Tokenizer-DV8x16x16
```

### 3. Environment and Dependencies

1. **Clone the NVIDIA Cosmos-Predict1 framework** (used as a library/CLI) into the project root:
```bash
git clone https://github.com/nvidia-cosmos/cosmos-predict1.git
```
2. **Activate your Conda Environment** (the one containing your PyTorch 2.6.0 stack):
```bash
conda activate cosmos
```
3. **Install project dependencies**:
```bash
pip install -r requirements.txt
# CRITICAL: Set PYTHONPATH to resolve local module imports during torchrun.
export PYTHONPATH=$PYTHONPATH:$(pwd)
```

---

## Usage

### A. Tokenize the Dataset (Data Preparation)

The custom script (`encode_dataset.py`) handles PHOENIX-specific preprocessing (resizing, chunking, padding) and saves the resulting discrete **FSQ** (Finite Scalar Quantization) video tokens.
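The chunking and padding step can be sketched as follows. This is a minimal illustration rather than the thesis code: it assumes a causal tokenizer with 8x temporal compression (as in DV8x16x16), where a valid clip length `T` satisfies `(T - 1) % 8 == 0` and yields `1 + (T - 1) / 8` latent frames; the helper names are hypothetical.

```python
def pad_to_valid_length(num_frames: int, temporal_factor: int = 8) -> int:
    """Pad a clip so a causal tokenizer accepts it: (T - 1) % factor == 0."""
    remainder = (num_frames - 1) % temporal_factor
    return num_frames if remainder == 0 else num_frames + (temporal_factor - remainder)


def latent_frames(num_frames: int, temporal_factor: int = 8) -> int:
    """Latent frames emitted: the first frame alone, then one per factor frames."""
    return 1 + (num_frames - 1) // temporal_factor


# A 100-frame PHOENIX clip is padded to 105 frames -> 14 latent frames,
# each a 16x16 grid of token indices after 16x16 spatial downsampling.
print(pad_to_valid_length(100), latent_frames(pad_to_valid_length(100)))
```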

### B. Decode Reconstructed Video (Evaluation)

Use the project script (`decode_tokens.py`) to decode the tokens back into video for visual and quantitative quality checks.

```bash
# Example: Decode a token file using the decoder checkpoint.
python decode_tokens.py \
--token_path /path/to/token_output/sequence.pt \
--checkpoint_dec checkpoints/Cosmos-1.0-Tokenizer-DV8x16x16/decoder.jit \
--output_path /path/to/reconstructed_video.mp4
```
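Before decoding, a quick sanity check on a token file helps catch shape and dtype mismatches early. A minimal sketch, assuming the tokens are saved as a single integer tensor of shape `(latent_frames, H/16, W/16)` with a 64K FSQ vocabulary; both assumptions should be verified against the actual output of `encode_dataset.py`:

```python
import torch

# Stand-in for a real token file: what a DV8x16x16 tokenizer would emit
# for a 105-frame, 256x256 clip (14 latent frames, 16x16 spatial grid).
tokens = torch.randint(0, 64_000, (14, 16, 16), dtype=torch.long)
torch.save(tokens, "sequence.pt")

loaded = torch.load("sequence.pt")
assert loaded.dtype == torch.long, "decoder expects integer token indices"
assert loaded.min() >= 0 and loaded.max() < 64_000, "index outside FSQ vocabulary"
print(tuple(loaded.shape))  # (14, 16, 16)
```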
38 changes: 38 additions & 0 deletions cli/checkpoints/Cosmos-1.0-Tokenizer-DV8x16x16/.gitattributes
@@ -0,0 +1,38 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
autoencoder.jit filter=lfs diff=lfs merge=lfs -text
decoder.jit filter=lfs diff=lfs merge=lfs -text
encoder.jit filter=lfs diff=lfs merge=lfs -text