First diffusion model in the NxDI contrib collection. Ports the Lightricks LTX-2 19B-parameter audio-video model to Trainium with TP=4 sharding for both the DiT transformer backbone (48 blocks) and the Gemma 3-12B text encoder. Includes a compiled notebook with outputs showing ~62s generation on trn2.3xlarge (~5x cheaper than the GPU equivalent).
…, remove cost mentions
- Tested on Neuron SDK 2.28 (DLAMI 20260227, neuronx-cc 2.23, torch-neuronx 2.9.0.2.12)
- Generation times: ~64s first run (with warmup), ~22s warm
- Removed hardware cost/pricing references from README and notebook
- Fixed hardcoded paths in compile_gemma3.py and shard_gemma3_weights.py
- Notebook now clearly labels first generation (warmup) vs warm generation
- Add test/integration/test_model.py with smoke, generation, SSIM accuracy, and warm performance tests (uses GPU reference frames for comparison)
- Add Compatibility Matrix section (SDK 2.27/2.28 on trn2.3xlarge)
- Add Example Checkpoints section (Lightricks/LTX-2 HuggingFace link)
- Add Testing section with pytest and standalone run instructions
- Update file structure to include test/ directory
Description
First diffusion model contribution to the NxDI contrib collection. Ports the Lightricks LTX-2 (https://huggingface.co/Lightricks/LTX-2) 19B-parameter audio-video diffusion model to AWS Trainium using NxDI's SPMD infrastructure with TP=4 sharding for both the DiT transformer backbone (48 blocks, ~6B params) and the Gemma 3-12B text encoder.
LTX-2 generates synchronized video + audio from text prompts. This contribution compiles the two compute-heavy components (DiT and Gemma3) for Neuron while leaving the VAE decoders and vocoder on CPU.
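To make the TP=4 sharding above concrete, here is a minimal pure-Python sketch of Megatron-style tensor parallelism for one transformer MLP: the first projection is split by output columns, the second by the matching input rows, and the per-rank partial outputs are summed (the all-reduce). This is an illustration under stated assumptions, not NxDI's or neuronx-distributed's API; the helpers (`shard_columns`, `shard_rows`, `mlp_tensor_parallel`) are hypothetical, and ReLU stands in for the block's actual activation.

```python
# Minimal sketch of Megatron-style tensor parallelism for a 2-layer MLP.
# Hypothetical helpers; not the NxDI / neuronx-distributed API.

TP = 4  # tensor-parallel degree used in this PR


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def relu(m):
    # Elementwise stand-in for the MLP activation; elementwise ops are safe
    # to apply to a column-sharded activation without communication.
    return [[max(0.0, v) for v in row] for row in m]


def shard_columns(w, rank, tp):
    """Column-parallel split: this rank keeps a contiguous slice of output columns."""
    cols = len(w[0]) // tp
    return [row[rank * cols:(rank + 1) * cols] for row in w]


def shard_rows(w, rank, tp):
    """Row-parallel split: this rank keeps the matching slice of input rows."""
    rows = len(w) // tp
    return w[rank * rows:(rank + 1) * rows]


def mlp_reference(x, w1, w2):
    """Unsharded MLP: relu(x @ w1) @ w2."""
    return matmul(relu(matmul(x, w1)), w2)


def mlp_tensor_parallel(x, w1, w2, tp=TP):
    """Simulate tp ranks, each holding one shard of w1 and w2."""
    partials = []
    for rank in range(tp):
        # Each rank sees the full input x but only its shard of each weight.
        h = relu(matmul(x, shard_columns(w1, rank, tp)))
        partials.append(matmul(h, shard_rows(w2, rank, tp)))
    # All-reduce: elementwise sum of the per-rank partial outputs.
    return [[sum(p[i][j] for p in partials) for j in range(len(partials[0][0]))]
            for i in range(len(partials[0]))]


if __name__ == "__main__":
    hidden, ffn = 4, 8  # ffn must be divisible by TP
    x = [[0.5, -1.0, 2.0, 0.25]]
    w1 = [[((i * ffn + j) % 7 - 3) / 4 for j in range(ffn)] for i in range(hidden)]
    w2 = [[((i * hidden + j) % 5 - 2) / 3 for j in range(hidden)] for i in range(ffn)]
    ref = mlp_reference(x, w1, w2)
    out = mlp_tensor_parallel(x, w1, w2)
    print(all(abs(a - b) < 1e-9 for a, b in zip(ref[0], out[0])))
```

The column-then-row ordering is the design choice that matters: it lets the activation run locally on each shard and needs only one reduction per MLP, instead of gathering the intermediate activation between the two projections.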
Model Information
Model Name: LTX-2 (Lightricks/LTX-2)
Model Architecture: DiT (Diffusion Transformer) with dual video+audio streams, 48 joint transformer blocks, Gemma 3-12B text encoder
Purpose: Text-to-video+audio generation
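A rough per-rank weight footprint helps explain why both compiled components are sharded with TP=4. This sketch assumes bf16 weights (2 bytes per parameter) and an ideal even split. It ignores replicated tensors (norms, some embeddings), activations, and compiler scratch space, so real usage will be higher. The ~6B DiT figure comes from the description above; 12B is the nominal Gemma 3 size.

```python
BYTES_PER_PARAM_BF16 = 2  # assumption: weights held in bf16


def per_rank_weight_gib(num_params, tp, bytes_per_param=BYTES_PER_PARAM_BF16):
    """Idealized per-rank weight memory in GiB for an even tensor-parallel split."""
    return num_params * bytes_per_param / tp / 2**30


components = {
    "DiT backbone (~6B, per the description)": 6e9,
    "Gemma 3-12B text encoder (nominal)": 12e9,
}

for name, params in components.items():
    print(f"{name}: {per_rank_weight_gib(params, tp=4):.2f} GiB per rank at TP=4")
```

Under these assumptions the DiT needs about 2.79 GiB of weights per rank and Gemma 3 about 5.59 GiB.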
Checklist
Required Components
Optional Components
Folder Structure
/contrib/models/ltx2-video-audio/
  README.md
  /src
    __init__.py
    modeling_ltx2.py
    modeling_gemma3_encoder.py
    pipeline.py
    application.py
    compile_gemma3.py
    shard_gemma3_weights.py
    generate_ltx2.py
  /test
    /integration
      test_model.py
  /notebooks
    ltx2_neuron_inference.ipynb
    ltx2_neuron_inference_executed.ipynb
  /examples
    neuron_e2e.py
    gpu_generate.py
  /samples
    /neuron
    /gpu
Testing
How did you test this change?
Full end-to-end pipeline executed on trn2.3xlarge instances in sa-east-1 with both SDK 2.27 (DLAMI 20260126) and SDK 2.28 (DLAMI 20260227). The executed notebook (ltx2_neuron_inference_executed.ipynb) demonstrates compilation, model loading, and two successful video+audio generations with embedded output frames.
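Because the first generation includes warmup and later ones do not (the ~64s first-run vs ~22s warm figures reported for SDK 2.28), the two should be timed separately. A minimal sketch follows; `generate` is a hypothetical zero-argument callable standing in for one end-to-end generation, not a function from this PR, and `make_fake_generate` is a made-up stub used only to demonstrate the harness.

```python
import time
from statistics import median


def time_generations(generate, num_warm_runs=2):
    """Time the first (warmup) call separately from subsequent warm calls.

    `generate` is a hypothetical zero-argument callable that performs one
    generation; its return value is ignored.
    """
    if num_warm_runs < 1:
        raise ValueError("num_warm_runs must be >= 1")

    start = time.perf_counter()
    generate()  # first call: includes any warmup cost
    first_run_s = time.perf_counter() - start

    warm_runs_s = []
    for _ in range(num_warm_runs):
        start = time.perf_counter()
        generate()
        warm_runs_s.append(time.perf_counter() - start)

    return {
        "first_run_s": first_run_s,
        "warm_median_s": median(warm_runs_s),
        "warm_runs_s": warm_runs_s,
    }


def make_fake_generate(first_delay=0.05, warm_delay=0.01):
    """Hypothetical stub: slow on the first call, faster afterwards."""
    state = {"calls": 0}

    def fake_generate():
        time.sleep(first_delay if state["calls"] == 0 else warm_delay)
        state["calls"] += 1

    return fake_generate


timings = time_generations(make_fake_generate(), num_warm_runs=3)
print(f"first: {timings['first_run_s']:.3f}s, warm median: {timings['warm_median_s']:.3f}s")
```

Reporting the median of several warm runs, rather than a single run, makes the warm number less sensitive to one-off host noise.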
GPU reference frames were generated on g5.12xlarge (us-east-2) with identical settings (seed=42, guidance_scale=4.0, max_sequence_length=1024, 8 steps) for SSIM comparison.
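For orientation, SSIM compares two images through their means, variances, and covariance. Below is a minimal single-window (global) SSIM over flat grayscale pixel lists, using the standard constants C1 = (0.01·L)² and C2 = (0.03·L)² with L = 255. This is a simplified sketch, not necessarily what test_model.py does: library implementations such as scikit-image's `structural_similarity` average SSIM over sliding local windows, so their scores will differ from this global version.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two equal-length grayscale pixel sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (>= 2)")
    n = len(x)
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Sample variances and covariance (n - 1 denominator).
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


frame = [10, 50, 90, 130, 170, 210, 250, 30]
print(global_ssim(frame, frame))  # identical frames score 1 (up to rounding)
```

A score of 1 means identical structure; small pixel perturbations give values just below 1, and anti-correlated images can go negative.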
Test Results:
Compatibility
Tested with:
Additional Information
Key implementation details:
Known limitations:
Related Issues
N/A — first diffusion model contribution to NxDI.
vLLM Integration
Not applicable — this is a diffusion model, not a language model.
By submitting this PR, I confirm that: